Apptainer: Application containers for Linux
Changes since v1.2.5
This release fixes two moderate-severity denial-of-service vulnerabilities by upgrading a dependent library: CVE-2024-28176 and CVE-2024-28180.
FUSE mounts are now supported in setuid mode, enabling full functionality even when kernel filesystem mounts are insecure due to unprivileged users having write access to raw filesystems in containers.
When allow setuid-mount extfs = no (the default) in apptainer.conf, then the fuse2fs image driver will be used to mount ext3 images in setuid mode instead of the kernel driver (ext3 images are primarily used for the --overlay feature), restoring functionality that was removed by default in Apptainer 1.1.8 because of the security risk.
The allow setuid-mount squashfs configuration option in apptainer.conf now has a new default called iflimited, which allows kernel squashfs mounts only if there is at least one limit container option set or if Execution Control Lists are activated in ecl.toml. If kernel squashfs mounts are not allowed, then the squashfuse image driver will be used instead.
iflimited is the default because if one of those limits is used, the system administrator ensures that unprivileged users do not have write access to the containers, but on the other hand using FUSE would theoretically enable a user to bypass the limits via ptrace() because the FUSE process runs as that user.
The fuse-overlayfs image driver will also now be tried in setuid mode if the kernel overlayfs driver does not work (for example if one of the layers is a FUSE filesystem).
In addition, if allow setuid-mount encrypted = no then the unprivileged gocryptfs format will be used for encrypting SIF files instead of the kernel device-mapper. If a SIF file was encrypted using the gocryptfs format, it can now be mounted in setuid mode in addition to non-setuid mode.
For various reasons, all four dependent FUSE programs now need to be compiled from source and included in Apptainer installations and packages. Scripts are provided to make this easy; see the updated instructions in INSTALL.md. The bundled squashfuse_ll is updated to version 0.5.1.
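The squashfs driver selection described above can be summarized as a small decision function. This is only an illustrative sketch, not Apptainer's implementation: the function name and the boolean parameters are hypothetical, and it assumes the option also accepts plain yes and no values alongside iflimited.

```python
def squashfs_driver(setting: str, limit_container_set: bool, ecl_active: bool) -> str:
    """Pick the driver for a setuid-mode squashfs mount (illustrative only)."""
    if setting == "yes":        # assumed value: kernel mounts always allowed
        return "kernel"
    if setting == "no":         # assumed value: kernel mounts never allowed
        return "squashfuse"
    if setting == "iflimited":  # the new default described in the text
        # Kernel mounts are allowed only when a "limit container" option is set
        # or Execution Control Lists are active; otherwise fall back to FUSE.
        return "kernel" if (limit_container_set or ecl_active) else "squashfuse"
    raise ValueError(f"unknown setting: {setting!r}")


# Show the outcome of the default setting for each combination of interest.
for limits, ecl in [(False, False), (True, False), (False, True)]:
    print(limits, ecl, squashfs_driver("iflimited", limits, ecl))
```

With no limits and no ECL configured, the sketch falls back to squashfuse, which matches the behavior the notes describe for an unrestricted installation.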
Change the default in user namespace mode to use either kernel overlayfs or fuse-overlayfs instead of the underlay feature for the purpose of adding bind mount points. That was already the default in setuid mode; this change makes it consistent. The underlay feature can still be used with the --underlay option, but it is deprecated because the implementation is complicated and measurements have shown that the performance of underlay is similar to overlayfs and fuse-overlayfs.
For now the underlay feature can be made the default again with a new preferred value on the enable underlay configuration option.
Also the --underlay option can be used in setuid mode or as the root user, although it was ignored previously.
Prefer again to use kernel overlayfs over fuse-overlayfs when a lower layer is FUSE and there's no writable upper layer, undoing the change from 1.2.0. Another workaround was found for the problem that change addressed. This applies in both setuid mode and user namespace mode (except on CentOS 7, where it isn't supported in user namespace mode).
--cwd is now the preferred form of the flag for setting the container's working directory, though --pwd is still supported for compatibility.
When building an RPM, we will now use /var/lib/apptainer (rather than /var/apptainer) to store local state files.
The way --home is handled when running as root (e.g. sudo apptainer) or with --fakeroot has changed. Previously, we were only modifying the HOME environment variable in these cases, while leaving the container's /etc/passwd file unchanged (with its homedir field pointing to /root, regardless of the value passed to --home). With this change, both the value of HOME and the contents of /etc/passwd in the container will reflect the value passed to --home if the container is read-only. If the container is writable, the /etc/passwd file is left alone because it can interfere with commands that want to modify it.
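As a rough model of the --home behavior described above for root and --fakeroot runs, the following sketch returns what HOME and the /etc/passwd homedir field would end up as. The names resolve_home and HomeResult are hypothetical, and treating HOME as still being set for writable containers is one reading of the text rather than a confirmed detail.

```python
from dataclasses import dataclass


@dataclass
class HomeResult:
    home_env: str     # value of HOME inside the container
    passwd_home: str  # homedir field of the user's /etc/passwd entry


def resolve_home(requested_home: str, existing_passwd_home: str, writable: bool) -> HomeResult:
    """Model how --home is reflected when running as root or with --fakeroot."""
    if writable:
        # Writable container: /etc/passwd is left alone so that commands
        # which want to modify it are not interfered with.
        return HomeResult(requested_home, existing_passwd_home)
    # Read-only container: both HOME and /etc/passwd reflect --home.
    return HomeResult(requested_home, requested_home)


print(resolve_home("/data/home", "/root", writable=False))
print(resolve_home("/data/home", "/root", writable=True))
```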
The --vm and related flags to start apptainer inside a VM have been removed. This functionality was related to the retired Singularity Desktop / SyOS projects.
The keyserver-related commands that were under remote have been moved to their own, dedicated keyserver command. Run apptainer help keyserver for more information.
The commands related to OCI/Docker registries that were under remote have been moved to their own, dedicated registry command. Run apptainer help registry for more information.
The remote list subcommand now outputs only remote endpoints (with keyservers and OCI/Docker registries having been moved to separate commands), and the output has been streamlined.
Adding a new remote endpoint using the apptainer remote add command will now set the new endpoint as default. This behavior can be suppressed by supplying the --no-default (or -n) flag to remote add.
Skip parsing build definition file template variables after comments beginning with a hash symbol.
Improved the clarity of apptainer key list output.
The global /tmp directory is no longer used for gocryptfs mountpoints.
Updated minimum Go version to 1.20.
The remote status command will now print the username, realname, and email of the logged-in user, if available.
apptheus, this tool will put apptainer starter into a newly created cgroup and collect system metrics.
The --no-pid flag for apptainer run/shell/exec disables the PID namespace inferred by --containall and --compat.
--config option to keyserver commands.
keyserver list command.
APPTAINER_ENCRYPTION_PEM_DATA env var to allow for encrypting and running encrypted containers without a PEM file.
--sharens mode for apptainer exec/run/shell, which enables running multiple apptainer instances created by the same parent using the same image in the same user namespace.
.FullRaw field introduced, which always contains the raw data for the entire definition file. Behavior of the .Raw field has changed: for multi-stage builds parsed with pkg/build/types/parser.All(), .Raw contains the raw content of a single build stage. Otherwise, it is equal to .FullRaw.
/var/tmp on top of /tmp in the container, where /var/tmp resolves to the same location as /tmp.
test / [ commands in container startup scripts, via dependency update of mvdan.cc/sh.
/etc/passwd file.
$HOME points to a non-readable directory.
nvidia-container-cli on Ubuntu 22.04 where an ldconfig wrapper script gets in the way. Instead, we use ldconfig.real directly.
ghcr.io/apptainer/apptainer.
Changes since v1.3.0-rc.1
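Change the default in user namespace mode to use either kernel overlayfs or fuse-overlayfs instead of the underlay feature for the purpose of adding bind mount points. That was already the default in setuid mode; this change makes it consistent. The underlay feature can still be used with the --underlay option, but it is deprecated because the implementation is complicated and measurements have shown that the performance of underlay is similar to overlayfs and fuse-overlayfs. For now the underlay feature can be made the default again with a new preferred value on the enable underlay configuration option. Also the --underlay option can be used in setuid mode or as the root user, although it was ignored previously.
--sharens failure on EL8.
$HOME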
points to a non-readable directory.
libnvidia-nvvm to nvliblist.conf. Newer NVIDIA Drivers (known with >= 525.85.05) require this lib to compile OpenCL programs against NVIDIA GPUs, i.e. libnvidia-opencl depends on libnvidia-nvvm.
--fakeroot is passed.
hidepid mount option on /proc is set.
--fakeroot option without /etc/subuid mapping. The fix was to change the switch to an unprivileged root-mapped namespace to be the equivalent of unshare -r instead of unshare -rm on action commands, to work around a bug in the el8 kernel.
libnvidia-gpucomp.so to the list of libraries to add to NVIDIA GPU-enabled containers.
APPTAINER_TMPDIR for temporary files during privileged image encryption.
If XDG_RUNTIME_DIR or DBUS_SESSION_BUS_ADDRESS is not set, print an info message that stats will not be available instead of exiting with a fatal error.
The apptainer push/pull commands now show a progress bar for the oras protocol like there was for docker and library protocols.
The --nv and --rocm flags can now be used simultaneously.
APPTAINER_CONFIGDIR with apptainer instance start and action commands that refer to instance://.
~/.docker/config.json if missing in the apptainer credentials.
--no-mount home won't have any effect when running apptainer from a home directory and will require --no-mount home,cwd to avoid mounting that directory.
The --underlay action option can be used to prefer underlay instead of overlay.
enable overlay = driver configuration option to always use the overlay image driver (that is, fuse-overlayfs) even when the kernel overlayfs is usable.
panfs filesystem, allowing sandbox directories to be run from panfs without error.
sessiondir maxsize in apptainer.conf now defaults to 64 MiB for new installations. This is an increase from 16 MiB in prior versions.
The --reproducible flag for ./mconfig will configure Apptainer so that its binaries do not contain non-reproducible paths. This disables plugin functionality.
{{ variable }} will be replaced by a value defined either by a variable=value entry in the %arguments section of the definition file or through new build options --build-arg or --build-arg-file. By default any unused variables given in --build-arg or --build-arg-file result in a fatal error, but the option --warn-unused-build-args changes that to a warning rather than a fatal error.
instance run command that will execute the runscript when an instance is initiated instead of executing the startscript.
The sign and verify commands now support signing and verification with non-PGP key material by specifying the path to a private key via the --key flag.
The verify command now supports verification with X.509 certificates by specifying the path to a certificate via the --certificate flag. By default, the system root certificate pool is used as trust anchors unless overridden via the --certificate-roots flag. A pool of intermediate certificates that are not trust anchors, but can be used to form a certificate chain, can also be specified via the --certificate-intermediates flag.
verify --ocsp-verify option.
The instance stats command displays the resource usage every second. The --no-stream option disables this interactive mode and shows the point-in-time usage.
apptainer instance stats to be supported by default when possible.
The instance start command now accepts an optional --app <name> argument which invokes a start script within the %appstart <name> section in the definition file. The instance stop command still only requires the instance name.
APPTAINER_INSTANCE environment variable.
APPTAINER_CONFIGDIR environment variable.
APPTAINER_SILENT, APPTAINER_QUIET, and APPTAINER_VERBOSE. Also add APPTAINER_NOCOLOR for the --nocolor option.
The --no-mount flag now accepts the value bind-paths to disable mounting of all bind path entries in apptainer.conf.
DOCKER_HOST parsing when using docker-daemon://
DOCKER_USERNAME and DOCKER_PASSWORD supported without APPTAINER_ prefix.
CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE.
setopt definition file header for the yum bootstrap agent. The setopt value is passed to yum / dnf using the --setopt flag. This permits setting e.g. install_weak_deps=False to bootstrap recent versions of Fedora, where systemd (a weak dependency) cannot install correctly in the container. See examples/Fedora for an example definition file.
yum bootstrap of an older distro may fail if the host rpm _db_backend is not bdb.
The remote get-login-password command allows users to retrieve a remote's token. This enables piping the secret directly into docker login while preventing it from showing up in a shell's history.
In --rocm mode, the whole of /dev/dri is now bound into the container when --contain is in use. This makes /dev/dri/render devices available, required for later ROCm versions.
--env-file.
--nv/--rocm when duplicate <library>.so[.version] files are listed by ldconfig -p.
DOCKER_HOST is honored in non-build flows.
apptainer.conf comment, to refer to the correct file as source of default capabilities when root default capabilities = file.
--workdir and --scratch options when the former is given a relative path.