The systemd System and Service Manager

v255

4 months ago

systemd System and Service Manager

CHANGES WITH 255:

Announcements of Future Feature Removals and Incompatible Changes:

    * Support for split-usr (/usr/ mounted separately during late boot,
      instead of being mounted by the initrd before switching to the rootfs)
      and unmerged-usr (parallel directories /bin/ and /usr/bin/, /lib/ and
      /usr/lib/, …) has been removed. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * systemd-run's --expand-environment= switch, which is currently
      disabled by default when combined with --scope, will be enabled by
      default in a future release.

    * "systemctl switch-root" is now restricted to initrd transitions only.

      Transitions between real systems should be done with
      "systemctl soft-reboot" instead.

    * The "ip=off" and "ip=none" kernel command line options interpreted by
      systemd-network-generator will now result in IPv6RA + link-local
      addressing being disabled, too. Previously DHCP was turned off, but
      IPv6RA and IPv6 link-local addressing were left enabled.

    * The NAMING_BRIDGE_MULTIFUNCTION_SLOT naming scheme has been deprecated
      and is now disabled.

    * SuspendMode=, HibernateState= and HybridSleepState= in the [Sleep]
      section of systemd-sleep.conf are now deprecated and have no effect.
      They did not (and could not) take any value other than the respective
      default. HybridSleepMode= is also deprecated, and will now always use
      the 'suspend' disk mode.

Service Manager:

    * The way services are spawned has been overhauled. Previously, a
      process was forked that shared all of the manager's memory (via
      copy-on-write) while doing all the required setup (e.g.: mount
      namespaces, CGroup configuration, etc.) before exec'ing the target
      executable. This was problematic for various reasons: several glibc
      APIs were called that are not supposed to be used after a fork but
      before an exec, copy-on-write meant that if either process (the
      manager or the child) touched a memory page a copy was triggered, and
      also the memory footprint of the child process was that of the
      manager, but with the memory limits of the service. From this version
      onward, the new process is spawned using CLONE_VM and CLONE_VFORK
      semantics via posix_spawn(3), and it immediately execs a new internal
      binary, systemd-executor, that receives the configuration to apply
      via memfd, and sets up the process before exec'ing the target
      executable. The systemd-executor binary is pinned by file descriptor
      by each manager instance (system and users), and the reference is
      updated on daemon-reexec - it is thus important to reexec all running
      manager instances when the systemd-executor and/or libsystemd*
      libraries are updated on the filesystem.

    * Most of the internal process tracking is being changed to use PIDFDs
      instead of PIDs when the kernel supports it, to improve robustness
      and reliability.

    * A new option SurviveFinalKillSignal= can be used to configure the
      unit to be skipped in the final SIGTERM/SIGKILL spree on shutdown.
      This is part of the required configuration to let a unit's processes
      survive a soft-reboot operation.

    * System extension images (sysext) can now set
      EXTENSION_RELOAD_MANAGER=1 in their extension-release files to
      automatically reload the service manager (PID 1) when
      merging/refreshing/unmerging on boot. Generally, while this can be
      used to ship services in system extension images it's recommended to
      do that via portable services instead.

    * The ExtensionImages= and ExtensionDirectories= options now support
      confexts images/directories.

    * A new option NFTSet= provides a method for integrating dynamic cgroup
      IDs into firewall rules with NFT sets. The benefit of this setting is
      that a control group can easily be used as a selector in firewall
      rules, which in turn allows more fine-grained filtering. NFT rules
      for cgroup matching use numeric cgroup IDs, which change every time
      a service is restarted, making them hard to use directly in a systemd
      environment.

    * A new option CoredumpReceive= can be set for service and scope units,
      together with Delegate=yes, to make systemd-coredump on the host
      forward core files from processes crashing inside the delegated
      CGroup subtree to systemd-coredump running in the container. This new
      option is by default used by systemd-nspawn containers that use the
      "--boot" switch.

    * A new ConditionSecurity=measured-uki option is now available, to ensure
      a unit can only run when the system has been booted from a measured UKI.

    * MemoryAvailable= now considers physical memory if there are no CGroup
      memory limits set anywhere in the tree.

    * The $USER environment variable is now always set for services, while
      previously it was only set if User= was specified. A new option
      SetLoginEnvironment= is now supported to determine whether to also set
      $HOME, $LOGNAME, and $SHELL.

    * Socket units now support a new pair of
      PollLimitIntervalSec=/PollLimitBurst= options to configure a limit on
      how often polling events on the file descriptors backing this unit
      will be considered within a time window.

    * Scope units can now be created using PIDFDs instead of PIDs to select
      the processes they should include.

    * Sending SIGRTMIN+18 with 0x500 as sigqueue() value will now cause the
      manager to dump the list of currently pending jobs.

    * If the kernel supports MOVE_MOUNT_BENEATH, the systemctl and
      machinectl bind and mount-image verbs will now cause the new mount to
      replace the old mount (if any), instead of overmounting it.

    * Units now have MemoryPeak, MemorySwapPeak, MemorySwapCurrent and
      MemoryZSwapCurrent properties, which respectively contain the values
      of the cgroup v2's memory.peak, memory.swap.peak, memory.swap.current
      and memory.zswap.current properties. This information is also shown
      in "systemctl status" output, if available.

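The polling limit for socket units mentioned in the list above is a burst-per-interval limit. A minimal sketch of such a fixed-window limiter follows. The class name and the explicit `now` argument are illustrative assumptions and do not mirror systemd's internal code.

```python
class WindowLimiter:
    """Allow at most `burst` events per `interval` seconds (fixed window)."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval
        self.burst = burst
        self.window_start = None  # start time of the current window
        self.count = 0            # events accepted in the current window

    def allow(self, now: float) -> bool:
        """Return True if an event at time `now` is within the limit."""
        if self.window_start is None or now - self.window_start >= self.interval:
            # The previous window has expired (or none exists yet): start anew.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False
```

With an interval of 2 seconds and a burst of 3, a fourth event inside the same window is rejected, and events are accepted again once the window has elapsed.
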
TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll now allows specifying a PCR bank and explicit hash
      value in the --tpm2-pcrs= option.

    * systemd-cryptenroll now allows specifying a TPM2 key handle (nv
      index) to be used instead of the default SRK via the new
      --tpm2-seal-key-handle= option.

    * systemd-cryptenroll now allows TPM2 enrollment using only a TPM2
      public key (in TPM2B_PUBLIC format) – without access to the TPM2
      device itself – which enables offline sealing of LUKS images for a
      specific TPM2 chip, as long as the SRK public key is known. Pass the
      public key to the tool via the new --tpm2-device-key= switch.

    * systemd-cryptsetup is now installed in /usr/bin/ and is no longer an
      internal-only executable.

    * The TPM2 Storage Root Key will now be set up, if not already present,
      by a new systemd-tpm2-setup.service early boot service. The SRK will
      be stored in PEM format and TPM2_PUBLIC format (the latter is useful
      for systemd-cryptenroll --tpm2-device-key=, as mentioned above) for
      easier access. A new "srk" verb has been added to systemd-analyze to
      allow extracting it on demand if it is already set up.

    * The internal systemd-pcrphase executable has been renamed to
      systemd-pcrextend.

    * The systemd-pcrextend tool gained a new --pcr= switch to override
      which PCR to measure into.

    * systemd-pcrextend now exposes a Varlink interface at
      io.systemd.PCRExtend that can be used to do measurements and event
      logging on demand.

    * TPM measurements are now also written to an event log at
      /run/log/systemd/tpm2-measure.log, using a derivative of the TCG
      Canonical Event Log format. Previously we'd only log them to the
      journal, where they however were subject to rotation and similar.

    * A new component "systemd-pcrlock" has been added that allows managing
      local TPM2 PCR policies for PCRs 0-7 and similar, which are hard to
      predict by the OS vendor because of the inherently local nature of
      what measurements they contain, such as firmware versions of the
      system and extension cards and suchlike. pcrlock can predict PCR
      measurements ahead of time based on various inputs, such as the local
      TPM2 event log, GPT partition tables, PE binaries, UKI kernels, and
      various other things. It can then pre-calculate a TPM2 policy from
      this, which it stores in a TPM2 NV index. TPM2 objects (such as disk
      encryption keys) can be locked against this NV index, so that they
      are locked against a specific combination of system firmware and
      state. Alternatives for each component are supported to allowlist
      multiple kernel versions or boot loader versions simultaneously
      without losing access to the disk encryption keys. The tool can also
      be used to analyze and validate the local TPM2 event log.
      systemd-cryptsetup, systemd-cryptenroll, and systemd-repart have all
      been updated to support such policies. There is currently no support
      for locking the system's root disk against a pcrlock policy; this
      will be added soon. Moreover, it is currently not possible to combine
      a pcrlock policy with a signed PCR policy. This component is
      experimental and its public interface is subject to change.

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl will now show whether the system was booted from a UKI in its
      status output.

    * systemd-boot and systemd-stub now use different project keys in their
      respective SBAT sections, so that they can be revoked individually if
      needed.

    * systemd-boot will no longer load unverified Devicetree blobs when UEFI
      SecureBoot is enabled. For more details see:
      https://github.com/systemd/systemd/security/advisories/GHSA-6m6p-rjcq-334c

    * systemd-boot gained new hotkeys to reboot and power off the system
      from the boot menu ("B" and "O"). If the "auto-poweroff" and
      "auto-reboot" options in loader.conf are set these entries are also
      shown as menu items (which is useful on devices lacking a regular
      keyboard).

    * systemd-boot gained a new configuration value "menu-disabled" for the
      set-timeout option, to allow completely disabling the boot menu,
      including the hotkey.

    * systemd-boot will now measure the content of loader.conf in TPM2
      PCR 5.

    * systemd-stub will now concatenate the content of all kernel
      command-line addons before measuring them in TPM2 PCR 12, in a single
      measurement, instead of measuring them individually.

    * systemd-stub will now measure and load Devicetree Blob addons, which
      are searched and loaded following the same model as the existing
      kernel command-line addons.

    * systemd-stub will now ignore unauthenticated kernel command line options
      passed from systemd-boot when running inside Confidential VMs with UEFI
      SecureBoot enabled.

    * systemd-stub will now load a Devicetree blob even if the firmware did
      not load any beforehand (e.g.: for ACPI systems).

    * ukify is no longer considered experimental, and now ships in /usr/bin/.

    * ukify gained a new verb inspect to describe the sections of a UKI and
      print the contents of the well-known sections.

    * ukify gained a new verb genkey to generate a set of key pairs for
      signing UKIs and their PCR data.

    * The 90-loaderentry kernel-install hook now supports installing device
      trees.

    * kernel-install now supports the --json=, --root=, --image=, and
      --image-policy= options for the inspect verb.

    * kernel-install now supports new list and add-all verbs. The former
      lists all installed kernel images (if those are available in
      /usr/lib/modules/). The latter will install all the kernels it can
      find to the ESP.

systemd-repart:

    * A new option --copy-from= has been added that synthesizes partition
      definitions from the given image, which are then applied by the
      systemd-repart algorithm.

    * A new option --copy-source= has been added, which can be used to specify
      a directory to which CopyFiles= is considered relative to.

    * New --make-ddi=confext, --make-ddi=sysext, and --make-ddi=portable
      options have been added to make it easier to generate these types of
      DDIs, without having to provide repart.d definitions for them.

    * The dm-verity salt and UUID will now be derived from the specified
      seed value.

    * New VerityDataBlockSizeBytes= and VerityHashBlockSizeBytes= can now be
      configured in repart.d/ configuration files.

    * A new Subvolumes= setting is now supported in repart.d/ configuration
      files, to indicate which directories in the target partition should be
      btrfs subvolumes.

    * A new --tpm2-device-key= option can be used to lock a disk against a
      specific TPM2 public key. This matches the same switch the
      systemd-cryptenroll tool now supports (see above).

Journal:

    * The journalctl --lines= parameter now accepts +N to show the oldest N
      entries instead of the newest.

    * journald now ensures that sealing happens once per epoch, and sets a
      new compatibility flag to distinguish old journal files that were
      created before this change, for backward compatibility.

Device Management:

    * udev will now create symlinks to loopback block devices in the
      /dev/disk/by-loop-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new switch
      --loop-ref= for selecting the string. This means a loopback block
      device may now be allocated under a caller-chosen reference and can
      subsequently be referenced without first having to look up the block
      device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/disk/by-loop-inode/ directory based on the .st_dev/.st_ino
      fields of the inode attached to the loopback block device. This means
      that attaching a file to a loopback device will implicitly make a
      handle available to be found via that file's inode information.

    * udevadm info gained support for JSON output via a new --json= flag, and
      for filtering output using the same mechanism that udevadm trigger
      already implements.

    * The predictable network interface naming logic is extended to include
      the SR-IOV-R "representor" information in network interface names.
      This feature was intended for v254, but even though the code was
      merged, the part that actually enabled the feature was forgotten.
      It is now enabled by default and is part of the new "v255" naming
      scheme.

    * A new hwdb/rules file has been added that sets the
      ID_NET_AUTO_LINK_LOCAL_ONLY=1 udev property on all network interfaces
      that should usually only be configured with link-local addressing
      (IPv4LL + IPv6LL), i.e. for PC-to-PC cables ("laplink") or
      Thunderbolt networking. systemd-networkd and NetworkManager (soon)
      will make use of this information to apply an appropriate network
      configuration by default.

    * The ID_NET_DRIVER property on network interfaces is now set
      relatively early in the udev rule set so that other rules may rely on
      its use. This is implemented in a new "net-driver" udev built-in.

Network Management:

    * The "duid-only" option for DHCPv4 client's ClientIdentifier= setting
      is now dropped, as it never worked, hence it should not be used by
      anyone.

    * The 'prefixstable' IPv6 address generation mode now considers the SSID
      when generating stable addresses, so that a different stable address
      is used when roaming between wireless networks. If you already use
      'prefixstable' addresses with wireless networks, the stable address
      will be changed by the update.

    * The DHCPv4 client gained a RapidCommit option, true by default, which
      enables RFC4039 Rapid Commit behavior to obtain a lease in a
      simplified 2-message exchange instead of the typical 4-message
      exchange, if also supported by the DHCP server.

    * The DHCPv4 client gained new InitialCongestionWindow= and
      InitialAdvertisedReceiveWindow= options for route configurations.

    * The DHCPv4 client gained a new RequestAddress= option that allows
      sending a preferred IP address in the initial DHCPDISCOVER message.

    * The DHCPv4 server and client gained support for IPv6-only mode
      (RFC8925).

    * The SendHostname= and Hostname= options are now available for the
      DHCPv6 client, independently of the DHCPv4= option, so that these
      configuration values can be set independently for each client.

    * The DHCPv4 and DHCPv6 client state can now be queried via D-Bus,
      including lease information.

    * The DHCPv6 client can now be configured to use a custom DUID type.

    * .network files gained a new IPv4ReversePathFilter= setting in the
      [Network] section, to control sysctl's rp_filter setting.

    * .network files gained a new HopLimit= setting in the [Route] section,
      to configure a per-route hop limit.

    * .network files gained a new TCPRetransmissionTimeoutSec= setting in
      the [Route] section, to configure a per-route TCP retransmission
      timeout.

    * A new directive NFTSet= provides a method for integrating network
      configuration into firewall rules with NFT sets. The benefit of using
      this setting is that static network configuration or dynamically
      obtained network addresses can be used in firewall rules with the
      indirection of NFT set types.

    * The [IPv6AcceptRA] section supports the following new options:
      UsePREF64=, UseHopLimit=, UseICMP6RateLimit=, and NFTSet=.

    * The [IPv6SendRA] section supports the following new options:
      RetransmitSec=, HopLimit=, HomeAgent=, HomeAgentLifetimeSec=, and
      HomeAgentPreference=.

    * A new [IPv6PREF64Prefix] set of options, containing Prefix= and
      LifetimeSec=, has been introduced to append pref64 options in router
      advertisements (RFC8781).

    * The network generator now configures the interfaces with only
      link-local addressing if "ip=link-local" is specified on the kernel
      command line.

    * The names of the configuration files generated by the network
      generator from the kernel command line are now prefixed with '70-',
      to give them higher precedence than the default configuration
      files.

    * Added a new -Ddefault-network=BOOL meson option, that causes more
      .network files to be installed as enabled by default. These
      configuration files match generic setups, e.g. 89-ethernet.network
      matches all Ethernet interfaces and enables both DHCPv4 and DHCPv6
      clients.

    * If an ID_NET_MANAGED_BY= udev property is set on a network device and
      it is any other string than "io.systemd.Network" then networkd will
      not manage this device. This may be used to allow multiple network
      management services to run in parallel and assign ownership of
      specific devices explicitly. NetworkManager will soon implement
      similar logic.
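
The effect of the 'prefixstable' change listed above can be illustrated with an RFC 7217 style computation, in which the interface identifier is a hash over the prefix, the interface, a network ID, and a secret. The sketch below is an assumption-laden illustration: it uses SHA-256 and a made-up secret, whereas the hash function and key material that networkd actually uses are not described in these notes.

```python
import hashlib
import ipaddress


def stable_address(prefix: str, ifname: str, ssid: str,
                   secret: bytes) -> ipaddress.IPv6Address:
    """Derive a stable address inside a /64 `prefix` (RFC 7217 style sketch)."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    digest = hashlib.sha256()
    fields = (network.network_address.packed, ifname.encode(),
              ssid.encode(), secret)
    for field in fields:
        # Length-prefix each field so that field boundaries are unambiguous.
        digest.update(len(field).to_bytes(2, "big") + field)
    # The first 64 bits of the hash become the interface identifier.
    interface_id = int.from_bytes(digest.digest()[:8], "big")
    return network.network_address + interface_id
```

Because the SSID is one of the hashed fields, the same interface gets the same address every time it joins a given wireless network, and a different one on another network.
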
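
.network files are processed in alphanumeric order of their names, and the first file that matches an interface is applied. That is why the '70-' prefix mentioned above wins over a default file such as 89-ethernet.network. A small sketch of that selection rule follows; the generated file name 70-eth0.network and the match predicates are hypothetical.

```python
from typing import Callable, Dict, Optional


def pick_network_file(files: Dict[str, Callable[[str], bool]],
                      ifname: str) -> Optional[str]:
    """Return the first file, in sorted name order, whose matcher accepts ifname."""
    for name in sorted(files):
        if files[name](ifname):
            return name
    return None


# Hypothetical configuration: one generated file and one default file.
files = {
    "89-ethernet.network": lambda ifname: ifname.startswith(("eth", "en")),
    "70-eth0.network": lambda ifname: ifname == "eth0",
}
```

Here eth0 is matched by both files, but the generated one sorts first and is therefore the one applied.
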

systemctl:

    * systemctl is-failed now checks the system state if no unit is
      specified.

    * systemctl will now automatically soft-reboot if a new root file system
      is found under /run/nextroot/ when a reboot operation is invoked.

Login management:

    * Wall messages now work even when utmp support is disabled, using
      systemd-logind to query the necessary information.

    * systemd-logind now sends a new PrepareForShutdownWithMetadata D-Bus
      signal before shutdown/reboot/soft-reboot that includes additional
      information compared to the PrepareForShutdown signal. Currently the
      additional information is the type of operation that is about to be
      executed.

Hibernation & Suspend:

    * The kernel and OS versions will no longer be checked on resume from
      hibernation.

    * Hibernation into swap files backed by btrfs is now supported.
      (Previously this was supported only for other file systems.)

Other:

    * A new systemd-vmspawn tool has been added, that aims to provide for VMs
      the same interfaces and functionality that systemd-nspawn provides for
      containers. For now it supports QEMU as a backend, and exposes some of
      its options to the user. This component is experimental and its public
      interface is subject to change.

    * "systemd-analyze plot" has gained tooltips on each unit name with
      related-unit information in its svg output, such as Before=,
      Requires=, and similar properties.

    * A new varlinkctl tool has been added to allow interfacing with
      Varlink services, and introspection has been added to all such
      services. This component is experimental and its public interface is
      subject to change.

    * systemd-sysext and systemd-confext now expose a Varlink service
      at io.systemd.sysext.

    * portable services now accept confexts as extensions.

    * systemd-sysupdate now accepts directories in the MatchPattern= option.

    * systemd-run will now output the invocation ID of the launched
      transient unit and its peak memory usage.

    * systemd-analyze, systemd-tmpfiles, systemd-sysusers, systemd-sysctl,
      and systemd-binfmt gained a new --tldr option that can be used instead
      of --cat-config to suppress uninteresting configuration lines, such as
      comments and whitespace.

    * resolvectl gained a new "show-server-state" command that shows
      current statistics of the resolver. This is backed by a new
      DumpStatistics() Varlink method provided by systemd-resolved.

    * systemd-timesyncd will now emit a D-Bus signal when the LinkNTPServers
      property changes.

    * vconsole now supports KEYMAP=@kernel for preserving the kernel keymap
      as-is.

    * seccomp now supports the LoongArch64 architecture.

    * seccomp may now be enabled for services running as a non-root User=
      without NoNewPrivileges=yes.

    * systemd-id128 now supports a new -P option to show only values. The
      combination of -P and --app options is also supported.

    * A new pam_systemd_loadkey.so PAM module is now available, which will
      automatically fetch the passphrase used by cryptsetup to unlock the
      root file system and set it as the PAM authtok. This enables, among
      other things, configuring auto-unlock of the GNOME Keyring / KDE
      Wallet when autologin is configured.

    * Many meson options now use the 'feature' type, which means they
      take enabled/disabled/auto as values.

    * A new meson option -Dconfigfiledir= can be used to change where
      configuration files with default values are installed to.

    * Options and verbs in man pages are now tagged with the version they
      were first introduced in.

    * A new component "systemd-storagetm" has been added, which exposes all
      local block devices as NVMe-TCP devices, fully automatically. It's
      hooked into a new target unit storage-target-mode.target that is
      supposed to be booted into via
      rd.systemd.unit=storage-target-mode.target on the kernel command
      line. This is intended to be used for installers and debugging to
      quickly get access to the local disk. It's inspired by macOS "target
      disk mode". This component is experimental and its public interface is
      subject to change.

    * A new component "systemd-bsod" has been added, which can show logged
      error messages full screen if they have a log level of LOG_EMERG.
      This component is experimental and its public interface is subject
      to change.

    * The systemd-dissect tool's --with command will now set the
      $SYSTEMD_DISSECT_DEVICE environment variable to the block device it
      operates on for the invoked process.

    * The systemd-mount tool gained a new --tmpfs switch for mounting a new
      'tmpfs' instance. This is useful since it does so via .mount units
      and thus can be executed remotely or in containers.

    * The various tools in systemd that take "verbs" (such as systemctl,
      loginctl, machinectl, …) now will suggest a close verb name in case
      the user specified an unrecognized one.

    * libsystemd now exports a new function sd_id128_get_app_specific()
      that generates "app-specific" 128bit IDs from any ID. It's similar to
      sd_id128_get_machine_app_specific() and
      sd_id128_get_boot_app_specific() but takes the ID to base calculation
      on as input. This new functionality is also exposed in the
      "systemd-id128" tool where you can now combine --app= with `show`.

    * All tools that parse timestamps now can also parse RFC3339 style
      timestamps that include the "T" and "Z" characters.

    * New documentation has been added:

      https://systemd.io/FILE_DESCRIPTOR_STORE
      https://systemd.io/TPM2_PCR_MEASUREMENTS
      https://systemd.io/MOUNT_REQUIREMENTS

    * The codebase now recognizes the suffix .confext.raw and .sysext.raw
      as alternative to the .raw suffix generally accepted for DDIs. It is
      recommended to name configuration extensions and system extensions
      with such suffixes, to indicate their purpose in the name.

    * The sd-device API gained a new function
      sd_device_enumerator_add_match_property_required() which allows
      configuring matches on properties that are strictly required. This is
      different from the existing sd_device_enumerator_add_match_property()
      matches, of which only one needs to apply.

    * The MAC address the veth side of an nspawn container shall get
      assigned may now be controlled via the $SYSTEMD_NSPAWN_NETWORK_MAC
      environment variable.

    * The libiptc dependency is now implemented via dlopen(), so that tools
      such as networkd and nspawn no longer have a hard dependency on the
      shared library when compiled with support for libiptc.

    * New rpm macros have been added: %systemd_user_daemon_reexec does
      daemon-reexec for all user managers, and %systemd_postun_with_reload
      and %systemd_user_postun_with_reload do a reload for system and user
      units on upgrades.

    * coredumpctl now propagates SIGTERM to the debugger process.
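
The derivation behind sd_id128_get_app_specific() can be approximated in Python. The sketch assumes the construction documented for sd_id128_get_machine_app_specific(): an HMAC-SHA256 of the application ID keyed by the base ID, truncated to 128 bits and marked as a version 4 UUID. Treat it as an illustration rather than a guaranteed bit-for-bit reimplementation. The function name is made up.

```python
import hashlib
import hmac
import uuid


def app_specific_id(base: uuid.UUID, app: uuid.UUID) -> uuid.UUID:
    """Derive an ID that is stable for (base, app) but does not reveal `base`."""
    digest = hmac.new(base.bytes, app.bytes, hashlib.sha256).digest()
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x40  # set the UUID version to 4
    raw[8] = (raw[8] & 0x3F) | 0x80  # set the RFC 4122 variant
    return uuid.UUID(bytes=bytes(raw))
```

The same base and application IDs always yield the same result, while a different application ID yields an unrelated one, so an application can hold a stable identifier without learning the ID it was derived from.
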
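
The difference between the two sd-device property match functions listed above is one of OR versus AND. In the sketch below, written in Python rather than against the C API, regular matches pass if any one of them applies, while required matches must all apply. The function name, the device dictionary, and the exact-value comparison are simplifying assumptions for illustration.

```python
from typing import Dict


def device_matches(properties: Dict[str, str],
                   any_of: Dict[str, str],
                   required: Dict[str, str]) -> bool:
    """Apply OR semantics to `any_of` and AND semantics to `required`."""
    # Every required property must be present with the expected value.
    if not all(properties.get(k) == v for k, v in required.items()):
        return False
    # If regular matches are configured, at least one of them must apply.
    if any_of and not any(properties.get(k) == v for k, v in any_of.items()):
        return False
    return True
```
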

Contributors:

    Contributions from: 김인수, Abderrahim Kitouni, Adam Goldman,
    Adam Williamson, Alexandre Peixoto Ferreira, Alex Hudspith,
    Alvin Alvarado, André Paiusco, Antonio Alvarez Feijoo,
    Anton Lundin, Arian van Putten, Arseny Maslennikov, Arthur Shau,
    Balázs Úr, beh_10257, Benjamin Peterson, Bertrand Jacquin,
    Brian Norris, Charles Lee, Cheng-Chia Tseng, Chris Patterson,
    Christian Hergert, Christian Hesse, Christian Kirbach,
    Clayton Craft, commondservice, cunshunxia, Curtis Klein, cvlc12,
    Daan De Meyer, Daniele Medri, Daniel P. Berrangé, Daniel Rusek,
    Daniel Thompson, Dan Nicholson, Dan Streetman, David Rheinsberg,
    David Santamaría Rogado, David Tardon, dependabot[bot],
    Diego Viola, Dmitry V. Levin, Emanuele Giuseppe Esposito,
    Emil Renner Berthing, Emil Velikov, Etienne Dechamps, Fabian Vogt,
    felixdoerre, Felix Dörre, Florian Schmaus, Franck Bui,
    Frantisek Sumsal, G2-Games, Gioele Barabucci, Hugo Carvalho,
    huyubiao, Iago López Galeiras, IllusionMan1212, Jade Lovelace,
    janana, Jan Janssen, Jan Kuparinen, Jan Macku, Jeremy Fleischman,
    Jin Liu, jjimbo137, Joerg Behrmann, Johannes Segitz, Jordan Rome,
    Jordan Williams, Julien Malka, Juno Computers, Khem Raj, khm,
    Kingbom Dou, Kiran Vemula, Krzesimir Nowak, Laszlo Gombos,
    Lennart Poettering, linuxlion, Luca Boccassi, Lucas Adriano Salles,
    Lukas, Lukáš Nykrýn, Maanya Goenka, Maarten, Malte Poll,
    Marc Pervaz Boocha, Martin Beneš, Martin Joerg, Martin Wilck,
    Mathieu Tortuyaux, Matthias Schiffer, Maxim Mikityanskiy,
    Max Kellermann, Michael A Cassaniti, Michael Biebl, Michael Kuhn,
    Michael Vasseur, Michal Koutný, Michal Sekletár, Mike Yuan,
    Milton D. Miller II, mordner, msizanoen, NAHO, Nandakumar Raghavan,
    Neil Wilson, Nick Rosbrook, Nils K, NRK, Oğuz Ersen,
    Omojola Joshua, onenowy, Paul Meyer, Paymon MARANDI, pelaufer,
    Peter Hutterer, PhylLu, Pierre GRASSER, Piotr Drąg, Priit Laes,
    Rahil Bhimjiani, Raito Bezarius, Raul Cheleguini, Reto Schneider,
    Richard Maw, Robby Red, RoepLuke, Roland Hieber, Roland Singer,
    Ronan Pigott, Sam James, Sam Leonard, Sergey A, Susant Sahani,
    Sven Joachim, Tad Fisher, Takashi Sakamoto, Thorsten Kukuk, Tj,
    Tomasz Świątek, Topi Miettinen, Valentin David,
    Valentin Lefebvre, Victor Westerhuis, Vincent Haupert,
    Vishal Chillara Srinivas, Vito Caputo, Warren, Weblate,
    Xiaotian Wu, xinpeng wang, Yaron Shahrabani, Yo-Jung Lin,
    Yu Watanabe, Zbigniew Jędrzejewski-Szmek, zeroskyx,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-12-06

v255-rc4

5 months ago

systemd System and Service Manager

CHANGES WITH 255 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * Support for split-usr (/usr/ mounted separately during late boot,
      instead of being mounted by the initrd before switching to the rootfs)
      and unmerged-usr (parallel directories /bin/ and /usr/bin/, /lib/ and
      /usr/lib/, …) has been removed. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * systemd-run's switch --expand-environment= which currently is disabled
      by default when combined with --scope, will be changed in a future
      release to be enabled by default.

    * "systemctl switch-root" is now restricted to initrd transitions only.

      Transitions between real systems should be done with
      "systemctl soft-reboot" instead.

    * The "ip=off" and "ip=none" kernel command line options interpreted by
      systemd-network-generator will now result in IPv6RA + link-local
      addressing being disabled, too. Previously DHCP was turned off, but
      IPv6RA and IPv6 link-local addressing was left enabled.

    * The NAMING_BRIDGE_MULTIFUNCTION_SLOT naming scheme has been deprecated
      and is now disabled.

    * SuspendMode=, HibernateState= and HybridSleepState= in the [Sleep]
      section of systemd-sleep.conf are now deprecated and have no effect.
      They did not (and could not) take any value other than the respective
      default. HybridSleepMode= is also deprecated, and will now always use
      the 'suspend' disk mode.
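
    As a rough way to check which cgroup setup a host uses ahead of the
    cgroup v1 removal, the sketch below tests for the cgroup.controllers
    file, which exists at the root of a cgroup v2 mount. The function name
    and the configurable mount point are illustrative and not part of
    systemd.

```python
import os


def uses_unified_cgroup_hierarchy(mount_point="/sys/fs/cgroup"):
    """Return True if `mount_point` looks like a cgroup v2 (unified) root."""
    # The root of a cgroup v2 hierarchy exposes a cgroup.controllers file;
    # cgroup v1 setups have per-controller subdirectories here instead.
    return os.path.isfile(os.path.join(mount_point, "cgroup.controllers"))


if __name__ == "__main__":
    print("unified" if uses_unified_cgroup_hierarchy() else "legacy or hybrid")
```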

Service Manager:

    * The way services are spawned has been overhauled. Previously, a
      process was forked that shared all of the manager's memory (via
      copy-on-write) while doing all the required setup (e.g.: mount
      namespaces, CGroup configuration, etc.) before exec'ing the target
      executable. This was problematic for various reasons: several glibc
      APIs were called that are not supposed to be used after a fork but
      before an exec, copy-on-write meant that if either process (the
      manager or the child) touched a memory page a copy was triggered, and
      also the memory footprint of the child process was that of the
      manager, but with the memory limits of the service. From this version
      onward, the new process is spawned using CLONE_VM and CLONE_VFORK
      semantics via posix_spawn(3), and it immediately execs a new internal
      binary, systemd-executor, that receives the configuration to apply
      via memfd, and sets up the process before exec'ing the target
      executable. The systemd-executor binary is pinned by file descriptor
      by each manager instance (system and users), and the reference is
      updated on daemon-reexec - it is thus important to reexec all running
      manager instances when the systemd-executor and/or libsystemd*
      libraries are updated on the filesystem.

    * Most of the internal process tracking is being changed to use PIDFDs
      instead of PIDs when the kernel supports it, to improve robustness
      and reliability.

    * A new option SurviveFinalKillSignal= can be used to configure the
      unit to be skipped in the final SIGTERM/SIGKILL spree on shutdown.
      This is part of the required configuration to let a unit's processes
      survive a soft-reboot operation.

    * System extension images (sysext) can now set
      EXTENSION_RELOAD_MANAGER=1 in their extension-release files to
      automatically reload the service manager (PID 1) when
      merging/refreshing/unmerging on boot. Generally, while this can be
      used to ship services in system extension images it's recommended to
      do that via portable services instead.

    * The ExtensionImages= and ExtensionDirectories= options now support
      confexts images/directories.

    * A new option NFTSet= provides a method for integrating dynamic cgroup
      IDs into firewall rules with NFT sets. This setting makes it easy to
      use a control group as a selector in firewall rules, which in turn
      allows more fine-grained filtering. NFT rules for cgroup matching use
      numeric cgroup IDs, which change every time a service is restarted,
      making them hard to use directly in a systemd environment.

    * A new option CoredumpReceive= can be set for service and scope units,
      together with Delegate=yes, to make systemd-coredump on the host
      forward core files from processes crashing inside the delegated
      CGroup subtree to systemd-coredump running in the container. This new
      option is by default used by systemd-nspawn containers that use the
      "--boot" switch.

    * A new ConditionSecurity=measured-uki option is now available, to ensure
      a unit can only run when the system has been booted from a measured UKI.

    * MemoryAvailable= now considers physical memory if there are no CGroup
      memory limits set anywhere in the tree.

    * The $USER environment variable is now always set for services, while
      previously it was only set if User= was specified. A new option
      SetLoginEnvironment= is now supported to determine whether to also set
      $HOME, $LOGNAME, and $SHELL.

    * Socket units now support a new pair of
      PollLimitBurst=/PollLimitInterval= options to configure a limit on
      how often polling events on the file descriptors backing this unit
      will be considered within a time window.

    * Scope units can now be created using PIDFDs instead of PIDs to select
      the processes they should include.

    * Sending SIGRTMIN+18 with 0x500 as sigqueue() value will now cause the
      manager to dump the list of currently pending jobs.

    * If the kernel supports MOVE_MOUNT_BENEATH, the systemctl and
      machinectl bind and mount-image verbs will now cause the new mount to
      replace the old mount (if any), instead of overmounting it.

    * Units now have MemoryPeak, MemorySwapPeak, MemorySwapCurrent and
      MemoryZSwapCurrent properties, which respectively contain the values
      of the cgroup v2 memory.peak, memory.swap.peak, memory.swap.current
      and memory.zswap.current attributes. This information is also shown
      in "systemctl status" output, if available.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll now allows specifying a PCR bank and explicit hash
      value in the --tpm2-pcrs= option.

    * systemd-cryptenroll now allows specifying a TPM2 key handle (nv
      index) to be used instead of the default SRK via the new
      --tpm2-seal-key-handle= option.

    * systemd-cryptenroll now allows TPM2 enrollment using only a TPM2
      public key (in TPM2B_PUBLIC format) – without access to the TPM2
      device itself – which enables offline sealing of LUKS images for a
      specific TPM2 chip, as long as the SRK public key is known. Pass the
      public key to the tool via the new --tpm2-device-key= switch.

    * systemd-cryptsetup is now installed in /usr/bin/ and is no longer an
      internal-only executable.

    * The TPM2 Storage Root Key will now be set up, if not already present,
      by a new systemd-tpm2-setup.service early boot service. The SRK will
      be stored in PEM format and TPM2B_PUBLIC format (the latter is useful
      for systemd-cryptenroll --tpm2-device-key=, as mentioned above) for
      easier access. A new "srk" verb has been added to systemd-analyze to
      allow extracting it on demand if it is already set up.

    * The internal systemd-pcrphase executable has been renamed to
      systemd-pcrextend.

    * The systemd-pcrextend tool gained a new --pcr= switch to override
      which PCR to measure into.

    * systemd-pcrextend now exposes a Varlink interface at
      io.systemd.PCRExtend that can be used to do measurements and event
      logging on demand.

    * TPM measurements are now also written to an event log at
      /run/log/systemd/tpm2-measure.log, using a derivative of the TCG
      Canonical Event Log format. Previously we'd only log them to the
      journal, where they however were subject to rotation and similar.

    * A new component "systemd-pcrlock" has been added that allows managing
      local TPM2 PCR policies for PCRs 0-7 and similar, which are hard to
      predict by the OS vendor because of the inherently local nature of
      what measurements they contain, such as firmware versions of the
      system and extension cards and suchlike. pcrlock can predict PCR
      measurements ahead of time based on various inputs, such as the local
      TPM2 event log, GPT partition tables, PE binaries, UKI kernels, and
      various other things. It can then pre-calculate a TPM2 policy from
      this, which it stores in a TPM2 NV index. TPM2 objects (such as disk
      encryption keys) can be locked against this NV index, so that they
      are locked against a specific combination of system firmware and
      state. Alternatives for each component are supported to allowlist
      multiple kernel versions or boot loader versions simultaneously
      without losing access to the disk encryption keys. The tool can also
      be used to analyze and validate the local TPM2 event log.
      systemd-cryptsetup, systemd-cryptenroll and systemd-repart have all
      been updated to support such policies. There's currently no support
      for locking the system's root disk against a pcrlock policy; this
      will be added soon. Moreover, it is currently not possible to combine a
      pcrlock policy with a signed PCR policy. This component is
      experimental and its public interface is subject to change.
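
    Predicting PCR values, as systemd-pcrlock does, rests on the TPM2
    extend operation: each measurement replaces the PCR value with the hash
    of the old value concatenated with the new digest. The sketch below
    replays measurements for the SHA-256 bank in plain Python. The function
    names and sample payloads are made up for illustration, and this is not
    how systemd-pcrlock itself is implemented.

```python
import hashlib


def extend_pcr(pcr_value, event_digest):
    """TPM2 extend for the SHA-256 bank: new = SHA256(old || digest)."""
    return hashlib.sha256(pcr_value + event_digest).digest()


def predict_pcr(measured_payloads):
    """Replay measured payloads in order, starting from an all-zero PCR."""
    pcr = bytes(32)
    for payload in measured_payloads:
        pcr = extend_pcr(pcr, hashlib.sha256(payload).digest())
    return pcr


if __name__ == "__main__":
    # Hypothetical payloads; real inputs come from the TPM2 event log.
    print(predict_pcr([b"loader.conf contents", b"kernel command line"]).hex())
```

    Because every step feeds the previous value back into the hash, the
    result depends on the order of measurements, which is why alternatives
    for each component have to be enumerated explicitly.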

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl will now show whether the system was booted from a UKI in its
      status output.

    * systemd-boot and systemd-stub now use different project keys in their
      respective SBAT sections, so that they can be revoked individually if
      needed.

    * systemd-boot will no longer load unverified Devicetree blobs when UEFI
      SecureBoot is enabled. For more details see:
      https://github.com/systemd/systemd/security/advisories/GHSA-6m6p-rjcq-334c

    * systemd-boot gained new hotkeys to reboot and power off the system
      from the boot menu ("B" and "O"). If the "auto-poweroff" and
      "auto-reboot" options in loader.conf are set these entries are also
      shown as menu items (which is useful on devices lacking a regular
      keyboard).

    * systemd-boot gained a new configuration value "menu-disabled" for the
      set-timeout option, to allow completely disabling the boot menu,
      including the hotkey.

    * systemd-boot will now measure the content of loader.conf in TPM2
      PCR 5.

    * systemd-stub will now concatenate the content of all kernel
      command-line addons before measuring them in TPM2 PCR 12, in a single
      measurement, instead of measuring them individually.

    * systemd-stub will now measure and load Devicetree Blob addons, which
      are searched and loaded following the same model as the existing
      kernel command-line addons.

    * systemd-stub will now ignore unauthenticated kernel command line options
      passed from systemd-boot when running inside Confidential VMs with UEFI
      SecureBoot enabled.

    * systemd-stub will now load a Devicetree blob even if the firmware did
      not load any beforehand (e.g.: for ACPI systems).

    * ukify is no longer considered experimental, and now ships in /usr/bin/.

    * ukify gained a new verb inspect to describe the sections of a UKI and
      print the contents of the well-known sections.

    * ukify gained a new verb genkey to generate a set of key pairs for
      signing UKIs and their PCR data.

    * The 90-loaderentry kernel-install hook now supports installing device
      trees.

    * kernel-install now supports the --json=, --root=, --image=, and
      --image-policy= options for the inspect verb.

    * kernel-install now supports new list and add-all verbs. The former
      lists all installed kernel images (if those are available in
      /usr/lib/modules/). The latter will install all the kernels it can
      find to the ESP.

systemd-repart:

    * A new option --copy-from= has been added that synthesizes partition
      definitions from the given image, which are then applied by the
      systemd-repart algorithm.

    * A new option --copy-source= has been added, which can be used to specify
      the directory that CopyFiles= paths are interpreted relative to.

    * New --make-ddi=confext, --make-ddi=sysext, and --make-ddi=portable
      options have been added to make it easier to generate these types of
      DDIs, without having to provide repart.d definitions for them.

    * The dm-verity salt and UUID will now be derived from the specified
      seed value.

    * New VerityDataBlockSizeBytes= and VerityHashBlockSizeBytes= settings
      can now be configured in repart.d/ configuration files.

    * A new Subvolumes= setting is now supported in repart.d/ configuration
      files, to indicate which directories in the target partition should be
      btrfs subvolumes.

    * A new --tpm2-device-key= option can be used to lock a disk against a
      specific TPM2 public key. This matches the same switch the
      systemd-cryptenroll tool now supports (see above).

Journal:

    * The journalctl --lines= parameter now accepts +N to show the oldest N
      entries instead of the newest.

    * journald now ensures that sealing happens once per epoch, and sets a
      new compatibility flag to distinguish old journal files that were
      created before this change, for backward compatibility.
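
    The selection rule for --lines= can be pictured with a list that is
    ordered from oldest to newest. This is a toy model with a hypothetical
    helper name, not journalctl's implementation.

```python
def select_entries(entries, lines):
    """Pick entries the way --lines= does: "N" = newest N, "+N" = oldest N.

    `entries` must be ordered from oldest to newest.
    """
    if lines.startswith("+"):
        return entries[: int(lines[1:])]
    count = int(lines)
    return entries[-count:] if count > 0 else []


if __name__ == "__main__":
    log = ["boot", "mount", "network", "login", "shutdown"]
    print(select_entries(log, "2"))   # newest two entries
    print(select_entries(log, "+2"))  # oldest two entries
```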

Device Management:

    * udev will now create symlinks to loopback block devices in the
      /dev/disk/by-loop-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new switch
      --loop-ref= for selecting the string. This means a loopback block
      device may now be allocated under a caller-chosen reference and can
      subsequently be referenced without first having to look up the block
      device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/disk/by-loop-inode/ directory based on the .st_dev/st_ino fields
      of the inode attached to the loopback block device. This means that
      attaching a file to a loopback device will implicitly make a handle
      available to be found via that file's inode information.

    * udevadm info gained support for JSON output via a new --json= flag, and
      for filtering output using the same mechanism that udevadm trigger
      already implements.

    * The predictable network interface naming logic is extended to include
      the SR-IOV-R "representor" information in network interface names.
      This feature was intended for v254, but even though the code was
      merged, the part that actually enabled the feature was forgotten.
      It is now enabled by default and is part of the new "v255" naming
      scheme.

    * A new hwdb/rules file has been added that sets the
      ID_NET_AUTO_LINK_LOCAL_ONLY=1 udev property on all network interfaces
      that should usually only be configured with link-local addressing
      (IPv4LL + IPv6LL), i.e. for PC-to-PC cables ("laplink") or
      Thunderbolt networking. systemd-networkd and NetworkManager (soon)
      will make use of this information to apply an appropriate network
      configuration by default.

    * The ID_NET_DRIVER property on network interfaces is now set
      relatively early in the udev rule set so that other rules may rely on
      its use. This is implemented in a new "net-driver" udev built-in.
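
    The /dev/disk/by-loop-inode/ symlinks mentioned above key on the
    device and inode numbers of the backing file. The sketch below shows
    how to obtain that pair for a file. The helper name is hypothetical,
    and the exact symlink naming is deliberately not reproduced, since it
    is not spelled out here.

```python
import os


def backing_inode_identity(path):
    """Return (major, minor, inode) identifying the file behind `path`."""
    st = os.stat(path)
    # st_dev encodes the device holding the file; st_ino is unique on it.
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


if __name__ == "__main__":
    print(backing_inode_identity(__file__))
```

    Two paths that refer to the same inode (for example hard links) yield
    the same tuple, so the lookup does not depend on the path that was used
    when the loopback device was attached.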

Network Management:

    * The "duid-only" option for DHCPv4 client's ClientIdentifier= setting
      is now dropped, as it never worked, hence it should not be used by
      anyone.

    * The 'prefixstable' IPv6 address generation mode now considers the SSID
      when generating stable addresses, so that a different stable address
      is used when roaming between wireless networks. If you already use
      'prefixstable' addresses with wireless networks, the stable address
      will be changed by the update.

    * The DHCPv4 client gained a RapidCommit option, true by default, which
      enables RFC4039 Rapid Commit behavior to obtain a lease in a
      simplified 2-message exchange instead of the typical 4-message
      exchange, if also supported by the DHCP server.

    * The DHCPv4 client gained new InitialCongestionWindow= and
      InitialAdvertisedReceiveWindow= options for route configurations.

    * The DHCPv4 client gained a new RequestAddress= option that allows
      sending a preferred IP address in the initial DHCPDISCOVER message.

    * The DHCPv4 server and client gained support for IPv6-only mode
      (RFC8925).

    * The SendHostname= and Hostname= options are now available for the
      DHCPv6 client, independently of the DHCPv4= option, so that these
      configuration values can be set independently for each client.

    * The DHCPv4 and DHCPv6 client state can now be queried via D-Bus,
      including lease information.

    * The DHCPv6 client can now be configured to use a custom DUID type.

    * .network files gained a new IPv4ReversePathFilter= setting in the
      [Network] section, to control sysctl's rp_filter setting.

    * .network files gained a new HopLimit= setting in the [Route] section,
      to configure a per-route hop limit.

    * .network files gained a new TCPRetransmissionTimeoutSec= setting in
      the [Route] section, to configure a per-route TCP retransmission
      timeout.

    * A new directive NFTSet= provides a method for integrating network
      configuration into firewall rules with NFT sets. The benefit of using
      this setting is that static network configuration or dynamically
      obtained network addresses can be used in firewall rules with the
      indirection of NFT set types.

    * The [IPv6AcceptRA] section supports the following new options:
      UsePREF64=, UseHopLimit=, UseICMP6RateLimit=, and NFTSet=.

    * The [IPv6SendRA] section supports the following new options:
      RetransmitSec=, HopLimit=, HomeAgent=, HomeAgentLifetimeSec=, and
      HomeAgentPreference=.

    * A new [IPv6PREF64Prefix] set of options, containing Prefix= and
      LifetimeSec=, has been introduced to append pref64 options in router
      advertisements (RFC8781).

    * The network generator now configures the interfaces with only
      link-local addressing if "ip=link-local" is specified on the kernel
      command line.

    * The names of the configuration files generated by the network
      generator from the kernel command line are now prefixed with '70-',
      so that they take precedence over the default configuration files.

    * Added a new -Ddefault-network=BOOL meson option, that causes more
      .network files to be installed as enabled by default. These configuration
      files match generic setups, e.g. 89-ethernet.network matches all
      Ethernet interfaces and enables both DHCPv4 and DHCPv6 clients.

    * If an ID_NET_MANAGED_BY= udev property is set on a network device and
      it is any string other than "io.systemd.Network" then networkd will
      not manage this device. This may be used to allow multiple network
      management services to run in parallel and assign ownership of
      specific devices explicitly. NetworkManager will soon implement
      similar logic.
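
    The ID_NET_MANAGED_BY= ownership rule boils down to a small predicate.
    The sketch below is illustrative only: the function name is invented,
    the property dictionary stands in for a device's udev properties, and
    treating an empty value like an unset one is an assumption.

```python
NETWORKD_OWNER = "io.systemd.Network"


def networkd_manages(udev_properties):
    """Return True if networkd would manage a device with these properties."""
    owner = udev_properties.get("ID_NET_MANAGED_BY")
    # Assumption: an unset or empty property means no owner was assigned,
    # so networkd keeps managing the device as before.
    return not owner or owner == NETWORKD_OWNER


if __name__ == "__main__":
    print(networkd_manages({}))
    print(networkd_manages({"ID_NET_MANAGED_BY": "io.systemd.Network"}))
    # "org.example.OtherManager" is a hypothetical owner string.
    print(networkd_manages({"ID_NET_MANAGED_BY": "org.example.OtherManager"}))
```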

systemctl:

    * systemctl is-failed now checks the system state if no unit is
      specified.

    * systemctl will now automatically soft-reboot if a new root file system
      is found under /run/nextroot/ when a reboot operation is invoked.

Login management:

    * Wall messages now work even when utmp support is disabled, using
      systemd-logind to query the necessary information.

    * systemd-logind now sends a new PrepareForShutdownWithMetadata D-Bus
      signal before shutdown/reboot/soft-reboot that includes additional
      information compared to the PrepareForShutdown signal. Currently the
      additional information is the type of operation that is about to be
      executed.

Hibernation & Suspend:

    * The kernel and OS versions will no longer be checked on resume from
      hibernation.

    * Hibernation into swap files backed by btrfs is now supported.
      (Previously this was supported only for other file systems.)

Other:

    * A new systemd-vmspawn tool has been added, that aims to provide for VMs
      the same interfaces and functionality that systemd-nspawn provides for
      containers. For now it supports QEMU as a backend, and exposes some of
      its options to the user. This component is experimental and its public
      interface is subject to change.

    * "systemd-analyze plot" has gained tooltips on each unit name with
      related-unit information in its svg output, such as Before=,
      Requires=, and similar properties.

    * A new varlinkctl tool has been added to allow interfacing with
      Varlink services, and introspection has been added to all such
      services.

    * systemd-sysext and systemd-confext now expose a Varlink service
      at io.systemd.sysext.

    * portable services now accept confexts as extensions.

    * systemd-sysupdate now accepts directories in the MatchPattern= option.

    * systemd-run will now output the invocation ID of the launched
      transient unit and its peak memory usage.

    * systemd-analyze, systemd-tmpfiles, systemd-sysusers, systemd-sysctl,
      and systemd-binfmt gained a new --tldr option that can be used instead
      of --cat-config to suppress uninteresting configuration lines, such as
      comments and whitespace.

    * resolvectl gained a new "show-server-state" command that shows
      current statistics of the resolver. This is backed by a new
      DumpStatistics() Varlink method provided by systemd-resolved.

    * systemd-timesyncd will now emit a D-Bus signal when the LinkNTPServers
      property changes.

    * vconsole now supports KEYMAP=@kernel for preserving the kernel keymap
      as-is.

    * seccomp now supports the LoongArch64 architecture.

    * seccomp may now be enabled for services running as a non-root User=
      without NoNewPrivileges=yes.

    * systemd-id128 now supports a new -P option to show only values. The
      combination of -P and --app options is also supported.

    * A new pam_systemd_loadkey.so PAM module is now available, which will
      automatically fetch the passphrase used by cryptsetup to unlock the
      root file system and set it as the PAM authtok. This enables, among
      other things, configuring auto-unlock of the GNOME Keyring / KDE
      Wallet when autologin is configured.

    * Many meson options now use the 'feature' type, which means they
      take enabled/disabled/auto as values.

    * A new meson option -Dconfigfiledir= can be used to change where
      configuration files with default values are installed to.

    * Options and verbs in man pages are now tagged with the version they
      were first introduced in.

    * A new component "systemd-storagetm" has been added, which exposes all
      local block devices as NVMe-TCP devices, fully automatically. It's
      hooked into a new target unit storage-target-mode.target that is
      supposed to be booted into via
      rd.systemd.unit=storage-target-mode.target on the kernel command
      line. This is intended to be used for installers and debugging to
      quickly get access to the local disk. It's inspired by macOS "target
      disk mode".

    * A new component "systemd-bsod" has been added, which can show logged
      error messages full screen, if they have a log level of LOG_EMERG.

    * The systemd-dissect tool's --with command will now set the
      $SYSTEMD_DISSECT_DEVICE environment variable to the block device it
      operates on for the invoked process.

    * The systemd-mount tool gained a new --tmpfs switch for mounting a new
      'tmpfs' instance. This is useful since it does so via .mount units
      and thus can be executed remotely or in containers.

    * The various tools in systemd that take "verbs" (such as systemctl,
      loginctl, machinectl, …) now will suggest a close verb name in case
      the user specified an unrecognized one.

    * libsystemd now exports a new function sd_id128_get_app_specific()
      that generates "app-specific" 128bit IDs from any ID. It's similar to
      sd_id128_get_machine_app_specific() and
      sd_id128_get_boot_app_specific() but takes the ID to base calculation
      on as input. This new functionality is also exposed in the
      "systemd-id128" tool where you can now combine --app= with `show`.

    * All tools that parse timestamps now can also parse RFC3339 style
      timestamps that include the "T" and "Z" characters.

    * New documentation has been added:

      https://systemd.io/FILE_DESCRIPTOR_STORE
      https://systemd.io/TPM2_PCR_MEASUREMENTS
      https://systemd.io/MOUNT_REQUIREMENTS

    * The codebase now recognizes the suffixes .confext.raw and .sysext.raw
      as alternatives to the .raw suffix generally accepted for DDIs. It is
      recommended to name configuration extensions and system extensions
      with such suffixes, to indicate their purpose in the name.

    * The sd-device API gained a new function
      sd_device_enumerator_add_match_property_required() which allows
      configuring matches on properties that are strictly required. This is
      different from the existing sd_device_enumerator_add_match_property()
      matches, of which only one needs to apply.

    * The MAC address the veth side of an nspawn container shall get
      assigned may now be controlled via the $SYSTEMD_NSPAWN_NETWORK_MAC
      environment variable.

    * The libiptc dependency is now implemented via dlopen(), so that tools
      such as networkd and nspawn no longer have a hard dependency on the
      shared library when compiled with support for libiptc.

    * New rpm macros have been added: %systemd_user_daemon_reexec does
      daemon-reexec for all user managers, and %systemd_postun_with_reload
      and %systemd_user_postun_with_reload do a reload for system and user
      units on upgrades.

    * coredumpctl now propagates SIGTERM to the debugger process.
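
    The idea behind sd_id128_get_app_specific() can be sketched in Python.
    This assumes the HMAC-SHA256 construction documented for
    sd_id128_get_machine_app_specific() (application ID hashed with the
    base ID as key, truncated to 128 bits and marked as a version 4 UUID)
    applies here as well. The function name is made up, and byte-for-byte
    agreement with libsystemd is not guaranteed.

```python
import hashlib
import hmac
import uuid


def app_specific_id(base_id, app_id):
    """Derive an app-specific 128-bit ID from `base_id` (both uuid.UUID)."""
    # HMAC-SHA256 over the application ID, keyed by the base ID.
    digest = hmac.new(base_id.bytes, app_id.bytes, hashlib.sha256).digest()
    raw = bytearray(digest[:16])      # truncate to 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x40   # mark as UUID version 4
    raw[8] = (raw[8] & 0x3F) | 0x80   # set the RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))


if __name__ == "__main__":
    base = uuid.uuid4()  # stand-in for a machine, boot or other base ID
    app = uuid.uuid4()   # stand-in for the application's fixed ID
    print(app_specific_id(base, app).hex)
```

    The derived ID is stable for a given pair of inputs, but it does not
    reveal the base ID, which is the point of using a keyed hash.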

Contributors

    Contributions from: 김인수, Abderrahim Kitouni, Adam Goldman,
    Adam Williamson, Alexandre Peixoto Ferreira, Alex Hudspith,
    Alvin Alvarado, André Paiusco, Antonio Alvarez Feijoo,
    Anton Lundin, Arian van Putten, Arseny Maslennikov, Arthur Shau,
    Balázs Úr, beh_10257, Benjamin Peterson, Bertrand Jacquin,
    Brian Norris, Charles Lee, Cheng-Chia Tseng, Chris Patterson,
    Christian Hergert, Christian Hesse, Christian Kirbach,
    Clayton Craft, commondservice, cunshunxia, Curtis Klein, cvlc12,
    Daan De Meyer, Daniele Medri, Daniel P. Berrangé, Daniel Rusek,
    Daniel Thompson, Dan Nicholson, Dan Streetman, David Rheinsberg,
    David Santamaría Rogado, David Tardon, dependabot[bot],
    Diego Viola, Dmitry V. Levin, Emanuele Giuseppe Esposito,
    Emil Renner Berthing, Emil Velikov, Etienne Dechamps, Fabian Vogt,
    felixdoerre, Felix Dörre, Florian Schmaus, Franck Bui,
    Frantisek Sumsal, G2-Games, Gioele Barabucci, Hugo Carvalho,
    huyubiao, Iago López Galeiras, IllusionMan1212, Jade Lovelace,
    janana, Jan Janssen, Jan Kuparinen, Jan Macku, Jeremy Fleischman,
    Jin Liu, jjimbo137, Joerg Behrmann, Johannes Segitz, Jordan Rome,
    Jordan Williams, Julien Malka, Juno Computers, Khem Raj, khm,
    Kingbom Dou, Kiran Vemula, Krzesimir Nowak, Laszlo Gombos,
    Lennart Poettering, linuxlion, Luca Boccassi, Lucas Adriano Salles,
    Lukas, Lukáš Nykrýn, Maanya Goenka, Maarten, Malte Poll,
    Marc Pervaz Boocha, Martin Beneš, Martin Joerg, Martin Wilck,
    Mathieu Tortuyaux, Matthias Schiffer, Maxim Mikityanskiy,
    Max Kellermann, Michael A Cassaniti, Michael Biebl, Michael Kuhn,
    Michael Vasseur, Michal Koutný, Michal Sekletár, Mike Yuan,
    Milton D. Miller II, mordner, msizanoen, NAHO, Nandakumar Raghavan,
    Neil Wilson, Nick Rosbrook, Nils K, NRK, Oğuz Ersen,
    Omojola Joshua, onenowy, Paul Meyer, Paymon MARANDI, pelaufer,
    Peter Hutterer, PhylLu, Pierre GRASSER, Piotr Drąg, Priit Laes,
    Rahil Bhimjiani, Raito Bezarius, Raul Cheleguini, Reto Schneider,
    Richard Maw, Robby Red, RoepLuke, Roland Hieber, Ronan Pigott,
    Sam James, Sam Leonard, Sergey A, Susant Sahani, Sven Joachim,
    Tad Fisher, Takashi Sakamoto, Thorsten Kukuk, Tj, Tomasz Świątek,
    Topi Miettinen, Valentin David, Valentin Lefebvre,
    Victor Westerhuis, Vincent Haupert, Vishal Chillara Srinivas,
    Vito Caputo, Warren, Weblate, Xiaotian Wu, xinpeng wang,
    Yaron Shahrabani, Yo-Jung Lin, Yu Watanabe,
    Zbigniew Jędrzejewski-Szmek, zeroskyx,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-12-02

v255-rc3

5 months ago

systemd System and Service Manager

CHANGES WITH 255 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * Support for split-usr (/usr/ mounted separately during late boot,
      instead of being mounted by the initrd before switching to the rootfs)
      and unmerged-usr (parallel directories /bin/ and /usr/bin/, /lib/ and
      /usr/lib/, …) has been removed. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * systemd-run's switch --expand-environment= which currently is disabled
      by default when combined with --scope, will be changed in a future
      release to be enabled by default.

    * "systemctl switch-root" is now restricted to initrd transitions only.

      Transitions between real systems should be done with
      "systemctl soft-reboot" instead.

    * The "ip=off" and "ip=none" kernel command line options interpreted by
      systemd-network-generator will now result in IPv6RA + link-local
      addressing being disabled, too. Previously DHCP was turned off, but
      IPv6RA and IPv6 link-local addressing was left enabled.

    * The NAMING_BRIDGE_MULTIFUNCTION_SLOT naming scheme has been deprecated
      and is now disabled.

    * SuspendMode=, HibernateState= and HybridSleepState= in the [Sleep]
      section of systemd-sleep.conf are now deprecated and have no effect.
      They did not (and could not) take any value other than the respective
      default. HybridSleepMode= is also deprecated, and will now always use
      the 'suspend' disk mode.

Service Manager:

    * The way services are spawned has been overhauled. Previously, a
      process was forked that shared all of the manager's memory (via
      copy-on-write) while doing all the required setup (e.g.: mount
      namespaces, CGroup configuration, etc.) before exec'ing the target
      executable. This was problematic for various reasons: several glibc
      APIs were called that are not supposed to be used after a fork but
      before an exec, copy-on-write meant that if either process (the
      manager or the child) touched a memory page a copy was triggered, and
      also the memory footprint of the child process was that of the
      manager, but with the memory limits of the service. From this version
      onward, the new process is spawned using CLONE_VM and CLONE_VFORK
      semantics via posix_spawn(3), and it immediately execs a new internal
      binary, systemd-executor, that receives the configuration to apply
      via memfd, and sets up the process before exec'ing the target
      executable. The systemd-executor binary is pinned by file descriptor
      by each manager instance (system and users), and the reference is
      updated on daemon-reexec. It is thus important to reexec all running
      manager instances when the systemd-executor and/or libsystemd*
      libraries are updated on the filesystem.

    * Most of the internal process tracking is being changed to use PIDFDs
      instead of PIDs when the kernel supports it, to improve robustness
      and reliability.

    * A new option SurviveFinalKillSignal= can be used to configure the
      unit to be skipped in the final SIGTERM/SIGKILL spree on shutdown.
      This is part of the required configuration to let a unit's processes
      survive a soft-reboot operation.

    * System extension images (sysext) can now set
      EXTENSION_RELOAD_MANAGER=1 in their extension-release files to
      automatically reload the service manager (PID 1) when
      merging/refreshing/unmerging on boot. Generally, while this can be
      used to ship services in system extension images it's recommended to
      do that via portable services instead.

    * The ExtensionImages= and ExtensionDirectories= options now support
      confexts images/directories.

    * A new option NFTSet= provides a method for integrating dynamic cgroup
      IDs into firewall rules with NFT sets. This makes it easy to use a
      control group as a selector in firewall rules, which in turn allows
      more fine-grained filtering. Without it, NFT rules for cgroup
      matching must use numeric cgroup IDs, which change every time a
      service is restarted, making them hard to use in a systemd
      environment.

    * A new option CoredumpReceive= can be set for service and scope units,
      together with Delegate=yes, to make systemd-coredump on the host
      forward core files from processes crashing inside the delegated
      CGroup subtree to systemd-coredump running in the container. This new
      option is by default used by systemd-nspawn containers that use the
      "--boot" switch.

    * A new ConditionSecurity=measured-uki option is now available, to ensure
      a unit can only run when the system has been booted from a measured UKI.

    * MemoryAvailable= now considers physical memory if there are no CGroup
      memory limits set anywhere in the tree.

    * The $USER environment variable is now always set for services, while
      previously it was only set if User= was specified. A new option
      SetLoginEnvironment= is now supported to determine whether to also set
      $HOME, $LOGNAME, and $SHELL.

    * Socket units now support a new pair of
      PollLimitBurst=/PollLimitInterval= options to configure a limit on
      how often polling events on the file descriptors backing this unit
      will be considered within a time window.

    * Scope units can now be created using PIDFDs instead of PIDs to select
      the processes they should include.

    * Sending SIGRTMIN+18 with 0x500 as sigqueue() value will now cause the
      manager to dump the list of currently pending jobs.

    * If the kernel supports MOVE_MOUNT_BENEATH, the systemctl and
      machinectl bind and mount-image verbs will now cause the new mount to
      replace the old mount (if any), instead of overmounting it.

    * Units now have MemoryPeak, MemorySwapPeak, MemorySwapCurrent and
      MemoryZSwapCurrent properties, which respectively contain the values
      of the cgroup v2's memory.peak, memory.swap.peak, memory.swap.current
      and memory.zswap.current properties. This information is also shown in
      "systemctl status" output, if available.
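
The mapping between the new unit properties and the cgroup v2 attribute files named in the last item can be sketched in Python as follows. This is only an illustration that reads the files directly; it is not how the service manager itself collects the values, and the function name is made up.

```python
from pathlib import Path
from typing import Dict, Optional

# Unit property name -> cgroup v2 attribute file, as listed in the item above.
MEMORY_ATTRIBUTES = {
    "MemoryPeak": "memory.peak",
    "MemorySwapPeak": "memory.swap.peak",
    "MemorySwapCurrent": "memory.swap.current",
    "MemoryZSwapCurrent": "memory.zswap.current",
}


def read_memory_properties(cgroup_dir: str) -> Dict[str, Optional[int]]:
    """Read the four attributes from one cgroup directory.

    A missing or unreadable file (for example on an older kernel) yields
    None for that property instead of raising.
    """
    result: Dict[str, Optional[int]] = {}
    for prop, filename in MEMORY_ATTRIBUTES.items():
        try:
            result[prop] = int((Path(cgroup_dir) / filename).read_text().strip())
        except (OSError, ValueError):
            result[prop] = None
    return result
```

The function would be pointed at a unit's cgroup directory below /sys/fs/cgroup/; the exact path depends on the unit and slice in question.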

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll now allows specifying a PCR bank and explicit hash
      value in the --tpm2-pcrs= option.

    * systemd-cryptenroll now allows specifying a TPM2 key handle (nv
      index) to be used instead of the default SRK via the new
      --tpm2-seal-key-handle= option.

    * systemd-cryptenroll now allows TPM2 enrollment using only a TPM2
      public key (in TPM2B_PUBLIC format) – without access to the TPM2
      device itself – which enables offline sealing of LUKS images for a
      specific TPM2 chip, as long as the SRK public key is known. Pass the
      public key to the tool via the new --tpm2-device-key= switch.

    * systemd-cryptsetup is now installed in /usr/bin/ and is no longer an
      internal-only executable.

    * The TPM2 Storage Root Key will now be set up, if not already present,
      by a new systemd-tpm2-setup.service early boot service. The SRK will
      be stored in PEM format and TPM2B_PUBLIC format (the latter is useful
      for systemd-cryptenroll --tpm2-device-key=, as mentioned above) for
      easier access. A new "srk" verb has been added to systemd-analyze to
      allow extracting it on demand if it is already set up.

    * The internal systemd-pcrphase executable has been renamed to
      systemd-pcrextend.

    * The systemd-pcrextend tool gained a new --pcr= switch to override
      which PCR to measure into.

    * systemd-pcrextend now exposes a Varlink interface at
      io.systemd.PCRExtend that can be used to do measurements and event
      logging on demand.

    * TPM measurements are now also written to an event log at
      /run/log/systemd/tpm2-measure.log, using a derivative of the TCG
      Canonical Event Log format. Previously we'd only log them to the
      journal, where they were, however, subject to rotation and similar.

    * A new component "systemd-pcrlock" has been added that allows managing
      local TPM2 PCR policies for PCRs 0-7 and similar, which are hard to
      predict by the OS vendor because of the inherently local nature of
      what measurements they contain, such as firmware versions of the
      system and extension cards and suchlike. pcrlock can predict PCR
      measurements ahead of time based on various inputs, such as the local
      TPM2 event log, GPT partition tables, PE binaries, UKI kernels, and
      various other things. It can then pre-calculate a TPM2 policy from
      this, which it stores in a TPM2 NV index. TPM2 objects (such as disk
      encryption keys) can be locked against this NV index, so that they
      are locked against a specific combination of system firmware and
      state. Alternatives for each component are supported to allowlist
      multiple kernel versions or boot loader versions simultaneously
      without losing access to the disk encryption keys. The tool can also
      be used to analyze and validate the local TPM2 event log.
      systemd-cryptsetup, systemd-cryptenroll, systemd-repart have all been
      updated to support such policies. There's currently no support for
      locking the system's root disk against a pcrlock policy; this will be
      added soon. Moreover, it is currently not possible to combine a
      pcrlock policy with a signed PCR policy. This component is
      experimental and its public interface is subject to change.
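
The idea of predicting PCR values ahead of time, with alternatives per component, can be sketched in Python. The sketch relies only on the standard TPM2 extend rule for a SHA-256 bank (new value = SHA256(old value || digest)) and assumes a PCR that starts out as all zeroes. It is not systemd-pcrlock's implementation, and the component payloads are made up.

```python
import hashlib
from itertools import product
from typing import List, Set


def pcr_extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM2 extend operation for a SHA-256 bank: new = SHA256(old || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def predict_pcr_values(components: List[List[bytes]]) -> Set[bytes]:
    """Return every PCR value reachable by picking one alternative per component.

    `components` is ordered like the boot sequence; each entry lists the
    acceptable payloads (alternatives) for that measurement step.
    """
    values = set()
    for choice in product(*components):
        pcr = bytes(32)  # assumption: the PCR starts out as all zeroes
        for payload in choice:
            pcr = pcr_extend(pcr, hashlib.sha256(payload).digest())
        values.add(pcr)
    return values


if __name__ == "__main__":
    firmware = [b"firmware 1.0"]               # hypothetical payloads
    kernels = [b"kernel 6.5", b"kernel 6.6"]   # two allowlisted kernels
    for value in sorted(predict_pcr_values([firmware, kernels])):
        print(value.hex())
```

A policy that accepts any value in the returned set keeps working across an update from one allowlisted kernel to the other, which is the property the alternatives mechanism is after.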

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl will now show whether the system was booted from a UKI in its
      status output.

    * systemd-boot and systemd-stub now use different project keys in their
      respective SBAT sections, so that they can be revoked individually if
      needed.

    * systemd-boot will no longer load unverified Devicetree blobs when UEFI
      SecureBoot is enabled. For more details see:
      https://github.com/systemd/systemd/security/advisories/GHSA-6m6p-rjcq-334c

    * systemd-boot gained new hotkeys to reboot and power off the system
      from the boot menu ("B" and "O"). If the "auto-poweroff" and
      "auto-reboot" options in loader.conf are set these entries are also
      shown as menu items (which is useful on devices lacking a regular
      keyboard).

    * systemd-boot gained a new configuration value "menu-disabled" for the
      set-timeout option, to allow completely disabling the boot menu,
      including the hotkey.

    * systemd-boot will now measure the content of loader.conf in TPM2
      PCR 5.

    * systemd-stub will now concatenate the content of all kernel
      command-line addons before measuring them in TPM2 PCR 12, in a single
      measurement, instead of measuring them individually.

    * systemd-stub will now measure and load Devicetree Blob addons, which
      are searched and loaded following the same model as the existing
      kernel command-line addons.

    * systemd-stub will now ignore unauthenticated kernel command line options
      passed from systemd-boot when running inside Confidential VMs with UEFI
      SecureBoot enabled.

    * systemd-stub will now load a Devicetree blob even if the firmware did
      not load any beforehand (e.g.: for ACPI systems).

    * ukify is no longer considered experimental, and now ships in /usr/bin/.

    * ukify gained a new verb inspect to describe the sections of a UKI and
      print the contents of the well-known sections.

    * ukify gained a new verb genkey to generate a set of key pairs for
      signing UKIs and their PCR data.

    * The 90-loaderentry kernel-install hook now supports installing device
      trees.

    * kernel-install now supports the --json=, --root=, --image=, and
      --image-policy= options for the inspect verb.

    * kernel-install now supports new list and add-all verbs. The former
      lists all installed kernel images (if those are available in
      /usr/lib/modules/). The latter will install all the kernels it can
      find to the ESP.
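
The systemd-stub change to measure all kernel command-line addons in a single measurement affects the resulting PCR 12 value. The Python sketch below shows why, using the generic TPM2 extend rule for a SHA-256 bank. The separator used to join the addons and their byte encoding are assumptions made for illustration, not taken from systemd-stub.

```python
import hashlib
from typing import Sequence


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """SHA-256 bank extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def measure_individually(addons: Sequence[bytes]) -> bytes:
    """Old model: one extend operation per addon."""
    pcr = bytes(32)  # assumption: PCR starts out as all zeroes
    for addon in addons:
        pcr = pcr_extend(pcr, addon)
    return pcr


def measure_concatenated(addons: Sequence[bytes], separator: bytes = b" ") -> bytes:
    """New model: join all addons first, then extend once.

    The separator is an assumption of this sketch.
    """
    return pcr_extend(bytes(32), separator.join(addons))
```

For two or more addons the two functions return different values, so PCR 12 predictions computed for the old behaviour do not carry over to the new one.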

systemd-repart:

    * A new option --copy-from= has been added that synthesizes partition
      definitions from the given image, which are then applied by the
      systemd-repart algorithm.

    * A new option --copy-source= has been added, which can be used to specify
      a directory that CopyFiles= paths are considered relative to.

    * New --make-ddi=confext, --make-ddi=sysext, and --make-ddi=portable
      options have been added to make it easier to generate these types of
      DDIs, without having to provide repart.d definitions for them.

    * The dm-verity salt and UUID will now be derived from the specified
      seed value.

    * New VerityDataBlockSizeBytes= and VerityHashBlockSizeBytes= settings
      can now be configured in repart.d/ configuration files.

    * A new Subvolumes= setting is now supported in repart.d/ configuration
      files, to indicate which directories in the target partition should be
      btrfs subvolumes.

    * A new --tpm2-device-key= option can be used to lock a disk against a
      specific TPM2 public key. This matches the same switch the
      systemd-cryptenroll tool now supports (see above).

Journal:

    * The journalctl --lines= parameter now accepts +N to show the oldest N
      entries instead of the newest.

    * journald now ensures that sealing happens once per epoch, and sets a
      new compatibility flag to distinguish old journal files that were
      created before this change, for backward compatibility.
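
The selection rule behind the new +N form of journalctl --lines= can be sketched in Python. This models only the plain "N" and "+N" forms on an in-memory list that is assumed to be ordered oldest first; the function name is hypothetical and the sketch does not parse journal files.

```python
from typing import List, Sequence


def select_lines(entries: Sequence[str], lines_arg: str) -> List[str]:
    """Mimic the selection rule: "N" -> newest N entries, "+N" -> oldest N.

    `entries` is assumed to be ordered oldest first.
    """
    oldest_first = lines_arg.startswith("+")
    count = int(lines_arg[1:] if oldest_first else lines_arg)
    if count < 0:
        raise ValueError("line count must not be negative")
    if oldest_first:
        return list(entries[:count])
    # Clamp the start index so that a count larger than the list is fine.
    start = max(len(entries) - count, 0)
    return list(entries[start:])
```

With entries ["a", "b", "c", "d"], the argument "2" selects the two newest entries and "+2" the two oldest.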

Device Management:

    * udev will now create symlinks to loopback block devices in the
      /dev/disk/by-loop-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementary new switch
      --loop-ref= for selecting the string. This means a loopback block
      device may now be allocated under a caller-chosen reference and can
      subsequently be referenced without first having to look up the block
      device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/disk/by-loop-inode/ directory based on the .st_dev/st_ino fields
      of the inode attached to the loopback block device. This means that
      attaching a file to a loopback device will implicitly make a handle
      available to be found via that file's inode information.

    * udevadm info gained support for JSON output via a new --json= flag, and
      for filtering output using the same mechanism that udevadm trigger
      already implements.

    * The predictable network interface naming logic is extended to include
      the SR-IOV-R "representor" information in network interface names.
      This feature was intended for v254, but even though the code was
      merged, the part that actually enabled the feature was forgotten.
      It is now enabled by default and is part of the new "v255" naming
      scheme.

    * A new hwdb/rules file has been added that sets the
      ID_NET_AUTO_LINK_LOCAL_ONLY=1 udev property on all network interfaces
      that should usually only be configured with link-local addressing
      (IPv4LL + IPv6LL), i.e. for PC-to-PC cables ("laplink") or
      Thunderbolt networking. systemd-networkd and NetworkManager (soon)
      will make use of this information to apply an appropriate network
      configuration by default.

    * The ID_NET_DRIVER property on network interfaces is now set
      relatively early in the udev rule set so that other rules may rely on
      its use. This is implemented in a new "net-driver" udev built-in.

Network Management:

    * The "duid-only" option for the DHCPv4 client's ClientIdentifier=
      setting has been dropped. It never worked, so nobody should be
      relying on it.

    * The 'prefixstable' IPv6 address generation mode now considers the SSID
      when generating stable addresses, so that a different stable address
      is used when roaming between wireless networks. If you already use
      'prefixstable' addresses with wireless networks, the stable address
      will be changed by the update.

    * The DHCPv4 client gained a RapidCommit option, true by default, which
      enables RFC4039 Rapid Commit behavior to obtain a lease in a
      simplified 2-message exchange instead of the typical 4-message
      exchange, if also supported by the DHCP server.

    * The DHCPv4 client gained new InitialCongestionWindow= and
      InitialAdvertisedReceiveWindow= options for route configurations.

    * The DHCPv4 client gained a new RequestAddress= option that allows
      sending a preferred IP address in the initial DHCPDISCOVER message.

    * The DHCPv4 server and client gained support for IPv6-only mode
      (RFC8925).

    * The SendHostname= and Hostname= options are now available for the
      DHCPv6 client, independently of the DHCPv4= option, so that these
      configuration values can be set independently for each client.

    * The DHCPv4 and DHCPv6 client state can now be queried via D-Bus,
      including lease information.

    * The DHCPv6 client can now be configured to use a custom DUID type.

    * .network files gained a new IPv4ReversePathFilter= setting in the
      [Network] section, to control sysctl's rp_filter setting.

    * .network files gained a new HopLimit= setting in the [Route] section,
      to configure a per-route hop limit.

    * .network files gained a new TCPRetransmissionTimeoutSec= setting in
      the [Route] section, to configure a per-route TCP retransmission
      timeout.

    * A new directive NFTSet= provides a method for integrating network
      configuration into firewall rules with NFT sets. The benefit of using
      this setting is that static network configuration or dynamically
      obtained network addresses can be used in firewall rules with the
      indirection of NFT set types.

    * The [IPv6AcceptRA] section supports the following new options:
      UsePREF64=, UseHopLimit=, UseICMP6RateLimit=, and NFTSet=.

    * The [IPv6SendRA] section supports the following new options:
      RetransmitSec=, HopLimit=, HomeAgent=, HomeAgentLifetimeSec=, and
      HomeAgentPreference=.

    * A new [IPv6PREF64Prefix] set of options, containing Prefix= and
      LifetimeSec=, has been introduced to append pref64 options in router
      advertisements (RFC8781).

    * The network generator now configures the interfaces with only
      link-local addressing if "ip=link-local" is specified on the kernel
      command line.

    * The names of the configuration files generated by the network
      generator from the kernel command line are now prefixed with '70-',
      so that they take precedence over the default configuration files.

    * A new -Ddefault-network=BOOL meson option has been added, which causes
      more .network files to be installed as enabled by default. These
      configuration files match generic setups, e.g. 89-ethernet.network
      matches all Ethernet interfaces and enables both DHCPv4 and DHCPv6
      clients.

    * If an ID_NET_MANAGED_BY= udev property is set on a network device and
      it is any string other than "io.systemd.Network" then networkd will
      not manage this device. This may be used to allow multiple network
      management services to run in parallel and assign ownership of
      specific devices explicitly. NetworkManager will soon implement
      similar logic.
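
The ownership rule for ID_NET_MANAGED_BY= described in the last item boils down to a small predicate. The Python sketch below restates it; the function name and the plain dictionary standing in for a device's udev properties are illustrative only.

```python
from typing import Mapping

NETWORKD_ID = "io.systemd.Network"


def networkd_should_manage(udev_properties: Mapping[str, str]) -> bool:
    """Return False only if ID_NET_MANAGED_BY is set to some other manager."""
    managed_by = udev_properties.get("ID_NET_MANAGED_BY")
    # Unset property: networkd manages the device as before.
    return managed_by is None or managed_by == NETWORKD_ID
```

A device without the property, or with it set to "io.systemd.Network", is managed as before; any other value hands the device to a different service.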

systemctl:

    * systemctl is-failed now checks the system state if no unit is
      specified.

    * systemctl will now automatically soft-reboot if a new root file system
      is found under /run/nextroot/ when a reboot operation is invoked.

Login management:

    * Wall messages now work even when utmp support is disabled, using
      systemd-logind to query the necessary information.

    * systemd-logind now sends a new PrepareForShutdownWithMetadata D-Bus
      signal before shutdown/reboot/soft-reboot that includes additional
      information compared to the PrepareForShutdown signal. Currently the
      additional information is the type of operation that is about to be
      executed.

Hibernation & Suspend:

    * The kernel and OS versions will no longer be checked on resume from
      hibernation.

    * Hibernation into swap files backed by btrfs is now supported.
      (Previously this was supported only for other file systems.)

Other:

    * A new systemd-vmspawn tool has been added, that aims to provide for VMs
      the same interfaces and functionality that systemd-nspawn provides for
      containers. For now it supports QEMU as a backend, and exposes some of
      its options to the user. This component is experimental and its public
      interface is subject to change.

    * "systemd-analyze plot" has gained tooltips on each unit name with
      related-unit information in its svg output, such as Before=,
      Requires=, and similar properties.

    * A new varlinkctl tool has been added to allow interfacing with
      Varlink services, and introspection has been added to all such
      services.

    * systemd-sysext and systemd-confext now expose a Varlink service
      at io.systemd.sysext.

    * Portable services now accept confexts as extensions.

    * systemd-sysupdate now accepts directories in the MatchPattern= option.

    * systemd-run will now output the invocation ID of the launched
      transient unit and its peak memory usage.

    * systemd-analyze, systemd-tmpfiles, systemd-sysusers, systemd-sysctl,
      and systemd-binfmt gained a new --tldr option that can be used instead
      of --cat-config to suppress uninteresting configuration lines, such as
      comments and whitespace.

    * resolvectl gained a new "show-server-state" command that shows
      current statistics of the resolver. This is backed by a new
      DumpStatistics() Varlink method provided by systemd-resolved.

    * systemd-timesyncd will now emit a D-Bus signal when the LinkNTPServers
      property changes.

    * vconsole now supports KEYMAP=@kernel for preserving the kernel keymap
      as-is.

    * seccomp now supports the LoongArch64 architecture.

    * seccomp may now be enabled for services running as a non-root User=
      without NoNewPrivileges=yes.

    * systemd-id128 now supports a new -P option to show only values. The
      combination of -P and --app options is also supported.

    * A new pam_systemd_loadkey.so PAM module is now available, which will
      automatically fetch the passphrase used by cryptsetup to unlock the
      root file system and set it as the PAM authtok. This enables, among
      other things, configuring auto-unlock of the GNOME Keyring / KDE
      Wallet when autologin is configured.

    * Many meson options now use the 'feature' type, which means they
      take enabled/disabled/auto as values.

    * A new meson option -Dconfigfiledir= can be used to change where
      configuration files with default values are installed to.

    * Options and verbs in man pages are now tagged with the version they
      were first introduced in.

    * A new component "systemd-storagetm" has been added, which exposes all
      local block devices as NVMe-TCP devices, fully automatically. It's
      hooked into a new target unit storage-target-mode.target that is
      supposed to be booted into via
      rd.systemd.unit=storage-target-mode.target on the kernel command
      line. This is intended to be used for installers and debugging to
      quickly get access to the local disk. It's inspired by macOS "target
      disk mode".

    * A new component "systemd-bsod" has been added, which can show logged
      error messages full screen if they have a log level of LOG_EMERG.

    * The systemd-dissect tool's --with command will now set the
      $SYSTEMD_DISSECT_DEVICE environment variable to the block device it
      operates on for the invoked process.

    * The systemd-mount tool gained a new --tmpfs switch for mounting a new
      'tmpfs' instance. This is useful since it does so via .mount units
      and thus can be executed remotely or in containers.

    * The various tools in systemd that take "verbs" (such as systemctl,
      loginctl, machinectl, …) now will suggest a close verb name in case
      the user specified an unrecognized one.

    * libsystemd now exports a new function sd_id128_get_app_specific()
      that generates "app-specific" 128-bit IDs from any ID. It's similar to
      sd_id128_get_machine_app_specific() and
      sd_id128_get_boot_app_specific() but takes the ID to base the
      calculation on as input. This new functionality is also exposed in
      the "systemd-id128" tool where you can now combine --app= with `show`.

    * All tools that parse timestamps now can also parse RFC3339 style
      timestamps that include the "T" and "Z" characters.

    * New documentation has been added:

      https://systemd.io/FILE_DESCRIPTOR_STORE
      https://systemd.io/TPM2_PCR_MEASUREMENTS
      https://systemd.io/MOUNT_REQUIREMENTS

    * The codebase now recognizes the suffixes .confext.raw and .sysext.raw
      as alternatives to the .raw suffix generally accepted for DDIs. It is
      recommended to name configuration extensions and system extensions
      with such suffixes, to indicate their purpose in the name.

    * The sd-device API gained a new function
      sd_device_enumerator_add_match_property_required() which allows
      configuring matches on properties that are strictly required. This is
      different from the existing sd_device_enumerator_add_match_property()
      matches, of which only one needs to apply.

    * The MAC address the veth side of an nspawn container shall get
      assigned may now be controlled via the $SYSTEMD_NSPAWN_NETWORK_MAC
      environment variable.

    * The libiptc dependency is now implemented via dlopen(), so that tools
      such as networkd and nspawn no longer have a hard dependency on the
      shared library when compiled with support for libiptc.

    * New rpm macros have been added: %systemd_user_daemon_reexec does
      daemon-reexec for all user managers, and %systemd_postun_with_reload
      and %systemd_user_postun_with_reload do a reload for system and user
      units on upgrades.

    * coredumpctl now propagates SIGTERM to the debugger process.
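
The derivation behind sd_id128_get_app_specific(), mentioned earlier in this list, can be approximated in Python. The sketch assumes that the construction documented for sd_id128_get_machine_app_specific() also applies when an arbitrary base ID is passed in: HMAC-SHA256 of the application ID keyed by the base ID, truncated to 128 bits and marked as a version 4 UUID. The Python function name is made up, and the output should be treated as illustrative rather than guaranteed to match libsystemd bit for bit.

```python
import hashlib
import hmac
import uuid


def app_specific_id(base_id: bytes, app_id: bytes) -> bytes:
    """Derive a 128-bit app-specific ID from a base ID (both 16 bytes)."""
    if len(base_id) != 16 or len(app_id) != 16:
        raise ValueError("both IDs must be exactly 16 bytes long")
    # Assumption: HMAC-SHA256 keyed by the base ID over the application ID.
    digest = hmac.new(base_id, app_id, hashlib.sha256).digest()
    result = bytearray(digest[:16])          # keep the first 128 bits
    result[6] = (result[6] & 0x0F) | 0x40    # mark as UUID version 4
    result[8] = (result[8] & 0x3F) | 0x80    # mark as RFC 4122 variant
    return bytes(result)


if __name__ == "__main__":
    base = uuid.uuid4().bytes   # stand-in for any ID, e.g. a machine ID
    app = uuid.uuid4().bytes    # stand-in for an application's own ID
    print(uuid.UUID(bytes=app_specific_id(base, app)))
```

The same base ID and application ID always yield the same result, while the base ID cannot be recovered from it, which is the point of handing out app-specific IDs instead of the original.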

Contributors

    Contributions from: 김인수, Abderrahim Kitouni, Adam Williamson,
    Alexandre Peixoto Ferreira, Alex Hudspith, Alvin Alvarado,
    André Paiusco, Antonio Alvarez Feijoo, Anton Lundin,
    Arian van Putten, Arseny Maslennikov, Arthur Shau, Balázs Úr,
    beh_10257, Benjamin Peterson, Bertrand Jacquin, Brian Norris,
    Charles Lee, Cheng-Chia Tseng, Chris Patterson, Christian Hergert,
    Christian Hesse, Christian Kirbach, Clayton Craft, commondservice,
    Curtis Klein, cvlc12, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Daniel Rusek, Dan Nicholson, Dan Streetman,
    David Rheinsberg, David Santamaría Rogado, David Tardon,
    dependabot[bot], Diego Viola, Dmitry V. Levin,
    Emanuele Giuseppe Esposito, Emil Renner Berthing, Emil Velikov,
    Etienne Dechamps, Fabian Vogt, felixdoerre, Felix Dörre,
    Florian Schmaus, Franck Bui, Frantisek Sumsal, G2-Games,
    Gioele Barabucci, Hugo Carvalho, huyubiao, Iago López Galeiras,
    IllusionMan1212, Jade Lovelace, janana, Jan Janssen, Jan Kuparinen,
    Jan Macku, Jeremy Fleischman, Jin Liu, jjimbo137, Joerg Behrmann,
    Johannes Segitz, Jordan Rome, Jordan Williams, Julien Malka,
    Juno Computers, Khem Raj, khm, Kingbom Dou, Kiran Vemula,
    Krzesimir Nowak, Laszlo Gombos, Lennart Poettering, linuxlion,
    Luca Boccassi, Lucas Adriano Salles, Lukas, Lukáš Nykrýn,
    Maanya Goenka, Maarten, Malte Poll, Marc Pervaz Boocha,
    Martin Beneš, Martin Joerg, Martin Wilck, Mathieu Tortuyaux,
    Matthias Schiffer, Maxim Mikityanskiy, Max Kellermann,
    Michael A Cassaniti, Michael Biebl, Michael Kuhn, Michael Vasseur,
    Michal Koutný, Michal Sekletár, Mike Yuan,
    Milton D. Miller II, mordner, msizanoen, NAHO, Nandakumar Raghavan,
    Nick Rosbrook, Nils K, NRK, Oğuz Ersen, Omojola Joshua, onenowy,
    pelaufer, Peter Hutterer, PhylLu, Pierre GRASSER, Piotr Drąg,
    Priit Laes, Rahil Bhimjiani, Raito Bezarius, Raul Cheleguini,
    Reto Schneider, Richard Maw, Robby Red, RoepLuke, Roland Hieber,
    Ronan Pigott, Sam James, Sam Leonard, Sergey A, Susant Sahani,
    Sven Joachim, Tad Fisher, Takashi Sakamoto, Thorsten Kukuk, Tj,
    Tomasz Świątek, Topi Miettinen, Valentin David,
    Valentin Lefebvre, Victor Westerhuis, Vincent Haupert,
    Vishal Chillara Srinivas, Vito Caputo, Warren, Weblate,
    Xiaotian Wu, xinpeng wang, Yaron Shahrabani, Yo-Jung Lin,
    Yu Watanabe, Zbigniew Jędrzejewski-Szmek, zeroskyx, наб

    — Edinburgh, 2023-11-22

v255-rc2

5 months ago

systemd System and Service Manager

CHANGES WITH 255 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * Support for split-usr (/usr/ mounted separately during late boot,
      instead of being mounted by the initrd before switching to the rootfs)
      and unmerged-usr (parallel directories /bin/ and /usr/bin/, /lib/ and
      /usr/lib/, …) has been removed. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * systemd-run's switch --expand-environment=, which is currently
      disabled by default when combined with --scope, will be changed in a
      future release to be enabled by default.

    * "systemctl switch-root" is now restricted to initrd transitions only.

      Transitions between real systems should be done with
      "systemctl soft-reboot" instead.

    * The "ip=off" and "ip=none" kernel command line options interpreted by
      systemd-network-generator will now result in IPv6RA + link-local
      addressing being disabled, too. Previously DHCP was turned off, but
      IPv6RA and IPv6 link-local addressing were left enabled.

    * The NAMING_BRIDGE_MULTIFUNCTION_SLOT naming scheme has been deprecated
      and is now disabled.

    * SuspendMode=, HibernateState= and HybridSleepState= in the [Sleep]
      section of systemd-sleep.conf are now deprecated and have no effect.
      They did not (and could not) take any value other than the respective
      default. HybridSleepMode= is also deprecated, and will now always use
      the 'suspend' disk mode.

Service Manager:

    * The way services are spawned has been overhauled. Previously, a
      process was forked that shared all of the manager's memory (via
      copy-on-write) while doing all the required setup (e.g.: mount
      namespaces, CGroup configuration, etc.) before exec'ing the target
      executable. This was problematic for various reasons: several glibc
      APIs were called that are not supposed to be used after a fork but
      before an exec, copy-on-write meant that if either process (the
      manager or the child) touched a memory page a copy was triggered, and
      also the memory footprint of the child process was that of the
      manager, but with the memory limits of the service. From this version
      onward, the new process is spawned using CLONE_VM and CLONE_VFORK
      semantics via posix_spawn(3), and it immediately execs a new internal
      binary, systemd-executor, that receives the configuration to apply
      via memfd, and sets up the process before exec'ing the target
      executable.

    * Most of the internal process tracking is being changed to use PIDFDs
      instead of PIDs when the kernel supports it, to improve robustness
      and reliability.

    * A new option SurviveFinalKillSignal= can be used to configure the
      unit to be skipped in the final SIGTERM/SIGKILL spree on shutdown.
      This is part of the required configuration to let a unit's processes
      survive a soft-reboot operation.

    * System extension images (sysext) can now set
      EXTENSION_RELOAD_MANAGER=1 in their extension-release files to
      automatically reload the service manager (PID 1) when
      merging/refreshing/unmerging on boot. Generally, while this can be
      used to ship services in system extension images it's recommended to
      do that via portable services instead.

    * The ExtensionImages= and ExtensionDirectories= options now support
      confexts images/directories.

    * A new option NFTSet= provides a method for integrating dynamic cgroup
      IDs into firewall rules with NFT sets. The benefit of using this
      setting is that control groups can easily be used as a selector in
      firewall rules, which in turn allows more fine-grained filtering.
      Also, NFT rules for cgroup matching use numeric cgroup IDs, which
      change every time a service is restarted, making them hard to use in
      a systemd environment.

    * A new option CoredumpReceive= can be set for service and scope units,
      together with Delegate=yes, to make systemd-coredump on the host
      forward core files from processes crashing inside the delegated
      CGroup subtree to systemd-coredump running in the container. This new
      option is by default used by systemd-nspawn containers that use the
      "--boot" switch.

    * A new ConditionSecurity=measured-uki option is now available, to ensure
      a unit can only run when the system has been booted from a measured UKI.

    * MemoryAvailable= now considers physical memory if there are no CGroup
      memory limits set anywhere in the tree.

    * The $USER environment variable is now always set for services, while
      previously it was only set if User= was specified. A new option
      SetLoginEnvironment= is now supported to determine whether to also set
      $HOME, $LOGNAME, and $SHELL.

    * Socket units now support a new pair of
      PollLimitBurst=/PollLimitInterval= options to configure a limit on
      how often polling events on the file descriptors backing this unit
      will be considered within a time window.

    * Scope units can now be created using PIDFDs instead of PIDs to select
      the processes they should include.

    * Sending SIGRTMIN+18 with 0x500 as sigqueue() value will now cause the
      manager to dump the list of currently pending jobs.

    * If the kernel supports MOVE_MOUNT_BENEATH, the systemctl and
      machinectl bind and mount-image verbs will now cause the new mount to
      replace the old mount (if any), instead of overmounting it.

    * Units now have MemoryPeak, MemorySwapPeak, MemorySwapCurrent and
      MemoryZSwapCurrent properties, which respectively contain the values
      of the cgroup v2's memory.peak, memory.swap.peak, memory.swap.current
      and memory.zswap.current properties. This information is also shown in
      "systemctl status" output, if available.
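
    The property-to-file mapping in the last item above can be sketched in
    Python as follows. This is only an illustration, not how systemd is
    implemented: read_memory_properties() is a hypothetical helper, the
    cgroup directory is supplied by the caller, and a property is reported
    as None when the kernel does not provide the corresponding file.

```python
from pathlib import Path

# Unit property -> cgroup v2 attribute file, as listed in the notes above.
MEMORY_PROPERTY_FILES = {
    "MemoryPeak": "memory.peak",
    "MemorySwapPeak": "memory.swap.peak",
    "MemorySwapCurrent": "memory.swap.current",
    "MemoryZSwapCurrent": "memory.zswap.current",
}


def read_memory_properties(cgroup_dir):
    """Return {property: int or None} for one cgroup directory (hypothetical helper)."""
    values = {}
    for prop, filename in MEMORY_PROPERTY_FILES.items():
        try:
            values[prop] = int((Path(cgroup_dir) / filename).read_text().strip())
        except (FileNotFoundError, ValueError):
            # Attribute missing on this kernel, or not a plain number.
            values[prop] = None
    return values
```

    Returning None for a missing file mirrors the "if available" wording
    above, since not every kernel exposes all four attributes.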

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll now allows specifying a PCR bank and explicit hash
      value in the --tpm2-pcrs= option.

    * systemd-cryptenroll now allows specifying a TPM2 key handle (nv
      index) to be used instead of the default SRK via the new
      --tpm2-seal-key-handle= option.

    * systemd-cryptenroll now allows TPM2 enrollment using only a TPM2
      public key (in TPM2B_PUBLIC format) – without access to the TPM2
      device itself – which enables offline sealing of LUKS images for a
      specific TPM2 chip, as long as the SRK public key is known. Pass the
      public key to the tool via the new --tpm2-device-key= switch.

    * systemd-cryptsetup is now installed in /usr/bin/ and is no longer an
      internal-only executable.

    * The TPM2 Storage Root Key will now be set up, if not already present,
      by a new systemd-tpm2-setup.service early boot service. The SRK will
      be stored in PEM format and TPM2B_PUBLIC format (the latter is useful
      for systemd-cryptenroll --tpm2-device-key=, as mentioned above) for
      easier access. A new "srk" verb has been added to systemd-analyze to
      allow extracting it on demand if it is already set up.

    * The internal systemd-pcrphase executable has been renamed to
      systemd-pcrextend.

    * The systemd-pcrextend tool gained a new --pcr= switch to override
      which PCR to measure into.

    * systemd-pcrextend now exposes a Varlink interface at
      io.systemd.PCRExtend that can be used to do measurements and event
      logging on demand.

    * TPM measurements are now also written to an event log at
      /run/log/systemd/tpm2-measure.log, using a derivative of the TCG
      Canonical Event Log format. Previously they were only logged to the
      journal, where they were subject to rotation and similar.

    * A new component "systemd-pcrlock" has been added that allows managing
      local TPM2 PCR policies for PCRs 0-7 and similar, which are hard to
      predict by the OS vendor because of the inherently local nature of
      what measurements they contain, such as firmware versions of the
      system and extension cards and suchlike. pcrlock can predict PCR
      measurements ahead of time based on various inputs, such as the local
      TPM2 event log, GPT partition tables, PE binaries, UKI kernels, and
      various other things. It can then pre-calculate a TPM2 policy from
      this, which it stores in a TPM2 NV index. TPM2 objects (such as disk
      encryption keys) can be locked against this NV index, so that they
      are locked against a specific combination of system firmware and
      state. Alternatives for each component are supported to allowlist
      multiple kernel versions or boot loader versions simultaneously
      without losing access to the disk encryption keys. The tool can also
      be used to analyze and validate the local TPM2 event log.
      systemd-cryptsetup, systemd-cryptenroll, systemd-repart have all been
      updated to support such policies. There's currently no support for
      locking the system's root disk against a pcrlock policy; this will be
      added soon. Moreover, it is currently not possible to combine a
      pcrlock policy with a signed PCR policy. This component is
      experimental and its public interface is subject to change.
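
    The prediction step described for systemd-pcrlock rests on the TPM's
    extend operation: a PCR can only be updated to the hash of its old
    value concatenated with a measurement digest, so replaying the digests
    recorded in an event log reproduces the PCR value. A minimal Python
    sketch of that replay for a SHA-256 bank follows. It illustrates the
    general TPM2 mechanism, not pcrlock's actual implementation:
    predict_pcr() is a hypothetical name, the all-zero start value is an
    assumption, and the sample measurements are made up.

```python
import hashlib


def extend(pcr, digest):
    """TPM-style extend: new PCR value = SHA-256(old PCR value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def predict_pcr(measurement_digests):
    """Replay SHA-256 measurement digests in order, starting from an all-zero PCR."""
    pcr = bytes(32)
    for digest in measurement_digests:
        pcr = extend(pcr, digest)
    return pcr


# Made-up measurements standing in for event log records.
events = [hashlib.sha256(data).digest()
          for data in (b"firmware", b"boot loader", b"kernel")]
predicted = predict_pcr(events)
```

    Because the result depends on every digest and on their order, a
    different kernel or boot loader yields a different predicted value,
    which is why alternatives per component are useful.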

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl will now show whether the system was booted from a UKI in its
      status output.

    * systemd-boot and systemd-stub now use different project keys in their
      respective SBAT sections, so that they can be revoked individually if
      needed.

    * systemd-boot will no longer load unverified Devicetree blobs when UEFI
      SecureBoot is enabled. For more details see:
      https://github.com/systemd/systemd/security/advisories/GHSA-6m6p-rjcq-334c

    * systemd-boot gained new hotkeys to reboot and power off the system
      from the boot menu ("B" and "O"). If the "auto-poweroff" and
      "auto-reboot" options in loader.conf are set these entries are also
      shown as menu items (which is useful on devices lacking a regular
      keyboard).

    * systemd-boot gained a new configuration value "menu-disabled" for the
      set-timeout option, to allow completely disabling the boot menu,
      including the hotkey.

    * systemd-boot will now measure the content of loader.conf in TPM2
      PCR 5.

    * systemd-stub will now concatenate the content of all kernel
      command-line addons before measuring them in TPM2 PCR 12, in a single
      measurement, instead of measuring them individually.

    * systemd-stub will now measure and load Devicetree Blob addons, which
      are searched and loaded following the same model as the existing
      kernel command-line addons.

    * systemd-stub will now ignore unauthenticated kernel command line options
      passed from systemd-boot when running inside Confidential VMs with UEFI
      SecureBoot enabled.

    * systemd-stub will now load a Devicetree blob even if the firmware did
      not load any beforehand (e.g.: for ACPI systems).

    * ukify is no longer considered experimental, and now ships in /usr/bin/.

    * ukify gained a new verb inspect to describe the sections of a UKI and
      print the contents of the well-known sections.

    * ukify gained a new verb genkey to generate a set of key pairs for
      signing UKIs and their PCR data.

    * The 90-loaderentry kernel-install hook now supports installing device
      trees.

    * kernel-install now supports the --json=, --root=, --image=, and
      --image-policy= options for the inspect verb.

    * kernel-install now supports new list and add-all verbs. The former
      lists all installed kernel images (if those are available in
      /usr/lib/modules/). The latter will install all the kernels it can
      find to the ESP.

systemd-repart:

    * A new option --copy-from= has been added that synthesizes partition
      definitions from the given image, which are then applied by the
      systemd-repart algorithm.

    * A new option --copy-source= has been added, which can be used to specify
      a directory that CopyFiles= is considered relative to.

    * New --make-ddi=confext, --make-ddi=sysext, and --make-ddi=portable
      options have been added to make it easier to generate these types of
      DDIs, without having to provide repart.d definitions for them.

    * The dm-verity salt and UUID will now be derived from the specified
      seed value.

    * New VerityDataBlockSizeBytes= and VerityHashBlockSizeBytes= settings
      can now be configured in repart.d/ configuration files.

    * A new Subvolumes= setting is now supported in repart.d/ configuration
      files, to indicate which directories in the target partition should be
      btrfs subvolumes.

    * A new --tpm2-device-key= option can be used to lock a disk against a
      specific TPM2 public key. This matches the same switch the
      systemd-cryptenroll tool now supports (see above).

Journal:

    * The journalctl --lines= parameter now accepts +N to show the oldest N
      entries instead of the newest.

    * journald now ensures that sealing happens once per epoch, and sets a
      new compatibility flag to distinguish old journal files that were
      created before this change, for backward compatibility.
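
    The difference between --lines=N and the new --lines=+N form described
    in this section can be sketched in Python as below. The function
    select_entries() is hypothetical and works on a plain list ordered
    oldest first; it models only the selection rule, not journalctl
    itself.

```python
def select_entries(entries, lines_arg):
    """Select entries from a list ordered oldest first, per the N / +N rule."""
    count = int(lines_arg)            # int() accepts a leading "+", e.g. "+3"
    if count <= 0:
        return []
    if lines_arg.startswith("+"):
        return entries[:count]        # "+N": the oldest N entries
    return entries[-count:]           # "N": the newest N entries
```

    With entries a, b, c, d, e (oldest first), "2" selects d and e, while
    "+2" selects a and b.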

Device Management:

    * udev will now create symlinks to loopback block devices in the
      /dev/disk/by-loop-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new switch
      --loop-ref= for selecting the string. This means a loopback block
      device may now be allocated under a caller-chosen reference and can
      subsequently be referenced without first having to look up the block
      device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/disk/by-loop-inode/ directory based on the .st_dev/st_ino fields
      of the inode attached to the loopback block device. This means that
      attaching a file to a loopback device will implicitly make a handle
      available to be found via that file's inode information.

    * udevadm info gained support for JSON output via a new --json= flag, and
      for filtering output using the same mechanism that udevadm trigger
      already implements.

    * The predictable network interface naming logic is extended to include
      the SR-IOV-R "representor" information in network interface names.
      This feature was intended for v254, but even though the code was
      merged, the part that actually enabled the feature was forgotten.
      It is now enabled by default and is part of the new "v255" naming
      scheme.

    * A new hwdb/rules file has been added that sets the
      ID_NET_AUTO_LINK_LOCAL_ONLY=1 udev property on all network interfaces
      that should usually only be configured with link-local addressing
      (IPv4LL + IPv6LL), i.e. for PC-to-PC cables ("laplink") or
      Thunderbolt networking. systemd-networkd and NetworkManager (soon)
      will make use of this information to apply an appropriate network
      configuration by default.

    * The ID_NET_DRIVER property on network interfaces is now set
      relatively early in the udev rule set so that other rules may rely on
      its use. This is implemented in a new "net-driver" udev built-in.

Network Management:

    * The "duid-only" option for DHCPv4 client's ClientIdentifier= setting
      is now dropped, as it never worked, hence it should not be used by
      anyone.

    * The 'prefixstable' IPv6 address generation mode now considers the SSID
      when generating stable addresses, so that a different stable address
      is used when roaming between wireless networks. If you already use
      'prefixstable' addresses with wireless networks, the stable address
      will be changed by the update.

    * The DHCPv4 client gained a RapidCommit option, true by default, which
      enables RFC4039 Rapid Commit behavior to obtain a lease in a
      simplified 2-message exchange instead of the typical 4-message
      exchange, if also supported by the DHCP server.

    * The DHCPv4 client gained new InitialCongestionWindow= and
      InitialAdvertisedReceiveWindow= options for route configurations.

    * The DHCPv4 client gained a new RequestAddress= option that allows
      sending a preferred IP address in the initial DHCPDISCOVER message.

    * The DHCPv4 server and client gained support for IPv6-only mode
      (RFC8925).

    * The SendHostname= and Hostname= options are now available for the
      DHCPv6 client, independently of the DHCPv4= option, so that these
      configuration values can be set independently for each client.

    * The DHCPv4 and DHCPv6 client state can now be queried via D-Bus,
      including lease information.

    * The DHCPv6 client can now be configured to use a custom DUID type.

    * .network files gained a new IPv4ReversePathFilter= setting in the
      [Network] section, to control sysctl's rp_filter setting.

    * .network files gained a new HopLimit= setting in the [Route] section,
      to configure a per-route hop limit.

    * .network files gained a new TCPRetransmissionTimeoutSec= setting in
      the [Route] section, to configure a per-route TCP retransmission
      timeout.

    * A new directive NFTSet= provides a method for integrating network
      configuration into firewall rules with NFT sets. The benefit of using
      this setting is that static network configuration or dynamically
      obtained network addresses can be used in firewall rules with the
      indirection of NFT set types.

    * The [IPv6AcceptRA] section supports the following new options:
      UsePREF64=, UseHopLimit=, UseICMP6RateLimit=, and NFTSet=.

    * The [IPv6SendRA] section supports the following new options:
      RetransmitSec=, HopLimit=, HomeAgent=, HomeAgentLifetimeSec=, and
      HomeAgentPreference=.

    * A new [IPv6PREF64Prefix] set of options, containing Prefix= and
      LifetimeSec=, has been introduced to append pref64 options in router
      advertisements (RFC8781).

    * The network generator now configures the interfaces with only
      link-local addressing if "ip=link-local" is specified on the kernel
      command line.

    * The names of the configuration files generated by the network
      generator from the kernel command line are now prefixed with '70-',
      to give them higher precedence than the default configuration
      files.

    * Added a new -Ddefault-network=BOOL meson option, that causes more
      .network files to be installed as enabled by default. These configuration
      files match generic setups, e.g. 89-ethernet.network matches
      all Ethernet interfaces and enables both DHCPv4 and DHCPv6 clients.

    * If an ID_NET_MANAGED_BY= udev property is set on a network device and
      it is any other string than "io.systemd.Network" then networkd will
      not manage this device. This may be used to allow multiple network
      management services to run in parallel and assign ownership of
      specific devices explicitly. NetworkManager will soon implement a
      similar logic.
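
    The ownership rule in the last item above amounts to a small decision,
    sketched here in Python. The function name and the dictionary of udev
    properties are illustrative assumptions; only the property name and the
    "io.systemd.Network" value come from these notes.

```python
NETWORKD_ID = "io.systemd.Network"


def networkd_should_manage(udev_properties):
    """Decide ownership from a device's udev properties (dict of str -> str)."""
    owner = udev_properties.get("ID_NET_MANAGED_BY")
    # Property unset: no explicit owner was assigned, so networkd may
    # manage the device. Any other string hands the device to another manager.
    return owner is None or owner == NETWORKD_ID
```

    For a device tagged with a hypothetical owner such as
    "org.example.OtherManager" the function returns False; for an untagged
    device, or one tagged "io.systemd.Network", it returns True.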

systemctl:

    * systemctl is-failed now checks the system state if no unit is
      specified.

    * systemctl will now automatically soft-reboot if a new root file system
      is found under /run/nextroot/ when a reboot operation is invoked.

Login management:

    * Wall messages now work even when utmp support is disabled, using
      systemd-logind to query the necessary information.

    * systemd-logind now sends a new PrepareForShutdownWithMetadata D-Bus
      signal before shutdown/reboot/soft-reboot that includes additional
      information compared to the PrepareForShutdown signal. Currently the
      additional information is the type of operation that is about to be
      executed.

Hibernation & Suspend:

    * The kernel and OS versions will no longer be checked on resume from
      hibernation.

    * Hibernation into swap files backed by btrfs is now
      supported. (Previously this was supported only for other file
      systems.)

Other:

    * A new systemd-vmspawn tool has been added, that aims to provide for VMs
      the same interfaces and functionality that systemd-nspawn provides for
      containers. For now it supports QEMU as a backend, and exposes some of
      its options to the user. This component is experimental and its public
      interface is subject to change.

    * "systemd-analyze plot" has gained tooltips on each unit name with
      related-unit information in its svg output, such as Before=,
      Requires=, and similar properties.

    * A new varlinkctl tool has been added to allow interfacing with
      Varlink services, and introspection has been added to all such
      services.

    * systemd-sysext and systemd-confext now expose a Varlink service
      at io.systemd.sysext.

    * Portable services now accept confexts as extensions.

    * systemd-sysupdate now accepts directories in the MatchPattern= option.

    * systemd-run will now output the invocation ID of the launched
      transient unit and its peak memory usage.

    * systemd-analyze, systemd-tmpfiles, systemd-sysusers, systemd-sysctl,
      and systemd-binfmt gained a new --tldr option that can be used instead
      of --cat-config to suppress uninteresting configuration lines, such as
      comments and whitespace.

    * resolvectl gained a new "show-server-state" command that shows
      current statistics of the resolver. This is backed by a new
      DumpStatistics() Varlink method provided by systemd-resolved.

    * systemd-timesyncd will now emit a D-Bus signal when the LinkNTPServers
      property changes.

    * vconsole now supports KEYMAP=@kernel for preserving the kernel keymap
      as-is.

    * seccomp now supports the LoongArch64 architecture.

    * seccomp may now be enabled for services running as a non-root User=
      without NoNewPrivileges=yes.

    * systemd-id128 now supports a new -P option to show only values. The
      combination of -P and --app options is also supported.

    * A new pam_systemd_loadkey.so PAM module is now available, which will
      automatically fetch the passphrase used by cryptsetup to unlock the
      root file system and set it as the PAM authtok. This enables, among
      other things, configuring auto-unlock of the GNOME Keyring / KDE
      Wallet when autologin is configured.

    * Many meson options now use the 'feature' type, which means they
      take enabled/disabled/auto as values.

    * A new meson option -Dconfigfiledir= can be used to change where
      configuration files with default values are installed to.

    * Options and verbs in man pages are now tagged with the version they
      were first introduced in.

    * A new component "systemd-storagetm" has been added, which exposes all
      local block devices as NVMe-TCP devices, fully automatically. It's
      hooked into a new target unit storage-target-mode.target that is
      supposed to be booted into via
      rd.systemd.unit=storage-target-mode.target on the kernel command
      line. This is intended to be used for installers and debugging to
      quickly get access to the local disk. It's inspired by macOS "target
      disk mode".

    * A new component "systemd-bsod" has been added, which can show logged
      error messages full screen, if they have a log level of LOG_EMERG.

    * The systemd-dissect tool's --with command will now set the
      $SYSTEMD_DISSECT_DEVICE environment variable to the block device it
      operates on for the invoked process.

    * The systemd-mount tool gained a new --tmpfs switch for mounting a new
      'tmpfs' instance. This is useful since it does so via .mount units
      and thus can be executed remotely or in containers.

    * The various tools in systemd that take "verbs" (such as systemctl,
      loginctl, machinectl, …) now will suggest a close verb name in case
      the user specified an unrecognized one.

    * libsystemd now exports a new function sd_id128_get_app_specific()
      that generates "app-specific" 128bit IDs from any ID. It's similar to
      sd_id128_get_machine_app_specific() and
      sd_id128_get_boot_app_specific() but takes the ID to base calculation
      on as input. This new functionality is also exposed in the
      "systemd-id128" tool where you can now combine --app= with `show`.

    * All tools that parse timestamps now can also parse RFC3339 style
      timestamps that include the "T" and "Z" characters.

    * New documentation has been added:

      https://systemd.io/FILE_DESCRIPTOR_STORE
      https://systemd.io/TPM2_PCR_MEASUREMENTS
      https://systemd.io/MOUNT_REQUIREMENTS

    * The codebase now recognizes the suffixes .confext.raw and .sysext.raw
      as alternatives to the .raw suffix generally accepted for DDIs. It is
      recommended to name configuration extensions and system extensions
      with such suffixes, to indicate their purpose in the name.

    * The sd-device API gained a new function
      sd_device_enumerator_add_match_property_required() which allows
      configuring matches on properties that are strictly required. This is
      different from the existing sd_device_enumerator_add_match_property()
      matches, of which only one needs to apply.

    * The MAC address the veth side of an nspawn container shall get
      assigned may now be controlled via the $SYSTEMD_NSPAWN_NETWORK_MAC
      environment variable.

    * The libiptc dependency is now implemented via dlopen(), so that tools
      such as networkd and nspawn no longer have a hard dependency on the
      shared library when compiled with support for libiptc.

    * New rpm macros have been added: %systemd_user_daemon_reexec does
      daemon-reexec for all user managers, and %systemd_postun_with_reload
      and %systemd_user_postun_with_reload do a reload for system and user
      units on upgrades.

    * coredumpctl now propagates SIGTERM to the debugger process.
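
    The "app-specific" ID derivation mentioned for
    sd_id128_get_app_specific() in this section can be sketched in Python.
    The man page for sd_id128_get_machine_app_specific() describes the
    calculation as an HMAC-SHA256 of the application ID keyed by the
    machine ID; the sketch assumes the new function does the same with a
    caller-supplied base ID and then marks the truncated result as a v4
    UUID. It has not been checked bit for bit against libsystemd, and
    app_specific_id() is a hypothetical name.

```python
import hashlib
import hmac
import uuid


def app_specific_id(base_id, app_id):
    """Derive an app-specific 128-bit ID from an arbitrary base ID (sketch)."""
    # HMAC-SHA256 keyed by the base ID, with the application ID as message.
    digest = hmac.new(base_id.bytes, app_id.bytes, hashlib.sha256).digest()
    raw = bytearray(digest[:16])          # truncate 256 bits to 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x40       # set UUID version 4
    raw[8] = (raw[8] & 0x3F) | 0x80       # set RFC 4122 variant
    return uuid.UUID(bytes=bytes(raw))
```

    The keyed hash means that an application only ever sees its own derived
    ID and cannot recover the base ID from it, while the same inputs always
    produce the same output.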

Contributors:

    Contributions from: 김인수, Abderrahim Kitouni, Adam Williamson,
    Alexandre Peixoto Ferreira, Alex Hudspith, Alvin Alvarado,
    André Paiusco, Antonio Alvarez Feijoo, Anton Lundin,
    Arseny Maslennikov, Arthur Shau, Balázs Úr, beh_10257,
    Benjamin Peterson, Bertrand Jacquin, Brian Norris,
    Cheng-Chia Tseng, Chris Patterson, Christian Hergert,
    Christian Hesse, Christian Kirbach, Clayton Craft, commondservice,
    Curtis Klein, cvlc12, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Daniel Rusek, Dan Nicholson, Dan Streetman,
    David Rheinsberg, David Santamaría Rogado, David Tardon,
    dependabot[bot], Diego Viola, Dmitry V. Levin,
    Emanuele Giuseppe Esposito, Emil Renner Berthing, Emil Velikov,
    Etienne Dechamps, Fabian Vogt, felixdoerre, Felix Dörre,
    Florian Schmaus, Franck Bui, Frantisek Sumsal, G2-Games,
    Gioele Barabucci, Hugo Carvalho, huyubiao, Iago López Galeiras,
    IllusionMan1212, Jade Lovelace, janana, Jan Janssen, Jan Kuparinen,
    Jan Macku, Jeremy Fleischman, Jin Liu, jjimbo137, Joerg Behrmann,
    Johannes Segitz, Jordan Rome, Jordan Williams, Julien Malka,
    Juno Computers, Khem Raj, khm, Kingbom Dou, Kiran Vemula,
    Laszlo Gombos, Lennart Poettering, Luca Boccassi,
    Lucas Adriano Salles, Lukas, Lukáš Nykrýn, Maanya Goenka,
    Maarten, Malte Poll, Marc Pervaz Boocha, Martin Beneš,
    Martin Wilck, Mathieu Tortuyaux, Matthias Schiffer,
    Maxim Mikityanskiy, Max Kellermann, Michael A Cassaniti,
    Michael Biebl, Michael Kuhn, Michael Vasseur, Michal Koutný,
    Michal Sekletár, Mike Yuan, Milton D. Miller II, mordner,
    msizanoen, NAHO, Nandakumar Raghavan, Nick Rosbrook, NRK,
    Oğuz Ersen, Omojola Joshua, pelaufer, Peter Hutterer, PhylLu,
    Pierre GRASSER, Piotr Drąg, Priit Laes, Rahil Bhimjiani,
    Raito Bezarius, Raul Cheleguini, Reto Schneider, Richard Maw,
    Robby Red, RoepLuke, Roland Hieber, Ronan Pigott, Sam James,
    Sam Leonard, Sergey A, Susant Sahani, Sven Joachim, Tad Fisher,
    Takashi Sakamoto, Thorsten Kukuk, Tj, Tomasz Świątek,
    Topi Miettinen, Valentin David, Valentin Lefebvre,
    Victor Westerhuis, Vincent Haupert, Vishal Chillara Srinivas,
    Vito Caputo, Warren, Xiaotian Wu, xinpeng wang, Yu Watanabe,
    Zbigniew Jędrzejewski-Szmek, zeroskyx, наб

    — Edinburgh, 2023-11-15

v255-rc1

6 months ago

systemd System and Service Manager

CHANGES WITH 255 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * Support for split-usr (/usr/ mounted separately during late boot,
      instead of being mounted by the initrd before switching to the rootfs)
      and unmerged-usr (parallel directories /bin/ and /usr/bin/, /lib/ and
      /usr/lib/, …) has been removed. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * systemd-run's switch --expand-environment= which currently is disabled
      by default when combined with --scope, will be changed in a future
      release to be enabled by default.

    * "systemctl switch-root" is now restricted to initrd transitions only.

      Transitions between real systems should be done with
      "systemctl soft-reboot" instead.

    * The "ip=off" and "ip=none" kernel command line options interpreted by
      systemd-network-generator will now result in IPv6RA + link-local
      addressing being disabled, too. Previously DHCP was turned off, but
      IPv6RA and IPv6 link-local addressing was left enabled.

    * The NAMING_BRIDGE_MULTIFUNCTION_SLOT naming scheme has been deprecated
      and is now disabled.

    * SuspendMode=, HibernateState= and HybridSleepState= in the [Sleep]
      section of systemd-sleep.conf are now deprecated and have no effect.
      They did not (and could not) take any value other than the respective
      default. HybridSleepMode= is also deprecated, and will now always use
      the 'suspend' disk mode.

Service Manager:

    * The way services are spawned has been overhauled. Previously, a
      process was forked that shared all of the manager's memory (via
      copy-on-write) while doing all the required setup (e.g.: mount
      namespaces, CGroup configuration, etc.) before exec'ing the target
      executable. This was problematic for various reasons: several glibc
      APIs were called that are not supposed to be used after a fork but
      before an exec, copy-on-write meant that if either process (the
      manager or the child) touched a memory page a copy was triggered, and
      also the memory footprint of the child process was that of the
      manager, but with the memory limits of the service. From this version
      onward, the new process is spawned using CLONE_VM and CLONE_VFORK
      semantics via posix_spawn(3), and it immediately execs a new internal
      binary, systemd-executor, that receives the configuration to apply
      via memfd, and sets up the process before exec'ing the target
      executable.

    * Most of the internal process tracking is being changed to use PIDFDs
      instead of PIDs when the kernel supports it, to improve robustness
      and reliability.

    * A new option SurviveFinalKillSignal= can be used to configure the
      unit to be skipped in the final SIGTERM/SIGKILL spree on shutdown.
      This is part of the required configuration to let a unit's processes
      survive a soft-reboot operation.

    * System extension images (sysext) can now set
      EXTENSION_RELOAD_MANAGER=1 in their extension-release files to
      automatically reload the service manager (PID 1) when
      merging/refreshing/unmerging on boot. Generally, while this can be
      used to ship services in system extension images it's recommended to
      do that via portable services instead.

    * The ExtensionImages= and ExtensionDirectories= options now support
      confexts images/directories.

    * A new option NFTSet= provides a method for integrating dynamic cgroup
      IDs into firewall rules with NFT sets. The benefit of using this
      setting is that control groups can easily be used as a selector in
      firewall rules, which in turn allows more fine-grained filtering.
      Also, NFT rules for cgroup matching use numeric cgroup IDs, which
      change every time a service is restarted, making them hard to use in
      a systemd environment.

    * A new option CoredumpReceive= can be set for service and scope units,
      together with Delegate=yes, to make systemd-coredump on the host
      forward core files from processes crashing inside the delegated
      CGroup subtree to systemd-coredump running in the container. This new
      option is by default used by systemd-nspawn containers that use the
      "--boot" switch.

    * A new ConditionSecurity=measured-uki option is now available, to ensure
      a unit can only run when the system has been booted from a measured UKI.

    * MemoryAvailable= now considers physical memory if there are no CGroup
      memory limits set anywhere in the tree.

    * The $USER environment variable is now always set for services, while
      previously it was only set if User= was specified. A new option
      SetLoginEnvironment= is now supported to determine whether to also set
      $HOME, $LOGNAME, and $SHELL.

    * Socket units now support a new pair of
      PollLimitBurst=/PollLimitInterval= options to configure a limit on
      how often polling events on the file descriptors backing this unit
      will be considered within a time window.

    * Scope units can now be created using PIDFDs instead of PIDs to select
      the processes they should include.

    * Sending SIGRTMIN+18 with 0x500 as sigqueue() value will now cause the
      manager to dump the list of currently pending jobs.

    * If the kernel supports MOVE_MOUNT_BENEATH, the systemctl and
      machinectl bind and mount-image verbs will now cause the new mount to
      replace the old mount (if any), instead of overmounting it.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll now allows specifying a PCR bank and explicit hash
      value in the --tpm2-pcrs= option.

    * systemd-cryptenroll now allows specifying a TPM2 key handle to be used
      instead of the default SRK via the new --tpm2-seal-key-handle= option.

    * systemd-cryptsetup is now installed in /usr/bin/ and is no longer an
      internal-only executable.

    * The TPM2 Storage Root Key will now be set up, if not already present,
      by a new systemd-tpm2-setup.service early boot service.

    * The internal systemd-pcrphase executable has been renamed to
      systemd-pcrextend.

    * The systemd-pcrextend tool gained a new --pcr= switch to override
      which PCR to measure into.

    * systemd-pcrextend now exposes a Varlink interface at
      io.systemd.PCRExtend that can be used to do measurements and event
      logging on demand.

    * TPM measurements are now also written to an event log at
      /run/log/systemd/tpm2-measure.log, using a derivative of the TCG
      Canonical Event Log format. Previously we'd only log them to the
      journal, where they however were subject to rotation and similar.

    * A new component "systemd-pcrlock" has been added that allows managing
      local TPM2 PCR policies for PCRs 0-7 and similar, which are hard to
      predict by the OS vendor because of the inherently local nature of
      what measurements they contain, such as firmware versions of the
      system and extension cards and suchlike. pcrlock can predict PCR
      measurements ahead of time based on various inputs, such as the local
      TPM2 event log, GPT partition tables, PE binaries, UKI kernels, and
      various other things. It can then pre-calculate a TPM2 policy from
      this, which it stores in a TPM2 NV index. TPM2 objects (such as disk
      encryption keys) can be locked against this NV index, so that they
      are locked against a specific combination of system firmware and
      state. Alternatives for each component are supported to allowlist
      multiple kernel versions or boot loader versions simultaneously
      without losing access to the disk encryption keys. The tool can also
      be used to analyze and validate the local TPM2 event
      log. systemd-cryptsetup, systemd-cryptenroll, systemd-repart have all
      been updated to support such policies. There's currently no support
      for locking the system's root disk against a pcrlock policy; this
      will be added soon. Moreover, it is currently not possible to combine
      a pcrlock policy with a signed PCR policy. This component is
      experimental and its public interface is subject to change.

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl will now show whether the system was booted from a UKI in its
      status output.

    * systemd-boot and systemd-stub now use different project keys in their
      respective SBAT sections, so that they can be revoked individually if
      needed.

    * systemd-boot will no longer load unverified Devicetree blobs when UEFI
      SecureBoot is enabled. For more details see:
      https://github.com/systemd/systemd/security/advisories/GHSA-6m6p-rjcq-334c

    * systemd-boot gained new hotkeys to reboot and power off the system
      from the boot menu ("B" and "O"). If the "auto-poweroff" and
      "auto-reboot" options in loader.conf are set these entries are also
      shown as menu items (which is useful on devices lacking a regular
      keyboard).

    * systemd-boot gained a new configuration value "menu-disabled" for the
      set-timeout option, to allow completely disabling the boot menu,
      including the hotkey.

    * systemd-boot will now measure the content of loader.conf in TPM2 PCR
      5.

    * systemd-stub will now concatenate the content of all kernel
      command-line addons before measuring them in TPM2 PCR 12, in a single
      measurement, instead of measuring them individually.
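
      To see why this changes the resulting PCR value, here is a minimal
      Python sketch of a TPM-style extend operation for a SHA-256 bank. The
      addon strings are made up, and joining them by plain concatenation is
      an assumption; the text above does not say how the stub combines them.

```python
import hashlib


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend for a SHA-256 bank: new = H(old || H(data))."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


# Hypothetical addon command lines, for illustration only.
addons = [b"console=ttyS0", b"quiet"]
initial = bytes(32)  # PCR 12 is assumed to start out as all zeroes

# One measurement per addon (the earlier behaviour described above).
individual = initial
for addon in addons:
    individual = pcr_extend(individual, addon)

# One measurement over all addons (the new behaviour). Plain concatenation
# is an assumption made for this sketch.
combined = pcr_extend(initial, b"".join(addons))

print(individual.hex())
print(combined.hex())
```

      The two printed values differ: the order and grouping of measurements
      is part of what a PCR records.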

    * systemd-stub will now measure and load Devicetree Blob addons, which
      are searched and loaded following the same model as the existing
      kernel command-line addons.

    * systemd-stub will now ignore unauthenticated kernel command line options
      passed from systemd-boot when running inside Confidential VMs with UEFI
      SecureBoot enabled.

    * ukify is no longer considered experimental, and now ships in /usr/bin/.

    * ukify gained a new verb inspect to describe the sections of a UKI and
      print the contents of the well-known sections.

    * ukify gained a new verb genkey to generate a set of key pairs for
      signing UKIs and their PCR data.

    * The 90-loaderentry kernel-install hook now supports installing device
      trees.

systemd-repart:

    * A new option --copy-from= has been added that synthesizes partition
      definitions from the given image, which are then applied by the
      systemd-repart algorithm.

    * A new option --copy-source= has been added, which can be used to specify
      a directory to which CopyFiles= is considered relative.

    * New --make-ddi=confext, --make-ddi=sysext, and --make-ddi=portable
      options have been added to make it easier to generate these types of
      DDIs, without having to provide repart.d definitions for them.

    * The dm-verity salt and UUID will now be derived from the specified
      seed value.

    * New VerityDataBlockSizeBytes= and VerityHashBlockSizeBytes= settings
      can now be configured in repart.d/ configuration files.

    * A new Subvolumes= setting is now supported in repart.d/ configuration
      files, to indicate which directories in the target partition should be
      btrfs subvolumes.

Journal:

    * The journalctl --lines= parameter now accepts +N to show the oldest N
      entries instead of the newest.

Device Management:

    * udev will now create symlinks to loopback block devices in the
      /dev/disk/by-loop-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new switch
      --loop-ref= for selecting the string. This means a loopback block
      device may now be allocated under a caller-chosen reference and can
      subsequently be referenced without first having to look up the block
      device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/disk/by-loop-inode/ directory based on the .st_dev/st_ino fields
      of the inode attached to the loopback block device. This means that
      attaching a file to a loopback device will implicitly make a handle
      available to be found via that file's inode information.
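
      The inode information referred to above can be read with a plain
      stat() call. The following Python sketch only shows how to obtain the
      device and inode numbers of a backing file; the exact naming scheme of
      the symlinks is not stated here and is deliberately not reproduced.

```python
import os
import tempfile


def backing_inode_key(path: str) -> tuple[int, int, int]:
    """Return (major, minor, inode) for the file at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


# A temporary file stands in for a disk image that would be attached to a
# loopback block device.
with tempfile.NamedTemporaryFile() as image:
    major, minor, inode = backing_inode_key(image.name)
    print(f"device {major}:{minor}, inode {inode}")
```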

    * udevadm info gained support for JSON output via a new --json= flag, and
      for filtering output using the same mechanism that udevadm trigger
      already implements.

    * The predictable network interface naming logic is extended to include
      the SR-IOV-R "representor" information in network interface names.
      This feature was intended for v254, but even though the code was
      merged, the part that actually enabled the feature was forgotten.
      It is now enabled by default and is part of the new "v255" naming
      scheme.

    * A new hwdb/rules file has been added that sets the
      ID_NET_AUTO_LINK_LOCAL_ONLY=1 udev property on all network interfaces
      that should usually only be configured with link-local addressing
      (IPv4LL + IPv6LL), i.e. for PC-to-PC cables ("laplink") or
      Thunderbolt networking. systemd-networkd and NetworkManager (soon)
      will make use of this information to apply an appropriate network
      configuration by default.

    * The ID_NET_DRIVER property on network interfaces is now set
      relatively early in the udev rule set so that other rules may rely on
      its use. This is implemented in a new "net-driver" udev built-in.

Network Management:

    * The "duid-only" option for DHCPv4 client's ClientIdentifier= setting
      is now dropped, as it never worked, hence it should not be used by
      anyone.

    * The 'prefixstable' IPv6 address generation mode now considers the SSID
      when generating stable addresses, so that a different stable address
      is used when roaming between wireless networks. If you already use
      'prefixstable' addresses with wireless networks, the stable address
      will be changed by the update.

    * The DHCPv4 client gained a RapidCommit= option, true by default, which
      enables RFC4039 Rapid Commit behavior to obtain a lease in a
      simplified 2-message exchange instead of the typical 4-message
      exchange, if also supported by the DHCP server.

    * The DHCPv4 client gained new InitialCongestionWindow= and
      InitialAdvertisedReceiveWindow= options for route configurations.

    * The DHCPv4 client gained a new RequestAddress= option that allows
      to send a preferred IP address in the initial DHCPDISCOVER message.

    * The DHCPv4 server and client gained support for IPv6-only mode
      (RFC8925).

    * The SendHostname= and Hostname= options are now available for the
      DHCPv6 client, independently of the DHCPv4= option, so that these
      configuration values can be set independently for each client.

    * The DHCPv4 and DHCPv6 client state can now be queried via D-Bus,
      including lease information.

    * The DHCPv6 client can now be configured to use a custom DUID type.

    * .network files gained a new IPv4ReversePathFilter= setting in the
      [Network] section, to control sysctl's rp_filter setting.

    * .network files gained a new HopLimit= setting in the [Route] section,
      to configure a per-route hop limit.

    * .network files gained a new TCPRetransmissionTimeoutSec= setting in
      the [Route] section, to configure a per-route TCP retransmission
      timeout.

    * A new directive NFTSet= provides a method for integrating network
      configuration into firewall rules with NFT sets. The benefit of using
      this setting is that static network configuration or dynamically
      obtained network addresses can be used in firewall rules with the
      indirection of NFT set types.

    * The [IPv6AcceptRA] section supports the following new options:
      UsePREF64=, UseHopLimit=, UseICMP6RateLimit=, and NFTSet=.

    * The [IPv6SendRA] section supports the following new options:
      RetransmitSec=, HopLimit=, HomeAgent=, HomeAgentLifetimeSec=, and
      HomeAgentPreference=.

    * A new [IPv6PREF64Prefix] set of options, containing Prefix= and
      LifetimeSec=, has been introduced to append pref64 options in router
      advertisements (RFC8781).

    * The network generator now configures the interfaces with only
      link-local addressing if "ip=link-local" is specified on the kernel
      command line.

    * The configuration files generated by the network generator from the
      kernel command line are now prefixed with '70-', so that they take
      precedence over the default configuration files.
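
      The effect of the prefix follows from the lexical ordering of file
      names, in which the first matching .network file is the one applied.
      A minimal Python sketch, where "70-eth0.network" is a hypothetical
      name for a generated file:

```python
# "70-eth0.network" is a made-up name for a generated file;
# "89-ethernet.network" is one of the default files.
candidates = ["89-ethernet.network", "70-eth0.network"]

# Files are considered in lexical order of their names, so the '70-' file
# is looked at before the '89-' one.
ordered = sorted(candidates)
print(ordered[0])  # prints "70-eth0.network"
```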

    * Added a new -Ddefault-network=BOOL meson option, that causes more
      .network files to be installed as enabled by default. These configuration
      files match generic setups, e.g. 89-ethernet.network matches
      all Ethernet interfaces and enables both DHCPv4 and DHCPv6 clients.

    * If an ID_NET_MANAGED_BY= udev property is set on a network device and
      it is any string other than "io.systemd.Network" then networkd will
      not manage this device. This may be used to allow multiple network
      management services to run in parallel and assign ownership of
      specific devices explicitly. NetworkManager will soon implement
      similar logic.
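
      The ownership rule can be written down as a small predicate. This is
      only a sketch of the rule as described above, not networkd's code; the
      function name and the "org.example.OtherManager" value are made up.

```python
from typing import Optional

NETWORKD_ID = "io.systemd.Network"


def networkd_manages(managed_by: Optional[str]) -> bool:
    """Apply the rule described above to one device's property value."""
    # None stands for "property not set", which leaves the device to networkd.
    return managed_by is None or managed_by == NETWORKD_ID


print(networkd_manages(None))                        # prints "True"
print(networkd_manages("io.systemd.Network"))        # prints "True"
print(networkd_manages("org.example.OtherManager"))  # prints "False"
```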

systemctl:

    * systemctl is-failed now checks the system state if no unit is
      specified.

    * systemctl will now automatically soft-reboot if a new root file system
      is found under /run/nextroot/ when a reboot operation is invoked.

Login management:

    * Wall messages now work even when utmp support is disabled, using
      systemd-logind to query the necessary information.

    * systemd-logind now sends a new PrepareForShutdownWithMetadata D-Bus
      signal before shutdown/reboot/soft-reboot that includes additional
      information compared to the PrepareForShutdown signal. Currently the
      additional information is the type of operation that is about to be
      executed.

Hibernation & Suspend:

    * The kernel and OS versions will no longer be checked on resume from
      hibernation.

    * Hibernation into swap files backed by btrfs is now
      supported. (Previously this was supported only for other file
      systems.)

Other:

    * A new systemd-vmspawn tool has been added, that aims to provide for VMs
      the same interfaces and functionality that systemd-nspawn provides for
      containers. For now it supports QEMU as a backend, and exposes some of
      its options to the user. This component is experimental and its public
      interface is subject to change.

    * "systemd-analyze plot" has gained tooltips on each unit name with
      related-unit information in its svg output, such as Before=,
      Requires=, and similar properties.

    * A new varlinkctl tool has been added to allow interfacing with
      Varlink services, and introspection has been added to all such
      services.

    * systemd-sysext and systemd-confext now expose a Varlink service
      at io.systemd.sysext.

    * portable services now accept confexts as extensions.

    * systemd-sysupdate now accepts directories in the MatchPattern= option.

    * systemd-run will now output the invocation ID of the launched
      transient unit.

    * systemd-analyze, systemd-tmpfiles, systemd-sysusers, systemd-sysctl,
      and systemd-binfmt gained a new --tldr option that can be used instead
      of --cat-config to suppress uninteresting configuration lines, such as
      comments and whitespace.

    * resolvectl gained a new "show-server-state" command that shows
      current statistics of the resolver. This is backed by a new
      DumpStatistics() Varlink method provided by systemd-resolved.

    * systemd-timesyncd will now emit a D-Bus signal when the LinkNTPServers
      property changes.

    * vconsole now supports KEYMAP=@kernel for preserving the kernel keymap
      as-is.

    * seccomp now supports the LoongArch64 architecture.

    * systemd-id128 now supports a new -P option to show only values. The
      combination of -P and --app options is also supported.

    * A new pam_systemd_loadkey.so PAM module is now available, which will
      automatically fetch the passphrase used by cryptsetup to unlock the
      root file system and set it as the PAM authtok. This enables, among
      other things, configuring auto-unlock of the GNOME Keyring / KDE
      Wallet when autologin is configured.

    * Many meson options now use the 'feature' type, which means they
      take enabled/disabled/auto as values.

    * A new meson option -Dconfigfiledir= can be used to change where
      configuration files with default values are installed to.

    * Options and verbs in man pages are now tagged with the version they
      were first introduced in.

    * A new component "systemd-storagetm" has been added, which exposes all
      local block devices as NVMe-TCP devices, fully automatically. It's
      hooked into a new target unit storage-target-mode.target that is
      supposed to be booted into via
      rd.systemd.unit=storage-target-mode.target on the kernel command
      line. This is intended to be used for installers and debugging to
      quickly get access to the local disk. It's inspired by macOS "target
      disk mode".

    * A new component "systemd-bsod" has been added, which can show logged
      error messages full screen, if they have a log level of LOG_EMERG.

    * The systemd-dissect tool's --with command will now set the
      $SYSTEMD_DISSECT_DEVICE environment variable to the block device it
      operates on for the invoked process.

    * The systemd-mount tool gained a new --tmpfs switch for mounting a new
      'tmpfs' instance. This is useful since it does so via .mount units
      and thus can be executed remotely or in containers.

    * The various tools in systemd that take "verbs" (such as systemctl,
      loginctl, machinectl, …) now will suggest a close verb name in case
      the user specified an unrecognized one.

    * libsystemd now exports a new function sd_id128_get_app_specific()
      that generates "app-specific" 128bit IDs from any ID. It's similar to
      sd_id128_get_machine_app_specific() and
      sd_id128_get_boot_app_specific() but takes the ID to base calculation
      on as input. This new functionality is also exposed in the
      "systemd-id128" tool where you can now combine --app= with `show`.

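      As a rough illustration of what "app-specific" means, the following
      Python sketch derives a new ID from a base ID and an application ID.
      The derivation shown (HMAC-SHA256 keyed by the base ID, truncated to
      128 bits and marked as a version 4 UUID) is an assumption modelled on
      how the related machine-ID variant is documented; both input IDs are
      made up.

```python
import hashlib
import hmac
import uuid


def app_specific_id(base: uuid.UUID, app: uuid.UUID) -> uuid.UUID:
    """Derive an app-specific 128-bit ID from a base ID (assumed scheme)."""
    # Assumption: HMAC-SHA256 over the app ID, keyed by the base ID.
    mac = hmac.new(base.bytes, app.bytes, hashlib.sha256).digest()
    raw = bytearray(mac[:16])
    # Mark the result as a version 4, RFC 4122 variant UUID.
    raw[6] = (raw[6] & 0x0F) | 0x40
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))


# Hypothetical IDs, for illustration only.
base_id = uuid.UUID("0123456789abcdef0123456789abcdef")
app_id = uuid.UUID("fedcba9876543210fedcba9876543210")
print(app_specific_id(base_id, app_id).hex)
```

      The base ID itself cannot be recovered from the result, which is the
      point of handing applications a derived ID instead of the original.
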
    * All tools that parse timestamps now can also parse RFC3339 style
      timestamps that include the "T" and "Z" characters.
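
      For reference, this is the shape of timestamp meant here. The Python
      sketch below parses one such string with the standard library; it
      illustrates the format only and says nothing about how the systemd
      tools implement their parser.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(text: str) -> datetime:
    """Parse a timestamp such as 2023-11-06T12:34:56Z into an aware datetime."""
    # "T" separates date and time, a trailing "Z" denotes UTC.
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


print(parse_rfc3339_utc("2023-11-06T12:34:56Z").isoformat())
# prints "2023-11-06T12:34:56+00:00"
```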

    * New documentation has been added:

      https://systemd.io/FILE_DESCRIPTOR_STORE
      https://systemd.io/TPM2_PCR_MEASUREMENTS
      https://systemd.io/MOUNT_REQUIREMENTS.md

    * The codebase now recognizes the suffix .confext.raw and .sysext.raw
      as alternative to the .raw suffix generally accepted for DDIs. It is
      recommended to name configuration extensions and system extensions
      with such suffixes, to indicate their purpose in the name.

    * The sd-device API gained a new function
      sd_device_enumerator_add_match_property_required() which allows
      configuring matches on properties that are strictly required. This is
      different from the existing sd_device_enumerator_add_match_property()
      matches, of which only one needs to apply.

    * The MAC address the veth side of an nspawn container shall get
      assigned may now be controlled via the $SYSTEMD_NSPAWN_NETWORK_MAC
      environment variable.

    * The libiptc dependency is now implemented via dlopen(), so that tools
      such as networkd and nspawn no longer have a hard dependency on the
      shared library when compiled with support for libiptc.

    * New rpm macros have been added: %systemd_user_daemon_reexec does
      daemon-reexec for all user managers, and %systemd_postun_with_reload
      and %systemd_user_postun_with_reload do a reload for system and user
      units on upgrades.

Contributors

    Contributions from: 김인수, Abderrahim Kitouni, Adam Williamson,
    Alexandre Peixoto Ferreira, Alex Hudspith, Alvin Alvarado,
    André Paiusco, Antonio Alvarez Feijoo, Anton Lundin,
    Arseny Maslennikov, Arthur Shau, Balázs Úr, beh_10257,
    Benjamin Peterson, Bertrand Jacquin, Brian Norris, Chris Patterson,
    Christian Hergert, Christian Hesse, Christian Kirbach,
    commondservice, Curtis Klein, cvlc12, Daan De Meyer,
    Daniel P. Berrangé, Daniel Rusek, Dan Streetman,
    David Rheinsberg, David Santamaría Rogado, David Tardon,
    dependabot[bot], Dmitry V. Levin, Emanuele Giuseppe Esposito,
    Emil Renner Berthing, Emil Velikov, Etienne Dechamps, Fabian Vogt,
    felixdoerre, Franck Bui, Frantisek Sumsal, G2-Games,
    Gioele Barabucci, Hugo Carvalho, huyubiao, IllusionMan1212,
    Jade Lovelace, janana, Jan Janssen, Jan Kuparinen, Jan Macku,
    Jin Liu, Joerg Behrmann, Johannes Segitz, Jordan Rome,
    Jordan Williams, Julien Malka, Juno Computers, Khem Raj, khm,
    Kingbom Dou, Kiran Vemula, Laszlo Gombos, Lennart Poettering,
    Luca Boccassi, Lucas Adriano Salles, Lukas, Lukáš Nykrýn,
    Maanya Goenka, Maarten, Malte Poll, Marc Pervaz Boocha,
    Martin Beneš, Martin Wilck, Mathieu Tortuyaux, Matthias Schiffer,
    Maxim Mikityanskiy, Max Kellermann, Michael A Cassaniti,
    Michael Biebl, Michael Kuhn, Michael Vasseur, Michal Koutný,
    Michal Sekletár, Mike Yuan, Milton D. Miller II, mordner,
    msizanoen, NAHO, Nandakumar Raghavan, Nick Rosbrook, NRK,
    Oğuz Ersen, Omojola Joshua, pelaufer, Peter Hutterer, PhylLu,
    Pierre GRASSER, Piotr Drąg, Priit Laes, Rahil Bhimjiani,
    Raito Bezarius, Raul Cheleguini, Reto Schneider, Richard Maw,
    Robby Red, RoepLuke, Roland Hieber, Ronan Pigott, Sam James,
    Sam Leonard, Sergey A, Susant Sahani, Sven Joachim,
    Takashi Sakamoto, Thorsten Kukuk, Tj, Tomasz Świątek,
    Topi Miettinen, Valentin David, Valentin Lefebvre,
    Victor Westerhuis, Vincent Haupert, Vishal Chillara Srinivas,
    Warren, Xiaotian Wu, xinpeng wang, Yu Watanabe,
    Zbigniew Jędrzejewski-Szmek, наб

    — Edinburgh, 2023-11-06

v254

9 months ago

systemd System and Service Manager

CHANGES WITH 254:

Announcements of Future Feature Removals and Incompatible Changes:

    * The next release (v255) will remove support for split-usr (/usr/
      mounted separately during late boot, instead of being mounted by the
      initrd before switching to the rootfs) and unmerged-usr (parallel
      directories /bin/ and /usr/bin/, /lib/ and /usr/lib/, …). For more
      details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * EnvironmentFile= now treats the line following a comment line that
      ends with a backslash escape as a non-comment line. For details, see:
      https://github.com/systemd/systemd/issues/27975

    * PrivateNetwork=yes and NetworkNamespacePath= now imply
      PrivateMounts=yes unless PrivateMounts=no is explicitly specified.

    * Behaviour of sandboxing options for the per-user service manager
      units has changed. They now imply PrivateUsers=yes, which means user
      namespaces will be implicitly enabled when a sandboxing option is
      enabled in a user unit. Enabling user namespaces has the drawback
      that system users will no longer be visible (and processes/files will
      appear as owned by 'nobody') in the user unit.

      By definition a sandboxed user unit should run with reduced
      privileges, so impact should be small. This will remove a great
      source of confusion that has been reported by users over the years,
      due to how these options require an extra setting to be manually
      enabled when used in the per-user service manager, which is not
      needed in the system service manager. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html

    * systemd-run's switch --expand-environment= which currently is disabled
      by default when combined with --scope, will be changed in a future
      release to be enabled by default.

Security Relevant Changes:

    * pam_systemd will now by default pass the CAP_WAKE_ALARM ambient
      process capability to invoked session processes of regular users on
      local seats (as well as to systemd --user), unless configured
      otherwise via data from JSON user records, or via the PAM module's
      parameter list. This is useful in order to allow desktop tools such as
      GNOME's Alarm Clock application to set a timer for
      CLOCK_REALTIME_ALARM that wakes up the system when it elapses. A
      per-user service unit file may thus use AmbientCapabilities= to pass
      the capability to invoked processes. Note that this capability is
      relatively narrow in focus (in particular compared to other process
      capabilities such as CAP_SYS_ADMIN) and we already — by default —
      permit more impactful operations such as system suspend to local
      users.

Service Manager:

    * Memory limits that apply while the unit is activating are now
      supported. Previously IO and CPU settings were already supported via
      StartupCPUWeight= and similar. The same logic has been added for the
      various manager and unit memory settings (DefaultStartupMemoryLow=,
      StartupMemoryLow=, StartupMemoryHigh=, StartupMemoryMax=,
      StartupMemorySwapMax=, StartupMemoryZSwapMax=).

    * The service manager gained support for enqueuing POSIX signals to
      services that carry an additional integer value, exposing the
      sigqueue() system call. This is accessible via new D-Bus calls
      org.freedesktop.systemd1.Manager.QueueSignalUnit() and
      org.freedesktop.systemd1.Unit.QueueSignal(), as well as in systemctl
      via the new --kill-value= option.

    * systemctl gained a new "list-paths" verb, which shows all currently
      active .path units, similarly to how "systemctl list-timers" shows
      active timers, and "systemctl list-sockets" shows active sockets.

    * systemctl gained a new --when= switch which is honoured by the various
      forms of shutdown (i.e. reboot, kexec, poweroff, halt) and allows
      scheduling these operations by time, similar in fashion to how this
      has been supported by SysV shutdown.

    * If MemoryDenyWriteExecute= is enabled for a service and the kernel
      supports the new PR_SET_MDWE prctl() call, it is used instead of the
      seccomp()-based system call filter to achieve the same effect.

    * A new set of kernel command line options is now understood:
      systemd.tty.term.<name>=, systemd.tty.rows.<name>=,
      systemd.tty.columns.<name>= allow configuring the TTY type and
      dimensions for the tty specified via <name>. When systemd invokes a
      service on a tty (via TTYName=) it will look for these and configure
      the TTY accordingly. This is particularly useful in VM environments
      to propagate host terminal settings into the appropriate TTYs of the
      guest.

    * A new RootEphemeral= setting is now understood in service units. It
      takes a boolean argument. If enabled for services that use RootImage=
      or RootDirectory= an ephemeral copy of the disk image or directory
      tree is made when the service is started. It is removed automatically
      when the service is stopped. That ephemeral copy is made using
      btrfs/xfs reflinks or btrfs snapshots, if available.

    * The service activation logic gained new settings RestartSteps= and
      RestartMaxDelaySec= which allow exponentially-growing restart
      intervals for Restart=.
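
      A minimal Python sketch of how such a schedule could look. It assumes
      the delay grows geometrically from RestartSec= to RestartMaxDelaySec=
      over RestartSteps= steps and stays at the maximum afterwards; the text
      above only says the intervals grow exponentially, so the exact formula
      is an assumption.

```python
def restart_delay(restart_sec: float, max_delay_sec: float,
                  steps: int, n_restart: int) -> float:
    """Delay before restart number n_restart (1-based), assumed geometric."""
    growing = steps > 0 and restart_sec > 0 and max_delay_sec > restart_sec
    if not growing or n_restart == 1:
        # No backoff configured, or the very first restart.
        return restart_sec
    if n_restart > steps:
        # The maximum has been reached; stay there.
        return max_delay_sec
    ratio = max_delay_sec / restart_sec
    return restart_sec * ratio ** ((n_restart - 1) / steps)


# Example values: RestartSec=1, RestartMaxDelaySec=16, RestartSteps=4.
for n in range(1, 7):
    print(n, round(restart_delay(1.0, 16.0, 4, n), 3))
```

      With these example values the delay doubles on each restart until it
      reaches the configured maximum.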

    * The service activation logic gained a new setting RestartMode= which
      can be set to 'direct' to skip the inactive/failed states when
      restarting, so that dependent units are not notified until the service
      converges to a final (successful or failed) state. For example, this
      means that OnSuccess=/OnFailure= units will not be triggered until the
      service state has converged.

    * PID 1 will now automatically load the virtio_console kernel module
      during early initialization if running in a suitable VM. This is done
      so that early-boot logging can be written to the console if available.

    * Similarly, virtio-vsock support is loaded early in suitable VM
      environments. PID 1 will send sd_notify() notifications via AF_VSOCK
      to the VMM if configured, thus loading this early is beneficial.

    * A new verb "fdstore" has been added to systemd-analyze to show the
      current contents of the file descriptor store of a unit. This is
      backed by a new D-Bus call DumpUnitFileDescriptorStore() provided by
      the service manager.

    * The service manager will now set a new $FDSTORE environment variable
      when invoking processes for services that have the file descriptor
      store enabled.

    * A new service option FileDescriptorStorePreserve= has been added that
      allows tuning the life-cycle of the per-service file descriptor
      store. If set to "yes", the entries in the fd store are retained even
      after the service has been fully stopped.

    * The "systemctl clean" command may now be used to clear the fdstore of
      a service.

    * Unit *.preset files gained a new directive "ignore", in addition to
      the existing "enable" and "disable". As the name suggests, matching
      units are left unchanged, i.e. neither enabled nor disabled.
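
      The three directives can be pictured as a first-match lookup. The
      following Python sketch assumes that the first matching line wins and
      that units matched by no line are enabled; the rule list and unit
      names are made up.

```python
from fnmatch import fnmatchcase


def preset_action(rules: list[tuple[str, str]], unit: str) -> str:
    """Return the action of the first rule whose pattern matches unit."""
    for action, pattern in rules:
        if fnmatchcase(unit, pattern):
            return action
    # Assumption: units matched by no rule are enabled by default.
    return "enable"


# Hypothetical preset contents, for illustration only.
rules = [
    ("ignore", "example-vendor-*.service"),
    ("enable", "example-*.service"),
    ("disable", "*"),
]

print(preset_action(rules, "example-vendor-agent.service"))  # prints "ignore"
print(preset_action(rules, "example-web.service"))           # prints "enable"
print(preset_action(rules, "other.service"))                 # prints "disable"
```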

    * Service units gained a new setting DelegateSubgroup=. It takes the
      name of a sub-cgroup to place any processes the service manager forks
      off in. Previously, the service manager would place all service
      processes directly in the top-level cgroup it created for the
      service. This usually meant that the main process in a service with
      delegation enabled would first have to create a subgroup and move
      itself down into it, in order to not conflict with the "no processes
      in inner cgroups" rule of cgroup v2. With this option, this step is
      now handled by PID 1.

    * The service manager will now look for .upholds/ directories,
      similarly to the existing support for .wants/ and .requires/
      directories. Symlinks in this directory result in Upholds=
      dependencies.

      The [Install] section of unit files gained support for a new
      UpheldBy= directive to generate .upholds/ symlinks automatically when
      a unit is enabled.

    * The service manager now supports a new kernel command line option
      systemd.default_device_timeout_sec=, which may be used to override
      the default timeout for .device units.

    * A new "soft-reboot" mechanism has been added to the service manager.
      A "soft reboot" is similar to a regular reboot, except that it
      affects userspace only: the service manager shuts down any running
      services and other units, then optionally switches into a new root
      file system (mounted to /run/nextroot/), and then passes control to a
      systemd instance in the new file system which then starts the system
      up again. The kernel is not rebooted and neither is the hardware,
      firmware or boot loader. This provides a fast, lightweight mechanism
      to quickly reset or update userspace, without the latency that a full
      system reset involves. Moreover, open file descriptors may be passed
      across the soft reboot into the new system where they will be passed
      back to the originating services. This allows pinning resources
      across the reboot, thus minimizing grey-out time further. This new
      reboot mechanism is accessible via the new "systemctl soft-reboot"
      command.

    * Services using RootDirectory= or RootImage= will now have read-only
      access to a copy of the host's os-release file under
      /run/host/os-release, which will be kept up-to-date on 'soft-reboot'.
      This was already the case for Portable Services, and the feature has
      now been extended to all services that do not run off the host's
      root filesystem.

    * A new service setting MemoryKSM= has been added to enable kernel
      same-page merging individually for services.

    * A new service setting ImportCredentials= has been added that augments
      LoadCredential= and LoadCredentialEncrypted= and searches for
      credentials to import from the system, and supports globbing.

    * A new job mode "restart-dependencies" has been added to the service
      manager (exposed via systemctl --job-mode=). It is only valid when
      used with "start" jobs, and has the effect that the "start" job will
      be propagated as "restart" jobs to currently running units that have
      a BindsTo= or Requires= dependency on the started unit.

    * A new verb "whoami" has been added to "systemctl" which determines as
      part of which unit the command is being invoked. It writes the unit
      name to standard output. If one or more PIDs are specified, it reports
      the unit names the processes referenced by the PIDs belong to.

    * The system and service credential logic has been improved: there's
      now a clearly defined place where system provisioning tools running
      in the initrd can place credentials that will be imported into the
      system's set of credentials during the initrd → host transition: the
      /run/credentials/@initrd/ directory. Once the credentials placed
      there are imported into the system credential set they are deleted
      from this directory, and the directory itself is deleted afterwards
      too.

    * A new kernel command line option systemd.set_credential_binary= has
      been added, that is similar to the pre-existing
      systemd.set_credential= but accepts arbitrary binary credential data,
      encoded in Base64. Note that the kernel command line is not a
      recommended way to transfer credentials into a system, since it is
      world-readable from userspace.

    * The default machine ID to use may now be configured via the
      system.machine_id system credential. It will only be used if no
      machine ID was set yet on the host.

    * On Linux kernel 6.4 and newer, system and service credentials will now
      be placed in a tmpfs instance that has the "noswap" mount option
      set. Previously, a "ramfs" instance was used. By switching to tmpfs,
      ACL support and overall size limits can now be enforced, without
      compromising on security, as the memory is never paged out either
      way.

    * The service manager now can detect when it is running in a
      'Confidential Virtual Machine', and a corresponding 'cvm' value is now
      accepted by ConditionSecurity= for units that want to conditionalize
      themselves on this. systemd-detect-virt gained new 'cvm' and
      '--list-cvm' switches to respectively perform the detection or list
      all known flavours of confidential VM, depending on the vendor. The
      manager will publish a 'ConfidentialVirtualization' D-Bus property,
      and will also set a SYSTEMD_CONFIDENTIAL_VIRTUALIZATION= environment
      variable for unit generators. Finally, udev rules can match on a new
      'cvm' key that will be set when in a confidential VM.
      Additionally, when running in a 'Confidential Virtual Machine', SMBIOS
      strings and QEMU's fw_cfg protocol will not be used to import
      credentials and kernel command line parameters by the system manager,
      systemd-boot and systemd-stub, because the hypervisor is considered
      untrusted in this particular setting.
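
    The systemd.set_credential_binary= option described above takes
    Base64-encoded data. The following is a minimal sketch of how such a
    kernel command line argument might be assembled. The credential name
    "myapp.token" is hypothetical, and the "NAME:VALUE" syntax is assumed
    to match that of systemd.set_credential=.

```python
import base64


def set_credential_binary_arg(name: str, data: bytes) -> str:
    """Build a kernel command line argument carrying a binary credential."""
    # Base64 keeps arbitrary bytes (including NUL) representable as text.
    encoded = base64.b64encode(data).decode("ascii")
    # Assumption: credential name and encoded value are separated by a colon.
    return f"systemd.set_credential_binary={name}:{encoded}"


# "myapp.token" is a made-up credential name used only for illustration.
arg = set_credential_binary_arg("myapp.token", b"\x00\x01binary\xff")
print(arg)
```

    As the note above says, the kernel command line is world-readable from
    userspace, so this is only suitable for data that is not secret.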

Journal:

    * The sd-journal API gained a new call sd_journal_get_seqnum() to
      retrieve the current log record's sequence number and sequence number
      ID, which allows applications to order records the same way as
      journal does internally. The sequence number is now also exported in
      the JSON and "export" output of the journal.

    * journalctl gained a new switch --truncate-newline. If specified,
      multi-line log records will be truncated at the first newline,
      i.e. only the first line of each log message will be shown.

    * systemd-journal-upload gained support for --namespace=, similar to
      the switch of the same name of journalctl.
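
    To illustrate the sequence number export mentioned above, here is a
    small sketch that orders JSON journal records by sequence number. The
    field names "__SEQNUM" and "__SEQNUM_ID" and the sample records are
    assumptions made for this example, as is the idea that sequence numbers
    should only be compared within the same sequence number ID.

```python
import json

# Hypothetical sample of JSON output, one record per line.
SAMPLE = """\
{"__SEQNUM": "12", "__SEQNUM_ID": "aaaa", "MESSAGE": "third"}
{"__SEQNUM": "10", "__SEQNUM_ID": "aaaa", "MESSAGE": "first"}
{"__SEQNUM": "11", "__SEQNUM_ID": "aaaa", "MESSAGE": "second"}
"""


def order_by_seqnum(lines):
    """Parse JSON records and sort them by (sequence number ID, number)."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Values are decoded from strings, hence the explicit int() conversion.
    return sorted(records, key=lambda r: (r["__SEQNUM_ID"], int(r["__SEQNUM"])))


ordered = [r["MESSAGE"] for r in order_by_seqnum(SAMPLE.splitlines())]
print(ordered)
```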

systemd-repart:

    * systemd-repart's drop-in files gained a new ExcludeFiles= option which
      may be used to exclude certain files from the effect of CopyFiles=.

    * systemd-repart's Verity support now implements the Minimize= setting
      to minimize the size of the resulting partition.

    * systemd-repart gained a new --offline= switch, which may be used to
      control whether images shall be built "online" or "offline",
      i.e. whether to make use of kernel facilities such as loopback block
      devices and device mapper or not.

    * If systemd-repart is told to populate a newly created ESP or XBOOTLDR
      partition with some files, it will now default to VFAT rather than
      ext4.

    * systemd-repart gained a new --architecture= switch. If specified, the
      per-architecture GPT partition types (i.e. the root and /usr/
      partitions) configured in the partition drop-in files are
      automatically adjusted to match the specified CPU architecture, in
      order to simplify cross-architecture DDI building.

    * systemd-repart will now default to a minimum size of 300MB for XFS
      filesystems if no size parameter is specified. This matches what the
      XFS tools (xfsprogs) can support.

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * gnu-efi is no longer required to build systemd-boot and systemd-stub.
      Instead, pyelftools is now needed, and it will be used to perform the
      ELF -> PE relocations at build time.

    * bootctl gained a new switch --print-root-device/-R that prints the
      block device the root file system is backed by. If specified twice,
      it returns the whole disk block device (as opposed to partition block
      device) the root file system is on. It's useful for invocations such
      as "cfdisk $(bootctl -RR)" to quickly show the partition table of the
      running OS.

    * systemd-stub will now look for the SMBIOS Type 1 field
      "io.systemd.stub.kernel-cmdline-extra" and append its value to the
      kernel command line it invokes. This is useful for VMMs such as qemu
      to pass additional kernel command lines into the system even when
      booting via full UEFI. The contents of the field are measured into
      TPM PCR 12.

    * The KERNEL_INSTALL_LAYOUT= setting for kernel-install gained a new
      value "auto". With this value, a kernel will be automatically
      analyzed, and if it qualifies as UKI, it will be installed as if the
      setting was set to "uki", otherwise as "bls".

    * systemd-stub can now optionally load UEFI PE "add-on" images that may
      contain additional kernel command line information. These "add-ons"
      superficially look like a regular UEFI executable, and are expected
      to be signed via SecureBoot/shim. However, they do not actually
      contain code, but instead a subset of the PE sections that UKIs
      support. They are supposed to provide a way to extend UKIs with
      additional resources in a secure and authenticated way. Currently,
      only the .cmdline PE section may be used in add-ons, in which case
      any specified string is appended to the command line embedded into
      the UKI itself. A new 'addon<EFI-ARCH>.efi.stub' is now provided that
      can be used to trivially create addons, via 'ukify' or 'objcopy'. In
      the future we expect other sections to be made extensible like this as
      well.

    * ukify has been updated to allow building these UEFI PE "add-on"
      images, using the new 'addon<EFI-ARCH>.efi.stub'.

    * ukify gained a new "genkey" verb for generating a set of key pairs
      to sign UKIs and their PCR data with.

    * ukify now accepts SBAT information to place in the .sbat PE section
      of UKIs and addons. If a UKI is built, the SBAT information from the
      inner kernel is merged with any SBAT information associated with
      systemd-stub and the SBAT data specified on the ukify command line.

    * The kernel-install script has been rewritten in C, and reuses much of
      the infrastructure of existing tools such as bootctl. It also gained
      --esp-path= and --boot-path= options to override the path to the ESP,
      and the $BOOT partition. Options --make-entry-directory= and
      --entry-token= have been added as well, similar to bootctl's options
      of the same name.

    * A new kernel-install plugin 60-ukify has been added which will
      combine kernel/initrd locally into a UKI and optionally sign it
      with a local key. This may be used to switch to UKI mode even on
      systems where a local kernel or initrd is used. (Typically UKIs are
      built and signed by the vendor.)

    * The ukify tool now supports "pesign" in addition to the pre-existing
      "sbsign" for signing UKIs.

    * systemd-measure and systemd-stub now look for the .uname PE section
      that should contain the kernel's "uname -r" string.

    * systemd-measure and ukify now calculate expected PCR hashes for a UKI
      "offline", i.e. without access to a TPM (physical or
      software-emulated).
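
    The add-on behaviour described above (a .cmdline string from an add-on
    is appended to the command line embedded in the UKI) can be pictured
    with the following sketch. It is not systemd-stub's implementation:
    the function name, the single-space separator and the processing order
    of multiple add-ons are assumptions.

```python
def effective_cmdline(embedded: str, addon_cmdlines: list) -> str:
    """Append add-on command line strings to the UKI's embedded one."""
    parts = [embedded.strip()]
    # Assumption: add-ons are applied in the order given, empty ones skipped.
    parts += [c.strip() for c in addon_cmdlines if c.strip()]
    return " ".join(p for p in parts if p)


print(effective_cmdline("root=/dev/vda2 ro", ["console=ttyS0", " quiet "]))
```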

Memory Pressure & Control:

    * The sd-event API gained new calls sd_event_add_memory_pressure(),
      sd_event_source_set_memory_pressure_type(),
      sd_event_source_set_memory_pressure_period() to create and configure
      an event source that is called whenever the OS signals memory
      pressure. Another call sd_event_trim_memory() is provided that
      compacts the process' memory use by releasing allocated but unused
      malloc() memory back to the kernel. Services can also provide their
      own custom callback to do memory trimming. This should improve system
      behaviour under memory pressure, as Linux traditionally provided
      no mechanism to return process memory back to the kernel if the
      kernel was under memory pressure. This makes use of the kernel's PSI
      interface. Most long-running services in systemd have been hooked up
      with this, and in particular systems with low memory should benefit
      from this.

    * Service units gained new settings MemoryPressureWatch= and
      MemoryPressureThresholdSec= to configure the PSI memory pressure
      logic individually. If these options are used, the
      $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE environment
      variables will be set for the invoked processes to inform them about
      the requested memory pressure behaviour. (This is used by the
      aforementioned sd-event API additions, if set.)

    * systemd-analyze gained a new "malloc" verb that shows the output
      generated by glibc's malloc_info() on services that support it. Right
      now, only the service manager has been updated accordingly. This
      call requires privileges.
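
    For services that do not use sd-event, the two environment variables
    mentioned above could be consumed roughly as sketched below. The
    details are assumptions based on the MEMORY_PRESSURE documentation
    rather than on these notes: that $MEMORY_PRESSURE_WRITE is Base64
    encoded, that "/dev/null" means monitoring is disabled, and that the
    watched file is a PSI file signalling via POLLPRI.

```python
import base64
import os
import select


def parse_memory_pressure_env(env):
    """Return (path, trigger_bytes), or None if monitoring is not requested."""
    path = env.get("MEMORY_PRESSURE_WATCH")
    # Assumption: an unset variable or /dev/null disables monitoring.
    if not path or path == "/dev/null":
        return None
    # Assumption: the data to write is Base64 encoded and may contain NUL.
    trigger = base64.b64decode(env.get("MEMORY_PRESSURE_WRITE", ""))
    return path, trigger


def wait_for_pressure(path, trigger, timeout_ms=None):
    """Block until the kernel reports memory pressure (PSI files only)."""
    fd = os.open(path, os.O_RDWR | os.O_CLOEXEC | os.O_NONBLOCK)
    try:
        if trigger:
            os.write(fd, trigger)  # configure threshold and window
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(fd)
```

    A service would call parse_memory_pressure_env(os.environ) once at
    startup and, if it returns a result, release caches whenever
    wait_for_pressure() reports an event.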

User & Session Management:

    * The sd-login API gained a new call sd_session_get_username() to
      return the user name of the owner of a login session. It also gained
      a new call sd_session_get_start_time() to retrieve the time the login
      session started. A new call sd_session_get_leader() has been added to
      return the PID of the "leader" process of a session. A new call
      sd_uid_get_login_time() returns the time since the specified user has
      most recently been continuously logged in with at least one session.

    * JSON user records gained a new set of fields capabilityAmbientSet and
      capabilityBoundingSet which contain a list of POSIX capabilities to
      set for the logged in users in the ambient and bounding sets,
      respectively. homectl gained the ability to configure these two sets
      for users via --capability-bounding-set=/--capability-ambient-set=.

    * pam_systemd learnt two new module options
      default-capability-bounding-set= and default-capability-ambient-set=,
      which configure the default bounding and ambient sets for users as
      they are logging in, if the JSON user record doesn't specify this
      explicitly (see above). The built-in default for the ambient set now
      contains CAP_WAKE_ALARM, thus allowing regular users who may log in
      locally to resume from a system suspend via a timer.

    * The Session D-Bus objects of systemd-logind gained a new SetTTY() method
      call to update the TTY of a session after it has been allocated. This
      is useful for SSH sessions which are typically allocated first, and
      for which a TTY is added later.

    * The sd-daemon API gained a new call sd_pid_notifyf_with_fds() which
      combines the various other sd_pid_notify() flavours into one: it takes
      a format string, an overriding PID, and a set of file descriptors to
      send. It also gained a new call sd_pid_notify_barrier() which is
      equivalent to sd_notify_barrier() but allows the originating PID to
      be specified.

    * "loginctl list-users" and "loginctl list-sessions" will now show the
      state of each logged in user/session in their tabular output. It will
      also show the current idle state of sessions.

DDIs:

    * systemd-dissect will now show the intended CPU architecture of an
      inspected DDI.

    * systemd-dissect will now install itself as mount helper for the "ddi"
      pseudo-file system type. This means you may now mount DDIs directly
      via /bin/mount or /etc/fstab, making full use of embedded Verity
      information and all other DDI features.

      Example: mount -t ddi myimage.raw /some/where

    * The systemd-dissect tool gained the new switches --attach/--detach to
      attach/detach a DDI to a loopback block device without mounting it.
      It will automatically derive the right sector size from the image
      and set up Verity and similar, but not mount the file systems in it.

    * When systemd-gpt-auto-generator or the DDI mounting logic mount an
      ESP or XBOOTLDR partition the MS_NOSYMFOLLOW mount option is now
      implied. Given that these file systems are typically untrusted, this
      should make mounting them automatically have less of a security
      impact.

    * All tools that parse DDIs (such as systemd-nspawn, systemd-dissect,
      systemd-tmpfiles, …) now understand a new switch --image-policy= which
      takes a string encoding image dissection policy. With this mechanism
      automatic discovery and use of specific partition types and the
      cryptographic requirements on the partitions (Verity, LUKS, …) can be
      restricted, permitting better control of the exposed attack surfaces
      when mounting disk images. systemd-gpt-auto-generator will honour such
      an image policy too, configurable via the systemd.image_policy= kernel
      command line option. Unit files gained the RootImagePolicy=,
      MountImagePolicy= and ExtensionImagePolicy= to configure the same for
      disk images a service runs off.

    * systemd-analyze gained a new verb "image-policy" to validate and
      parse image policy strings.

    * systemd-dissect gained support for a new --validate switch to
      superficially validate DDI structure, and check whether a specific
      image policy allows the DDI.

    * systemd-dissect gained support for a new --mtree-hash switch to
      optionally disable calculating mtree hashes, which can be slow on
      large images.

    * systemd-dissect's --copy-to, --copy-from, --list and --mtree switches
      are now able to operate on directories, in addition to images.

Network Management:

    * networkd's GENEVE support has gained a new .netdev option
      InheritInnerProtocol=.

    * The [Tunnel] section in .netdev files has gained a new setting
      IgnoreDontFragment for controlling the IPv4 "DF" flag of datagrams.

    * A new global IPv6PrivacyExtensions= setting has been added that
      selects the default value of the per-network setting of the same
      name.

    * The predictable network interface naming logic will now include
      SR-IOV-R "representor" information in network interface names.

    * The DHCPv4 + DHCPv6 + IPv6 RA logic in networkd gained support for
      the RFC8910 captive portal option.

Device Management:

    * udevadm gained the new "verify" verb for validating udev rules files
      offline.

    * udev gained a new tool "iocost" that can be used to configure QoS IO
      cost data based on hwdb information onto suitable block devices. Also
      see https://github.com/iocost-benchmark/iocost-benchmarks.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll/systemd-cryptsetup will now install a TPM2 SRK
      ("Storage Root Key") in the TPM2 as a first step, and then use that
      for binding FDE to, if TPM2 support is used. This matches
      recommendations of TCG (see
      https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf).

    * systemd-cryptenroll and other tools that take TPM2 PCR parameters now
      understand textual identifiers for these PCRs.

    * systemd-veritysetup + /etc/veritytab gained support for a series of
      new options: hash-offset=, superblock=, format=, data-block-size=,
      hash-block-size=, data-blocks=, salt=, uuid=, hash=, fec-device=,
      fec-offset=, fec-roots= to configure various aspects of a Verity
      volume.

    * systemd-cryptsetup + /etc/crypttab gained support for a new
      veracrypt-pim= option for setting the Personal Iteration Multiplier
      of veracrypt volumes.

    * systemd-integritysetup + /etc/integritytab gained support for a new
      mode= setting for controlling the dm-integrity mode (journal, bitmap,
      direct) for the volume.

    * systemd-analyze gained a new verb "pcrs" that shows the known TPM PCR
      registers, their symbolic names and current values.

systemd-tmpfiles:

    * The ACL support in tmpfiles.d/ has been updated: if an uppercase "X"
      access right is specified this is equivalent to "x" but only if the
      inode in question already has the executable bit set for at least
      some user/group. Otherwise the "x" bit will be turned off.

    * tmpfiles.d/'s C line type now understands a new modifier "+": a line
      with C+ will result in a "merge" copy, i.e. all files of the source
      tree are copied into the target tree, even if that tree already
      exists, resulting in a combined tree of files already present in the
      target tree and those copied in.

    * systemd-tmpfiles gained a new --graceful switch. If specified, lines
      with unknown users/groups will silently be skipped.
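
    The uppercase "X" ACL rule described above can be restated as a small
    sketch. This is an illustration of the rule as worded in these notes,
    not the systemd-tmpfiles code. The function name is hypothetical, and
    treating the "other" execute bit as counting too is an assumption.

```python
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def resolve_acl_perms(perms: str, mode: int) -> str:
    """Resolve an uppercase X in an ACL permission string against a mode."""
    has_exec = bool(mode & ANY_EXEC)
    resolved = []
    for ch in perms:
        if ch == "X":
            # X behaves like x only if some execute bit is already set.
            resolved.append("x" if has_exec else "-")
        else:
            resolved.append(ch)
    return "".join(resolved)


print(resolve_acl_perms("rwX", 0o644))
print(resolve_acl_perms("rwX", 0o755))
```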

systemd-notify:

    * systemd-notify gained two new options --fd= and --fdname= for sending
      arbitrary file descriptors to the service manager (while specifying an
      explicit name for it).

    * systemd-notify gained a new --exec switch, which makes it execute the
      specified command line after sending the requested messages. This is
      useful for sending out READY=1 first, and then continuing invocation
      without changing process ID, so that the tool can be nicely used
      within an ExecStart= line of a unit file that uses Type=notify.
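
    The "notify first, then exec" pattern that --exec enables can be
    sketched as follows. This is only an illustration of the notification
    protocol (a datagram sent to the AF_UNIX socket named in
    $NOTIFY_SOCKET, with a leading "@" denoting the abstract namespace),
    not how systemd-notify itself is implemented. The function names are
    hypothetical.

```python
import os
import socket


def notify(message: str, env=os.environ) -> bool:
    """Send a notification datagram; return False if no socket is set."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a socket in the abstract namespace.
    addr = "\0" + path[1:] if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def notify_then_exec(argv):
    """Report readiness, then replace this process, keeping the same PID."""
    notify("READY=1")
    os.execvp(argv[0], argv)
```

    Because os.execvp() replaces the current process image, the process ID
    seen by the service manager stays the same, which is the property the
    --exec switch relies on.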

sd-event + sd-bus APIs:

    * The sd-event API gained a new call sd_event_source_leave_ratelimit()
      which may be used to explicitly end a rate-limit state an event
      source might be in, resetting all rate limiting counters.

    * When the sd-bus library is used to make connections to AF_UNIX D-Bus
      sockets, it will now encode the "description" set via
      sd_bus_set_description() into the source socket address. It will also
      look for this information when accepting a connection. This is useful
      to track individual D-Bus connections on a D-Bus broker for debug
      purposes.

systemd-resolved:

    * systemd-resolved gained a new resolved.conf setting
      StaleRetentionSec= which may be used to retain cached DNS records
      even after their nominal TTL, and use them in case upstream DNS
      servers cannot be reached. This can be used to make name resolution
      more resilient in case of network problems.

    * resolvectl gained a new verb "show-cache" to show the current cache
      contents of systemd-resolved. This verb communicates with the
      systemd-resolved daemon and requires privileges.
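
    The retention behaviour described above (serving cached records past
    their TTL only when upstream servers cannot be reached) can be pictured
    with this sketch. It is not systemd-resolved's implementation: the
    names are hypothetical, and measuring the retention period from the
    moment the TTL expires is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    value: str
    stored_at: float  # seconds, on any monotonic clock
    ttl: float        # nominal TTL in seconds


def lookup(record, now, upstream_reachable, retention_sec):
    """Return the cached value if it may be used at time `now`, else None."""
    age = now - record.stored_at
    if age <= record.ttl:
        return record.value  # still within its nominal TTL
    # Stale records are only used as a fallback when upstream is unreachable.
    if not upstream_reachable and age <= record.ttl + retention_sec:
        return record.value
    return None


rec = CachedRecord("192.0.2.1", stored_at=0.0, ttl=60.0)
print(lookup(rec, now=120.0, upstream_reachable=False, retention_sec=300.0))
```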

Other:

    * Meson >= 0.60.0 is now required to build systemd.

    * The default keymap to apply may now be chosen at build-time via the
      new -Ddefault-keymap= meson option.

    * Most of systemd's long-running services now have a generic handler
      for the SIGRTMIN+18 signal which executes various operations
      depending on the sigqueue() parameter sent along. For example, values
      0x100…0x107 allow changing the maximum log level of such
      services. 0x200…0x203 allow changing the log target of such
      services. 0x300 makes the services trim their memory similarly to the
      automatic PSI-triggered action, see above. 0x301 makes the services
      output their malloc_info() data to the logs.

    * machinectl gained new "edit" and "cat" verbs for editing .nspawn
      files, inspired by systemctl's verbs of the same name which edit unit
      files. Similarly, networkctl gained the same verbs for editing
      .network, .netdev, .link files.

    * A new syscall filter group "@sandbox" has been added that contains
      system calls used for sandboxing, such as those for seccomp and
      Landlock.

    * New documentation has been added:

      https://systemd.io/COREDUMP
      https://systemd.io/MEMORY_PRESSURE
      smbios-type-11(7)

    * systemd-firstboot gained a new --reset option. If specified, the
      settings in /etc/ it knows how to initialize are reset.

    * systemd-sysext is now a multi-call binary and is also installed under
      the systemd-confext alias name (via a symlink). When invoked that way
      it will operate on /etc/ instead of /usr/ + /opt/. It thus becomes a
      powerful, atomic, secure configuration management tool of sorts, that
      locally can merge configuration from multiple confext configuration
      images into a single immutable tree.

    * The --network-macvlan=, --network-ipvlan=, --network-interface=
      switches of systemd-nspawn may now optionally take the intended
      network interface inside the container.

    * All our programs will now send an sd_notify() message with their exit
      status in the EXIT_STATUS= field when exiting, using the usual
      protocol, including PID 1. This is useful for VMMs and container
      managers to collect an exit status from a system as it shuts down, as
      set via "systemctl exit …". This is particularly useful in test cases
      and similar, as invocations via a VM can now nicely propagate an exit
      status to the host, similar to local processes.

    * systemd-run gained a new switch --expand-environment=no to disable
      server-side environment variable expansion in specified command
      lines. Expansion defaults to enabled for all execution types except
      --scope, where it defaults to off (and prints a warning) for backward
      compatibility reasons. --scope will be flipped to enabled by default
      too in a future release. If you are using --scope and passing a '$'
      character in the payload you should start explicitly using
      --expand-environment=yes/no according to the use case.

    * The systemd-system-update-generator has been updated to also look for
      the special flag file /etc/system-update in addition to the existing
      support for /system-update to decide whether to enter system update
      mode.

    * The /dev/hugepages/ file system is now mounted with nosuid + nodev
      mount options by default.

    * systemd-fstab-generator now understands two new kernel command line
      options systemd.mount-extra= and systemd.swap-extra=, which configure
      additional mounts or swaps in a format similar to /etc/fstab. 'fsck'
      will be run on these block devices, like it already happens for
      'root='. It also now supports the new fstab.extra and
      fstab.extra.initrd credentials that may contain additional /etc/fstab
      lines to apply at boot.

    * systemd-getty-generator now understands two new credentials
      getty.ttys.container and getty.ttys.serial. These credentials may
      contain a list of TTY devices – one per line – to instantiate
      container-getty@.service and serial-getty@.service on.

    * The getty/serial-getty/container-getty units now import the 'agetty.*'
      and 'login.*' credentials, which are consumed by the 'login' and
      'agetty' programs starting from util-linux v2.40.

    * systemd-sysupdate's sysupdate.d/ drop-ins gained a new setting
      PathRelativeTo=, which can be set to "esp", "xbootldr", "boot", in
      which case the Path= setting is taken relative to the ESP or XBOOTLDR
      partitions, rather than the system's root directory /. The relevant
      directories are automatically discovered.

    * The systemd-ac-power tool gained a new switch --low, which reports
      whether the battery charge is considered "low", similar to how the
      s2h suspend logic checks this state to decide whether to enter system
      suspend or hibernation.

    * The /etc/os-release file can now have two new optional fields
      VENDOR_NAME= and VENDOR_URL= to carry information about the vendor of
      the OS.

    * When the system hibernates, information about the device and offset
      used is now written to a non-volatile EFI variable. On next boot the
      system will attempt to resume from the location indicated in this EFI
      variable. This should make hibernation a lot more robust, while
      requiring no manual configuration of the resume location.

    * The $XDG_STATE_HOME environment variable (added in more recent
      versions of the XDG basedir specification) is now honoured to
      implement the StateDirectory= setting in user services.

    * A new component "systemd-battery-check" has been added. It may run
      during early boot (usually in the initrd), and checks the battery
      charge level of the system. In case the charge level is very low the
      user is notified (graphically via Plymouth – if available – as well
      as in text form on the console), and the system is turned off after a
      10s delay. The feature can be disabled by passing
      systemd.battery-check=0 through the kernel command line.

    * The 'passwdqc' library is now supported as an alternative to the
      'pwquality' library and can be selected at build time.
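
    The value ranges listed above for the SIGRTMIN+18 handler can be
    summarized as a small decoder. This is only a restatement of the
    ranges given in these notes. The function name and action labels are
    hypothetical, and what each offset within a range selects is
    deliberately left uninterpreted.

```python
def describe_sigqueue_value(value: int):
    """Map a sigqueue() value to (action, offset within its range)."""
    if 0x100 <= value <= 0x107:
        return ("set-max-log-level", value - 0x100)
    if 0x200 <= value <= 0x203:
        return ("set-log-target", value - 0x200)
    if value == 0x300:
        return ("trim-memory", None)
    if value == 0x301:
        return ("log-malloc-info", None)
    # Values not mentioned in the notes are reported as unknown.
    return ("unknown", None)


for v in (0x104, 0x201, 0x300, 0x301):
    print(hex(v), describe_sigqueue_value(v))
```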

Contributors

    Contributions from: 김인수, 07416, Addison Snelling, Adrian Vovk,
    Aidan Dang, Alexander Krabler, Alfred Klomp, Anatoli Babenia,
    Andrei Stepanov, Andrew Baxter, Antonio Alvarez Feijoo,
    Arian van Putten, Arthur Shau, A S Alam,
    Asier Sarasua Garmendia, Balló György, Bastien Nocera,
    Benjamin Herrenschmidt, Benjamin Raison, Bill Peterson,
    Brad Fitzpatrick, Brett Holman, bri, Chen Qi, Chitoku,
    Christian Hesse, Christoph Anton Mitterer, Christopher Gurnee,
    Colin Walters, Cornelius Hoffmann, Cristian Rodríguez, cunshunxia,
    cvlc12, Cyril Roelandt, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Daniel Rusek, Dan Streetman, David Edmundson,
    David Schroeder, David Tardon, dependabot[bot],
    Dimitri John Ledkov, Dmitrii Fomchenkov, Dmitry V. Levin, dmkUK,
    Dominique Martinet, don bright, drosdeck, Edson Juliano Drosdeck,
    Egor Ignatov, EinBaum, Emanuele Giuseppe Esposito, Eric Curtin,
    Erik Sjölund, Evgeny Vereshchagin, Florian Klink, Franck Bui,
    François Rigault, Fran Diéguez, Franklin Yu, Frantisek Sumsal,
    Fuminobu TAKEYAMA, Gaël PORTAY, Gerd Hoffmann, Gertalitec,
    Gibeom Gwon, Gustavo Noronha Silva, Hannu Lounento,
    Hans de Goede, Haochen Tong, HATAYAMA Daisuke, Henrik Holst,
    Hoe Hao Cheng, Igor Tsiglyar, Ivan Vecera, James Hilliard,
    Jan Engelhardt, Jan Janssen, Jan Luebbe, Jan Macku, Janne Sirén,
    jcg, Jeidnx, Joan Bruguera, Joerg Behrmann, jonathanmetzman,
    Jordan Rome, Josef Miegl, Joshua Goins, Joyce, Joyce Brum,
    Juno Computers, Kai Lueke, Kevin P. Fleming, Kiran Vemula, Klaus,
    Klaus Zipfel, Lawrence Thorpe, Lennart Poettering, licunlong,
    Lily Foster, Luca Boccassi, Ludwig Nussel, Luna Jernberg,
    maanyagoenka, Maanya Goenka, Maksim Kliazovich, Malte Poll,
    Marko Korhonen, Masatake YAMATO, Mateusz Poliwczak, Matt Johnston,
    Miao Wang, Micah Abbott, Michael A Cassaniti, Michal Koutný,
    Michal Sekletár, Mike Yuan, mooo, Morten Linderud, msizanoen,
    Nick Rosbrook, nikstur, Olivier Gayot, Omojola Joshua,
    Paolo Velati, Paul Barker, Pavel Borecki, Petr Menšík,
    Philipp Kern, Philip Withnall, Piotr Drąg, Quintin Hill,
    Rene Hollander, Richard Phibel, Robert Meijers, Robert Scheck,
    Roger Gammans, Romain Geissler, Ronan Pigott, Russell Harmon,
    saikat0511, Samanta Navarro, Sam James, Sam Morris,
    Simon Braunschmidt, Sjoerd Simons, Sorah Fukumori,
    Stanislaw Gruszka, Stefan Roesch, Steven Luo, Steve Ramage,
    Susant Sahani, taniishkaaa, Tanishka, Temuri Doghonadze,
    Thierry Martin, Thomas Blume, Thomas Genty, Thomas Weißschuh,
    Thorsten Kukuk, Times-Z, Tobias Powalowski, tofylion,
    Topi Miettinen, Uwe Kleine-König, Velislav Ivanov,
    Vitaly Kuznetsov, Vít Zikmund, Weblate, Will Fancher,
    William Roberts, Winterhuman, Wolfgang Müller, Xeonacid,
    Xiaotian Wu, Xi Ruoyao, Yuri Chornoivan, Yu Watanabe, Yuxiang Zhu,
    Zbigniew Jędrzejewski-Szmek, zhmylove, ZjYwMj,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-07-28

v254-rc3

9 months ago

systemd System and Service Manager

CHANGES WITH 254 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * The next release (v255) will remove support for split-usr (/usr/
      mounted separately during late boot, instead of being mounted by the
      initrd before switching to the rootfs) and unmerged-usr (parallel
      directories /bin/ and /usr/bin/, /lib/ and /usr/lib/, …). For more
      details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * Support for the SystemdOptions EFI variable is deprecated.
      'bootctl systemd-efi-options' will emit a warning when used. It seems
      that this feature is little-used and it is better to use alternative
      approaches like credentials and confexts. The plan is to drop support
      altogether at a later point, but this might be revisited based on
      user feedback.

    * EnvironmentFile= now treats the line following a comment line that
      ends with a backslash escape as a non-comment line. For details, see:
      https://github.com/systemd/systemd/issues/27975

    * Behaviour of sandboxing options for the per-user service manager
      units has changed. They now imply PrivateUsers=yes, which means user
      namespaces will be implicitly enabled when a sandboxing option is
      enabled in a user unit. Enabling user namespaces has the drawback
      that system users will no longer be visible (and processes/files will
      appear as owned by 'nobody') in the user unit.

      By definition a sandboxed user unit should run with reduced
      privileges, so impact should be small. This will remove a great
      source of confusion that has been reported by users over the years,
      due to how these options require an extra setting to be manually
      enabled when used in the per-user service manager, which is not
      needed in the system service manager. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html

    * systemd-run's switch --expand-environment= which currently is disabled
      by default when combined with --scope, will be changed in a future
      release to be enabled by default.

Security Relevant Changes:

    * pam_systemd will now by default pass the CAP_WAKE_ALARM ambient
      process capability to invoked session processes of regular users on
      local seats (as well as to systemd --user), unless configured
      otherwise via data from JSON user records, or via the PAM module's
      parameter list. This is useful in order to allow desktop tools such as
      GNOME's Alarm Clock application to set a timer for
      CLOCK_REALTIME_ALARM that wakes up the system when it elapses. A
      per-user service unit file may thus use AmbientCapabilities= to pass
      the capability to invoked processes. Note that this capability is
      relatively narrow in focus (in particular compared to other process
      capabilities such as CAP_SYS_ADMIN) and we already — by default —
      permit more impactful operations such as system suspend to local
      users.

Service Manager:

    * "Startup" memory settings are now supported. Previously IO and CPU
      settings were already supported via StartupCPUWeight= and similar.
      The same logic has been added for the various per-unit memory
      settings StartupMemoryMax= and related.

    * The service manager gained support for enqueuing POSIX signals to
      services that carry an additional integer value, exposing the
      sigqueue() system call. This is accessible via new D-Bus calls
      org.freedesktop.systemd1.Manager.QueueSignalUnit() and
      org.freedesktop.systemd1.Unit.QueueSignal(), as well as in systemctl
      via the new --kill-value= option.

    * systemctl gained a new "list-paths" verb, which shows all currently
      active .path units, similarly to how "systemctl list-timers" shows
      active timers, and "systemctl list-sockets" shows active sockets.

    * systemctl gained a new --when= switch which is honoured by the various
      forms of shutdown (i.e. reboot, kexec, poweroff, halt) and allows
      scheduling these operations by time, similar in fashion to how this
      has been supported by SysV shutdown.

    * If MemoryDenyWriteExecute= is enabled for a service and the kernel
      supports the new PR_SET_MDWE prctl() call, it is used instead of the
      seccomp()-based system call filter to achieve the same effect.

    * A new set of kernel command line options is now understood:
      systemd.tty.term.<name>=, systemd.tty.rows.<name>=,
      systemd.tty.columns.<name>= allow configuring the TTY type and
      dimensions for the tty specified via <name>. When systemd invokes a
      service on a tty (via TTYName=) it will look for these and configure
      the TTY accordingly. This is particularly useful in VM environments
      to propagate host terminal settings into the appropriate TTYs of the
      guest.
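
      As an illustration, the following sketch collects these options for
      one TTY from a kernel command line string. It assumes plain
      whitespace-separated words (no quoting), and tty_settings() as well
      as the example command line are hypothetical:

```python
def tty_settings(cmdline, tty):
    """Collect systemd.tty.{term,rows,columns}.<tty>= values for one TTY."""
    settings = {}
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if not sep:
            continue
        for field in ("term", "rows", "columns"):
            if key == f"systemd.tty.{field}.{tty}":
                # The terminal type is a string, the dimensions are integers.
                settings[field] = value if field == "term" else int(value)
    return settings


example_cmdline = (
    "root=/dev/vda1 systemd.tty.term.hvc0=xterm-256color "
    "systemd.tty.rows.hvc0=50 systemd.tty.columns.hvc0=160"
)
hvc0 = tty_settings(example_cmdline, "hvc0")
```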

    * A new RootEphemeral= setting is now understood in service units. It
      takes a boolean argument. If enabled for services that use RootImage=
      or RootDirectory= an ephemeral copy of the disk image or directory
      tree is made when the service is started. It is removed automatically
      when the service is stopped. That ephemeral copy is made using
      btrfs/xfs reflinks or btrfs snapshots, if available.

    * The service activation logic gained new settings RestartSteps= and
      RestartMaxDelaySec= which allow exponentially-growing restart
      intervals for Restart=.
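
      A rough sketch of what such a schedule can look like. The geometric
      interpolation between RestartSec= and RestartMaxDelaySec= used here
      is an assumption made for illustration, and restart_delay() is a
      hypothetical helper, not the service manager's code:

```python
def restart_delay(n, restart_sec, restart_max_delay_sec, restart_steps):
    """Delay in seconds before the n-th consecutive restart (n = 0 is the first)."""
    if restart_steps <= 0 or restart_sec <= 0 or restart_max_delay_sec <= restart_sec:
        # Nothing to grow towards: always use the base interval.
        return restart_sec
    if n >= restart_steps:
        return restart_max_delay_sec
    ratio = restart_max_delay_sec / restart_sec
    # Assumed growth law: equal multiplicative steps from base to maximum.
    return restart_sec * ratio ** (n / restart_steps)


# RestartSec=1, RestartMaxDelaySec=16, RestartSteps=4
schedule = [restart_delay(n, 1.0, 16.0, 4) for n in range(6)]
```

      Under these assumptions the delays grow roughly as 1, 2, 4, 8 and
      then stay at 16 seconds.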

    * The service activation logic gained a new setting RestartMode= which
      can be set to 'direct' to skip the inactive/failed states when
      restarting, so that dependent units are not notified until the service
      converges to a final (successful or failed) state. For example, this
      means that OnSuccess=/OnFailure= units will not be triggered until the
      service state has converged.

    * PID 1 will now automatically load the virtio_console kernel module
      during early initialization if running in a suitable VM. This is done
      so that early-boot logging can be written to the console if available.

    * Similarly, virtio-vsock support is loaded early in suitable VM
      environments. PID 1 will send sd_notify() notifications via AF_VSOCK
      to the VMM if configured, thus loading this early is beneficial.

    * A new verb "fdstore" has been added to systemd-analyze to show the
      current contents of the file descriptor store of a unit. This is
      backed by a new D-Bus call DumpUnitFileDescriptorStore() provided by
      the service manager.

    * The service manager will now set a new $FDSTORE environment variable
      when invoking processes for services that have the file descriptor
      store enabled.

    * A new service option FileDescriptorStorePreserve= has been added that
      allows tuning the life-cycle of the per-service file descriptor
      store. If set to "yes", the entries in the fd store are retained even
      after the service has been fully stopped.

    * The "systemctl clean" command may now be used to clear the fdstore of
      a service.

    * Unit *.preset files gained a new directive "ignore", in addition to
      the existing "enable" and "disable". As the name suggests, matching
      units are left unchanged, i.e. neither enabled nor disabled.
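
      The following sketch shows how the three directives might be
      evaluated for a unit name. It assumes that the first matching line
      wins and that patterns are shell-style globs; preset_action() and
      the unit names are hypothetical:

```python
from fnmatch import fnmatchcase


def preset_action(preset_text, unit):
    """Return 'enable', 'disable' or 'ignore' for a unit, or None if nothing matches."""
    for line in preset_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        directive, pattern = parts[0], parts[1]
        # Assumption: the first matching directive decides the outcome.
        if directive in ("enable", "disable", "ignore") and fnmatchcase(unit, pattern):
            return directive
    return None


example_preset = "enable sshd.service\nignore vendor-*.service\ndisable *\n"
```

      A unit matched by "ignore" is neither enabled nor disabled, so a
      caller would simply skip it.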

    * Service units gained a new setting DelegateSubgroup=. It takes the
      name of a sub-cgroup to place any processes the service manager forks
      off in. Previously, the service manager would place all service
      processes directly in the top-level cgroup it created for the
      service. This usually meant that the main process in a service with
      delegation enabled would first have to create a subgroup and move
      itself down into it, in order to not conflict with the "no processes
      in inner cgroups" rule of cgroup v2. With this option, this step is
      now handled by PID 1.

    * The service manager will now look for .upholds/ directories,
      similarly to the existing support for .wants/ and .requires/
      directories. Symlinks in this directory result in Upholds=
      dependencies.

      The [Install] section of unit files gained support for a new
      UpheldBy= directive to generate .upholds/ symlinks automatically when
      a unit is enabled.

    * The service manager now supports a new kernel command line option
      systemd.default_device_timeout_sec=, which may be used to override
      the default timeout for .device units.

    * A new "soft-reboot" mechanism has been added to the service manager.
      A "soft reboot" is similar to a regular reboot, except that it
      affects userspace only: the service manager shuts down any running
      services and other units, then optionally switches into a new root
      file system (mounted to /run/nextroot/), and then passes control to a
      systemd instance in the new file system which then starts the system
      up again. The kernel is not rebooted and neither is the hardware,
      firmware or boot loader. This provides a fast, lightweight mechanism
      to quickly reset or update userspace, without the latency that a full
      system reset involves. Moreover, open file descriptors may be passed
      across the soft reboot into the new system where they will be passed
      back to the originating services. This allows pinning resources
      across the reboot, thus minimizing grey-out time further. This new
      reboot mechanism is accessible via the new "systemctl soft-reboot"
      command.

    * Services using RootDirectory= or RootImage= will now have read-only
      access to a copy of the host's os-release file under
      /run/host/os-release, which will be kept up-to-date on 'soft-reboot'.
      This was already the case for Portable Services, and the feature has
      now been extended to all services that do not run off the host's
      root filesystem.

    * A new service setting MemoryKSM= has been added to enable kernel
      same-page merging individually for services.

    * A new service setting ImportCredentials= has been added that augments
      LoadCredential= and LoadCredentialEncrypted= and searches for
      credentials to import from the system, and supports globbing.

    * A new job mode "restart-dependencies" has been added to the service
      manager (exposed via systemctl --job-mode=). It is only valid when
      used with "start" jobs, and has the effect that the "start" job will
      be propagated as "restart" jobs to currently running units that have
      a BindsTo= or Requires= dependency on the started unit.

    * A new verb "whoami" has been added to "systemctl" which determines as
      part of which unit the command is being invoked. It writes the unit
      name to standard output. If one or more PIDs are specified, it reports
      the unit names the processes referenced by the PIDs belong to.

    * The system and service credential logic has been improved: there's
      now a clearly defined place where system provisioning tools running
      in the initrd can place credentials that will be imported into the
      system's set of credentials during the initrd → host transition: the
      /run/credentials/@initrd/ directory. Once the credentials placed
      there are imported into the system credential set they are deleted
      from this directory, and the directory itself is deleted afterwards
      too.

    * A new kernel command line option systemd.set_credential_binary= has
      been added, that is similar to the pre-existing
      systemd.set_credential= but accepts arbitrary binary credential data,
      encoded in Base64. Note that the kernel command line is not a
      recommended way to transfer credentials into a system, since it is
      world-readable from userspace.
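
      A small sketch of how such an option could be assembled. The
      "ID:VALUE" layout is assumed to mirror systemd.set_credential=, and
      both the helper name and the credential name "mycred" are
      hypothetical:

```python
import base64


def set_credential_binary_option(cred_id, data):
    """Build a kernel command line word carrying Base64-encoded credential data."""
    encoded = base64.b64encode(data).decode("ascii")
    # Assumed layout: <credential id>, a colon, then the Base64 payload.
    return f"systemd.set_credential_binary={cred_id}:{encoded}"


option = set_credential_binary_option("mycred", b"\x00\x01\xff")
```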

    * The default machine ID to use may now be configured via the
      system.machine_id system credential. It will only be used if no
      machine ID was set yet on the host.

    * On Linux kernel 6.4 and newer system and service credentials will now
      be placed in a tmpfs instance that has the "noswap" mount option
      set. Previously, a "ramfs" instance was used. By switching to tmpfs
      ACL support and overall size limits can now be enforced, without
      compromising on security, as the memory is never paged out either
      way.

    * The service manager now can detect when it is running in a
      'Confidential Virtual Machine', and a corresponding 'cvm' value is now
      accepted by ConditionSecurity= for units that want to conditionalize
      themselves on this. systemd-detect-virt gained new '--cvm' and
      '--list-cvm' switches to respectively perform the detection or list
      all known flavours of confidential VM, depending on the vendor. The
      manager will publish a 'ConfidentialVirtualization' D-Bus property,
      and will also set a SYSTEMD_CONFIDENTIAL_VIRTUALIZATION= environment
      variable for unit generators. Finally, udev rules can match on a new
      'cvm' key that will be set when in a confidential VM.
      Additionally, when running in a 'Confidential Virtual Machine', SMBIOS
      strings and QEMU's fw_cfg protocol will not be used to import
      credentials and kernel command line parameters by the system manager,
      systemd-boot and systemd-stub, because the hypervisor is considered
      untrusted in this particular setting.

Journal:

    * The sd-journal API gained a new call sd_journal_get_seqnum() to
      retrieve the current log record's sequence number and sequence number
      ID, which allows applications to order records the same way as
      journal does internally. The sequence number is now also exported in
      the JSON and "export" output of the journal.

    * journalctl gained a new switch --truncate-newline. If specified
      multi-line log records will be truncated at the first newline,
      i.e. only the first line of each log message will be shown.

    * systemd-journal-upload gained support for --namespace=, similar to
      the switch of the same name of journalctl.

systemd-repart:

    * systemd-repart's drop-in files gained a new ExcludeFiles= option which
      may be used to exclude certain files from the effect of CopyFiles=.

    * systemd-repart's Verity support now implements the Minimize= setting
      to minimize the size of the resulting partition.

    * systemd-repart gained a new --offline= switch, which may be used to
      control whether images shall be built "online" or "offline",
      i.e. whether to make use of kernel facilities such as loopback block
      devices and device mapper or not.

    * If systemd-repart is told to populate a newly created ESP or XBOOTLDR
      partition with some files, it will now default to VFAT rather than
      ext4.

    * systemd-repart gained a new --architecture= switch. If specified, the
      per-architecture GPT partition types (i.e. the root and /usr/
      partitions) configured in the partition drop-in files are
      automatically adjusted to match the specified CPU architecture, in
      order to simplify cross-architecture DDI building.

    * systemd-repart will now default to a minimum size of 300MB for XFS
      filesystems if no size parameter is specified. This matches what the
      XFS tools (xfsprogs) 

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * gnu-efi is no longer required to build systemd-boot and systemd-stub.
      Instead, pyelftools is now needed, and it will be used to perform the
      ELF -> PE relocations at build time.

    * bootctl gained a new switch --print-root-device/-R that prints the
      block device the root file system is backed by. If specified twice,
      it returns the whole disk block device (as opposed to partition block
      device) the root file system is on. It's useful for invocations such
      as "cfdisk $(bootctl -RR)" to quickly show the partition table of the
      running OS.

    * systemd-stub will now look for the SMBIOS Type 1 field
      "io.systemd.stub.kernel-cmdline-extra" and append its value to the
      kernel command line it invokes. This is useful for VMMs such as qemu
      to pass additional kernel command lines into the system even when
      booting via full UEFI. The contents of the field are measured into
      TPM PCR 12.

    * The KERNEL_INSTALL_LAYOUT= setting for kernel-install gained a new
      value "auto". With this value, a kernel will be automatically
      analyzed, and if it qualifies as UKI, it will be installed as if the
      setting was set to "uki", otherwise as "bls".

    * systemd-stub can now optionally load UEFI PE "add-on" images that may
      contain additional kernel command line information. These "add-ons"
      superficially look like a regular UEFI executable, and are expected
      to be signed via SecureBoot/shim. However, they do not actually
      contain code, but instead a subset of the PE sections that UKIs
      support. They are supposed to provide a way to extend UKIs with
      additional resources in a secure and authenticated way. Currently,
      only the .cmdline PE section may be used in add-ons, in which case
      any specified string is appended to the command line embedded into
      the UKI itself. A new 'addon<EFI-ARCH>.efi.stub' is now provided that
      can be used to trivially create addons, via 'ukify' or 'objcopy'. In
      the future we expect other sections to be made extensible like this as
      well.

    * ukify has been updated to allow building these UEFI PE "add-on"
      images, using the new 'addon<EFI-ARCH>.efi.stub'.

    * ukify gained a new "genkey" verb for generating a set of key pairs
      to sign UKIs and their PCR data with.

    * ukify now accepts SBAT information to place in the .sbat PE section
      of UKIs and addons. If a UKI is built the SBAT information from the
      inner kernel is merged with any SBAT information associated with
      systemd-stub and the SBAT data specified on the ukify command line.

    * The kernel-install script has been rewritten in C, and reuses much of
      the infrastructure of existing tools such as bootctl. It also gained
      --esp-path= and --boot-path= options to override the path to the ESP,
      and the $BOOT partition. Options --make-entry-directory= and
      --entry-token= have been added as well, similar to bootctl's options
      of the same name.

    * A new kernel-install plugin 60-ukify has been added which will
      combine kernel/initrd locally into a UKI and optionally sign them
      with a local key. This may be used to switch to UKI mode even on
      systems where a local kernel or initrd is used. (Typically UKIs are
      built and signed by the vendor.)

    * The ukify tool now supports "pesign" in addition to the pre-existing
      "sbsign" for signing UKIs.

    * systemd-measure and systemd-stub now look for the .uname PE section
      that should contain the kernel's "uname -r" string.

    * systemd-measure and ukify now calculate expected PCR hashes for a UKI
      "offline", i.e. without access to a TPM (physical or
      software-emulated).

Memory Pressure & Control:

    * The sd-event API gained new calls sd_event_add_memory_pressure(),
      sd_event_source_set_memory_pressure_type(),
      sd_event_source_set_memory_pressure_period() to create and configure
      an event source that is called whenever the OS signals memory
      pressure. Another call sd_event_trim_memory() is provided that
      compacts the process' memory use by releasing allocated but unused
      malloc() memory back to the kernel. Services can also provide their
      own custom callback to do memory trimming. This should improve system
      behaviour under memory pressure, as Linux traditionally provided
      no mechanism to return process memory back to the kernel if the
      kernel was under memory pressure. This makes use of the kernel's PSI
      interface. Most long-running services in systemd have been hooked up
      with this, and in particular systems with low memory should benefit
      from this.

    * Service units gained new settings MemoryPressureWatch= and
      MemoryPressureThresholdSec= to configure the PSI memory pressure
      logic individually. If these options are used, the
      $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE environment
      variables will be set for the invoked processes to inform them about
      the requested memory pressure behaviour. (This is used by the
      aforementioned sd-event API additions, if set.)
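
      A service that does not use sd-event could read the two variables
      itself. The sketch below assumes that $MEMORY_PRESSURE_WRITE is
      Base64-encoded and that a watch path of /dev/null means "disabled",
      as described in https://systemd.io/MEMORY_PRESSURE. The helper name
      and the example values are hypothetical:

```python
import base64


def memory_pressure_config(environ):
    """Return (path to watch, bytes to write after opening it), or None if disabled."""
    path = environ.get("MEMORY_PRESSURE_WATCH")
    if not path or path == "/dev/null":
        return None
    encoded = environ.get("MEMORY_PRESSURE_WRITE")
    # Assumption: the payload is Base64 because it may contain NUL bytes.
    payload = base64.b64decode(encoded) if encoded else b""
    return path, payload


example_env = {
    "MEMORY_PRESSURE_WATCH": "/sys/fs/cgroup/system.slice/demo.service/memory.pressure",
    "MEMORY_PRESSURE_WRITE": base64.b64encode(b"some 200000 2000000\x00").decode("ascii"),
}
config = memory_pressure_config(example_env)
```

      The caller would then open the returned path, write the payload
      into it, and poll the file descriptor for pressure events.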

    * systemd-analyze gained a new "malloc" verb that shows the output
      generated by glibc's malloc_info() on services that support it. Right
      now, only the service manager has been updated accordingly. This
      call requires privileges.

User & Session Management:

    * The sd-login API gained a new call sd_session_get_username() to
      return the user name of the owner of a login session. It also gained
      a new call sd_session_get_start_time() to retrieve the time the login
      session started. A new call sd_session_get_leader() has been added to
      return the PID of the "leader" process of a session. A new call
      sd_uid_get_login_time() returns the time since the specified user has
      most recently been continuously logged in with at least one session.

    * JSON user records gained a new set of fields capabilityAmbientSet and
      capabilityBoundingSet which contain a list of POSIX capabilities to
      set for the logged in users in the ambient and bounding sets,
      respectively. homectl gained the ability to configure these two sets
      for users via --capability-bounding-set=/--capability-ambient-set=.

    * pam_systemd learnt two new module options
      default-capability-bounding-set= and default-capability-ambient-set=,
      which configure the default bounding and ambient sets for users as
      they are logging in, if the JSON user record doesn't specify this
      explicitly (see above). The built-in default for the ambient set now
      contains CAP_WAKE_ALARM, thus allowing regular users who may log in
      locally to resume from a system suspend via a timer.

    * The Session D-Bus objects of systemd-logind gained a new SetTTY()
      method call to update the TTY of a session after it has been
      allocated. This is useful for SSH sessions which are typically
      allocated first, and for which a TTY is added later.

    * The sd-daemon API gained a new call sd_pid_notifyf_with_fds() which
      combines the various other sd_pid_notify() flavours into one: it takes
      a format string, an overriding PID, and a set of file descriptors to
      send. It also gained a new call sd_pid_notify_barrier() which is
      equivalent to sd_notify_barrier() but allows the originating PID to
      be specified.

    * "loginctl list-users" and "loginctl list-sessions" will now show the
      state of each logged in user/session in their tabular output. It will
      also show the current idle state of sessions.

DDIs:

    * systemd-dissect will now show the intended CPU architecture of an
      inspected DDI.

    * systemd-dissect will now install itself as mount helper for the "ddi"
      pseudo-file system type. This means you may now mount DDIs directly
      via /bin/mount or /etc/fstab, making full use of embedded Verity
      information and all other DDI features.

      Example: mount -t ddi myimage.raw /some/where

    * The systemd-dissect tool gained the new switches --attach/--detach to
      attach/detach a DDI to a loopback block device without mounting it.
      It will automatically derive the right sector size from the image
      and set up Verity and similar, but not mount the file systems in it.

    * When systemd-gpt-auto-generator or the DDI mounting logic mount an
      ESP or XBOOTLDR partition the MS_NOSYMFOLLOW mount option is now
      implied. Given that these file systems are typically untrusted, this
      should make mounting them automatically have less of a security
      impact.

    * All tools that parse DDIs (such as systemd-nspawn, systemd-dissect,
      systemd-tmpfiles, …) now understand a new switch --image-policy= which
      takes a string encoding image dissection policy. With this mechanism
      automatic discovery and use of specific partition types and the
      cryptographic requirements on the partitions (Verity, LUKS, …) can be
      restricted, permitting better control of the exposed attack surfaces
      when mounting disk images. systemd-gpt-auto-generator will honour such
      an image policy too, configurable via the systemd.image_policy= kernel
      command line option. Unit files gained the RootImagePolicy=,
      MountImagePolicy= and ExtensionImagePolicy= to configure the same for
      disk images a service runs off.

    * systemd-analyze gained a new verb "image-policy" to validate and
      parse image policy strings.

    * systemd-dissect gained support for a new --validate switch to
      superficially validate DDI structure, and check whether a specific
      image policy allows the DDI.

    * systemd-dissect gained support for a new --mtree-hash switch to
      optionally disable calculating mtree hashes, which can be slow on
      large images.

    * systemd-dissect's --copy-to, --copy-from, --list and --mtree switches
      are now able to operate on directories in addition to images.

Network Management:

    * networkd's GENEVE support has gained a new .netdev option
      InheritInnerProtocol=.

    * The [Tunnel] section in .netdev files has gained a new setting
      IgnoreDontFragment for controlling the IPv4 "DF" flag of datagrams.

    * A new global IPv6PrivacyExtensions= setting has been added that
      selects the default value of the per-network setting of the same
      name.

    * The predictable network interface naming logic will now include
      SR-IOV-R "representor" information in network interface names.

    * The DHCPv4 + DHCPv6 + IPv6 RA logic in networkd gained support for
      the RFC8910 captive portal option.

Device Management:

    * udevadm gained the new "verify" verb for validating udev rules files
      offline.

    * udev gained a new tool "iocost" that can be used to configure QoS IO
      cost data based on hwdb information onto suitable block devices. Also
      see https://github.com/iocost-benchmark/iocost-benchmarks.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll/systemd-cryptsetup will now install a TPM2 SRK
      ("Storage Root Key") as first step in the TPM2, and then use that
      for binding FDE to, if TPM2 support is used. This matches
      recommendations of TCG (see
      https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf).

    * systemd-cryptenroll and other tools that take TPM2 PCR parameters now
      understand textual identifiers for these PCRs.

    * systemd-veritysetup + /etc/veritytab gained support for a series of
      new options: hash-offset=, superblock=, format=, data-block-size=,
      hash-block-size=, data-blocks=, salt=, uuid=, hash=, fec-device=,
      fec-offset=, fec-roots= to configure various aspects of a Verity
      volume.

    * systemd-cryptsetup + /etc/crypttab gained support for a new
      veracrypt-pim= option for setting the Personal Iteration Multiplier
      of veracrypt volumes.

    * systemd-integritysetup + /etc/integritytab gained support for a new
      mode= setting for controlling the dm-integrity mode (journal, bitmap,
      direct) for the volume.

    * systemd-analyze gained a new verb "pcrs" that shows the known TPM PCR
      registers, their symbolic names and current values.

systemd-tmpfiles:

    * The ACL support in tmpfiles.d/ has been updated: if an uppercase "X"
      access right is specified this is equivalent to "x" but only if the
      inode in question already has the executable bit set for at least
      some user/group. Otherwise the "x" bit will be turned off.
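
      The rule can be sketched as follows. The helper name is
      hypothetical, permissions are modelled as a plain "rwx"-style
      string, and "some user/group" is interpreted here as any of the
      owner, group or other execute bits:

```python
import stat


def effective_acl_perms(requested, inode_mode):
    """Resolve a permission string such as 'rwX' against an inode's mode bits."""
    any_exec = bool(inode_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    resolved = ""
    for flag in requested:
        if flag == "X":
            # Uppercase X grants execute only if the inode is already executable.
            resolved += "x" if any_exec else "-"
        else:
            resolved += flag
    return resolved


on_executable = effective_acl_perms("rwX", 0o755)
on_regular = effective_acl_perms("rwX", 0o644)
```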

    * tmpfiles.d/'s C line type now understands a new modifier "+": a line
      with C+ will result in a "merge" copy, i.e. all files of the source
      tree are copied into the target tree, even if that tree already
      exists, resulting in a combined tree of files already present in the
      target tree and those copied in.

    * systemd-tmpfiles gained a new --graceful switch. If specified lines
      with unknown users/groups will silently be skipped.

systemd-notify:

    * systemd-notify gained two new options --fd= and --fdname= for sending
      arbitrary file descriptors to the service manager (while specifying an
      explicit name for it).

    * systemd-notify gained a new --exec switch, which makes it execute the
      specified command line after sending the requested messages. This is
      useful for sending out READY=1 first, and then continuing invocation
      without changing process ID, so that the tool can be nicely used
      within an ExecStart= line of a unit file that uses Type=notify.
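
      The same "notify first, then exec" pattern can be sketched directly
      against the notification socket. This is a simplified illustration
      of the sd_notify() datagram protocol on Linux, not the
      systemd-notify implementation, and the function names are
      hypothetical:

```python
import os
import socket


def sd_notify(message, socket_path=None):
    """Send one notification datagram; return False if no socket is configured."""
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a socket in the Linux abstract namespace.
    address = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def notify_then_exec(argv):
    """Report readiness, then replace this process, keeping the same PID."""
    sd_notify("READY=1")
    os.execvp(argv[0], argv)
```

      Because the exec happens in the notifying process itself, the
      service manager keeps seeing the same main PID.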

sd-event + sd-bus APIs:

    * The sd-event API gained a new call sd_event_source_leave_ratelimit()
      which may be used to explicitly end a rate-limit state an event
      source might be in, resetting all rate limiting counters.

    * When the sd-bus library is used to make connections to AF_UNIX D-Bus
      sockets, it will now encode the "description" set via
      sd_bus_set_description() into the source socket address. It will also
      look for this information when accepting a connection. This is useful
      to track individual D-Bus connections on a D-Bus broker for debug
      purposes.

systemd-resolved:

    * systemd-resolved gained a new resolved.conf setting
      StateRetentionSec= which may be used to retain cached DNS records
      even after their nominal TTL, and use them in case upstream DNS
      servers cannot be reached. This can be used to make name resolution
      more resilient in case of network problems.

    * resolvectl gained a new verb "show-cache" to show the current cache
      contents of systemd-resolved. This verb communicates with the
      systemd-resolved daemon and requires privileges.

Other:

    * Meson >= 0.60.0 is now required to build systemd.

    * The default keymap to apply may now be chosen at build-time via the
      new -Ddefault-keymap= meson option.

    * Most of systemd's long-running services now have a generic handler
      for the SIGRTMIN+18 signal which executes various operations
      depending on the sigqueue() parameter sent along. For example, values
      0x100…0x107 allow changing the maximum log level of such
      services. 0x200…0x203 allow changing the log target of such
      services. 0x300 makes the services trim their memory similarly to the
      automatic PSI-triggered action, see above. 0x301 makes the services
      output their malloc_info() data to the logs.
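
      The value ranges above can be summarized in a small decoder. The
      action labels and the idea of returning the offset within each
      range as an index are illustrative assumptions, and
      decode_rtmin18_value() is a hypothetical name:

```python
def decode_rtmin18_value(value):
    """Map a sigqueue() value sent with SIGRTMIN+18 to (action, index)."""
    if 0x100 <= value <= 0x107:
        return ("set-max-log-level", value - 0x100)
    if 0x200 <= value <= 0x203:
        return ("set-log-target", value - 0x200)
    if value == 0x300:
        return ("trim-memory", None)
    if value == 0x301:
        return ("log-malloc-info", None)
    # Values outside the documented ranges are not interpreted here.
    return ("unknown", None)


decoded = [decode_rtmin18_value(v) for v in (0x107, 0x201, 0x300, 0x301)]
```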

    * machinectl gained new "edit" and "cat" verbs for editing .nspawn
      files, inspired by systemctl's verbs of the same name which edit unit
      files. Similarly, networkctl gained the same verbs for editing
      .network, .netdev, .link files.

    * A new syscall filter group "@sandbox" has been added that contains
      system calls used for sandboxing, such as those for seccomp and
      Landlock.

    * New documentation has been added:

      https://systemd.io/COREDUMP
      https://systemd.io/MEMORY_PRESSURE
      smbios-type-11(7)

    * systemd-firstboot gained a new --reset option. If specified, the
      settings in /etc/ it knows how to initialize are reset.

    * systemd-sysext is now a multi-call binary and is also installed under
      the systemd-confext alias name (via a symlink). When invoked that way
      it will operate on /etc/ instead of /usr/ + /opt/. It thus becomes a
      powerful, atomic, secure configuration management tool of sorts, that
      locally can merge configuration from multiple confext configuration
      images into a single immutable tree.

    * The --network-macvlan=, --network-ipvlan=, --network-interface=
      switches of systemd-nspawn may now optionally take the intended
      network interface inside the container.

    * All our programs will now send an sd_notify() message with their exit
      status in the EXIT_STATUS= field when exiting, using the usual
      protocol, including PID 1. This is useful for VMMs and container
      managers to collect an exit status from a system as it shuts down, as
      set via "systemctl exit …". This is particularly useful in test cases
      and similar, as invocations via a VM can now nicely propagate an exit
      status to the host, similar to local processes.

    * systemd-run gained a new switch --expand-environment=no to disable
      server-side environment variable expansion in specified command
      lines. Expansion defaults to enabled for all execution types except
      --scope, where it defaults to off (and prints a warning) for backward
      compatibility reasons. --scope will be flipped to default enabled too
      in a future release, so if you are using --scope and passing a '$'
      character in the payload you should start explicitly using
      --expand-environment=yes/no according to the use case.

    * The systemd-system-update-generator has been updated to also look for
      the special flag file /etc/system-update in addition to the existing
      support for /system-update to decide whether to enter system update
      mode.

    * The /dev/hugepages/ file system is now mounted with nosuid + nodev
      mount options by default.

    * systemd-fstab-generator now understands two new kernel command line
      options systemd.mount-extra= and systemd.swap-extra=, which configure
      additional mounts or swaps in a format similar to /etc/fstab. It also
      now supports the new fstab.extra and fstab.extra.initrd credentials
      that may contain additional /etc/fstab lines to apply at boot.

    * systemd-getty-generator now understands two new credentials
      getty.ttys.container and getty.ttys.serial. These credentials may
      contain a list of TTY devices – one per line – to instantiate
      container-getty@.service and serial-getty@.service on.

    * systemd-sysupdate's sysupdate.d/ drop-ins gained a new setting
      PathRelativeTo=, which can be set to "esp", "xbootldr", "boot", in
      which case the Path= setting is taken relative to the ESP or XBOOTLDR
      partitions, rather than the system's root directory /. The relevant
      directories are automatically discovered.

    * The systemd-ac-power tool gained a new switch --low, which reports
      whether the battery charge is considered "low", similar to how the
      s2h suspend logic checks this state to decide whether to enter system
      suspend or hibernation.

    * The /etc/os-release file can now have two new optional fields
      VENDOR_NAME= and VENDOR_URL= to carry information about the vendor of
      the OS.
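
      A minimal sketch of how a tool might read these fields, assuming the
      shell-like quoting rules of os-release. The parser is illustrative
      only, and the vendor values in the usage note are made up.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=VALUE lines, honouring shell-like quoting."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)
        fields[key] = parts[0] if parts else ""
    return fields
```

      For example, parse_os_release('VENDOR_NAME="Example Corp"') yields a
      dictionary that maps VENDOR_NAME to Example Corp.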

    * When the system hibernates, information about the device and offset
      used is now written to a non-volatile EFI variable. On next boot the
      system will attempt to resume from the location indicated in this EFI
      variable. This should make hibernation a lot more robust, while
      requiring no manual configuration of the resume location.

    * The $XDG_STATE_HOME environment variable (added in more recent
      versions of the XDG basedir specification) is now honoured to
      implement the StateDirectory= setting in user services.

    * A new component "systemd-battery-check" has been added. It may run
      during early boot (usually in the initrd), and checks the battery
      charge level of the system. In case the charge level is very low the
      user is notified (graphically via Plymouth – if available – as well
      as in text form on the console), and the system is turned off after a
      10s delay. The feature can be disabled by passing
      systemd.battery-check=0 through the kernel command line.

    * The 'passwdqc' library is now supported as an alternative to the
      'pwquality' library and it can be selected at build time.

Contributors

    Contributions from: 김인수, 07416, Addison Snelling, Adrian Vovk,
    Aidan Dang, Alexander Krabler, Alfred Klomp, Anatoli Babenia,
    Andrei Stepanov, Andrew Baxter, Antonio Alvarez Feijoo,
    Arian van Putten, Arthur Shau, A S Alam,
    Asier Sarasua Garmendia, Balló György, Bastien Nocera,
    Benjamin Herrenschmidt, Benjamin Raison, Bill Peterson,
    Brad Fitzpatrick, Brett Holman, bri, Chen Qi, Chitoku,
    Christian Hesse, Christoph Anton Mitterer, Christopher Gurnee,
    Colin Walters, Cornelius Hoffmann, Cristian Rodríguez, cunshunxia,
    cvlc12, Cyril Roelandt, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Dan Streetman, David Edmundson,
    David Schroeder, David Tardon, dependabot[bot],
    Dimitri John Ledkov, Dmitrii Fomchenkov, Dmitry V. Levin, dmkUK,
    Dominique Martinet, don bright, drosdeck, Edson Juliano Drosdeck,
    Egor Ignatov, EinBaum, Emanuele Giuseppe Esposito, Eric Curtin,
    Evgeny Vereshchagin, Florian Klink, Franck Bui, François Rigault,
    Fran Diéguez, Franklin Yu, Frantisek Sumsal, Fuminobu TAKEYAMA,
    Gaël PORTAY, Gerd Hoffmann, Gertalitec, Gibeom Gwon,
    Gustavo Noronha Silva, Hannu Lounento, Hans de Goede,
    Haochen Tong, HATAYAMA Daisuke, Henrik Holst, Hoe Hao Cheng,
    Igor Tsiglyar, Ivan Vecera, James Hilliard, Jan Engelhardt,
    Jan Janssen, Jan Luebbe, Jan Macku, Janne Sirén, jcg, Jeidnx,
    Joan Bruguera, Joerg Behrmann, jonathanmetzman, Jordan Rome,
    Josef Miegl, Joshua Goins, Joyce, Joyce Brum, Juno Computers,
    Kai Lueke, Kevin P. Fleming, Kiran Vemula, Klaus, Klaus Zipfel,
    Lawrence Thorpe, Lennart Poettering, licunlong, Lily Foster,
    Luca Boccassi, Ludwig Nussel, Luna Jernberg, maanyagoenka,
    Maanya Goenka, Maksim Kliazovich, Malte Poll, Marko Korhonen,
    Masatake YAMATO, Mateusz Poliwczak, Matt Johnston, Miao Wang,
    Micah Abbott, Michal Koutný, Michal Sekletár, Mike Yuan, mooo,
    Morten Linderud, msizanoen, Nick Rosbrook, nikstur, Olivier Gayot,
    Omojola Joshua, Paolo Velati, Paul Barker, Pavel Borecki,
    Philipp Kern, Philip Withnall, Piotr Drąg, Quintin Hill,
    Rene Hollander, Richard Phibel, Robert Meijers, Robert Scheck,
    Roger Gammans, Romain Geissler, Ronan Pigott, Russell Harmon,
    saikat0511, Samanta Navarro, Sam James, Sam Morris,
    Simon Braunschmidt, Sjoerd Simons, Sorah Fukumori,
    Stanislaw Gruszka, Stefan Roesch, Steven Luo, Steve Ramage,
    Susant Sahani, taniishkaaa, Tanishka, Temuri Doghonadze,
    Thierry Martin, Thomas Blume, Thomas Genty, Thomas Weißschuh,
    Thorsten Kukuk, Times-Z, Tobias Powalowski, tofylion,
    Topi Miettinen, Uwe Kleine-König, Velislav Ivanov,
    Vitaly Kuznetsov, Vít Zikmund, Weblate, Will Fancher,
    William Roberts, Winterhuman, Wolfgang Müller, Xeonacid,
    Xiaotian Wu, Xi Ruoyao, Yuri Chornoivan, Yu Watanabe, Yuxiang Zhu,
    Zbigniew Jędrzejewski-Szmek, zhmylove, ZjYwMj,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-07-24

v254-rc2

9 months ago

systemd System and Service Manager

CHANGES WITH 254 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * The next release (v255) will remove support for split-usr (/usr/
      mounted separately during late boot, instead of being mounted by the
      initrd before switching to the rootfs) and unmerged-usr (parallel
      directories /bin/ and /usr/bin/, /lib/ and /usr/lib/, …). For more
      details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * EnvironmentFile= now treats a line that follows a comment line ending
      in a backslash as a regular, non-comment line. For details, see:
      https://github.com/systemd/systemd/issues/27975

    * Behaviour of sandboxing options for the per-user service manager
      units has changed. They now imply PrivateUsers=yes, which means user
      namespaces will be implicitly enabled when a sandboxing option is
      enabled in a user unit. Enabling user namespaces has the drawback
      that system users will no longer be visible (and processes/files will
      appear as owned by 'nobody') in the user unit.

      By definition a sandboxed user unit should run with reduced
      privileges, so impact should be small. This will remove a great
      source of confusion that has been reported by users over the years,
      due to how these options require an extra setting to be manually
      enabled when used in the per-user service manager, which is not
      needed in the system service manager. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html

Security Relevant Changes:

    * pam_systemd will now by default pass the CAP_WAKE_ALARM ambient
      process capability to invoked session processes of regular users on
      local seats (as well as to systemd --user), unless configured
      otherwise via data from JSON user records, or via the PAM module's
      parameter list. This is useful in order to allow desktop tools such as
      GNOME's Alarm Clock application to set a timer for
      CLOCK_REALTIME_ALARM that wakes up the system when it elapses. A
      per-user service unit file may thus use AmbientCapabilities= to pass
      the capability to invoked processes. Note that this capability is
      relatively narrow in focus (in particular compared to other process
      capabilities such as CAP_SYS_ADMIN) and we already — by default —
      permit more impactful operations such as system suspend to local
      users.

Service Manager:

    * "Startup" memory settings are now supported. Previously IO and CPU
      settings were already supported via StartupCPUWeight= and similar.
      The same logic has been added for the various per-unit memory
      settings StartupMemoryMax= and related.

    * The service manager gained support for enqueuing POSIX signals to
      services that carry an additional integer value, exposing the
      sigqueue() system call. This is accessible via new D-Bus calls
      org.freedesktop.systemd1.Manager.QueueSignalUnit() and
      org.freedesktop.systemd1.Unit.QueueSignal(), as well as in systemctl
      via the new --kill-value= option.

    * systemctl gained a new "list-paths" verb, which shows all currently
      active .path units, similarly to how "systemctl list-timers" shows
      active timers, and "systemctl list-sockets" shows active sockets.

    * systemctl gained a new --when= switch which is honoured by the various
      forms of shutdown (i.e. reboot, kexec, poweroff, halt) and allows
      scheduling these operations by time, similar in fashion to how this
      has been supported by SysV shutdown.

    * If MemoryDenyWriteExecute= is enabled for a service and the kernel
      supports the new PR_SET_MDWE prctl() call, it is used instead of the
      seccomp()-based system call filter to achieve the same effect.

    * A new set of kernel command line options is now understood:
      systemd.tty.term.<name>=, systemd.tty.rows.<name>=,
      systemd.tty.columns.<name>= allow configuring the TTY type and
      dimensions for the tty specified via <name>. When systemd invokes a
      service on a tty (via TTYName=) it will look for these and configure
      the TTY accordingly. This is particularly useful in VM environments
      to propagate host terminal settings into the appropriate TTYs of the
      guest.

    * A new RootEphemeral= setting is now understood in service units. It
      takes a boolean argument. If enabled for services that use RootImage=
      or RootDirectory= an ephemeral copy of the disk image or directory
      tree is made when the service is started. It is removed automatically
      when the service is stopped. That ephemeral copy is made using
      btrfs/xfs reflinks or btrfs snapshots, if available.

    * The service activation logic gained new settings RestartSteps= and
      RestartMaxDelaySec= which allow exponentially-growing restart
      intervals for Restart=.
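
      To give a feel for what "exponentially-growing" means here, the
      sketch below computes one plausible schedule: the delay grows
      geometrically from RestartSec= to RestartMaxDelaySec= over
      RestartSteps= restarts and then stays at the cap. This is an
      assumption made for illustration; the exact formula used by the
      service manager may differ.

```python
from typing import List


def restart_delays(restart_sec: float, max_delay_sec: float,
                   steps: int, attempts: int) -> List[float]:
    """Illustrative schedule: geometric growth from restart_sec up to
    max_delay_sec over `steps` restarts, constant afterwards."""
    if steps <= 0 or max_delay_sec <= restart_sec:
        # Nothing to grow towards: every restart waits the base interval.
        return [restart_sec] * attempts
    ratio = max_delay_sec / restart_sec
    delays = []
    for n in range(attempts):
        if n >= steps:
            delays.append(max_delay_sec)
        else:
            delays.append(restart_sec * ratio ** (n / steps))
    return delays
```

      With a base of 1s, a cap of 16s and 4 steps, this model yields delays
      of roughly 1, 2, 4, 8, 16 and 16 seconds for the first six restarts.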

    * The service activation logic gained a new setting RestartMode= which
      can be set to 'direct' to skip the inactive/failed states when
      restarting, so that dependent units are not notified until the service
      converges to a final (successful or failed) state. For example, this
      means that OnSuccess=/OnFailure= units will not be triggered until the
      service state has converged.

    * PID 1 will now automatically load the virtio_console kernel module
      during early initialization if running in a suitable VM. This is done
      so that early-boot logging can be written to the console if available.

    * Similarly, virtio-vsock support is loaded early in suitable VM
      environments. PID 1 will send sd_notify() notifications via AF_VSOCK
      to the VMM if configured, thus loading this early is beneficial.

    * A new verb "fdstore" has been added to systemd-analyze to show the
      current contents of the file descriptor store of a unit. This is
      backed by a new D-Bus call DumpUnitFileDescriptorStore() provided by
      the service manager.

    * The service manager will now set a new $FDSTORE environment variable
      when invoking processes for services that have the file descriptor
      store enabled.

    * A new service option FileDescriptorStorePreserve= has been added that
      allows tuning the life-cycle of the per-service file descriptor
      store. If set to "yes", the entries in the fd store are retained even
      after the service has been fully stopped.

    * The "systemctl clean" command may now be used to clear the fdstore of
      a service.

    * Unit *.preset files gained a new directive "ignore", in addition to
      the existing "enable" and "disable". As the name suggests, matching
      units are left unchanged, i.e. neither enabled nor disabled.
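
      The following sketch models how a preset policy with the new
      directive could be evaluated. It assumes the first-match-wins
      semantics and the enable-by-default fallback described in
      systemd.preset(5). The function names and the unit names in the
      usage note are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

ACTIONS = ("enable", "disable", "ignore")


def parse_preset(text: str) -> List[Tuple[str, str]]:
    """Collect (action, pattern) pairs; comments and unknown lines are skipped."""
    rules = []
    for line in text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] in ACTIONS:
            rules.append((words[0], words[1]))
    return rules


def preset_action(unit: str, rules: List[Tuple[str, str]]) -> str:
    """Return the action of the first rule whose glob matches the unit name."""
    for action, pattern in rules:
        if fnmatchcase(unit, pattern):
            return action
    return "enable"  # assumed default when no directive matches
```

      With the rules "ignore vendor-*.service" followed by "disable *", a
      unit called vendor-agent.service is left alone while every other
      unit is disabled.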

    * Service units gained a new setting DelegateSubgroup=. It takes the
      name of a sub-cgroup to place any processes the service manager forks
      off in. Previously, the service manager would place all service
      processes directly in the top-level cgroup it created for the
      service. This usually meant that the main process in a service with
      delegation enabled would first have to create a subgroup and move
      itself down into it, in order to not conflict with the "no processes
      in inner cgroups" rule of cgroup v2. With this option, this step is
      now handled by PID 1.

    * The service manager will now look for .upholds/ directories,
      similarly to the existing support for .wants/ and .requires/
      directories. Symlinks in this directory result in Upholds=
      dependencies.

      The [Install] section of unit files gained support for a new
      UpheldBy= directive to generate .upholds/ symlinks automatically when
      a unit is enabled.

    * The service manager now supports a new kernel command line option
      systemd.default_device_timeout_sec=, which may be used to override
      the default timeout for .device units.

    * A new "soft-reboot" mechanism has been added to the service manager.
      A "soft reboot" is similar to a regular reboot, except that it
      affects userspace only: the service manager shuts down any running
      services and other units, then optionally switches into a new root
      file system (mounted to /run/nextroot/), and then passes control to a
      systemd instance in the new file system which then starts the system
      up again. The kernel is not rebooted and neither is the hardware,
      firmware or boot loader. This provides a fast, lightweight mechanism
      to quickly reset or update userspace, without the latency that a full
      system reset involves. Moreover, open file descriptors may be passed
      across the soft reboot into the new system where they will be passed
      back to the originating services. This allows pinning resources
      across the reboot, thus minimizing grey-out time further. Moreover,
      it is possible to allow specific crucial services to survive the
      reboot process, if they run off a separate root file system (i.e. use
      RootDirectory= or RootImage=, or are portable services). This new
      reboot mechanism is accessible via the new "systemctl soft-reboot"
      command.

    * A new service setting MemoryKSM= has been added to enable kernel
      same-page merging individually for services.

    * A new service setting ImportCredentials= has been added that augments
      LoadCredential= and LoadCredentialEncrypted= and searches for
      credentials to import from the system, and supports globbing.

    * A new job mode "restart-dependencies" has been added to the service
      manager (exposed via systemctl --job-mode=). It is only valid when
      used with "start" jobs, and has the effect that the "start" job will
      be propagated as "restart" jobs to currently running units that have
      a BindsTo= or Requires= dependency on the started unit.

    * A new verb "whoami" has been added to "systemctl" which determines as
      part of which unit the command is being invoked. It writes the unit
      name to standard output. If one or more PIDs are specified, it reports
      the names of the units the processes referenced by the PIDs belong to.

    * The system and service credential logic has been improved: there's
      now a clearly defined place where system provisioning tools running
      in the initrd can place credentials that will be imported into the
      system's set of credentials during the initrd → host transition: the
      /run/credentials/@initrd/ directory. Once the credentials placed
      there are imported into the system credential set they are deleted
      from this directory, and the directory itself is deleted afterwards
      too.

    * A new kernel command line option systemd.set_credential_binary= has
      been added, that is similar to the pre-existing
      systemd.set_credential= but accepts arbitrary binary credential data,
      encoded in Base64. Note that the kernel command line is not a
      recommended way to transfer credentials into a system, since it is
      world-readable from userspace.
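
      A small sketch of how a provisioning script might build such an
      argument. It assumes the same NAME:VALUE layout that
      systemd.set_credential= uses, with the value Base64-encoded; the
      helper and the credential name in the usage note are hypothetical.

```python
import base64


def credential_cmdline_arg(name: str, data: bytes) -> str:
    """Build a kernel command line argument carrying binary credential data."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"systemd.set_credential_binary={name}:{encoded}"
```

      For instance, credential_cmdline_arg("mycred", b"\x00\x01hello")
      produces an argument whose value decodes back to the original bytes.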

    * The default machine ID to use may now be configured via the
      system.machine_id system credential. It will only be used if no
      machine ID was set yet on the host.

    * On Linux kernel 6.4 and newer system and service credentials will now
      be placed in a tmpfs instance that has the "noswap" mount option
      set. Previously, a "ramfs" instance was used. By switching to tmpfs
      ACL support and overall size limits can now be enforced, without
      compromising on security, as the memory is never paged out either
      way.

    * The service manager now can detect when it is running in a
      'Confidential Virtual Machine', and a corresponding 'cvm' value is now
      accepted by ConditionSecurity= for units that want to conditionalize
      themselves on this. systemd-detect-virt gained new '--cvm' and
      '--list-cvm' switches to respectively perform the detection or list
      all known flavours of confidential VM, depending on the vendor. The
      manager will publish a 'ConfidentialVirtualization' D-Bus property,
      and will also set a SYSTEMD_CONFIDENTIAL_VIRTUALIZATION= environment
      variable for unit generators. Finally, udev rules can match on a new
      'cvm' key that will be set when in a confidential VM.
      Additionally, when running in a 'Confidential Virtual Machine', SMBIOS
      strings and QEMU's fw_cfg protocol will not be used to import
      credentials and kernel command line parameters by the system manager,
      systemd-boot and systemd-stub, because the hypervisor is considered
      untrusted in this particular setting.

Journal:

    * The sd-journal API gained a new call sd_journal_get_seqnum() to
      retrieve the current log record's sequence number and sequence number
      ID, which allows applications to order records the same way as
      journal does internally. The sequence number is now also exported in
      the JSON and "export" output of the journal.

    * journalctl gained a new switch --truncate-newline. If specified
      multi-line log records will be truncated at the first newline,
      i.e. only the first line of each log message will be shown.

    * systemd-journal-upload gained support for --namespace=, similar to
      the switch of the same name of journalctl.

systemd-repart:

    * systemd-repart's drop-in files gained a new ExcludeFiles= option which
      may be used to exclude certain files from the effect of CopyFiles=.

    * systemd-repart's Verity support now implements the Minimize= setting
      to minimize the size of the resulting partition.

    * systemd-repart gained a new --offline= switch, which may be used to
      control whether images shall be built "online" or "offline",
      i.e. whether to make use of kernel facilities such as loopback block
      devices and device mapper or not.

    * If systemd-repart is told to populate a newly created ESP or XBOOTLDR
      partition with some files, it will now default to VFAT rather than
      ext4.

    * systemd-repart gained a new --architecture= switch. If specified, the
      per-architecture GPT partition types (i.e. the root and /usr/
      partitions) configured in the partition drop-in files are
      automatically adjusted to match the specified CPU architecture, in
      order to simplify cross-architecture DDI building.

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * gnu-efi is no longer required to build systemd-boot and systemd-stub.
      Instead, pyelftools is now needed, and it will be used to perform the
      ELF -> PE relocations at build time.

    * bootctl gained a new switch --print-root-device/-R that prints the
      block device the root file system is backed by. If specified twice,
      it returns the whole disk block device (as opposed to partition block
      device) the root file system is on. It's useful for invocations such
      as "cfdisk $(bootctl -RR)" to quickly show the partition table of the
      running OS.

    * systemd-stub will now look for the SMBIOS Type 1 field
      "io.systemd.stub.kernel-cmdline-extra" and append its value to the
      kernel command line it invokes. This is useful for VMMs such as qemu
      to pass additional kernel command lines into the system even when
      booting via full UEFI. The contents of the field are measured into
      TPM PCR 12.

    * The KERNEL_INSTALL_LAYOUT= setting for kernel-install gained a new
      value "auto". With this value, a kernel will be automatically
      analyzed, and if it qualifies as UKI, it will be installed as if the
      setting was set to "uki", otherwise as "bls".

    * systemd-stub can now optionally load UEFI PE "add-on" images that may
      contain additional kernel command line information. These "add-ons"
      superficially look like a regular UEFI executable, and are expected
      to be signed via SecureBoot/shim. However, they do not actually
      contain code, but instead a subset of the PE sections that UKIs
      support. They are supposed to provide a way to extend UKIs with
      additional resources in a secure and authenticated way. Currently,
      only the .cmdline PE section may be used in add-ons, in which case
      any specified string is appended to the command line embedded into
      the UKI itself. A new 'addon<EFI-ARCH>.efi.stub' is now provided that
      can be used to trivially create addons, via 'ukify' or 'objcopy'. In
      the future we expect other sections to be made extensible like this as
      well.

    * ukify has been updated to allow building these UEFI PE "add-on"
      images, using the new 'addon<EFI-ARCH>.efi.stub'.

    * ukify gained a new "genkey" verb for generating a set of key pairs
      to sign UKIs and their PCR data with.

    * ukify now accepts SBAT information to place in the .sbat PE section
      of UKIs and addons. If a UKI is built the SBAT information from the
      inner kernel is merged with any SBAT information associated with
      systemd-stub and the SBAT data specified on the ukify command line.

    * The kernel-install script has been rewritten in C, and reuses much of
      the infrastructure of existing tools such as bootctl. It also gained
      --esp-path= and --boot-path= options to override the path to the ESP,
      and the $BOOT partition. Options --make-entry-directory= and
      --entry-token= have been added as well, similar to bootctl's options
      of the same name.

    * A new kernel-install plugin 60-ukify has been added which will
      combine kernel/initrd locally into a UKI and optionally sign them
      with a local key. This may be used to switch to UKI mode even on
      systems where a local kernel or initrd is used. (Typically UKIs are
      built and signed by the vendor.)

    * The ukify tool now supports "pesign" in addition to the pre-existing
      "sbsign" for signing UKIs.

    * systemd-measure and systemd-stub now look for the .uname PE section
      that should contain the kernel's "uname -r" string.

    * systemd-measure and ukify now calculate expected PCR hashes for a UKI
      "offline", i.e. without access to a TPM (physical or
      software-emulated).
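
      The reason this works without a TPM is that a PCR value is just an
      iterated hash. The sketch below shows the generic TPM "extend"
      operation in software. Which UKI components are measured, and in
      what order, is not covered here, so the list of measured blobs is a
      placeholder input.

```python
import hashlib
from typing import Iterable


def pcr_extend(pcr: bytes, data: bytes, alg: str = "sha256") -> bytes:
    """Extend a PCR: new value = H(old value || H(data))."""
    digest = hashlib.new(alg, data).digest()
    return hashlib.new(alg, pcr + digest).digest()


def predict_pcr(measurements: Iterable[bytes], alg: str = "sha256") -> bytes:
    """Replay a sequence of measurements starting from an all-zero PCR."""
    pcr = bytes(hashlib.new(alg).digest_size)
    for blob in measurements:
        pcr = pcr_extend(pcr, blob, alg)
    return pcr
```

      Because each step hashes the previous value, the result depends on
      the order of the measurements as well as on their contents.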

Memory Pressure & Control:

    * The sd-event API gained new calls sd_event_add_memory_pressure(),
      sd_event_source_set_memory_pressure_type(),
      sd_event_source_set_memory_pressure_period() to create and configure
      an event source that is called whenever the OS signals memory
      pressure. Another call sd_event_trim_memory() is provided that
      compacts the process' memory use by releasing allocated but unused
      malloc() memory back to the kernel. Services can also provide their
      own custom callback to do memory trimming. This should improve system
      behaviour under memory pressure, as Linux traditionally provided
      no mechanism to return process memory back to the kernel if the
      kernel was under memory pressure. This makes use of the kernel's PSI
      interface. Most long-running services in systemd have been hooked up
      with this, and in particular systems with low memory should benefit
      from this.

    * Service units gained new settings MemoryPressureWatch= and
      MemoryPressureThresholdSec= to configure the PSI memory pressure
      logic individually. If these options are used, the
      $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE environment
      variables will be set for the invoked processes to inform them about
      the requested memory pressure behaviour. (This is used by the
      aforementioned sd-event API additions, if set.)
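
      A service that does not use sd-event could interpret the two
      variables itself. The sketch below follows systemd's memory pressure
      documentation as we understand it: $MEMORY_PRESSURE_WATCH names the
      file to watch, with /dev/null meaning that handling is disabled, and
      $MEMORY_PRESSURE_WRITE carries Base64-encoded data to write into
      that file before polling it. Treat these details as assumptions; the
      helper name is hypothetical.

```python
import base64
import os
from typing import Mapping, Optional, Tuple


def memory_pressure_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, bytes]]:
    """Return (path to watch, bytes to write first), or None if disabled."""
    path = env.get("MEMORY_PRESSURE_WATCH")
    if not path or path == "/dev/null":
        return None
    raw = env.get("MEMORY_PRESSURE_WRITE", "")
    return path, base64.b64decode(raw)
```

      The caller would then open the returned path, write the decoded
      bytes to it, and poll the file descriptor for pressure events.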

    * systemd-analyze gained a new "malloc" verb that shows the output
      generated by glibc's malloc_info() on services that support it. Right
      now, only the service manager has been updated accordingly. This
      call requires privileges.

User & Session Management:

    * The sd-login API gained a new call sd_session_get_username() to
      return the user name of the owner of a login session. It also gained
      a new call sd_session_get_start_time() to retrieve the time the login
      session started. A new call sd_session_get_leader() has been added to
      return the PID of the "leader" process of a session. A new call
      sd_uid_get_login_time() returns the time since the specified user has
      most recently been continuously logged in with at least one session.

    * JSON user records gained a new set of fields capabilityAmbientSet and
      capabilityBoundingSet which contain a list of POSIX capabilities to
      set for the logged in users in the ambient and bounding sets,
      respectively. homectl gained the ability to configure these two sets
      for users via --capability-bounding-set=/--capability-ambient-set=.

    * pam_systemd learnt two new module options
      default-capability-bounding-set= and default-capability-ambient-set=,
      which configure the default bounding and ambient sets for users as they are
      logging in, if the JSON user record doesn't specify this explicitly
      (see above). The built-in default for the ambient set now contains
      the CAP_WAKE_ALARM, thus allowing regular users who may log in
      locally to resume from a system suspend via a timer.

    * The Session D-Bus objects of systemd-logind gained a new SetTTY() method
      call to update the TTY of a session after it has been allocated. This
      is useful for SSH sessions which are typically allocated first, and
      for which a TTY is added later.

    * The sd-daemon API gained a new call sd_pid_notifyf_with_fds() which
      combines the various other sd_pid_notify() flavours into one: it takes
      a format string, an overriding PID, and a set of file descriptors to
      send. It also gained a new call sd_pid_notify_barrier() which is
      equivalent to sd_notify_barrier() but allows the originating PID to
      be specified.

    * "loginctl list-users" and "loginctl list-sessions" will now show the
      state of each logged in user/session in their tabular output. It will
      also show the current idle state of sessions.

DDIs:

    * systemd-dissect will now show the intended CPU architecture of an
      inspected DDI.

    * systemd-dissect will now install itself as mount helper for the "ddi"
      pseudo-file system type. This means you may now mount DDIs directly
      via /bin/mount or /etc/fstab, making full use of embedded Verity
      information and all other DDI features.

      Example: mount -t ddi myimage.raw /some/where

    * The systemd-dissect tool gained the new switches --attach/--detach to
      attach/detach a DDI to a loopback block device without mounting it.
      It will automatically derive the right sector size from the image
      and set up Verity and similar, but not mount the file systems in it.

    * When systemd-gpt-auto-generator or the DDI mounting logic mount an
      ESP or XBOOTLDR partition the MS_NOSYMFOLLOW mount option is now
      implied. Given that these file systems are typically untrusted, this
      should make mounting them automatically have less of a security
      impact.

    * All tools that parse DDIs (such as systemd-nspawn, systemd-dissect,
      systemd-tmpfiles, …) now understand a new switch --image-policy= which
      takes a string encoding image dissection policy. With this mechanism
      automatic discovery and use of specific partition types and the
      cryptographic requirements on the partitions (Verity, LUKS, …) can be
      restricted, permitting better control of the exposed attack surfaces
      when mounting disk images. systemd-gpt-auto-generator will honour such
      an image policy too, configurable via the systemd.image_policy= kernel
      command line option. Unit files gained the RootImagePolicy=,
      MountImagePolicy= and ExtensionImagePolicy= to configure the same for
      disk images a service runs off.

    * systemd-analyze gained a new verb "image-policy" to validate and
      parse image policy strings.

    * systemd-dissect gained support for a new --validate switch to
      superficially validate DDI structure, and check whether a specific
      image policy allows the DDI.

    * systemd-dissect gained support for a new --mtree-hash switch to
      optionally disable calculating mtree hashes, which can be slow on
      large images.

    * systemd-dissect --copy-to, --copy-from, --list and --mtree switches
      are now able to operate on directories too, in addition to images.

Network Management:

    * networkd's GENEVE support has gained a new .netdev option
      InheritInnerProtocol=.

    * The [Tunnel] section in .netdev files has gained a new setting
      IgnoreDontFragment for controlling the IPv4 "DF" flag of datagrams.

    * A new global IPv6PrivacyExtensions= setting has been added that
      selects the default value of the per-network setting of the same
      name.

    * The predictable network interface naming logic will now include
      SR-IOV-R "representor" information in network interface names.

    * The DHCPv4 + DHCPv6 + IPv6 RA logic in networkd gained support for
      the RFC8910 captive portal option.

Device Management:

    * udevadm gained the new "verify" verb for validating udev rules files
      offline.

    * udev will now create symlinks to loopback block devices in the
      /dev/loop/by-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new
      switch --loop-ref= for selecting the string. This means a loopback
      block device may now be allocated under a caller-chosen reference and
      can subsequently be referenced by that without first having to look
      up the block device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/loop/by-ref/ directory based on the .st_dev/st_ino fields of the
      inode attached to the loopback block device. This means that attaching
      a file to a loopback device will implicitly make a handle available to
      be found via that file's inode information.

    * udev gained a new tool "iocost" that can be used to configure QoS IO
      cost data based on hwdb information onto suitable block devices. Also
      see https://github.com/iocost-benchmark/iocost-benchmarks.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll/systemd-cryptsetup will now install a TPM2 SRK
      ("Storage Root Key") as first step in the TPM2, and then use that
      for binding FDE to, if TPM2 support is used. This matches
      recommendations of TCG (see
      https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf).

    * systemd-cryptenroll and other tools that take TPM2 PCR parameters now
      understand textual identifiers for these PCRs.

    * systemd-veritysetup + /etc/veritytab gained support for a series of
      new options: hash-offset=, superblock=, format=, data-block-size=,
      hash-block-size=, data-blocks=, salt=, uuid=, hash=, fec-device=,
      fec-offset=, fec-roots= to configure various aspects of a Verity
      volume.

    * systemd-cryptsetup + /etc/crypttab gained support for a new
      veracrypt-pim= option for setting the Personal Iteration Multiplier
      of veracrypt volumes.

    * systemd-integritysetup + /etc/integritytab gained support for a new
      mode= setting for controlling the dm-integrity mode (journal, bitmap,
      direct) for the volume.

    * systemd-analyze gained a new verb "pcrs" that shows the known TPM PCR
      registers, their symbolic names and current values.

systemd-tmpfiles:

    * The ACL support in tmpfiles.d/ has been updated: if an uppercase "X"
      access right is specified this is equivalent to "x" but only if the
      inode in question already has the executable bit set for at least
      some user/group. Otherwise the "x" bit will be turned off.

    * tmpfiles.d/'s C line type now understands a new modifier "+": a line
      with C+ will result in a "merge" copy, i.e. all files of the source
      tree are copied into the target tree, even if that tree already
      exists, resulting in a combined tree of files already present in the
      target tree and those copied in.

    * systemd-tmpfiles gained a new --graceful switch. If specified lines
      with unknown users/groups will silently be skipped.
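
    As a rough illustration of the uppercase "X" rule described above, the
    following Python sketch decides whether an ACL entry ends up granting
    execute access. The function name and its arguments are hypothetical
    and not part of systemd; the sketch models only the rule as stated
    here and ignores any special handling of directories.

```python
import stat


def acl_grants_execute(requested: str, inode_mode: int) -> bool:
    """Return True if an ACL permission string results in execute access.

    requested:  permission characters such as "rwx" or "rwX" (hypothetical
                input format for this sketch).
    inode_mode: the st_mode of the inode the ACL is applied to.
    """
    if "x" in requested:
        # A lowercase "x" always grants execute.
        return True
    if "X" in requested:
        # Uppercase "X" acts like "x" only if some execute bit is already set.
        any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        return bool(inode_mode & any_exec)
    return False
```

    For example, "rwX" applied to a file with mode 0755 keeps execute
    access, while the same entry applied to a file with mode 0644 does not.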

systemd-notify:

    * systemd-notify gained two new options --fd= and --fdname= for sending
      arbitrary file descriptors to the service manager (while specifying an
      explicit name for it).

    * systemd-notify gained a new --exec switch, which makes it execute the
      specified command line after sending the requested messages. This is
      useful for sending out READY=1 first, and then continuing invocation
      without changing process ID, so that the tool can be nicely used
      within an ExecStart= line of a unit file that uses Type=notify.
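
    The "notify first, then execute" pattern behind --exec can be sketched
    in Python as follows. This is not how systemd-notify is implemented; it
    only assumes the usual notification protocol, in which a datagram is
    sent to the AF_UNIX socket named in $NOTIFY_SOCKET (a leading "@"
    denoting a Linux abstract-namespace socket). The function names are
    hypothetical.

```python
import os
import socket


def notify(state: str, env=os.environ) -> bool:
    """Send one notification datagram to the socket named in $NOTIFY_SOCKET."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket was passed to this process
    address = path.encode()
    if address.startswith(b"@"):
        # A leading "@" stands for an abstract-namespace socket on Linux.
        address = b"\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True


def notify_ready_then_exec(argv):
    """Report readiness, then replace this process with the real command."""
    notify("READY=1")
    # exec keeps the process ID, so the payload runs under the same PID
    # that sent the readiness message.
    os.execvp(argv[0], argv)
```

    Because exec replaces the process image rather than forking, the
    service manager keeps tracking the same PID before and after the
    readiness message.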

sd-event + sd-bus APIs:

    * The sd-event API gained a new call sd_event_source_leave_ratelimit()
      which may be used to explicitly end a rate-limit state an event
      source might be in, resetting all rate limiting counters.

    * When the sd-bus library is used to make connections to AF_UNIX D-Bus
      sockets, it will now encode the "description" set via
      sd_bus_set_description() into the source socket address. It will also
      look for this information when accepting a connection. This is useful
      to track individual D-Bus connections on a D-Bus broker for debug
      purposes.

systemd-resolved:

    * systemd-resolved gained a new resolved.conf setting
      StaleRetentionSec= which may be used to retain cached DNS records
      even after their nominal TTL, and use them in case upstream DNS
      servers cannot be reached. This can be used to make name resolution
      more resilient in case of network problems.

    * resolvectl gained a new verb "show-cache" to show the current cache
      contents of systemd-resolved. This verb communicates with the
      systemd-resolved daemon and requires privileges.
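
    The cache retention behaviour described above can be modelled with a
    small Python sketch. The class, its method names and the
    retention_sec parameter are hypothetical and only illustrate the idea
    of serving expired records while upstream servers are unreachable;
    systemd-resolved's actual cache is more involved.

```python
import time


class StaleCache:
    """Toy DNS cache that can serve expired records during outages."""

    def __init__(self, retention_sec, clock=time.monotonic):
        self.retention_sec = retention_sec  # extra lifetime after the TTL
        self.clock = clock
        self.entries = {}  # name -> (answer, expiry timestamp)

    def store(self, name, answer, ttl):
        self.entries[name] = (answer, self.clock() + ttl)

    def lookup(self, name, upstream_reachable):
        entry = self.entries.get(name)
        if entry is None:
            return None
        answer, expiry = entry
        now = self.clock()
        if now < expiry:
            return answer  # record is still within its nominal TTL
        if not upstream_reachable and now < expiry + self.retention_sec:
            return answer  # stale, but usable while upstream is down
        return None  # expired: the caller has to ask upstream again
```

    With a retention of 60 seconds and a record TTL of 10 seconds, a
    lookup at second 30 returns nothing while upstream is reachable, but
    still returns the cached answer if upstream cannot be reached.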

Other:

    * Meson >= 0.60.0 is now required to build systemd.

    * The default keymap to apply may now be chosen at build-time via the
      new -Ddefault-keymap= meson option.

    * Most of systemd's long-running services now have a generic handler
      for the SIGRTMIN+18 signal, which executes various operations
      depending on the sigqueue() parameter sent along. For example, values
      0x100…0x107 allow changing the maximum log level of such
      services. 0x200…0x203 allow changing the log target of such
      services. 0x300 makes the services trim their memory similarly to the
      automatic PSI-triggered action, see above. 0x301 makes the services
      output their malloc_info() data to the logs.

    * machinectl gained new "edit" and "cat" verbs for editing .nspawn
      files, inspired by systemctl's verbs of the same name which edit unit
      files. Similarly, networkctl gained the same verbs for editing
      .network, .netdev, .link files.

    * A new syscall filter group "@sandbox" has been added that contains
      system calls used for sandboxing, such as those for seccomp and
      Landlock.

    * New documentation has been added:

      https://systemd.io/COREDUMP
      https://systemd.io/MEMORY_PRESSURE
      smbios-type-11(7)

    * systemd-firstboot gained a new --reset option. If specified, the
      settings in /etc/ it knows how to initialize are reset.

    * systemd-sysext is now a multi-call binary and is also installed under
      the systemd-confext alias name (via a symlink). When invoked that way
      it will operate on /etc/ instead of /usr/ + /opt/. It thus becomes a
      powerful, atomic, secure configuration management of sorts, that
      locally can merge configuration from multiple confext configuration
      images into a single immutable tree.

    * The --network-macvlan=, --network-ipvlan=, --network-interface=
      switches of systemd-nspawn may now optionally take the intended
      network interface inside the container.

    * All our programs will now send an sd_notify() message with their exit
      status in the EXIT_STATUS= field when exiting, using the usual
      protocol, including PID 1. This is useful for VMMs and container
      managers to collect an exit status from a system as it shuts down, as
      set via "systemctl exit …". This is particularly useful in test cases
      and similar, as invocations via a VM can now nicely propagate an exit
      status to the host, similar to local processes.

    * systemd-run gained a new switch --expand-environment=no to disable
      server-side environment variable expansion in specified command
      lines.

    * The systemd-system-update-generator has been updated to also look for
      the special flag file /etc/system-update in addition to the existing
      support for /system-update to decide whether to enter system update
      mode.

    * The /dev/hugepages/ file system is now mounted with nosuid + nodev
      mount options by default.

    * systemd-fstab-generator now understands two new kernel command line
      options systemd.mount-extra= and systemd.swap-extra=, which configure
      additional mounts or swaps in a format similar to /etc/fstab. It also
      now supports the new fstab.extra and fstab.extra.initrd credentials
      that may contain additional /etc/fstab lines to apply at boot.

    * systemd-getty-generator now understands two new credentials
      getty.ttys.container and getty.ttys.serial. These credentials may
      contain a list of TTY devices – one per line – to instantiate
      container-getty@.service and serial-getty@.service on.

    * systemd-sysupdate's sysupdate.d/ drop-ins gained a new setting
      PathRelativeTo=, which can be set to "esp", "xbootldr", "boot", in
      which case the Path= setting is taken relative to the ESP or XBOOTLDR
      partitions, rather than the system's root directory /. The relevant
      directories are automatically discovered.

    * The systemd-ac-power tool gained a new switch --low, which reports
      whether the battery charge is considered "low", similar to how the
      s2h suspend logic checks this state to decide whether to enter system
      suspend or hibernation.

    * The /etc/os-release file can now have two new optional fields
      VENDOR_NAME= and VENDOR_URL= to carry information about the vendor of
      the OS.

    * When the system hibernates, information about the device and offset
      used is now written to a non-volatile EFI variable. On next boot the
      system will attempt to resume from the location indicated in this EFI
      variable. This should make hibernation a lot more robust, while
      requiring no manual configuration of the resume location.

    * The $XDG_STATE_HOME environment variable (added in more recent
      versions of the XDG basedir specification) is now honoured to
      implement the StateDirectory= setting in user services.

    * A new component "systemd-battery-check" has been added. It may run
      during early boot (usually in the initrd), and checks the battery
      charge level of the system. In case the charge level is very low the
      user is notified (graphically via Plymouth – if available – as well
      as in text form on the console), and the system is turned off after a
      10s delay. The feature can be disabled by passing
      systemd.battery-check=0 through the kernel command line.

    * The 'passwdqc' library is now supported as an alternative to the
      'pwquality' library and it can be selected at build time.
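
    The value ranges listed in the SIGRTMIN+18 entry earlier in this
    section can be summarised as a small lookup. The Python function below
    is a hypothetical helper for reading such values, not code from
    systemd; it reports only the offset within each range, since the
    mapping of offsets to concrete log levels or targets is not spelled
    out here.

```python
def describe_sigrtmin18_value(value: int) -> str:
    """Describe the operation selected by a sigqueue() value."""
    if 0x100 <= value <= 0x107:
        return f"set maximum log level (index {value - 0x100})"
    if 0x200 <= value <= 0x203:
        return f"set log target (index {value - 0x200})"
    if value == 0x300:
        return "trim memory"
    if value == 0x301:
        return "log malloc_info() output"
    return "unknown"
```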

Contributors

    Contributions from: 김인수, 07416, Addison Snelling, Adrian Vovk,
    Aidan Dang, Alexander Krabler, Alfred Klomp, Anatoli Babenia,
    Andrei Stepanov, Andrew Baxter, Antonio Alvarez Feijoo,
    Arian van Putten, Arthur Shau, A S Alam,
    Asier Sarasua Garmendia, Balló György, Bastien Nocera,
    Benjamin Herrenschmidt, Benjamin Raison, Bill Peterson,
    Brad Fitzpatrick, Brett Holman, bri, Chen Qi, Chitoku,
    Christian Hesse, Christoph Anton Mitterer, Christopher Gurnee,
    Colin Walters, Cornelius Hoffmann, Cristian Rodríguez, cunshunxia,
    cvlc12, Cyril Roelandt, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Dan Streetman, David Edmundson,
    David Schroeder, David Tardon, dependabot[bot],
    Dimitri John Ledkov, Dmitrii Fomchenkov, Dmitry V. Levin, dmkUK,
    Dominique Martinet, don bright, drosdeck, Edson Juliano Drosdeck,
    Egor Ignatov, EinBaum, Emanuele Giuseppe Esposito, Eric Curtin,
    Evgeny Vereshchagin, Florian Klink, Franck Bui, François Rigault,
    Fran Diéguez, Franklin Yu, Frantisek Sumsal, Fuminobu TAKEYAMA,
    Gaël PORTAY, Gerd Hoffmann, Gertalitec, Gibeom Gwon,
    Gustavo Noronha Silva, Hannu Lounento, Hans de Goede,
    Haochen Tong, HATAYAMA Daisuke, Henrik Holst, Hoe Hao Cheng,
    Igor Tsiglyar, Ivan Vecera, James Hilliard, Jan Engelhardt,
    Jan Janssen, Jan Luebbe, Jan Macku, Janne Sirén, jcg, Jeidnx,
    Joan Bruguera, Joerg Behrmann, jonathanmetzman, Jordan Rome,
    Josef Miegl, Joshua Goins, Joyce, Joyce Brum, Juno Computers,
    Kai Lueke, Kevin P. Fleming, Kiran Vemula, Klaus, Klaus Zipfel,
    Lawrence Thorpe, Lennart Poettering, licunlong, Lily Foster,
    Luca Boccassi, Ludwig Nussel, Luna Jernberg, maanyagoenka,
    Maanya Goenka, Maksim Kliazovich, Malte Poll, Marko Korhonen,
    Masatake YAMATO, Mateusz Poliwczak, Matt Johnston, Miao Wang,
    Micah Abbott, Michal Koutný, Michal Sekletár, Mike Yuan, mooo,
    Morten Linderud, msizanoen, Nick Rosbrook, nikstur, Olivier Gayot,
    Omojola Joshua, Paolo Velati, Paul Barker, Pavel Borecki,
    Philipp Kern, Philip Withnall, Piotr Drąg, Quintin Hill,
    Rene Hollander, Richard Phibel, Robert Meijers, Robert Scheck,
    Roger Gammans, Romain Geissler, Ronan Pigott, Russell Harmon,
    saikat0511, Samanta Navarro, Sam James, Sam Morris,
    Simon Braunschmidt, Sjoerd Simons, Sorah Fukumori,
    Stanislaw Gruszka, Stefan Roesch, Steven Luo, Steve Ramage,
    Susant Sahani, taniishkaaa, Tanishka, Temuri Doghonadze,
    Thierry Martin, Thomas Blume, Thomas Genty, Thomas Weißschuh,
    Thorsten Kukuk, Times-Z, Tobias Powalowski, tofylion,
    Topi Miettinen, Uwe Kleine-König, Velislav Ivanov,
    Vitaly Kuznetsov, Vít Zikmund, Weblate, Will Fancher,
    William Roberts, Winterhuman, Wolfgang Müller, Xeonacid,
    Xiaotian Wu, Xi Ruoyao, Yuri Chornoivan, Yu Watanabe, Yuxiang Zhu,
    Zbigniew Jędrzejewski-Szmek, zhmylove, ZjYwMj,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-07-14

v254-rc1

9 months ago

systemd System and Service Manager

CHANGES WITH 254 in spe:

Announcements of Future Feature Removals and Incompatible Changes:

    * The next release (v255) will remove support for split-usr (/usr/
      mounted separately during late boot, instead of being mounted by the
      initrd before switching to the rootfs) and unmerged-usr (parallel
      directories /bin/ and /usr/bin/, /lib/ and /usr/lib/, …). For more
      details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to remove cgroup v1 support from a systemd release after
      the end of 2023. If you run services that make explicit use of
      cgroup v1 features (i.e. the "legacy hierarchy" with separate
      hierarchies for each controller), please implement compatibility with
      cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
      Most of Linux userspace has been ported over already.

    * Support for System V service scripts is now deprecated and will be
      removed in a future release. Please make sure to update your software
      *now* to include a native systemd unit file instead of a legacy
      System V script to retain compatibility with future systemd releases.

    * EnvironmentFile= now treats a line that follows a comment line ending
      in an escape character (backslash) as a non-comment line. For
      details, see:
      https://github.com/systemd/systemd/issues/27975

    * Behaviour of sandboxing options for the per-user service manager
      units has changed. They now imply PrivateUsers=yes, which means user
      namespaces will be implicitly enabled when a sandboxing option is
      enabled in a user unit. Enabling user namespaces has the drawback
      that system users will no longer be visible (and processes/files will
      appear as owned by 'nobody') in the user unit.

      By definition a sandboxed user unit should run with reduced
      privileges, so impact should be small. This will remove a great
      source of confusion that has been reported by users over the years,
      due to how these options require an extra setting to be manually
      enabled when used in the per-user service manager, which is not
      needed in the system service manager. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html

Security Relevant Changes:

    * pam_systemd will now by default pass the CAP_WAKE_ALARM ambient
      process capability to invoked session processes of regular users on
      local seats (as well as to systemd --user), unless configured
      otherwise via data from JSON user records, or via the PAM module's
      parameter list. This is useful in order to allow desktop tools such as
      GNOME's Alarm Clock application to set a timer for
      CLOCK_REALTIME_ALARM that wakes up the system when it elapses. A
      per-user service unit file may thus use AmbientCapabilities= to pass
      the capability to invoked processes. Note that this capability is
      relatively narrow in focus (in particular compared to other process
      capabilities such as CAP_SYS_ADMIN) and we already — by default —
      permit more impactful operations such as system suspend to local
      users.

Service Manager:

    * "Startup" memory settings are now supported. Previously IO and CPU
      settings were already supported via StartupCPUWeight= and similar.
      The same logic has been added for the various per-unit memory
      settings StartupMemoryMax= and related.

    * The service manager gained support for enqueuing POSIX signals to
      services that carry an additional integer value, exposing the
      sigqueue() system call. This is accessible via new D-Bus calls
      org.freedesktop.systemd1.Manager.QueueSignalUnit() and
      org.freedesktop.systemd1.Unit.QueueSignal(), as well as in systemctl
      via the new --kill-value= option.

    * systemctl gained a new "list-paths" verb, which shows all currently
      active .path units, similarly to how "systemctl list-timers" shows
      active timers, and "systemctl list-sockets" shows active sockets.

    * systemctl gained a new --when= switch which is honoured by the various
      forms of shutdown (i.e. reboot, kexec, poweroff, halt) and allows
      scheduling these operations by time, similar in fashion to how this
      has been supported by SysV shutdown.

    * If MemoryDenyWriteExecute= is enabled for a service and the kernel
      supports the new PR_SET_MDWE prctl() call, it is used instead of the
      seccomp()-based system call filter to achieve the same effect.

    * A new set of kernel command line options is now understood:
      systemd.tty.term.<name>=, systemd.tty.rows.<name>=,
      systemd.tty.columns.<name>= allow configuring the TTY type and
      dimensions for the tty specified via <name>. When systemd invokes a
      service on a tty (via TTYName=) it will look for these and configure
      the TTY accordingly. This is particularly useful in VM environments
      to propagate host terminal settings into the appropriate TTYs of the
      guest.

    * A new RootEphemeral= setting is now understood in service units. It
      takes a boolean argument. If enabled for services that use RootImage=
      or RootDirectory= an ephemeral copy of the disk image or directory
      tree is made when the service is started. It is removed automatically
      when the service is stopped. That ephemeral copy is made using
      btrfs/xfs reflinks or btrfs snapshots, if available.

    * The service activation logic gained new settings RestartSteps= and
      RestartMaxDelaySec= which allow exponentially-growing restart
      intervals for Restart=.

    * The service activation logic gained a new setting RestartMode= which
      can be set to 'direct' to skip the inactive/failed states when
      restarting, so that dependent units are not notified until the service
      converges to a final (successful or failed) state. For example, this
      means that OnSuccess=/OnFailure= units will not be triggered until the
      service state has converged.

    * PID 1 will now automatically load the virtio_console kernel module
      during early initialization if running in a suitable VM. This is done
      so that early-boot logging can be written to the console if available.

    * Similarly, virtio-vsock support is loaded early in suitable VM
      environments. PID 1 will send sd_notify() notifications via AF_VSOCK
      to the VMM if configured, thus loading this early is beneficial.

    * A new verb "fdstore" has been added to systemd-analyze to show the
      current contents of the file descriptor store of a unit. This is
      backed by a new D-Bus call DumpUnitFileDescriptorStore() provided by
      the service manager.

    * The service manager will now set a new $FDSTORE environment variable
      when invoking processes for services that have the file descriptor
      store enabled.

    * A new service option FileDescriptorStorePreserve= has been added that
      allows tuning the life-cycle of the per-service file descriptor
      store. If set to "yes", the entries in the fd store are retained even
      after the service has been fully stopped.

    * The "systemctl clean" command may now be used to clear the fdstore of
      a service.

    * Unit *.preset files gained a new directive "ignore", in addition to
      the existing "enable" and "disable". As the name suggests, matching
      units are left unchanged, i.e. neither enabled nor disabled.

    * Service units gained a new setting DelegateSubgroup=. It takes the
      name of a sub-cgroup to place any processes the service manager forks
      off in. Previously, the service manager would place all service
      processes directly in the top-level cgroup it created for the
      service. This usually meant that the main process in a service with
      delegation enabled would first have to create a subgroup and move
      itself down into it, in order to not conflict with the "no processes
      in inner cgroups" rule of cgroup v2. With this option, this step is
      now handled by PID 1.

    * The service manager will now look for .upholds/ directories,
      similarly to the existing support for .wants/ and .requires/
      directories. Symlinks in this directory result in Upholds=
      dependencies.

      The [Install] section of unit files gained support for a new
      UpheldBy= directive to generate .upholds/ symlinks automatically when
      a unit is enabled.

    * The service manager now supports a new kernel command line option
      systemd.default_device_timeout_sec=, which may be used to override
      the default timeout for .device units.

    * A new "soft-reboot" mechanism has been added to the service manager.
      A "soft reboot" is similar to a regular reboot, except that it
      affects userspace only: the service manager shuts down any running
      services and other units, then optionally switches into a new root
      file system (mounted to /run/nextroot/), and then passes control to a
      systemd instance in the new file system which then starts the system
      up again. The kernel is not rebooted and neither is the hardware,
      firmware or boot loader. This provides a fast, lightweight mechanism
      to quickly reset or update userspace, without the latency that a full
      system reset involves. Moreover, open file descriptors may be passed
      across the soft reboot into the new system where they will be passed
      back to the originating services. This allows pinning resources
      across the reboot, thus minimizing grey-out time further. Moreover,
      it is possible to allow specific crucial services to survive the
      reboot process, if they run off a separate root file system (i.e. use
      RootDirectory= or RootImage=, or are portable services). This new
      reboot mechanism is accessible via the new "systemctl soft-reboot"
      command.

    * A new service setting MemoryKSM= has been added to enable kernel
      same-page merging individually for services.

    * A new service setting ImportCredential= has been added that augments
      LoadCredential= and LoadCredentialEncrypted= and searches for
      credentials to import from the system, and supports globbing.

    * A new job mode "restart-dependencies" has been added to the service
      manager (exposed via systemctl --job-mode=). It is only valid when
      used with "start" jobs, and has the effect that the "start" job will
      be propagated as "restart" jobs to currently running units that have
      a BindsTo= or Requires= dependency on the started unit.

    * A new verb "whoami" has been added to "systemctl" which determines as
      part of which unit the command is being invoked. It writes the unit
      name to standard output. If one or more PIDs are specified, it reports
      the unit names the processes referenced by the PIDs belong to.

    * The system and service credential logic has been improved: there's
      now a clearly defined place where system provisioning tools running
      in the initrd can place credentials that will be imported into the
      system's set of credentials during the initrd → host transition: the
      /run/credentials/@initrd/ directory. Once the credentials placed
      there are imported into the system credential set they are deleted
      from this directory, and the directory itself is deleted afterwards
      too.

    * A new kernel command line option systemd.set_credential_binary= has
      been added, that is similar to the pre-existing
      systemd.set_credential= but accepts arbitrary binary credential data,
      encoded in Base64. Note that the kernel command line is not a
      recommended way to transfer credentials into a system, since it is
      world-readable from userspace.

    * The default machine ID to use may now be configured via the
      system.machine_id system credential. It will only be used if no
      machine ID was set yet on the host.

    * On Linux kernel 6.4 and newer system and service credentials will now
      be placed in a tmpfs instance that has the "noswap" mount option
      set. Previously, a "ramfs" instance was used. By switching to tmpfs
      ACL support and overall size limits can now be enforced, without
      compromising on security, as the memory is never paged out either
      way.

    * The service manager now can detect when it is running in a
      'Confidential Virtual Machine', and a corresponding 'cvm' value is now
      accepted by ConditionSecurity= for units that want to conditionalize
      themselves on this. systemd-detect-virt gained new '--cvm' and
      '--list-cvm' switches to respectively perform the detection or list
      all known flavours of confidential VM, depending on the vendor. The
      manager will publish a 'ConfidentialVirtualization' D-Bus property,
      and will also set a SYSTEMD_CONFIDENTIAL_VIRTUALIZATION= environment
      variable for unit generators. Finally, udev rules can match on a new
      'cvm' key that will be set when in a confidential VM.
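
    To make the RestartSteps=/RestartMaxDelaySec= entry above more
    concrete, here is a Python sketch of one way an exponentially growing
    restart interval can be computed. It assumes a geometric interpolation
    between the base restart delay and the maximum delay over the
    configured number of steps; the exact formula used by the service
    manager is not given here, and the function and parameter names are
    hypothetical.

```python
def restart_delay(restart_sec, max_delay_sec, steps, n_restarts):
    """Delay before the next restart, growing geometrically per attempt.

    restart_sec:   base delay (must be > 0), i.e. the first interval.
    max_delay_sec: the delay reached after `steps` restarts.
    steps:         number of restarts over which the delay grows.
    n_restarts:    how many restarts have already happened.
    """
    if steps <= 0 or n_restarts <= 0:
        return restart_sec
    if n_restarts >= steps:
        return max_delay_sec  # the delay is capped once all steps are used
    ratio = max_delay_sec / restart_sec
    return restart_sec * ratio ** (n_restarts / steps)
```

    With a base delay of 1s, a maximum of 16s and 4 steps, this yields
    delays of 1s, 2s, 4s, 8s and then 16s for every further restart.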

Journal:

    * The sd-journal API gained a new call sd_journal_get_seqnum() to
      retrieve the current log record's sequence number and sequence number
      ID, which allows applications to order records the same way as
      journal does internally. The sequence number is now also exported in
      the JSON and "export" output of the journal.

    * journalctl gained a new switch --truncate-newline. If specified
      multi-line log records will be truncated at the first newline,
      i.e. only the first line of each log message will be shown.

    * systemd-journal-upload gained support for --namespace=, similar to
      the switch of the same name of journalctl.
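
    The effect of --truncate-newline on a single log message can be
    pictured with a one-line Python helper. This is only an illustration
    of the described behaviour (keeping the text up to the first newline);
    the function name is hypothetical and the code is unrelated to
    journalctl's implementation.

```python
def truncate_at_newline(message: str) -> str:
    """Return only the first line of a possibly multi-line log message."""
    # Split once at the first newline and keep what precedes it.
    return message.split("\n", 1)[0]
```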

systemd-repart:

    * systemd-repart's drop-in files gained a new ExcludeFiles= option which
      may be used to exclude certain files from the effect of CopyFiles=.

    * systemd-repart's Verity support now implements the Minimize= setting
      to minimize the size of the resulting partition.

    * systemd-repart gained a new --offline= switch, which may be used to
      control whether images shall be built "online" or "offline",
      i.e. whether to make use of kernel facilities such as loopback block
      devices and device mapper or not.

    * If systemd-repart is told to populate a newly created ESP or XBOOTLDR
      partition with some files, it will now default to VFAT rather than
      ext4.

    * systemd-repart gained a new --architecture= switch. If specified, the
      per-architecture GPT partition types (i.e. the root and /usr/
      partitions) configured in the partition drop-in files are
      automatically adjusted to match the specified CPU architecture, in
      order to simplify cross-architecture DDI building.

systemd-boot, systemd-stub, ukify, bootctl, kernel-install:

    * bootctl gained a new switch --print-root-device/-R that prints the
      block device the root file system is backed by. If specified twice,
      it returns the whole disk block device (as opposed to partition block
      device) the root file system is on. It's useful for invocations such
      as "cfdisk $(bootctl -RR)" to quickly show the partition table of the
      running OS.

    * systemd-stub will now look for the SMBIOS Type 11 field
      "io.systemd.stub.kernel-cmdline-extra" and append its value to the
      kernel command line it invokes. This is useful for VMMs such as qemu
      to pass additional kernel command lines into the system even when
      booting via full UEFI. The contents of the field are measured into
      TPM PCR 12.

    * The KERNEL_INSTALL_LAYOUT= setting for kernel-install gained a new
      value "auto". With this value, a kernel will be automatically
      analyzed, and if it qualifies as UKI, it will be installed as if the
      setting was set to "uki", otherwise as "bls".

    * systemd-stub can now optionally load UEFI PE "add-on" images that may
      contain additional kernel command line information. These "add-ons"
      superficially look like a regular UEFI executable, and are expected
      to be signed via SecureBoot/shim. However, they do not actually
      contain code, but instead a subset of the PE sections that UKIs
      support. They are supposed to provide a way to extend UKIs with
      additional resources in a secure and authenticated way. Currently,
      only the .cmdline PE section may be used in add-ons, in which case
      any specified string is appended to the command line embedded into
      the UKI itself. A new 'addon<EFI-ARCH>.efi.stub' is now provided that
      can be used to trivially create addons, via 'ukify' or 'objcopy'. In
      the future we expect other sections to be made extensible like this as
      well.

    * ukify has been updated to allow building these UEFI PE "add-on"
      images, using the new 'addon<EFI-ARCH>.efi.stub'.

    * ukify gained a new "genkey" verb for generating a set of key pairs
      to sign UKIs and their PCR data with.

    * ukify now accepts SBAT information to place in the .sbat PE section
      of UKIs and addons. If a UKI is built the SBAT information from the
      inner kernel is merged with any SBAT information associated with
      systemd-stub and the SBAT data specified on the ukify command line.

    * The kernel-install script has been rewritten in C, and reuses much of
      the infrastructure of existing tools such as bootctl. It also gained
      --esp-path= and --boot-path= options to override the path to the ESP,
      and the $BOOT partition. Options --make-entry-directory= and
      --entry-token= have been added as well, similar to bootctl's options
      of the same name.

    * A new kernel-install plugin 60-ukify has been added which will
      combine kernel/initrd locally into a UKI and optionally sign them
      with a local key. This may be used to switch to UKI mode even on
      systems where a local kernel or initrd is used. (Typically UKIs are
      built and signed by the vendor.)

    * The ukify tool now supports "pesign" in addition to the pre-existing
      "sbsign" for signing UKIs.

    * systemd-measure and systemd-stub now look for the .uname PE section
      that should contain the kernel's "uname -r" string.

    * systemd-measure and ukify now calculate expected PCR hashes for a UKI
      "offline", i.e. without access to a TPM (physical or
      software-emulated).

Memory Pressure & Control:

    * The sd-event API gained new calls sd_event_add_memory_pressure(),
      sd_event_source_set_memory_pressure_type(),
      sd_event_source_set_memory_pressure_period() to create and configure
      an event source that is called whenever the OS signals memory
      pressure. Another call sd_event_trim_memory() is provided that
      compacts the process' memory use by releasing allocated but unused
      malloc() memory back to the kernel. Services can also provide their
      own custom callback to do memory trimming. This should improve system
      behaviour under memory pressure, as Linux traditionally provided
      no mechanism to return process memory back to the kernel if the
      kernel was under memory pressure. This makes use of the kernel's PSI
      interface. Most long-running services in systemd have been hooked up
      with this, and in particular systems with low memory should benefit
      from this.

    * Service units gained new settings MemoryPressureWatch= and
      MemoryPressureThresholdSec= to configure the PSI memory pressure
      logic individually. If these options are used, the
      $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE environment
      variables will be set for the invoked processes to inform them about
      the requested memory pressure behaviour. (This is used by the
      aforementioned sd-event API additions, if set.)

    * systemd-analyze gained a new "malloc" verb that shows the output
      generated by glibc's malloc_info() on services that support it. Right
      now, only the service manager has been updated accordingly. This
      call requires privileges.
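
      A service that does not use sd-event could consume the
      $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE variables roughly
      as sketched below. This assumes, following our reading of
      https://systemd.io/MEMORY_PRESSURE, that the second variable is
      Base64-encoded, that "/dev/null" means "disabled", and that a PSI
      file signals via POLLPRI. The function names are hypothetical.

```python
import base64
import os
import select


def parse_memory_pressure_env(environ):
    """Return (path, payload) from the two variables, or None if unset/disabled."""
    path = environ.get("MEMORY_PRESSURE_WATCH")
    if not path or path == "/dev/null":  # assumption: /dev/null disables the logic
        return None
    # Assumption: the data to write is passed Base64-encoded.
    payload = base64.b64decode(environ.get("MEMORY_PRESSURE_WRITE", ""))
    return path, payload


def wait_for_pressure(path, payload, timeout_ms=None):
    """Arm the watch file with the payload, then block until it signals."""
    fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    try:
        if payload:
            os.write(fd, payload)
        poller = select.poll()
        poller.register(fd, select.POLLPRI)  # assumption: PSI files use POLLPRI
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(fd)
```

      A caller would typically run wait_for_pressure() in a loop and
      release caches each time it returns True.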

User & Session Management:

    * The sd-login API gained a new call sd_session_get_username() to
      return the user name of the owner of a login session. It also gained
      a new call sd_session_get_start_time() to retrieve the time the login
      session started. A new call sd_session_get_leader() has been added to
      return the PID of the "leader" process of a session. A new call
      sd_uid_get_login_time() returns the time since the specified user has
      most recently been continuously logged in with at least one session.

    * JSON user records gained a new set of fields capabilityAmbientSet and
      capabilityBoundingSet which contain a list of POSIX capabilities to
      set for the logged in users in the ambient and bounding sets,
      respectively. homectl gained the ability to configure these two sets
      for users via --capability-bounding-set=/--capability-ambient-set=.

    * pam_systemd learnt two new module options
      default-capability-bounding-set= and default-capability-ambient-set=,
      which configure the default bounding and ambient sets for users as
      they are logging in, if the JSON user record doesn't specify this
      explicitly (see above). The built-in default for the ambient set now
      contains CAP_WAKE_ALARM, thus allowing regular users who may log in
      locally to resume from a system suspend via a timer.

    * The Session D-Bus objects of systemd-logind gained a new SetTTY() method
      call to update the TTY of a session after it has been allocated. This
      is useful for SSH sessions which are typically allocated first, and
      for which a TTY is added later.

    * The sd-daemon API gained a new call sd_pid_notifyf_with_fds() which
      combines the various other sd_pid_notify() flavours into one: it takes
      a format string, an overriding PID, and a set of file descriptors to
      send. It also gained a new call sd_pid_notify_barrier() which is
      equivalent to sd_notify_barrier() but allows the originating PID to
      be specified.

    * "loginctl list-users" and "loginctl list-sessions" will now show the
      state of each logged in user/session in their tabular output. It will
      also show the current idle state of sessions.

DDIs:

    * systemd-dissect will now show the intended CPU architecture of an
      inspected DDI.

    * systemd-dissect will now install itself as mount helper for the "ddi"
      pseudo-file system type. This means you may now mount DDIs directly
      via /bin/mount or /etc/fstab, making full use of embedded Verity
      information and all other DDI features.

      Example: mount -t ddi myimage.raw /some/where

    * The systemd-dissect tool gained the new switches --attach/--detach to
      attach/detach a DDI to a loopback block device without mounting it.
      It will automatically derive the right sector size from the image
      and set up Verity and similar, but not mount the file systems in it.

    * When systemd-gpt-auto-generator or the DDI mounting logic mount an
      ESP or XBOOTLDR partition the MS_NOSYMFOLLOW mount option is now
      implied. Given that these file systems are typically untrusted, this
      should make mounting them automatically have less of a security
      impact.

    * All tools that parse DDIs (such as systemd-nspawn, systemd-dissect,
      systemd-tmpfiles, …) now understand a new switch --image-policy= which
      takes a string encoding image dissection policy. With this mechanism
      automatic discovery and use of specific partition types and the
      cryptographic requirements on the partitions (Verity, LUKS, …) can be
      restricted, permitting better control of the exposed attack surfaces
      when mounting disk images. systemd-gpt-auto-generator will honour such
      an image policy too, configurable via the systemd.image_policy= kernel
      command line option. Unit files gained the RootImagePolicy=,
      MountImagePolicy= and ExtensionImagePolicy= to configure the same for
      disk images a service runs off.

    * systemd-analyze gained a new verb "image-policy" to validate and
      parse image policy strings.

    * systemd-dissect gained support for a new --validate switch to
      superficially validate DDI structure, and check whether a specific
      image policy allows the DDI.

    * systemd-dissect gained support for a new --mtree-hash switch to
      optionally disable calculating mtree hashes, which can be slow on
      large images.

    * systemd-dissect's --copy-to, --copy-from, --list and --mtree switches
      are now able to operate on directories too, not just on images.

Network Management:

    * networkd's GENEVE support has gained a new .network option
      InheritInnerProtocol=.

    * The [Tunnel] section in .netdev files has gained a new setting
      IgnoreDontFragment for controlling the IPv4 "DF" flag of datagrams.

    * A new global IPv6PrivacyExtensions= setting has been added that
      selects the default value of the per-network setting of the same
      name.

    * The predictable network interface naming logic will now include
      SR-IOV-R "representor" information in network interface names.

    * The DHCPv4 + DHCPv6 + IPv6 RA logic in networkd gained support for
      the RFC8910 captive portal option.

Device Management:

    * udevadm gained the new "verify" verb for validating udev rules files
      offline.

    * udev will now create symlinks to loopback block devices in the
      /dev/loop/by-ref/ directory that are based on the .lo_file_name
      string field selected during allocation. The systemd-dissect tool and
      the util-linux losetup command now support a complementing new
      switch --loop-ref= for selecting the string. This means a loopback
      block device may now be allocated under a caller-chosen reference and
      can subsequently be referenced by that without first having to look
      up the block device name the caller ended up with.

    * udev also creates symlinks to loopback block devices in the
      /dev/loop/by-ref/ directory based on the .st_dev/st_ino fields of the
      inode attached to the loopback block device. This means that attaching
      a file to a loopback device will implicitly make a handle available to
      be found via that file's inode information.

    * udev gained a new tool "iocost" that can be used to configure QoS IO
      cost data based on hwdb information onto suitable block devices. Also
      see https://github.com/iocost-benchmark/iocost-benchmarks.

TPM2 Support + Disk Encryption & Authentication:

    * systemd-cryptenroll/systemd-cryptsetup will now install a TPM2 SRK
      ("Storage Root Key") as first step in the TPM2, and then use that
      for binding FDE to, if TPM2 support is used. This matches
      recommendations of TCG (see
      https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf).

    * systemd-cryptenroll and other tools that take TPM2 PCR parameters now
      understand textual identifiers for these PCRs.

    * systemd-veritysetup + /etc/veritytab gained support for a series of
      new options: hash-offset=, superblock=, format=, data-block-size=,
      hash-block-size=, data-blocks=, salt=, uuid=, hash=, fec-device=,
      fec-offset=, fec-roots= to configure various aspects of a Verity
      volume.

    * systemd-cryptsetup + /etc/crypttab gained support for a new
      veracrypt-pim= option for setting the Personal Iteration Multiplier
      of veracrypt volumes.

    * systemd-integritysetup + /etc/integritytab gained support for a new
      mode= setting for controlling the dm-integrity mode (journal, bitmap,
      direct) for the volume.

    * systemd-analyze gained a new verb "pcrs" that shows the known TPM PCR
      registers, their symbolic names and current values.

systemd-tmpfiles:

    * The ACL support in tmpfiles.d/ has been updated: if an uppercase "X"
      access right is specified this is equivalent to "x" but only if the
      inode in question already has the executable bit set for at least
      some user/group. Otherwise the "x" bit will be turned off.

    * tmpfiles.d/'s C line type now understands a new modifier "+": a line
      with C+ will result in a "merge" copy, i.e. all files of the source
      tree are copied into the target tree, even if that tree already
      exists, resulting in a combined tree of files already present in the
      target tree and those copied in.

    * systemd-tmpfiles gained a new --graceful switch. If specified lines
      with unknown users/groups will silently be skipped.
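
      The uppercase "X" ACL rule described in this section can be stated
      as a small predicate. The sketch below follows only the wording
      above (executable bit set for at least some user/group) and is not
      taken from the systemd-tmpfiles sources; the function name is
      hypothetical.

```python
import stat


def expand_capital_x(perms: str, mode: int) -> str:
    """Turn 'X' into 'x' if the inode has any execute bit set, else into '-'."""
    any_exec = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return perms.replace("X", "x" if any_exec else "-")


if __name__ == "__main__":
    print(expand_capital_x("rwX", 0o755))
    print(expand_capital_x("rwX", 0o644))
```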

systemd-notify:

    * systemd-notify gained two new options --fd= and --fdname= for sending
      arbitrary file descriptors to the service manager (while specifying an
      explicit name for it).

    * systemd-notify gained a new --exec switch, which makes it execute the
      specified command line after sending the requested messages. This is
      useful for sending out READY=1 first, and then continuing invocation
      without changing process ID, so that the tool can be nicely used
      within an ExecStart= line of a unit file that uses Type=notify.
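
      The "notify first, then exec" pattern behind --exec can be sketched
      without the tool. The example assumes the usual sd_notify() datagram
      protocol on $NOTIFY_SOCKET, where a leading "@" denotes an abstract
      socket. The function names are hypothetical and this is not how
      systemd-notify itself is implemented.

```python
import os
import socket


def sd_notify(message: str, environ=os.environ) -> bool:
    """Send one datagram to $NOTIFY_SOCKET; return False if it is not set."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = os.fsencode(path)
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def notify_then_exec(argv):
    """Report readiness, then replace this process with argv (same PID)."""
    sd_notify("READY=1")
    os.execvp(argv[0], argv)
```

      Since os.execvp() keeps the process ID, the service manager sees the
      readiness message and the final program as the same main process.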

sd-event + sd-bus APIs:

    * The sd-event API gained a new call sd_event_source_leave_ratelimit()
      which may be used to explicitly end a rate-limit state an event
      source might be in, resetting all rate limiting counters.

    * When the sd-bus library is used to make connections to AF_UNIX D-Bus
      sockets, it will now encode the "description" set via
      sd_bus_set_description() into the source socket address. It will also
      look for this information when accepting a connection. This is useful
      to track individual D-Bus connections on a D-Bus broker for debug
      purposes.

systemd-resolved:

    * systemd-resolved gained a new resolved.conf setting
      StateRetentionSec= which may be used to retain cached DNS records
      even after their nominal TTL, and use them in case upstream DNS
      servers cannot be reached. This can be used to make name resolution
      more resilient in case of network problems.

    * resolvectl gained a new verb "show-cache" to show the current cache
      contents of systemd-resolved. This verb communicates with the
      systemd-resolved daemon and requires privileges.
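
      The policy behind StateRetentionSec= can be pictured as a small
      decision function. This sketch only restates the description above
      (expired records are kept for a retention window and used when
      upstream servers cannot be reached); it is not systemd-resolved's
      implementation, and all names are hypothetical.

```python
def pick_cached_answer(now, expires_at, retention_sec, upstream_reachable):
    """Classify a cached record as 'fresh', 'stale' (usable) or None (unusable)."""
    if now < expires_at:
        return "fresh"
    # Past the nominal TTL: only fall back to the record while upstream is
    # unreachable and the retention window has not run out yet.
    if not upstream_reachable and now < expires_at + retention_sec:
        return "stale"
    return None
```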

Other:

    * The default keymap to apply may now be chosen at build-time via the
      new -Ddefault-keymap= meson option.

    * Most of systemd's long-running services now have a generic handler
      for the SIGRTMIN+18 signal which executes various operations
      depending on the sigqueue() parameter sent along. For example, values
      0x100…0x107 allow changing the maximum log level of such
      services. 0x200…0x203 allow changing the log target of such
      services. 0x300 makes the services trim their memory similarly to the
      automatic PSI-triggered action, see above. 0x301 makes the services
      output their malloc_info() data to the logs.

    * machinectl gained new "edit" and "cat" verbs for editing .nspawn
      files, inspired by systemctl's verbs of the same name which edit unit
      files. Similarly, networkctl gained the same verbs for editing
      .network, .netdev, .link files.

    * A new syscall filter group "@sandbox" has been added that contains
      system calls used for sandboxing, such as those for seccomp and
      Landlock.

    * New documentation has been added:

      https://systemd.io/COREDUMP
      https://systemd.io/MEMORY_PRESSURE
      smbios-type-11(7)

    * systemd-firstboot gained a new --reset option. If specified, the
      settings in /etc/ it knows how to initialize are reset.

    * systemd-sysext is now a multi-call binary and is also installed under
      the systemd-confext alias name (via a symlink). When invoked that way
      it will operate on /etc/ instead of /usr/ + /opt/. It thus becomes a
      powerful, atomic, secure configuration management of sorts, that
      locally can merge configuration from multiple confext configuration
      images into a single immutable tree.

    * The --network-macvlan=, --network-ipvlan=, --network-interface=
      switches of systemd-nspawn may now optionally take the intended
      network interface inside the container.

    * All our programs will now send an sd_notify() message with their exit
      status in the EXIT_STATUS= field when exiting, using the usual
      protocol, including PID 1. This is useful for VMMs and container
      managers to collect an exit status from a system as it shuts down, as
      set via "systemctl exit …". This is particularly useful in test cases
      and similar, as invocations via a VM can now nicely propagate an exit
      status to the host, similar to local processes.

    * systemd-run gained a new switch --expand-environment=no to disable
      server-side environment variable expansion in specified command
      lines.

    * The systemd-system-update-generator has been updated to also look for
      the special flag file /etc/system-update in addition to the existing
      support for /system-update to decide whether to enter system update
      mode.

    * The /dev/hugepages/ file system is now mounted with nosuid + nodev
      mount options by default.

    * systemd-fstab-generator now understands two new kernel command line
      options systemd.mount-extra= and systemd.swap-extra=, which configure
      additional mounts or swaps in a format similar to /etc/fstab. It also
      now supports the new fstab.extra and fstab.extra.initrd credentials
      that may contain additional /etc/fstab lines to apply at boot.

    * systemd-getty-generator now understands two new credentials
      getty.ttys.container and getty.ttys.serial. These credentials may
      contain a list of TTY devices – one per line – to instantiate
      container-getty@.service and serial-getty@.service on.

    * systemd-sysupdate's sysupdate.d/ drop-ins gained a new setting
      PathRelativeTo=, which can be set to "esp", "xbootldr", "boot", in
      which case the Path= setting is taken relative to the ESP or XBOOTLDR
      partitions, rather than the system's root directory /. The relevant
      directories are automatically discovered.

    * The systemd-ac-power tool gained a new switch --low, which reports
      whether the battery charge is considered "low", similar to how the
      s2h suspend logic checks this state to decide whether to enter system
      suspend or hibernation.

    * The /etc/os-release file can now have two new optional fields
      VENDOR_NAME= and VENDOR_URL= to carry information about the vendor of
      the OS.

    * When the system hibernates, information about the device and offset
      used is now written to a non-volatile EFI variable. On next boot the
      system will attempt to resume from the location indicated in this EFI
      variable. This should make hibernation a lot more robust, while
      requiring no manual configuration of the resume location.

    * The $XDG_STATE_HOME environment variable (added in more recent
      versions of the XDG basedir specification) is now honoured to
      implement the StateDirectory= setting in user services.

    * A new component "systemd-battery-check" has been added. It may run
      during early boot (usually in the initrd), and checks the battery
      charge level of the system. In case the charge level is very low the
      user is notified (graphically via Plymouth – if available – as well
      as in text form on the console), and the system is turned off after a
      10s delay.

    * The 'passwdqc' library is now supported as an alternative to the
      'pwquality' library and it can be selected at build time.
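
      The sigqueue() value ranges listed for the SIGRTMIN+18 handler
      earlier in this section can be summarised as a decoder. The sketch
      only reports the offset within each range; it does not claim which
      log level or log target a given offset selects, and the function
      name is hypothetical.

```python
def describe_rtmin18_value(value: int) -> str:
    """Map a sigqueue() parameter to the operation family it requests."""
    if 0x100 <= value <= 0x107:
        return f"set maximum log level (offset {value - 0x100})"
    if 0x200 <= value <= 0x203:
        return f"set log target (offset {value - 0x200})"
    if value == 0x300:
        return "trim memory"
    if value == 0x301:
        return "log malloc_info() output"
    return "unknown"


if __name__ == "__main__":
    for v in (0x103, 0x201, 0x300, 0x301, 0x400):
        print(hex(v), "->", describe_rtmin18_value(v))
```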

Contributors

    Contributions from: 김인수, 07416, Addison Snelling, Adrian Vovk,
    Aidan Dang, Alexander Krabler, Alfred Klomp, Anatoli Babenia,
    Andrei Stepanov, Andrew Baxter, Antonio Alvarez Feijoo,
    Arian van Putten, Arthur Shau, A S Alam,
    Asier Sarasua Garmendia, Balló György, Bastien Nocera,
    Benjamin Herrenschmidt, Benjamin Raison, Bill Peterson,
    Brad Fitzpatrick, Brett Holman, bri, Chen Qi, Chitoku,
    Christoph Anton Mitterer, Christopher Gurnee, Colin Walters,
    Cornelius Hoffmann, Cristian Rodríguez, cunshunxia, cvlc12,
    Cyril Roelandt, Daan De Meyer, Daniele Medri,
    Daniel P. Berrangé, Dan Streetman, David Edmundson,
    David Schroeder, David Tardon, dependabot[bot],
    Dimitri John Ledkov, Dmitrii Fomchenkov, Dmitry V. Levin, dmkUK,
    Dominique Martinet, don bright, drosdeck, Edson Juliano Drosdeck,
    Egor Ignatov, EinBaum, Emanuele Giuseppe Esposito, Eric Curtin,
    Evgeny Vereshchagin, Florian Klink, Franck Bui, François Rigault,
    Fran Diéguez, Franklin Yu, Frantisek Sumsal, Gaël PORTAY,
    Gerd Hoffmann, Gertalitec, Gibeom Gwon, Gustavo Noronha Silva,
    Hannu Lounento, Hans de Goede, Haochen Tong, HATAYAMA Daisuke,
    Henrik Holst, Hoe Hao Cheng, Igor Tsiglyar, Ivan Vecera,
    James Hilliard, Jan Engelhardt, Jan Janssen, Jan Luebbe,
    Jan Macku, Janne Sirén, jcg, Jeidnx, Joan Bruguera,
    Joerg Behrmann, jonathanmetzman, Jordan Rome, Josef Miegl,
    Joshua Goins, Joyce, Joyce Brum, Juno Computers, Kai Lueke,
    Kevin P. Fleming, Kiran Vemula, Klaus, Klaus Zipfel,
    Lawrence Thorpe, Lennart Poettering, licunlong, Lily Foster,
    Luca Boccassi, Ludwig Nussel, maanyagoenka, Maksim Kliazovich,
    Malte Poll, Marko Korhonen, Masatake YAMATO, Mateusz Poliwczak,
    Matt Johnston, Miao Wang, Michal Koutný, Michal Sekletár,
    Mike Yuan, mooo, Morten Linderud, msizanoen, Nick Rosbrook, nikstur,
    Olivier Gayot, Omojola Joshua, Paolo Velati, Paul Barker,
    Philipp Kern, Philip Withnall, Piotr Drąg, Quintin Hill,
    Rene Hollander, Richard Phibel, Robert Meijers, Robert Scheck,
    Romain Geissler, Ronan Pigott, Russell Harmon, saikat0511,
    Samanta Navarro, Sam James, Sam Morris, Simon Braunschmidt,
    Sjoerd Simons, Sorah Fukumori, Stanislaw Gruszka, Stefan Roesch,
    Steven Luo, Steve Ramage, taniishkaaa, Tanishka, Thierry Martin,
    Thomas Blume, Thomas Genty, Thomas Weißschuh, Thorsten Kukuk,
    Times-Z, Tobias Powalowski, tofylion, Topi Miettinen,
    Uwe Kleine-König, Velislav Ivanov, Vitaly Kuznetsov, Vít Zikmund,
    Will Fancher, William Roberts, Winterhuman, Wolfgang Müller,
    Xiaotian Wu, Xi Ruoyao, Yu Watanabe, Yuxiang Zhu,
    Zbigniew Jędrzejewski-Szmek, zhmylove, ZjYwMj,
    Дамјан Георгиевски, наб

    — Edinburgh, 2023-07-06

v253

systemd System and Service Manager

CHANGES WITH 253:

Announcements of Future Feature Removals and Incompatible Changes:

    * We intend to remove cgroup v1 support from a systemd release after the
      end of 2023. If you run services that make explicit use of cgroup v1
      features (i.e. the "legacy hierarchy" with separate hierarchies for
      each controller), please implement compatibility with cgroup v2 (i.e.
      the "unified hierarchy") sooner rather than later. Most of Linux
      userspace has been ported over already.

    * We intend to remove support for split-usr (/usr mounted separately
      during boot) and unmerged-usr (parallel directories /bin and
      /usr/bin, /lib and /usr/lib, etc). This will happen in the second
      half of 2023, in the first release that falls into that time window.
      For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

    * We intend to change behaviour w.r.t. units of the per-user service
      manager and sandboxing options, so that they work without having to
      manually enable PrivateUsers= as well, which is not required for
      system units. To make this work, we will implicitly enable user
      namespaces (PrivateUsers=yes) when a sandboxing option is enabled in a
      user unit. The drawback is that system users will no longer be visible
      (and appear as 'nobody') to the user unit when a sandboxing option is
      enabled. By definition a sandboxed user unit should run with reduced
      privileges, so impact should be small. This will remove a great source
      of confusion that has been reported by users over the years, due to
      how these options require an extra setting to be manually enabled when
      used in the per-user service manager, as opposed to the system
      service manager. We plan to enable this change in the next release
      later this year. For more details, see:
      https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html

Deprecations and incompatible changes:

    * systemctl will now warn when invoked without /proc/ mounted
      (e.g. when invoked after chroot() into a directory tree without the
      API mount points like /proc/ being set up.)  Operation in such an
      environment is not fully supported.

    * The return value of 'systemctl is-active|is-enabled|is-failed' for
      unknown units is changed: previously 1 or 3 were returned, but now 4
      (EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.

    * 'udevadm hwdb' subcommand is deprecated and will emit a warning.
      systemd-hwdb (added in 2014) should be used instead.

    * 'bootctl --json' now outputs a single JSON array, instead of a stream
      of newline-separated JSON objects.

    * Udev rules in 60-evdev.rules have been changed to load hwdb
      properties for all modalias patterns. Previously only the first
      matching pattern was used. This could change what properties are
      assigned if the user has more and less specific patterns that could
      match the same device, but it is expected that the change will have
      no effect for most users.

    * systemd-networkd-wait-online exits successfully when all interfaces
      are ready or unmanaged. Previously, if neither '--any' nor
      '--interface=' options were used, at least one interface had to be in
      configured state. This change allows the case where systemd-networkd
      is enabled, but no interfaces are configured, to be handled
      gracefully. It may occur in particular when a different network
      manager is also enabled and used.

    * Some compatibility helpers were dropped: EmergencyAction= in the user
      manager, as well as measuring kernel command line into PCR 8 in
      systemd-stub, along with the -Defi-tpm-pcr-compat compile-time
      option.

    * The '-Dupdate-helper-user-timeout=' build-time option has been
      renamed to '-Dupdate-helper-user-timeout-sec=', and now takes an
      integer as parameter instead of a string.

    * The DDI image dissection logic (which backs RootImage= in service
      unit files, the --image= switch in various tools such as
      systemd-nspawn, as well as systemd-dissect) will now only mount file
      systems of types btrfs, ext4, xfs, erofs, squashfs, vfat. This list
      can be overridden via the $SYSTEMD_DISSECT_FILE_SYSTEMS environment
      variable. These file systems are fairly well supported and maintained
      in current kernels, while others are usually more niche, exotic or
      legacy and thus typically do not receive the same level of security
      support and fixes.

    * The default per-link multicast DNS mode has been changed from "no" to
      "yes". Since the default global multicast DNS mode was already "yes"
      (unless changed via the build option), multicast DNS is now enabled
      on all links by default. It can be disabled on all links by setting
      MulticastDNS=no in resolved.conf, or on a single interface by calling
      "resolvectl mdns INTERFACE no".
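
      Scripts that consume 'bootctl --json' output may need to cope with
      both the old stream of JSON objects and the new single array
      described in this section. A minimal sketch of a tolerant reader
      follows; the function name is hypothetical, and the "id" field used
      in the usage note is only a placeholder.

```python
import json


def load_boot_entries(text: str):
    """Parse either one JSON array or a stream of JSON objects into a list."""
    decoder = json.JSONDecoder()
    entries, pos = [], 0
    while True:
        # Skip whitespace (including newlines) between top-level values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return entries
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            entries.extend(value)  # new format: one array holding all entries
        else:
            entries.append(value)  # old format: one object per top-level value
```

      With this, load_boot_entries('[{"id": "a"}, {"id": "b"}]') and
      load_boot_entries('{"id": "a"}\n{"id": "b"}\n') yield the same list.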

New components:

    * A new tool 'ukify' to build, measure, and sign Unified Kernel Images
      (UKIs) has been added. This replaces functionality provided by
      'dracut --uefi' and extends it with automatic calculation of PE file
      offsets, insertion of signed PCR policies generated by
      systemd-measure, support for initrd concatenation, signing of the
      embedded Linux image and the combined image with sbsign, and
      heuristics to autodetect the kernel uname and verify the splash
      image.

Changes in systemd and units:

    * A new service type Type=notify-reload is defined. When such a unit is
      reloaded a UNIX process signal (typically SIGHUP) is sent to the main
      service process. The manager will then wait until it receives a
      "RELOADING=1" followed by a "READY=1" notification from the unit as
      response (via sd_notify()). Otherwise, this type is the same as
      Type=notify. A new setting ReloadSignal= may be used to change the
      signal to send from the default of SIGHUP.

      user@.service, systemd-networkd.service, systemd-udevd.service, and
      systemd-logind have been updated to this type.

    * Initrd environments which are not on a pure memory file system (e.g.
      overlayfs combination as opposed to tmpfs) are now supported. With
      this change, during the initrd → host transition ("switch root")
      systemd will erase all files of the initrd only when the initrd is
      backed by a memory file system such as tmpfs.

    * New per-unit MemoryZSwapMax= option has been added to configure
      memory.zswap.max cgroup properties (the maximum amount of zswap
      used).

    * A new LogFilterPatterns= option has been added for units. It may be
      used to specify accept/deny regular expressions for log messages
      generated by the unit, that shall be enforced by systemd-journald.
      Rejected messages are neither stored in the journal nor forwarded.
      This option may be used to suppress noisy or uninteresting messages
      from units.

    * The manager has a new
      org.freedesktop.systemd1.Manager.GetUnitByPIDFD() D-Bus method to
      query process ownership via a PIDFD, which is more resilient against
      PID recycling issues.

    * Scope units now support OOMPolicy=. Login session scopes default to
      OOMPolicy=continue, allowing login scopes to survive the OOM killer
      terminating some processes in the scope.

    * systemd-fstab-generator now supports x-systemd.makefs option for
      /sysroot/ (in the initrd).

    * The maximum rate at which daemon reloads are executed can now be
      limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
      options. (Or the equivalent on the kernel command line:
      systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=). In
      addition, systemd now logs the originating unit and PID when a reload
      request is received over D-Bus.

    * When enabling a swap device systemd will now reinitialize the device
      when the page size of the swap space does not match the page size of
      the running kernel. Note that this requires the 'swapon' utility to
      provide the '--fixpgsz' option, as implemented by util-linux, and it
      is not supported by busybox at the time of writing.

    * systemd now executes generator programs in a mount namespace
      "sandbox" with most of the file system read-only and write access
      restricted to the output directories, and with a temporary /tmp/
      mount provided. This provides a safeguard against programming errors
      in the generators, but also fixes here-docs in shells, which
      previously didn't work in early boot when /tmp/ wasn't available
      yet. (This feature has no security implications, because the code is
      still privileged and can trivially exit the sandbox.)

    * The system manager will now parse a new "vmm.notify_socket"
      system credential, which may be supplied to a VM via SMBIOS. If
      found, the manager will send a "READY=1" notification on the
      specified socket after boot is complete. This allows readiness
      notification to be sent from a VM guest to the VM host over a VSOCK
      socket.

    * The sample PAM configuration file for user@.service now
      includes a call to pam_namespace. This puts children of user@.service
      in the expected namespace. (Many distributions replace their file
      with something custom, so this change has limited effect.)

    * A new environment variable $SYSTEMD_DEFAULT_MOUNT_RATE_LIMIT_BURST
      can be used to override the mount units' burst rate limit for
      parsing '/proc/self/mountinfo', which was introduced in v249.
      Defaults to 5.

    * Drop-ins for init.scope changing control group resource limits are
      now applied, while they were previously ignored.

    * New build-time configuration options '-Ddefault-timeout-sec=' and
      '-Ddefault-user-timeout-sec=' have been added, to let distributions
      choose the default timeout for starting/stopping/aborting system and
      user units respectively.

    * Service units gained a new setting OpenFile= which may be used to
      open arbitrary files in the file system (or connect to arbitrary
      AF_UNIX sockets in the file system), and pass the open file
      descriptor to the invoked process via the usual file descriptor
      passing protocol. This is useful to give unprivileged services access
      to select files which have restrictive access modes that would
      normally not allow this. It's also useful in case RootDirectory= or
      RootImage= is used to allow access to files from the host environment
      (which is after all not visible from the service if these two options
      are used.)
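
      The Type=notify-reload handshake described earlier in this section
      can be sketched from the service's side. The entry only names
      "RELOADING=1" and "READY=1"; the additional MONOTONIC_USEC= field is
      our understanding of sd_notify(3) and should be treated as an
      assumption. The notify and reload_config callbacks are hypothetical
      and supplied by the caller.

```python
import signal
import time


def reload_messages(monotonic_usec=None):
    """Return the two notifications to send around a configuration reload."""
    if monotonic_usec is None:
        monotonic_usec = time.monotonic_ns() // 1000
    # Assumption: RELOADING=1 is accompanied by the current monotonic time.
    return [f"RELOADING=1\nMONOTONIC_USEC={monotonic_usec}", "READY=1"]


def handle_reload(notify, reload_config):
    """Announce the reload, perform it, then report readiness again."""
    begin, done = reload_messages()
    notify(begin)
    reload_config()
    notify(done)


def install_reload_handler(notify, reload_config):
    """Run handle_reload() whenever the default reload signal arrives."""
    signal.signal(signal.SIGHUP,
                  lambda signum, frame: handle_reload(notify, reload_config))
```

      The notify callback would send each string as one datagram to the
      socket named in $NOTIFY_SOCKET.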

Changes in udev:

    * The new net naming scheme "v253" has been introduced. In the new
      scheme, ID_NET_NAME_PATH is also set for USB devices not connected via
      a PCI bus. This extends the coverage of predictable interface names
      in some embedded systems.

      The "amba" bus path is now included in ID_NET_NAME_PATH, resulting in
      a more informative path on some embedded systems.

    * Partition block devices will now also get symlinks in
      /dev/disk/by-diskseq/<seq>-part<n>, which may be used to reference
      block device nodes via the kernel's "diskseq" value. Previously those
      symlinks were only created for the main block device.

    * A new operator '-=' is supported for SYMLINK variables. This allows
      symlinks to be unconfigured even if an earlier rule added them.

    * 'udevadm trigger --settle' now also works for network devices
      that are being renamed.

Changes in sd-boot, bootctl, and the Boot Loader Specification:

    * systemd-boot now passes its random seed directly to the kernel's RNG
      via the LINUX_EFI_RANDOM_SEED_TABLE_GUID configuration table, which
      means the RNG gets seeded very early in boot before userspace has
      started.

    * systemd-boot will pass a disk-backed random seed – even when secure
      boot is enabled – if it can additionally get a random seed from EFI
      itself (via EFI's RNG protocol), or a prior seed in
      LINUX_EFI_RANDOM_SEED_TABLE_GUID from a preceding bootloader.

    * systemd-boot-system-token.service was renamed to
      systemd-boot-random-seed.service and extended to always save a random
      seed to ESP on every boot when a compatible boot loader is used. This
      allows a refreshed random seed to be used in the boot loader.

    * systemd-boot handles various seed inputs using a domain- and
      field-separated hashing scheme.
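
      The general idea of domain and field separation can be sketched as
      below. This is not the scheme systemd-boot implements; SHA-256, the
      64-bit length prefix and the helper name separated_hash() are
      assumptions chosen only to show why separation matters.

```python
import hashlib
import struct


def separated_hash(domain, fields):
    """SHA-256 over a domain label plus length-prefixed fields."""
    h = hashlib.sha256()
    for part in [domain, *fields]:
        # A fixed-width length prefix marks where each part ends, so
        # different splits of the same bytes cannot collide.
        h.update(struct.pack("<Q", len(part)))
        h.update(part)
    return h.digest()
```

      With this construction, hashing the fields (b"ab", b"c") gives a
      different digest than (b"a", b"bc"), and the same fields hashed
      under a different domain label give a different digest again.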

    * systemd-boot's 'random-seed-mode' option has been removed. A system
      token is now always required to be present for random seeds to be
      used.

    * systemd-boot now supports being loaded from other locations than the
      ESP, for example for direct kernel boot under QEMU or when embedded
      into the firmware.

    * systemd-boot now parses SMBIOS information to detect
      virtualization. This information is used to skip some warnings which
      are not useful in a VM and to conditionalize other aspects of
      behaviour.

    * systemd-boot now supports a new 'if-safe' mode that will perform UEFI
      Secure Boot automated certificate enrollment from the ESP only if it
      is considered 'safe' to do so. At the moment 'safe' means running in
      a virtual machine.

    * systemd-stub now processes random seeds in the same way as
      systemd-boot already does, in case a unified kernel image is being
      used from a different bootloader than systemd-boot, or without any
      boot loader at all.

    * bootctl will now generate a system token on all EFI systems, even
      virtualized ones, and is activated in the case that the system token
      is missing from either sd-boot or sd-stub booted systems.

    * bootctl now implements two new verbs: 'kernel-identify' prints the
      type of a kernel image file, and 'kernel-inspect' provides
      information about the embedded command line and kernel version of
      UKIs.

    * bootctl now honours $KERNEL_INSTALL_CONF_ROOT with the same meaning
      as for kernel-install.

    * The JSON output of "bootctl list" will now contain two more fields:
      isDefault and isSelected are boolean fields set to true on the
      default and currently booted boot menu entries.
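
      A consumer of the JSON output might use the two fields as in the
      following Python sketch. The sample document is hypothetical and
      heavily trimmed; in particular the "id" key and its values are
      assumptions made for illustration, only isDefault and isSelected
      are taken from the description above.

```python
import json

# Hypothetical sample; real output carries more fields per entry.
SAMPLE = """
[
  {"id": "old.conf", "isDefault": false, "isSelected": true},
  {"id": "new.conf", "isDefault": true, "isSelected": false}
]
"""


def pick(entries, flag):
    """Return the ids of entries whose boolean `flag` is true."""
    return [e.get("id") for e in entries if e.get(flag) is True]


entries = json.loads(SAMPLE)
default_ids = pick(entries, "isDefault")
selected_ids = pick(entries, "isSelected")
```

      Comparing the two lists is one way to notice that the running
      system was booted from a non-default entry.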

    * bootctl gained a new verb "unlink" for removing a boot loader entry
      type #1 file from disk in a safe and robust way.

    * bootctl also gained a new verb "cleanup" that automatically removes
      all files from the ESP's and XBOOTLDR's "entry-token" directory that
      are not referenced anymore by any installed Type #1 boot loader
      specification entry. This is particularly useful in environments where
      a large number of entries reference the same or partly the same
      resources (for example, for snapshot-based setups).

Changes in kernel-install:

    * A new "installation layout" can be configured as layout=uki. With
      this setting, a Boot Loader Specification Type #1 entry will not be
      created.  Instead, a new kernel-install plugin 90-uki-copy.install
      will copy any .efi files from the staging area into the boot
      partition. A plugin to generate the UKI .efi file must be provided
      separately.

Changes in systemctl:

    * 'systemctl reboot' has dropped support for accepting a positional
      argument as the argument to the reboot(2) syscall. Please use the
      --reboot-argument= option instead.

    * 'systemctl disable' will now warn when called on units without
      install information. A new --no-warn option has been added that
      silences this warning.

    * New option '--drop-in=' can be used to tell 'systemctl edit' the name
      of the drop-in to edit. (Previously, 'override.conf' was always
      used.)

    * 'systemctl list-dependencies' now respects --type= and --state=.

    * 'systemctl kexec' now supports XEN VMM environments.

    * 'systemctl edit' will now tell the invoked editor to jump into the
      first line with actual unit file data, skipping over synthesized
      comments.

Changes in systemd-networkd and related tools:

    * The [DHCPv4] section in .network file gained new SocketPriority=
      setting that assigns the Linux socket priority used by the DHCPv4 raw
      socket. This may be used in conjunction with the
      EgressQOSMaps= setting in [VLAN] section of .netdev file to send the
      desired ethernet 802.1Q frame priority for DHCPv4 initial
      packets. This cannot be achieved with netfilter mangle tables because
      of the raw socket bypass.

    * The [DHCPv4] and [IPv6AcceptRA] sections in .network file gained a
      new QuickAck= boolean setting that enables the TCP quick ACK mode for
      the routes configured by the acquired DHCPv4 lease or received router
      advertisements (RAs).

    * The RouteMetric= option (for DHCPv4, DHCPv6, and IPv6 advertised
      routes) now accepts three values, for high, medium, and low preference
      of the router (which can be set with the RouterPreference= setting).

    * systemd-networkd-wait-online now supports matching via alternative
      interface names.

    * The [DHCPv6] section in .network file gained new SendRelease=
      setting which enables the DHCPv6 client to send release when
      it stops. This is the analog of the [DHCPv4] SendRelease= setting.
      It is enabled by default.

    * If the Address= setting in the [Network] or [Address] section of a
      .network file is specified without a prefix length, systemd-networkd
      now assumes /32 for IPv4 or /128 for IPv6 addresses.
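
      The resulting defaults match what Python's ipaddress module does for
      an address without a prefix, which makes for a compact illustration.
      The helper name with_default_prefix() is hypothetical; this shows
      the semantics only, not how systemd-networkd parses the setting.

```python
import ipaddress


def with_default_prefix(address):
    """Return address/prefix, assuming a host prefix when none is given."""
    # ip_interface() uses /32 (IPv4) or /128 (IPv6) when no prefix is present.
    return str(ipaddress.ip_interface(address))
```

      An explicit prefix length such as "192.0.2.10/24" is passed through
      unchanged.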

    * networkctl shows network and link file dropins in status output.

Changes in systemd-dissect:

    * systemd-dissect gained a new option --list, to print the paths of
      all files and directories in a DDI.

    * systemd-dissect gained a new option --mtree, to generate a file
      manifest compatible with BSD mtree(5) of a DDI.

    * systemd-dissect gained a new option --with, to execute a command with
      the specified DDI temporarily mounted and used as working
      directory. This is for example useful to convert a DDI to "tar"
      simply by running it within a "systemd-dissect --with" invocation.

    * systemd-dissect gained a new option --discover, to search for
      Discoverable Disk Images (DDIs) in well-known directories of the
      system. This will list machine, portable service and system extension
      disk images.

    * systemd-dissect now understands 2nd stage initrd images stored as a
      Discoverable Disk Image (DDI).

    * systemd-dissect will now display the main UUID of GPT DDIs (i.e. the
      disk UUID stored in the GPT header) among the other data it can show.

    * systemd-dissect gained a new --in-memory switch to operate on an
      in-memory copy of the specified DDI file. This is useful to access a
      DDI with write access without persisting any changes. It's also
      useful for accessing a DDI without keeping the originating file
      system busy.

    * The DDI dissection logic will now automatically detect the intended
      sector size of disk images stored in files, based on the GPT
      partition table arrangement. Loopback block devices for such DDIs
      will then be configured automatically for the right sector size. This
      is useful to make dealing with modern 4K sector size DDIs fully
      automatic. The systemd-dissect tool will now show the detected sector
      size among the other DDI information in its output.
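
      One plausible way to derive the sector size from the GPT arrangement
      is to look for the GPT header signature at LBA 1, whose byte offset
      equals the sector size. The Python sketch below works under that
      assumption; the candidate sizes and the name probe_sector_size() are
      illustrative and not taken from the systemd sources.

```python
import io

GPT_SIGNATURE = b"EFI PART"
CANDIDATE_SECTOR_SIZES = (512, 1024, 2048, 4096)


def probe_sector_size(image):
    """Return the sector size whose LBA 1 holds a GPT header, or None."""
    matches = []
    for size in CANDIDATE_SECTOR_SIZES:
        image.seek(size)  # LBA 1 starts one sector into the image
        if image.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
            matches.append(size)
    # Treat zero or several matches as "unknown" rather than guessing.
    return matches[0] if len(matches) == 1 else None


# In-memory stand-in for a disk image with a header at byte offset 4096.
demo = io.BytesIO(bytes(4096) + GPT_SIGNATURE + bytes(4096))
demo_size = probe_sector_size(demo)
```

      Any seekable binary file object can be passed in place of the
      in-memory stand-in.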

Changes in systemd-repart:

    * systemd-repart gained new options --include-partitions= and
      --exclude-partitions= to filter operation on partitions by type UUID.
      This allows systemd-repart to be used to build images in which the
      type of one partition is set based on the contents of another
      partition (for example when the boot partition shall include a verity
      hash of the root partition).

    * systemd-repart also gained a --defer-partitions= option that is
      similar to --exclude-partitions=, except that the size of a deferred
      partition is still taken into account when sizing partitions, even
      though the partition itself is not populated.

    * systemd-repart gained a new --sector-size= option to specify what
      sector size should be used when an image is created.

    * systemd-repart now supports generating erofs file systems via
      CopyFiles= (a read-only file system similar to squashfs).

    * The Minimize= option was extended to accept "best" (which means the
      most minimal image possible, but may require multiple attempts) and
      "guess" (which means a reasonably small image).

    * The systemd-growfs binary now comes with a regular unit file template
      systemd-growfs@.service which can be instantiated directly for any
      desired file system. (Previously, the unit was generated dynamically
      by various generators, but no regular unit file template was
      available.)

Changes in journal tools:

    * Various systemd tools will append extra fields to log messages when
      in debug mode, or when SYSTEMD_ENABLE_LOG_CONTEXT=1 is set. Currently
      this includes information about D-Bus messages when sd-bus is used,
      e.g. DBUS_SENDER=, DBUS_DESTINATION=, and DBUS_PATH=, and information
      about devices when sd-device is used, e.g. DEVNAME= and DRIVER=.
      Details of what is logged and when are subject to change.

    * The systemd-journald-audit.socket can now be disabled via the usual
      "systemctl disable" mechanism to stop collection of audit
      messages. Please note that it is not enabled statically anymore and
      must be handled by the preset/enablement logic in package
      installation scripts.

    * New options MaxUse=, KeepFree=, MaxFileSize=, and MaxFiles= can
      be used to curtail disk use by systemd-journal-remote. This is
      similar to the options supported by systemd-journald.

Changes in systemd-cryptenroll, systemd-cryptsetup, and related tools:

    * When enrolling new keys systemd-cryptenroll now supports unlocking
      via FIDO2 tokens (option --unlock-fido2-device=). Previously, a
      password was strictly required to be specified.

    * systemd-cryptsetup now supports pre-flight requests for FIDO2 tokens
      (except for tokens with user verification, UV) to identify tokens
      before authentication. Multiple FIDO2 tokens can now be enrolled at
      the same time, and systemd-cryptsetup will automatically select one
      that corresponds to one of the available LUKS key slots.

    * systemd-cryptsetup now supports new options tpm2-measure-bank= and
      tpm2-measure-pcr= in crypttab(5). These allow specifying the TPM2 PCR
      bank and number into which the volume key should be measured. This is
      automatically enabled for the encrypted root volume discovered and
      activated by systemd-gpt-auto-generator.

    * systemd-gpt-auto-generator mounts the ESP and XBOOTLDR partitions with
      "noexec,nosuid,nodev".

    * systemd-gpt-auto-generator will now honour the rootfstype= and
      rootflags= kernel command line switches for root file systems it
      discovers, to match behaviour in case an explicit root fs is
      specified via root=.

    * systemd-pcrphase gained new options --machine-id and --file-system=
      to measure the machine-id and mount point information into PCR 15. New
      service unit files systemd-pcrmachine.service and
      systemd-pcrfs@.service have been added that invoke the tool with
      these switches during early boot.

    * systemd-pcrphase gained a --graceful switch that makes it exit cleanly
      with a success exit code even if no TPM device is detected.

    * systemd-cryptenroll now stores the user-supplied PIN with a salt,
      making it harder to brute-force.
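
      The effect of salting can be illustrated with a generic key
      derivation. The Python sketch below uses PBKDF2-HMAC-SHA256 with a
      random salt; the function name salted_pin(), the salt length and the
      iteration count are assumptions and do not describe the exact
      parameters systemd-cryptenroll uses.

```python
import hashlib
import os


def salted_pin(pin, salt=None, iterations=10_000):
    """Derive a key from a PIN with PBKDF2-HMAC-SHA256 and a random salt."""
    salt = os.urandom(32) if salt is None else salt
    key = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, iterations)
    return salt, key
```

      Because the salt differs per enrollment, the same PIN yields
      different derived values, so precomputed tables of common PINs do
      not carry over from one enrollment to another.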

Changes in other tools:

    * systemd-homed gained support for luksPbkdfForceIterations (the
      intended number of iterations for the PBKDF operation on LUKS).

    * Environment variables $SYSTEMD_HOME_MKFS_OPTIONS_BTRFS,
      $SYSTEMD_HOME_MKFS_OPTIONS_EXT4, and $SYSTEMD_HOME_MKFS_OPTIONS_XFS
      may now be used to specify additional arguments for mkfs when
      systemd-homed formats a file system.

    * systemd-hostnamed now exports the contents of
      /sys/class/dmi/id/bios_vendor and /sys/class/dmi/id/bios_date via two
      new D-Bus properties: FirmwareVendor and FirmwareDate. This allows
      unprivileged code to access those values.

      systemd-hostnamed also exports the SUPPORT_END= field from
      os-release(5) as OperatingSystemSupportEnd. hostnamectl makes use of
      this to show the status of the installed system.

    * systemd-measure gained an --append= option to sign multiple phase
      paths with different signing keys. This allows secrets to be
      accessible only in certain parts of the boot sequence. Note that
      'ukify' provides similar functionality in a more accessible form.

    * systemd-timesyncd will now write a structured log message with
      MESSAGE_ID set to SD_MESSAGE_TIME_BUMP when it bumps the clock based
      on an on-disk timestamp, similarly to what it did when reaching
      synchronization via NTP.

    * systemd-timesyncd will now update the on-disk timestamp file on each
      boot at least once, making it more likely that the system time
      increases in subsequent boots.

    * systemd-vconsole-setup gained support for system/service credentials:
      vconsole.keymap/vconsole.keymap_toggle and
      vconsole.font/vconsole.font_map/vconsole.font_unimap are analogous
      to the similarly-named options in vconsole.conf.

    * systemd-localed will now save the XKB keyboard configuration to
      /etc/vconsole.conf, and also read it from there with a higher
      preference than the /etc/X11/xorg.conf.d/00-keyboard.conf config
      file. Previously, this information was stored in the former file in
      converted form, and only in the latter file in the original form. Tools
      which want to access keyboard configuration can now do so from a
      standard location.

    * systemd-resolved gained support for configuring the nameservers and
      search domains via kernel command line (nameserver=, domain=) and
      credentials (network.dns, network.search_domains).

    * systemd-resolved will now synthesize host names for the DNS stub
      addresses it supports. Specifically when "_localdnsstub" is resolved,
      127.0.0.53 is returned, and if "_localdnsproxy" is resolved
      127.0.0.54 is returned.

    * systemd-notify will now send a "RELOADING=1" notification when called
      with --reloading, and "STOPPING=1" when called with --stopping. This
      can be used to implement notifications from units where it's easier
      to call a program than to use the sd-daemon library.
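
      For comparison, this is roughly what sending such a notification
      directly looks like. The Python sketch assumes $NOTIFY_SOCKET names
      an AF_UNIX datagram socket (with a leading "@" denoting the abstract
      namespace); notify() is a hypothetical helper, not the sd-daemon
      API, and other transports are not handled.

```python
import os
import socket


def notify(state, environ=os.environ):
    """Send a state string such as "RELOADING=1" to $NOTIFY_SOCKET."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a manager that wants notifications
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

      A unit's reload or stop helper could call notify("RELOADING=1") or
      notify("STOPPING=1") in the same way.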

    * systemd-analyze's 'plot' command can now output its information in
      JSON, controlled via the --json= switch. Also, new --table and
      --no-legend options have been added.

    * 'machinectl enable' will now automatically enable machines.target
      unit in addition to adding the machine unit to the target.

      Similarly, 'machinectl start|stop' gained a --now option to enable or
      disable the machine unit when starting or stopping it.

    * systemd-sysusers will now create /etc/ if it is missing.

    * systemd-sleep 'HibernateDelaySec=' setting is changed back to
      pre-v252's behaviour, and a new 'SuspendEstimationSec=' setting is
      added to provide the new initial value for the new automated battery
      estimation functionality. If 'HibernateDelaySec=' is set to any value,
      the automated estimate (and thus the automated hibernation on low
      battery to avoid data loss) functionality will be disabled.

    * Default tmpfiles.d/ configuration will now automatically create
      credentials storage directory '/etc/credstore/' with the appropriate,
      secure permissions. If '/run/credstore/' exists, its permissions will
      be fixed too in case they are not correct.

Changes in libsystemd and shared code:

    * sd-bus gained new convenience functions sd_bus_emit_signal_to(),
      sd_bus_emit_signal_tov(), and sd_bus_message_new_signal_to().

    * sd-id128 functions now return -EUCLEAN (instead of -EIO) when the
      128bit ID in files such as /etc/machine-id has an invalid
      format. They also accept NULL as output parameter in more places,
      which is useful when the caller only wants to validate the inputs and
      does not need the output value.

    * sd-login gained new functions sd_pidfd_get_session(),
      sd_pidfd_get_owner_uid(), sd_pidfd_get_unit(),
      sd_pidfd_get_user_unit(), sd_pidfd_get_slice(),
      sd_pidfd_get_user_slice(), sd_pidfd_get_machine_name(), and
      sd_pidfd_get_cgroup(), that are analogous to sd_pid_get_*(),
      but accept a PIDFD instead of a PID.

    * sd-path (and systemd-path) now export four new paths:
      SD_PATH_SYSTEMD_SYSTEM_ENVIRONMENT_GENERATOR,
      SD_PATH_SYSTEMD_USER_ENVIRONMENT_GENERATOR,
      SD_PATH_SYSTEMD_SEARCH_SYSTEM_ENVIRONMENT_GENERATOR, and
      SD_PATH_SYSTEMD_SEARCH_USER_ENVIRONMENT_GENERATOR.

    * sd_notify() now supports AF_VSOCK as transport for notification
      messages (in addition to the existing AF_UNIX support). This is
      enabled if $NOTIFY_SOCKET is set in a "vsock:CID:port" format.
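
      A client that implements the protocol itself would need to tell the
      two address forms apart. The following Python sketch does this for
      the "vsock:CID:port" format named above; parse_notify_socket() is a
      hypothetical helper and everything that does not start with
      "vsock:" is simply treated as an AF_UNIX address.

```python
def parse_notify_socket(value):
    """Classify $NOTIFY_SOCKET as ("vsock", cid, port) or ("unix", path)."""
    if value.startswith("vsock:"):
        # Exactly three colon-separated parts are expected here;
        # anything else raises ValueError.
        _, cid, port = value.split(":")
        return ("vsock", int(cid), int(port))
    return ("unix", value)
```

      On Linux, the (cid, port) pair could then serve as the address for
      a socket created with socket.AF_VSOCK.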

    * Detection of chroot() environments now works if /proc/ is not
      mounted.  This affects systemd-detect-virt --chroot, but also means
      that systemd tools will silently skip various operations in such an
      environment.

    * "Lockheed Martin Hardened Security for Intel Processors" (HS SRE)
      virtualization is now detected.

Changes in the build system:

    * Standalone variants of systemd-repart and systemd-shutdown may now be
      built (if -Dstandalone=true).

    * systemd-ac-power has been moved from /usr/lib/ to /usr/bin/, to, for
      example, allow scripts to conditionalize execution on AC power
      supply.

    * The libp11kit library is now loaded through dlopen(3).

Changes in the documentation:

    * Specifications that are not closely tied to systemd have moved to
      https://uapi-group.org/specifications/: the Boot Loader Specification
      and the Discoverable Partitions Specification.

    Contributions from: 김인수, 13r0ck, Aidan Dang, Alberto Planas,
    Alvin Šipraga, Andika Triwidada, AndyChi, angus-p, Anita Zhang,
    Antonio Alvarez Feijoo, Arsen Arsenović, asavah, Benjamin Fogle,
    Benjamin Tissoires, berenddeschouwer, BerndAdameit,
    Bernd Steinhauser, blutch112, cake03, Callum Farmer, Carlo Teubner,
    Charles Hardin, chris, Christian Brauner, Christian Göttsche,
    Cristian Rodríguez, Daan De Meyer, Dan Streetman, DaPigGuy,
    Darrell Kavanagh, David Tardon, dependabot[bot], Dirk Su,
    Dmitry V. Levin, drosdeck, Edson Juliano Drosdeck, edupont,
    Eric DeVolder, Erik Moqvist, Evgeny Vereshchagin, Fabian Gurtner,
    Felix Riemann, Franck Bui, Frantisek Sumsal, Geert Lorang,
    Gerd Hoffmann, Gio, Hannoskaj, Hans de Goede, Hugo Carvalho,
    igo95862, Ilya Leoshkevich, Ivan Shapovalov, Jacek Migacz,
    Jade Lovelace, Jan Engelhardt, Jan Janssen, Jan Macku, January,
    Jason A. Donenfeld, jcg, Jean-Tiare Le Bigot, Jelle van der Waa,
    Jeremy Linton, Jian Zhang, Jiayi Chen, Jia Zhang, Joerg Behrmann,
    Jörg Thalheim, Joshua Goins, joshuazivkovic, Joshua Zivkovic,
    Kai-Chuan Hsieh, Khem Raj, Koba Ko, Lennart Poettering, lichao,
    Li kunyu, Luca Boccassi, Luca BRUNO, Ludwig Nussel,
    Łukasz Stelmach, Lycowolf, marcel151, Marcus Schäfer, Marek Vasut,
    Mark Laws, Michael Biebl, Michał Kotyla, Michal Koutný,
    Michal Sekletár, Mike Gilbert, Mike Yuan, MkfsSion, ml,
    msizanoen1, mvzlb, MVZ Ludwigsburg, Neil Moore, Nick Rosbrook,
    noodlejetski, Pasha Vorobyev, Peter Cai, p-fpv, Phaedrus Leeds,
    Philipp Jungkamp, Quentin Deslandes, Raul Tambre, Ray Strode,
    reuben olinsky, Richard E. van der Luit, Richard Phibel,
    Ricky Tigg, Robin Humble, rogg, Rudi Heitbaum, Sam James,
    Samuel Cabrero, Samuel Thibault, Siddhesh Poyarekar, Simon Brand,
    Space Meyer, Spindle Security, Steve Ramage, Takashi Sakamoto,
    Thomas Haller, Tonći Galić, Topi Miettinen, Torsten Hilbrich,
    Tuetuopay, uerdogan, Ulrich Ölmann, Valentin David,
    Vitaly Kuznetsov, Vito Caputo, Waltibaba, Will Fancher,
    William Roberts, wouter bolsterlee, Youfu Zhang, Yu Watanabe,
    Zbigniew Jędrzejewski-Szmek, Дамјан Георгиевски,
    наб

    — Warsaw, 2023-02-15