Pre-RFC: Systemd Hardening

hauleth · February 14, 2024, 12:52pm

Summary

Provide options for simplified hardening of systemd services.

Motivation

One of the most basic principles in information security is the Principle of Least
Privilege, which states that each module should have only as much access to
the system as it needs to function properly. However, that is not
really the case for many services in Nixpkgs. Here is an example excerpt from
systemd-analyze security run on one of my NixOS deployments:

UNIT                                  EXPOSURE PREDICATE HAPPY
dbus.service                               9.6 UNSAFE    :-{
dhcpcd.service                             9.6 UNSAFE    :-{
emergency.service                          9.5 UNSAFE    :-{
getty@tty1.service                         9.6 UNSAFE    :-{
go-autoconfig.service                      1.2 OK        :-)
logrotate.service                          9.6 UNSAFE    :-{
miniflux.service                           1.2 OK        :-)
netdata.service                            7.2 MEDIUM    :-|
nginx.service                              1.6 OK        :-)
nix-daemon.service                         9.6 UNSAFE    :-{
nix-optimise.service                       9.6 UNSAFE    :-{
nscd.service                               8.2 EXPOSED   :-(
postgresql.service                         1.5 OK        :-)
reload-systemd-vconsole-setup.service      9.6 UNSAFE    :-{
rescue.service                             9.5 UNSAFE    :-{
soju.service                               1.3 OK        :-)
sshd.service                               9.6 UNSAFE    :-{
stalwart-mail.service                      1.5 OK        :-)
systemd-ask-password-console.service       9.4 UNSAFE    :-{
systemd-ask-password-wall.service          9.4 UNSAFE    :-{
systemd-journald.service                   4.3 OK        :-)
systemd-logind.service                     2.8 OK        :-)
systemd-oomd.service                       1.8 OK        :-)
systemd-rfkill.service                     9.4 UNSAFE    :-{
systemd-timesyncd.service                  2.1 OK        :-)
systemd-udevd.service                      7.0 MEDIUM    :-|
tailscale-nginx-auth.service               1.6 OK        :-)
tailscaled.service                         9.6 UNSAFE    :-{
ubin.service                               1.1 OK        :-)
user@0.service                             9.8 UNSAFE    :-{

As you can see, a lot of services are marked as UNSAFE with a high exposure
score, even though some of them could certainly be locked down further.

As an example, we can look at nix-optimise.service, which:

runs as root (this is actually needed, as root is the owner of /nix/store)
has full access to the whole disk
has full access to the internet
has full access to kernel tunables and logs
can set any executable as SUID/SGID
can run any executable
etc.

All of this can be checked with systemd-analyze security nix-optimise.service.
However, it does not need most of these privileges, and because it runs as root,
an exploitable bug could cause serious damage to the system.

However, the set of rules needed to harden a service is quite long, and
often repeats between different services. For example, here is the set of rules
that I applied to postgresql.service:

# Do not allow changing the personality of the process
LockPersonality = true;

# Limit view into `/home`, `/dev` and extra mounts
PrivateDevices = true;
ProtectHome = true;
PrivateMounts = true;

# Prevent accessing unneeded OS features
ProtectControlGroups = true;
RestrictNamespaces = true;
RestrictRealtime = true;

# Create a private `/tmp` directory whose lifetime is bound to the service
PrivateTmp = true;

# Remove ability to read kernel logs and to modify kernel behaviour, even if
# attacker gets `root` privileges.
ProtectKernelLogs = true;
ProtectKernelModules = true;
ProtectKernelTunables = true;

# Do not allow clock modification
ProtectClock = true;

# Limit view into `/proc` directory to protect other processes
ProtectProc = "invisible";
ProcSubset = "pid";

# Allow creating only internet and Unix sockets
RestrictAddressFamilies = ["AF_INET" "AF_INET6" "AF_UNIX"];
# Allow network traffic only to and from `localhost`
IPAddressAllow = ["localhost"];
IPAddressDeny = ["any"];

# Do not allow executing any file outside of `/nix/store`
NoExecPaths = ["/"];
ExecPaths = ["/nix/store"];

# Postgres does not need any extra capabilities from the system
CapabilityBoundingSet = [""];

# Prevent calling some system calls which are used for advanced system
# maintenance.
SystemCallFilter = [ "@system-service" ];
# Allow only the native syscall ABI, so that the filter cannot be bypassed
# through a secondary ABI (e.g. 32-bit x86 calls on x86-64)
SystemCallArchitectures = "native";
SystemCallErrorNumber = "EPERM";
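
For readers less familiar with the module system, here is a minimal sketch of where such rules live in a NixOS configuration. It repeats only a few of the options listed above and assumes the unit is named postgresql.service, as in the systemd-analyze output; it is an illustration, not the complete rule set.

```nix
# Sketch of a NixOS module; only a few of the options listed above are repeated.
{ ... }:
{
  systemd.services.postgresql.serviceConfig = {
    LockPersonality = true;
    PrivateDevices = true;
    ProtectHome = true;
    PrivateTmp = true;
    ProtectKernelModules = true;
    ProtectClock = true;
    # An empty string resets the list, leaving the bounding set empty.
    CapabilityBoundingSet = [ "" ];
    SystemCallFilter = [ "@system-service" ];
  };
}
```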

Even now, that list of options is not 100% complete. Many of the services in
Nixpkgs modules do not use any of the above rules. In my opinion, there are 3
main reasons for the current state of things:

Lack of knowledge about these facilities among module maintainers.
A lot of repetition of obscure serviceConfig options that need to be applied
to get a reasonable set of rules (sometimes with weird workarounds needed, see
the CapabilityBoundingSet value).
The set of options is subtractive, which means that each option removes
capabilities from the service (and by doing so, increases security). Ideally,
protection layers like these should be additive, which means that by default
a service has minimal privileges, and each new option allows new
behaviours.

The goal of the authors is to provide options that simplify hardening of
services by providing additive aliases for the above configuration.

Detailed design

The goal here is to provide a systemd.services.<name>.harden option set that
would provide an additive security system for services. This would allow
developers implementing modules to secure their services with as little work and
mental overhead as possible.

The initial plan is to have a harden.enable option which locks down most of the
system facilities to a safe (but not paranoid) subset that can later be expanded
with further options.

Most of the options set by this configuration should be marked with mkDefault
to allow altering them manually via serviceConfig. The idea is that this
RFC should not replace the use of security options in serviceConfig, but
rather provide a base for altering them further manually and group common
options together.
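
A rough sketch of how such an option could be implemented, under the assumption that it extends the existing systemd.services submodule (the same pattern the confinement options use). The harden.enable option is the one proposed here and does not exist in NixOS today; only a handful of settings are shown.

```nix
# Hypothetical sketch: `harden.enable` is the option proposed above, not an
# existing NixOS option. Only a handful of the settings are shown.
{ lib, ... }:
{
  options.systemd.services = lib.mkOption {
    # Declaring the option again merges this submodule into every service.
    type = lib.types.attrsOf (lib.types.submodule ({ config, ... }: {
      options.harden.enable =
        lib.mkEnableOption "a baseline set of systemd hardening options";

      config.serviceConfig = lib.mkIf config.harden.enable {
        # mkDefault lets a plain definition in serviceConfig take precedence.
        LockPersonality = lib.mkDefault true;
        PrivateDevices = lib.mkDefault true;
        PrivateTmp = lib.mkDefault true;
        ProtectHome = lib.mkDefault true;
        ProtectClock = lib.mkDefault true;
        ProtectKernelLogs = lib.mkDefault true;
        ProtectKernelModules = lib.mkDefault true;
        ProtectKernelTunables = lib.mkDefault true;
      };
    }));
  };
}
```

Keeping everything behind mkIf means services that do not set the flag are left exactly as they are today.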

Examples and Interactions

With service:

systemd.services.foo = {
  enable = true;
  script = "…";

  harden.enable = true;
};

We would treat it the same as:

systemd.services.foo = {
  enable = true;
  script = "…";

  serviceConfig = {
    RemoveIPC = mkDefault true;
    LockPersonality = mkDefault true;
    PrivateDevices = mkDefault true;
    PrivateUsers = mkDefault true;
    PrivateMounts = mkDefault true;
    RestrictNamespaces = mkDefault true;
    RestrictRealtime = mkDefault true;
    PrivateTmp = mkDefault true;
    ProtectHostname = mkDefault true;
    ProtectHome = mkDefault true;
    ProtectControlGroups = mkDefault true;
    ProtectKernelLogs = mkDefault true;
    ProtectKernelModules = mkDefault true;
    ProtectKernelTunables = mkDefault true;
    ProtectClock = mkDefault true;
    ProtectProc = mkDefault "invisible";
    ProcSubset = "pid";
    RestrictAddressFamilies = ["AF_UNIX"];
    IPAddressAllow = ["localhost"];
    IPAddressDeny = ["any"];
    NoExecPaths = ["/"];
    ExecPaths = ["/nix/store"];
    CapabilityBoundingSet = [""];
    SystemCallFilter = [ "@system-service" ];
    SystemCallArchitectures = [ "native" ];
    SystemCallErrorNumber = mkDefault "EPERM";
  };
};

This results in a fairly secure service with a quite limited view into the
overall system.

This defines a process that:

can access only its own data under /proc
can use only common system calls (as defined by systemd) through the native system
call API; all other calls fail with the EPERM error code
can execute only files found in /nix/store (the executable flag is ignored for
files outside of /nix/store)
can create only Unix sockets, which effectively prohibits any internet
connectivity (Unix sockets are needed for journal integration if the service uses
that, and their attack surface is substantially smaller than that of internet
sockets)
can exchange network traffic only with localhost addresses, even if internet sockets are enabled
can access a private /tmp directory whose lifetime is bound to the lifetime of the service
can see only a limited subset of the /dev directory, restricted to some basic
devices (like stdin)
cannot use any capabilities
cannot change any kernel behaviour, load modules, or access kernel logs
cannot change the system time
cannot view the system /home directory (it is denied any access to users' data)
cannot access realtime kernel features
has limited access to namespacing and control groups
has its kernel IPC objects (such as shared memory and semaphores) removed when
the service stops (this does not apply to services running as root)

This set of options should give any service a rather good grade when analyzed with
systemd-analyze security and should result in a more secure service.

TBD: prepare a set of configuration options for bulk updates of the hardened
options.

If there is a need for fine-grained customisation of the security options, the
operator can still use serviceConfig options to set the required values at a finer
resolution. This should be the preferred way for power users and
high-risk/high-power services like sshd.
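
As a small illustration of that escape hatch, assuming the proposed (hypothetical) harden.enable option sets its values with mkDefault, a plain definition in serviceConfig would take precedence. The service name and the two relaxed settings are only examples.

```nix
# Assumes the proposed (hypothetical) harden.enable option exists.
{ ... }:
{
  systemd.services.foo = {
    harden.enable = true;

    serviceConfig = {
      # Plain definitions have higher priority than mkDefault values.
      PrivateDevices = false;     # e.g. the service talks to real hardware
      ProtectHome = "read-only";  # e.g. the service reads files under /home
    };
  };
}
```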

Drawbacks

The main problem with this design is that it is hard to add new hardening
options in a non-breaking way: services that relied on a newly locked-down
capability would need to be updated to allow that feature again. This also
raises the problem of naming the options and defining their scope.

Alternatives

What other designs have been considered?

We can leave it as is, and then manually adjust each of the services
independently, but that imposes a lot of maintenance and mental burden on the
maintainers of the modules.

What is the impact of not doing this?

Just adding this module will have negligible impact for now.

Prior art

Unresolved questions

Which options and values should be available to expand the set of
capabilities of a service?
How strict should the default locking rules be?

Future work

Ideally, in the future I would like harden.enable to be on by default,
forcing users to define the needed capabilities manually. That would be a huge
breaking change, but it would result in more secure system services.

jtojnar · February 14, 2024, 2:57pm

Another alternative would be working with upstream projects so that all distros can benefit from the shared work. Though maybe it would be nice to work with systemd first so that we do not have to add such long lists everywhere (for an example from an upstream project, see "add various hardenings to the systemd service" by jsegitz · Pull Request #153 · hughsie/colord · GitHub).

Or as a compromise, hardening downstream as a short-term goal but aiming to upstream it eventually.

rhendric · February 14, 2024, 3:33pm

Could there also be a systemd.hardenByDefault boolean that users could set to true in lieu of setting all the individual harden.enable options? That doesn’t seem like it would be that much of an increase in scope but it would be a big factor in how likely I am to take this for a spin while it’s in development.
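
A rough sketch of what such a switch might look like, assuming the harden.enable option from the proposal is declared elsewhere. Both option names are hypothetical; nothing like this exists in NixOS today.

```nix
# Hypothetical: neither systemd.hardenByDefault nor harden.enable exists today.
{ lib, config, ... }:
let
  hardenByDefault = config.systemd.hardenByDefault;
in
{
  options.systemd.hardenByDefault =
    lib.mkEnableOption "hardening for every service unless it opts out";

  options.systemd.services = lib.mkOption {
    type = lib.types.attrsOf (lib.types.submodule {
      # mkDefault keeps a per-service `harden.enable = false;` working.
      config.harden.enable = lib.mkDefault hardenByDefault;
    });
  };
}
```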

andir · February 14, 2024, 4:04pm

Just to provide some prior exploratory work in this area: nixos/systemd: introduce hardening profiles for services · andir/nixpkgs@4d9c0cf · GitHub (read the commit message)

The gist was to have a versioned set of hardening defaults or multiple profiles that can be applied. The reasoning is upgradability: incremental upgrades whenever we gain new hardening capabilities are easier to test than one big-bang upgrade.

I bounced the idea off several people back then. Nobody was really opposed. Some thought a global default would be better. I still believe a versioned / profile based approach will be best to provide (downstream and first party modules) a good hardening story without permanently stepping on your own toes.
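
To make the idea concrete, here is a minimal sketch of versioned profiles. It is not taken from the linked commit; the profile names (v1, v2), their contents, and the service names are made up for illustration.

```nix
# Hypothetical profile names and contents, for illustration only.
let
  profiles = rec {
    v1 = {
      ProtectKernelModules = true;
      ProtectKernelTunables = true;
      ProtectClock = true;
    };
    # A later version extends the earlier one instead of changing it.
    v2 = v1 // {
      ProtectKernelLogs = true;
      ProtectHostname = true;
    };
  };
in
{
  # A service pinned to v1 is unaffected when v2 is introduced.
  systemd.services.foo.serviceConfig = profiles.v1;
  systemd.services.bar.serviceConfig = profiles.v2;
}
```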

Unfortunately I mostly lost interest in contributing to nixpkgs for reasons unrelated to this topic…

hauleth · February 14, 2024, 4:38pm

That could be problematic, as it would break all not-yet-migrated service definitions with weird messages like couldn't open socket 127.0.0.1:2137 or similar. So I would be careful about even providing such an option, as it would really be “here be dragons” territory. Maybe in the future that could be a thing.

hauleth · February 14, 2024, 4:46pm

The problem there is that NixOS modules do not use upstream service definitions; instead, they use custom ones created from Nix expressions. So even upstream fixes would not help us. In addition to that, it is easier to keep the strongest possible set of options locally, as we know what systemd version we are using. Upstream doesn’t know that, so they may be targeting outdated versions of the unit definition. For example, the pull request you pointed out uses CapabilityBoundingSet= with a block list instead of just setting it to an empty allow list, which means that any new capability will not be filtered out by default.
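
The difference between the two styles can be sketched like this. The service names and the blocked capabilities are placeholders, not the values from the linked pull request.

```nix
{ ... }:
{
  # Block list: the "~" prefix drops only the named capabilities. Anything not
  # listed, including capabilities added by future kernels, stays permitted.
  systemd.services.blocklisted.serviceConfig.CapabilityBoundingSet =
    "~CAP_SYS_ADMIN CAP_SYS_PTRACE";

  # Empty allow list: nothing is permitted unless it is added explicitly.
  systemd.services.allowlisted.serviceConfig.CapabilityBoundingSet = [ "" ];
}
```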

emilylange · February 14, 2024, 5:37pm

This is not quite true.

We have a lot of services that re-use upstream definitions by leveraging systemd.packages.

I also want to provide an example from the past regarding hardening:

Back when I helped to maintain nixos/gitea, instance-wide gpg signing was broken due to overly aggressive SystemCallFilter hardening.

Upstream provided no hardening in their provided systemd unit at all.
So all of this was due to hardening in our nixos module definition.

We had tests back then and gitea itself (within our test config) seemed to work fine.

No regressions, no nothing.
Simply because the tests did not cover everything.

Because, well, covering everything in our tests is sadly unrealistic.
Eventually, some user reported that issue and a non-working fix got merged.

Until nixos/gitea: fix commit signing (`gpg`) core dump, add nixos test by emilylange · Pull Request #219073 · NixOS/nixpkgs · GitHub, which also made changes to that specific VM test, so this won’t happen again.

What I want to get across is:

Hardening comes with lots of risks, in that software breaks in extremely subtle ways, or only in specific configs (perfectly supported by upstream) that maintainers are unaware of.

Getting systemd-analyze security to return a happy face is one thing, and fairly easy, but making sure that edge cases, which again one might not be aware of to begin with, keep working is a huge risk in my eyes.

Additionally, a package bump alone might require relaxing hardening at any time.
We would have to revisit and confirm each hardening option each and every time, no matter how trivial a PR might seem.

This further increases workload and overhead.

Don’t get me wrong, hardening is important, and I would love to see it more often.

But I don’t feel like this is time for it right now.

A while ago a (seemingly proprietary) tool went around that tried its best to (statically) analyze compiled code and return a sane list of SystemCallFilter entries.

I would much rather like to explore such tools first, before continuing any further with this (pre-) RFC.

claes · February 14, 2024, 5:46pm

Hardening is hard. I was pondering this problem a while back: some hardening needs are common, but in the end every service is unique. I wanted to learn whether there were approaches that create hardening profiles by monitoring services while they run, to collect information about their normal behaviour. I found a tool called SHH, Systemd Hardening Helper. See systemd hardening made easy with SHH

It illustrates another approach to hardening in general, using strace to monitor system calls under normal use, and then suggest a hardening configuration.

hauleth · February 14, 2024, 7:29pm

I feel that there will never be a time for that. It is something that either will be done at some point or it will not be done ever. And now is as good as any moment in the future.

Unfortunately, the only way to find these edge cases in my opinion is to go to the edge yourself. There is no other realistic way to do so. We can use some tooling to do basic analysis, but it will never be able to find all edge cases on its own.

Not hardening comes with a lot of risks as well. That is exactly the reason why everything described in this pre-RFC is opt-in. I am fully aware that setting harden.enable = true by default would break a shitload of stuff in NixOS modules; that is why it is off by default. It can be switched on by operators who are aware of the requirements. It is also done in a way that is strict enough for most services, but relaxed enough to match the expectations of most services. Systemd’s @system-service meta-group for SystemCallFilter is quite broad; using your example, it allows @memlock calls.
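
As a sketch of how a maintainer could start from that broad group and tighten it per service: the service name is a placeholder, and which groups can safely be subtracted depends entirely on the service in question.

```nix
{ ... }:
{
  systemd.services.foo.serviceConfig = {
    SystemCallFilter = [
      # Start from systemd's broad allow list for ordinary services...
      "@system-service"
      # ...then subtract groups this particular service is known not to need.
      "~@privileged"
      "~@resources"
    ];
    # Return an error code instead of killing the process.
    SystemCallErrorNumber = "EPERM";
  };
}
```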

So while fully hardening everything would be a nice thing, it is not feasible to do globally. Here I 100% agree with you. The goal of this RFC is to provide a set of helpers that make it slightly easier to get started, rather than doing absolutely nothing and leaving services to run willy-nilly with no hardening at all, not even the super basic kind like ProtectKernelModules= or ProtectClock=. Adding the basics of hardening should not require studying man systemd.exec for most basic services like PostgreSQL or the Soju IRC bouncer; it should be provided behind a simple and readable flag. This will not only lower the entry barrier for basic hardening, it will also make it possible to disable hardening when it becomes a problem.

No one ever said that it is. The point isn’t to make it easy, the point is to make it easier, that is

That is super nice tool. I need to study it further, however IMHO that is orthogonal to this proposal, as it can be used as a means to implement hardening on top of this proposal.

aanderse · February 14, 2024, 10:46pm

yes there is - as @jtojnar pointed out having a discussion with upstream projects is a good way… who would understand the requirements and limits of their software better than the people who developed it? even if upstream doesn’t want to adopt hardening changes you may propose to them it doesn’t mean they can’t offer very useful information on hardening their services in ways that won’t break it.

correct - risks that collectively linux sysadmins have lived (and still live) with for decades before systemd became prolific. sure, the landscape has changed and threats have dramatically increased… but let’s not undersell basic user privilege separation and let frowning ascii faces scare us into dramatic and uncalled for action.

overall i really support this effort but i side more with @emilylange wanting to err more on the side of caution here… systemd hardening in nixos has been a real pain for me on occasion because someone didn’t anticipate something. when i’m running a hobby box i can deal with that, but at work it really irks me

i would love if you went ahead and pushed this as an RFC where you presented some serious research on what all of our options are and compared them, how we could possibly roll them out, and how we avoid biting people with breakages. i think it would be important to discuss classifications of software in this RFC - some software is way too generic to effectively be sandboxed more than some trivial options, while others can be fully sandboxed without any concern. etc…

speaking of options… how much research have you done into the confinement options we have? this makes it relatively easy to do some fun things you might be interested in.
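
For reference, a minimal sketch of the existing confinement options on a placeholder service. The service name and the choice of packages are only examples.

```nix
{ pkgs, ... }:
{
  systemd.services.foo = {
    # Placeholder service; only the confinement.* options are the point here.
    serviceConfig.ExecStart = "${pkgs.hello}/bin/hello";

    confinement = {
      # Run the unit in a chroot that contains only the store paths it needs.
      enable = true;
      # Extra store paths to expose in addition to those referenced by the unit.
      packages = [ pkgs.coreutils ];
    };
  };
}
```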

ehmry · March 14, 2025, 9:43am

I agree with the concept here and I don’t think there would be a long term benefit to delaying this work. I would however propose a shift in scope.

Instead of this:

systemd.services.foo = {
  enable = true;
  harden.enable = true;
};

… something like this

services.foo = {
  harden = lib.mkDefault someEnumOrAttrSet;
};

systemd.services.foo = {
  enable = true;
  serviceConfig = (lib.systemd.hardenDefaults config.services.foo.harden) // {
    # …
  };
};
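
A possible shape for such a helper, as a sketch only: lib.systemd.hardenDefaults does not exist in nixpkgs, and the level names ("none", "basic", "strict") and their contents are invented here to show the mapping from intent to concrete options.

```nix
# Hypothetical helper: lib.systemd.hardenDefaults does not exist in nixpkgs.
# The level names and their contents are made up for illustration.
{ lib }:
level:
let
  basic = {
    ProtectKernelModules = lib.mkDefault true;
    ProtectKernelTunables = lib.mkDefault true;
    ProtectClock = lib.mkDefault true;
  };
  strict = basic // {
    PrivateDevices = lib.mkDefault true;
    ProtectHome = lib.mkDefault true;
    RestrictAddressFamilies = lib.mkDefault [ "AF_UNIX" ];
  };
in
{
  none = { };
  inherit basic strict;
}.${level}
```

A different service manager could provide its own mapping for the same level names, which is where the portability benefit would come from.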

Basically if we are going to define a class or classes of restrictions independently of the systemd options, then we should actually define those options in a more abstract scope.

Benefits:

The admin signals their intention before getting into the banal details of what systemd switches to flip.
NixOS can supplement systemd hardening with other tools, like inserting wrappers between systemd and the service.
Alternative service-managers can apply hardening based on intent rather than by examining systemd options.
Warnings and assertions can be raised when a service-manager cannot provide the requested level of hardening.

Drawbacks:

More complicated to implement.
Difficult to consistently map intent with concrete options.
Layering options in multiple stages can be confusing.
Init portability threatens systemd monoculture.

Hardening in the abstract is more work but there is already funding for similar stuff.