Summary
Provide options for simplified hardening of systemd services.
Motivation
One of the most basic principles in information security is the Principle of
Least Privilege, which states that each module should have only as much access
to the system as it needs to function properly. However, that is not really the
case with many services in Nixpkgs. An example excerpt from
systemd-analyze security run on one of my NixOS deployments:
UNIT EXPOSURE PREDICATE HAPPY
dbus.service 9.6 UNSAFE :-{
dhcpcd.service 9.6 UNSAFE :-{
emergency.service 9.5 UNSAFE :-{
getty@tty1.service 9.6 UNSAFE :-{
go-autoconfig.service 1.2 OK :-)
logrotate.service 9.6 UNSAFE :-{
miniflux.service 1.2 OK :-)
netdata.service 7.2 MEDIUM :-|
nginx.service 1.6 OK :-)
nix-daemon.service 9.6 UNSAFE :-{
nix-optimise.service 9.6 UNSAFE :-{
nscd.service 8.2 EXPOSED :-(
postgresql.service 1.5 OK :-)
reload-systemd-vconsole-setup.service 9.6 UNSAFE :-{
rescue.service 9.5 UNSAFE :-{
soju.service 1.3 OK :-)
sshd.service 9.6 UNSAFE :-{
stalwart-mail.service 1.5 OK :-)
systemd-ask-password-console.service 9.4 UNSAFE :-{
systemd-ask-password-wall.service 9.4 UNSAFE :-{
systemd-journald.service 4.3 OK :-)
systemd-logind.service 2.8 OK :-)
systemd-oomd.service 1.8 OK :-)
systemd-rfkill.service 9.4 UNSAFE :-{
systemd-timesyncd.service 2.1 OK :-)
systemd-udevd.service 7.0 MEDIUM :-|
tailscale-nginx-auth.service 1.6 OK :-)
tailscaled.service 9.6 UNSAFE :-{
ubin.service 1.1 OK :-)
user@0.service 9.8 UNSAFE :-{
As you can see, a lot of services are marked as UNSAFE with a high exposure
score, even though some of them could certainly be locked down further.
As an example we can look at nix-optimise.service, which:
- runs as root (this is actually needed, as root is the owner of /nix/store)
- has full access to the whole disk
- has full access to the internet
- has full access to kernel tunables and logs
- can set any executable as SUID/SGID
- can run any executable
- etc.
All of this can be checked with systemd-analyze security nix-optimise.service.
However, it does not need most of these privileges, and because it runs as root,
an exploitable bug in it could cause serious damage to the system.
The set of rules needed to harden a service is quite long, though, and it often
repeats between different services. For example, this is the set of rules that I
applied to postgresql.service:
# Do not allow changing personality of the process
LockPersonality = true;
# Limit view into `/home`, `/dev` and extra mounts
PrivateDevices = true;
ProtectHome = true;
PrivateMounts = true;
# Prevent accessing unneeded OS features
ProtectControlGroups = true;
RestrictNamespaces = true;
RestrictRealtime = true;
# Create private `/tmp` directory whose lifetime is bound to the service
PrivateTmp = true;
# Remove ability to read kernel logs and to modify kernel behaviour, even if
# attacker gets `root` privileges.
ProtectKernelLogs = true;
ProtectKernelModules = true;
ProtectKernelTunables = true;
# Do not allow clock modification
ProtectClock = true;
# Limit view into `/proc` directory to protect other processes
ProtectProc = "invisible";
ProcSubset = "pid";
# Allow creating only internet and Unix sockets
RestrictAddressFamilies = ["AF_INET" "AF_INET6" "AF_UNIX"];
# Allow listening only on `localhost`
IPAddressAllow = ["localhost"];
IPAddressDeny = ["any"];
# Do not allow executing any file outside of `/nix/store`
NoExecPaths = ["/"];
ExecPaths = ["/nix/store"];
# Postgres does not need any extra capabilities from the system
CapabilityBoundingSet = [""];
# Prevent calling some system calls which are used for advanced system
# maintenance.
SystemCallFilter = [ "@system-service" ];
# As filtering works only on x86-64, prevent process from using x86 syscall
# conventions
SystemCallArchitecture = "native";
SystemCallErrorNumber = "EPERM";
Even now, that list of options is not 100% complete. Many of the services in
Nixpkgs modules do not use any of the above rules. In my opinion, there are 3
main reasons for the current state of things:
- Lack of knowledge about these facilities among module maintainers.
- A lot of repetition of obscure serviceConfig options that need to be applied
to get a reasonable set of rules (sometimes with weird workarounds needed, see
the CapabilityBoundingSet value).
- The set of options is subtractive, which means that each option removes
capabilities from the service (and by doing so, increases security). Ideally,
protection layers like these should be additive, which means that by default a
service has minimal privileges, and each new option allows new behaviours.
The goal of the authors is to simplify hardening of services by providing
additive aliases for the above configuration.
Detailed design
The goal here is to provide a systemd.services.<name>.harden option set that
would provide an additive security system for services. This would allow
developers implementing modules to secure their services with as little work and
mental overhead as possible.
The initial plan is to have an option harden.enable which will lock down most of
the system facilities to a safe (but not paranoid) subset that can later be
expanded with further options.
Most of the options set by this configuration should be marked with mkDefault
to allow altering them manually via serviceConfig. The idea is that this RFC
should not replace the usage of security options in serviceConfig, but rather
provide a base for altering them further manually and group common options
together.
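A minimal sketch of how this could be wired up as a NixOS module. It assumes the
harden.enable option name from this proposal, is untested, and sets only an
illustrative handful of options; an actual implementation may look different:

```nix
{ lib, ... }:
let
  inherit (lib) mkDefault mkEnableOption mkIf mkOption types;
in
{
  # Extend the existing per-service submodule with the proposed
  # (not yet existing) `harden` option set.
  options.systemd.services = mkOption {
    type = types.attrsOf (types.submodule ({ config, ... }: {
      options.harden.enable = mkEnableOption "baseline systemd hardening";

      # Only emit the hardening defaults when the service opts in.
      config.serviceConfig = mkIf config.harden.enable {
        # mkDefault keeps each value overridable by a plain
        # definition in the service's own serviceConfig.
        LockPersonality = mkDefault true;
        PrivateDevices = mkDefault true;
        PrivateTmp = mkDefault true;
        ProtectHome = mkDefault true;
        ProtectKernelLogs = mkDefault true;
        ProtectKernelModules = mkDefault true;
        ProtectKernelTunables = mkDefault true;
        ProtectClock = mkDefault true;
      };
    }));
  };
}
```

Declaring the option inside the submodule means every entry of
systemd.services gets its own harden.enable switch, and the defaults are
scoped to the service that enabled it.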
Examples and Interactions
With a service:
systemd.services.foo = {
  enable = true;
  script = "…";
  harden.enable = true;
};
We would treat it the same as:
systemd.services.foo = {
  enable = true;
  script = "…";
  serviceConfig = {
    RemoveIPC = mkDefault true;
    LockPersonality = mkDefault true;
    PrivateDevices = mkDefault true;
    PrivateUsers = mkDefault true;
    PrivateMounts = mkDefault true;
    RestrictNamespaces = mkDefault true;
    RestrictRealtime = mkDefault true;
    PrivateTmp = mkDefault true;
    ProtectHostname = mkDefault true;
    ProtectHome = mkDefault true;
    ProtectControlGroups = mkDefault true;
    ProtectKernelLogs = mkDefault true;
    ProtectKernelModules = mkDefault true;
    ProtectKernelTunables = mkDefault true;
    ProtectClock = mkDefault true;
    ProtectProc = mkDefault "invisible";
    ProcSubset = "pid";
    RestrictAddressFamilies = ["AF_UNIX"];
    IPAddressAllow = ["localhost"];
    IPAddressDeny = ["any"];
    NoExecPaths = ["/"];
    ExecPaths = ["/nix/store"];
    CapabilityBoundingSet = [""];
    SystemCallFilter = [ "@system-service" ];
    SystemCallArchitecture = [ "native" ];
    SystemCallErrorNumber = mkDefault "EPERM";
  };
};
This results in a quite secure service that has a limited view into the overall
system.
This defines that the process:
- can access only its own data under /proc
- can use common system calls (as defined by systemd) and only the native system
call API; all other calls will fail with the EPERM error code
- can execute only files found in /nix/store, even if files elsewhere have the
executable flag set (it will be ignored for files outside of /nix/store)
- can create only Unix sockets, which functionally prohibits any internet
connectivity (Unix sockets are needed for journal integration if the service
uses that; also, their attack surface is substantially smaller than that of
internet sockets)
- can listen only on localhost addresses, even if internet sockets are enabled
- can access a private /tmp directory whose lifetime is bound to the lifetime of
the service
- can see only a limited subset of the /dev directory, restricted to some basic
devices (like stdin)
- cannot use any capabilities
- cannot change any kernel behaviour, load modules, or access kernel logs
- cannot change system time
- cannot view the system /home directory (is denied any access to users' data)
- cannot access realtime kernel features
- has limited access to namespacing and control groups
- all kernel IPC features (like locks) will be removed when the service dies
(does not apply to services running as root)
This set of options should give any service a rather good grade when analyzed
with systemd-analyze security and should result in a more secure service.
TBD: prepare a set of configuration options for bulk updates of the hardened
options.
If there is a need for fine-grained customisation of the security options, the
operator can still use serviceConfig options to set the required values at a
finer resolution. This should be the preferred way for power users and
high-risk/high-power services like the sshd daemon.
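As an illustration of such an override, here is a sketch that assumes the
proposed harden.enable option exists and that the relaxed options were set with
mkDefault, as in the expansion above. The service name foo and the chosen
values are only examples:

```nix
{
  systemd.services.foo = {
    enable = true;
    script = "…";
    harden.enable = true;

    serviceConfig = {
      # A plain definition has higher priority than mkDefault, so these
      # replace the hardened defaults for this one service only.
      ProtectHome = "read-only";
      PrivateDevices = false;
    };
  };
}
```

Everything not mentioned in serviceConfig keeps the hardened default, so the
service only lists the places where it deviates from the baseline.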
Drawbacks
The main problem with this design is that it is harder to add new hardening
options in a non-breaking way, because services that relied on a newly locked
capability would need to be updated to allow that feature again. This also
raises the problem of naming the options and defining their scope.
Alternatives
What other designs have been considered?
We can leave it as is and manually adjust each service independently, but that
imposes a lot of maintenance and mental burden on the maintainers of the
modules.
What is the impact of not doing this?
Just adding this module will have negligible impact for now.
Prior art
- Hardening systemd services
- Systemd Hardening
- ft: add systemd hardening helpers by hauleth · Pull Request #288418 · NixOS/nixpkgs · GitHub
Unresolved questions
- Prepare a set of options and their values that can be used to expand the set
of capabilities of services.
- How strict should the default locking rules be?
Future work
Ideally, in the future I would like to see harden.enable be on by default,
forcing users to define the needed capabilities manually. That would be a huge
breaking change, but it would result in more secure system services.