I’ve been having a strange issue since upgrading a server from 20.03 to 21.05. Initially I was getting read-only filesystem errors with the logs, but I found some docs that mentioned needing to use a new ReadWritePaths option in my config as of 20.09. I fixed that, which brings me to my current issue.
First, I’ve made no serious changes to my config besides the aforementioned ReadWritePaths addition. This setup has been working without issue for the past couple of years, and through at least one upgrade.
Second, the issue only occurs when I start nginx through systemd. If I start nginx manually then all is well and good; that is to say, I know there’s nothing wrong with the nginx.conf that ultimately gets generated by nix on switching.
Here are the relevant details from logs, etc.:
- No errors are output when I start the service as usual
$ systemctl start nginx.service
- After starting the service, running status tells me the workers are exiting
$ systemctl status nginx.service
● nginx.service - Nginx Web Server
Loaded: loaded (/nix/store/cyy4wdgm32v67gpgh6gnhxx8197v15h8-unit-nginx.service/nginx.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2021-06-05 21:38:36 UTC; 1min 56s ago
Process: 395995 ExecStartPre=/nix/store/50difldgpz2h0qc4g4p8mbwkgy5ihzib-unit-script-nginx-pre-start/bin/nginx-pre-start (code=exited, status=0/SUCCESS)
Main PID: 395997 (nginx)
IP: 0B in, 0B out
IO: 0B read, 0B written
Tasks: 3 (limit: 2373)
Memory: 52.6M
CPU: 25.248s
CGroup: /system.slice/nginx.service
├─395997 nginx: master process /nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c /nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf
├─398832 nginx: master process /nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c /nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf
└─398833 nginx: master process /nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c /nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf
Jun 05 21:40:30 hostname systemd-coredump[398798]: Process 398793 (nginx) of user 0 dumped core.
Jun 05 21:40:30 hostname nginx[395997]: 2021/06/05 21:40:30 [alert] 395997#395997: worker process 398793 exited on signal 31 (core dumped)
Jun 05 21:40:31 hostname systemd-coredump[398806]: Process 398804 (nginx) of user 0 dumped core.
Jun 05 21:40:31 hostname nginx[395997]: 2021/06/05 21:40:31 [alert] 395997#395997: worker process 398804 exited on signal 31 (core dumped)
Jun 05 21:40:31 hostname systemd-coredump[398813]: Process 398808 (nginx) of user 0 dumped core.
Jun 05 21:40:31 hostname nginx[395997]: 2021/06/05 21:40:31 [alert] 395997#395997: worker process 398808 exited on signal 31 (core dumped)
Jun 05 21:40:32 hostname systemd-coredump[398828]: Process 398823 (nginx) of user 0 dumped core.
Jun 05 21:40:32 hostname systemd-coredump[398821]: Process 398816 (nginx) of user 0 dumped core.
Jun 05 21:40:32 hostname nginx[395997]: 2021/06/05 21:40:32 [alert] 395997#395997: worker process 398823 exited on signal 31 (core dumped)
Jun 05 21:40:32 hostname nginx[395997]: 2021/06/05 21:40:32 [alert] 395997#395997: worker process 398816 exited on signal 31 (core dumped)
- Here’s what nginx.service looks like (generated by NixOS):
[Unit]
After=network.target
Description=Nginx Web Server
StartLimitIntervalSec=60
[Service]
Environment="LOCALE_ARCHIVE=/nix/store/in621vh2kj0ayqa6qc9pqnjvx6hzj5h5-glibc-locales-2.32-46/lib/locale/locale-archive"
Environment="PATH=/nix/store/a4v1akahda85rl9gfphb07zzw79z8pb1-coreutils-8.32/bin:/nix/store/1hvm45djn8wkfg64gbmlqpfj4dnjh594-findutils-4.7.0/bin:/nix/store/7n3yzh9wza4bdqc04v01xddnfhkrwk2a-gnugrep-3.6/bin:/nix/store/g34ldykl1cal5b9ir3xinnq70m52fcnq-gnused-4.8/bin:/nix/store/r2bw74x7zci7shzxq3cikww9kp1wxc6i-systemd-247.6/bin:/nix/store/a4v1akahda85rl9gfphb07zzw79z8pb1-coreutils-8.32/sbin:/nix/store/1hvm45djn8wkfg64gbmlqpfj4dnjh594-findutils-4.7.0/sbin:/nix/store/7n3yzh9wza4bdqc04v01xddnfhkrwk2a-gnugrep-3.6/sbin:/nix/store/g34ldykl1cal5b9ir3xinnq70m52fcnq-gnused-4.8/sbin:/nix/store/r2bw74x7zci7shzxq3cikww9kp1wxc6i-systemd-247.6/sbin"
Environment="TZDIR=/nix/store/y4j4k0l6w941wriprxz13dhvz896lw3m-tzdata-2020f/share/zoneinfo"
X-StopIfChanged=false
AmbientCapabilities=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_SYS_RESOURCE
CacheDirectory=nginx
CacheDirectoryMode=0750
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_SYS_RESOURCE
ExecReload=/nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c '/nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf' -t
ExecReload=/nix/store/a4v1akahda85rl9gfphb07zzw79z8pb1-coreutils-8.32/bin/kill -HUP $MAINPID
ExecStart=/nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c '/nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf'
ExecStartPre=/nix/store/50difldgpz2h0qc4g4p8mbwkgy5ihzib-unit-script-nginx-pre-start/bin/nginx-pre-start
Group=root
LockPersonality=true
LogsDirectory=nginx
LogsDirectoryMode=0750
MemoryDenyWriteExecute=true
NoNewPrivileges=true
PrivateDevices=true
PrivateMounts=true
PrivateTmp=true
ProcSubset=pid
ProtectClock=true
ProtectControlGroups=true
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectProc=invisible
ProtectSystem=strict
ReadWritePaths=/var/www/
ReadWritePaths=/run/
RemoveIPC=true
Restart=always
RestartSec=10s
RestrictAddressFamilies=AF_UNIX
RestrictAddressFamilies=AF_INET
RestrictAddressFamilies=AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
RuntimeDirectory=nginx
RuntimeDirectoryMode=0750
SystemCallArchitectures=native
SystemCallFilter=~@cpu-emulation @debug @keyring @ipc @mount @obsolete @privileged @setuid
UMask=0027
User=root
I don’t really know much about systemd, so I’m a little confused and lost at the moment. Like I mentioned before, nginx runs without any issues if I stop the service, then take the ExecStart command from the above service file and run it directly.
$ systemctl stop nginx.service
$ /nix/store/z0rqfwaw46hl0snzaiw8wzr1sxbkjqiw-nginx-1.21.0/bin/nginx -c '/nix/store/jfd309xx69g4hs6x4p8fznkjh3lfjy2q-nginx.conf'
At this point, I can access the sites that are served on this machine from any browser.
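Since the manual run works and only the systemd-started workers die, one thing that can be checked is which signal the "exited on signal 31" lines refer to. This is only a minimal sketch, assuming a bash shell on Linux where the kill builtin accepts a signal number after -l; the variable name sig_name is just for illustration.

```shell
# Decode the signal number reported by the nginx master process.
# Assumption: bash on Linux, where `kill -l <number>` prints the signal name
# without the SIG prefix.
sig_name="$(kill -l 31)"
echo "signal 31 is SIG${sig_name}"
```

If this reports SYS, the workers are dying from SIGSYS. That is the signal the kernel delivers when a seccomp filter rejects a system call, so the SystemCallFilter line in the unit above may be worth a closer look. That would also fit with the manual run working, since no such filter is applied there.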
Other details that may or may not be important:
- The server has 2GB of RAM, with 1.4GB available. This hasn’t changed since I spun it up 2 years ago, and has never caused an issue.
- I’ve rebooted the server a couple times since nixos-rebuild switch --upgrade, but no difference in behavior.
- I noticed an issue with fail2ban when I initially upgraded (it failed to start during the upgrade), but since a reboot it has been running without issue.
- No other errors encountered during the upgrade process.
- Server config is fairly basic, no nixops involved. I really only use nginx to proxy to Docker containers and sign certs.
- /var/www is the path used to store pretty much all the certs, logs, and html files. /run/ I only added for the default pidfile, but I don’t think it’s actually needed. I get similar results if this is removed from the ReadWritePaths option in my config.
If anyone has any insight or thoughts, it’d be much appreciated. Otherwise, if I happen to make any headway I’ll update the thread with any relevant info.