nixbld leaving shared memory segments around?

I’m on macOS 13.4 using Nix 2.16.1.

I tried to start a couple of local Postgres servers and got this error:

2023-07-05 16:08:56.418 UTC [6334] FATAL:  could not create shared memory segment: No space left on device
2023-07-05 16:08:56.418 UTC [6334] DETAIL:  Failed system call was shmget(key=64054529, size=56, 03600).
2023-07-05 16:08:56.418 UTC [6334] HINT:  This error does *not* mean that you have run out of disk space.  It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.
	The PostgreSQL documentation contains more information about shared memory configuration.

The default shared memory settings are:

❯ sysctl -a | grep shm
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
kern.sysv.shmall: 1024
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1

Notably, kern.sysv.shmmni (the maximum number of shared memory segment IDs) is only 32.
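To see how close the system is to that limit, here is a minimal sketch that counts the segments currently listed by `ipcs -m` and subtracts them from `kern.sysv.shmmni`. It assumes the macOS `ipcs` format, where every segment row starts with `m`; the function names are hypothetical, and on Linux the corresponding sysctl is `kernel.shmmni` with a different `ipcs` layout.

```shell
#!/usr/bin/env bash
# Count SysV shared memory segments in `ipcs -m`-style text read from stdin.
# Assumes the macOS format, where every segment row begins with "m".
count_shm_segments() {
  awk '$1 == "m" { n++ } END { print n + 0 }'
}

# Print how many segment IDs are still free under kern.sysv.shmmni (macOS only).
shm_ids_remaining() {
  local limit used
  limit=$(sysctl -n kern.sysv.shmmni)
  used=$(ipcs -m | count_shm_segments)
  echo $(( limit - used ))
}
```

If `shm_ids_remaining` prints 0 (or close to it), a new Postgres server will fail with the "No space left on device" error from shmget.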

I took a look, and a huge number of segments were held by the nixbld users:

❯ ipcs -am 
T     ID     KEY        MODE       OWNER    GROUP  CREATOR   CGROUP NATTCH  SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
Shared Memory:
m 1179648 0x0214b953 --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  15127  15127 12:58:11 12:58:20 12:58:11
m 1376257 0x020bc35c --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  78330  78330 11:53:33 11:53:44 11:53:33
m 8323074 0x020bb3da --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  78229  78229 11:53:03 11:53:24 11:53:03
m 2359299 0x0214c7db --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  15249  15249 12:59:35 12:59:49 12:59:35
m 1835012 0x03bb11b6 --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  19581  19581 13:11:05 13:11:07 13:11:05
m  65541 0x03bb2ca9 --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  19716  19716 13:13:28 13:13:32 13:13:28
m  65542 0x03bb476b --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  19878  19878 13:13:42 13:13:44 13:13:42
m  65543 0x03bb61f7 --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  19933  19933 13:13:50 13:13:51 13:13:50
m 196616 0x03bbb394 --rw------- _nixbld1   nixbld _nixbld1   nixbld      0     56  20334  20334 13:16:50 13:16:51 13:16:50
m 2490378 0x03cc92d4 --rw------- _nixbld5   nixbld _nixbld5   nixbld      0     56  69233  69233 16:14:29 16:14:35 16:14:29
m 1376267 0x03cc9007 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  69234  69234 16:14:29 16:14:35 16:14:29
m  65548 0x03ccca36 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  75141  75141 16:19:14 16:19:20 16:19:14
m  65549 0x03ccce65 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  75151  75151 16:19:14 16:19:20 16:19:14
m  65550 0x03cd013d --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  75310  75310 16:19:55 16:20:01 16:19:55
m  65551 0x03cd05aa --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  75320  75320 16:19:55 16:20:01 16:19:55
m  65552 0x03cd4017 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  80293  80293 16:33:44 16:33:50 16:33:44
m  65553 0x03cd4537 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  80292  80292 16:33:44 16:33:50 16:33:44
m  65554 0x03cd78ff --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  80583  80583 16:37:33 16:37:39 16:37:33
m  65555 0x03cd7ef4 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  80593  80593 16:37:33 16:37:39 16:37:33
m  65556 0x03cdb438 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  80980  80980 16:44:53 16:44:59 16:44:53
m  65557 0x03cdbb80 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  80991  80991 16:44:53 16:44:59 16:44:53
m  65558 0x03cdeaf2 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  81137  81137 16:45:18 16:45:24 16:45:18
m  65559 0x03cdf258 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  81150  81150 16:45:18 16:45:23 16:45:18
m  65560 0x03ce22bf --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  81421  81421 16:48:55 16:49:01 16:48:55
m  65561 0x03ce2976 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  81432  81432 16:48:55 16:49:00 16:48:55
m  65562 0x03ce59b1 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  81570  81570 16:49:08 16:49:14 16:49:08
m  65563 0x03ce612f --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  81582  81582 16:49:08 16:49:14 16:49:08
m  65564 0x03ce9309 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  81812  81812 16:51:32 16:51:38 16:51:32
m  65565 0x03ce98a4 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  81822  81822 16:51:32 16:51:38 16:51:32
m  65566 0x03cec9d7 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  81967  81967 16:51:50 16:51:55 16:51:50
m  65567 0x03ced29d --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  81978  81978 16:51:50 16:51:55 16:51:50

I removed them with

ipcs -b -m | grep '^m' | grep nixbld | awk '{ print $2; }' | xargs -n1 sudo ipcrm -m

and was then able to start Postgres.
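The one-liner above removes every segment owned by a nixbld user, including one that a build running at that moment may still be using. A slightly more conservative sketch, assuming the macOS `ipcs -am` column layout shown above (ID in field 2, OWNER in field 5, NATTCH in field 9), only removes segments that have no attached processes. NATTCH 0 is a heuristic rather than a guarantee, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the IDs of SysV shared memory segments that are owned by a Nix
# build user (_nixbld*) and have no attached processes (NATTCH == 0).
# Assumes the macOS `ipcs -am` layout:
#   T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
stale_nixbld_shm_ids() {
  awk '$1 == "m" && $5 ~ /^_nixbld/ && $9 == 0 { print $2 }'
}

# Remove each stale segment. Using a while loop instead of xargs means
# ipcrm is never invoked when nothing matched.
remove_stale_nixbld_shm() {
  ipcs -am | stale_nixbld_shm_ids | while read -r id; do
    sudo ipcrm -m "$id"
  done
}
```

Running `ipcs -am | stale_nixbld_shm_ids` first shows what would be removed without touching anything.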

My question is: is it normal for nixbld to leave so many shared memory segments around without cleaning up after itself?

Even now, a while later, quite a few new segments are starting to pile up:

❯ ipcs -am
IPC status from <running system> as of Wed Jul  5 22:20:20 CEST 2023
T     ID     KEY        MODE       OWNER    GROUP  CREATOR   CGROUP NATTCH  SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
Shared Memory:
m 1310720 0x03d1bb3f --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56   7769   7769 18:27:46 18:27:52 18:27:46
m 1441793 0x03d1ed37 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56   7924   7924 18:28:16 18:28:22 18:28:16
m 8388610 0x03d1f2b1 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56   7935   7935 18:28:17 18:28:22 18:28:17
m 2424835 0x03d22fca --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  17507  17507 18:47:43 18:47:49 18:47:43
m 1900548 0x03d234fd --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  17506  17506 18:47:43 18:47:48 18:47:43
m 131077 0x03d26809 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  17678  17678 18:48:32 18:48:38 18:48:32
m 131078 0x03d26cd5 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  17688  17688 18:48:32 18:48:38 18:48:32
m 131079 0x03d29f0d --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  17831  17831 18:49:02 18:49:08 18:49:02
m 262152 0x03d2a400 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  17842  17842 18:49:02 18:49:07 18:49:02
m 786441 0x03d1b574 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56   7770   7770 18:27:46 18:27:52 18:27:46
m 2555914 0x03d2d5b4 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  17986  17986 18:49:28 18:49:33 18:49:28
m 1441803 0x03d2dad6 --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  17997  17997 18:49:28 18:49:33 18:49:28
m 131084 0x03d30c72 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  18140  18140 18:49:56 18:50:02 18:49:56
m 131085 0x03d3116e --rw------- _nixbld3   nixbld _nixbld3   nixbld      0     56  18151  18151 18:49:56 18:50:01 18:49:56

Do you know what processes correspond to those CPIDs?
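One way to answer that is to pull the CPID column out of `ipcs -am` and ask `ps` about each PID. A minimal sketch, assuming the macOS column layout shown above (ID in field 2, OWNER in field 5, CPID in field 11); the function names are hypothetical. PIDs get reused, so a live process with a matching PID is only a hint.

```shell
#!/usr/bin/env bash
# Print "ID CPID" for every segment owned by a _nixbld* build user.
# Assumes the macOS `ipcs -am` layout:
#   T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
nixbld_shm_creators() {
  awk '$1 == "m" && $5 ~ /^_nixbld/ { print $2, $11 }'
}

# For each segment, show the creating process if a process with that PID
# still exists (empty `ps` output means it has exited).
report_shm_creators() {
  ipcs -am | nixbld_shm_creators | while read -r id cpid; do
    cmd=$(ps -p "$cpid" -o comm= 2>/dev/null)
    if [ -n "$cmd" ]; then
      echo "segment $id: creator $cpid is still running ($cmd)"
    else
      echo "segment $id: creator $cpid has exited"
    fi
  done
}
```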


I tracked down the issue; it was self-inflicted.

As part of a pkgs.linkFarmFromDrvs, some of the derivations start and then stop Postgres during their build. I wasn’t using --keep-going, so when one derivation failed, Nix killed all the others mid-build without giving them a chance to shut Postgres down.
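Why a cleanup trap inside the build does not help here: my understanding (an assumption about Nix internals, not something shown in this thread) is that Nix cancels the remaining builds by sending SIGKILL to the build user's processes. SIGKILL cannot be caught, so neither a shell trap nor PostgreSQL's own shutdown code gets to run, and a System V segment stays in place until something removes it explicitly. This sketch demonstrates only the shell half of that claim; `trap_runs_on` is a hypothetical helper, and the TERM trap stands in for a `pg_ctl stop` cleanup.

```shell
#!/usr/bin/env bash
# Start a child shell that installs a TERM trap, send it signal $1, and
# print "yes" if the trap ran or "no" if it did not.
trap_runs_on() {
  local sig=$1 dir marker pid
  dir=$(mktemp -d)
  marker=$dir/cleaned                  # file the trap creates when it runs
  bash -c 'trap "touch \"$1\"; exit 0" TERM; while :; do sleep 0.1; done' _ "$marker" >/dev/null 2>&1 &
  pid=$!
  sleep 1                              # give the child time to install its trap
  kill -s "$sig" "$pid"
  wait "$pid" 2>/dev/null
  if [ -e "$marker" ]; then echo yes; else echo no; fi
  rm -rf "$dir"
}
```

`trap_runs_on TERM` should print yes and `trap_runs_on KILL` should print no. Building with `nix-build --keep-going` (or `-k`) sidesteps the problem by letting the builds that can still succeed finish instead of being cancelled.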


Could you elaborate a bit on what was going on there? It sounds like the sort of thing Nix really shouldn’t allow. Nix should clean up after builds that don’t exit cleanly.


It would be nice if Nix tracked the shared memory segments created during builds and cleared them out. I don’t know whether that is beyond the scope of the builders, but it would certainly help in my case.
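Until Nix does something like this itself, one workaround is to snapshot the nixbld-owned segment IDs before a build and compare them afterwards. A minimal sketch, assuming the macOS `ipcs -m` layout (ID in field 2, OWNER in field 5); `nixbld_shm_ids`, `new_ids` and `build_and_report_shm` are hypothetical helper names, not part of Nix. Segments created by other concurrent builds would also show up in the report.

```shell
#!/usr/bin/env bash
# IDs of SysV shared memory segments owned by a _nixbld* user, one per line.
nixbld_shm_ids() {
  ipcs -m | awk '$1 == "m" && $5 ~ /^_nixbld/ { print $2 }'
}

# Print the IDs in list $2 that are not in list $1 (newline-separated lists).
new_ids() {
  comm -13 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort)
}

# Run a build command and list the segments that appeared while it ran,
# preserving the command's exit status.
build_and_report_shm() {
  local before after status
  before=$(nixbld_shm_ids)
  "$@"
  status=$?
  after=$(nixbld_shm_ids)
  new_ids "$before" "$after"
  return "$status"
}
```

For example, `build_and_report_shm nix-build some.nix` (a placeholder path) prints the IDs of any leaked segments, which can then be removed with `sudo ipcrm -m ID`.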

This reproduces it:

let
  pkgs = import <nixpkgs> {};
  pg = pkgs.stdenvNoCC.mkDerivation {
    name = "pg";
    src = ./.;
    nativeBuildInputs = [ pkgs.postgresql ];
    installPhase = ''
      mkdir $out
      export PGDATA="$out"
      export PGHOST="$out"
      export PGUSER=postgres
      export PGDATABASE=postgres

      PGTZ=UTC initdb --no-locale --encoding=UTF8 --nosync -U "$PGUSER"
      echo "listen_addresses='''" >> $PGDATA/postgresql.conf
      trap 'pg_ctl stop' sigint sigterm
      pg_ctl start -o "-k $PGDATA"

      sleep 100
    '';
  };
  err = pkgs.stdenvNoCC.mkDerivation {
    name = "error-simulate";
    src = ./.;
    installPhase = "sleep 2; exit 1";
  };
in
pkgs.linkFarmFromDrvs "sharedmemoryleak" [pg err]

Before the build, no shared memory is in use:

❯ ipcs -am
IPC status from <running system> as of Sat Jul  8 08:14:25 CEST 2023
T     ID     KEY        MODE       OWNER    GROUP  CREATOR   CGROUP NATTCH  SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
Shared Memory:

Then run the build:

❯ nix-build sharedmem.nix
these 3 derivations will be built:
  /nix/store/d8913l1f33hbcp8n4mrhpl0bf9bln2q7-pg.drv
  /nix/store/dg4y5dz3axnwh0n0zhk8p0473viiq975-error-simulate.drv
  /nix/store/qgvb2d0mph3w1aif05i3r9k01i76nxi1-sharedmemoryleak.drv
building '/nix/store/dg4y5dz3axnwh0n0zhk8p0473viiq975-error-simulate.drv'...
building '/nix/store/d8913l1f33hbcp8n4mrhpl0bf9bln2q7-pg.drv'...
unpacking sources
unpacking source archive /nix/store/qywapxl1lqw0kq742afzqz4pbvmf1pm5-demo
unpacking sources
unpacking source archive /nix/store/qywapxl1lqw0kq742afzqz4pbvmf1pm5-demo
source root is demo
source root is demo
patching sources
patching sources
updateAutotoolsGnuConfigScriptsPhase
updateAutotoolsGnuConfigScriptsPhase
configuring
configuring
no configure script, doing nothing
no configure script, doing nothing
building
building
no Makefile or custom buildPhase, doing nothing
no Makefile or custom buildPhase, doing nothing
installing
installing
The files belonging to this database system will be owned by user "_nixbld2".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /nix/store/rr5262bhl9rd96zx03k2qwjx3ca8fj1v-pg ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

Sync to disk skipped.
The data directory might become corrupt if the operating system crashes.

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /nix/store/rr5262bhl9rd96zx03k2qwjx3ca8fj1v-pg -l logfile start

waiting for server to start....2023-07-08 06:14:32.462 UTC [33417] LOG:  starting PostgreSQL 14.8 on aarch64-apple-darwin22.4.0, compiled by clang version 11.1.0, 64-bit
2023-07-08 06:14:32.462 UTC [33417] LOG:  listening on Unix socket "/nix/store/rr5262bhl9rd96zx03k2qwjx3ca8fj1v-pg/.s.PGSQL.5432"
2023-07-08 06:14:32.465 UTC [33418] LOG:  database system was shut down at 2023-07-08 06:14:32 UTC
2023-07-08 06:14:32.467 UTC [33417] LOG:  database system is ready to accept connections
 done
server started
error: builder for '/nix/store/dg4y5dz3axnwh0n0zhk8p0473viiq975-error-simulate.drv' failed with exit code 1;
       last 10 log lines:
       > unpacking sources
       > unpacking source archive /nix/store/qywapxl1lqw0kq742afzqz4pbvmf1pm5-demo
       > source root is demo
       > patching sources
       > updateAutotoolsGnuConfigScriptsPhase
       > configuring
       > no configure script, doing nothing
       > building
       > no Makefile or custom buildPhase, doing nothing
       > installing
       For full logs, run 'nix log /nix/store/dg4y5dz3axnwh0n0zhk8p0473viiq975-error-simulate.drv'.
error: 1 dependencies of derivation '/nix/store/qgvb2d0mph3w1aif05i3r9k01i76nxi1-sharedmemoryleak.drv' failed to build

And you’re left with one shared memory segment hanging around:

❯ ipcs -am
IPC status from <running system> as of Sat Jul  8 08:14:34 CEST 2023
T     ID     KEY        MODE       OWNER    GROUP  CREATOR   CGROUP NATTCH  SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
Shared Memory:
m 2752513 0x02246a21 --rw------- _nixbld2   nixbld _nixbld2   nixbld      0     56  33417  33417  8:14:32  8:14:32  8:14:32