Can’t get a coredump from the nixbld user

I am trying to get a coredump from a crash that only happens during nix-build.

Here is foo.nix to reproduce it:

```nix
{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e2022dbe6f691e98396ea5fdbacd6ad3be130ba5.tar.gz") { } }:
with pkgs ; symlinkJoin {
  name = "customisation-layer";
  paths = [ bashInteractive ];
  nativeBuildInputs = [ fakeroot fakechroot coreutils util-linux ];
  postBuild = ''
    mv $out old_out
    mkdir $out
    export FAKECHROOT_EXCLUDE_PATH=/dev:/proc:/sys:${builtins.storeDir}:$out/layer.tar
    fakechroot chroot $PWD/old_out fakeroot bash -c '
      eval "$fakeRootCommands"
      tar -cf $out/layer.tar .
    '
  '';
}
```

```
[kirillvr@tsutenkaku:~/nixpkgs]$ coredumpctl gdb
           PID: 3216108 (tar)
           UID: 30001 (nixbld1)
           GID: 30000 (nixbld)
        Signal: 6 (ABRT)
     Timestamp: Sun 2023-07-30 17:17:27 AEST (2h 54min ago)
  Command Line: tar -cf /nix/store/2zfqdg21272amls6d9lixn8ni1m0gxj5-customisation-layer/layer.tar .
    Executable: /nix/store/z7ziky1qlp3qajmrri2flizfhm0z42gk-gnutar-1.34/bin/tar
 Control Group: /system.slice/nix-daemon.service
          Unit: nix-daemon.service
         Slice: system.slice
       Boot ID: 016dd133be9149b6a2bb5ebbbb37666f
    Machine ID: 055e04520bee440e98b05b8f0a63e0dd
      Hostname: localhost
       Storage: none
       Message: Process 3216108 (tar) of user 30001 dumped core.

Coredump entry has no core attached (neither internally in the journal nor externally on disk).
```

I’m not sure if this is the right answer, but how about making the dump an output of your package?

```nix
{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e2022dbe6f691e98396ea5fdbacd6ad3be130ba5.tar.gz") { } }:
with pkgs;
symlinkJoin {
  name = "customisation-layer";
  paths = [ bashInteractive ];
  nativeBuildInputs = [ fakeroot fakechroot coreutils util-linux gdb ];
  postBuild = ''
    mv $out old_out
    mkdir $out
    export FAKECHROOT_EXCLUDE_PATH=/dev:/proc:/sys:${builtins.storeDir}:$out/layer.tar
    fakechroot chroot $PWD/old_out fakeroot bash -c '
      eval "$fakeRootCommands"
      tar -cf $out/layer.tar .
    ' &
    gcore -o $out/danke $!
  '';
}
```

Edit: note that gdb was added to nativeBuildInputs (gcore is part of gdb).

I remember having similar issues. I don’t remember what the outcome was,
but here are some snippets from my config that fixed it, IIRC:

```nix
# don't forget to run `ulimit -c unlimited` to get the actual coredump
# check this comment to set up user ulimits: https://github.com/NixOS/nixpkgs/issues/159964#issuecomment-1252682060

environment.etc."security/limits.conf".text = ''
  #[domain]        [type]  [item]  [value]
  teto  soft  core  unlimited
  teto  soft  memlock 128
  *  hard  memlock  256
  @audio   -  nice     -20
'';
```

I don’t see how that would work for Nix, because Nix calls setrlimit in its build setup to set the core size limit to 0, overriding whatever limit is configured.

I suppose calling ulimit -c unlimited from the derivation might work.
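A small shell sketch of why this can work, assuming Nix lowers only the *soft* core-size limit to 0 for the builder and leaves the hard limit at unlimited (plain `ulimit -c unlimited` in bash sets both limits, so it can only succeed if the hard limit is already unlimited). The subshell below imitates that starting state on an ordinary machine; the function names `show_core_limits` and `simulate_build` are hypothetical. In foo.nix the equivalent would be a single `ulimit -c unlimited` line at the top of postBuild, before the fakechroot command.

```shell
#!/usr/bin/env bash
# Sketch, not Nix's actual code: imitate the core-size limits a Nix builder
# is assumed to start with (soft limit 0, hard limit left as it is), then
# raise the soft limit again the way `ulimit -c unlimited` in postBuild would.

# Print the current soft and hard core-file size limits.
show_core_limits() {
  echo "soft=$(ulimit -S -c) hard=$(ulimit -H -c)"
}

# Run in a subshell so the caller's own limits are left untouched.
simulate_build() {
  (
    ulimit -S -c 0                   # assumed starting state: no core files written
    show_core_limits
    ulimit -S -c "$(ulimit -H -c)"   # unprivileged processes may raise soft up to hard
    show_core_limits
  )
}

simulate_build
```

Raising the soft limit to whatever the hard limit is, instead of hard-coding `unlimited`, keeps the sketch from failing on machines with a lower hard limit; inside the build, where the hard limit is assumed to be unlimited, the two are the same.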

Why didn’t I think of that?

Setting that inside the derivation worked.

systemd still says that the coredump is inaccessible, but at least it captures a stack trace in its message:

```
coredumpctl info
           PID: 141335 (tar)
           UID: 30001 (nixbld1)
           GID: 30000 (nixbld)
        Signal: 6 (ABRT)
     Timestamp: Wed 2023-08-02 06:56:59 AEST (8s ago)
  Command Line: tar -cf /nix/store/icldghcsim13wyxkddpm07clgi9dh686-customisation-layer/layer.tar .
    Executable: /nix/store/z7ziky1qlp3qajmrri2flizfhm0z42gk-gnutar-1.34/bin/tar
 Control Group: /system.slice/nix-daemon.service
          Unit: nix-daemon.service
         Slice: system.slice
       Boot ID: 0babcccb409e410886fdf93406ea42e4
    Machine ID: 055e04520bee440e98b05b8f0a63e0dd
      Hostname: localhost
       Storage: /var/lib/systemd/coredump/core.tar.30001.0babcccb409e410886fdf93406ea42e4.141335.1690923419000000.zst (inaccessible)
       Message: Process 141335 (tar) of user 30001 dumped core.

                Module libattr.so.1 without build-id.
                Module libacl.so.1 without build-id.
                Module libfakechroot.so without build-id.
                Module libfakeroot.so without build-id.
                Module tar without build-id.
                Stack trace of thread 45:
                #0  0x00007ffff7e26a8c __pthread_kill_implementation (libc.so.6 + 0x87a8c)
                #1  0x00007ffff7dd7c86 raise (libc.so.6 + 0x38c86)
                #2  0x00007ffff7dc18ba abort (libc.so.6 + 0x228ba)
                #3  0x00007ffff7dc25f5 __libc_message.cold (libc.so.6 + 0x235f5)
                #4  0x00007ffff7eb6679 __fortify_fail (libc.so.6 + 0x117679)
                #5  0x00007ffff7eb4ea4 __chk_fail (libc.so.6 + 0x115ea4)
                #6  0x00007ffff7eb5404 __readlinkat_chk (libc.so.6 + 0x116404)
                #7  0x00007ffff7f979ab __readlinkat_chk (libfakechroot.so + 0x79ab)
                #8  0x000000000043560c areadlinkat_with_size (tar + 0x3560c)
                #9  0x0000000000410a69 dump_file (tar + 0x10a69)
                #10 0x00000000004110c0 dump_file (tar + 0x110c0)
                #11 0x00000000004110c0 dump_file (tar + 0x110c0)
                #12 0x00000000004110c0 dump_file (tar + 0x110c0)
                #13 0x000000000041138d create_archive (tar + 0x1138d)
                #14 0x0000000000407000 main (tar + 0x7000)
                #15 0x00007ffff7dc2ace __libc_start_call_main (libc.so.6 + 0x23ace)
                #16 0x00007ffff7dc2b89 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
                #17 0x0000000000407445 _start (tar + 0x7445)
                ELF object binary architecture: AMD x86-64
```

Great!

That’s because of permissions. You should be able to read it as root.
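For reference, a sketch of pulling the core out with root privileges. `coredumpctl info`, `coredumpctl dump --output=` and `coredumpctl gdb` are standard coredumpctl commands that accept a PID as the match; the wrapper `extract_core` and the step of chowning the written file to the calling user are illustrative assumptions, not something from this thread. The stored .zst file is presumably unreadable for your own account because the crash happened under nixbld1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a core that systemd-coredump stored for another
# user (here a nixbld build user) into a file the calling user can read.
# Usage: extract_core PID [OUTFILE]
extract_core() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "usage: extract_core PID [OUTFILE]" >&2
    return 2
  fi
  local pid="$1"
  local out="${2:-core.$1}"
  # Show metadata (signal, executable, storage path) for this crash.
  sudo coredumpctl info "$pid" || return
  # Write the core to $out; reading the stored file needs root here.
  sudo coredumpctl dump "$pid" --output="$out" || return
  # Hand the file to the calling user so gdb can open it without sudo.
  sudo chown "$(id -u):$(id -g)" "$out"
}
```

For the crash above that would be `extract_core 141335` followed by `gdb /nix/store/z7ziky1qlp3qajmrri2flizfhm0z42gk-gnutar-1.34/bin/tar core.141335`; `sudo coredumpctl gdb 141335` does the same in one step.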

Two for two! Indeed, as root I can access the coredump!