Strategies for debugging systemd services (ideally with a debugger)

Hi all,

I’m hoping to hear some suggestions for streamlining how I debug a process that misbehaves when run as a systemd service.

For example, my current case is a Python application. I’m decent at Python, and if I could just attach PDB I think I could figure things out pretty quickly. However, I don’t think there’s a great way to attach PDB to a running systemd service.

ChatGPT mentions that I could possibly use something like debugpy – which I can look into, but requires additional steps to patch this into the source package definition and application. It also mentions directly attaching the service to a TTY, but this is on a headless / keyboardless server I’m connecting to over SSH, so probably not.
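For reference, the "patch it into the application" route could be as small as the sketch below. It is only an illustration: the `DEBUGPY_PORT` variable and the `maybe_enable_debugger` helper are hypothetical names, and it assumes debugpy has already been added to the package's Python environment. `debugpy.listen` is the library's documented entry point for starting a listener in-process.

```python
import os


def maybe_enable_debugger(environ=os.environ):
    """Start a debugpy listener only when DEBUGPY_PORT (hypothetical) is set.

    Returns True if a listener was started, False otherwise.
    """
    port = environ.get("DEBUGPY_PORT")
    if not port:
        return False
    try:
        # Imported lazily so the application still starts without debugpy.
        import debugpy
    except ImportError:
        return False
    # Bind to loopback only; reach it from a workstation via an SSH tunnel.
    debugpy.listen(("127.0.0.1", int(port)))
    return True
```

The helper would be called once near the application's entry point, with the variable set through the unit's environment (for example an `Environment=DEBUGPY_PORT=5678` line) only while debugging, so the service behaves normally otherwise.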

I can obviously just run the command described by the systemd service file, but frankly the NixOS service definitions are often pretty complex; adding in all of the contextual and environmental details manually seems pretty tedious (and this situation is an issue I seem to run into relatively frequently).

I figured I could probably write a script that would parse a systemd file and automatically generate a bash script that matched at least the major parts (user, working directory, environment). I could then run it imperatively and break into PDB. I’m sure there would be a million edge cases, but this might be the path of least resistance. Does anyone know of an existing project to accomplish this?
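To make the idea concrete, here is a rough sketch of such a generator in Python. It is not an existing project; `parse_unit` and `render_script` are made-up names, and it only handles `User=`, `WorkingDirectory=` and inline `Environment=` lines from the `[Service]` section. It assumes shell-like quoting (via `shlex`), which is close to but not identical to systemd's rules, and it ignores `EnvironmentFile=`, specifiers such as `%i`, line continuations and every sandboxing option.

```python
import shlex


def parse_unit(text):
    """Collect User, WorkingDirectory and Environment from [Service]."""
    settings = {"User": None, "WorkingDirectory": None, "Environment": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Service" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "Environment":
            if not value:
                # An empty assignment resets the accumulated list.
                settings["Environment"] = {}
            # One Environment= line may carry several quoted assignments.
            for assignment in shlex.split(value):
                name, _, val = assignment.partition("=")
                settings["Environment"][name] = val
        elif key in ("User", "WorkingDirectory"):
            settings[key] = value
    return settings


def render_script(settings, shell="/run/current-system/sw/bin/bash"):
    """Render a bash script that starts a clean shell with those settings."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    if settings["WorkingDirectory"]:
        lines.append("cd " + shlex.quote(settings["WorkingDirectory"]))
    cmd = []
    if settings["User"]:
        cmd += ["sudo", "-u", settings["User"]]
    cmd += ["env", "-i"]
    cmd += [f"{name}={value}" for name, value in settings["Environment"].items()]
    cmd += [shell, "--norc", "--noprofile"]
    lines.append("exec " + shlex.join(cmd))
    return "\n".join(lines) + "\n"
```

Feeding it the output of `systemctl cat SERVICE` would also pick up drop-ins, since later `[Service]` sections are simply merged in the order they appear.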

I would ideally like the same strategy to be usable for other languages; I’m pretty sure that gdb can attach to a running process which makes some of this moot depending on the language, but having a strategy that would also work for bash or ruby scripts would be nice.

Does anyone have a low-friction workflow they would recommend here? Thanks for any thoughts.

Maybe something like systemd-analyze unit-shell could help here? It applies most of the settings from the systemd unit definitions.

systemd-analyze unit-shell SERVICE [command...]
       The given command runs on the namespace of the specified running service. If no command is given, spawn
       and attach a shell with the namespace to the service.

       Example 31. Example output

           $ systemd-analyze unit-shell systemd-resolved.service ls
           bin   dev  etc    home  lib   lib64     lost+found  mnt  proc  run   srv  tmp  var   vmlinuz.old
           boot  efi  exitrd  init  lib32     libx32    media         opt  root  sbin  sys  usr  vmlinuz  work

       Added in version 258.

Interesting, thanks for your input. That’s a new one for me; I’m not exactly sure how I’d use it, as it doesn’t seem to inherit the environment, without which I haven’t gained much.

For example:

$ sudo systemd-analyze unit-shell music-assistant
# echo $HOME
/root
# systemctl cat music-assistant | rg HOME
Environment="HOME=/var/lib/music-assistant"
# echo $PATH
/root/.local/bin:/run/wrappers/bin:/root/.nix-profile/bin:/nix/profile/bin:/root/.local/state/nix/profile/bin:/etc/profiles/per-user/root/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
# systemctl cat music-assistant | rg PATH
... super long nixos-style path ...
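
To see exactly which variables differ rather than grepping for them one at a time, something like the following Python sketch could help. The function names are made up for illustration. It relies on `/proc/<pid>/environ` holding NUL-separated `NAME=value` records, which normally reflect the environment the process was started with, and reading it for another user's process generally needs root.

```python
import os


def parse_environ(data: bytes) -> dict:
    """Split the NUL-separated NAME=value records of /proc/<pid>/environ."""
    env = {}
    for record in data.split(b"\0"):
        if not record:
            continue  # trailing NUL leaves an empty record
        name, _, value = record.partition(b"=")
        env[os.fsdecode(name)] = os.fsdecode(value)
    return env


def diff_env(service: dict, shell: dict) -> dict:
    """Map each differing variable to a (service value, shell value) pair."""
    return {
        name: (service.get(name), shell.get(name))
        for name in sorted(service.keys() | shell.keys())
        if service.get(name) != shell.get(name)
    }


def read_service_env(pid: int) -> dict:
    """Read and parse the environment of a running process."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())
```

Calling `diff_env(read_service_env(pid), dict(os.environ))` from inside the unit-shell, with `pid` taken from `systemctl show -p MainPID --value SERVICE`, would list every variable where the shell and the service disagree; `None` marks a variable that is missing on that side.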

It looks like debugpy might work, but my first few attempts have been fruitless:

$ sudo nix run --impure --expr 'with import <nixpkgs> {}; python3.withPackages(ps: with ps; [ debugpy ])' . \
    -- -m debugpy --listen localhost:5678 --pid $(pgrep mass)
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
PYDEVD_GDB_SCAN_SHARED_LIBRARIES not set (scanning all libraries for needed symbols).
Running: /nix/store/935m0ckihjv7l9mp10v1dw2fxkmffv29-gdb-17.1/bin/gdb --nw --nh --nx --pid 908396 --batch --eval-command='set scheduler-locking off' --eval-command='set architecture auto' --eval-command='call (void*)dlopen("/nix/store/92yy94jnp1646vx71ry1sr9gcrpqkba4-python3-3.13.11-env/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_attach_to_process/attach_linux_amd64.so", 2)' --eval-command='sharedlibrary attach_linux_amd64' --eval-command='call (int)DoAttach(0, "import codecs;import json;import sys;decode = lambda s: codecs.utf_8_decode(bytearray(s))[0] if s is not None else None;script_dir = decode([47, 110, 105, 120, 47, 115, 116, 111, 114, 101, 47, 57, 50, 121, 121, 57, 52, 106, 110, 112, 49, 54, 52, 54, 118, 120, 55, 49, 114, 121, 49, 115, 114, 57, 103, 99, 114, 112, 113, 107, 98, 97, 52, 45, 112, 121, 116, 104, 111, 110, 51, 45, 51, 46, 49, 51, 46, 49, 49, 45, 101, 110, 118, 47, 108, 105, 98, 47, 112, 121, 116, 104, 111, 110, 51, 46, 49, 51, 47, 115, 105, 116, 101, 45, 112, 97, 99, 107, 97, 103, 101, 115, 47, 100, 101, 98, 117, 103, 112, 121, 47, 115, 101, 114, 118, 101, 114]);setup = json.loads(decode([123, 34, 109, 111, 100, 101, 34, 58, 32, 34, 108, 105, 115, 116, 101, 110, 34, 44, 32, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 32, 91, 34, 108, 111, 99, 97, 108, 104, 111, 115, 116, 34, 44, 32, 53, 54, 55, 56, 93, 44, 32, 34, 119, 97, 105, 116, 95, 102, 111, 114, 95, 99, 108, 105, 101, 110, 116, 34, 58, 32, 102, 97, 108, 115, 101, 44, 32, 34, 108, 111, 103, 95, 116, 111, 34, 58, 32, 110, 117, 108, 108, 44, 32, 34, 97, 100, 97, 112, 116, 101, 114, 95, 97, 99, 99, 101, 115, 115, 95, 116, 111, 107, 101, 110, 34, 58, 32, 110, 117, 108, 108, 125]));sys.path.insert(0, script_dir);import attach_pid_injected;del sys.path[0];attach_pid_injected.attach(setup);", 0)'
[New LWP 908437]
[New LWP 908436]
[New LWP 908435]
[New LWP 908433]
[New LWP 908432]
[New LWP 908431]
[New LWP 908430]
[New LWP 908429]
[New LWP 908428]
[New LWP 908427]
[New LWP 908426]
[New LWP 908425]
[New LWP 908424]
[New LWP 908423]
[New LWP 908422]
[New LWP 908421]
[New LWP 908420]
[New LWP 908419]
[New LWP 908418]
[New LWP 908417]
[New LWP 908416]

warning: Expected absolute pathname for libpthread in the inferior, but got target:/nix/store/j193mfi0f921y0kfs8vjc1znnr45ispv-glibc-2.40-66/lib/libc.so.6.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f32c36c0b9a in epoll_wait () from target:/nix/store/j193mfi0f921y0kfs8vjc1znnr45ispv-glibc-2.40-66/lib/libc.so.6
The target architecture is set to "auto" (currently "i386:x86-64").
$1 = (void *) 0x563ab7177a60
$2 = 0
[Inferior 1 (process 908396) detached]

Right, while I did kinda expect it to enter the pure environment, I guess it ends up running bash (or similar), which reads the profile/rc files. You can still get there from within the unit-shell by calling something like:

env -i $(cat /proc/$(systemctl show -p MainPID --value grafana)/environ | tr '\0' ' ') /run/current-system/sw/bin/bash --norc --noprofile

It’s a bit sucky, but probably does the trick.
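
One caveat with turning the NUL separators into spaces: any variable whose value contains whitespace gets split into several words before `env` sees it. The Python sketch below illustrates the difference and builds the argument list without that problem. `naive_words` and `env_argv` are hypothetical helper names, and `naive_words` only approximates shell word splitting (it ignores globbing).

```python
import os


def naive_words(data: bytes) -> list:
    # Approximates turning NULs into spaces and letting the shell split
    # the unquoted result on whitespace.
    return data.replace(b"\0", b" ").split()


def env_argv(data: bytes, shell="/run/current-system/sw/bin/bash") -> list:
    """Build an `env -i` command line with one argument per variable."""
    entries = [os.fsdecode(record) for record in data.split(b"\0") if record]
    return ["env", "-i", *entries, shell, "--norc", "--noprofile"]


if __name__ == "__main__":
    sample = b"HOME=/var/lib/grafana\0OPTS=--a --b\0"
    print(naive_words(sample))  # OPTS is broken into two words here
    print(env_argv(sample))     # OPTS stays a single argument here
```

With the real contents of `/proc/<pid>/environ` as `data`, the resulting list can be passed to `os.execvp(argv[0], argv)` to replace the current process with the clean shell.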

Hmmm, I’m not sure the debugpy route is going to work. Seems to basically require VS Code, which I don’t generally use, and it specifically states that you need to have the identical source code available on both machines if debugging remotely; because of the way nix packages Python, I imagine this won’t be straightforward.

Well, a tiny bit of progress.

I wrote a bash script that accepts a service name and tries to drop into a “pretty close” environment:

#!/usr/bin/env bash

set -Eeuf -o pipefail

log() {
  printf "%s\n" "$*" >&2
}

err() {
  log "err: $*"
  exit 1
}

usage() {
  printf 'systemd-service-env.sh :: Enters an environment somewhat like that of a specified systemd service.
USAGE: systemd-service-env.sh SERVICE-NAME
'
}

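main() {
  # Re-run under sudo when not already root; the steps below need it.
  if [[ "${EUID}" -ne 0 ]]; then
    exec sudo "$0" "$@"
  fi

  local service
  case "${1:-}" in
    "")
      usage
      exit 1
      ;;
    -h)
      usage
      exit 0
      ;;
    *)
      service=${1}
      ;;
  esac

  if ! systemctl is-active --quiet "${service}"; then
    err "service must be running"
  fi

  # Declare and assign separately so a systemctl failure is not masked by `local`.
  local pid
  pid=$(systemctl show --property MainPID --value "${service}")
  # MainPID is 0 when the unit has no main process (e.g. an exited oneshot).
  if [[ "${pid}" -eq 0 ]]; then
    err "service has no main PID"
  fi

  # /proc/PID/environ is NUL-delimited; read it into an array so that
  # values containing whitespace survive intact.
  local environment
  mapfile -t -d $'\0' environment < /proc/"${pid}"/environ

  systemd-analyze unit-shell "${service}" env -i "${environment[@]}" /run/current-system/sw/bin/bash --norc --noprofile
}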
main "$@"

If I add debugpy to the dependencies of the package in question and then use this script to enter that environment, I’m able to attach the VS Code debugger. To do so, I search through the systemd service file and follow its ExecStart path to eventually find where debugpy is installed, set up an SSH tunnel to my local machine, then run it like so:

# /nix/store/f2l9dha8rg6hiy0pcm55ifdr8i3l4c9q-python3.13-debugpy-1.8.19/bin/debugpy \
    --listen 5678 \
    --pid "$(
        systemctl show -p MainPID --value music-assistant
    )"  

Without the source code locally it has extremely limited utility so far, but I can import music_assistant without an error, suggesting that it’s working.

Unfortunately, in this environment it’s pretty difficult for me to enter a nix-shell, so I haven’t yet found a technique that works without modifying the source derivation (to add debugpy).

Trying to do so from outside of this environment fails with some complaints about libpthread:

$ nix-shell -p 'python3.withPackages (ps: [ ps.debugpy ])' --command 'sudo debugpy --listen 45678 --pid "$(systemctl show --property MainPID --value music-assistant)"'
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
PYDEVD_GDB_SCAN_SHARED_LIBRARIES not set (scanning all libraries for needed symbols).
Running: /nix/store/935m0ckihjv7l9mp10v1dw2fxkmffv29-gdb-17.1/bin/gdb --nw --nh --nx --pid 1874850 --batch --eval-command='set scheduler-locking off' --eval-command='set architecture auto' --eval-command='call (void*)dlopen("/nix/store/92yy94jnp1646vx71ry1sr9gcrpqkba4-python3-3.13.11-env/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_attach_to_process/attach_linux_amd64.so", 2)' --eval-command='sharedlibrary attach_linux_amd64' --eval-command='call (int)DoAttach(0, "import codecs;import json;import sys;decode = lambda s: codecs.utf_8_decode(bytearray(s))[0] if s is not None else None;script_dir = decode([47, 110, 105, 120, 47, 115, 116, 111, 114, 101, 47, 57, 50, 121, 121, 57, 52, 106, 110, 112, 49, 54, 52, 54, 118, 120, 55, 49, 114, 121, 49, 115, 114, 57, 103, 99, 114, 112, 113, 107, 98, 97, 52, 45, 112, 121, 116, 104, 111, 110, 51, 45, 51, 46, 49, 51, 46, 49, 49, 45, 101, 110, 118, 47, 108, 105, 98, 47, 112, 121, 116, 104, 111, 110, 51, 46, 49, 51, 47, 115, 105, 116, 101, 45, 112, 97, 99, 107, 97, 103, 101, 115, 47, 100, 101, 98, 117, 103, 112, 121, 47, 115, 101, 114, 118, 101, 114]);setup = json.loads(decode([123, 34, 109, 111, 100, 101, 34, 58, 32, 34, 108, 105, 115, 116, 101, 110, 34, 44, 32, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 32, 91, 34, 49, 50, 55, 46, 48, 46, 48, 46, 49, 34, 44, 32, 52, 53, 54, 55, 56, 93, 44, 32, 34, 119, 97, 105, 116, 95, 102, 111, 114, 95, 99, 108, 105, 101, 110, 116, 34, 58, 32, 102, 97, 108, 115, 101, 44, 32, 34, 108, 111, 103, 95, 116, 111, 34, 58, 32, 110, 117, 108, 108, 44, 32, 34, 97, 100, 97, 112, 116, 101, 114, 95, 97, 99, 99, 101, 115, 115, 95, 116, 111, 107, 101, 110, 34, 58, 32, 110, 117, 108, 108, 125]));sys.path.insert(0, script_dir);import attach_pid_injected;del sys.path[0];attach_pid_injected.attach(setup);", 0)'
[New LWP 1874878]
[New LWP 1874877]
[New LWP 1874874]
[New LWP 1874872]
[New LWP 1874871]
[New LWP 1874870]
[New LWP 1874869]
[New LWP 1874868]
[New LWP 1874867]
[New LWP 1874866]
[New LWP 1874865]
[New LWP 1874864]
[New LWP 1874863]
[New LWP 1874862]
[New LWP 1874861]
[New LWP 1874860]
[New LWP 1874859]
[New LWP 1874858]
[New LWP 1874857]
[New LWP 1874856]
[New LWP 1874855]

warning: Expected absolute pathname for libpthread in the inferior, but got target:/nix/store/j193mfi0f921y0kfs8vjc1znnr45ispv-glibc-2.40-66/lib/libc.so.6.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f00aba31639 in PySlice_Unpack () from target:/nix/store/qzc04a3npl70cyyy6flnnrb2ig3kayxm-python3-3.13.11/lib/libpython3.13.so.1.0
The target architecture is set to "auto" (currently "i386:x86-64").
$1 = (void *) 0x55f1907e31d0
$2 = 0
[Inferior 1 (process 1874850) detached]