`wal-g` crashing with `Bad system call` when running with postgresql

Hello,

I ran into some problems setting up wal-g backup with postgresql.

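I eventually found the root cause: an overly strict `SystemCallFilter` in postgresql’s systemd service definition, which blocks wal-g from making the `setrlimit` syscall when postgres runs it through `archive_command`. By default systemd kills a process that makes a filtered syscall with `SIGSYS`, which is where the `Bad system call` message comes from.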

Tracking this down took several core dump investigations, some time in gdb and a lot of reading manpages about kernel syscalls.
I am new to nix, but this doesn’t match the experience I’ve had with other packaged software so far.

My sense is that I should write a PR to nixpkgs to fix this, but I have a few questions first:

  • My feeling is that the wal-g package should fix this by changing the settings in postgresql’s `serviceConfig`, in a similar way to the `openFirewall` option in some services.
  • Or is the opposite the way to go, i.e. having postgresql change its settings when some other package is installed? I see that it already does this when certain extensions (e.g. citus) are installed. But wal-g isn’t an extension, just a separate program that happens to work with postgresql. Which is the preferred way to solve this in nixpkgs?
  • I would love to take this as a learning exercise in contributing to nixpkgs, but is there an obvious example of a similar interaction between packages that I could use for inspiration?
  • As a side question, I fixed my config with `systemd.services.postgresql.serviceConfig.SystemCallFilter = [ "setrlimit" ];`, but I can’t figure out why this adds a separate `SystemCallFilter` line to the service definition instead of replacing the setting altogether.
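On the side question, a likely explanation (based on how I understand the NixOS module system, not checked against every nixpkgs version): list values assigned to a `serviceConfig` entry are concatenated with the postgresql module’s own definition rather than replacing it, and each list element is written out as its own `SystemCallFilter=` line. systemd then merges repeated `SystemCallFilter=` lines in order, so a later allow entry can re-permit a syscall that an earlier deny entry removed. The sketch below makes that ordering explicit with `lib.mkAfter`; the `~@resources` entry mentioned in the comment is an assumed example of what the module’s list might contain, not a quote of the module.

```nix
# Sketch of a NixOS configuration fragment (a module imported from configuration.nix).
{ lib, ... }:
{
  # This list is concatenated with the SystemCallFilter list that the postgresql
  # module already defines, so the generated unit gets one extra
  # `SystemCallFilter=setrlimit` line instead of a replaced setting.
  # lib.mkAfter sorts this definition after default-ordered definitions, so the
  # allow entry is applied after the module's entries (e.g. an ASSUMED deny entry
  # such as `~@resources`, the syscall group that contains setrlimit).
  systemd.services.postgresql.serviceConfig.SystemCallFilter =
    lib.mkAfter [ "setrlimit" ];
}
```

To replace the module’s list entirely, you would wrap your own complete list in `lib.mkForce` instead. That also throws away any other hardening the module applies, so the additive form is usually the safer choice.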

Thanks for any tips!

I don’t know enough about this set of services to help. You may find some useful information here:

@ZenoArrow thanks for your reply! I did look quite thoroughly through the repo you mentioned before posting here. At first I thought I had the same problem, but mine turned out to be slightly different. That setup puts postgresql in a Docker container (so it’s not affected by the syscall filter), while I run it directly on the server.

I might try to ping the maintainers of postgresql or wal-g in nixpkgs to see what they think.

my feeling is that the wal-g package should fix this by changing the settings in postgresql’s serviceConfig here, in a similar way to the openFirewall option in some services.

This sounds like exactly what I would expect. If you want to be nice about it, you would also mention this side effect (that it relaxes postgresql’s syscall filter) in the description of the relevant enable option for the service.

On a second look, since wal-g is not packaged as a service, this may be more difficult, possibly requiring a new NixOS module for enabling wal-g support in postgresql.
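A minimal sketch of what such a module could look like, assuming a made-up option name `services.walg.enable` (as far as I know nothing like it exists in nixpkgs). It follows the `openFirewall` pattern: enabling the option adjusts postgresql’s unit, and the option description mentions that side effect. The three-argument form of `lib.mkPackageOption` needs a reasonably recent nixpkgs.

```nix
# HYPOTHETICAL module: `services.walg` is an invented option namespace for illustration.
{ config, lib, pkgs, ... }:

let
  cfg = config.services.walg;
in
{
  options.services.walg = {
    # mkEnableOption renders this as "Whether to enable <text>."
    enable = lib.mkEnableOption ''
      wal-g support for the PostgreSQL service. This also allows the setrlimit
      syscall in the postgresql unit's SystemCallFilter, because wal-g calls
      setrlimit when it is run from archive_command
    '';

    # Lets users swap in a different wal-g build; defaults to pkgs.wal-g.
    package = lib.mkPackageOption pkgs "wal-g" { };
  };

  config = lib.mkIf cfg.enable {
    # Put the wal-g binary on PATH for the postgresql service (and thus archive_command).
    systemd.services.postgresql.path = [ cfg.package ];

    # Add (not replace) an allow entry after the module's existing filter entries.
    systemd.services.postgresql.serviceConfig.SystemCallFilter =
      lib.mkAfter [ "setrlimit" ];
  };
}
```

A user would then set `services.walg.enable = true;` next to `services.postgresql.enable = true;`, and the postgresql module itself stays unaware of wal-g.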

Since the process that actually makes the call is wal-g, allowing `setrlimit` by default would not be useful to the other users of the postgresql module.
