Notification on systemd service failures

I deploy my server with NixOps and I want to be notified about systemd service failures. Therefore I tried to extend every systemd service with onFailure. Currently I have something along the lines of

{ config, lib, pkgs, ... }:

{
  config = lib.setAttrByPath [ "systemd" "services" ]
    (lib.genAttrs (lib.attrNames config.systemd.services)
      (serviceName: lib.setAttrByPath [ "serviceConfig" "onFailure" ]  [ "email@%n.service" ]));
}

However, that leads to infinite recursion. Any ideas?

Just another idea: systemd publishes all state changes of its units over D-Bus. You could listen to these events and trigger something whenever a service enters the failed state.

Nixpkgs ships the pystemd Python package, which makes this easy: https://github.com/facebookincubator/pystemd/blob/a51e19e1abe65498f2d74aa540c7016716c3e846/examples/monitor_all_units_from_signal.py

Or you could do it with busctl and some bash:

sudo busctl monitor org.freedesktop.systemd1 --json=short | jq 'select(.member=="PropertiesChanged")'

You then just need one service that monitors all of them.
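The busctl pipeline above could be wrapped into such a single monitoring service. This is only a rough sketch: the service name failure-monitor is made up, the echo is a placeholder for a real notification command, and the jq filter assumes that busctl's JSON output puts the changed properties at .payload.data[1] with each value wrapped as {"type": ..., "data": ...}. It also assumes that systemd is actually emitting these signals, which it only does while some client is subscribed to them.

```nix
{ pkgs, ... }:

{
  # Hypothetical unit name; runs as root, which busctl monitor needs.
  systemd.services.failure-monitor = {
    description = "Watch systemd unit state changes for failures";
    wantedBy = [ "multi-user.target" ];
    path = with pkgs; [ systemd jq ];
    serviceConfig.Restart = "always";
    script = ''
      busctl monitor org.freedesktop.systemd1 --json=short \
        | jq --unbuffered -r '
            select(.member == "PropertiesChanged")
            # assumed layout: [interface, changed properties, invalidated]
            | select(.payload.data[1].ActiveState.data == "failed")
            | .path' \
        | while read -r unit_path; do
            # Placeholder: replace with a mail or chat notification.
            echo "unit failed: $unit_path"
          done
    '';
  };
}
```

The printed value is the D-Bus object path of the unit (with escaped characters), not the plain unit name, so a real notifier would probably want to translate it back first.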

However, I’m also curious how we can fix your NixOS config so that it doesn’t recurse infinitely. I’ll have a better look later.

With systemd v244, we will be able to do it this way:

Unit files now support top level dropin directories of the form
<unit_type>.d/ (e.g. service.d/) that may be used to add configuration
that affects all corresponding unit files.

Journalwatch seems like a possible workaround.

In stockholm there is a module called krebs.on-failure, which essentially links a separate on-failure.plans.<service-name> to the service.
The essential piece of configuration can be found at:

However it only attaches to explicitly marked services:
krebs.on-failure.plans.snapraid-sync.name = "snapraid-sync"; [source]

The idea is that most of the time you know which services are important and for which ones you definitely want to receive a mail when something dies.

Cheers

Thank you for your suggestions. I learned about some cool stuff. Much appreciated :heart:
For the moment I just use my own solution and explicitly specify the services for which I definitely want to receive notifications.

# default.nix
let
  nixos =
    import <nixpkgs/nixos> {
        configuration = { lib, ...}: {

            options.systemd.services = lib.mkOption {
                type = with lib.types; attrsOf (submodule {
                    config = {
                        serviceConfig.onFailure = "email@%n.service";
                    };
                });
            };

            config = {
                boot.isContainer = true; # this is only as an example

                services.nginx.enable = true;
            };

        };
    };
in nixos.config

$ nix eval -f. systemd.services.nginx.serviceConfig.onFailure
"email@%n.service"

Note this probably won’t affect service units shipped builtin with systemd, nor service units added via the systemd.packages option.
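For comparison, the explicit approach mentioned earlier in the thread (naming only the services you care about) might look roughly like the sketch below. The watched list and the notify-failure@ template unit are made up for illustration, and the echo stands in for a real mail command. Note that systemd documents OnFailure= as a [Unit] setting, which NixOS exposes as the unit-level option systemd.services.<name>.onFailure (a list of units), so the sketch uses that option rather than serviceConfig.

```nix
{ lib, ... }:

let
  # Hypothetical list of services to watch.
  watched = [ "nginx" "snapraid-sync" ];
in
{
  systemd.services =
    # Attach the failure handler to each explicitly named service.
    lib.genAttrs watched (name: {
      onFailure = [ "notify-failure@${name}.service" ];
    })
    // {
      # Template unit; %i expands to the instance name,
      # which here is the name of the service that failed.
      "notify-failure@" = {
        description = "Report failure of %i";
        serviceConfig.Type = "oneshot";
        scriptArgs = "%i";
        script = ''
          # Placeholder: replace with a real mail command.
          echo "unit $1 failed" >&2
        '';
      };
    };
}
```

Because the list is fixed up front, nothing here reads config.systemd.services, so the infinite recursion from the original attempt should not come up.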
