Notification on systemd service failures

I deploy my server with NixOps and I want to be notified about systemd service failures. Therefore I tried to extend every systemd service with onFailure. Currently I have something along the lines of

{ config, lib, pkgs, ... }:

{
  config = lib.setAttrByPath [ "systemd" "services" ]
    (lib.genAttrs (lib.attrNames config.systemd.services)
      (serviceName: lib.setAttrByPath [ "serviceConfig" "onFailure" ]  [ "email@%n.service" ]));
}

However, that leads to infinite recursion. Any ideas?


Just another idea: systemd publishes all state changes of its units over D-Bus. You could listen for these events and trigger something whenever a service enters the failed state.

Nixpkgs ships the pystemd Python package, which makes this easy: https://github.com/facebookincubator/pystemd/blob/a51e19e1abe65498f2d74aa540c7016716c3e846/examples/monitor_all_units_from_signal.py

Or you could do it with busctl and some bash:

sudo busctl  monitor org.freedesktop.systemd1 --json=short | jq 'select(.member=="PropertiesChanged")'

You then just need one service that monitors all of them.
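
A minimal sketch of such a single monitoring service as a NixOS module, assuming the busctl/jq pipeline above. The unit name unit-state-monitor is made up for illustration, and the service only writes the matching signals to the journal; reacting to a failed state is left out.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit name, chosen for this example only.
  systemd.services.unit-state-monitor = {
    description = "Log systemd PropertiesChanged signals";
    wantedBy = [ "multi-user.target" ];
    # busctl comes from systemd, which is on the default service PATH;
    # jq has to be added explicitly.
    path = [ pkgs.jq ];
    script = ''
      busctl monitor org.freedesktop.systemd1 --json=short \
        | jq --unbuffered -c 'select(.member == "PropertiesChanged")'
    '';
    # Restart the monitor if the pipeline exits with an error.
    serviceConfig.Restart = "on-failure";
  };
}
```

The service runs as root, so the sudo from the one-liner is not needed.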

However, I’m also curious how we can fix your NixOS config so that it doesn’t infinitely recurse. I’ll have a better look later.


With systemd v244, we will be able to do it this way:

Unit files now support top level dropin directories of the form
<unit_type>.d/ (e.g. service.d/) that may be used to add configuration
that affects all corresponding unit files.


Journalwatch seems like a possible workaround.

In stockholm there is a module called krebs.on-failure, which essentially links a separate on-failure.plans.<service-name> to the service.
The essential piece of configuration can be found at:

However it only attaches to explicitly marked services:
krebs.on-failure.plans.snapraid-sync.name = "snapraid-sync"; [source]

The idea is that most of the time you know which services are important and where you definitely want to receive a mail once something dies.

Cheers

Thank you for your suggestions. I learned about some cool stuff. Much appreciated :heart:
For the moment I just use my own solution and explicitly specify the services for which I definitely want to receive notifications.
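
For reference, a rough sketch of what the explicit per-service variant could look like, including the email@ template unit that the OnFailure target refers to. This is not the poster's actual configuration: the recipient address is a placeholder, and it assumes a working sendmail wrapper at /run/wrappers/bin/sendmail (for example from postfix or msmtp).

```nix
{ config, ... }:
{
  # Template unit: "email@nginx.service.service" is started with %i = "nginx.service".
  systemd.services."email@" = {
    description = "Send a failure notification for %i";
    serviceConfig.Type = "oneshot";
    # Pass the name of the failed unit to the script as $1.
    scriptArgs = "%i";
    script = ''
      unit="$1"
      {
        echo "To: admin@example.org"   # placeholder recipient
        echo "Subject: $unit failed"
        echo
        # systemctl status exits non-zero for failed units, so ignore its exit code.
        ${config.systemd.package}/bin/systemctl status --full --no-pager "$unit" || true
      } | /run/wrappers/bin/sendmail -t   # assumes a configured sendmail wrapper
    '';
  };

  # Opt in explicitly, one line per service that matters.
  systemd.services.nginx.onFailure = [ "email@%n.service" ];
}
```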

# default.nix
let
  nixos =
    import <nixpkgs/nixos> {
        configuration = { lib, ...}: {

            # Redeclare the option so that every service submodule gets this value.
            options.systemd.services = lib.mkOption {
                type = with lib.types; attrsOf (submodule {
                    config = {
                        # OnFailure= is a [Unit] setting, so use the unit-level
                        # onFailure option rather than serviceConfig.
                        onFailure = [ "email@%n.service" ];
                    };
                });
            };

            config = {
                boot.isContainer = true; # this is only as an example

                services.nginx.enable = true;
            };

        };
    };
in nixos.config

$ nix eval -f. systemd.services.nginx.onFailure
[ "email@%n.service" ]

Note that this probably won’t affect service units shipped with systemd itself, nor service units added via the systemd.packages option.


Resurrecting this: is there a canonical implementation of this today? I’d like to send an email to myself on user service failures also.

Monitoring software is the answer, of course, but to answer your question directly: use the systemd top-level overrides feature, like so:

{ config, lib, pkgs, ... }: {
  systemd.packages = [
    (pkgs.runCommandNoCC "on-failure.conf" {
      preferLocalBuild = true;
      allowSubstitutes = false;
    } ''
      mkdir -p $out/etc/systemd/system/service.d/
      # OnFailure= belongs in [Unit]; printf expands \n, and %% yields a literal %.
      printf '[Unit]\nOnFailure=email@%%n.service\n' > $out/etc/systemd/system/service.d/on-failure.conf
    '')
  ];
}

Untested, so please ping back if this is incorrect and we can edit.


Reference on Discourse: How to use toplevel-overrides for systemd - #4 by hmenke

Thanks; added to my agenda.

monitoring software is the answer

This escaped me. I temporarily tried netdata out some time ago but stopped for several reasons. Any general recommendations to consider?

I think netdata is somewhat popular among people running NixOS for hobby usage…

I personally use healthchecks.io (the self-hosted version, but their free tier is pretty damn generous). It has integrations for nearly everything, including email, Slack, ntfy (which has a mobile app), and so on.

Works pretty well to keep me updated on automated NixOS updates and backup failures.
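
A minimal sketch of how a failure could be reported to healthchecks.io from systemd, assuming the hosted ping endpoint at hc-ping.com. The all-zero UUID is a placeholder for your own check, and both hc-fail@ and my-backup are hypothetical names; my-backup is assumed to be defined elsewhere in the configuration.

```nix
{ pkgs, ... }:
{
  # Template unit, started as "hc-fail@my-backup.service.service" with %i = "my-backup.service".
  systemd.services."hc-fail@" = {
    description = "Report failure of %i to healthchecks.io";
    serviceConfig.Type = "oneshot";
    # Pass the name of the failed unit to the script as $1.
    scriptArgs = "%i";
    script = ''
      # Send the last 50 journal lines of the failed unit as the request body.
      journalctl -u "$1" -n 50 --no-pager \
        | ${pkgs.curl}/bin/curl -fsS -m 10 --retry 3 --data-binary @- \
            "https://hc-ping.com/00000000-0000-0000-0000-000000000000/fail"
    '';
  };

  # Hypothetical service, assumed to be defined elsewhere.
  systemd.services.my-backup.onFailure = [ "hc-fail@%n.service" ];
}
```

For a self-hosted instance, the ping URL would point at that instance instead.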


See also writeTextDir and toINI:

pkgs.writeTextDir "etc/systemd/system/service.d/alert.conf" (lib.generators.toINI { } {
  Unit.OnFailure = "alert@%n.service";
});

Sending mail from systemd is possible, but can be a bit of a pain, depending on the mail server setup.

Here’s another example that sends different signals to a healthchecks.io instance depending on the service results and also sends logs on failure.


I just did today, and it seems like all the useful stuff is locked behind their cloud offering. Customizability is also a bit lacking.

It seems like Prometheus and friends are the way to go for now…