I deploy my server with NixOps and I want to be notified about systemd service failures. Therefore I tried to extend every systemd service with onFailure. Currently I have something along the lines of
Just another idea: systemd publishes state changes of all units over D-Bus. You could listen for these signals and trigger something whenever a service enters the failed state.
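To make the D-Bus idea a bit more concrete, here is a rough Python sketch. It assumes the third-party dbus-python and PyGObject packages are installed, and that the ActiveState property arrives in the PropertiesChanged payload. The function names (unit_name_from_path, is_failure, listen) are made up for illustration and are not part of any library.

```python
import re

UNIT_PATH_PREFIX = "/org/freedesktop/systemd1/unit/"


def unit_name_from_path(path):
    """Decode a systemd unit object path back into the unit name.

    systemd escapes every character outside [A-Za-z0-9] as _XX (hex),
    so "nginx.service" shows up as ".../unit/nginx_2eservice".
    """
    escaped = path[len(UNIT_PATH_PREFIX):]
    return re.sub(r"_([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), escaped)


def is_failure(interface, changed):
    """True if a PropertiesChanged payload reports ActiveState=failed."""
    return (interface == "org.freedesktop.systemd1.Unit"
            and str(changed.get("ActiveState", "")) == "failed")


def listen(notify=print):
    """Block forever and call notify(unit_name) whenever a unit fails."""
    # Third-party imports stay local so the helpers above work without them.
    import dbus
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()

    # systemd only emits unit signals once a client has subscribed.
    manager = dbus.Interface(
        bus.get_object("org.freedesktop.systemd1", "/org/freedesktop/systemd1"),
        "org.freedesktop.systemd1.Manager",
    )
    manager.Subscribe()

    def on_properties_changed(interface, changed, invalidated, path=None):
        if path.startswith(UNIT_PATH_PREFIX) and is_failure(interface, changed):
            notify(unit_name_from_path(path))

    bus.add_signal_receiver(
        on_properties_changed,
        signal_name="PropertiesChanged",
        dbus_interface="org.freedesktop.DBus.Properties",
        bus_name="org.freedesktop.systemd1",
        path_keyword="path",
    )
    GLib.MainLoop().run()
```

Calling listen() starts the loop; pass your own notify callback to send a mail or similar. The same unit may be reported more than once while it stays in the failed state, so you would probably want to deduplicate.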
Unit files now support top level dropin directories of the form
<unit_type>.d/ (e.g. service.d/) that may be used to add configuration
that affects all corresponding unit files.
In the stockholm repository there is a module called krebs.on-failure, which essentially links a separate on-failure.plans.<service-name> to the service:
The essential piece of configuration can be found at:
However, it only attaches to explicitly marked services: krebs.on-failure.plans.snapraid-sync.name = "snapraid-sync"; [source]
The idea is that most of the time you know which services are important and where you definitely want to receive a mail once something dies.
Thank you for your suggestions. I learned about some cool stuff. Much appreciated.
For the moment I will just use my solution and explicitly specify the services for which I definitely want to receive notifications.
I personally use healthchecks.io (the self-hosted version, but their free tier is pretty damn generous). It has integrations for nearly everything, including email, Slack, ntfy (which has a mobile app), and what have you.
Works pretty well to keep me updated on automated NixOS updates and backup failures.
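For anyone wondering what reporting to healthchecks.io looks like, here is a minimal Python sketch. It relies on the ping URL scheme, where a plain request to the check's URL signals success and appending /fail signals failure. The base URL and the check UUID below are placeholders, and the helper names ping_url and ping are made up for illustration. A self-hosted instance will have a different base URL.

```python
import urllib.request


def ping_url(base, check_uuid, failed=False):
    """Build the ping URL for a check; "/fail" reports a failure."""
    url = f"{base.rstrip('/')}/{check_uuid}"
    return url + "/fail" if failed else url


def ping(base, check_uuid, failed=False, timeout=10):
    """Send the ping and return True if the server answered with HTTP 200."""
    with urllib.request.urlopen(ping_url(base, check_uuid, failed),
                                timeout=timeout) as resp:
        return resp.status == 200


# Example with placeholder values, e.g. at the end of a backup script:
# ping("https://hc-ping.com", "your-check-uuid", failed=backup_failed)
```

A script like this could be run at the end of a job, or with failed=True from a unit referenced via OnFailure.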