Footgun warning: don't `set -e` in an activation script!

I’ve debugged my issue already but I wanted to create this thread to hopefully save others from hours of debugging. Maybe there should be a lint/test/assertion for this in nixpkgs?

It all started when I went to install my working NixOS config onto a new system, for which secrets hadn’t been provisioned yet. I ran nixos-install but got this weird output:

< other output ... >
/nix/store/vqn6z63pwzkg6whhkzji5jdd16kflid6-sops-install-secrets-0.0.1/bin/sops-install-secrets: failed to decrypt '/nix/store/<one of my secrets>': Error getting data key: 0 successful groups required, got 0
/nix/var/nix/profiles/system/sw/bin/bash: line 12: /run/current-system/bin/switch-to-configuration: No such file or directory

At first, I disregarded that second-to-last line, because it’s completely expected for the secret key to not be provisioned yet, and the system should still boot fine without it. However, after debugging the missing switch-to-configuration for a while, I realized that /run/current-system was missing entirely: the activation script should’ve created it but hadn’t. The activation script was bailing out on the sops-install-secrets error, which it shouldn’t do, as there’s no set -e at the top of the file, only an ERR trap that records the failure for the exit code at the end of the script. Except I had added my own activation script:

  system.activationScripts.createMyWgKey = lib.mkIf (!hasWireguardKeySecret) {
    deps = [ "users" ];
    text = ''
      set -euo pipefail
...

I begin almost all my scripts with set -euo pipefail, so writing it here was second nature to me. But what I didn’t realize is that “activationScripts” is not a set of independent scripts but instead snippets which are concatenated into a single activation bash script. My “set -e” caused the later sops-install failure to crash the entire system activation script, causing nixos-install to fail on the missing current-system!
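To make the failure mode concrete, here is a minimal sketch of it. This is not the real NixOS activation script: the snippet names, the `concat` helper, and the `_status` variable are made up for illustration, and `false` stands in for a tolerated failure such as the sops-install-secrets error above. It only assumes what is described in this post, namely that the snippets are concatenated into one bash script that records failures with an ERR trap instead of aborting.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the generated script's header, three snippets, and footer.
header='_status=0
trap "_status=1" ERR'
snippet_a='echo "a: runs"'
snippet_b_clean='echo "b: runs"'
snippet_b_leaky='set -euo pipefail
echo "b: runs"'
# "false" plays the role of a command that is expected to fail sometimes.
snippet_c='false
echo "c: continues after the failure"'
footer='echo "footer: reached, status=$_status"'

# Join the pieces into one script text, one piece per line (illustrative helper).
concat() { printf '%s\n' "$@"; }

# Without set -e, the failure in snippet c is recorded and the script carries on.
clean_output=$(bash -c "$(concat "$header" "$snippet_a" "$snippet_b_clean" "$snippet_c" "$footer")" 2>&1) || true

# With set -e leaking out of snippet b, the failure in snippet c aborts everything after it.
leaky_output=$(bash -c "$(concat "$header" "$snippet_a" "$snippet_b_leaky" "$snippet_c" "$footer")" 2>&1) || true

printf 'clean:\n%s\n\nleaky:\n%s\n' "$clean_output" "$leaky_output"
```

In the clean run, all four lines are printed and the footer reports status=1. In the leaky run, only "a: runs" and "b: runs" appear, and the footer is never reached, which mirrors how the part of the activation that creates /run/current-system was never reached.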

Now I know that you have to be very careful not to modify the script environment in activation scripts, but this seems like a very easy way to cause obscure failures, so I thought I’d create this topic to warn others and also provide something to search for in case “switch-to-configuration” can’t be found by nixos-install for someone else in a similar situation.


Yeah this is unfortunately common with options that merge things into a single shell script.
I’ve always wondered why the activation scripts are handled that way.

The defensive technique I like is to use a subshell:

(
  set -euo pipefail
  ...
)

I also do that to explicitly scope vars/env changes: (cd x; ...); ...
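A small sketch of why the subshell helps, under the same assumptions as the thread (one concatenated bash script with an ERR trap). The `_status` variable and the snippet contents are hypothetical. Options set inside the parentheses are gone once the subshell exits, so a later failing command does not abort the outer script.

```shell
#!/usr/bin/env bash
# Hypothetical concatenated script: one snippet uses strict mode inside a subshell.
script='
_status=0
trap "_status=1" ERR

# Snippet wrapped in a subshell: strict mode applies only inside the parentheses.
(
  set -euo pipefail
  echo "snippet: strict mode on"
)

# A later snippet with a command that is allowed to fail.
false
echo "later snippet: still running, status=$_status"
'

subshell_output=$(bash -c "$script" 2>&1) || true
printf '%s\n' "$subshell_output"
```

Both lines are printed, and the later snippet reports status=1 from the ERR trap. If a command inside the subshell fails, the subshell itself exits non-zero at that point, and the outer script sees that as a single failed command and keeps going.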


I had a look at home-manager activation scripts because they were mentioned in another post. It seems that, while home-manager uses the same script concatenation scheme, set -euo pipefail is currently a no-op there because home-manager already sets it at the beginning of the activation script: see home-manager/modules/home-environment.nix at commit aeabc1ac63e6ebb8ba4714c4abdfe0556f2de765 in nix-community/home-manager on GitHub.

Still, be careful when messing with the script environment. I still wouldn’t recommend set -e in home-manager activation scripts, since at best it does nothing, but unlike in system activation scripts, it doesn’t look like it’ll break anything right now.
