Make ACME renew systemd-service depend on DNS/nss-lookup?

I wonder whether the ACME renew service should depend on nss-lookup.target.
I run unbound as my local DNS resolver. When doing nixos-rebuild, the ACME renew services cannot resolve the Let's Encrypt hosts because the local DNS (unbound) has not started yet.

The Unbound systemd unit specifies that it should start before nss-lookup.target.
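For readers unfamiliar with how that ordering is written down, here is a minimal sketch in NixOS option syntax. It only illustrates the relationship described above; it assumes the service is named `unbound`, and since the Unbound unit reportedly carries this ordering already, adding it yourself should not be necessary.

```nix
{
  # Sketch only: a name-resolution provider pulls in nss-lookup.target
  # and orders itself before it, so that units ordered after the target
  # are started once the resolver is up.
  systemd.services.unbound = {
    wants = [ "nss-lookup.target" ];
    before = [ "nss-lookup.target" ];
  };
}
```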

The most logical fix would be for the ACME renew services to run after nss-lookup.target.
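As a rough sketch of what that could look like for every certificate at once, the module below adds the ordering to each renew unit. It assumes the renew units are named `acme-<cert>` after the attribute names in `security.acme.certs`, which may differ between NixOS releases, and it is not a tested fix.

```nix
{ config, lib, ... }:
{
  # Assumption: one renew service per certificate, named "acme-<cert>".
  # Each of them is ordered after nss-lookup.target; the target is not
  # pulled in here, only used as an ordering point.
  systemd.services = lib.mapAttrs'
    (cert: _: lib.nameValuePair "acme-${cert}" {
      after = [ "nss-lookup.target" ];
    })
    config.security.acme.certs;
}
```

Only `after` is used, since consumers of name resolution are generally expected to order themselves after the target rather than pull it in.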

I can fix this in my own configuration with some hacks, but I wonder if this shouldn't be the default in NixOS?

Chances are high you are one of very few users of this setup, and it’s awesome that you put your idea into the open instead of being satisfied with your hack. Why not submit a PR and include your reasoning? Seems like ideally it is just one line of code. I’ll be happy to review if you reference the PR in this thread. That would be yet another little thing that Just Works™, and your contribution will make NixOS and the world a better place.

I think adding nss-lookup.target here (in after and/or wants?) should work:

I’ve followed your suggestion of adding nss-lookup.target to the after dependencies of the ACME renewal job:

{
  systemd.services."acme-henrimenke.com".after = lib.mkAfter [ "nss-lookup.target" ];
}

However, this still doesn’t fly for me. In the journal at boot, I can clearly see that the ACME renewal is started after Unbound, but it still fails:

Aug 10 04:42:29 henrimenke.com systemd[1]: Started Unbound recursive Domain Name Server.
Aug 10 04:42:29 henrimenke.com systemd[1]: Reached target Host and Network Name Lookups.
Aug 10 04:42:29 henrimenke.com systemd[1]: Starting Renew ACME Certificate for henrimenke.com...
Aug 10 04:42:29 henrimenke.com yz6frbp9w9s150pq449sxva06iyk95fz-acme-start[790]: 2020/08/10 04:42:29 Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org: no such host
Aug 10 04:42:29 henrimenke.com yz6frbp9w9s150pq449sxva06iyk95fz-acme-start[790]: 2020/08/10 04:42:29 Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org: no such host
Aug 10 04:42:30 henrimenke.com systemd[1]: acme-henrimenke.com.service: Main process exited, code=exited, status=1/FAILURE
Aug 10 04:42:30 henrimenke.com systemd[1]: acme-henrimenke.com.service: Failed with result 'exit-code'.
Aug 10 04:42:30 henrimenke.com systemd[1]: Failed to start Renew ACME Certificate for henrimenke.com.

I’m puzzled. I’m afraid I don’t know enough about either systemd or Nix to understand why this does not work.

It seems the following (closed) GitHub issue is related:
https://github.com/NixOS/nixpkgs/issues/85794

Can you check if 20.09 fixes it for you?

It does for me; see “ACME renewals fail due to DNS being unavailable during switch” (NixOS/nixpkgs issue #85794).

@nh2 Unfortunately it doesn’t; it still tries to run the ACME renewal before unbound is started.

Hopefully this will solve it:
https://github.com/NixOS/nixpkgs/pull/101218

@andir Does your refactor also (as a side effect) resolve the problem described in the first post?

Unfortunately it can’t fix that outside of system start. The way that NixOS is currently changing generations doesn’t allow enforcing service dependencies in targets that have previously been reached. It is a limitation with systemctl. There was some talk about rewriting our activation script with that limitation in mind but nobody has put in a lot of effort yet. @arianvp is one of those that I discussed it with in #nixos-systemd (on Freenode).

OK, thanks for the clarification! I now understand why this is a difficult issue.

Related:

https://github.com/NixOS/nixpkgs/issues/106862