Predictable network interface names in initrd

This topic has come up before (including in issue #39069) and has generated multiple PRs (#39329, #47664). I’d like to understand the current situation and how to achieve the goal stated in the title at present, and hopefully even help get any remaining issues fixed upstream.

To avoid any confusion: the goal is for my Ethernet device to be named enp4s0 even in the initrd network (which uses dropbear to receive a disk encryption password over SSH). The goal is not to get eth0 consistently. That can be achieved by setting the NixOS networking.usePredictableInterfaceNames option to false, if it works (I think net.ifnames=0 is still supported), or else by setting the net.naming-scheme=... kernel command-line parameter (sources: systemd.net-naming-scheme(7), systemd-udevd.service(8)).
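For readers wondering where a name like enp4s0 comes from: under systemd’s predictable naming, the "path" policy encodes the PCI location of the device (en = Ethernet, p4 = PCI bus 4, s0 = slot 0). The Python sketch below is a simplified model of that derivation, assuming a PCI address in the usual domain:bus:slot.function form. It ignores the other policies (onboard, slot, MAC), multi-function devices and the differences between naming-scheme versions described in systemd.net-naming-scheme(7); the helper name pci_path_name is hypothetical.

```python
def pci_path_name(pci_address: str, prefix: str = "en") -> str:
    """Simplified model of a PCI "path" interface name (e.g. enp4s0).

    Hypothetical helper: assumes an address like "0000:04:00.0"
    (domain:bus:slot.function, all hexadecimal) and ignores multi-function
    devices, dev_port suffixes and naming-scheme version differences.
    """
    domain_hex, bus_hex, slot_func = pci_address.split(":")
    slot_hex, func_hex = slot_func.split(".")
    domain, bus, slot, func = (
        int(part, 16) for part in (domain_hex, bus_hex, slot_hex, func_hex)
    )

    name = prefix
    if domain > 0:  # non-zero PCI domains get a "P<domain>" component
        name += f"P{domain}"
    name += f"p{bus}s{slot}"  # bus and slot are printed in decimal
    if func > 0:  # function suffix only for non-zero functions in this model
        name += f"f{func}"
    return name


if __name__ == "__main__":
    print(pci_path_name("0000:04:00.0"))  # → enp4s0
```

On a running system, the PCI address of an interface is the target of its /sys/class/net/<name>/device symlink, so comparing this model’s output with the actual name is a quick sanity check.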

Previously, I was successfully using the workaround described in the previous Discourse thread and enhanced in the latter PR. Unfortunately, it no longer seems to work: as I described in this comment, the 80-net-setup-link.rules file that this technique relies on was removed from NixOS in commit 1f03f6f. Apparently udev now does the renaming itself in C code, so the .rules file is no longer required. In practice, however, the initrd network does not behave correctly: NixOS stage-2 starts with eth0 and fails to rename it because the device is busy.

Hopefully someone can shed some light on this confusing issue, or at least offer a working workaround.

I was under the impression that we load all the udev rules shipped by the udev package, which include the upstream 80-net-setup-link.rules. That rule calls the required builtin (net_setup_link), which reads the 99-default.link file that does the renaming. Maybe I am wrong here, though.
If we don’t, I think we should definitely load the udev rules that the udev package comes with (which is just an alias for the systemd package, by the way), because otherwise the udev renaming policy is never invoked.
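As a rough mental model of what the .link file contributes: systemd.link(5) describes NamePolicy= as an ordered list of policies, where the first policy that yields a name wins. The sketch below models only that selection step. It assumes the candidate names have already been computed (in udev they come from the net_id builtin as properties such as ID_NET_NAME_PATH), leaves out the "keep" and "kernel" policies, which depend on device state, and uses an illustrative policy string rather than a quote of any particular 99-default.link; choose_name is a hypothetical helper.

```python
from typing import Mapping, Optional

# Maps a NamePolicy= keyword to the udev property that supplies its candidate.
# "keep" and "kernel" depend on device state rather than a property and are
# deliberately left out of this simplified model.
POLICY_PROPERTIES = {
    "database": "ID_NET_NAME_FROM_DATABASE",
    "onboard": "ID_NET_NAME_ONBOARD",
    "slot": "ID_NET_NAME_SLOT",
    "path": "ID_NET_NAME_PATH",
    "mac": "ID_NET_NAME_MAC",
}


def choose_name(name_policy: str, properties: Mapping[str, str]) -> Optional[str]:
    """Return the first name produced by the ordered policy list, or None."""
    for policy in name_policy.split():
        prop = POLICY_PROPERTIES.get(policy)
        if prop is None:
            continue  # policy not modelled here ("keep", "kernel", ...)
        candidate = properties.get(prop)
        if candidate:
            return candidate
    return None  # no policy applied: in this model the kernel name (e.g. eth0) stays


if __name__ == "__main__":
    props = {"ID_NET_NAME_PATH": "enp4s0", "ID_NET_NAME_MAC": "enx001122334455"}
    print(choose_name("kernel database onboard slot path", props))  # → enp4s0
```

The point of the model is the failure mode discussed here: if the rule that triggers this step never runs, nothing chooses a new name and the kernel’s eth0 is kept.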

So @andir’s commit seems only partially right. Perhaps he can shed some light on it, because I’m also confused now. I think we should definitely load the udev rules that systemd ships with, but I don’t know how to find out whether that is actually happening.

Since we run udev in stage-1, I would expect the renaming to happen correctly (if udev loads this rule).

So my suspicion is right: if you run sudo /run/current-system/systemd/lib/systemd/systemd-udevd -D, you will see that the required 80-net-setup-link.rules is indeed loaded:

(... snip ...)

Reading rules file: /nix/store/niw0jbw29x0rg85m7z8j5gll16y37g5n-systemd-242/lib/udev/rules.d/80-drivers.rules
Reading rules file: /nix/store/niw0jbw29x0rg85m7z8j5gll16y37g5n-systemd-242/lib/udev/rules.d/80-net-setup-link.rules
Reading rules file: /nix/store/qa1913xrl3x8g4nbcawj3zbarvgx7fix-udev-rules/80-udisks2.rules
Reading rules file: /nix/store/niw0jbw29x0rg85m7z8j5gll16y37g5n-systemd-242/lib/udev/rules.d/90-vconsole.rules
Reading rules file: /nix/store/qa1913xrl3x8g4nbcawj3zbarvgx7fix-udev-rules/95-dm-notify.rules
Reading rules file: /nix/store/qa1913xrl3x8g4nbcawj3zbarvgx7fix-udev-rules/98-ipv6-privacy-extensions.rules
Skipping empty file: /nix/store/qa1913xrl3x8g4nbcawj3zbarvgx7fix-udev-rules/99-ipv6-privacy-extensions.rules
Reading rules file: /nix/store/qa1913xrl3x8g4nbcawj3zbarvgx7fix-udev-rules/99-local.rules
Reading rules file: /nix/store/niw0jbw29x0rg85m7z8j5gll16y37g5n-systemd-242/lib/udev/rules.d/99-systemd.rules

(This is in stage-2)

However, in stage-1 we only seem to load a specific set of udev rules, which does not include the one for renaming the interface: https://github.com/NixOS/nixpkgs/blob/c45bf10e9f314ffd3ddf089761f2ec905b288878/nixos/modules/system/boot/stage-1.nix#L201

Perhaps we should change that!

I’ll change the stage-1 udev call to run in debug mode to confirm my suspicion.
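To compare the two stages, one option is to save the udevd debug output from each and check which rules files were actually read. The sketch below assumes the "Reading rules file: <path>" line format shown in the stage-2 output above (from systemd 242); other versions may word their debug messages differently, and rules_loaded is a hypothetical helper name.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, Set

# Matches lines such as
# "Reading rules file: /nix/store/...-systemd-242/lib/udev/rules.d/80-drivers.rules".
# "Skipping empty file: ..." lines are intentionally not matched.
READING_RE = re.compile(r"Reading rules file: (\S+)")


def rules_loaded(debug_lines: Iterable[str]) -> Set[str]:
    """Return the basenames of all rules files udevd reports reading."""
    loaded = set()
    for line in debug_lines:
        match = READING_RE.search(line)
        if match:
            loaded.add(PurePosixPath(match.group(1)).name)
    return loaded
```

For example, with a log saved as udevd-debug.log (hypothetical file name), checking "80-net-setup-link.rules" in rules_loaded(open("udevd-debug.log")) for a stage-1 log and a stage-2 log shows directly whether the renaming rule is missing from one of them.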

@emily, could you check whether https://github.com/NixOS/nixpkgs/pull/68953 fixes this for you?

Unfortunately, I get the same errors about eth0 being busy in stage-2 after cherry-picking your PR (and fixing a typo):

Sep 17 09:50:25 patchouli systemd-udevd[1431]: eth0: Failed to rename network interface 2 from 'eth0' to 'enp4s0': Device or resource busy
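When comparing boots (for example with and without a PR applied), it can help to pull these failures out of the journal mechanically. The sketch below assumes the message format of the line above ("Failed to rename network interface N from 'OLD' to 'NEW': REASON"); the regex and the names RenameFailure and parse_rename_failure are hypothetical and may need adjusting for other systemd versions.

```python
import re
from typing import NamedTuple, Optional

# Assumed message format, taken from the systemd-udevd log line above.
RENAME_FAILURE_RE = re.compile(
    r"Failed to rename network interface (\d+) from '([^']+)' to '([^']+)': (.+)$"
)


class RenameFailure(NamedTuple):
    ifindex: int
    old_name: str
    new_name: str
    reason: str


def parse_rename_failure(line: str) -> Optional[RenameFailure]:
    """Extract the details of a failed udev rename, or None if the line doesn't match."""
    match = RENAME_FAILURE_RE.search(line.rstrip("\n"))
    if match is None:
        return None
    ifindex, old_name, new_name, reason = match.groups()
    return RenameFailure(int(ifindex), old_name, new_name, reason)


if __name__ == "__main__":
    sample = (
        "Sep 17 09:50:25 patchouli systemd-udevd[1431]: eth0: Failed to rename "
        "network interface 2 from 'eth0' to 'enp4s0': Device or resource busy"
    )
    print(parse_rename_failure(sample))
```

Running each line of journalctl -b output through this function gives a quick per-boot answer to whether the busy error is still there.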

It seems like the device remains eth0 for the whole of stage-1, even with your changes. Sadly, I have no idea what the problem could be.

I’m currently not using networkd; perhaps that might help?

I experienced the same issue and added a comment to the PR which bumped systemd to 242 (https://github.com/NixOS/nixpkgs/pull/61321#issuecomment-529999507).

I tried to use the new 80-net-setup-link.rules from systemd 243 as well, but I experienced the same issue as @emily did. My current workaround (which is unfortunately rather ugly) is to use the old 80-net-setup-link.rules from nixpkgs (which was taken from an older systemd version).

Thanks for that. That adds a bit more information for me to debug…

Am I right that simply enabling networking in stage-1 reproduces this issue? By “the issue” I mean:

  1. in stage-1, the network interface is not renamed;
  2. in stage-2, the renaming fails because the interface is “busy”.

Am I also right to assume that when networking is disabled in stage-1, the renaming in stage-2 succeeds?
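To check these assumptions on a given boot, one option is to inspect the interfaces as early in stage-2 as possible and see whether the initrd left any of them administratively up. On the kernels of that era, renaming an interface that is up typically fails with EBUSY ("Device or resource busy"). The sketch below reads the hexadecimal flags file for each entry under /sys/class/net and tests IFF_UP (0x1 in <linux/if.h>); the sysfs root is a parameter so it can be exercised against a fake directory, and interfaces_up is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict

IFF_UP = 0x1  # "interface is up" bit, as defined in <linux/if.h>


def interfaces_up(sysfs_net: Path = Path("/sys/class/net")) -> Dict[str, bool]:
    """Map each network interface name to whether its IFF_UP flag is set."""
    result = {}
    for iface_dir in sorted(sysfs_net.iterdir()):
        if not iface_dir.is_dir():
            continue  # skip plain files such as bonding_masters
        flags_file = iface_dir / "flags"
        if not flags_file.is_file():
            continue  # skip entries without a flags attribute
        flags = int(flags_file.read_text().strip(), 16)  # e.g. "0x1003"
        result[iface_dir.name] = bool(flags & IFF_UP)
    return result
```

Calling interfaces_up() on a booted system and comparing boots with stage-1 networking enabled and disabled would show whether eth0 is still up at the point where udev tries to rename it.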

It may also have something to do with having an open connection at the time of stage-2 init: after I enter the ZFS encryption passphrase, the SSH connection dies quite abruptly, so I think it’s possible that sockets aren’t being shut down properly between stages. A minimal reproduction might therefore be to wait at init time for a file (or something similar) to exist, then SSH in and touch it.
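The reproduction idea above boils down to a polling loop: block until a marker file appears, then let boot continue. In the initrd this would realistically be a few lines of shell in a stage-1 hook that runs after the initrd network is up; the Python sketch below only illustrates the logic, and the helper name wait_for_file and the marker path are hypothetical.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float, poll_interval: float = 0.5) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

For example, wait_for_file(Path("/tmp/continue-boot"), timeout=300) would hold boot until someone connects over SSH and touches that (hypothetical) marker, or until five minutes pass. This keeps an SSH session open at the moment stage-1 hands over to stage-2, without the ZFS passphrase prompt being involved.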

For everyone having this problem on NixOS 19.09, take a look at https://github.com/NixOS/nixpkgs/pull/68953#issuecomment-540851979.

After applying the commit I mentioned in that comment on top of the commits from that PR (#68953), this issue seems to be fixed for me, but I have only done limited testing.