Macvlan network devices not being properly cleaned up?

I’m running into a couple different problems that seem like they might be related, all involving macvlan network interfaces.

First, I’m running a couple of NixOS containers to which I want to assign separate IPs, so I’ve been creating macvlans for them to use (via containers.macvlans), and then configuring the network within the container. This works well, and the container can listen on the new IP without interfering with the parent host. However, whenever I rebuild the OS on the parent host (pushed using NixOps), the container can no longer assign the IP address to the macvlan interface; it fails with the error: “RTNETLINK answers: Address already in use”.

I don’t see the address in ip addr in the container, or on the parent host. I’m not able to ping it or anything. Restarting the container doesn’t help. So far, the only way I’ve found to fix it is to restart the parent host itself.

I’ve also got a couple hosts that use a macvlan as their primary interface (no container involved). A couple of them have fixed IP addresses which I configure manually; the others are using DHCP. When I push a new build to them, the static-IP hosts come back just fine, but the dynamic ones occasionally disappear. It’s got something to do with the build (they all succeed or fail together), and it’s been happening over a couple years. It seems like changes affecting networking (and requiring a restart of network services) cause DHCP to fail. Meanwhile, hosts that do not use a macvlan work fine.

Does anybody know what I’m doing wrong, or what the problem might be? In the first case, where is that IP assigned? I don’t see any interfaces or aliases anywhere. Is there a way to find and unassign it?

Edit: Out of curiosity, I tried a restart (instead of upgrade) on the container from the first example. That worked just fine!

I know this was posted a while ago, but in case anyone wanders here: I’ve found that, at least in my case, a root-login session seems to get stuck between container restarts/rebuilds (perhaps a lost tmux session somewhere).

In this case you might want to check lsns -t net and look for the stuck processes. Send SIGKILL to those PIDs, and the macvlan should be freed and working again after you restart the container.

# lsns -t net
        NS TYPE NPROCS    PID USER    NETNSID NSFS COMMAND
4026531840 net     317      1 root unassigned      /run/current-system/systemd/lib/systemd/systemd
4026532680 net       1 265458 root unassigned      /nix/store/jpbahkdhclhdqrs7ay8wlqdn0110aqbj-util-linux-2.39.3-bin/bin/nsenter -t 9241 -m -u -i -n -p -- /nix/store/7f5fd42ixcmms2g06dwz6jnnbcgzhwc7-shadow-4.14.6-su/bin/su root -l
4026532751 net      12 407561 root unassigned      /run/current-system/systemd/lib/systemd/systemd /nix/store/lkmf7g56mrq6nml5d8yqs9kmgr48ayiq-nixos-system-hass-24.05pre-git/init
4026532752 net      19   1757 root unassigned      /run/current-system/systemd/lib/systemd/systemd /nix/store/9nlh4gwp6lhgm4yh9kks9rbjfxcrmznj-nixos-system-monitor-24.05pre-git/init
4026532773 net      18 166399 root unassigned      /run/current-system/systemd/lib/systemd/systemd /nix/store/plqjhrq1k93ykvrk31iz2yv2v9y2mc4a-nixos-system-timemachine-24.05pre-git/init
# kill -KILL 265458
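
If you would rather not pick the PID out by eye, something like the following might help. It is only a rough sketch and assumes the stuck holder is an nsenter process, as in the listing above; stuck_nsenter_pids is a helper name made up for this post, not an existing tool. It only prints candidate PIDs and does not kill anything.

```shell
#!/bin/sh
# Hypothetical helper (name invented here): reads "PID COMMAND" lines, as
# printed by `lsns -t net -n -o PID,COMMAND`, and prints the PID of every
# network namespace whose first process is an nsenter binary.
stuck_nsenter_pids() {
    awk '$2 ~ /(^|\/)nsenter$/ { print $1 }'
}

# Example usage (as root). Inspect what each candidate namespace still holds
# before sending any signal:
#
#   lsns -t net -n -o PID,COMMAND | stuck_nsenter_pids | while read -r pid; do
#       echo "== namespace held by PID $pid =="
#       nsenter -t "$pid" -n ip -brief addr show
#   done
```

If the address from the “Address already in use” error shows up in one of those namespaces, that is presumably the leftover session still holding the macvlan. Whether your case also involves nsenter is an assumption; a different leftover process would need a different filter.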