Split traffic based on process

I am attempting to replace my Windows workstation with NixOS. My greatest challenge at this stage is how to do the VPN split-tunneling. My VPN runs on the router, not the workstation itself. On Windows I can use DSCP tagging: I assign one tag to my VDI application and MS Teams, and another to my browsers.

But so far I have not been able to figure out how to do that on NixOS/Linux. If somebody knows how to do it I would love to know.

An alternative, albeit a less desirable one, is to use 2 network interfaces: one going to my router (which uses the VPN) by default and one going directly to the ISP. But even then, I have no idea how to isolate my processes and route them to specific gateways. Control Groups? Network Namespaces?

I would really appreciate any help in this, feel free to get creative if you can think of another solution. I’ve been stuck on this for a week now.


At the level of Linux, the basic tools here are network namespaces and iptables. Here is a good tutorial on how to set up a network namespace and make it useful. Inside a namespace, you should be able to use iptables to tag all outgoing packets with whatever you want: iptables -t mangle -A OUTPUT -j DSCP --set-dscp <tag>, probably. (Locally generated packets never pass through PREROUTING, so inside the namespace the chain to use is OUTPUT or POSTROUTING.)

At the level of NixOS specifically, most of this isn’t explicitly exposed as configuration options, but once you’ve figured out the Linux-generic commands to set up your namespaces the way you want them, you can put them in networking.firewall.extraCommands.
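As a rough, untested sketch of what that could look like on the host side: packets leaving a namespace through a veth pair show up as incoming traffic on the host end of the pair, so the host can match them by input interface in the mangle table's PREROUTING chain. The interface name veth0 and the DSCP class AF21 are placeholders I made up for illustration, not values from your setup.

```nix
{
  # Sketch only: assumes the iptables-based NixOS firewall (the default)
  # and a hypothetical host-side veth interface called veth0.
  networking.firewall.extraCommands = ''
    # Remove a stale copy first so firewall reloads do not stack duplicates.
    iptables -t mangle -D PREROUTING -i veth0 -j DSCP --set-dscp-class AF21 2>/dev/null || true
    # Tag everything arriving from the namespace with the chosen DSCP class.
    iptables -t mangle -A PREROUTING -i veth0 -j DSCP --set-dscp-class AF21
  '';

  # Clean the rule up again when the firewall is stopped.
  networking.firewall.extraStopCommands = ''
    iptables -t mangle -D PREROUTING -i veth0 -j DSCP --set-dscp-class AF21 2>/dev/null || true
  '';
}
```

The delete-before-add pattern is there because extraCommands runs on every firewall (re)start, and rules outside the firewall's own chains are not flushed automatically.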


Unfortunately after dozens of pages just like it I ended up finally creating an account here because it’s all over the place. And to be honest, I have been trying to switch over to Linux for 2 years now. Everybody keeps telling me how easy it is and yet Windows keeps being more intuitive.

I will try once more. Can I actually set up those namespaces declaratively or do I have no choice but to do that by hand? And does NixOS still use iptables? I have been reluctant to mess with all the iptables stuff since I figured it was using nftables or firewalld.

EDIT: Reading through that page the simplest route would be the double network interface, then connecting the ISP interface to the namespace and running my office apps inside it. That would have all other traffic use the router. Which makes me wonder if it is possible to edit shortcuts in Plasma to use the namespace. Anyways, I should probably try to make the DSCP version work first.

EDIT2: A second question would be if any child-app to an app opened in the namespace is also automatically part of that namespace. Unfortunately the VDI app can only be opened by a browser after authenticating in the browser.

NixOS defaults to iptables; nftables must be enabled with networking.nftables.enable. The change is transparent in the sense that many options, e.g. networking.firewall.allowedTCPPorts, work the same with either backend.

Custom configuration differs, though. With nftables, custom rules are added via the networking.firewall.extra*Rules options rather than networking.firewall.extraCommands, which is iptables-specific. Custom tables can be defined using networking.nftables.tables.
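For the DSCP use case, a custom table might look roughly like the following. This is an untested sketch: the table name dscp_tag, the interface name veth0 and the class af21 are all placeholders, and it assumes the traffic to be tagged enters the host through a veth interface.

```nix
{
  # Sketch only: veth0 and af21 are placeholder values.
  networking.nftables.enable = true;

  networking.nftables.tables.dscp_tag = {
    family = "ip";
    content = ''
      chain prerouting {
        type filter hook prerouting priority mangle; policy accept;
        # Mark IPv4 packets arriving from the namespace side of the veth pair.
        iifname "veth0" ip dscp set af21
      }
    '';
  };
}
```

Keeping the rule in its own table means it does not interfere with the table the NixOS firewall module generates.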


So… apparently I can do something like this:

{ config, pkgs, ... }:

{
  # Template unit: netns@NAME.service creates the network namespace NAME.
  systemd.services."netns@" = {
    description = "%I network namespace";
    before = [ "network.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = "${pkgs.iproute2}/bin/ip netns add %I";
      ExecStop = "${pkgs.iproute2}/bin/ip netns del %I";
    };
  };
}

Combined with IfState (since 25.11):

Then it is still a matter of getting the processes to run in the namespace. And of course the tagging or routing.


Okay, so I can’t use ifState as long as I am using NetworkManager. That is annoying and means I have to find a different way to configure it.

As stated before, this at least gets me the basics.

{ config, pkgs, ... }:

{
  systemd.services."netns@" = {
    description = "%I network namespace";
    before = [ "network.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = "${pkgs.iproute2}/bin/ip netns add %I";
      ExecStop = "${pkgs.iproute2}/bin/ip netns del %I";
    };
  };
  systemd.services.vpn = {
    description = "Split Tunnel";
    bindsTo = [ "netns@vpn.service" ];
    after = [ "netns@vpn.service" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      ExecStart = ''
        ${pkgs.iproute2}/bin/ip link add veth0 type veth peer name veth1
      '';
    };
  };  
}

The next step would be to assign veth1 to the namespace. I tried adding ${pkgs.iproute2}/bin/ip link set veth1 netns vpn into ExecStart after the add but that did not do anything, both virtual interfaces remain outside of the namespace.

Obviously having a second physical interface makes things a lot easier as it would have a DHCP IP and I can just link it and be done without any NAT. But I wanted to try doing it with a single interface first.

Anyways, guess I’m stuck again.

I think this is the kind of feature that must be implemented in the NixOS configuration. It would be a good idea to open a feature request on nixos/nixpkgs, along with an API proposal.

At this point I wish I could pay somebody to help me configure this. It’s been so long and so tedious and I just want it to work, finally.

How did you do this? Did you add a second command to the ExecStart string (probably won’t work), or switch to using systemd.services.vpn.script (should work)? If you post the exact code of things that don’t work, we’ll have an easier time helping you.

Also, doing this all with systemd services adds an additional thing that can be done incorrectly. I’d recommend first figuring out which commands need to be run, in a terminal, by trying them out until your system does the right thing. Then, once you know what those commands are, embark on figuring out how to put them into tidy systemd units. That leaves fewer things to be possibly incorrect at any step in the process.


I did not want to “pollute” my setup which is why I tried to do it declaratively from the start (though it appears to forget the entire namespace configuration on boot anyway). And from all the searching I have done, it appears that the way to create a network namespace declaratively is through a systemd service.

The LLM-generated answer on how to do that is based on this page. The page uses the 2-stage systemd service method and adding multiple ip commands into the ExecStart.

The first service to create the namespace works fine, as long as I make sure it is actually started. The second one to create the virtual interfaces works as well.

I concluded the next step was to follow the fun with network namespaces link you posted earlier, in which the next step is to assign one virtual interface to the namespace, then set up static IPs and a NAT route out of it. It would not be that complicated to translate the ifState examples into ip commands if needed. I figured that once I knew how to make it execute multiple ip commands in sequence (as in the first link in this post), I could figure out the basic networking and the only remaining step would be to configure the firewall.

And then figure out if Plasma allows me to create a shortcut for an application to start in the namespace, but that’s a luxury problem.

So yes, I did add the second line to the ExecStart. My Linux skills are rather specialized. Some things I am really good at, others really not. This is still new to me.

EDIT:
Been doing some reading to figure out what that script option ends up doing. Does it create a shell script which it then executes in the service? In that case, can it even be used together with ExecStart?

Also, from reading the man page it appears that with oneshot I cannot use a multi-line command, but I can give multiple single-line commands. A combination of ExecStartPre and ExecStart could theoretically work too. Using the NixOS-supplied script option would be more idiomatic for NixOS, though, so I could try both.

man page excerpt

ExecStart=
Commands that are executed when this service is started.

Unless Type= is oneshot, exactly one command must be given. When Type=oneshot is used, this setting may be used multiple times to define multiple commands to execute. If the empty string is assigned to this option, the list of commands to start is reset, prior assignments of this option will have no effect. If no ExecStart= is specified, then the service must have RemainAfterExit=yes and at least one ExecStop= line set. (Services lacking both ExecStart= and ExecStop= are not valid.)

If more than one command is configured, the commands are invoked sequentially in the order they appear in the unit file. If one of the commands fails (and is not prefixed with “-”), other lines are not executed, and the unit is considered failed.

Unless Type=forking is set, the process started via this command line will be considered the main process of the daemon.

ExecStartPre=, ExecStartPost=

Additional commands that are executed before or after the command in ExecStart=, respectively. Syntax is the same as for ExecStart=. Multiple command lines are allowed, regardless of the service type (i.e. Type=), and the commands are executed one after the other, serially.

If any of those commands (not prefixed with “-”) fail, the rest are not executed and the unit is considered failed.

ExecStart= commands are only run after all ExecStartPre= commands that were not prefixed with a “-” exit successfully.

ExecStartPost= commands are only run after the commands specified in ExecStart= have been invoked successfully, as determined by Type= (i.e. the process has been started for Type=simple or Type=idle, the last ExecStart= process exited successfully for Type=oneshot, the initial process exited successfully for Type=forking, “READY=1” is sent for Type=notify/Type=notify-reload, or the BusName= has been taken for Type=dbus).

Note that ExecStartPre= may not be used to start long-running processes. All processes forked off by processes invoked via ExecStartPre= will be killed before the next service process is run.

Note that if any of the commands specified in ExecStartPre=, ExecStart=, or ExecStartPost= fail (and are not prefixed with “-”, see above) or time out before the service is fully up, execution continues with commands specified in ExecStopPost=, the commands in ExecStop= are skipped.

Note that the execution of ExecStartPost= is taken into account for the purpose of Before=/After= ordering constraints.

By the way, it might not even be possible to execute ip commands in the namespace from outside the namespace, so I may end up having to do it by hand after all. At least, I have so far not found any way (except ifState) to configure the inside of a namespace.

In which case, 2 network interfaces and assigning 1 exclusively to the namespace would be the only solution which could be automated.

This is very tricky to do right, and I wouldn’t recommend it unless you sort of know what you’re doing. NixOS in particular makes using network namespaces very difficult due to its use of nscd, which requires some pretty ugly hacks to prevent DNS from leaking. I’ll share the config I’ve been using to run a systemd service restricted to a wireguard interface with a network namespace:

systemd.services.myService = {
  bindsTo = ["netns@wg.service"];
  requires = ["network-online.target"];
  after = ["wg.service"];
  wants = ["wg.service"];

  serviceConfig = {
    NetworkNamespacePath = "/var/run/netns/wg";
    
    # `nscd` on NixOS leaks DNS queries from network namespaces.
    # See https://github.com/NixOS/nixpkgs/issues/428554
    # This will mask the socket to prevent this from happening
    TemporaryFileSystem = [
      "/var/run/nscd"
    ];

    # Another undocumented gotcha: NetworkNamespacePath on systemd has different behavior
    # from `ip netns exec` in that it does not by default bind mount namespace-specific
    # config files over the global config file. So, we need to do this manually.
    BindReadOnlyPaths = [
      "/etc/netns/wg/resolv.conf:/etc/resolv.conf:norbind"
      "/etc/netns/wg/nsswitch.conf:/etc/nsswitch.conf:norbind"
    ];
  };
};

systemd.services.wg = {
  description = "wg network interface";
  bindsTo = ["netns@wg.service"];
  requires = ["network-online.target"];
  after = ["netns@wg.service"];
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    ExecStart = with pkgs;
      writers.writeBash "wg-up" ''
        set -e
        ${iproute2}/bin/ip link add wg0 type wireguard
        ${iproute2}/bin/ip link set wg0 netns wg
        ${iproute2}/bin/ip -n wg address add ${ip} dev wg0
        ${iproute2}/bin/ip netns exec wg \
          ${wireguard-tools}/bin/wg setconf wg0 ${vpnConfPath}
        ${iproute2}/bin/ip -n wg link set lo up
        ${iproute2}/bin/ip -n wg link set wg0 up
        ${iproute2}/bin/ip -n wg route add default dev wg0
      '';
    ExecStop = with pkgs;
      writers.writeBash "wg-down" ''
        ${iproute2}/bin/ip -n wg route del default dev wg0
        ${iproute2}/bin/ip -n wg link del wg0
        # The loopback device cannot be deleted, only brought down.
        ${iproute2}/bin/ip -n wg link set lo down
      '';
  };
};

systemd.services."netns@" = {
  description = "%I network namespace";
  before = ["network.target"];
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    ExecStart = "${pkgs.iproute2}/bin/ip netns add %I";
    ExecStop = "${pkgs.iproute2}/bin/ip netns del %I";
  };
};

environment.etc = {
  # The DNS servers you want to use inside the namespace.
  # Get these from your VPN provider, ideally.
  "netns/wg/resolv.conf".text = ''
    nameserver <ipv4>
    nameserver <ipv6>
  '';

  # This is needed otherwise DNS could leak over systemd-resolved
  "netns/wg/nsswitch.conf".text = ''
    passwd:    files mymachines systemd
    group:     files mymachines systemd
    shadow:    files
    sudoers:   files

    hosts:     files mymachines myhostname dns
    networks:  files

    ethers:    files
    services:  files
    protocols: files
    rpc:       files
  '';
};

I hope that this helps.


Interesting, thank you. I will look into this, see what I can learn. Unlikely to be applicable in my case though. My WireGuard is running on the OpenWRT router. It does policy-based routing. My challenge is to let it know which packets should be routed outside of the VPN (things like my work) and which need to be routed through the VPN (everything else).

On Windows it’s 1 minute of clicking to tag application traffic using a QoS application-based policy. On Linux, not so much. The only reason I ended up with network namespaces is because it seems to be the only way to separate network traffic. Which then allows me to tag the traffic from the namespace on the Linux firewall.

EDIT:
Okay, NixOS does not support the multi-use of ExecStart which systemd supposedly does in the case of oneshot. It results in an error that ExecStart is already defined.

Using ExecStart and script results in an error about ExecStart having conflicting values, so I guess what the script option does is create an ExecStart.

Using only the script option will execute both lines it seems. I now have veth0 on the outside and veth1 on the inside. So… now I have to add ip netns exec vpn ip etc etc to configure inside the namespace?
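For what it's worth, ip also accepts -n <namespace> as shorthand for running the command inside a namespace, so the inside of the namespace can probably be configured from the same script. A rough, untested sketch of how the vpn service might continue from here: the 10.200.0.x addresses and the external interface name enp3s0 are placeholders I picked for illustration, and the NAT part assumes the host should masquerade the namespace's traffic.

```nix
{ pkgs, ... }:

{
  systemd.services.vpn = {
    description = "Split Tunnel";
    bindsTo = [ "netns@vpn.service" ];
    after = [ "netns@vpn.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.iproute2 ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # Create the pair and move one end into the namespace.
      ip link add veth0 type veth peer name veth1
      ip link set veth1 netns vpn
      # Host side (placeholder address).
      ip address add 10.200.0.1/24 dev veth0
      ip link set veth0 up
      # Namespace side, configured from the outside with `ip -n`.
      ip -n vpn address add 10.200.0.2/24 dev veth1
      ip -n vpn link set lo up
      ip -n vpn link set veth1 up
      ip -n vpn route add default via 10.200.0.1
    '';
    # Deleting one end of a veth pair removes the other end as well.
    preStop = ''
      ip link del veth0
    '';
  };

  # Let the host forward and masquerade traffic coming from the namespace.
  # enp3s0 is a placeholder for the real uplink interface.
  networking.nat = {
    enable = true;
    internalInterfaces = [ "veth0" ];
    externalInterface = "enp3s0";
  };
}
```

Something like sudo ip netns exec vpn ping 10.200.0.1 would then be a quick way to check that the two ends can see each other before worrying about tagging.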

Two steps forward, one step back. But I’ll take it, thank you for suggesting the script option, @rhendric

You can have multiple ExecStart values (the same goes for other systemd directives) by defining it as a list, e.g. { systemd.services.test-unit.serviceConfig.ExecStart = [ "/bin/sh -c true" "/bin/sh -c true" ]; }

This will also work when script is defined, although you might need to use lib.mkBefore and lib.mkAfter to change the order.
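A small sketch of how that combination might look, reusing the vpn service and interface names from earlier in the thread. I have not tested this exact snippet, and the resulting order is worth checking in the generated unit with systemctl cat vpn.service.

```nix
{ lib, pkgs, ... }:

{
  systemd.services.vpn = {
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # A list-valued ExecStart is merged with the one generated from
      # `script`; mkBefore asks for it to be placed ahead of the script.
      ExecStart = lib.mkBefore [
        "${pkgs.iproute2}/bin/ip link add veth0 type veth peer name veth1"
      ];
    };
    script = ''
      ${pkgs.iproute2}/bin/ip link set veth1 netns vpn
    '';
  };
}
```

Giving ExecStart as a plain string next to script is what triggers the conflicting-values error; the list form is what allows the two definitions to be merged.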

Right, that makes sense, thanks. The type of notation is still a bit unfamiliar to me. Usually nixos-rebuild tells me which mistake I make.