EC2 metadata not available in runCommand

Very new to NixOS, trying to figure out where I’m being stupid here.

I’m using the latest official AMI to launch an EC2 instance, and I’m trying to read some secrets from AWS Secrets Manager as part of the initial build using runCommand. The IAM role attached to the instance has permission to read my secret.

Even though I can read the secrets if I use the CLI after the build is complete, it seems that runCommand doesn’t have any of the networking/system setup that would allow it to use the EC2 instance metadata. I know that setup must come from some of the imported modules in the virtualisation directory, but I can’t for the life of me figure out how to actually “import” that environment into a runCommand.

Here's my configuration.nix:
{ modulesPath, config, pkgs, ... }: {

  imports = [
    "${modulesPath}/virtualisation/amazon-image.nix"
  ];

  ec2.hvm = true;

  environment.systemPackages = with pkgs; [
    vim
    wget
    curl
    awscli2
    consul
  ];

  services.consul = {
    enable = true;
    extraConfig = {
      tlsCertKey =
        let
          secrets = pkgs.runCommand "secrets" {
            buildInputs = [ pkgs.awscli2 ];
          } ''
            aws secretsmanager get-secret-value --secret-id mycluster/testsecret > $out
          '';
        in
          builtins.readFile secrets;
    };
  };
}

I then get one of two errors. Either this:

$ nixos-rebuild switch
building Nix...
building the system configuration...
building '/nix/store/q99bkqx0f0q2rn8sxpc26wpznyisvdji-secrets.drv'...

You must specify a region. You can also configure your region by running "aws configure".

Or if I supply a region, then this:

$ nixos-rebuild switch
building Nix...
building the system configuration...
building '/nix/store/vfsp64hisbgwr2j0vfxcfqvd8pyfsysc-secrets.drv'...

Unable to locate credentials. You can configure credentials by running "aws configure".

So what am I missing here? Is my approach all wrong, or do I just need to pass something to get the instance metadata / AWS network dependency inside of a runCommand?

OK, so I think my strategy here is actually going to be a systemd oneshot script that runs after nixos-rebuild switch.
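Since runCommand executes inside the Nix build sandbox – which has no network access, so no route to the instance metadata service or to Secrets Manager – fetching the secret at boot seems more workable. A rough sketch of what I have in mind (the service name and output path are placeholders; the secret id is the one from my config above):

{ pkgs, ... }: {
  systemd.services.fetch-secrets = {
    description = "Fetch TLS cert key from AWS Secrets Manager";
    wantedBy = [ "multi-user.target" ];
    wants = [ "network-online.target" ];
    after = [ "network-online.target" ];
    path = [ pkgs.awscli2 ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      install -d -m 0700 /run/secrets
      aws secretsmanager get-secret-value \
        --secret-id mycluster/testsecret \
        --query SecretString --output text > /run/secrets/tls-cert-key
      chmod 0600 /run/secrets/tls-cert-key
    '';
  };
}

Consul would then read the key from /run/secrets/tls-cert-key at runtime, instead of the key being copied into the world-readable Nix store.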

I will update the thread with my solution once it’s complete.

So I just ran across this myself. It looks like my EC2 NixOS host is running dhcpcd to get its network configuration – but dhcpcd also creates link-local interfaces and routes for 169.254.0.0/16, intercepting any requests to Amazon’s instance-metadata service.

This is also breaking the “ec2metadata” CLI tool in cloud-utils.

I think the best thing to do may be to configure networking manually – which probably means using an elastic IP address (which I want to do anyway, after I have this running correctly).

Has anyone running NixOS on EC2 solved this problem – being able to reach the instance metadata service at http://169.254.169.254?
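For anyone else hitting this, a quick way to check whether the metadata service answers, and which route a request to it would actually take (the curl assumes IMDSv1 is allowed – v2 needs a token, see further down the thread):

$ curl -s --max-time 2 http://169.254.169.254/latest/meta-data/instance-id
$ ip route get 169.254.169.254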

Are you using a custom image or the official published AMIs? (NixOS Amazon Images / AMIs)

Yes, I started with a “community” AMI – nixos/24.05.3525.dc0bac1f584c-aarch64-linux – but then applied my flake-based config.

I tracked it down to networking.dhcpcd.enable – this defaults to true, but I have some DigitalOcean servers with it set to false, so I tried doing that on the Amazon host – and it disconnected from the network entirely! Couldn’t even connect to it using the Amazon tty console.

dhcpcd was creating link-local interfaces that were adding a route for 169.254.0.0/16, which I think might be intercepting the connection to Amazon’s instance metadata address…

So I started over and created a networking module with the assigned IP address and default route, with networking.dhcpcd.enable = false, and successfully got it working.

FWIW, I only configured the private IP address and route – here’s what the module looks like:

{ lib, ... }: {
  networking = {
    nameservers = [ "8.8.8.8"
 ];
    defaultGateway = "172.31.16.1";
    dhcpcd.enable = false;
    usePredictableInterfaceNames = lib.mkForce false;
    interfaces = {
      ens5 = {
        ipv4.addresses = [
          { address="172.31.22.22"; prefixLength=20; }
        ];
        ipv4.routes = [ { address = "172.31.16.1"; prefixLength = 32; } ];
      };
    };
  };
}

… I’m not entirely sure if the usePredictableInterfaceNames is needed; that’s leftover from my DigitalOcean config…

Adding an elastic IP address did not involve changing this in any way.

Hi,

So I had some trouble with this approach – after an update, the instance using the networking module above failed to reboot – just nothing. I connected to the serial console of the instance, and it doesn’t even appear to reach the boot loader – no GRUB screen, no boot log, nothing, even after multiple attempts to reboot.

It turns out I’m dealing with two different issues on this:

  1. The metadata service is set to v2, which requires an authentication token (this appears to be set on the AMI). That prevents accessing the metadata service without a token – I’m not seeing how to get nixos-rebuild to handle this at all; it all just fails.

  2. dhcpcd, combined with a custom Docker network, adds routes for 169.254.0.0/16 to the routing table, which intercept connections to 169.254.169.254, the instance metadata service.

To fix #1, I had to change the instance metadata option “http-tokens” to “optional”, which re-enables the “v1” version of the metadata service. This allows retrieving data from the instance without a token, letting nixos-rebuild switch complete.
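For reference, here’s the difference between the two modes from inside the instance – IMDSv2 requires fetching a session token first and sending it as a header. (The instance id in the last command is a placeholder.)

$ # IMDSv1 - works once http-tokens is "optional":
$ curl -s http://169.254.169.254/latest/meta-data/instance-id

$ # IMDSv2 - required when http-tokens is "required":
$ TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
$ curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/instance-id

$ # Switching the mode, from a machine with AWS credentials:
$ aws ec2 modify-instance-metadata-options \
    --instance-id i-0123456789abcdef0 --http-tokens optional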

For #2, I’m not understanding why it’s adding a 169.254.x.x address – with the route that breaks access – to every one of my Docker containers. I’m defining a Docker network that uses 172.18.0.0/16, so why is it also adding these other routes?

❯ ip route
default via 172.30.1.1 dev ens5 proto dhcp src 172.30.1.76 metric 1002 mtu 9001 
169.254.0.0/16 dev veth63baf1b scope link src 169.254.16.143 metric 1006 
169.254.0.0/16 dev veth1468eb9 scope link src 169.254.96.211 metric 1008 
169.254.0.0/16 dev vethfef7083 scope link src 169.254.249.152 metric 1010 
169.254.0.0/16 dev veth3fe28d4 scope link src 169.254.76.134 metric 1012 
169.254.0.0/16 dev veth828d5dd scope link src 169.254.163.68 metric 1014 
169.254.0.0/16 dev vethb8f87f9 scope link src 169.254.224.149 metric 1016 
169.254.0.0/16 dev vethe61d9fe scope link src 169.254.89.64 metric 1018 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-68bf452bef35 proto kernel scope link src 172.18.0.1 
172.30.1.0/24 dev ens5 proto dhcp scope link src 172.30.1.76 metric 1002 mtu 9001 

… So in my configuration.nix I’ve added a route for the metadata service on the main network interface:

  # Fix instance metadata route - be sure to set the right gateway address
  networking.interfaces.ens5.ipv4.routes = [
    { address = "169.254.169.254"; via = "172.30.1.1"; prefixLength = 32; }
  ];

… this works for a while, but when I come back later the route is gone – I think a DHCP lease renewal is re-configuring the routes without the Nix config.

After this happens, nixos-rebuild switch does not put the route back – but systemctl restart network-addresses-ens5.service does.
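In the meantime, I’m considering pinning the route with a dhcpcd hook so a lease renewal can’t drop it – untested sketch, with the gateway and interface hard-coded from the ip route output above:

{ pkgs, ... }: {
  # Re-add the metadata route whenever dhcpcd (re)configures the lease.
  networking.dhcpcd.runHook = ''
    case $reason in
      BOUND|RENEW|REBIND|REBOOT)
        ${pkgs.iproute2}/bin/ip route replace 169.254.169.254/32 via 172.30.1.1 dev ens5
        ;;
    esac
  '';
}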

Anybody have any suggestions for either getting rid of these extra 169.254 Docker addresses/routes, or making a route that stays active?
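For the first part, one thing I might try is telling dhcpcd to leave the Docker-managed interfaces alone, and/or disabling IPv4 link-local assignment entirely – also untested, and the globs are a guess based on the interface names above:

networking.dhcpcd = {
  # Don't run dhcpcd on Docker's veth pairs or bridges (globs assume
  # the naming shown in the ip route output above).
  denyInterfaces = [ "veth*" "docker*" "br-*" ];
  # dhcpcd.conf directive that disables IPv4LL (169.254.0.0/16)
  # address assignment altogether.
  extraConfig = "noipv4ll";
};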

Also, is there a better way to handle tokens for the v2 instance metadata service?

Thanks,
John