Proper way to build a remote system with flakes

I’m trying to switch to a flake-based deploy.
Currently, in the same repo I have:

  - ./api.nix - a Go module that you can nix-build api.nix or nix-shell api.nix
  - ops/deploy.nix - a nix file that imports api.nix; nix-build ops/deploy.nix builds the whole NixOS system in ./result

After pushing to GitHub, a GitHub Action builds the whole NixOS system (which transitively builds ./api) and copies the newly built system with nix-copy-closure to the staging and production boxes.

I’ve read a bunch of docs and tutorials, but they’re either building a local package (in my case just api.nix), or they’re using nixos-rebuild to build the local NixOS system on the computer that’s executing nixos-rebuild. So the 3 main questions are:

  1. What would the minimal flake look like that builds a production NixOS system and imports a couple of local .nix files (Go modules)?
  2. What’s the way to pass a parameterized argument to those builds? (e.g. nix build --argstr hostname [production|staging])
  3. How do I pass an imported Go module to configuration.nix and refer to it internally from configuration.nix?

I assume that after switching to a flake-based system, one can still use nix-copy-closure-based deployment.

TIA

1 Like

I would add the local .nix files to the flake, probably through an overlay. Home-manager example
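For illustration, a minimal sketch of such an overlay, assuming ./api.nix is (or is refactored into) a callPackage-style function; the attribute name api is hypothetical:

# hypothetical overlay exposing the local Go module as pkgs.api
final: prev: {
  api = final.callPackage ./api.nix { };
}

Adding that overlay to nixpkgs.overlays (e.g. via a small module in the flake) would make the package available as pkgs.api inside configuration.nix.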

I wouldn’t. I would just have different outputs, .#nixosConfigurations.staging and .#nixosConfigurations.production. If you want logic around which environment you’re in, you can pass parameters through specialArgs. Home-manager example, though extraSpecialArgs is what’s exposed here; it’s generally specialArgs if you’re doing a nixosSystem.
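A hedged sketch of that layout (attribute and argument names are illustrative; both environments share one configuration.nix):

# inside: outputs = { self, nixpkgs }: { ... }
nixosConfigurations = {
  staging = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./configuration.nix ];
    specialArgs = { hostname = "staging"; };
  };
  production = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./configuration.nix ];
    specialArgs = { hostname = "production"; };
  };
};

configuration.nix can then take hostname in its argument set and branch on it.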

See first response. Use an overlay.

2 Likes

You might find some joy with my experimental tool GitHub - MatthewCroughan/nixinate: Another NixOS Deployment Tool - Nixinate your systems 🕶️. An interesting blog post to read if you’re interested in doing it yourself is Industrial-strength Deployments in Three Commands - Vaibhav Sagar

2 Likes

@matthewcroughan thank you for the response - I’m looking for the minimal flake.nix (and a build command) that produces ./result with the NixOS system. Looking forward to expanding the config later, but I’m still missing the “hello world” with no external tools.

Thanks @jonringer. I’m surprised overlays are the official way to do this. I’ll try to parse your config and get rolling again. I think the official docs would really benefit from a few minimal examples, like building a VM, building a NixOS system, or building a Digital Ocean image.
For people not involved in core development, I think this might be one of the blockers for adopting flakes.

If you copy that closure and then activate it remotely, you won’t install a bootloader entry, which means that your generation will be lost upon reboot. You need to use nix-env in order to install the system, such that it will work on reboot.

sudo nix-env --profile /nix/var/nix/profiles/system --set <path>

https://github.com/NixOS/nixpkgs/issues/82851
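Putting those pieces together, a hedged sketch of a manual remote deploy that survives reboot (root@host and $SYSTEM are placeholders for the target machine and the built toplevel store path):

# copy the closure, register it as the system profile, then activate
nix-copy-closure --to root@host "$SYSTEM"
ssh root@host "nix-env --profile /nix/var/nix/profiles/system --set $SYSTEM"
ssh root@host "$SYSTEM/bin/switch-to-configuration switch"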

If you’re totally sure that you want to build the system closure and do something custom with it, then you would want to do something like nix build .#nixosConfigurations.<nameOfSystem>.config.system.build.toplevel.

The following flake.nix example would expect your system configuration to be in ./configuration.nix:

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-21.11";
  };

  outputs = { self, nixpkgs }: {
    nixosConfigurations = {
      myMachine = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          (import ./configuration.nix)
        ];
      };
    };
  };
}

That is the most basic Flake, and you could run nixos-rebuild switch --flake .#myMachine --target-host user@system to deploy it to a remote. This is precisely what Nixinate does for you: it makes a wrapper for every nixosConfiguration you already have in your flake.nix and makes it possible to declare things like the SSH user in your config, so you don’t have to type all that in each time. I don’t really think there’s a need to reinvent wheels here, but people do get very confused about how complex this needs to be, and so invent their own solutions.

1 Like

I agree, nix has a lot of good reference documentation. But examples are usually in the form of blog posts, or the rare good wiki page.

1 Like

I don’t really think there’s a need to reinvent wheels here, but people do get very confused about how complex this needs to be, and so invent their own solutions.

Not sure I’m 100% on board with that statement. I’ve bricked a production box this way before, and I don’t have access to its GRUB. My deploy script sets a time bomb on the target host, which rolls back if it doesn’t receive a ping and respond after deploy. It also copies closures to all boxes on each deploy, but without activating on production. It activates on staging automatically, and only if all is good do I manually trigger activation on prod. This eliminates waiting, redundant building, etc.
Side note - the GRUB bug doesn’t affect me because I manually keep a list of commit → nix hash pointers.

Back on topic:

Regarding the flake, either something’s wrong in my configuration.nix or the build command itself. Running nix build '.#nixosConfigurations.staging' returns
error: 'nixosConfigurations.staging.type' is not a string but a set.

Using nixos-rebuild --flake '.#nixosConfigurations.staging' build produces
error: flake 'git+file:///home/supermarin/code/...omitted...' does not provide attribute 'packages.x86_64-linux.nixosConfigurations."nixosConfigurations.staging".config.system.build.toplevel', 'legacyPackages.x86_64-linux.nixosConfigurations."nixosConfigurations.staging".config.system.build.toplevel' or 'nixosConfigurations."nixosConfigurations.staging".config.system.build.toplevel'.

In any case it’s pretty cryptic, which is why I was asking for a working minimal hello world.
I tried with a minimal configuration.nix from Flakes - NixOS Wiki as well; same error.

There is no reason whatsoever that you could not implement that time bomb in https://github.com/NixOS/nixpkgs/blob/c28fb0a4671ff2715c1922719797615945e5b6a0/pkgs/os-specific/linux/nixos-rebuild/nixos-rebuild.sh and make it available for everyone via --timebomb. I plan to implement that in Nixinate also. Is it possible you could share the way you implement that timebomb?

I manually trigger activate on prod. This eliminates waiting, redundant building etc.

At the cost of more complexity, and cognitive dissonance, yes. That’s a factor most people are not considering when implementing deployment systems.

Regarding your error, I cannot comment because you haven’t provided the code you’re trying to build. nixos-rebuild --flake '.#nixosConfigurations.staging' is wrong though; nixosConfigurations is assumed and prepended automatically, so you merely have to do nixos-rebuild --flake .#staging.

It’s not cryptic, but the flake schema is a bit misunderstood. Once Nix 3.0 comes out, and all the documentation is written, it will seem a lot easier for people like you, I am sure. It just needs a bit more time.

The output schema has a bit of enlightenment here:

https://nixos.wiki/wiki/Flakes

1 Like

To close the loop on this one, here are the answers:

  1. Minimal flake:
{
  description = "Minimal nixOS flake";
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-21.11";
  
  outputs = { self, nixpkgs }: {
    nixosConfigurations = {
      staging = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./configuration.nix ];
        # Example how to pass an arg to configuration.nix:
        specialArgs = { hostname = "staging"; };
      }; 
    };
  };
}

# ... configuration.nix:
# { config, pkgs, hostname, modulesPath, ... }:
# {
# imports = [ (modulesPath + "/virtualisation/digital-ocean-config.nix") ];
# networking.hostName = hostname;
# ...
# }

Now the build command to end up with a NixOS system in ./result:

  1. nix build .#nixosConfigurations.staging.config.system.build.toplevel
    Use this one if you don’t have NixOS but only nix available (e.g. in a CI action).

  2. nixos-rebuild build --flake .#staging
    Note that the nixosConfigurations prefix is omitted when nixos-rebuild is used.
    The error messages differ and are somewhat more usable via the nixos-rebuild command, although it lacks a -o flag, so you can’t build multiple results next to each other without manually renaming them between builds.

Thanks @matthewcroughan @jonringer for the pointers.

6 Likes

just a small neat note:

your flake.nix and configuration.nix don’t have to sit in /etc/nixos/; they can live anywhere that is a supported flake repo type (git/svn/mercurial), as flakes are hermetically sealed (AFAIUSI). A system configuration can be built from anywhere now, allowing you to do funky things with the flake URI. However, to switch to the configuration, you will need to be the stoopid user (superuser).

nix flake show github:nixinator/nothing/
nixos-rebuild dry-activate --flake github:nixinator/nothing/#z620

so you can build my ‘machine’ on yours. If that doesn’t blow new users’ minds, especially the infrastructure-as-code people… i don’t know what can.

This post will probably prompt me to do some janitorial work on my configs… or ‘nix shaving’ as we like to term it.

flakes make truly sharable operating system configurations possible, something that, last time i looked, had never been possible before.

However, hardware brings impurity: my gfx card, my network. Containing that impurity is something i’ve got to think about long term, but for a fleet of cattle it’s not really a problem.

3 Likes

This is not directly responding to the question, but it looks like what you are trying to do is already implemented in deploy-rs:

I’ve tested it as a replacement for nixops and I’m very satisfied: first-class support for flakes, auto-rollback on failure, multi-profiles…

1 Like

Yep, it does.

This thread was meant to clear up how to build a box with straight nix out of the box, so users can understand the basics and the boundary between what a 3rd-party tool does and what’s provided by nix.

Sorry, forgot to answer this one:

@matthewcroughan: Is it possible you could share the way you implement that timebomb?

Sharing the solution below; hopefully it’ll make sense why it shouldn’t live in nixos-rebuild.
The timebomb itself is implemented in systemd. It activates on deploy and on machine restart, and shuts off after executing.
If the revision was previously marked as healthy, the timebomb is a no-op even when it executes.

  systemd.services.rollback = {
    enable = true;
    description = "automatically rollback to the previous rev if unhealthy";
    script = ''
      #!${pkgs.bash}/bin/bash
      sleep 5
      healthy=$(readlink /srv/revisions/healthy)
      latest=$(readlink /srv/revisions/latest)
      if [[ "$latest" != "$healthy" ]]; then
        echo "!!! ERROR !!! $latest is unhealthy!" >> /var/log/rollback.log
        echo "rolling back to $healthy" >> /var/log/rollback.log
        nixos-rebuild --rollback switch >> /var/log/rollback.log
      else
        echo "$latest is healthy. nothing to do." >> /var/log/rollback.log
      fi
    '';
    wantedBy = [ "multi-user.target" ];
  };

This part is irrelevant, just posting it for the full context:

# Build staging and prod, so we know of failures as early as possible
- name: build all boxes
  run: |
    nix build .#nixosConfigurations.staging.config.system.build.toplevel -o staging
    nix build .#nixosConfigurations.production.config.system.build.toplevel -o production
# Copy deployments to staging & prod. Activate only staging.
- name: deploy staging
  if: github.ref == 'refs/heads/main'
  run: |
    nix-copy-closure --to example.com staging
    readlink staging | xargs -I {} ssh example.com "ln -s {} /srv/revisions/$GIT_SHA"
    ssh example.com "/srv/revisions/$GIT_SHA/bin/switch-to-configuration switch"
    # health check: read x-api-version from the live server and mark that revision healthy
    curl -i https://staging.example.com/ping | grep -i x-api-version | cut -f2 -d' ' | tr -d '\r' | xargs -I {} ssh example.com "unlink /srv/revisions/healthy && ln -sf /srv/revisions/{} /srv/revisions/healthy"

Note the most important bit: after the NixOS configuration is deployed, I don’t just use $GIT_SHA to mark it as healthy, but go full circle and curl the web server to see that it’s been restarted and is running the newest version. This ensures that:

  1. NixOS has been rebuilt, updated, and deployed
  2. the newest code is up and running
  3. using SSH to mark /srv/revisions/healthy ensures the machine is still accessible

…continued

- name: copy deployment to production, don't activate
  if: github.ref == 'refs/heads/main'
  run: |
    nix-copy-closure --to api.example.com production
    readlink production | xargs -I {} ssh api.example.com "ln -s {} /srv/revisions/$GIT_SHA"

Then another irrelevant part, omitted from above - I also symlink $GIT_SHA to latest, and have a local shell.nix that exposes some shorthands like ops deploy api. latest is there to be offered as a default argument; otherwise you can just use any git SHA you want to deploy.
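For completeness, that omitted symlink step is a one-liner along these lines (a sketch, using the same hypothetical host and paths as above):

# point "latest" at the revision that was just deployed
ssh example.com "ln -sfn /srv/revisions/$GIT_SHA /srv/revisions/latest"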

3 Likes

Looking back, I think this comment inspired me to write up my own experience after I found your thread while trying to solve my similar issue: building a local bare-metal system from a flake, with a view to rebuilding remotely later.

I’m just wondering how to contribute back to the NixOS documentation; Discourse has been the best ‘official’ source, but the real stars have been independent blogs, which capture the really valuable concepts.

1 Like

I just got there, and it blew my mind. I did:

nixos-rebuild switch --flake .#dubedary --target-host root@192.168.8.117

and it worked, incredible. Took a long time to find out that this is all I needed.

5 Likes