NixFleet: NixOS Fleet Management with Staged Rollouts and Immutable Releases

NixFleet is a framework for managing fleets of NixOS and macOS machines. It combines a Nix module system for declarative configuration with a Rust-based control plane for fleet orchestration.

Why NixFleet

We built it around four converging problems:

  • Configuration drift. Imperative tools (Ansible, Puppet, Chef) depend on existing system state; NixOS makes drift impossible by construction.

  • Sovereignty. Most cross-platform fleet managers depend on US vendors (MDMs like Jamf and Intune, cloud agents like AWS SSM). NixFleet is fully self-hosted; if we disappear, your machines keep running with standard NixOS tools.

  • Bolted-on security. Security usually gets layered on after the fact (EDR, SIEM, SBOM scanners). NixOS gives us a hash-addressed store and immutable generations; NixFleet adds mTLS between the agent and the control plane, plus a full audit trail.

  • Compliance. NIS2, DORA, ISO 27001, and ANSSI require traceability that traditional stacks can only produce through bolt-on tooling. NixFleet emits machine-readable evidence at eval time and runtime.

What makes it different?

Most NixOS deployment tools stop at “push a config to a machine.” NixFleet adds the layer above: immutable releases you can roll out in stages, with health gates and rollback.

  • Rollout strategies: canary, staged (percentage or count batches), and all-at-once, with automatic pause on failure

  • Health checks: declarative systemd/launchd, HTTP, and command checks that gate rollout progression

  • Automatic rollback: agents revert if health checks fail, and the control plane halts the rollout if a batch exceeds the failure threshold

  • Persistent state: the control plane tracks machine inventory, deployment history, and audit events in SQLite

  • Standard tooling: no custom deployment scripts. nixos-rebuild, nixos-anywhere, and darwin-rebuild work as-is

  • macOS support: Darwin fleet participation via a launchd daemon

  • Regulatory compliance: 16 control modules across 4 frameworks (NIS2, DORA, ISO 27001, ANSSI BP-028) with a governance engine and compliance-check CLI, plus machine-readable evidence mappings (JSON) covering all 10 NIS2 Article 21 sub-articles

  • Reusable scopes: nixfleet-scopes provides 17 scopes, 4 roles, and 6 disk templates in a separate repo
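
To make the staged-rollout and failure-threshold ideas above concrete, here is a minimal sketch in Rust. It is not NixFleet's actual code or API: the names BatchSize, plan_batches, run_rollout, and RolloutOutcome are hypothetical, and the rounding rule for percentage batches is an assumption. It only illustrates splitting a fleet into count or percentage batches and pausing once a batch exceeds the allowed number of failures.

```rust
/// How a staged rollout sizes its batches (hypothetical type, not NixFleet's API).
#[derive(Debug, Clone, Copy)]
pub enum BatchSize {
    /// Fixed number of machines per batch.
    Count(usize),
    /// Percentage of the whole fleet per batch, clamped to 1..=100.
    Percent(u8),
}

/// Split `machines` into ordered batches. Every batch holds at least one machine.
pub fn plan_batches(machines: &[String], size: BatchSize) -> Vec<Vec<String>> {
    if machines.is_empty() {
        return Vec::new();
    }
    let per_batch = match size {
        BatchSize::Count(n) => n.max(1),
        BatchSize::Percent(p) => {
            let p = usize::from(p.clamp(1, 100));
            // Assumption: round up so small fleets still make progress.
            ((machines.len() * p + 99) / 100).max(1)
        }
    };
    machines.chunks(per_batch).map(|c| c.to_vec()).collect()
}

/// Outcome of walking all batches.
#[derive(Debug, PartialEq, Eq)]
pub enum RolloutOutcome {
    Completed,
    /// Index of the batch whose failures exceeded the threshold.
    PausedAt(usize),
}

/// Deploy batch by batch. `deploy` returns true when a machine is healthy after
/// the update. The rollout pauses as soon as one batch has more than
/// `max_failures` unhealthy machines; later batches are never touched.
pub fn run_rollout<F>(
    batches: &[Vec<String>],
    max_failures: usize,
    mut deploy: F,
) -> RolloutOutcome
where
    F: FnMut(&str) -> bool,
{
    for (index, batch) in batches.iter().enumerate() {
        let failures = batch.iter().filter(|m| !deploy(m.as_str())).count();
        if failures > max_failures {
            return RolloutOutcome::PausedAt(index);
        }
    }
    RolloutOutcome::Completed
}
```

In this sketch, a ten-machine fleet with BatchSize::Count(4) yields batches of 4, 4, and 2 machines, and a single failure with max_failures set to 0 pauses the rollout at the batch that contains the failing machine.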

The framework itself is a single function: mkHost. It takes a hostname, platform, and optional configuration flags, and returns a standard nixosSystem or darwinSystem. No DSL to learn.

Quick Start (deploy to a real machine)


    mkdir my-fleet && cd my-fleet
    nix flake init -t github:arcanesys/nixfleet
    # Edit flake.nix (user/locale), disk-config.nix (disko), hardware-configuration.nix
    nixos-anywhere --flake .#myhost root@<host-ip>

Try it locally (no real hardware)

The nixfleet-demo repo runs a full 6-VM reference fleet under QEMU: control plane, two web servers, a database, a monitoring server, and a binary cache. It exercises staged rollouts, rollback, mTLS, and compliance evidence collection end-to-end.


    git clone https://github.com/arcanesys/nixfleet-demo && cd nixfleet-demo
    # One-time: set your SSH key in the demo config
    sed -i "s|ssh-ed25519 NixfleetDemoKeyReplaceWithYourOwn|$(cat ~/.ssh/id_ed25519.pub)|" flake.nix modules/org-defaults.nix
    nix run .#build-vm -- --all --vlan 1234
    nix run .#start-vm -- --all --vlan 1234

The demo README has a walkthrough for exercising rollouts, rollback, and evidence collection once the VMs are up.

Architecture


    Fleet repo (your config)
      | consumes
    NixFleet framework (mkHost API)
      | imports
    nixfleet-scopes (17 scopes, 4 roles, 6 disk templates)
    nixfleet-compliance (16 controls, 4 frameworks)
      | builds
    NixOS/Darwin systems
      | deployed via
    Agent (per machine) <-> Control Plane (orchestrator)

The agent is a Rust binary that polls the control plane, applies generations, runs health checks, and reports status. It has seven states: Idle, Checking, Fetching, Applying, Verifying, RollingBack, and Reporting. The control plane orchestrates rollouts via immutable release manifests, tracks machine state, and exposes a REST API with 23 endpoints. The CLI supports both control-plane and direct SSH deployment modes, with --json output for automation.
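
As a rough illustration of how the seven agent states could fit together, here is a small Rust sketch of a state machine. Only the state names come from the description above. The Event type, the transition table, and the next_state and run helpers are assumptions made for this example and may not match the real agent.

```rust
/// The seven agent states named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Checking,
    Fetching,
    Applying,
    Verifying,
    RollingBack,
    Reporting,
}

/// Events that drive the machine. These names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// The poll interval elapsed.
    PollTick,
    /// The control plane has nothing new for this machine.
    UpToDate,
    /// The control plane announced a new generation.
    NewGeneration,
    /// The current step (fetch, apply, health check, ...) succeeded.
    StepSucceeded,
    /// The current step failed.
    StepFailed,
}

/// Assumed transition table: a failed apply or failed health check leads to
/// RollingBack, and every cycle ends by reporting before returning to Idle.
pub fn next_state(state: AgentState, event: Event) -> AgentState {
    use AgentState::*;
    use Event::*;
    match (state, event) {
        (Idle, PollTick) => Checking,
        (Checking, NewGeneration) => Fetching,
        (Checking, UpToDate) => Idle,
        (Fetching, StepSucceeded) => Applying,
        (Fetching, StepFailed) => Reporting,
        (Applying, StepSucceeded) => Verifying,
        (Applying, StepFailed) => RollingBack,
        (Verifying, StepSucceeded) => Reporting,
        (Verifying, StepFailed) => RollingBack,
        (RollingBack, _) => Reporting,
        (Reporting, _) => Idle,
        // Any event that does not apply to the current state is ignored.
        (other, _) => other,
    }
}

/// Feed a sequence of events into the machine, starting from Idle, and
/// return every state visited (including the initial one).
pub fn run(events: &[Event]) -> Vec<AgentState> {
    let mut state = AgentState::Idle;
    let mut trace = vec![state];
    for &event in events {
        state = next_state(state, event);
        trace.push(state);
    }
    trace
}
```

Under these assumed transitions, a failed health check takes the agent from Verifying to RollingBack and then to Reporting, so the control plane always hears about the outcome before the agent goes back to Idle.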

Links

Status

Tagged v0.1.1. Framework, agent, and CLI are MIT; control plane is AGPL. The framework, orchestration layer, and compliance controls are functional: binary cache (harmonia), MicroVM host, backup backends, and rollout policies are all implemented. Test coverage includes 12 VM scenarios plus eval tests for shipped modules.

We are looking for early adopters managing 3+ NixOS machines. Feedback on the mkHost API shape, the rollout model, and the evidence mapping format is especially useful.

Feedback welcome here or as GitHub issues.
