NixFleet: Declarative NixOS fleet management with signed GitOps

NixFleet is a framework for managing fleets of NixOS and macOS machines. It combines a Nix module system for declarative configuration with a Rust-based control plane for fleet orchestration.

Why NixFleet

We built it to address four converging problems:

  • Configuration drift. Imperative tools (Ansible, Puppet, Chef) depend on existing system state; NixOS makes drift impossible by construction.

  • Sovereignty. Most cross-platform fleet managers depend on US vendors (MDMs like Jamf and Intune, cloud agents like AWS SSM). NixFleet is fully self-hosted; if we disappear, your machines keep running with standard NixOS tools.

  • Bolted-on security. Security usually gets layered on after the fact (EDR, SIEM, SBOM scanners). NixOS gives us a hash-addressed store and immutable generations; NixFleet adds mTLS between agents and the control plane (CP) and a full audit trail.

  • Compliance. NIS2, DORA, ISO 27001, and ANSSI require traceability that traditional stacks can only produce through bolt-on tooling. NixFleet emits machine-readable evidence at eval time and runtime.

What makes it different?

Most NixOS deployment tools stop at “push a config to a machine.” NixFleet adds the layer above: immutable releases you can roll out in stages, with health gates and rollback.

  • Rollout strategies: canary, staged (percentage or count batches), and all-at-once, with automatic pause on failure

  • Health checks: declarative systemd/launchd, HTTP, and command checks that gate rollout progression

  • Automatic rollback: agents revert if health checks fail. The control plane halts the rollout if a batch exceeds the failure threshold

  • Persistent state: the control plane tracks machine inventory, deployment history, and audit events in SQLite

  • Standard tooling: no custom deployment scripts. nixos-rebuild, nixos-anywhere, and darwin-rebuild work as-is

  • macOS support: Darwin fleet participation via launchd daemon

  • Regulatory compliance: 16 control modules across 4 frameworks (NIS2, DORA, ISO 27001, ANSSI BP-028) with a governance engine and compliance-check CLI, plus machine-readable evidence mappings (JSON) covering all 10 NIS2 Article 21 sub-articles

  • Reusable scopes: nixfleet-scopes provides 17 scopes, 4 roles, and 6 disk templates in a separate repo
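To make the staged-rollout and failure-threshold behaviour in the list above concrete, here is a minimal Python sketch. It is not NixFleet's code: `plan_batches`, `run_rollout`, the `deploy_and_check` callback, and the `max_failure_ratio` parameter are hypothetical names chosen for illustration, and the real control plane may batch and gate differently.

```python
from typing import Callable, Iterable


def plan_batches(hosts: list[str], batch_sizes: Iterable[int]) -> list[list[str]]:
    """Split hosts into count-based batches; any remainder forms a final batch."""
    batches, start = [], 0
    for size in batch_sizes:
        if size <= 0:
            raise ValueError("batch sizes must be positive")
        if start >= len(hosts):
            break
        batches.append(hosts[start:start + size])
        start += size
    if start < len(hosts):
        batches.append(hosts[start:])
    return batches


def run_rollout(
    batches: list[list[str]],
    deploy_and_check: Callable[[str], bool],  # hypothetical: True if health checks pass
    max_failure_ratio: float,
) -> dict:
    """Deploy batch by batch; pause once a batch exceeds the failure threshold."""
    ok, failed = [], []
    for index, batch in enumerate(batches):
        batch_failed = [h for h in batch if not deploy_and_check(h)]
        failed.extend(batch_failed)
        ok.extend(h for h in batch if h not in batch_failed)
        if len(batch_failed) / len(batch) > max_failure_ratio:
            return {"status": "paused", "batch": index, "ok": ok, "failed": failed}
    return {"status": "completed", "batch": len(batches) - 1, "ok": ok, "failed": failed}


# Canary of 1, then a batch of 2, then everything else.
fleet = ["web-01", "web-02", "db-01", "mon-01", "cache-01"]
result = run_rollout(plan_batches(fleet, [1, 2]), lambda host: host != "db-01", 0.25)
print(result["status"], result["batch"])  # paused 1
```

In this sketch the hosts that failed their checks would already have reverted on their own; the orchestrator's only job is to stop promoting further batches.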

The framework itself is a single function: mkHost. It takes a hostname, platform, and optional configuration flags, and returns a standard nixosSystem or darwinSystem. No DSL to learn.

Quick Start (deploy to a real machine)


mkdir my-fleet && cd my-fleet
nix flake init -t github:arcanesys/nixfleet
# Edit flake.nix (user/locale), disk-config.nix (disko), hardware-configuration.nix
nixos-anywhere --flake .#myhost root@<host-ip>

Try it locally (no real hardware)

The nixfleet-demo repo runs a full 6-VM reference fleet under QEMU: control plane, two web servers, a database, a monitoring server, and a binary cache. It exercises staged rollouts, rollback, mTLS, and compliance evidence collection end-to-end.


git clone https://github.com/arcanesys/nixfleet-demo && cd nixfleet-demo
# One-time: set your SSH key in the demo config
sed -i "s|ssh-ed25519 NixfleetDemoKeyReplaceWithYourOwn|$(cat ~/.ssh/id_ed25519.pub)|" flake.nix modules/org-defaults.nix
nix run .#build-vm -- --all --vlan 1234
nix run .#start-vm -- --all --vlan 1234

The demo README has a walkthrough for exercising rollouts, rollback, and evidence collection once the VMs are up.

Architecture


Fleet repo (your config)
    | consumes
NixFleet framework (mkHost API)
    | imports
nixfleet-scopes (17 scopes, 4 roles, 6 disk templates)
nixfleet-compliance (16 controls, 4 frameworks)
    | builds
NixOS/Darwin systems
    | deployed via
Agent (per machine) <-> Control Plane (orchestrator)

The agent is a Rust binary that polls the control plane, applies generations, runs health checks, and reports status. It moves through seven states: Idle, Checking, Fetching, Applying, Verifying, RollingBack, Reporting. The control plane orchestrates rollouts via immutable release manifests, tracks machine state, and provides a REST API with 23 endpoints. The CLI supports both control-plane and direct SSH deployment modes, with --json output for automation.
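The seven agent states can be read as a small state machine. The sketch below models them in Python for illustration only (the real agent is Rust). The state names come from the post; the transition table and the `run_cycle` helper are assumptions about how one poll cycle might flow, not the agent's actual logic.

```python
from enum import Enum, auto


class AgentState(Enum):
    IDLE = auto()
    CHECKING = auto()
    FETCHING = auto()
    APPLYING = auto()
    VERIFYING = auto()
    ROLLING_BACK = auto()
    REPORTING = auto()


S = AgentState

# Assumed edges; the real agent may allow a different set.
TRANSITIONS = {
    S.IDLE: {S.CHECKING},
    S.CHECKING: {S.IDLE, S.FETCHING},
    S.FETCHING: {S.APPLYING, S.REPORTING},
    S.APPLYING: {S.VERIFYING, S.ROLLING_BACK},
    S.VERIFYING: {S.REPORTING, S.ROLLING_BACK},
    S.ROLLING_BACK: {S.REPORTING},
    S.REPORTING: {S.IDLE},
}


def advance(current: AgentState, target: AgentState) -> AgentState:
    """Move to `target`, rejecting edges that are not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def run_cycle(update_available: bool, fetch_ok: bool, apply_ok: bool, healthy: bool) -> list[AgentState]:
    """Walk one poll cycle and return every state visited, in order."""
    path = [S.IDLE]

    def step(target: AgentState) -> None:
        path.append(advance(path[-1], target))

    step(S.CHECKING)
    if not update_available:
        step(S.IDLE)
        return path
    step(S.FETCHING)
    if not fetch_ok:
        step(S.REPORTING)
        step(S.IDLE)
        return path
    step(S.APPLYING)
    if apply_ok:
        step(S.VERIFYING)
    if not apply_ok or not healthy:
        step(S.ROLLING_BACK)
    step(S.REPORTING)
    step(S.IDLE)
    return path


# A failed health check after a successful activation ends in a rollback.
print([state.name for state in run_cycle(True, True, True, healthy=False)])
```

Keeping the allowed edges in one table means an unexpected jump (say, Idle straight to Applying) fails loudly instead of silently skipping the fetch.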

Status

Tagged v0.1.1. Framework, agent, and CLI are MIT; control plane is AGPL. The framework, orchestration layer, and compliance controls are functional: binary cache (harmonia), MicroVM host, backup backends, and rollout policies are all implemented. Test coverage includes 12 VM scenarios plus eval tests for shipped modules.

We are looking for early adopters managing 3+ NixOS machines. Feedback on the mkHost API shape, the rollout model, and the evidence mapping format is especially useful.

Feedback welcome here or as GitHub issues.

This seems really cool. Are you planning to handle monitoring beyond health checks?

And just FYI, your pilot link on GH is broken.

Thanks! Probably yeah, but not in v0.2/v0.3, roadmap’s packed.

Worth knowing though: the control plane already exposes structured state over REST today. /v1/hosts gives you a unified per-host view (heartbeat, rollout state, compliance / runtime-gate / health-probe failures, pins, quarantined closures), and rollouts have their own endpoints for events and lifecycle. Plenty to pull into your own observability stack already.

There’s also a small /metrics (Prometheus) endpoint, but it’s deliberately alerting-oriented (not a full state surface).

What’s missing is the packaged side: dashboards, recording rules, opinionated alerts. That’s a later-cycle thing.

Re: the pilot link, site’s actually live now. I’ll bump the thread once v0.2 ships too.

v0.2.0 is tagged and is a complete rewrite of v0.1. Reintroducing below.

NixFleet is a framework for managing fleets of NixOS and macOS machines with signed GitOps. Truth lives in git and signing keys; the control plane is a caching router for already-signed intent. Compromise of the control plane is an outage, not a breach.

Why NixFleet

NixFleet addresses four converging problems on a NixOS fleet:

  • Configuration drift. Imperative tools depend on existing state. NixOS makes drift impossible by construction; nix build is the gate.
  • Trust in the orchestrator. Traditional fleet managers concentrate signing and deploy authority in the control plane. NixFleet inverts this: truth lives in git and signing keys, the CP holds none, agents verify every artefact independently.
  • Bolted-on compliance. Regulatory frameworks need traceability that scanners produce after the fact. NixFleet evaluates compliance as a release gate, with a signed evidence chain from commit to host-signed probe.
  • Sovereignty. Most cross-platform fleet managers depend on US vendors (MDMs, cloud agents). NixFleet is fully self-hosted; hosts keep running with stock NixOS tools if the framework disappears.
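One way to picture an evidence chain running from commit to host-signed probe is as hash-linked records, where each link commits to the one before it. The Python sketch below is an assumption-heavy illustration: the record layout, the `kind` names, and the helpers `append_link` and `chain_is_intact` are hypothetical, and plain SHA-256 linking stands in for the real signatures.

```python
import hashlib
import json

# Assumed link order, following "commit -> CI signature -> closure hash -> probe".
CHAIN_ORDER = ["commit", "ci_signature", "closure_hash", "host_probe"]


def digest(record: dict) -> str:
    """Stable SHA-256 over a record's canonical JSON form."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def append_link(chain: list[dict], kind: str, value: str) -> list[dict]:
    """Return a new chain whose last link points at the digest of the previous one."""
    parent = digest(chain[-1]) if chain else None
    return chain + [{"kind": kind, "value": value, "parent": parent}]


def chain_is_intact(chain: list[dict]) -> bool:
    """Check link order and that every link references its predecessor's digest."""
    if [link["kind"] for link in chain] != CHAIN_ORDER:
        return False
    for previous, link in zip(chain, chain[1:]):
        if link["parent"] != digest(previous):
            return False
    return chain[0]["parent"] is None


chain: list[dict] = []
for kind, value in zip(CHAIN_ORDER, ["abc123", "sig-by-ci", "closure-xyz", "probe-ok"]):
    chain = append_link(chain, kind, value)
print(chain_is_intact(chain))  # True
```

Note that hash linking alone does not protect the final link; in this picture that is the job of the host signature on the probe output.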

How v0.2 hangs together

The spine is a signed pipeline anchored on one decision procedure. CI signs fleet.resolved.json and revocations.json with the release key; the binary cache signs closures independently. The reconciler’s verify_artifact procedure consumes the typed trust.json (serialised from flake-declared trust roots) and decides what any host is allowed to run. The CP fetches the revocations sidecar from git on every reconcile tick and replays the verified set before minting any cert. The agent pulls its target closure, verifies the signature, activates in a detached transient systemd unit, opens a confirm window, and auto-reverts on silence. Static compliance gates run at CI; runtime probes run on the host and sign their output with the SSH host key. Evidence chain: commit → CI signature → closure hash → host-signed probe.
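The verify-then-decide step can be sketched as follows. This is a Python illustration, not the reconciler's verify_artifact: HMAC-SHA256 from the standard library stands in for the real asymmetric signatures, and the JSON shapes (`hosts`, `revoked_hosts`), the `TrustRoots` class, and `allowed_closure` are assumptions made for the example.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrustRoots:
    ci_release_key: bytes  # stand-in for the CI release key declared in the flake


def canonical(doc: dict) -> bytes:
    """Deterministic byte encoding so signer and verifier hash the same input."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()


def sign(key: bytes, payload: bytes) -> str:
    """HMAC stand-in for a real signature; the actual scheme is asymmetric."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_artifact(doc: dict, signature: str, trust: TrustRoots) -> bool:
    expected = sign(trust.ci_release_key, canonical(doc))
    return hmac.compare_digest(expected, signature)


def allowed_closure(
    host: str,
    fleet: dict, fleet_sig: str,
    revocations: dict, revocations_sig: str,
    trust: TrustRoots,
) -> Optional[str]:
    """Return the closure a host may run, or None if any check fails."""
    if not verify_artifact(fleet, fleet_sig, trust):
        return None
    if not verify_artifact(revocations, revocations_sig, trust):
        return None
    if host in revocations.get("revoked_hosts", []):  # assumed field name
        return None
    return fleet.get("hosts", {}).get(host)  # assumed field name


trust = TrustRoots(ci_release_key=b"demo-key")
fleet = {"hosts": {"my-server": "closure-abc"}}
revocations = {"revoked_hosts": []}
print(allowed_closure(
    "my-server",
    fleet, sign(trust.ci_release_key, canonical(fleet)),
    revocations, sign(trust.ci_release_key, canonical(revocations)),
    trust,
))  # closure-abc
```

The point of the shape is that nothing the control plane says is trusted by itself: both documents must verify against keys the verifier already holds before any decision is made.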

Operators see one binary, nixfleet. Auditors see two standalone binaries with lean dependency closures so a regulator can verify signed artefacts without the operator network stack.

What v0.2 ships

  • Trust roots: declared in the flake (nixfleet.trust.{ciReleaseKey,cacheKeys,orgRootKey}). Algorithm rotation supported end-to-end.
  • Agent identity: mTLS-bound to /etc/ssh/ssh_host_ed25519_key. Enrollment gated by a signed bootstrap-nonces.json allowlist. Replay protection anchored in the fleet repo, not CP-local state.
  • Magic rollback: queued in a detached systemd unit so the agent can’t kill its own activation. Confirm window is the deadline.
  • Compliance gate: static at CI, runtime on host, output signed and aggregated into the wave-promotion gate.
  • Supply chain: Crane-cached Rust workspace. HSM-held release key. Revocations sidecar replayed per tick. Prometheus /metrics is a default-on Cargo feature.
  • Operator surface: nixfleet umbrella binary; standalone auditor binaries; nixfleet-trust-bootstrap ceremony tooling for the offline fleet root and TPM-bound issuance CA.
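The confirm-window idea behind magic rollback (activate, wait for a confirmation, revert on silence) reduces to a deadline check. The Python sketch below shows only that decision logic; the `Activation` class, its field names, and the generation numbers are hypothetical, and the real mechanism runs inside a detached systemd unit rather than in-process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activation:
    previous_generation: int
    new_generation: int
    activated_at: float      # seconds on a monotonic clock
    confirm_window: float    # seconds allowed for a confirmation to arrive
    confirmed_at: Optional[float] = None

    @property
    def deadline(self) -> float:
        return self.activated_at + self.confirm_window

    def confirm(self, now: float) -> bool:
        """Record a confirmation; one that arrives after the deadline is ignored."""
        if self.confirmed_at is None and now <= self.deadline:
            self.confirmed_at = now
        return self.confirmed_at is not None

    def resolve(self, now: float) -> Optional[int]:
        """Generation the host should settle on at `now`; None while undecided."""
        if self.confirmed_at is not None:
            return self.new_generation
        if now > self.deadline:
            return self.previous_generation  # silence: revert
        return None


silent = Activation(previous_generation=41, new_generation=42,
                    activated_at=100.0, confirm_window=120.0)
print(silent.resolve(now=150.0))  # None
print(silent.resolve(now=221.0))  # 41
```

Treating the window as a hard deadline keeps the failure mode simple: a host that loses contact with the control plane mid-activation ends up on the generation that last worked.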

Quick Start

{
  inputs.nixfleet.url = "github:arcanesys/nixfleet";

  outputs = { nixpkgs, nixfleet, ... }: let
    fleet = nixfleet.lib.mkFleet {
      hosts.my-server = {
        system = "x86_64-linux";
        channel = "stable";
        nixosArgs.modules = [
          ./hardware-configuration.nix
          ({ ... }: {
            services.nixfleet-agent.enable = true;
            services.nixfleet-agent.controlPlane.url = "https://cp.example.com:8080";
          })
        ];
      };
      channels.stable.rolloutPolicy = "all-at-once";
      rolloutPolicies.all-at-once = {
        strategy = "all-at-once";
        waves = [{ selector.all = true; soakMinutes = 0; }];
      };
    };
  in { nixosConfigurations = fleet.nixosConfigurations; };
}

fleet.nixosConfigurations.<host> is a standard nixosSystem. Remove the agent module and the host is vanilla NixOS.

Try It Locally

nixfleet-demo boots a 4-VM reference fleet in ~10 minutes, exercises the canonical GitOps loop, and lets you trigger a signed wave promotion and magic rollback by editing one config.

Ecosystem

Three repos:

  • nixfleet (MIT / AGPL): framework, agent, control plane, CLI.
  • nixfleet-compliance (MIT): NIS2, DORA, ISO 27001, ANSSI BP-028 controls.
  • nixfleet-demo (MIT): 4-VM reference fleet.

The framework ships the kernel and the contract implementations. Service wrappers, hardware bundles, and role taxonomies live in the consuming fleet repo, not in nixfleet. The framework stays generic, and the consumer keeps full ownership of its shape.

v0.3 trajectory

Extends the v0.2 wire protocol, doesn’t break it:

  • RFC-0009 hardware-rooted trust (TPM-bound issuance CA, EK-quoted attestation).
  • RFC-0010 trust lifecycle (operator roles, rotation, attestation quarantine, threshold-signed channels).
  • RFC-0011 freshness-window policy.
  • RFC-0012 air-gapped operation via signed bundles.

Status

Tagged v0.2.0 (2026-05-19). MIT for framework / agent / CLI; AGPL for the control plane. The v0.2 baseline ships RFCs 0001-0008; the v0.3 RFC track is in progress. The stack runs end-to-end on the author’s own hardware fleet, with signed GitOps and TPM-bound trust under daily use. The next gate is pilot validation on external regulated fleets to surface what a single-fleet test bench doesn’t. Pilots are open for NIS2 / DORA / ISO 27001 / ANSSI BP-028 operators (see the pilot link). All feedback welcome: API surface, trust model, evidence chain, RFC track.