NixOS is arguably the best OS for letting an agent touch your system config. Declarative
configs, atomic rollbacks, reproducible builds — the whole model is built around reviewable
changes. But most AI agent tooling just runs shell commands and hopes for the best.
Agentix is a control layer between an LLM and your NixOS/flake configuration. The core loop:
Observe → Plan → Patch → Test → Explain → Approve → Apply → Audit
The agent never mutates directly. It generates patches against your Nix config, explains
what it wants to do and why, then waits. Every action gets a JSON audit log entry. If the
tree is dirty or a patch doesn’t apply cleanly, it stops.
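To make the gating concrete, here is a minimal sketch of the loop as a linear walk that halts at the approval stage and writes one JSON audit entry per step. It is not Agentix's actual implementation; the `Run` class, the `run_loop` function, and the audit entry fields are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stage names taken from the loop described in the post.
STAGES = ["observe", "plan", "patch", "test", "explain", "approve", "apply", "audit"]


@dataclass
class Run:
    """Hypothetical container for one agent run and its audit trail."""
    goal: str
    approved: bool = False
    log: list = field(default_factory=list)

    def record(self, stage: str, status: str, detail: str = "") -> None:
        # Each entry is stored as a JSON string, one per stage transition.
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "goal": self.goal,
            "stage": stage,
            "status": status,
            "detail": detail,
        }
        self.log.append(json.dumps(entry))


def run_loop(run: Run) -> str:
    """Walk the stages in order; stop at 'approve' unless a human approved."""
    for stage in STAGES:
        if stage == "approve" and not run.approved:
            run.record(stage, "waiting", "human approval required")
            return stage
        run.record(stage, "ok")
    return "done"
```

The point of the sketch is only that nothing after `approve` can run without an explicit flag set by a human, and that the waiting state itself is logged.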
Less “AI manages my server,” more a cautious junior infra engineer submitting reviewed PRs
against your flake.
What it does today (v0.3):
- agentix propose — generates a change proposal as a patch
- agentix verify — builds and tests without switching
- agentix apply — applies an approved patch
- agentix doctor / agentix status — system health and state
- agentix run — goal-driven workflows (e.g. “enable Tailscale and open the firewall port”)
- Git safety: refuses to operate on dirty trees, validates patch application before
committing
- Full JSON audit trail for every operation
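The Git safety checks in the list above can be approximated with two standard Git commands: `git status --porcelain` to detect a dirty tree and `git apply --check` to validate a patch without touching any files. This is a rough sketch under those assumptions, not the tool's real code; `is_dirty`, `git`, and `preflight` are hypothetical helper names.

```python
import subprocess


def is_dirty(porcelain: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked path."""
    return any(line.strip() for line in porcelain.splitlines())


def git(repo: str, *args: str) -> subprocess.CompletedProcess:
    # -C makes git behave as if it had been started inside `repo`.
    return subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)


def preflight(repo: str, patch_path: str) -> tuple[bool, str]:
    """Refuse to continue on a dirty tree or a patch that does not apply.

    `patch_path` should be absolute, because -C changes the working directory.
    """
    status = git(repo, "status", "--porcelain")
    if status.returncode != 0:
        return False, "not a git repository"
    if is_dirty(status.stdout):
        return False, "working tree is dirty"
    # --check only reports whether the patch would apply; it modifies nothing.
    check = git(repo, "apply", "--check", patch_path)
    if check.returncode != 0:
        return False, "patch does not apply cleanly: " + check.stderr.strip()
    return True, "ok"
```

A caller would run `preflight(repo, patch)` before any commit step and stop on the first `False`.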
Open questions:
- The right trust boundary. Currently everything requires explicit approval. Graduated
autonomy, where adding a package is lower-risk than modifying firewall rules, might make
sense, or the “always review” model might just be correct for NixOS.
- Whether this should integrate with existing Nix tooling (like nixos-rebuild test) more
tightly, or stay as an external wrapper.
- How people actually want to interact with it: CLI only? TUI with diff preview? Editor
integration?
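One way to picture the graduated-autonomy question is a classifier that assigns a risk tier to a patch based on which NixOS options its changed lines mention. The sketch below is purely illustrative: the `RISK_BY_PREFIX` table and its tiers are assumptions rather than anything Agentix ships, and plain text matching would miss options written as nested attribute sets, so anything unrecognized falls back to requiring review.

```python
# Hypothetical mapping from option prefixes to risk tiers (an assumption).
RISK_BY_PREFIX = {
    "networking.firewall": "high",
    "services.openssh": "high",
    "users.users": "high",
    "environment.systemPackages": "low",
}


def changed_lines(patch: str):
    """Yield the text of added/removed lines in a unified diff, skipping file headers."""
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            body = line[1:].strip()
            if body:
                yield body


def classify(patch: str) -> str:
    """Return 'low' only if every changed line matches a low-risk prefix.

    Fails closed: unrecognized or empty patches are 'unknown' (review required).
    """
    tiers = []
    for body in changed_lines(patch):
        tier = next((t for p, t in RISK_BY_PREFIX.items() if p in body), "unknown")
        tiers.append(tier)
    if not tiers:
        return "unknown"
    if all(t == "low" for t in tiers):
        return "low"
    return "high" if "high" in tiers else "unknown"
```

Under this scheme only the `low` tier would ever be a candidate for reduced review; `high` and `unknown` both keep the current "always review" behavior.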
Built with Python, installed via uv tool install agentix.
Longer writeup: Agentix: Building a Safe Control Layer for NixOS — Research Program — Ned Karlovich
Repo: Beach-Bum/Agentix on GitHub: Safety-first agent control layer for NixOS: plan, sandbox, propose, verify, then human apply/rebuild towards an Agentic OS.
I took a quick look at this, because I like occasionally experimenting with LLMs.
It looks like you didn’t exactly proofread your README and let your agent do a lot of the work for you. The formatting is broken. On top of that, it seems strange that you’d be building this for NixOS but not provide any Nix way to install it. From a cursory glance, it doesn’t really seem like it’s built for NixOS at all, and is yet another slop-coded AI program.
On NixOS you’d look to manage a Python environment with Nix via a devshell or a flake + devshell, neither of which is provided in any form. If this is meant to be installed globally, which isn’t made very clear, you also have not provided a package for it.
In theory it’s an interesting idea to keep an AI in check. In practice this seems rather sloppy, which always comes with the territory of AI when you don’t look over every little thing it does and blindly hit accept.
> proposes patches instead of mutating directly
Nit: with good version-control hygiene this doesn’t make any difference, I think, as the mutations “won’t mix” with anything that has been approved before. For example, jj takes this even further, automatically preserving all past states of your working tree, logging the changes, and providing an easy “undo” of any step.
I have to admit I am not the biggest fan of LLMs in general; conceptually, however, this does sound mildly interesting. As pointed out by @Misty_TTM previously, it is a very strange decision not to include a Nix way of trying out your project. There is not even a `flake.nix` file to speak of.
In terms of what the project is trying to achieve, I think it is very important that lines are drawn for the model. There are about a million different ways to organize, maintain and use a Nix flake. A flake might contain a ton of custom glue code for binding different parts together across different NixOS configurations, for instance. It might rely on Home Manager, sops-nix, flake-parts, disko, the dendritic pattern, or custom library functions to work as it does. Not to mention subtle differences between Nix variants, such as Lix or Determinate Nix. This is of course just scratching the surface of the Nix(OS) ecosystem. My point with all this is that I want to know how it would be translated into efficient context for the model. Far from every project has a `CLAUDE.md` or `AGENTS.md` file that tries to describe the project and how it works to an LLM. These context files can of course be generated on the fly by the model itself, but that essentially just wraps around to my original point: how can we make sure it’s not hallucinating?
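One partial answer to the context question is to derive facts from files the model did not write. As a rough sketch, the snippet below reads a `flake.lock` (a JSON file with a top-level `nodes` object) and reports which well-known inputs appear in it, so that part of the context is grounded in the lock file rather than in a generated description. The `KNOWN` set and `detect_frameworks` function are hypothetical, and this only catches inputs that are named or sourced recognizably; patterns like the dendritic layout or custom library functions would not show up here.

```python
import json

# Input names to look for; an illustrative, incomplete list (an assumption).
KNOWN = {"home-manager", "sops-nix", "flake-parts", "disko"}


def detect_frameworks(lock_text: str) -> list[str]:
    """Return the known inputs found in a flake.lock, sorted by name.

    Checks both the node key (the name the user gave the input) and the
    `original.repo` field, which is present for GitHub-style inputs.
    """
    lock = json.loads(lock_text)
    found = set()
    for name, node in lock.get("nodes", {}).items():
        repo = node.get("original", {}).get("repo", "")
        for candidate in (name, repo):
            if candidate in KNOWN:
                found.add(candidate)
    return sorted(found)
```

A tool could feed the returned list into its prompt as verified facts and leave everything else explicitly marked as unverified.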
Another question I have is about the model’s approach to adding a feature. Let’s say you want to set up impermanence on a host. If the codebase of the configuration is a stinky pile of garbage spaghetti code, should it just add onto that pile, or should it suggest a refactor of some kind? Not only that, but should a pattern be standardized that the model “knows”/“prefers”?
I do think LLM use has a place, for instance I recently tried out hosting a local model and organizing my Jellyfin media library with a Python script that queries a model about how to organize a given directory to conform to Jellyfin’s file naming standards. With some guidance it eventually managed to do that quite nicely.
As for programming, I personally have found it to work better as a fancy auto-complete than as an actual programmer; it is fast at writing a ton of code, but not at reasoning about complex problems the way a human can. This comes from the way LLMs work under the hood. Large language models are at the end of the day just huge prediction machines for what “the next most likely token” is. They are not designed for complex reasoning at all.
Interesting project, but I really hope some more thought goes into designing it. 