Agentic control layer for NixOS that proposes patches instead of mutating directly

NixOS is arguably the best OS for letting an agent touch your system config. Declarative
configs, atomic rollbacks, reproducible builds — the whole model is built around reviewable
changes. But most AI agent tooling just runs shell commands and hopes for the best.

Agentix is a control layer between an LLM and your NixOS/flake configuration. The core loop:

Observe → Plan → Patch → Test → Explain → Approve → Apply → Audit

The agent never mutates directly. It generates patches against your Nix config, explains
what it wants to do and why, then waits. Every action gets a JSON audit log entry. If the
tree is dirty or a patch doesn’t apply cleanly, it stops.
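For illustration only, here is a minimal Python sketch of how a gated loop like this could be modelled. The `Stage` enum, `audit_line`, `run_loop` and the JSON field names are assumptions made up for this example, not Agentix's actual code or log schema.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    OBSERVE = "observe"
    PLAN = "plan"
    PATCH = "patch"
    TEST = "test"
    EXPLAIN = "explain"
    APPROVE = "approve"
    APPLY = "apply"
    AUDIT = "audit"


def audit_line(stage: Stage, ok: bool) -> str:
    """Serialize one audit record as a single JSON line (hypothetical schema)."""
    record = {
        "stage": stage.value,
        "ok": ok,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


def run_loop(handlers: Dict[Stage, Callable[[], bool]]) -> List[str]:
    """Run the stages in order, log each one, and stop at the first failure."""
    log: List[str] = []
    for stage in Stage:  # Enum iteration follows definition order
        handler = handlers.get(stage, lambda: True)
        ok = bool(handler())
        log.append(audit_line(stage, ok))
        if not ok:
            break  # e.g. approval withheld, so APPLY is never reached
    return log
```

Calling `run_loop({Stage.APPROVE: lambda: False})` returns six log lines ending at the approve stage, which is the "then waits" behaviour in its simplest form: nothing after a refused approval runs.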

Less “AI manages my server,” more a cautious junior infra engineer submitting reviewed PRs
against your flake.

What it does today (v0.3):

  • agentix propose — generates a change proposal as a patch
  • agentix verify — builds and tests without switching
  • agentix apply — applies an approved patch
  • agentix doctor / agentix status — system health and state
  • agentix run — goal-driven workflows (e.g. “enable Tailscale and open the firewall port”)
  • Git safety: refuses to operate on dirty trees, validates patch application before
    committing
  • Full JSON audit trail for every operation
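The Git safety bullet can be sketched with two standard Git commands, `git status --porcelain` and `git apply --check`. This is a rough sketch under assumptions: the function names and the injectable `run` parameter are hypothetical, and this is not how Agentix itself is written.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def tree_is_dirty(repo: str, run: Callable = subprocess.run) -> bool:
    """True if `git status --porcelain` prints anything (untracked files count)."""
    result = run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())


def patch_applies(repo: str, patch: str, run: Callable = subprocess.run) -> bool:
    """Dry-run the patch; `git apply --check` changes nothing on disk."""
    patch_path = str(Path(patch).resolve())  # absolute, since cwd is the repo
    result = run(
        ["git", "apply", "--check", patch_path],
        cwd=repo, capture_output=True, text=True,
    )
    return result.returncode == 0


def preflight(repo: str, patch: str, run: Callable = subprocess.run) -> Optional[str]:
    """Return a reason to stop, or None if it looks safe to continue."""
    if tree_is_dirty(repo, run):
        return "working tree is dirty"
    if not patch_applies(repo, patch, run):
        return "patch does not apply cleanly"
    return None
```

The `run` parameter defaults to `subprocess.run` and exists only so the checks can be exercised without a real repository.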

Open questions:

  • The right trust boundary. Currently everything requires explicit approval. Graduated
    autonomy — where adding a package is lower-risk than modifying firewall rules — might make
    sense, or the “always review” model might just be correct for NixOS.

  • Whether this should integrate with existing Nix tooling (like nixos-rebuild test) more
    tightly, or stay as an external wrapper.

  • How people actually want to interact with it — CLI only? TUI with diff preview? Editor
    integration?
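To make the graduated-autonomy question concrete, here is a small Python sketch of what a risk tier table might look like. Everything in it is an assumption for discussion: the tiers, the prefix rules, and the idea that a proposal comes with a list of the NixOS option paths it touches (which would have to be derived reliably, for example by comparing evaluated configs, rather than taken on the agent's word).

```python
from enum import IntEnum
from typing import Iterable


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical rules: option-path prefix -> risk tier.
RULES = [
    ("networking.firewall", Risk.HIGH),
    ("services.openssh", Risk.HIGH),
    ("users.users", Risk.HIGH),
    ("environment.systemPackages", Risk.LOW),
]


def option_risk(option: str) -> Risk:
    """Tier for one option path; anything without a rule is MEDIUM."""
    for prefix, risk in RULES:
        if option == prefix or option.startswith(prefix + "."):
            return risk
    return Risk.MEDIUM


def proposal_risk(options: Iterable[str]) -> Risk:
    """A proposal is as risky as its riskiest option; empty means unknown, so HIGH."""
    return max((option_risk(o) for o in options), default=Risk.HIGH)


def needs_approval(options: Iterable[str], auto_approve_up_to: Risk = Risk.LOW) -> bool:
    """True if a human must review before the patch may be applied."""
    return proposal_risk(options) > auto_approve_up_to
```

With these rules, a proposal touching only `environment.systemPackages` would pass without review, while one that also touches `networking.firewall.allowedTCPPorts` would not. Setting `auto_approve_up_to` below `Risk.LOW` (for example `0`) recovers the current "always review" model.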

Built with Python, installed via uv tool install agentix.

Longer writeup: “Agentix: Building a Safe Control Layer for NixOS” (Research Program, Ned Karlovich)

Repo: Beach-Bum/Agentix on GitHub (Safety-first agent control layer for NixOS: plan, sandbox, propose, verify, then human apply/rebuild towards an Agentic OS)

I took a quick look at this, because I like occasionally experimenting with LLMs.

It looks like you didn’t exactly proofread your README and let your agent do a lot of the work for you. The formatting is broken. On top of that, it seems strange that you’d be building this for NixOS but not provide any Nix way to install it. From a cursory glance, it doesn’t really seem like it’s built for NixOS at all, and is yet another slop-coded AI program.

On NixOS you’d look to manage a Python environment with Nix via a devshell or flake + devshell, neither of which is provided in any form. If this is meant to be installed globally, which isn’t made very clear, you also have not provided a package for it.

In theory it’s an interesting idea to keep an AI in check; in practice this seems rather sloppy, which always comes with the territory of AI when you don’t look over every little thing it does and blindly hit accept.

“proposes patches instead of mutating directly”

Nit: with good version-control hygiene this doesn’t make any difference, I think, as the mutations “won’t mix” with anything which has been approved before. (e.g. jj takes this even further, automatically preserving all past states of your work-tree and logging the changes and providing easy “undo” of any step)

I have to admit I am not the biggest fan of LLMs in general, conceptually however this does sound mildly interesting. As pointed out by @Misty_TTM previously, it is a very strange decision to not include a Nix-way of trying out your project. Not even a `flake.nix` file to speak of.

In terms of what the project is trying to achieve, I think it is very important that lines are drawn for the model. There are about a million different ways to organize, maintain and use a Nix flake. A flake might contain a ton of custom glue code for binding different parts together across different NixOS configurations, for instance. It might use Home Manager, sops-nix, flake-parts, disko, the dendritic pattern, or custom library functions to work as it does. Not to mention subtle Nix-variant differences to boot, such as Lix or Determinate Nix. This is of course just scratching the surface of the Nix(OS) ecosystem. My point with all this is: how would all of it be translated into efficient context for the model? It’s far from every project that has some `CLAUDE.md` or `AGENTS.md` file to try and describe a project and how it works to an LLM. These context files can of course be generated on the fly by the model itself, but that essentially just wraps around to my original point: how can we make sure it’s not hallucinating?

Another question I have is about the model’s approach to, for instance, adding a feature. Let’s say you want to set up impermanence on a host. If the codebase of the configuration is a stinky pile of garbage spaghetti code, should it just add onto that pile or should it suggest a refactor of some kind? Not only that, but should a pattern be standardized that the model “knows”/“prefers”?

I do think LLM use has a place, for instance I recently tried out hosting a local model and organizing my Jellyfin media library with a Python script that queries a model about how to organize a given directory to conform to Jellyfin’s file naming standards. With some guidance it eventually managed to do that quite nicely.

As for programming, I personally have found it to work better as a fancy auto-complete than an actual programmer; it is fast at writing a ton of code, but not at reasoning about complex problems the way a human can. This comes from the way LLMs work under the hood. Large language models are, at the end of the day, just huge prediction machines for what “the next most likely token” is. They are not designed for complex reasoning at all.

Interesting project, but I really hope some more thought goes into designing it. :slight_smile:

Thanks for the feedback. I’ve fixed the packaging (flake.nix, devshell, rewrote the README). That was sloppy, fair to call out :slight_smile:

I’d rather ask about what I’m actually trying to do than just patch and leave though.

The thing I keep coming back to: can an AI agent run an operating system? Not write code for it — actually watch it, figure out what’s wrong, propose fixes, and eventually handle things on its own once it’s proven it won’t break stuff. A machine that takes care of itself.

NixOS felt like the only place to try this. Everything is declarative, the whole system state is in version control, and if the agent screws up you just roll back. I can’t think of another OS where you could responsibly hand an AI any amount of control.

Tbh I’m not an engineer or computer scientist. I used Claude to learn Nix, understand how everything fits together, and turn ideas into code I couldn’t have written myself. That’s honestly part of what I’m testing: what opens up when someone who isn’t a systems programmer gets these tools and points them at NixOS. The community and the learning have been as valuable as the code. I’m daily-driving NixOS now and that alone was something I’d wanted to do for years.

Where this gets interesting to me is past my own setup. There are more and more people running local-first services, self-hosted nodes, mesh stuff — privacy-conscious infrastructure — who want control over their own systems but don’t have the background to keep them healthy. NixOS already makes that easier with declarative config and rollback. A local AI layer on top could go further: something that watches the system, tells you in plain language what it wants to change, and handles the upkeep most people won’t or can’t do. Local infrastructure that governs itself without phoning home to anyone.

Right now Agentix runs as a daemon on one of my NixOS machines. Basic loop: observe, propose, audit. It works. It’s early.

Is NixOS the right system for this? What would you try next? Are there things already in the Nix ecosystem I should be building on instead of around? I’d rather find the right direction than defend what’s there :slight_smile:

Out of pure curiosity and interest: have you tried using Google Antigravity to help you modify your Nix flake (whether it outputs NixOS, Home Manager, a devshell, something else, or several of these together)? I feel that if you did, you might find a better direction for this project. I currently don’t completely get the advantage of using it over modifying my flake directly inside Google Antigravity. Maybe it would be better for some simpler or resource-constrained workflows, though I’m not sure. I’m not underestimating your nice tool, just wondering what benefits you think it has over a good agentic IDE (Google Antigravity, for example) modifying your flake.

Good question — and it gets at something I should probably state more directly.

I’m not really building a tool for editing flakes. I use Claude Code for that, same as you’d use any agentic IDE. It works fine.

Agentix is an attempt at something different: an operating system that maintains itself. One that observes its own state, detects drift, proposes fixes, tests them, applies them, and logs what it did without me sitting at a terminal. The human sets policy. The machine executes within those boundaries.

NixOS is the only platform where this is even worth attempting. The entire system state is declarative, content-addressed, and rollbackable. If the agent makes a bad call at 3am, the previous generation is one command away. No other distro gives you that safety net.
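As a rough sketch of that safety net, an unattended apply could be wrapped so that a failed health check triggers `nixos-rebuild switch --rollback`. The wrapper below is hypothetical: the function name, the injected `run` and `health_check` callables, and the `.#host` flake reference are all assumptions, and a real version would need to handle partially activated switches more carefully.

```python
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def apply_with_rollback(
    run: Runner,
    health_check: Callable[[], bool],
    flake_ref: str = ".#host",  # placeholder flake reference
) -> str:
    """Switch to the new generation, then roll back if the health check fails."""
    switch_cmd: List[str] = ["nixos-rebuild", "switch", "--flake", flake_ref]
    if run(switch_cmd) != 0:
        return "switch-failed"  # nothing to confirm; leave the decision to a human
    if health_check():
        return "applied"
    # Health check failed: return to the previous generation.
    run(["nixos-rebuild", "switch", "--rollback"])
    return "rolled-back"
```

In practice `run` could be something like `lambda cmd: subprocess.run(cmd).returncode`, and `health_check` would be whatever the operator defines as "still healthy", for example that SSH is still reachable.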

So the difference isn’t “Agentix vs. agentic IDE.” It’s “human drives the session” vs. “machine drives the session within rules the human defined.” The IDE model assumes someone is always there reviewing. Agentix assumes they’re not, and tries to make that safe.

Whether this actually works at scale is an open question. Right now it’s an experiment. But the goal is a machine that runs itself — not a better way to write Nix.

Just a detail: NixOS isn’t content-addressed - the hashes are not hashes of the content.

OP using emdashes and sounding like they just straight up copy-pasted their replies here from Claude! :rofl::rofl::rofl::rofl:

“You’re right to push back on that…”. :rofl:

It can be, though, if you turn CA derivations on.

I’m not sure I entirely see the point in what you’re describing. From what I am gathering, this will make changes within defined bounds to a system whenever it feels like it. What are the benefits of this? Does it just add whatever feature it feels like, or does it try to do a task asked by a human, kinda like what OpenClaw tries to do? If so, those safeguards need to be super strict in order to not just break a running system, like OpenClaw has done in many cases, I think.

Why would you want it to be messing with your system state at 3 AM? That sounds like a security nightmare. What is stopping it from just disabling SSH access on a server? The bounds set by the human operator could easily be bypassed with prompt injection, or by attacks specifically targeting AI tooling. We’ve seen this very recently with the Shai-Hulud worm.

Again, why? Are we trying to make Ultron or what?

I think you are basically building another layer of abstraction, which is good and is actually one of the goals behind Nix and many other projects that seek to improve system administration and automation (and arguably the goal behind AI and LLMs themselves). However, the scope is too wide. There are many Nix projects that try to solve automation issues around Nix, and I think your project needs to be a bit more specific about the problem it is trying to solve. For example, one tool that solves a simple issue (and would also benefit your project) is mcp-nixos, which is already in nixpkgs and keeps LLMs from hallucinating options and package names. Arguably your whole project might be better as an MCP and a skill for already existing AI agents, instead of trying to make a whole agent on your own. The project scope as a whole AI agent is too wide: there are many AI agents that would do better as agents and are improving rapidly (openclaw, hermes, openhuman, antigravity, claude code, etc.), with rapidly ongoing AI research backing most of them. Unless you intend to go down that rabbit hole of AI-related science, you are better off with a good MCP and a skill (which we currently actually need!).