Stable NixOS support length

Resurrecting an old thread here but I am wondering if there isn’t room for a middle ground?

  • unstable, rolling
  • testing, rolling, slightly behind unstable but with stronger guarantees
  • stable, released once a year

I am currently planning a new setup and 6-month stable vs rolling is really quite a challenging sell.

What kind of stronger guarantees do you want to see for “testing”?

Also I don’t think that a year of support is enough for entities that want their LTS for compliance reasons.

Fair question. Testing whether an upgrade breaks is a tough problem with many variables.

Honestly, all/most packages following the same pattern that e.g. postgres uses would already go a long way for me.

postgres - latest
postgres_18
postgres_17
...

This does give a lot more confidence of not breaking production due to a package selection mistake during security updates. This also pushes the question of stability a little more to the package upstream.
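As a minimal sketch of what selecting a versioned attribute looks like in a NixOS configuration (note that the nixpkgs attributes are spelled `postgresql_17`, `postgresql_18`, and so on; the choice of major version 17 here is only an example):

```nix
# configuration.nix fragment (illustrative sketch, not a complete config)
{ pkgs, ... }:
{
  services.postgresql = {
    enable = true;
    # Name the major version explicitly rather than using the unversioned
    # `pkgs.postgresql`, which may point to a different major version
    # in a later nixpkgs revision.
    package = pkgs.postgresql_17;
  };
}
```

With this, moving to a new major version is an explicit edit to the configuration rather than a side effect of updating nixpkgs.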

Maybe this could be covered with some better lockfile management on our side. But distinguishing security-related updates from breaking updates seems to be the main goal.

Probably. My situation is a little different. I am currently contemplating running unstable in production because switching from old stable to new stable still feels like a major hurdle/risk. And doing that every 6 months is just too much of a headache. On the other hand, I am not sure that an x-year LTS branch is a great answer here.

This is almost pushing me back to use containers (which I do not really want) just to have easier control of versions.

Because of people like you, ctrl-os does overlapping LTS releases, on a 2 year cadence, IIRC.

If that again is too long, while 6 months is too short a release cadence, you will have to live with the 6 months of update gap or backport relevant updates yourself.

That works for a leaf package. But most problems are from dependencies. If they’re deeper in dependency trees, the number of combinations just blows up exponentially.

In many cases you can’t even have multiple versions of a library in a single dependency closure, as they would collide, e.g. C symbols in a single process, so with that approach you do run into some limits.

Also note that backporting across 3 months of development (on average) is somewhat less work than backporting across 6 months of development (on average, up to about a year), because more conflicts and differences arise over the longer span. So making this longer has disadvantages as well as advantages.

You are saying it’s more likely that underlying dependencies cause possible problems in unstable?

My main concern here is/was major version upgrades.
It seems the naming and practice are not even consistent on the leaf packages.

Yeah, I didn’t expect it any other way.

Right now it seems running the OS on unstable and the payload as containers is the middle ground. Pinning tarballs or even just mixing stable and unstable is not super enticing.
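For readers unfamiliar with the "mixing stable and unstable" option, a minimal sketch assuming a flake-based setup. The host name `myhost`, the `./configuration.nix` path, the `nixos-25.05` branch, and the `hello` package are all placeholders chosen for illustration:

```nix
# flake.nix (illustrative sketch; names and branches are assumptions)
{
  inputs = {
    # Base system follows a stable release branch.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
    # A second input provides newer versions of selected packages.
    nixpkgs-unstable.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { nixpkgs, nixpkgs-unstable, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        {
          # Take a single package from unstable; everything else
          # still comes from the stable input.
          environment.systemPackages = [
            nixpkgs-unstable.legacyPackages.x86_64-linux.hello
          ];
        }
      ];
    };
  };
}
```

Both inputs are pinned in `flake.lock`, so each can be updated independently. The trade-off is that the package from unstable brings its own dependency closure alongside the stable one.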

I am just wondering how other people handle this for production deployments.

Sure, but we have also e.g. glibc major upgrade or python3 major upgrade.

Naming not being consistent is a nit. The thing is that best practice is to minimize the number of versions of each package (within a single nixpkgs commit). Or at least to minimize using non-default versions within nixpkgs itself.

we really need to split this off into a separate thread… :sweat_smile:

specifically what sort of issues are you running into? what sort of software are you running? is it version bumps that are causing you trouble, or module changes? having to reboot once every six months?

any more specific details you can provide?

We use stable and update once every 6 months. We are a small team (< 10) running ~100 autoscaling groups. This has not caused us any particular issues. The constant stream of Linux LPEs, multiple security problems with ssh, etc. are much more of a headache.

We run on EC2 with a base NixOS image that we control: we update the image, run a canary, and move on. We update the image about once per calendar month on average.

Yes - we should. Sorry :sweat_smile:

What would be a good working title for that thread that would make you participate?

A production server/cluster. Serving 10k+ users. Webserver, database and application.

Running on stable means a bigger upgrade twice a year.

You can argue it’s good ops practice to switch machines/clusters blue/green style more often - but twice a year just to keep the system up to date is a lot of work and risk.

Module changes are not such a big issue as they can be caught early.
Reboots are not really a major issue either.
With nix you can try the update and revert to the old system if things go sideways.

But that’s not so easy when the upgrade touches state. I guess that’s also why the postgres_<version> pattern exists in particular.

TBH running unstable with controlled locking and testing feels like less of a headache than running stable (if state can be kept controlled) …unless the LTS cycle is extended - just to circle back to the original topic :slight_smile:

Can you be a bit more specific about your state issue? I.e. which package upgrade and which actual or theoretical problem that occurs on upgrade? Whether and why pinning system.stateVersion through upgrades (or just leaving it alone on unstable) is/is not a viable solution?
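For context, pinning `system.stateVersion` means leaving the option at the release the machine was first installed with, even as nixpkgs itself is upgraded. A minimal sketch (the value `"24.11"` is only an example):

```nix
# configuration.nix fragment (illustrative sketch)
{ ... }:
{
  # Keep this at the release the system was first installed with.
  # Modules for stateful services consult it to keep defaults compatible
  # with existing on-disk data, so it is not bumped on routine upgrades.
  system.stateVersion = "24.11";
}
```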

this is a pretty straightforward stack, especially given how loose your constraints are - awesome! the only stateful thing you mentioned is the database… so does your issue entirely boil down to the database? you already mentioned the solution there. i think nixos stable is probably what you want here.

Right, done. The discussion isn’t that close to the original thread.
