Call for Input: Changing the Release Schedule to Reduce Work Load for NixOS Users

Dear all,

I’d like to ask for the community’s input about a problem I noticed for the first time this cycle: due to time constraints, the current release schedule is not a good fit for sysadmins with lots of NixOS machines, especially the November release.

Currently, we are effectively reducing the EOL time window from 4 weeks to 2-3 weeks because of the Christmas holidays.

When 25.11 was released on the 30th of November, I started upgrading machines at work. I still have to do some upgrades during the holiday season (and at Chaos Communication Congress) if I want to be done by the EOL of 25.05.

I’m kinda fine with working on those days, since I won’t be disturbing my colleagues, who mostly won’t be working then. We are a civil rights organisation based in Berlin doing strategic litigation. I have an awesome colleague supporting me in the tech team with first-level support, but I am doing the upgrades single-handedly – while not being able to do other work. (It’s fun and important work with interesting technology and a really cool team! I am thankful to get paid for doing NixOS there.)

The problem I am writing about was (probably) introduced with RFC0085 from 2021. This RFC aimed at stabilising the release by taking into account the release schedules of critical packages such as systemd, GCC, etc. I am not 100% sure whether this RFC really introduced the switch from xx.03/xx.09 to xx.05/xx.11 releases, but both – the RFC and the switch – happened around the same time. The RFC does not mention changing the months in its changes section. I already used NixOS back then, but wasn’t actively involved; I’d be interested in any recollections about that part.

There was a brief and insightful exchange in the Release Management Matrix channel on the problem I’m bringing to discussion here, started by a post from me.

As far as I understand, there are at least 4 options for how to move forward – which could be combined. My initial idea was option 2; options 1 and 3 were brought up by others during the discussion on Matrix. Thanks a lot to all of you for bringing up alternative ideas!

1. Extend the EOL time frame:

  • Maybe let’s extend it from 4 weeks to 6 or 8 weeks.
  • This way, one may still easily upgrade at the beginning of January.
  • This would increase costs for Hydra etc. because of the backports.
  • It also puts a higher workload on the community.
  • That said, I am a fan of the brief EOL time frame.

2. Switch the Release months again:

  • We could go back to March and September, I guess.
  • A kinda big and impactful change, which is not what I want…
  • Release-critical packages’ release schedules need to be taken into account.
  • Other big (mainly religious and political) holidays need to be taken into account as well.
    • Mostly Christmas, Ramadan, Diwali, Hanukkah, Chinese New Year (I guess).
    • The NixOS community is primarily from Europe and North America, so we could focus on those regions.
  • It would open a huge discussion with lots of political aspects (it is a possible solution to discuss, but let’s be fair and kind to each other).
  • IMHO NixOS is a server distribution (which works perfectly fine on a desktop), but I would not like to have to take desktop environments’ release schedules into account.

3. Nudge users more into testing in advance:

  • Let’s explain, document and advertise how to test releases in advance.
  • Possible side effect: it could strengthen the releases if more users test master and report back problems they encounter.
  • This solution in combination with 1. could be worthwhile.

4. Sticking to the status quo:

  • Certainly a possible outcome.

Further considerations

I’d be interested in how many other users encounter the same problem as I do right now.

Also, happy about feedback from NixOS Release Team @jopejoe1 @leona (and previous teams), Nixpkgs Core Team @nixpkgs-core, and Infrastructure Team @hexa et al.

If we as a community agree that the status quo is not the way to go, I’d be happy to shepherd an RFC to solve this problem by the end of 2026.

Thanks for all your input and a productive discussion.

2 Likes

Gonna be honest, people usually forget to backport to the active stable, much less a deprecated one…

Anyway IMO option 3 seems the most viable.

4 Likes

From your POV, maybe; for me, an individual maintaining my personal computers, the holidays give me time to upgrade.

If you’re a professional, shouldn’t you have the resources to work around this problem? There are a few topics about commercial support for an LTS version here on Discourse; not sure what state they’re in, though.

The current release schedule was chosen in RFC0080 to match the GNOME and Plasma release schedules, with time for us to package the new versions. That seems a more appropriate determiner.

I could see the argument to extend the support overlap a little, perhaps. But I’m honestly not sure that’s worth it.

2 Likes

I’m not sure any dates are really good for avoiding inconveniencing someone. End of March/September is bad for UK/US financial years; then there are Christmas, Easter and summer holidays – you name it. There are also school schedules affecting parents. There seems to be no good time for major activities!

Ways to help plan for the change and ease the execution would be better than changing the dates.

3 Likes

How does this actually help with the described problem? Testing more is always great, but how does the community testing more than it already does imply you will not bleed your update schedule into the Christmas week?

I guess you could run the release branch a week or two earlier on your end, or at least use it to preview what changes will likely need to happen, giving you time to prepare, but frankly that just sounds like you’re telling us you need to improve your company’s internal processes. The test period and branching strategy is in place already.

Upstream can’t really do much to help further besides maybe - as usual - improving documentation; that omnipresent problem is one of funding however and not something easily solved. Besides, I think the release process is documented pretty well, and information about it is disseminated widely. I don’t think it could be improved sufficiently to move the needle much in terms of making your processes easier to fix.


IMO shifting the update period may indeed make sense, though. I appreciate that it will always inconvenience someone, but avoiding all the solstice-related celebrations is not a terrible idea. Human cultures do differ, but there are common patterns. We can never make everyone happy, but inconveniencing fewer people makes sense.

That doesn’t mean we should shift the dates, there are other considerations that may be more important, but I don’t think it should be dismissed outright just because other annual events exist.

2 Likes

Hello everyone! We run a small fleet of servers and so I have opinions on this topic based on our work. Like riotbib, we’re working at a non-profit organization and we have a small team of people working with Nix at maybe 10% on average, given they’re all very busy with other stuff.

How we upgrade: we have a staging branch of our codebase with a few servers (canaries), then we have three waves building from main: dev servers, test/ACC/low-SLA prod, then the final wave containing mission critical production servers.

This time we managed to upgrade only staging and the first wave before we hit the December freeze period, which is roughly the last two weeks of the year. This is faster than the last upgrade, and we are lucky (grateful!) that an awesome volunteer from the community did a drive-by patch to make our stuff work with -unstable a few weeks before the official release, as part of investigating a different issue.

(A large financial institution I used to work at froze all production assets for the whole of December, and I think that’s typical for extremely large/highly-regulated organisations.)

I looked into the links above and I don’t think it’s fair to ask anyone to change the months of the release cycle, buuuuut, I will say that if releases were published in the first week of May or November, as opposed to the last, this would make a very big difference.

One thing we plan on doing to improve our upgrade process is to set up a long-running branch building against -unstable so we can get early warning of breakages. Most issues we get are due to deprecated stuff being removed, either in nixpkgs or Python. Obviously we could notice and act on warnings earlier, and that will be one of the benefits of tracking unstable.
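Such an early-warning branch could look something like the following sketch, assuming a flake-based repository (the branch name, host names, and grep patterns are all illustrative):

```shell
# Sketch: build the fleet's configurations against nixos-unstable on a schedule,
# without deploying, so deprecations and removals surface months before the upgrade.
git switch -c track-unstable
nix flake lock --override-input nixpkgs github:NixOS/nixpkgs/nixos-unstable

for host in canary-1 canary-2; do
  # Build the system closure only; nothing is activated.
  nixos-rebuild build --flake .#"$host" 2>&1 | tee "build-$host.log"
done

# Surface deprecation chatter from the build logs for triage.
grep -iE 'deprecat|obsolete|removed' build-*.log || true
```

Run from CI or a cron job, the build logs give a rolling picture of what will break at the next release, rather than discovering it all at once in December.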

Voilà my 2c.

1 Like

Up until a couple of years ago, the release schedule was end of March/September, and due to the overlap with the Plasma/GNOME schedules, delays of 4 to 6 weeks were the average. So changing the schedule to end of May/November just matched reality better; it didn’t change the effective schedule.

We are now basically at the effective schedule from the past, but by making it official, we took stress and pressure off the contributors.

Moving it 4 weeks earlier would reintroduce that stress, as it removes at least the staging cycles that were intended to be there as “slack” or “reserve”.

I see how the current model puts stress on companies one way or the other.

Though here my question would be: can a 2-month deprecation cycle help you (and can this be done by the community)? Would “CTRL-OS” be a better alternative for you?

2 Likes

Hi @NobbZ, indeed I saw the post about CTRL-OS a few days ago and it looked interesting. If we need to use other Linux distros or OSs, we tend to go for the LTS version because then we spend less time on upgrades.

OTOH we are hoping to take advantage of some features hopefully being added to future releases of nixpkgs (e.g. this PR to reinstate boot counting), so I think the trade-off is a bit more complicated than that.

As businesses relying on an open-source project which pre-publishes the release dates for new releases, the best suggestion I have for you (and also what I do with my infra) is:

Start your upgrades a month before the actual release.

There’s even a restriction that’s put in place for breaking changes by that point: https://github.com/NixOS/nixpkgs/issues/443568 - so developers shouldn’t be landing large changes that affect end users.

This allows you to catch regressions before you upgrade the rest of your fleet - which you can then in turn report as issues on GitHub for the team to hopefully resolve before release day comes around.
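In practice, starting a month early can be as simple as pinning a canary host to the release branch once it exists. A sketch, assuming a flake-based setup (the branch and host names are illustrative):

```shell
# Once release-25.11 is branched off and breaking changes are restricted,
# point the flake at it ahead of the official release date:
nix flake lock --override-input nixpkgs github:NixOS/nixpkgs/release-25.11

# Build without activating, then review what would change on the canary:
nixos-rebuild build --flake .#canary
nix store diff-closures /run/current-system ./result

# If it looks good, switch the canary and let it soak before the fleet follows:
nixos-rebuild switch --flake .#canary
```

By release day, the canary has been running the new release for weeks and any regressions have already been reported upstream.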

6 Likes

Thank you @poyera23 , that’s a great suggestion! In our case the “Restrict all breaking changes” milestone would indeed be a good point to update our staging branch.

I strongly believe that this is the answer. It’s impossible to suit everybody’s schedule – it might be inconvenient for you, but as we’ve seen, it suits some others, and we deliberately put the schedule where it is to improve the quality of releases, particularly with regard to desktops.

I believe you are wrong about NixOS being primarily a server distribution: in the 2024 community survey, 95% of respondents use Nix on laptops, desktops, etc., and significantly fewer use it on any sort of server. If upgrade work only begins on the day of release, or even later, that’s not taking advantage of the full overlap we already have, so I don’t see any reason to add more to accommodate that. Ideally, release day is the end of your update process; in any event it certainly shouldn’t be the beginning.

13 Likes

IIRC, there was feedback in the release retro a few releases ago to expand the beta period, which may help with this goal.

3 Likes

Dear all, thanks for all your input and ideas!

I wanted to wait for most of the feedback to flow in before replying.

First of all, I think we can all conclude that this is a problem which could accompany us for all future releases if we don’t work on it. Please do speak up if you disagree.

Just in case my initial post was easy to misunderstand: we are not in a really bad situation because of this “release date versus holiday season” conflict. I mainly made this post because I observed this shortening of the EOL window for the first time. I have used NixOS since 18.09 and started getting involved more actively around 2022. Also, English is my second language, which does add some room for misinterpretation, scusi.

I’d like to pick up on two main arguments which I noticed:

1. Throwing resources at problems

Hiring more people to work on our infra is one aspect we are considering, as we just did with a student. We are a non-profit (legal status comparable to a 501(c)(3) in the States, just under German law: gemeinnütziger eingetragener Verein), so just adding resources isn’t too easy due to budget constraints.

Also, buying LTS support like Ctrl-OS is not something we are considering, as we do not have to follow compliance rules as strict as those of the customers it is intended for. Also, not relying on an LTS is cool.

In the end, adding resources does not solve this problem for us, as it won’t for anybody else (IMHO).

2. No perfect date for everyone

Yes… also thanks for the reply about non-professionals upgrading during the holidays, I totally get that. For me, I upgraded throughout December, usual working hours and the holidays included. I’m not asking for pity; it’s just something I have to work on before 26.11. Hence this post.

I am no fan of shifting the months of the releases (option 2 in my initial post), which – for me at least – is out of the question by now; e.g. @qyliss made some good points.

About moving forward: I talked to @leona, who managed the 25.11 release (thank you!), and just read the 25.05 retro (thanks for the hint @numinit). I guess we could and should definitely go for option 3 (more documentation on testing before release), and consider option 1 (extend EOL).

For extending EOL we’d need more capacity on Hydra, so I’ll privately ping some people who work on that. For option 3 I’d need to read up more on it and get in contact with some people as well.

If you feel like giving more input, happy to read your replies.

Thanks again!

2 Likes

Thanks for summarizing! So, there are two things that would be good to optimize:

  1. improving the timing so the work doesn’t bleed into the holidays
  2. reducing the amount of work required

For ‘1’, I think solution ‘3’ would be most effective: as @poyera23 suggests, start the process of upgrading before the actual release, for example at the “Restrict all breaking changes” phase of the release. This may mean some extra work, but more favorable timing. I understand you have a pretty mature ‘tiered’ fleet of servers with dev servers, test/ACC/low-SLA prod and mission-critical prod. Being able to test ‘most of’ the release in branches and perhaps even on dev/test before the release would help you discover and do the work required ‘in advance’.

For ‘2’: I understand you spent a rather significant amount of time on the upgrades. It would be interesting to analyze what kinds of activities contributed most to that time: was it mostly simply because it’s a lot of servers and you need to spend time on backups/scheduling/monitoring etc. to perform the upgrade? Or did you have to spend most of that time on changes required by the update? Or on diagnosing and fixing issues that arose when/after performing upgrades? Of course this is hard to quantify after the fact – but I think it would be interesting to keep a record (perhaps when doing the 26.05 upgrade?) to see if there’s anything that can be improved there.

3 Likes

Hej @raboof, thanks for your input and summary!

No, the upgrades were mostly flawless; there’s another reason:

We are ~50 people working at this organisation: mostly lawyers, but also comms, policy, fundraising, and one specific project. I currently administer around 5 bare-metal and 15-20 virtual servers.

I ran into time constraints because a) around half of these systems cannot be upgraded during normal working hours, so as not to hinder my colleagues, and b) I have other tasks as well, as previously mentioned.

Just off the top of my head: I could easily “clone” a machine (even bare metal) using NixOS and test the upgrades ad hoc: copy /etc/nixos and the data, upgrade, and now I know about possible pitfalls. This could become a documented process, or even a new tool, and it supports the tendency to go for option 3. [Edit: I do not want us to pay for testing servers of all appliances the whole year…]
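A minimal sketch of such an ad-hoc clone, assuming a channel-based configuration living in /etc/nixos (stateful data would still need to be copied separately for realistic tests; the channel name is illustrative):

```shell
# Build a throwaway QEMU VM from the machine's own configuration, evaluated
# against the new release, without touching the real system:
nix-build '<nixpkgs/nixos>' -A vm \
  -I nixos-config=/etc/nixos/configuration.nix \
  -I nixpkgs=channel:nixos-25.11

# Boot the cloned system and poke at it; the VM script name includes the hostname.
./result/bin/run-*-vm
```

Since the VM is just a store path, it costs nothing while not running – no year-round testing servers needed.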

I guess the Documentation Team is the right one to contact and offer help to, right?

Also, I really dig option 1, but I do know about the Hydra constraints.

1 Like