This is especially important for enterprises that require timely security updates. It’s going to be hard to be taken seriously without that. We need to fix our workflow and tooling to make this happen.
When we’re talking about this… my understanding is that for security-critical servers you’re meant to follow a -small channel, most likely a stable version of it. Those can update very fast even for huge rebuilds, and I don’t remember any long lags there. (We had some backwards updates there as well, but that should be solved now.)
It would probably be good to actually announce that the nixos-18.09 channel may delay critical security updates and to suggest always using nixos-18.09-small if you care about security.
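For anyone who wants to make that switch, it’s just a matter of re-pointing the system channel at the -small variant (using the usual nixos.org channel URL pattern) and rebuilding:

```shell
# Point the system channel at the -small variant of the release.
sudo nix-channel --add https://nixos.org/channels/nixos-18.09-small nixos
sudo nix-channel --update
sudo nixos-rebuild switch
```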
So what is the point of the nixos-18.09 channel then if it cannot be used for regular systems due to intermittent delays in security updates?
Update: It’s not just for security-critical servers. I’ve just made the switch on my two desktop systems, due to the critical Firefox 63 update being delayed.
Well, it’s all best-effort so far. My participation (for example) is all in my free time.
If you’re touchy about security, IMO in the past several months the main problem isn’t really the speed of channel updates but the fact that many CVEs just don’t get fixed in our branches for a very long time (presumably they are usually less important, but I don’t really know ATM).
I didn’t intend to imply that I or someone else could demand other people’s work. If my post sounded like that I’m sorry, it was not intended. I’m just trying to understand what causes these notable delays in some security updates.
I wasn’t aware of that. I assumed that most security updates moved to the stable version rather quickly, as it’s usually just a cherry-pick.
So currently I see two issues with nixos and security:
The security roundups are lacking manpower, which may cause delays in backporting or cherry-picking the updates to the release-branches.
This can probably be solved with some more people that have a look at the security roundups and support (check CVEs, provide patches, cherry-picking, backporting, …) them.
Channels may not update for some time in some cases due to random build failures.
This is rather opaque to me:
Let’s assume I’m waiting for a security update to make its way through a channel. It has been merged into release-18.09. Usually, after a couple of hours, Hydra should pick up the change and start building.
Now there are multiple things that could go wrong, some (like build failures with a compiler error) will be easy to figure out, others will just be “Exit code 1, but I’m not giving you any useful error message”, or even “Timed out after 10 hours”.
It’s easy to reproduce the builds locally assuming I’ve got a similar system type, thanks to the generated build script.
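(For reference, reproducing such a build locally usually comes down to building the same attribute yourself; `firefox` below is just an example attribute, not necessarily the one that failed:)

```shell
# Build the same attribute Hydra failed on, against your local nixpkgs.
nix-build '<nixpkgs>' -A firefox
# To match Hydra exactly, use the generated build script linked from the
# failed build's page on hydra.nixos.org instead of your local channel.
```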
Now let’s assume the build works perfectly fine on my local system, suggesting an issue with one of the build systems. What can I do then? Should I contact someone?
Shouting in IRC with its high volume rarely yields a usable response to this and just repeating “could someone look at this failed build on hydra, please?” every hour is probably not very useful.
Should I open a topic on discourse? Or would it make sense to have a single topic that the relevant people can watch?
How can I support the people that actually need to do something here?
Those missing CVEs typically aren’t fixed on nixpkgs master either. Cherry-picks are usually easy, as you write.
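To illustrate how cheap such a cherry-pick usually is, here is the workflow demonstrated on a throwaway repo (the branch name, file, and CVE number are placeholders; for nixpkgs you would clone the real repo and use the actual release branch):

```shell
set -eu
# Throwaway repo standing in for nixpkgs.
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "initial"
git branch release-18.09                 # stand-in for the stable branch
# A security fix lands on master:
echo "patched" > pkg.txt
git add pkg.txt
git -c user.email=a@b -c user.name=a commit -q -m "pkg: fix CVE-XXXX-YYYY"
fix=$(git rev-parse HEAD)
# Backport it to the release branch; -x records the original commit hash.
git checkout -q release-18.09
git -c user.email=a@b -c user.name=a cherry-pick -x "$fix"
git log -1 --format=%B | grep "cherry picked from"
```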
That’s exactly what I just learned
As far as I understand this, both the nixos-unstable and nixos-18.09 channels are currently blocked by a problem with systemd. @andir is currently working on this, but perhaps he could use some help. See the discussion on IRC in #nixos-security and https://github.com/NixOS/systemd/pull/24
Now even nixos-18.09-small has been stuck for a week. If there isn’t any low-hanging fruit left I think the way NixOS is developed must drastically change to be viable.
I think we can definitely do a lot to make some of these tests less flaky. In particular, the installer tests have had issues that are usually random. But we actually do want things to get stuck when they are legitimately broken.
This particular hang wasn’t about the tests themselves but about build farm “maintenance”. TL;DR: you can’t fully automate everything – some humans still have to watch things and fix them when needed.
For those of you who were brought here by Google looking for help updating NixOS:
(I know this thread is only tangentially related, but this is where Google keeps corralling that search query.)
Here is my update/upgrade process…
Initially, using whatever channel you want:
sudo nix-channel --add https://nixos.org/channels/nixos-unstable nixos
Then to update:
sudo nix-channel --update
nix-channel --update
sudo nix-env --upgrade
nix-env --upgrade
sudo nixos-rebuild switch --upgrade
This was suggested to me on the IRC when I started asking questions about “why doesn’t setting xyz in my configuration.nix cause my system to upgrade?”
I was very surprised to learn that most of those settings didn’t do what I had imagined based on their names. This process might not be the “proper way” but it works well for me.
There are UX problems related to channels. A replacement is work in progress: https://gist.github.com/edolstra/40da6e3a4d4ee8fd019395365e0772e7
To revive this thread: Why hasn’t trunk-combined been evaluated over the last 5 days? Here is the last eval: https://hydra.nixos.org/build/86749887
An error seems to have been introduced in the kerberos test. @pbogdan has supplied a fix which should hopefully do the trick:
Thanks! Is there some way to find out about failed hydra evals?
As in be notified when it happens? Not that I know of. Perhaps if you have a Hydra account, but that seems pretty restrictive at the moment.
Is there a manual way to check?
There’s a “Last checked” timestamp in the Evaluation tab, so when that doesn’t result in a new eval there’s probably something wrong. The errors for the last check can be found in the “Evaluation errors” tab.
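If you’d rather check from a script, Hydra serves most of these pages as JSON when you ask for it via the Accept header; a sketch (the jobset name is just an example, and the JSON shape isn’t guaranteed to stay stable):

```shell
# Fetch the recent evaluations of a jobset from Hydra as JSON.
curl -s -H 'Accept: application/json' \
  https://hydra.nixos.org/jobset/nixos/trunk-combined/evals
```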
For some reason the last check didn’t go through either, but the error blocking for the last days is gone at least.
Thanks, that’s good to know for the next time.