Probably will cause fatalities etc, so not funny.
Definitely no laughing matter.
It's interesting to think about how the NixOS approach can help protect against such problems.
I don't think it can prevent it entirely: any system powerful enough to update itself is probably powerful enough to break itself, so there's no substitute for disaster recovery measures that could rebuild any machine from backups if needed.
Still, the generations system allows easier recovery from anything that breaks due to a nixos-rebuild. While machines will likely have some state in /var and similar, that should be unlikely to influence the parts of the system that are needed to perform administrative tasks. Would there be any way to verify this is indeed the case? Or would this be another reason to go "full impermanence"?
I think the whole MS ecosystem is antediluvian and defective.
Events such as these will surely just enhance the focus on extremely stable systems like NixOS.
(I am certainly not laughing either, other than the involuntary laugh when you see people using MS for mission-critical infrastructure)
Nix/NixOS, or a methodology like it, is definitely a prerequisite for what you outlined. Without something like it, you're just treating the consumer's computers as testing grounds.
The more I read this 2002 paper, the more it reads like a prophecy. Why Order Matters: Turing Equivalence in Automated Systems Administration
Due to modern society's reliance on computers, it is unethical (and just plain bad business practice) for an operating system vendor to release untested operating systems without at least noting them as such. Better system vendors undertake a rigorous and exhaustive series of unit, system, regression, application, stress, and performance testing on each build before release, knowing full well that no amount of testing is ever enough (8.9). They do this in their own labs; it would make little sense to plan to do this testing on customers' production machines.
Still, the generations system allows easier recovery from anything that breaks due to a nixos-rebuild.
There are conflicting requirements here though. Being able to boot into an earlier generation with a known-outdated EDR package can be seen as a security risk, so it would be impossible on a sufficiently locked down system.
It depends on your desired level of lockdown, of course. There are plenty of situations where "users that have access to the boot console are trusted to switch to older versions only when needed" (or some variation thereof) is reasonable.
The scenario where you don't want that is also interesting to consider, though. Perhaps in that case you could automatically throw away old generations only after you've successfully rebooted into the updated one? That comes with the downside of reboots, but it could be a reasonable trade-off.
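As a rough sketch of that "only prune once the new generation has booted" idea, the decision logic could look something like this in Python. The Generation record and the generations_to_prune helper are hypothetical names made up for illustration; they are not part of any NixOS tooling, and the sketch only decides what would be safe to delete without touching the system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Generation:
    """Hypothetical record describing one system generation."""
    number: int       # generation number, increasing over time
    store_path: str   # store path of that generation's system closure


def generations_to_prune(generations: List[Generation], booted_path: str) -> List[Generation]:
    """Return the generations that may be deleted.

    Nothing is pruned unless the machine is currently booted into the
    newest generation, so a failed update always leaves a fallback.
    """
    if not generations:
        return []
    newest = max(generations, key=lambda g: g.number)
    if booted_path != newest.store_path:
        # Still running an older generation: keep everything.
        return []
    older = [g for g in generations if g.number < newest.number]
    return sorted(older, key=lambda g: g.number)
```

On a real machine the booted path could presumably be obtained by resolving /run/booted-system, with the actual deletion left to nix-env --delete-generations against the system profile; that wiring is an assumption and is not shown here.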
To handle the locked down cases, there could be some kind of flag which disables certain things unless the most up-to-date revision is being used. I think that in most cases, "can't boot" is far worse than "no write access to prod".
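To make the flag idea a bit more concrete, here is a minimal Python sketch of such a policy. Everything in it (the Capability names, the REQUIRES_LATEST set, the integer revision numbers) is an assumption for illustration, not an existing mechanism: an outdated revision still boots and can read, it just loses the sensitive capability.

```python
from enum import Enum
from typing import Set


class Capability(Enum):
    """Hypothetical capabilities a machine may be granted."""
    BOOT = "boot"
    READ_PROD = "read_prod"
    WRITE_PROD = "write_prod"


# Assumed policy: only these capabilities require the newest revision.
REQUIRES_LATEST = frozenset({Capability.WRITE_PROD})


def allowed_capabilities(running_revision: int, latest_revision: int) -> Set[Capability]:
    """Return what a machine may do given the revision it is running.

    An outdated machine keeps booting; it only loses the capabilities
    listed in REQUIRES_LATEST.
    """
    allowed = set(Capability)
    if running_revision < latest_revision:
        allowed -= REQUIRES_LATEST
    return allowed
```

With this policy, a machine rolled back from revision 42 to 41 would keep BOOT and READ_PROD but lose WRITE_PROD until it is updated again.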
But I don't think that most people actually need to be locked down to such a degree. Unless you're ok with being occasionally down without recourse, you have to eventually trust the end user a little bit. I don't know what the contents of the CrowdStrike update were, but they can't have been that important relative to the pain caused by their side effect.
Yeah, I also think the solution to this kind of thing lies in being able to verify pre-deployment. Sure, state can still cause issues, but if it is state, at least it won't bring down all your customers simultaneously.
Without NixOS there is simply no way to test with sufficient integration to catch these things. I can just imagine how nightmarish trying to test against a proprietary OS with practically random updates is compared to what we have. Even if you get special pre-warning from Microsoft because you're a big player, you have no control over what your customers ultimately run.
You'd basically need to do fuzzing to assert that. Not impossible, but quite costly.
It makes me want a filesystem that knows which file was created by which package so that we could mount an overlay on /var which is missing the files created by whichever packages are currently under suspicion.
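Purely as a thought experiment, if such a filesystem exposed a path-to-package ownership map, picking what the overlay should hide would be simple. The sketch below assumes that map is available as a plain dict; the hidden_paths helper and the example paths and package names are all hypothetical.

```python
from typing import Dict, List, Set


def hidden_paths(owner_by_path: Dict[str, str], suspects: Set[str]) -> List[str]:
    """Return the paths an overlay should hide.

    owner_by_path maps each file under /var to the package that created it
    (assumed to be provided by the hypothetical filesystem); suspects is
    the set of packages currently under suspicion.
    """
    return sorted(path for path, pkg in owner_by_path.items() if pkg in suspects)


# Example with made-up paths and package names.
ownership = {
    "/var/lib/edr-agent/rules.db": "edr-agent",
    "/var/lib/edr-agent/cache/state": "edr-agent",
    "/var/lib/postgresql/data": "postgresql",
}
print(hidden_paths(ownership, {"edr-agent"}))
```

The hard part is of course the ownership tracking itself, which this sketch simply takes as given.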