If you had a huge team of engineers and a large budget, what would you change in Nix or NixOS, and why?
Let me start. I would make the nix commands more uniform; their man pages rarely help me and are very inconsistent. For example, I can’t figure out which commands accept --substituters. And the new “nix command” doesn’t help much: it provides similar features, but with different behavior and different man pages.
The second thing is not being able to upgrade a NixOS system when a package no longer builds (claws-mail recently, or zim on a regular basis). Of course, builds failing from time to time is normal, but we could have a simple mechanism to say it’s OK to fall back to an older version of the package, or something like that.
There is a presumably-working version installed already, it’s just a matter of not removing that one (and its dependencies). There are obviously going to be some more complex cases, but at least for most leaf-ish “application” packages, the app would keep working even if it was actually a dependency upgrade that broke it.
There are some difficulties:
Declarative reproducibility: the resulting system is not the one you’d get if you rebuilt from scratch with the same config and channel revision, and there’s no way (from that channel revision) to build the old version of the package you “kept”. This seems an obvious compromise worth making, and arguably an unavoidable consequence of the goal, but we should still look for a way to avoid it.
Specification: We need a way to specify which package and version is to be kept behind; basically, a way to link the existing store paths into the environment so they can be found via $PATH and won’t get gc’d.
Override: We need, at a minimum, a way to prevent the failure from breaking the new build. Ideally, for the more complex cases, we would also be able to use the existing version to satisfy dependencies of other packages (beyond just the system derivation).
It seems to me that the first two, combined, contain the clues to a solution for the simple case: we’re giving up declarative configuration for this package, so we shift to the imperative mode and basically symlink the store path into a suitable profile, just as nix-env does. Holding on to old packages like this is basically imperative state manipulation anyway. Then the third (override) is just a matter of removing that package from the declarative configuration, as long as it’s just a leaf.
For more complex cases, flakes have the necessary mechanisms for specifying and pinning particular versions (but mostly at the repo level, so the nixpkgs monorepo still presents some challenges). That’s the way forward; the workaround above is viable in the meantime.
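A rough sketch of the fallback described above, written in Python purely for illustration. None of this is an existing Nix or NixOS API: `resolve_leaf_packages`, `try_build`, and the `installed`/`allow_fallback` inputs are hypothetical stand-ins for a build attempt, the store paths in the current generation, and the user’s explicit opt-in.

```python
from typing import Callable, Dict, Optional, Set


def resolve_leaf_packages(
    wanted: Dict[str, str],
    installed: Dict[str, str],
    try_build: Callable[[str], Optional[str]],
    allow_fallback: Set[str],
) -> Dict[str, str]:
    """Map package name -> store path for the new generation.

    wanted:         package name -> derivation to build (hypothetical identifiers)
    installed:      package name -> store path in the current generation
    try_build:      returns the output path, or None if the build failed
    allow_fallback: leaf packages the user explicitly allows to be kept back
    """
    result: Dict[str, str] = {}
    for name, drv in wanted.items():
        out = try_build(drv)
        if out is not None:
            result[name] = out
        elif name in allow_fallback and name in installed:
            # Keep the old, presumably working path. Referencing it from the
            # new generation is also what protects it from garbage collection.
            result[name] = installed[name]
        else:
            raise RuntimeError(f"{name} failed to build and has no allowed fallback")
    return result
```

Restricting the fallback to an explicit opt-in set keeps the compromise visible, which seems in line with the reproducibility concern above.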
Wow. My answer turned into a huge wall of text. Guess I have a lot to say on this issue…
Well anyway, here it is:
Replace the nix language with something else, still lazy & purely functional, but with strong typing and clean abstractions. Some specific differences I’d like to see:
Strong separation of the eval-time filesystem path space from the runtime filesystem path space. Ideally with a guarantee that eval-time paths cannot escape into build outputs and runtime paths cannot be accessed at eval time. Nix has some nice defaults to make this separation easier, most notably the way the ${foo} notation handles paths vs strings, but nothing in the language forces you to use them correctly. See the strange behavior of home-manager’s home.file.<name>.source option.
A built-in override system that can be applied broadly with little effort. “Please make this thing overrideable” should be a trivial request to complete.
Full evaluation caching for flakes would be nice (not just caching outputs, but intermediate shared values as well).
Devshells need a ground-up restructuring:
Specify a data structure describing a “shell environment” in a way that’s as independent of one’s choice of shell as possible, and, where it can’t be independent, that can support multiple shells.
Specify a representation of that data structure in the nix store, which nix commands will understand.
Create helper functions to create these environments easily.
If possible, re-implement the build-alike environments, that is, the originally conceived usage of nix-shell, in terms of these new shell environment specifications.
Build the infrastructure for sharing build artifacts between devshell builds and nix builds. Recursive nix should make it possible to do this even with upstream build systems, if we put in the work to make wrappers for compilers, etc.
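To make the “shell environment” data structure idea a bit more concrete, here is a minimal Python sketch. The `ShellEnv` type and the two renderers are hypothetical; the point is only that one shell-agnostic description can be rendered into several shells’ syntax.

```python
import shlex
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ShellEnv:
    """Hypothetical shell-independent description of a dev environment."""
    env: Dict[str, str] = field(default_factory=dict)
    path_prepend: List[str] = field(default_factory=list)


def render_bash(e: ShellEnv) -> str:
    lines = [f"export {k}={shlex.quote(v)}" for k, v in sorted(e.env.items())]
    if e.path_prepend:
        joined = ":".join(e.path_prepend)
        lines.append(f'export PATH={shlex.quote(joined)}:"$PATH"')
    return "\n".join(lines)


def fish_quote(s: str) -> str:
    # Inside fish single quotes, only \\ and \' are escape sequences.
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_fish(e: ShellEnv) -> str:
    lines = [f"set -gx {k} {fish_quote(v)}" for k, v in sorted(e.env.items())]
    if e.path_prepend:
        # In fish, PATH is a list, so new entries are simply placed first.
        paths = " ".join(fish_quote(p) for p in e.path_prepend)
        lines.append(f"set -gx PATH {paths} $PATH")
    return "\n".join(lines)


# Where a shell can't be supported generically, it gets its own renderer.
RENDERERS: Dict[str, Callable[[ShellEnv], str]] = {
    "bash": render_bash,
    "fish": render_fish,
}
```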
Think about flakes a bit differently, hopefully cleaning up the rough edges of the abstraction:
Flakes should allow arguments, but require default values for them.
When evaluating a flake, the copy in the nix store should be modified to reflect the specified arguments, input overrides, etc., so that if you keep a reference to self and evaluate it again, you get exactly the same outputs, guaranteed. See nixpkgs#6894 and nixpkgs#6895.
In short, a well-formed flake should always specify a fully reproducible set of outputs, but it should also be thought of as specifying automatic processes to modify itself in various intended ways, such as updating inputs, overriding inputs, and altering arguments.
Allow computation in flake inputs, and inputs depending on other inputs. Mostly the nixpkgs lib in practice, I expect. Perhaps also cache some computable flake information, keyed to the hash of the whole flake not including the cache itself. This could allow an extra user check before running potentially long computations because the cache is invalid.
Don’t auto-lock flakes. Instead, error out when attempting to evaluate a flake that’s not fully locked (this should be checked after any specified input/argument overrides are applied).
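A small sketch of the “don’t auto-lock, error out instead” policy, in Python. The flat `lock` mapping (input name to pinned hash) is a deliberate simplification, not the real flake.lock format, and `resolve_pins` is a hypothetical name. Overrides are applied first, then the result is checked, as suggested above.

```python
from typing import Dict, Iterable, Optional


def resolve_pins(
    inputs: Iterable[str],
    lock: Dict[str, str],
    overrides: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, str]:
    """Return input name -> pinned hash, or fail; never write a new lock.

    lock:      simplified lock data, input name -> pinned content hash
    overrides: input name -> pinned hash, or None for an unpinned override
    """
    overrides = overrides or {}
    pins = {name: overrides[name] if name in overrides else lock.get(name)
            for name in inputs}
    unlocked = sorted(name for name, pin in pins.items() if pin is None)
    if unlocked:
        raise ValueError("flake is not fully locked: " + ", ".join(unlocked))
    return pins
```

Calling it with an input missing from `lock`, or with an override of None, raises instead of silently writing a fresh lock.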
Make special transformations of flakes, such as git-crypt unlocking them, part of the flake URI scheme. As this produces a different flake, it should have a different URI.
Fixed-output derivations are too easily abused, and often lead newcomers into a trap of effective non-reproducibility through fragile hashes. Remove them in favor of a combination of:
Builtin impure fetchers; “impure” meaning they do not specify a hash and are always rebuilt rather than using a cached version.
Impure derivations which are still fully sandboxed, with no network connectivity, but can rely on impure inputs and are themselves “impure” in the same sense as above. Note that this is not quite the same thing as the “impure derivations” that have been implemented as an experimental feature in nix.
A “purify” operation, which takes a list of impure derivations and a hash, and tries the derivations in order until one succeeds and gives the correct hash, renaming it to a content-based name. Nix might also be configured to attempt other methods of finding something with the specified hash beyond what’s specified in the nix code.
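The “purify” operation is essentially “try each candidate in order, keep the first one whose output hash matches”. A minimal Python sketch, assuming candidates are plain callables returning bytes; the naming scheme is simplified (real store paths use a truncated base-32 hash, not hex) and `purify` is a hypothetical name.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def purify(
    candidates: Iterable[Callable[[], bytes]],
    expected_sha256: str,
    name: str,
) -> Tuple[str, bytes]:
    """Try impure candidates in order; return the first whose output matches."""
    for build in candidates:
        try:
            data = build()
        except Exception:
            continue  # e.g. a dead mirror: just move on to the next candidate
        digest = hashlib.sha256(data).hexdigest()
        if digest == expected_sha256:
            # Content-based name: depends only on the output, not on which
            # candidate produced it (simplified; not the real store path format).
            return f"{digest[:32]}-{name}", data
    raise ValueError(f"no candidate for {name} produced sha256 {expected_sha256}")
```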
Explore well-abstracted methods of managing runtime state, and how they might integrate with nix.
Nix profiles and, more generally, gcroots, should ideally fit into this state-management system cleanly, alongside general system/service/program state.
Any given bit of state should have a stable filesystem path, regardless of actual storage location on the underlying system. This minimizes the interface that must be stable across distinct generations of the static config and distinct low-level hardware setups.
Backups and restoration of state, or even synchronization of state, where possible, could be understood by this system. Restoring from backups could be as straightforward as just installing from the same flake again, and having the system automatically realize at runtime that it’s missing some needed state, and go fetch it from the backup.
I really like your point of view. I don’t know exactly how much work it would take to implement something that could do 1 and 2, but it would be great!
I wonder if packages installed with nix profile could “soft fail” upon upgrade so you keep the older version of it and it still works.
Plenty of things, but one not listed here yet is a new mkDerivation:
topologically sorted phases: not being able to set the order of phases blocks progress;
structured attributes: ad hoc variables used by hooks become a mess;
clean separation between inputs such as phase arguments and what is passed to derivation as attributes: package expression author should not need to know derivations are used;
multi-derivation packages: less building time, especially with CA.
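For the first point, topological sorting of phases could look roughly like this in Python (3.9+, for graphlib). The phase names mirror part of stdenv’s default order, but the idea of each phase declaring which phases it must run after is a hypothetical design, not how mkDerivation works today.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical: each phase lists the phases it must run after.
# The names follow (part of) stdenv's default phase order.
DEFAULT_PHASES: Dict[str, Set[str]] = {
    "unpackPhase": set(),
    "patchPhase": {"unpackPhase"},
    "configurePhase": {"patchPhase"},
    "buildPhase": {"configurePhase"},
    "checkPhase": {"buildPhase"},
    "installPhase": {"checkPhase"},
    "fixupPhase": {"installPhase"},
}


def phase_order(extra: Dict[str, Set[str]]) -> List[str]:
    """Merge extra ordering constraints (e.g. from hooks) and sort."""
    graph = {phase: set(after) for phase, after in DEFAULT_PHASES.items()}
    for phase, after in extra.items():
        graph.setdefault(phase, set()).update(after)
    return list(TopologicalSorter(graph).static_order())
```

A cycle in the declared ordering raises graphlib.CycleError, which would at least be a clear error instead of a silently wrong phase order.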
If I had a huge team of engineers… I would immediately sink into coordination issues, as always happens.
But if they could be overcome…
A clean and optionally user-visible eval/instantiate/realise separation. Ideally with separately pluggable stages…
Cleaner and pluggable build management, e.g. updating the build strategy on the fly, not interrupting build X needed by A when A is cancelled if a still-active B is also waiting on X, and the ability to keep a database of build costs and use it in planning.
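The “don’t interrupt X while B still needs it” point is basically reference counting of waiters per build. A hypothetical Python sketch, with a trivial use of a build-cost database (start the most expensive known builds first); none of these names correspond to Nix internals.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class BuildScheduler:
    """Hypothetical scheduler tracking which requests wait on each build."""

    def __init__(self) -> None:
        self.waiters: DefaultDict[str, Set[str]] = defaultdict(set)
        self.running: Set[str] = set()

    def request(self, target: str, builds: Set[str]) -> None:
        for b in builds:
            self.waiters[b].add(target)
            self.running.add(b)

    def cancel(self, target: str) -> Set[str]:
        """Cancel one request; return only the builds nobody needs anymore."""
        stopped = set()
        for b in list(self.running):
            self.waiters[b].discard(target)
            if not self.waiters[b]:
                self.running.discard(b)
                stopped.add(b)
        return stopped

    def plan(self, costs: Dict[str, float]) -> List[str]:
        # Start the most expensive known builds first (a simple
        # longest-first heuristic); unknown costs count as 0.
        return sorted(self.running, key=lambda b: (-costs.get(b, 0.0), b))
```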
Treat Nixpkgs as what it morally is, a database, and find a way to manage it as such. The eval separation might help. It needs to integrate with universal overrides, of course.
NixOS module system split. Part 1 is a type system; make it efficient, too. Most of the module system work gets pushed to a universal override system, Nixpkgs-overlay-style. Part 2 of the module system might then be used to coordinate whatever cannot be fit into the override style even with a hammer. I guess part 2 would be pluggable.
Extended platform support? Including pluggable kernel support for NixOS (well, that also requires pluggable init systems; feature disparity is no problem as long as matching features are handled uniformly).
On all of the commands that eventually realize a derivation. But only if you are a trusted user, or if the substituter in question is already in the trusted-substituters list and has a public key available.
Or, well, is it actually --option substituters? I never remember which options can be overridden using arbitrary flags and which require --option…
Not (easily) possible in nix. Though indeed it should be easier to find out what exactly pulls in a broken package.
Even if we manage to build the broken config and use nix why-depends, we might end up with meaningless information.
I wish there were some tooling that allowed querying the configuration itself for references to build inputs by store path.
I keep running into this. Flakes are almost a generic replacement for things like github actions that can easily and reproducibly manage silly project chores like “update my Cargo.lock”.
I’ve been hacking it through putting scripts in apps, but that’s not what those are intended for. Some integration with nix flake update would also be great. Maybe this is where the “type” attribute could actually be used?
A smarter remote builder protocol that distinguishes big-parallel builds from normal ones and takes into account the CPU usage of the different remote builders.
Also I want CA to really take off and multi-derivation packages to become a thing.
I would make sure that self can properly depend on git submodules (despite my aversion against them)
I’d make the flake schema more modular, so that we can provide our own “labels” or “checks” for custom outputs and no longer get annoyed by “darwinModules/homeManagerModules is not a known attribute”.
While I was a big fan of it for a long time, the more I use alternatives at work (i.e. std), the more I am starting to dislike the module system. It’s just too complex, it is the main source of confusing error messages that are no help, and I think that the suboptimal merge strategy of the Nix expression language would be solved better by using a different, but similar language; preferably one with static types, and type inference.
I think I am also slowly coming to the same conclusion as I learn more about the module system.
On a similar note, I wish the module system separated “Types” from “Merging Strategy”.
Coming from an imperative background, the idea that an option defined in multiple areas of a code base could result in value that is a merging of all those definitions was quite surprising and fairly difficult to grasp. I feel it would be beneficial to those coming in from similar backgrounds if this was spelled out more clearly.
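To illustrate the “types” vs “merging strategy” split: in today’s module system each option type carries its own merge function (e.g. lists defined in several modules get concatenated). A hypothetical Python sketch where the two are independent, so the same type can be paired with different strategies:

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Option:
    check: Callable[[Any], bool]       # the type: is a single definition valid?
    merge: Callable[[List[Any]], Any]  # the strategy: how definitions combine


def is_str_list(v: Any) -> bool:
    return isinstance(v, list) and all(isinstance(x, str) for x in v)


def concat(defs: List[Any]) -> Any:
    return [x for d in defs for x in d]


def unique(defs: List[Any]) -> Any:
    if len(defs) != 1:
        raise ValueError(f"expected exactly one definition, got {len(defs)}")
    return defs[0]


def evaluate(opt: Option, defs: List[Any]) -> Any:
    for d in defs:
        if not opt.check(d):
            raise TypeError(f"invalid definition: {d!r}")
    return opt.merge(defs)


# Same type, two different strategies:
packages = Option(check=is_str_list, merge=concat)
pinned_list = Option(check=is_str_list, merge=unique)
```

With `pinned_list`, two modules both defining the option raise immediately, which is arguably easier to grasp for someone coming from an imperative background.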
Well I only mentioned it because we are explicitly talking about a wish list here, but it seems like it would be very difficult to decouple NixOS from the module system at this point, at least without losing valuable information.
Thinking a bit further on this, though, it would provide a clean boundary between config and packaging to just use an entirely different language for configuration altogether. Maybe Nix doesn’t need to be the be-all and end-all configuration language; maybe we could just use a better language for that (Nickel?) and focus on using Nix for packages?
For experienced users, though, the fact that the config can seamlessly include bits that are closer to a package definition (why shouldn’t a package list contain an override…) is quite a powerful escape hatch.
If so many people are talking about types, I want a completely different thing about types: reduce the use of booleans and add a type actually suited to what we use booleans for. A binary yes/no does fit in some cases (we either run the tests or we don’t), but more often our reality is «known-good / unknown / known-bad», or sometimes even multiple gradations of «known», and hammering this into a binary yes/no leads to pretty absurd-looking discussions… The whole history of changes around enableParallelBuilding is the story of how painful it is to use booleans for handling «reliable-yes / unknown / flaky-no / reproducible-no».
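A sketch of what replacing a boolean like enableParallelBuilding with a richer type could look like, in Python. `ParallelBuilding` and `use_parallel` are hypothetical names; the point is that the package records what is known, and the yes/no policy is decided in one place.

```python
from enum import Enum


class ParallelBuilding(Enum):
    """Hypothetical replacement for a boolean like enableParallelBuilding."""
    RELIABLE_YES = "reliable-yes"
    UNKNOWN = "unknown"
    FLAKY_NO = "flaky-no"
    REPRODUCIBLE_NO = "reproducible-no"


def use_parallel(status: ParallelBuilding, optimistic: bool) -> bool:
    """Turn what is known into a yes/no decision, in one central place."""
    if status is ParallelBuilding.RELIABLE_YES:
        return True
    if status is ParallelBuilding.UNKNOWN:
        return optimistic  # a global policy knob, not a per-package edit
    return False  # both kinds of "no" stay serial, but the reason is recorded


def worth_retesting(status: ParallelBuilding) -> bool:
    # Unknown and flaky packages are the ones where new data would help.
    return status in (ParallelBuilding.UNKNOWN, ParallelBuilding.FLAKY_NO)
```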
I’ve heard on at least one occasion that it would be difficult for a company to adopt Nix/NixOS without having a vendor they can go to for support.
If I had a team of engineers and a lot of budget, I think it would be interesting to create a vendor that focuses on Nix/Nixpkgs/NixOS. First order of business might be to pick a random NixOS release and turn it into an LTS, where it would be supported for 2 years or so (instead of the usual 6 months).
It would be interesting to see if creating a vendor like this would spur more interest in using Nix/NixOS professionally.
Personally, I would try to improve the developer experience. NixOS fits poorly with the many “language package managers” (e.g. npm) that try to download pre-built binaries, and it would be great to automatically patch these files, provide a loader in NixOS, or provide a way to enter a kind of FHS environment without creating a VM (VMs, or even steam-run, are poorly integrated with the host system). Just trying to program with Electron is already a challenge! EDIT: I have since discovered nix-ld, whose goal is exactly to provide a system-wide loader… Really cool, it should be better documented! That said, it won’t help with packaging things in nixpkgs, as it only works on NixOS.
Another improvement would concern caching when creating/debugging a derivation: I spent so much time recompiling a big package 10 times because of a few typos in the install phase… So adding a simple cache between the build phase and the install phase would be really cool. I’m aware of nix-shell, but in any case I’ll need to compile at least twice to test the final derivation, which is a waste of time. Ideally, nix-build could even provide a debug option where, when we build twice, it tries to reuse the files already compiled in the previous attempt, bringing the benefits of nix-shell directly into nix-build and removing the need to call the various phases manually in a nix-shell.
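A sketch of the phase-level cache idea in Python: each phase’s cache key chains the previous key with the phase’s name and script, so fixing a typo in installPhase only reruns installPhase. This assumes phases are deterministic and that a phase’s result can be snapshotted as a single value (here just `bytes`); `run_phases` and `execute` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def run_phases(
    phases: List[Tuple[str, str]],
    execute: Callable[[str, str, bytes], bytes],
    cache: Dict[str, bytes],
    src_hash: str,
) -> bytes:
    """Run (name, script) phases in order, reusing cached phase results.

    Each key chains the previous key with this phase's name and script, so
    editing one phase invalidates only that phase and the ones after it.
    """
    key = src_hash
    state = b""  # stand-in for a snapshot of the build directory
    for name, script in phases:
        key = hashlib.sha256(f"{key}\0{name}\0{script}".encode()).hexdigest()
        if key in cache:
            state = cache[key]
        else:
            state = execute(name, script, state)
            cache[key] = state
    return state
```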
This is why the most important thing we do is not adding new features, but emphasizing layering.
The more modular the Nix ecosystem is, the more isolated groups of people will be able to focus on individual areas of improvement, and yet we can still try out their work by mixing and matching.