Using Nix in distro development?

We are developing a Linux distribution. It’s based on Linux From Scratch and uses dpkg as a package manager. But I would really like to use Nix instead!

The whole team is okay with doing this switch, but we have several technical concerns…

TL;DR

How to avoid supply chain attacks and software rot when using Nix in distribution development?

Infrastructure

The NixOS/nix

:white_check_mark: Can be reviewed and vendored. No problem here. The codebase is more or less trusted to be safe from supply chain attacks or backdoors.

The binary cache

:white_check_mark: Can be self-hosted.

The nixpkgs/lib

:warning: Should be used “as is”, but unexpected changes can break our own proprietary packages.

The nixpkgs

:no_entry: Can not be used “as is”:

  • Many different maintainers that can’t be trusted.
  • Different release cycle would eventually lead to us being hesitant to rebase the repository.
  • Has a lot of packages that we don’t need.

The hydra

:warning: Can be self-hosted, but building Hydra itself requires packages from nixpkgs. Kind of a chicken-and-egg problem.

The offline requirement

Everything has to be built offline. Even though Nix uses hashes to verify everything it downloads from the internet, Nix itself cannot be fully trusted to uphold this guarantee.

Moreover, hashes won’t save you from upstream completely deleting or restricting access to the source code.

Though, this is not a huge issue because it can be solved by using our own mirrors. We have the resources for that. Moreover, Nix seems to be very friendly towards fetching things from the filesystem instead of the internet.
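
For illustration, pulling a source tarball from a local mirror with a pinned hash could look roughly like the sketch below. It uses only the built-in `builtins.path`; the mirror location, file name and hash are placeholders I made up, not real values.

```nix
# Sketch only: path, name and hash are hypothetical placeholders.
let
  src = builtins.path {
    # Hypothetical location of the tarball on a self-hosted mirror.
    path = /srv/mirror/sources/example-1.0.tar.gz;
    name = "example-1.0.tar.gz";
    # Flat hash of the file contents instead of a hash of its NAR serialization.
    recursive = false;
    # Placeholder; evaluation fails if the file does not match the pinned hash.
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000";
  };
in
  src
```

Nothing here touches the network, and the pinned hash is checked by the evaluator, so whether that check is trustworthy is exactly the question raised above.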

Summary

Basically, the biggest problem is that Nix heavily relies on nixpkgs. But there are several problems with using it directly. And reimplementing everything, while technically possible, would turn out to be really difficult.

Even if you cloned NixOS/nixpkgs, removed everything you don’t need and reviewed everything to ensure safety, you would now have a massive repository that is completely out of sync with upstream. Four or five years down the line you would be stuck with old and unmaintained packages.

Right now Nix is very good for setting up an occasional devShell, but quite often you see CI yelling at you for using features from newer versions. Having different dev and production environments is very unpleasant.

Do you have any suggestions? :sweat:

  • nixpkgs/lib — most probably you can snapshot then port updates that solve specifically the things that become a pain point. A large part of the code is very stable, maybe you should only rely on low-churn parts.

  • Nixpkgs: what are you doing with packages currently, and how many packages do you intend to support officially? Nix will work just fine if you replace Nixpkgs with your own stuff. You need to either replace or fork stdenv, and you may want to take inspiration from its older and simpler versions (depending on your needs).

    • Right now stdenv is complex, but it also supports many architectures, and cross-builds, and stuff… If you just need Linux on a couple of architectures, you can start with some old stdenv version and trim quite a bit. Most probably a lot of leaf packages would be then usable with a small and straightforward adapter.
    • You might look at triton/triton for an example of an (out-of-date) fork that trims things down. It seems to have been maintained for a few years by a really small team, so if you have limited scope it is an option.
    • I am not too optimistic about Auxolotl, but they do have a different setup for the core, you might want to look and compare.
    • Note that a user will be able to clone Nixpkgs on their own and install something from it in a separate devShell. Although if you don’t use Nix, installing Nix then doing this is still an option, obviously.
  • Hydra — how high are you aiming in terms of size? You might be able to set up some builders and run nix-build in a non-Nix-specific CI/CD, still with per-derivation parallelism and caching. It might even have lower overhead than Hydra, if you maintain a handful of entry points which never have evaluation failures (build failures are allowed, you can still ask for as much as can be built).

  • Offline requirement: requires some setup, and you do need mirrors if you want reliability in the face of flaky upstreams. It is pretty easy overall in my opinion (it depends on the definition of the goal, though, but you can request and realise all the transitive build dependencies with a rather simple command line). But yes, one needs to pay attention, especially when using binary caches.

Overall, for the package set I am afraid that you have two realistic options — cut down the scope to be able to review everything yourself (possibly looking at Nixpkgs solutions for the harder minority of issues), or partially use something with huge coverage and huge maintainer base (Nixpkgs, or GuixSD if you are willing to look at Guix and compare)


Let’s hope lib doesn’t depend on anything besides builtins so we can extract only that one folder and leave out the rest…

Although I am not sure how to do that from the technical point of view, because using git submodule of the whole nixpkgs repository for a single folder sounds nuts.

Why oh why is it not in its own repository…

Mostly trying to keep everything from falling apart… You see, besides x86 we also target arm and RISC-V, but it’s not done through a proper cross compilation.

I won’t get into the details. Basically, packaging the core system’s components (like compilers) requires so much patching that everyone avoids updating packages for as long as technically possible. It’s a huge mess.

There are also some patches that exist because of government regulations (like CVE fixes and fixes prompted by static analysis). Not all of that can be upstreamed that easily.

Well, we aren’t making a distro in a traditional sense. The OS closer resembles macOS, iOS or Android since the application developer is not expected to use anything outside of what we provide as an OS API (that means the developer shouldn’t rely on our software repositories, package manager or FHS).

The Nix looks appealing for its…

  1. Ease of cross compilation.
  2. Ability to hide complexity behind custom functions.
  3. Ability to make dependency graphs explicit.
  4. Active community.

In that order.

Yeah. Reading the Nix manual finally made me realize the separation between the package manager (nix) and the package repository (nixpkgs).

I assume after you fork/clone/copy/rewrite stdenv and lib you can bootstrap everything else from the nixpkgs? Or is there some other component that is important for proper (and simple) nixpkgs packaging?

I assume it is high enough to make using Hydra justifiable. We have around 80 developers. All of them work in teams on different parts of the OS. There is a lot of proprietary code that’s written in-house (like a compositor, trusted boot, application validation and sandboxing, system services).

Right now we do have a centralized CI that handles building all the packages. I think Hydra would be an obvious choice. I also assume Hydra handles building .iso images, which would be a nice alternative to our existing workflow.

That doesn’t sound so bad. While using nixpkgs blindly, without any review whatsoever, does sound scary (from the paranoid point of view of OS development, where you want to vendor everything), it’s all just Nix code and you can always see what it’s doing.

I just don’t understand how to do that from the technical point of view. Forking the whole nixpkgs would introduce a lot of code that we don’t need. I feel like almost 70% of nixpkgs is irrelevant to us. Constantly rebasing all of that would be a nightmare. And rebasing automatically kind of defeats the purpose of vendoring.

Then there is the issue of keeping our fork in sync. It won’t be that big of a deal if they eventually diverge, but it somewhat defeats the whole purpose.

What do you mean by that?

A not-so-random bunch of normal-ish packages are actually stdenv dependencies, of course. However, they are not inside the stdenv directory itself.

That actually sounds «low» in terms of Hydra paying off. You are going to ship what, ten images, three images per platform? And maybe a bunch of tests that can be made deps of the «all-tests-pass» quasi-test. Sounds like hacking together nix-build in whatever CI setup you have could be close enough. Hydra treats all the tens of thousands of packages as independent entry points, which makes more sense in our sprawling situation that is not spearheaded by a single deliverable.
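
To make the «handful of entry points» idea concrete, here is a minimal sketch that uses no Nixpkgs at all. The file name `release.nix`, the attribute names and the build scripts are all hypothetical, and it assumes `/bin/sh` is available inside the build sandbox; a real setup would sit on top of your own stdenv rather than raw `derivation` calls.

```nix
# release.nix (hypothetical file name): a Nixpkgs-free set of entry points.
let
  # Impure built-in; fine for plain nix-build, not available in pure evaluation.
  system = builtins.currentSystem;

  # Tiny helper around the built-in `derivation`; a real stdenv does far more.
  # Assumes /bin/sh exists inside the build sandbox.
  mkScript = name: script: derivation {
    inherit name system;
    builder = "/bin/sh";
    args = [ "-c" script ];
  };

  # Placeholder standing in for a real image build.
  image = mkScript "os-image" ''echo "placeholder image" > "$out"'';

  # Interpolating ${image} makes the image a build-time dependency of the tests.
  tests = mkScript "integration-tests" ''echo "checked ${image}" > "$out"'';
in
{
  inherit image tests;

  # Aggregate quasi-test: it can only be built once everything it references is built.
  all-tests-pass = mkScript "all-tests-pass" ''echo "${image} ${tests}" > "$out"'';
}
```

Any CI job could then run something like `nix-build release.nix -A all-tests-pass --keep-going`, and caching falls out of the store: derivations that are already built are not rebuilt.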

I really hope you need way less than 30% of Nixpkgs.

That’s probably your actual best bet: fork, review, diverge, look up solutions to hard problems in Nixpkgs.

We do have package update scripts, so if you have integration tests for things you care about, you might just about pull off maintaining a fully independent narrow-trimmed fork.

If you want to follow the upstream closer, welcome to reviewing periodic treewide cleanups touching the packages you care about…

If you want to give back something with no disclosure around your core competence, you might have a flow where you semi-auto-port the changes to whatever counts as core for your needs to a public repo, then merge them into another branch after reviewing. I think a public narrow fork would be a cool case study «that’s how you use Nixpkgs if you are forced to invest into better review than sustainable at full Nixpkgs scale».

Nixpkgs has really a lot of packages. That means collaboration of really a lot of people, and sometimes negotiating the trade-offs so that no area is too painful to maintain. And many areas are covered just by one volunteer who happens to use this stuff.

Having uniformly high review standards across all this is a bit absurd. Having higher review standards for some stuff makes sense, but the details are not always legible. Also, some of the trust involved is based on people having seen each other work on the same project for a decade or so, and you don’t a priori have trust in the same set of individuals.

So yeah, if you are worried about supply chain attacks via build scripts, you probably have to define what you care about and rereview it, privately or in a public repo.

(Of course you’ll also need some clearly-non-public extra package definitions for the in-house non-public stuff; but that should be easy as you know how the package expects to be built)


See nix-community/nixpkgs.lib on GitHub (“nixpkgs lib for cheap instantiation”), which is already an extraction. I’m pretty sure there’s no dependency on nixpkgs in lib, and have even heard suggestions of moving it out of nixpkgs completely.
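
As a rough sketch of what using such an extraction could look like: the vendored path below is hypothetical and assumes the copy keeps the `lib` directory with its `default.nix`, as the in-tree nixpkgs `lib` does.

```nix
let
  # Hypothetical location of a vendored, reviewed copy of the lib directory.
  lib = import ./vendor/nixpkgs-lib/lib;
in
  # Any pure lib function works without the rest of Nixpkgs being present.
  lib.concatStringsSep ", " [ "x86_64" "aarch64" "riscv64" ]
```

Evaluating this with `nix-instantiate --eval` needs only the vendored directory, so a snapshot can be pinned and reviewed on its own schedule.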


What I get from that is that you merely want an implementation of Nix (the build manager) that is independent of Nixpkgs, and would rather build up your own package hierarchy (to mimic what you currently do with LFS).

Have a look at auxolotl.org which provides a standalone lib and a minimal bootstrap package set called foundation (note that the project itself is still in pre-alpha…).
A next step would be to build Nix/Lix on that base to have a minimal set of trusted dependencies for your build tool, then create derivations for what you need (by adapting code from Nixpkgs and thereby reviewing it).
