Why do Nix packages have names?

I started reading the documentation at Nix Pills and am starting to respect the design philosophy. The idea seems to be:

Each package has dependencies. We will achieve reproducible builds and integrity by making all linking explicit to specific library versions. When you open a shell, you will have access to a specific set of programs that are explicitly linked to certain versions/builds of the target software.

If you are following best practices for builds, e.g. CIA’s “Development Tradecraft DOs and DON’Ts” (warning: SECRET//NOFORN), then you will not have build times, link times, or directories in the outputs. Is this the goal for the Nix project? Do we want builds to be byte-for-byte identical across compatible machines?

If yes then this is a most elegant design. The only remaining issue is the directory naming. It seems like the directories should be named as tar -c -f- ./build/ | shasum -a 256. The descriptive names for these packages and details about their versions etc. should be stored elsewhere, like in a SQLite database at /nix/installedPackages.db. Of course the person using Linux should not need to care about any of this and the tools should query the package database to set up $PATH or whatever other task is needed.
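To make the naming proposal concrete, here is a minimal Python sketch of a content-derived directory name. One caveat about the tar pipeline above: tar headers record modification times and ownership, so two byte-identical trees built at different times will generally archive to different bytes unless that metadata is normalised. The hypothetical helper `tree_hash` below therefore hashes only relative paths and file contents in sorted order. It is an illustration under that assumption, not anything Nix ships, and it ignores permissions, symlinks and empty directories.

```python
import hashlib
import os


def tree_hash(root: str) -> str:
    """Return a SHA-256 over relative file paths and file bytes only.

    Timestamps, owners and permissions are deliberately left out, so two
    trees with the same files and contents get the same digest.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as handle:
                data = handle.read()
            # Length-prefix each field so that different trees cannot
            # serialise to the same byte stream.
            for field in (rel.encode(), data):
                digest.update(len(field).to_bytes(8, "big"))
                digest.update(field)
    return digest.hexdigest()
```

Calling `tree_hash("./build")` on two machines gives the same name only if the build outputs themselves are byte-identical, which is exactly the reproducibility question asked above.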

Bit-for-bit reproducibility was not among the original aims. The main stumbling blocks are build systems of upstream packages, as they often try hard to inject some impurities (e.g. timestamps); it’s getting better thanks to others trying as well, but most packages still don’t reproduce AFAIK.

The naming scheme you write about is what we call “intensional store” around here; some call it content-addressability, too. The idea has been there all the time – you can find it in the PhD thesis already. So far it still seems that the non-reproducibility of most packages would cause more practical trouble in NixPkgs than the advantages of the transition, but hopefully one day that gets better and we will switch.

Hi,
a small personal opinion as a user (and an admin): I like having
names in /nix/store/* and have often found them useful. Sometimes I
have to look for something in a package, and the quickest and
simplest way is a simple

ls /nix/store/*pkgname*

It is also useful for seeing how many versions/instances of a package
I have. It may seem marginal, but it is a simple, nice form of control.
BTW, this is one of many reasons that in general we still have, use and
value filesystems over other forms of storage.

– Ingmar

The naming scheme you write about is what we call “intensional store” around here; some call it content-addressability, too. The idea has been there all the time – you can find it in the PhD thesis already. So far it still seems that the non-reproducibility of most packages would cause more practical trouble in NixPkgs than the advantages of the transition, but hopefully one day that gets better and we will switch.

Then again, many many packages have their own output path listed in many files inside the output (and inside the build) and we do not even consider this non-reproducibility right now. It is a complicated question.

I am not sure the inclusion of name into the path will ever be the top problem…

Thank you for the explanation. I’m glad to hear that other people are worrying about reproducible builds too.

My next step was to ask for content-addressable programs. Ethereum has this and it works nicely.

As for querying the database with ls /nix/store/*pkgname*, there should definitely be a tool for that:

nix ls *pkgname*

But this tool could also produce dependency analysis:

nix uses *pkgname*

Indeed, such a tool exists: take a look at nix path-info --help. With the -r flag, you can see the full transitive closure of runtime dependencies.

You may also be interested in nix log and nix show-derivation, which can show the full configure/compilation output and input specification for any given path.

My next step was to ask for content-addressable programs. Ethereum has this and it works nicely.

It is not really comparable: Ethereum is not trying to be a build target for large preexisting software packages.

As for querying the database with ls /nix/store/*pkgname*, there should definitely be a tool for that:

What is the benefit of not including the name into the path?

Then that tool needs to support

ls -lh /nix/store/*firefox*/bin/*

as well. With intelligent coloured output, please. I don’t think it’s worth reimplementing ls with all its goodies just to point it to a different database. And what do you tell those that prefer other tools to ls?

It sounds to me like you’re trying to create a database that is able to address content in another database.
Why not just keep everything in one database (the filesystem)?

Having an intensional store also means that it will be impossible to have programs that know their own path.
That would break the pkg-config mechanism that many configure scripts use to answer the question “I need library X. Which include and library paths do I have to use during the build?”

If you don’t enforce that, you will end up with a package that is built into /nix/store/abc123 and has a checksum of 123456. So you either rebuild with the new target directory /nix/store/123456 or replace all occurrences of abc123 with 123456. But then the checksum of /nix/store/123456 will be 987654. What do you do now? Run in a circle and keep changing the references until they don’t change anymore? If you manage to find such a point, it basically is a proof that the checksum mechanism is too weak and we should use a better one.
So this basically only works for simple things with known outputs - e.g. the download of a file.
But do we mix these things in the store? It’s probably not worth it.
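The circular renaming described above can be sketched in a few lines of Python. Everything here is hypothetical: `build` stands in for a package whose output embeds its own store path (as a pkg-config file would), and `short_hash` is a truncated SHA-256 rather than whatever a real store would use.

```python
import hashlib


def short_hash(data: bytes) -> str:
    """Truncated SHA-256, standing in for a store checksum."""
    return hashlib.sha256(data).hexdigest()[:32]


def build(store_name: str) -> bytes:
    """Hypothetical build output that mentions its own install prefix."""
    return f"prefix=/nix/store/{store_name}\nlibdir=${{prefix}}/lib\n".encode()


def chase_self_reference(start: str, rounds: int = 5) -> list:
    """Rename the output to its own checksum, rebuild, and repeat."""
    names = [start]
    for _ in range(rounds):
        # Each rename changes the embedded path, hence the next checksum.
        names.append(short_hash(build(names[-1])))
    return names
```

Every name returned by `chase_self_reference("abc123")` differs from the one before it, because each rename changes the bytes being hashed. That is the loop the post describes.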

If it’s just about saving space… Then nix-store --optimise pretty much does exactly that by hard linking files that contain the same content and that’s probably more effective on a file level than on a derivation (“package”) level.

# nix-store --gc; nix-store --optimise
...
note: currently hard linking saves 6021.66 MiB
0 store paths deleted, 0.00 MiB freed
0.00 MiB freed by hard-linking 0 files
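As a rough illustration of why file-level deduplication finds more than package-level deduplication, this Python sketch groups files by content hash and estimates what hard linking could save. The helper names are hypothetical and this is not how nix-store implements it. It only reads files, never links them, and does not notice files that are already hard linked to each other.

```python
import hashlib
import os
from collections import defaultdict


def group_by_content(root: str) -> dict:
    """Map content digest -> list of regular files with that content."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # only regular files are candidates
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            groups[digest].append(path)
    return dict(groups)


def potential_savings(groups: dict) -> int:
    """Bytes saved if every group of identical files shared one copy."""
    saved = 0
    for paths in groups.values():
        saved += os.path.getsize(paths[0]) * (len(paths) - 1)
    return saved
```

Two packages that differ in a single file still share every other identical file under this scheme, whereas a whole-package hash would treat them as unrelated.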

Reproducibility is another value on its own and I think Nix/NixOS is actually pretty good there. Changing the compiler or any compilation flag automatically results in a different output path, as the output path actually contains a cryptographic hash over all inputs and the functions used to generate that output. Change any detail in there and you get a different output path.
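A toy model of that property, in Python: the hypothetical function `output_path` hashes a canonical description of the inputs and prepends it to the name. Real Nix derives the hash from the derivation in its own way and uses a different encoding. The JSON serialisation and hex truncation here are assumptions made only to show that any change to an input moves the path.

```python
import hashlib
import json


def output_path(name: str, version: str, inputs: dict) -> str:
    """Toy store path: hash of the build description plus a readable name."""
    description = json.dumps(
        {"name": name, "version": version, "inputs": inputs},
        sort_keys=True,  # canonical key order, so equal inputs hash equally
    )
    digest = hashlib.sha256(description.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}-{version}"


# Same package, one compilation flag changed: two different paths.
plain = output_path("hello", "1.0", {"cc": "gcc", "flags": ["-O2"]})
tuned = output_path("hello", "1.0", {"cc": "gcc", "flags": ["-O3"]})
```

Note that the name and version are themselves part of what gets hashed in this sketch, so the readable suffix adds no extra freedom to the path.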

Debian is pretty strong in reproducibility and there’s a very useful website (What's in a build environment? — reproducible-builds.org) that lists criteria important for reproducible builds. A quick look shows me that Nix fails on only one out of seven of these criteria: the build user.
More issues probably come from non-reproducible build processes that optimize for a given environment, like the CPU type.
Debian is struggling with that as well: depending on what package you’re looking at, they’re only at about 90 - 95%.
Nix gets a lot of that for free due to its design.

If the path does not include the name then that means LibreOffice 6.0.0, with some set of options, for some architecture, will always have the same path on any system:

/nix/store/029id2390i23d9i4di3948id34i

And if you want that package on your system then you can build it yourself. Or you can use ANY method of getting the built tarball that matches that hash. HTTPS, HTTP, IPFS, Torrent, Rsync, Git.

If you want your software to depend on this package then simply check for /nix/store/029id2390i23d9i4di3948id34i. This will not be possible if the user can assign any name they want to packages.

If the names are not deterministic then I see more value in the simplest naming approach: 000000001-glibc, 00000000002-libreoffice-6.0.0, named based on the order they are installed plus whatever name the user wants to give each package.

Since name and version fields are part of the definition of a package, the name /nix/store/029id2390i23d9i4di3948id34i-openoffice-6.0.0 is just as fixed (to a given definition, having specific inputs) as the name without that human-readable suffix. What benefit is gained by the removal?

How do your arguments about being able to populate the store out-of-band cease to apply with the name present?

Well, if it is truly apples-to-apples and the 029id2390i23d9i4di3948id34i is a git tree hash of the directory contents, that would be great.

Adding an additional name would only serve to confuse people because

/nix/store/029id2390i23d9i4di3948id34i-openoffice-6.0.0
/nix/store/398d49ud94d8u394ud349u8-openoffice-6.0.0
/nix/store/u9d2u938ud39udu982d3u33-openoffice-6.0.0

Are all equivalent, maybe save for a few bytes inside the package.

Just like the immutable content store inside git, the naming is held outside the content store. That was the main parallel I was drawing.

Don’t think “git tree hash” as much as “git commitish hash” – it’s not literal tree contents alone that matter (except in fixed-output derivations, where the hash was provided when the derivation was defined), but the inputs and content, including linkage to other commits/derivations.

Even for non-fixed-output derivations, the names are deterministic, just not from that single derivation’s tree contents alone – one needs to see the derivation file (the .drv) describing the build steps, and which other components are referenced during the build process (thus, their hashes, &c).
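The “commit-ish” analogy can be sketched as a hash that folds in the hashes of everything a package references. In this Python illustration the package table, its field names and the function `drv_hash` are all hypothetical, and the encoding is not Nix's. The point is only that a change in a dependency propagates to every dependent, even when the dependent's own description is untouched.

```python
import copy
import hashlib
import json


def drv_hash(name: str, packages: dict) -> str:
    """Hash a package description together with its dependencies' hashes."""
    spec = packages[name]
    dep_hashes = sorted(drv_hash(dep, packages) for dep in spec["deps"])
    blob = json.dumps(
        {"name": name, "flags": spec["flags"], "deps": dep_hashes},
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode()).hexdigest()[:32]


packages = {
    "zlib-1.2": {"flags": ["-O2"], "deps": []},
    "openoffice-6.0.0": {"flags": [], "deps": ["zlib-1.2"]},
}

# Rebuild the same openoffice description against a differently built zlib.
variant = copy.deepcopy(packages)
variant["zlib-1.2"]["flags"] = ["-O3"]

before = drv_hash("openoffice-6.0.0", packages)
after = drv_hash("openoffice-6.0.0", variant)
```

Here `before` and `after` differ although the `openoffice-6.0.0` entry is identical in both tables, which is how several distinct paths can end with the same readable name.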

Arguing that three different openoffice-6.0.0 packages should be collapsed to not use hashes as part of their names at all doesn’t make any sense to me – both parts add value: The names provide human-readable tags and allow quick-and-dirty querying; the hashes ensure that you can distinguish between different builds of the same program, no matter whether they’re “different builds” because they’re compiled with different options, or built against different libraries, or whatever else the case may be. Whether it’s just “a few bytes” of difference or one package is a completely broken local build (or has major features disabled to slim it down for headless use converting documents) is outside of the package manager’s scope.
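The two roles of the path can be seen in a small Python sketch that splits store paths into hash and name and groups builds by name. The helpers are hypothetical and simply split at the first hyphen, using the example paths from this thread.

```python
from collections import defaultdict


def split_store_name(path: str) -> tuple:
    """Split '/nix/store/<hash>-<name>' into (hash, name)."""
    base = path.rstrip("/").rsplit("/", 1)[-1]
    hash_part, _, name = base.partition("-")
    return hash_part, name


def builds_by_name(paths: list) -> dict:
    """Group store paths by readable name; the hashes tell builds apart."""
    grouped = defaultdict(list)
    for path in paths:
        hash_part, name = split_store_name(path)
        grouped[name].append(hash_part)
    return dict(grouped)


paths = [
    "/nix/store/029id2390i23d9i4di3948id34i-openoffice-6.0.0",
    "/nix/store/398d49ud94d8u394ud349u8-openoffice-6.0.0",
    "/nix/store/u9d2u938ud39udu982d3u33-openoffice-6.0.0",
]
grouped = builds_by_name(paths)
```

The name answers “which program is this?” for a human, and the list of hashes under it answers “which build?”, without consulting any separate database.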

(Sure, you could move the name out-of-band, but we’re back to the “why?”).

Adding an additional name would only serve to confuse people because

/nix/store/029id2390i23d9i4di3948id34i-openoffice-6.0.0
/nix/store/398d49ud94d8u394ud349u8-openoffice-6.0.0
/nix/store/u9d2u938ud39udu982d3u33-openoffice-6.0.0

Are all equivalent, maybe save for a few bytes inside the package.

I don’t understand the point here, if they are equivalent and the readable names are the same, what is confusing?

Also, we have a fixed-length representation of the hash; variable length never happens.

Also, we do have paths that are equivalent but distinct, because they are linked against different versions of some libraries but the library changes are irrelevant for software in question. We do look for a way to reduce the costs of such rebuilds, but we are likely to keep considering these paths distinct.

If the path does not include the name then that means LibreOffice 6.0.0, with some set of options, for some architecture, will always have the same path on any system:

If it is already the same LibreOffice 6.0.0, it will have the name libreoffice-6.0.0

Note that derivation names do not depend on the Nix attribute/variable names used to pass the derivation around. Overriding the name is an action on the same level as changing some configureFlags: it doesn’t happen by accident, but it is feasible if desired.

If the names are not deterministic then I see more value in the simplest naming approach: 000000001-glibc, 00000000002-libreoffice-6.0.0, named based on the order they are installed plus whatever name the user wants to give each package.

The hash fully specifies the build process. It doesn’t always specify a reproducible build process, because facts about the environment leak in. We assume that honest build outputs with the same hash are interchangeable (unfortunately this is not always entirely true, e.g. if too much CPU detection happens during the build). So dropping the hash would remove information.
