Creating vendor directories directly in the srcs of Go and Rust packages so fixed-output derivations won't be needed

So there’s Restrict fixed-output derivations · Issue #2270 · NixOS/nix · GitHub and Recommend buildGoPackage instead of buildGoModule in the nixpkgs manual · Issue #84826 · NixOS/nixpkgs · GitHub, which suggest that it was a bad decision to design buildGoModule and rustPlatform.buildRustPackage as they are today. I’m not sure I deeply understand the core issue of #2270, but I understand that if we could only go back in time and avoid fixed-output derivations for sources, we’d satisfy edolstra.

It’s commonly known that the alternative to buildGoModule is buildGoPackage, which doesn’t produce fixed-output derivations, but people don’t like to use it because it requires adding a deps.nix file that is large and uncomfortable to keep up to date.

I have an idea, but I wonder if it’s too wild: rewrite buildGoModule so it parses (in Nix) the go.sum from ${src} and constructs the fetchurl or fetchgit calls that satisfy the checksums found there. Emulating Go’s hash calculation in Nix wouldn’t be easy (these are not exactly plain sha256 hashes), but I’ve found this project, GitHub - vikyd/go-checksum: Simple tool to calc Golang module checksum of go.mod and module dir., which claims to have implemented this, in Go.
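To illustrate just the parsing half of the idea (the hash emulation is the hard part), here is a minimal, hypothetical sketch of splitting a go.sum into (module, version, hash) triples in pure Nix. Note two caveats: the `h1:` hashes are Go dirhashes over the whole module tree, not tarball hashes, so they can’t be handed to fetchurl directly; and reading `${src}/go.sum` is import-from-derivation whenever src is itself a fetched derivation.

```nix
# Hypothetical sketch, not actual nixpkgs code: parse go.sum into a list of
# { module, version, hash } attrsets using pure Nix.
{ lib, src }:
let
  raw = builtins.readFile "${src}/go.sum";  # IFD if src is a derivation
  lines = lib.filter (l: l != "") (lib.splitString "\n" raw);
  parseLine = line:
    let parts = lib.splitString " " line;
    in {
      module = builtins.elemAt parts 0;   # e.g. "github.com/pkg/errors"
      version = builtins.elemAt parts 1;  # e.g. "v0.9.1" or "v0.9.1/go.mod"
      hash = builtins.elemAt parts 2;     # e.g. "h1:…" (a Go dirhash)
    };
in
map parseLine lines
```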

As for rewriting buildRustPackage in a similar manner, it seems there’s a somewhat complete Nix parser and hash calculator: https://github.com/nmattia/naersk/blob/a82fd7dc31a58c462b6dfa9d9d886fa2cc75dfd4/build.nix .

Is it possible to read a file found in ${src} and do pure Nix stuff based on its contents?

I think edolstra added the fromTOML builtin specifically for this use case, since Nix has a Rust dependency now.

See https://github.com/NixOS/nix/blob/8351d36b214fcd00f4a769e96ebaeb22af41164c/release.nix#L17-L24
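For reference, builtins.fromTOML turns TOML text into a Nix value, so a Cargo.lock (which is TOML) can be read without any external tooling. A trivial example:

```nix
# builtins.fromTOML parses TOML into a Nix attrset.
builtins.fromTOML ''
  [package]
  name = "example"
  version = "0.1.0"
''
# → { package = { name = "example"; version = "0.1.0"; }; }
```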

For that to work, we would need to include the go.sum of every package in nixpkgs.
We would also need to modify buildGoModule to reconstruct the modules.txt inside the vendor directory.
Importing $src does not scale, because it would mean importing every package’s source each time we evaluate nixpkgs. We do not allow this in nixpkgs for the same reason.
Since we need to add extra metadata anyway, I would actually generate Nix source code directly, since parsing other file formats in Nix without builtins is potentially slow. Maybe we can make the tooling for this better than go2nix, so that updating does not suck.

I see. There’s another topic that seems related now, which is “import from derivation”, which Hydra doesn’t yet support: Import From Derivation - NixOS Wiki & Is importing a derivation from another repository bad practice?

If Hydra were able to import from a derivation, would it be possible to create a deps.nix inside the $src of Go module projects, and then import it?

Yes. But this would also be expensive. It would mean that if you, for example, want to install docker from the binary cache, you would first need to download the docker source code to do the import from derivation.


WOW … :sweat_smile:

It feels pretty insane that these ecosystems have worked so hard to make sources reproducible, yet we can’t use their reproducibility methods directly in Nix. Maybe this observation is incorrect: with either of the methods I mentioned above it would be reproducible, but it would require either downloading the source to evaluate (with import from derivation) or adding go.sum files to Nixpkgs…

You’ve mentioned:

Also we need to modify buildGoModule to reconstruct the modules.txt inside the vendor directory.

I noticed that in your recent improvements to buildGoModule, you linked the go-modules derivation to ./vendor. I wonder: would it be possible to use an extraPostFetch hook instead? That would spare one hash in every buildGoModule derivation.

Maybe network access is not available to commands in this hook, but it’s not documented, so I don’t know. If so, then perhaps we need to make fetchurl and friends able to perform arbitrary actions with network access.
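To make the idea concrete, here is a hedged sketch of what that could look like using fetchzip’s extraPostFetch hook. The URL and package are made up, and whether `go mod vendor` can actually reach the network from this hook is exactly the open question above (fixed-output derivations do allow network access, so it seems plausible):

```nix
# Hypothetical sketch: vendor the Go dependencies inside the fetcher itself,
# so the vendored tree is part of one fixed-output src and buildGoModule
# would not need a separate vendor hash.
src = fetchzip {
  url = "https://github.com/example/project/archive/v1.0.0.tar.gz"; # made up
  extraPostFetch = ''
    cd $out
    ${go}/bin/go mod vendor   # relies on network access inside the FOD
  '';
  sha256 = lib.fakeSha256; # to be replaced with the real hash
};
```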

This sounds feasible.


:upside_down_face: The question that remains is: such srcs wouldn’t be considered by edolstra as fixed-output derivations in disguise, right?

No. I think his plan is to move all fetchers over to builtins. However, I don’t think this is feasible, since we have too many VCSs around.

It feels pretty insane that these ecosystems have worked so hard to make sources reproducible but we can’t use their reproducibility methods directly in Nix?

It feels insane. Other ecosystems are thoughtful (enough), too.

See also: Fixed-output derivations to become part of flake inputs?

Note that in the case of Rust it is not enough to add Cargo.lock files to nixpkgs. You can read the lock file with fromTOML and specify sources with fetchurl (IIRC you can even use the hashes directly with fetchurl). However, you cannot fully evaluate the package expression, since you don’t know the default features of dependent crates, and thus the rustc flags, without reading their Cargo.toml files. So you’d need IFD as well.
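The fromTOML + fetchurl half of that is straightforward. Here is a minimal sketch, assuming the v2+ lock format (where each [[package]] entry for a registry crate carries a hex sha256 `checksum` of its .crate tarball) and the standard crates.io download URL:

```nix
# Sketch: map Cargo.lock entries to fetchurl calls. Path and git
# dependencies carry no `checksum` field and are filtered out here.
{ fetchurl }:
let
  lock = builtins.fromTOML (builtins.readFile ./Cargo.lock);
  registryCrates = builtins.filter (p: p ? checksum) lock.package;
in
map (p: fetchurl {
  name = "${p.name}-${p.version}.crate";
  url = "https://crates.io/api/v1/crates/${p.name}/${p.version}/download";
  sha256 = p.checksum;  # hex sha256 from the lock file, usable directly
}) registryCrates
```

As the post says, this gets you the sources, but not the feature flags needed to actually evaluate the build plan.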

For this reason approaches like crate2nix use cargo metadata to get richer metadata than Cargo.{toml,lock} provide and generate a Nix attrset from that.

Edit: if repository size and evaluation time/space were no issue, ideally, all of crates.io would be available as a Nix expression.


In case you haven’t had the chance to know it yet: there is

GitHub - edolstra/import-cargo: A function for fetching the crates listed in a Cargo lock file.

I know that project. But like buildRustPackage, it does not use Nix to build the dependencies. So dependent crates are not realized into their own output paths, and you don’t get the benefits of caching dependencies. The improvement is that you don’t rely on cargo to download the dependencies, but you are still relying on cargo to build all of them for you, rather than benefiting from Nix’s strengths.

I should have made explicit that I meant approaches that build dependencies using Nix, though the end of the first paragraph hints at it ;).


The linked project sounds like a 50% fit for this discussion’s title (100% if you strike out the word “go”). Is that correct?

You mention two interesting things:

  • caching
  • fetching via Nix rather than via native tooling

My current understanding is that caching is a real concern for “core” vs. “leaf” packages on centrally hosted caching infrastructure (cachix).

So it is an optimization constraint that fades in importance in certain contexts, while in others it’s utterly important.

Say, for example, for distributed leaf-package flakes it would be less of a concern, since they externalize the costs to the user (in its role as builder) anyway.

On the other hand, why should fetching via Nix, as opposed to the native tooling, be superior?

Go, for example, implements indirections for URLs which, to a reader in a browser, present package documentation, while to the go getter they present a redirect to the package source via an HTML meta tag. Pretty cool. Obviously this could be mimicked by a Nix-native fetcher, but if the reproducibility constraints and expectations are met, why not delegate?

Because it’s harder to audit than to reimplement?

(And is that really true, at all?)

Only if you pick out one point of the discussion. In the original issue, Restrict fixed-output derivations · Issue #2270 · NixOS/nix · GitHub, one of Eelco’s main arguments against buildRustPackage is:

Such impurities are bad for reproducibility because the dependencies on external files are completely implicit: there is no way to tell from the derivation graph that the derivation depends on a bunch of crates fetched from the Internet.

But the same argument applies equally to approaches where fetching is transparent (fromTOML + fetchurl) but the whole build is farmed out to a third-party tool like cargo. You do know which source tarballs are in the transitive closure, but the Nix expression does not encode the exact dependency graph or how the sources are built. This leads to the downsides I mentioned (no caching of dependent crates).

I am not sure what you are saying here :wink:. Almost every Rust package that we currently package is effectively one leaf package and tens or hundreds of interior packages (all the dependencies). With buildRustPackage we just pretend that the whole thing is one leaf package and lose many of the benefits of Nix (such as caching).

Contrast that with buildRustCrate (e.g. via crate2nix). Builds are typically blazingly fast because all the shared crates are only compiled once. Moreover, you get Nix expressions with properly defined dependency graphs. It’s how Nix is supposed to work.

@mic92 and @kolloch are working on making crate2nix more fit for use in nixpkgs:

https://github.com/kolloch/crate2nix/issues/102

  • Source dependencies are not explicitly defined.
  • Every builder retrieves the same sources over and over again, because dependencies are cached at the vendored-tarball level and not at the individual source level. So if two packages use the same version of the libc crate, it is fetched over and over again.
  • It’s fragile. If upstream changes anything to the vendoring that cannot easily be normalized, all hashes break.

I am not quite sure what the current state of this is. I asked for an update.


I think we need to set up a CI system like Mach-nix, but for Go, Rust, and NodeJS. I tend to imagine it should be easier to create such systems for these ecosystems than it was for @DavHau to do it for Python, because crates.io and sum.golang.org are much more organized than pypi.org.


Ah, yep, in a farmed-out scenario we’d lose a lot of caching potential. I still ponder how big of an issue that is in a farmed-out build scenario. It is clear that for cachix/hydra etc., not having proper caching would soon get too costly.

I’m not sure this is superior to usurping the native tooling. At least in Go, arbitrary hashes are tagged as nodes in the dependency graph, only loosely further constrained within a major version range.

What deduplication benefits could possibly manifest if there are no range abstractions in the dependency graph at all, such as there are in pypi?

Therefore, while the full dependency graph for pypi compresses to only about 260 MB, for Go, ceteris paribus, it would be orders of magnitude bigger.

In a farmed-out scenario, which doesn’t maintain builds for thousands of packages, the advantage of deduplication shrinks even more.

In the end, the marginal deduplication benefit over a fixed-output derivation could be very low for those languages, as long as build integrity is ensured. So low that, maybe, we are left with a purity-of-style argument.

Note that I firmly believe the solution to Nix’s current scaling constraints, which derive from contributor resource constraints, is farming out (flakes! it’s already quite figuratively in the name, if you think of snowflakes, for example).

So everything that makes farming out more cumbersome for the target farmer than it needs to be (to ensure build integrity), i.e. deviates too much from native tooling, is a thing we should question.

On the other hand: if such a proposed CI system could make those packages available without asking any “target farmer” to do anything, then those are low-hanging fruits, provided the maintenance is set up so that any number N of package builds can be maintained within reason, where N is greater than what flakes could achieve within a reasonable time. That’s true regardless of whether we can materialize cache benefits. Then it’s an excellent idea!

That said, a really nice advantage of the status quo is that it actually works, is very simple, and has extremely low maintenance cost since expressions can be written/read/debugged/modified/overlayed entirely by hand. Since both go and rust are building statically linked applications there’s no end-user installation difference, either.


The reason why mach-nix can do what it can do is basically that it allows IFD.
IFD makes many things much, much easier, as already mentioned by some people in this thread.
IFD provides a lot of power to Nix. It allows automating a lot more, like reading package metadata automatically and then acting according to it.

If we could only do IFD in nixpkgs, it would become much simpler to incorporate packages from other package universes.

So, maybe we can eliminate the drawbacks of IFD in order to comfortably use it in nixpkgs.

As far as I understand, the only serious problem with IFD is evaluation cost.
Could caching be a solution for that?
Nix 2.4 already comes with nix evaluation caching.
If this functionality were extended a bit and an online cache made available, similar to cache.nixos.org, would this fix the problem?

If you think about it from another perspective, much of the code in nixpkgs is basically just an evaluation cache created and maintained by humans. We browse other repos, download files, read some of their metadata, and then write Nix expressions and put them into nixpkgs (our cache for the human IFD engine). All manually and at very high cost. Much of it could surely be automated.
Of course, this can already be automated right now with code generators. But IFD would allow us to get rid of both the code generators and the auto-generated Nix code itself. Instead, we could have some Nix code which is able to properly interpret fetched packages.

I think we need to setup a CI system such as Mach-nix but for Go, Rust, and NodeJS…

Nixpkgs + Hydra is already a CI system. Because this CI system is missing a feature (IFD), you’re proposing we should make a new CI. Wouldn’t it be better to improve our existing CI and add the missing feature?
A forked IFD-nixpkgs would sooner or later run into the same problems as the current nixpkgs would when using IFD. Better to fix the IFD issues and make it usable.

If this is not feasible, then I’m happy to help design a framework that brings the mach-nix approach to other languages. It will probably be much simpler for other languages than it was for Python.