If you have a pin like this, it breaks nix run nixpkgs/<rev>#package (which, again, is not suitable to put onto unsuspecting new users), and it means that if you do nix flake update, you end up with the path in the lockfile, so other users won’t be able to use it unless they can find that path in a substituter. It also doesn’t encode the git commit that nixpkgs came from. So while I agree pinning nixpkgs is useful, in practice I wish to pin it to a github reference (and then have nix transparently use a mirror).
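As a small sketch of what I mean (the branch here is just a placeholder ref; a specific commit hash works the same way):
inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
The lockfile then records the repo and revision, which any other user - or a mirror - can resolve, rather than a path that only I can substitute.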
The Go community had a similar problem, in that a lot of Go code is hosted on GitHub. When you add a module to your project, the Go CLI needs to read through all the git tags to find the “latest version” of the code. The solution from the Go community is https://proxy.golang.org/
The Go team maintains a central proxy, but you can run your own proxy for your organisation. By setting an environment variable, you can point at your own proxy (or at no proxy at all).
This is useful because, in an offline network, I can’t connect to Github and download code, but my organisation can run a proxy and pre-emptively cache modules.
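For reference, Go selects the proxy with the GOPROXY environment variable; something like this (the hostname is made up) points the toolchain at an organisation-run proxy and falls back to direct fetches:
GOPROXY=https://goproxy.my-org.example.com,direct go build ./...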
Nix could take the same approach. I think it would be ideal if Nix could make all network requests through a proxy, so that if you have a flake input, fetchurl call, etc. which isn’t in your store, the proxy might already have the result cached.
The proxy would, ideally, be able to redirect code too. For example, if I have a fork of a repo that I’d like people in my organisation to use instead.
Features like static code analysis of ingested code, or maintaining a list of banned repos / known vulnerabilities could also be added to the proxy.
I think such a feature is important for reproducibility too, so I do think this is a “nix problem”.
The biggest hole in Nix reproducibility is currently sources disappearing from the internet. Nope, misremembered that: it’s kernel quirks changing over time - though I am now curious how they got all the inputs. If Nix’s downloads happened through a transparent proxy, it would be trivial to archive and preserve those inputs - I’d imagine a large organization would also care about being able to preserve their inputs (in fact, I know of organizations which explicitly require this from Bazel).
Nix does of course keep those sources in its store as well, but that is much more unwieldy to archive (hydra doesn’t), let alone rebuild from e.g. archive.org if necessary.
Native support from nix for a proxy of this form (where the proxy provides a mirror) would be neat.
This is not to be confused with a standard http/https proxy which has the problem of needing to man-in-the-middle the github.com TLS connection, something I would prefer not to do.
If the solution were to run a proxy and then do NIX_PROXY=https://my.org.proxy.example.com (or equivalent nix.conf), with no other configuration (e.g. CA hackery), that would seem close to ideal in my eyes.
Yes, I wasn’t thinking it would be an HTTP proxy - like you say, there are too many issues with certificates etc.
I was thinking that the proxy could be an RPC style server. The Nix CLI would identify the presence of the NIX_FETCH_PROXY environment variable, and POST requests to it. The body of the requests would be the JSON of the argument passed to the Nix function.
POST https://my.org.proxy.example.com/nix/flake/input
{
"ref": "github:NixOS/nixpkgs/nixos-24.11"
}
The RPC server could return a JSON object containing a store path and its hash. I imagine that the RPC server would be built into a binary cache, so it would be fairly easy to add the behaviour to an existing HTTP binary cache.
POST https://my.org.proxy.example.com/nixpkgs/fetchUrl
{
"urls": [
"https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/does-not-exist",
"https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version"
],
"hash": "sha256-BZqI7r0MNP29yGH5+yW2tjU9OOpOCEvwWKrWCv5CQ0I="
}
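The response shape isn’t specified anywhere, but as a sketch of the idea above (all field names here are hypothetical, nothing is a defined API), the proxy might answer with something like:
{
"storePath": "/nix/store/<store hash>-source",
"narHash": "<SRI hash of the fetched content>"
}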
There’s some detail to think about, e.g. fetchurl has a “downloadToTemp” argument, which doesn’t make sense as an RPC call, so a subset of keys would be needed. There are some specific fetchers for SVN etc., but I don’t think that’s a problem - the dumbest Nix fetch proxy implementation could create Nix files on the fly, call the Nix CLI, and collect the store path contents from there.
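To make that “dumbest implementation” idea concrete, here is a sketch (reusing the URL and hash from the example above, and assuming an ordinary nixpkgs is available on the proxy host) that turns a request into a Nix expression and shells out to the existing CLI:
cat > request.nix <<'EOF'
(import <nixpkgs> { }).fetchurl {
  url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version";
  hash = "sha256-BZqI7r0MNP29yGH5+yW2tjU9OOpOCEvwWKrWCv5CQ0I=";
}
EOF
nix-build request.nix   # the resulting store path and its contents are what the proxy hands back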
Within the Nix fetch proxy, it would be easy enough to hash the normalised URL and JSON input, and store any results as a blob containing the directory structure. However, I’d also expect a Nix fetch proxy to maintain a database containing metadata, such as the raw inputs, file size, date fetched, last downloaded date, number of downloads, known vulnerabilities etc. so that a policy can be put in place around deletion or warnings.
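As a sketch of what one such metadata record might contain (every field name and value here is hypothetical):
{
"requestHash": "<hash of the normalised URL and JSON input>",
"rawInput": { "ref": "github:NixOS/nixpkgs/nixos-24.11" },
"fileSize": 1048576,
"fetchedAt": "2025-01-01T00:00:00Z",
"lastDownloadedAt": "2025-02-01T00:00:00Z",
"downloadCount": 42,
"knownVulnerabilities": []
}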
Another environment variable could be used to store the file location of the Authorization bearer token passed to the API, e.g. NIX_FETCH_PROXY_AUTH=/secrets/nix-token.jwt
For this to work, a standard for the Nix network RPC API would be needed, the Nix CLI would have to be updated to support it, and implementations provided, likely in existing HTTP binary caches like attic.
Access to private repos is more complicated; however, I don’t think there’s a concern about rate limiting in that case. If the Nix fetch proxy had access, access control features within the proxy could limit further distribution, for example a policy that only allows some users to fetch from specific repos.
You might be able to get close to this by managing/manipulating the fetcher and tarball sqlite caches.
While that may work for now, it relies on an implementation detail that may just break in a couple of months. And when such breakage happens, the official response would be “well, this was an implementation detail that you shouldn’t have relied on”.
Of course - these are mentioned as workarounds. A global NIX_FETCH_PROXY for all/some fetchers would be quite interesting and is likely the right approach, similar to git’s “insteadOf” as @pwaller mentioned. Considering the fetcher subsystem is its own library, this might not be that hard to add.
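For anyone who hasn’t used it, git’s insteadOf rewriting (the internal mirror hostname below is made up) looks like:
git config --global url."https://git.internal.example.com/github/".insteadOf "https://github.com/"
Every git fetch of a github.com URL is then transparently redirected to the mirror.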
I can ping the GitHub teams about this to see if there’s anything they can do on their end. Can we put together a short form request?
More broadly, mirroring in general is a good idea, since servers, even corporate-controlled ones, go down. In the past, mirroring was commonplace until developers started centralizing. It wouldn’t take much for folks to have a local code fork on their server and a mirror on one of the bigger hubs, but flakes do not support mirroring of any kind.
Yes, mirroring was part of Franz Pletz’s platform for the Steering Committee, and I think it is a fantastic idea that has been needed for a long time.
AFAIU, under the hood nix resolves github: flake refs to valid git URLs and calls git to fetch the code, which means this must be a matter of configuring that git with the proper insteadOf settings.
This seems to be the proper clean solution that works transparently to nix. Why discuss proxies and special configuration options? What am I missing?
No, github: primarily generates a github tarball URL (getDownloadUrl) which it then fetches and extracts.
It’s only going to use git in some edge cases, because a full clone can easily be 100x slower than just downloading a tarball on something like nixpkgs, and even shallow clones are not very fast…
Speaking of tarballs, actually there’s now a much simpler approach:
It seems it was only introduced into the NixOS infra this year (fastly: implement "Lockable HTTP Tarball Protocol" (Flakes) for channels.nixos.org by emilylange · Pull Request #562 · NixOS/infra · GitHub), so that’s why no one suggested it yet.
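Concretely, that means pointing the nixpkgs input straight at the channel tarball instead of a github: ref - something like this, assuming you track nixos-unstable:
inputs.nixpkgs.url = "https://channels.nixos.org/nixos-unstable/nixexprs.tar.xz";
Since channels.nixos.org implements the Lockable HTTP Tarball Protocol, the lockfile can still pin an immutable revision of that tarball.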
Thanks for the clarification.
Very surprising and really frustrating. A hard-coded optimization for a special case (a proprietary API!) without an option to hook in an arbitrary command as a fetcher or something similar - an obvious thing a sysadmin would expect.
OK, how about having a locally hosted repo, downstream of sorts, which depends on my mirror of NixOS/nix and applies some patches/overlays to libfetchers? Something similar to sudo systemctl edit some.service on imperative distros. How would I approach that?
Doesn’t make sense. You’re talking about tarballs. How does that apply to github: flakerefs?
Don’t use github: flakerefs for nixpkgs inputs and this becomes a non-issue, basically. This largely only works for nixpkgs, since many flakes are distributed exclusively as GitHub repos, but nixpkgs is probably the most important one, and the one causing GitHub ratelimits in the context of this issue, so it helps to simply shift your sources to NixOS foundation infrastructure.
You’d have to patch nix, which you can just do. Are you asking how to apply patches to stuff deployed with NixOS (simple, pkgs.nix.overrideAttrs (_: { patches = [ <path-to-patch> ]; }), but where to put that depends on what and how you’re deploying) or just how to patch software in general?
Either way, while patching nix downstream is of course an option, a generic, stable solution that doesn’t require downstream maintenance (which downstream patches to nix would) would be nice - which is why proxies and special configuration options are being discussed.
Well, having to manually control transitive github: inputs and explicitly set follows to my mirrors in my flakes seemed like an anti-pattern. Initially, my idea was to have a downstream repo depending on my nix-community/nix-installers mirror to customize it (extra substituters in nix.conf etc.), including insteadOf or whatever makes github: refs just work. Now I realize that patching nix can be done in that downstream repo as well, without having another downstream repo just for nix. However, patching indeed looks like too much hassle compared to explicit follows.
I know how to apply patches in general, but I’m still new to nix. The ecosystem seems to be based on conventions and patterns which I may not be fully aware of (I guess that’s what low-policy means). I have used some overlays in my machine’s nix-darwin flake configuration, though.
The point is that we want this to work for the ecosystem that already exists. The problem to be fixed is ‘if a user follows instructions they find, things should work as expected’.
As soon as they do something like nix build nixpkgs#foo or nix build nixpkgs/nixos-unstable#foo or nix build nixpkgs/ab123ef#foo this breaks, no? Same with using a flake you didn’t write. I think there should be a route to make this work, ideally without an intercepting SSL proxy or changing any existing sources.
It’s less of an anti-pattern and more of an ugly necessity caused by an inherent flaw in the design of flake inputs. They lack support for semver or any other solution to the dependency creep problem that the software engineering community has known needs a solution for over a decade now. npm started figuring this out in, what, 2012?
follows is a crutch, but it’s all we have - in fact, it’s arguable that not specifying follows for every recursive nixpkgs instance is the anti-pattern.
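For anyone following along, deduplicating a transitive nixpkgs with follows looks roughly like this (some-flake is a made-up input name):
inputs = {
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
  some-flake.url = "github:example/some-flake";   # hypothetical input
  some-flake.inputs.nixpkgs.follows = "nixpkgs";  # reuse our nixpkgs instead of its own copy
};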
IMHO this and other problems mean flakes themselves are still so inherently flawed (and, yes, experimental) that using them at all should be considered an anti-pattern. More pragmatically, until this is solved, I suggest using something like flint to ensure you deduplicate all your inputs.
Fair enough! If you do ever find yourself wanting to change the nix package used by one of the various module systems (be that nix-darwin, home-manager, NixOS or others), be aware that while overlays work, it’s often cleaner to change a .package attribute in a module.
Overlays change the meaning of the package in the package tree, no matter where it is used. It’s a rather drastic change that can have unintended side effects and has a much higher evaluation impact than changing a package option.
In this case the choice depends on what you end up using that nix package for. Either way, using .follows seems far more appropriate IMO, especially given that I consider that best practice even outside of this use case.
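To illustrate the .package route (the patch filename is a placeholder), a NixOS or nix-darwin module could do:
{ pkgs, ... }: {
  # only the nix used by this configuration is replaced, rather than every
  # occurrence of pkgs.nix in the package set
  nix.package = pkgs.nix.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./libfetchers-mirror.patch ];
  });
}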
For the record, I don’t disagree with this at all. I think there should be escape hatches for this, if only to make source archiving easier. Dodging github’s rate limits is a side benefit of making source archival easier; IMO we should be able to reproduce builds purely with something like archive.org.
That said, I believe we should also change the ecosystem as much as feasible. It’s silly that we’re just relying on github as an omnipresent source host. We’re giving that external entity far too much power here, these ratelimits are just a manifestation of that wider problem.
At the very least the host of the tarball of the nixpkgs input should be controlled by the NixOS foundation - that has other benefits discussed elsewhere, to boot.
No, it doesn’t. Or arguably it has been broken all along. The default flake registry is another one of these pesky design flaws.
When using NixOS or home-manager, at least, your nixpkgs registry entry will point at whatever your system was derived from. This may or may not point at the tarball already. Using the CLI like that is inherently impure and you can’t make any predictions about whether it “breaks” or not on a given system. Consider it a convenience feature.
Sure, and hence I suggest we start recommending the tarball to anyone and everyone who will listen so that in the future all flakes you didn’t write also avoid github: flakerefs for nixpkgs.
I suspect that’s why @waffle8946 brings it up too - there are now at least two more people aware of this feature and its importance for people who run into github rate limits.