Flakes Blocker for Org Users: GitHub rate limit and need for flake mirroring

If you have a pin like this, it breaks nix run nixpkgs/<rev>#package (which, again, is not something to put in front of unsuspecting new users). It also means that when you run nix flake update, you end up with the store path in the lockfile, so other users won’t be able to use it unless they can find that path in a substituter. Nor does it encode the git commit that the nixpkgs came from. So while I agree that pinning nixpkgs is useful, in practice I want to pin it to a GitHub reference (and then have Nix transparently use a mirror).

The Go community had a similar problem, in that a lot of Go code is stored on GitHub. When you add a module to your project, the Go CLI needs to read through all the git tags to find the “latest version” of the code. The Go community’s solution is https://proxy.golang.org/

The Go team maintain a central proxy, but you can have your own proxy for your organisation. You set an environment variable, and you can point at your own proxy (or no proxy at all).
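To make the Go model concrete, here is a small sketch of how a client turns a proxy base URL and a module path into the URL it queries for the list of known versions. The `/@v/list` endpoint and the `!`-escaping of uppercase letters come from the Go module proxy protocol; the function names are made up for illustration, and handling of the GOPROXY list syntax (several proxies, `direct`, `off`) is left out.

```python
def escape_module_path(path: str) -> str:
    # The Go module proxy protocol writes each uppercase letter as "!" plus
    # its lowercase form, so paths stay distinct on case-insensitive storage.
    return "".join("!" + c.lower() if c.isupper() else c for c in path)


def version_list_url(proxy_base: str, module: str) -> str:
    # Hypothetical helper: URL a client would GET to list a module's versions.
    return f"{proxy_base.rstrip('/')}/{escape_module_path(module)}/@v/list"
```

Pointing at an organisation's own proxy then only changes `proxy_base`; the module path in the project stays the same, which is the property the Nix proposal below is after.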

This is useful because, in an offline network, I can’t connect to GitHub and download code, but my organisation can run a proxy and pre-emptively cache modules.

Nix could take the same approach. I think it would be ideal if Nix could make all network requests through a proxy, so that if you have a flake input, a fetchurl call, etc. which isn’t in your store, the proxy might already have the result cached.

The proxy would, ideally, be able to redirect code too. For example, if I have a fork of a repo that I’d like people in my organisation to use instead.
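As a rough illustration of the redirect idea, the proxy could keep a table of reference prefixes to rewrite, much like git’s insteadOf. The sketch below is only an assumption about how such a table might behave: the rule format, the function name and the `github:my-org/nixpkgs` fork are all hypothetical, and nothing like this exists in Nix today.

```python
from typing import Dict


def apply_redirects(ref: str, rules: Dict[str, str]) -> str:
    """Rewrite a flake-style reference using the longest matching prefix rule."""
    def matches(prefix: str) -> bool:
        # Match whole path components only, so "github:NixOS/nixpkgs" does
        # not accidentally match "github:NixOS/nixpkgs-channels".
        return ref == prefix or ref.startswith(prefix + "/")

    best = max((p for p in rules if matches(p)), key=len, default=None)
    if best is None:
        return ref  # no rule applies, leave the reference untouched
    return rules[best] + ref[len(best):]
```

With the rule `{"github:NixOS/nixpkgs": "github:my-org/nixpkgs"}`, a request for `github:NixOS/nixpkgs/nixos-24.11` would be served from the organisation’s fork while the flake itself keeps the upstream reference.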

Features like static code analysis of ingested code, or maintaining a list of banned repos / known vulnerabilities could also be added to the proxy.

I think such a feature is important for reproducibility too, so I do think this is a “nix problem”.

The biggest hole in Nix reproducibility is currently sources disappearing from the internet. Nope, misremembered that. It’s kernel quirks changing over time, though I am now curious how they got all the inputs. If Nix’s downloads happened through a transparent proxy, it would be trivial to archive and preserve those inputs. I’d imagine a large organization would also care about being able to preserve its inputs (in fact, I know of organizations which explicitly require this from Bazel).

Nix does of course keep those sources in its store as well, but that is much more unwieldy to archive (Hydra doesn’t), let alone rebuild from e.g. archive.org if necessary.

Native support in Nix for a proxy of this form (where the proxy provides a mirror) would be neat.

This is not to be confused with a standard HTTP/HTTPS proxy, which has the problem of needing to man-in-the-middle the github.com TLS connection, something I would prefer not to do.

If the solution were to run a proxy and then do NIX_PROXY=https://my.org.proxy.example.com (or equivalent nix.conf), with no other configuration (e.g. CA hackery), that would seem close to ideal in my eyes.

Yes, I wasn’t thinking it would be an HTTP proxy; as you say, there are too many issues with certificates etc.

I was thinking that the proxy could be an RPC style server. The Nix CLI would identify the presence of the NIX_FETCH_PROXY environment variable, and POST requests to it. The body of the requests would be the JSON of the argument passed to the Nix function.

POST https://my.org.proxy.example.com/nix/flake/input
{
  "ref": "github:NixOS/nixpkgs/nixos-24.11"
}

The RPC server could return a JSON object containing a store path and its hash. I imagine that the RPC server would be built into a binary cache, so it would be fairly easy to add the behaviour to an existing HTTP binary cache.

POST https://my.org.proxy.example.com/nixpkgs/fetchUrl
{
  "urls": [
    "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/does-not-exist",
    "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version"
  ],
  "hash": "sha256-BZqI7r0MNP29yGH5+yW2tjU9OOpOCEvwWKrWCv5CQ0I="
}
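For illustration, the client side of the requests above could look roughly like the sketch below. It only builds the request object and sends nothing. NIX_FETCH_PROXY is the variable proposed in this thread and does not exist in Nix today; the function name and the fall-back behaviour when the variable is unset are my own assumptions.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional


def build_fetch_request(
    endpoint: str,
    payload: dict,
    env: Mapping[str, str] = os.environ,
) -> Optional[urllib.request.Request]:
    """Build the POST for a hypothetical Nix fetch proxy, or None if unset."""
    base = env.get("NIX_FETCH_PROXY")  # proposed variable, not a real Nix setting
    if not base:
        return None  # no proxy configured: the caller would fetch directly
    return urllib.request.Request(
        url=base.rstrip("/") + "/" + endpoint.lstrip("/"),
        data=json.dumps(payload).encode("utf-8"),  # body is the fetcher's arguments as JSON
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Calling `build_fetch_request("/nix/flake/input", {"ref": "github:NixOS/nixpkgs/nixos-24.11"})` with the variable set yields the first request shown above; passing the result to `urllib.request.urlopen` would actually send it.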

There are some details to think about, e.g. fetchurl has a “downloadToTemp” argument, which doesn’t make sense as an RPC call, so only a subset of keys would be accepted. There are also specific fetchers for SVN etc., but I don’t think that’s a problem: the dumbest Nix fetch proxy implementation could create Nix files on the fly, call the Nix CLI and collect the store path contents from there.
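A sketch of that “dumbest” implementation, under some assumptions: the proxy host has `<nixpkgs>` on its NIX_PATH, only the `urls` and `hash` keys are honoured, and the helper names are invented here. The code only builds the Nix expression and the command line; running it (for example with `subprocess.run`) and reading back the store path is left out.

```python
from typing import List


def nix_string(value: str) -> str:
    # Quote a Python string as a Nix "..." literal: escape backslashes,
    # double quotes and "${" so nothing in the input is interpolated.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("${", "\\${")
    return '"' + escaped + '"'


def fetchurl_expr(payload: dict) -> str:
    # Only "urls" and "hash" are read; keys such as downloadToTemp are
    # ignored because they make no sense for a remote call.
    urls = " ".join(nix_string(u) for u in payload["urls"])
    return "(import <nixpkgs> {}).fetchurl { urls = [ %s ]; hash = %s; }" % (
        urls,
        nix_string(payload["hash"]),
    )


def build_command(payload: dict) -> List[str]:
    # nix-build prints the resulting store path on stdout when it succeeds.
    return ["nix-build", "--no-out-link", "--expr", fetchurl_expr(payload)]
```

Because fetchurl is a fixed-output derivation, the hash in the request is still verified by Nix on the proxy side, so the proxy cannot silently hand back different content under the same hash.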

Within the Nix fetch proxy, it would be easy enough to hash the normalised URL and JSON input, and store any results as a blob containing the directory structure. However, I’d also expect a Nix fetch proxy to maintain a database containing metadata, such as the raw inputs, file size, date fetched, last downloaded date, number of downloads, known vulnerabilities etc. so that a policy can be put in place around deletion or warnings.
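One way the cache key could be derived, as a minimal sketch: normalise the URLs, serialise the request as canonical JSON (sorted keys, no extra whitespace) and hash the result. The normalisation rules here (lowercase scheme and host, drop the fragment) are an assumption, not an agreed standard, and the function names are hypothetical.

```python
import hashlib
import json
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    # Lowercase the scheme and host and drop any fragment. Note this also
    # lowercases userinfo in the netloc, which a real implementation
    # would want to treat separately.
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


def cache_key(endpoint: str, payload: dict) -> str:
    normalised = dict(payload)
    if "urls" in normalised:
        normalised["urls"] = [normalise_url(u) for u in normalised["urls"]]
    # sort_keys makes the key independent of the order of JSON object keys.
    canonical = json.dumps(
        {"endpoint": endpoint, "payload": normalised},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The resulting hex digest can name the stored blob and serve as the primary key of the metadata table described above.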

Another environment variable could be used to store the file location of the Authorization bearer token passed to the API, e.g. NIX_FETCH_PROXY_AUTH=/secrets/nix-token.jwt
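Reading the token could be as simple as the sketch below. NIX_FETCH_PROXY_AUTH is the variable proposed in this thread rather than an existing Nix setting, and the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the Authorization header for the proxy, or {} if none is set."""
    token_path = env.get("NIX_FETCH_PROXY_AUTH")  # proposed variable: path to a token file
    if not token_path:
        return {}
    # Strip the trailing newline that most tools leave at the end of the file.
    token = Path(token_path).read_text(encoding="utf-8").strip()
    return {"Authorization": f"Bearer {token}"}
```

Pointing the variable at a file rather than holding the token itself keeps the secret out of process listings and shell history, and lets a secrets manager rotate the file in place.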

For this to work, a standard for the Nix network RPC API would be needed, the Nix CLI would have to be updated to support it, and implementations provided, likely in existing HTTP binary caches like attic.

Access to private repos is more complicated, although I don’t think rate limiting is a concern in that case. If the Nix fetch proxy had access to private repos, access control features within the proxy could limit further distribution, for example by setting a policy that allows only some users to fetch from specific repos.
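Such a policy could be as small as a table from repo to permitted users. This is a hypothetical sketch: the data shape, the names and the “unlisted repos are public” default are assumptions of mine, and it presumes the proxy has already authenticated the user.

```python
from typing import Dict, Set


def may_fetch(user: str, repo: str, private_repo_acl: Dict[str, Set[str]]) -> bool:
    """Decide whether an authenticated user may fetch from a repo."""
    allowed = private_repo_acl.get(repo)
    if allowed is None:
        return True  # repo is not listed as private, so anyone may fetch it
    return user in allowed
```

A stricter deployment might invert the default and deny any repo that is not explicitly listed.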

You might be able to get close to this by managing/manipulating the fetcher and tarball sqlite caches.

While that may work for now, that’s relying on an implementation detail that may just break in a couple months. And when such breakage happens, the official response would be “well this was an impl detail that you shouldn’t have relied on”.

Of course - these are mentioned as workarounds. A global NIX_FETCH_PROXY for all/some fetchers would be quite interesting and is likely the right approach, similar to git’s “insteadOf” as @pwaller mentioned. Considering the fetcher subsystem is its own library, this might not be that hard to add.

I can ping the GitHub teams about this to see if there’s anything they can do on their end. Can we put together a short-form request?

More broadly, mirroring in general is a good idea, since servers, even the corporate-controlled ones, go down. Mirroring was commonplace until developers started centralizing. It wouldn’t take much for folks to keep a local fork of the code on their own server plus a mirror on one of the bigger hubs, but flakes do not support mirroring of any kind.

Yes, mirroring was part of Franz Pletz’s platform for the Steering Committee, and I think it is a fantastic idea that has been needed for a long time.
