Hello Nixers! I am hoping someone can point me in the right direction.
My goal is to evaluate the outPath of every package (or close to every package) in nixpkgs for a given system (e.g. x86-64), so that I can build a reverse index from outPath → nixpkgs commit.
Ideally, the output would be in a machine-friendly format (e.g. JSON).
    let
      pkgs = import <nixpkgs> {
        config.allowBroken = true;
        config.allowUnfree = true;
      };
      lib = import <nixpkgs/lib>;
      tryEval = builtins.tryEval;
    in
    # For each top-level attribute, record its name and outPath ("" if missing).
    lib.mapAttrs (k: v:
      let
        name = tryEval (v.name or "");
        out = tryEval (v.outPath or "");
      in {
        name = name.value;
        out = out.value;
      }
    ) pkgs
This one fails in an odd way:
    ❯ nix eval -f example.nix
    error: store path '/nix/store/sqpnqdjc2mpzixcy2kbbgkddf8awhq99-nixpkgs-21.05pre278688.c0e88185200' is not allowed to have references
    (use '--show-trace' to show detailed location information)
nix-eval-jobs
I tried nix-eval-jobs, but it doesn’t emit outPath.
It was fairly easy to add via this pull request, but it doesn’t work with recursive packages and hits some other errors:
    error: OVMF-CSM has been removed in favor of OVMFFull
    {"attr":"__splicedPackages.__splicedPackages.__splicedPackages.__splicedPackages.OVMF-CSM","error":"error: OVMF-CSM has been removed in favor of OVMFFull"}
    error: OVMF-secureBoot has been removed in favor of OVMFFull
@matthewbauer wrote a great blog post about doing something very similar, which was good inspiration.
You may want to check out this project if you haven’t:
For your specific use case it would be:
- Generate the list of commits with git.
- For each commit:
  - Check it out.
  - Use nix-env --json and jq to compute all attributes offered by this nixpkgs set (something like this).
  - For each attribute, use nix-instantiate to get its out-path.
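The steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions, not the linked project's actual code: the helper names (`run_command`, `list_commits`, `list_attributes`, `out_path`, `out_paths_for_commit`) are made up for illustration, `json.loads` stands in for jq, and the current directory is assumed to be a nixpkgs git checkout.

```python
import json
import subprocess
from typing import Callable, Dict, List

Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command in the current directory and return its stdout."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def list_commits(rev: str, run: Runner = run_command) -> List[str]:
    # Generate the list of commits with git (newest first).
    return run(["git", "rev-list", rev]).split()


def list_attributes(run: Runner = run_command) -> List[str]:
    # With --json, nix-env prints one JSON object keyed by attribute path.
    listing = json.loads(run(["nix-env", "-f", ".", "-qaP", "--json"]))
    return sorted(listing)


def out_path(attr: str, run: Runner = run_command) -> str:
    # Evaluate only <attr>.outPath; nothing is built.
    argv = ["nix-instantiate", "--eval", "--json", "-A", f"{attr}.outPath", "."]
    return json.loads(run(argv))


def out_paths_for_commit(commit: str, run: Runner = run_command) -> Dict[str, str]:
    """Check out one commit and map every evaluable attribute to its outPath."""
    run(["git", "checkout", "--quiet", commit])
    result: Dict[str, str] = {}
    for attr in list_attributes(run):
        try:
            result[attr] = out_path(attr, run)
        except subprocess.CalledProcessError:
            continue  # this attribute failed to evaluate: skip it, keep going
    return result
```

Calling `out_paths_for_commit(commit)` for each commit from `list_commits("HEAD")` gives one attribute → outPath mapping per commit. Spawning a separate nix-instantiate process per attribute is slow, but a failure in one attribute cannot take the others down with it.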
Expect this to use lots of storage, since what you want to compute is a lot of information, but you can implement a bisection algorithm or perform sampling and still get good enough results. You can exploit the fact that not all packages change in every commit.
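One possible reading of the bisection idea, as a Python sketch. The function name `fill_out_paths` and the `evaluate` callback (which would return the outPath of one attribute at one commit) are hypothetical. It assumes the commits are sorted oldest to newest, and that if two commits give the same outPath then every commit between them does too. That second assumption is a heuristic: a change that was later reverted would be missed.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fill_out_paths(
    commits: List[str],
    evaluate: Callable[[str], Optional[str]],
) -> Tuple[List[Optional[str]], int]:
    """Return (outPath per commit, number of evaluations actually performed)."""
    cache: Dict[int, Optional[str]] = {}
    result: List[Optional[str]] = [None] * len(commits)

    def value_at(i: int) -> Optional[str]:
        # Evaluate each commit at most once.
        if i not in cache:
            cache[i] = evaluate(commits[i])
        return cache[i]

    def recurse(lo: int, hi: int) -> None:
        first, last = value_at(lo), value_at(hi)
        if first == last:
            # Same outPath at both ends: assume nothing changed in between.
            for i in range(lo, hi + 1):
                result[i] = first
        elif hi - lo == 1:
            # Adjacent commits that differ: the change happened at `hi`.
            result[lo], result[hi] = first, last
        else:
            mid = (lo + hi) // 2
            recurse(lo, mid)
            recurse(mid, hi)

    if commits:
        recurse(0, len(commits) - 1)
    return result, len(cache)
```

The fewer commits an attribute changes in, the fewer evaluations are needed compared with evaluating every commit.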
The constraints on this project are the 10 GB per repo that GitHub allows, and that we don’t have GitHub Actions at the moment. Also, the search engine (run client-side, stored on GitHub Pages) has a limit of 100 MB per page, so I’ve not been able to add features like “search by package output path (like /bin/nix as nix-index)” to the search index, because it would exceed the limit. The data is already in the repository, but not under the website.
So, just take that into account if you decide to contribute. I’ll also be happy if you decide to fork or just take some ideas from it. It’s free, libre and open source!
With enough resources (S3 buckets, some machines to run periodic updates, a proper search engine like Elasticsearch) one can definitely make something awesome out of this data. But I have not found a way to make paying those expenses sustainable, which is why I decided to go with the GitHub free tier.
@kamadorueda thanks for the advice.
Do you know of a better way to generate the outPath in a single invocation or by evaluation of a nix script?
(like the one I was attempting in my original message).
It would just need to evaluate the derivation for the outPath and not actually build it…
I’ve tried to do it, and it’s not possible. There are a few builtins.abort calls and even a few syntax errors in Nixpkgs that cannot be handled inside a Nix expression. You cannot evaluate it all. On the other hand, nix-env and Hydra know very well how to handle those, so that’s the only way, and thus computing what you want requires many steps.
I think I covered in the blog post why that script doesn’t work (it’s been a while, so my memory’s a bit hazy).
It’s a bit interesting that you need such a crazy script or tools just to evaluate all attributes of the set.
nix-env can do it but you can’t do it natively with the repl.
AFAIK no script, even those used by nix-env and Hydra, has been able to fully list all derivations. This is why derivations like darwin.apple_sdks.frameworks.CoreServices have never shown up on search.nixos.org.
I’ve spent a LOT of time on this, and have a script for it!
But first I need to mention:
- Even excluding the syntax errors and abort problems.
- Even excluding the equality-checking problem: Nix can’t tell whether two functions are equal (e.g. if a = x: x, then a == a is false), so it can’t tell whether two attr sets are deeply equal, and therefore can’t tell whether an attr has already been explored earlier in the tree. This makes a finite tree with back references look like an infinite tree.
- Even then, the actual attr tree is a non-converging infinite tree; it’s like a fractal. As the recursion deepens, some hash values keep changing.
- Even if we explore the tree using BFS with a hardcoded max depth (packages can occur at any depth), the amount of RAM required explodes (I maxed out a 256 GB machine).
So.
To get around all those, I made a Deno script that abuses the hell out of 40 concurrent nix repl subprocesses, and uses iterative deepening to get around the BFS memory problem.
It has some config vars at the top for:
- which nixpkgs hash to use
- a start attr-path (defaults to root)
- child attr names to ignore (this list can be empty, but having a few things in it makes the search MUCH faster)
- the number of concurrent subprocesses (don’t use too many or there will be Nix locking errors)
Finally:
It writes the attr path, child attr names, and some other info to a file, one line per attr-path.
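The overall shape of that search can be sketched in Python. This is not the Deno script itself, only an illustration of iterative deepening with a worker pool. `list_children` is a hypothetical callback that, in the real setting, would ask a nix repl subprocess for the attribute names under a path; threads stand in for the concurrent subprocesses, and the names `nodes_at_depth`, `collect`, `to_line` and `explore` are made up.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, FrozenSet, Iterable, Iterator, List, Tuple

AttrPath = Tuple[str, ...]
ListChildren = Callable[[AttrPath], List[str]]


def nodes_at_depth(path: AttrPath, depth: int, list_children: ListChildren,
                   ignore: FrozenSet[str]) -> Iterator[Tuple[AttrPath, List[str]]]:
    """Depth-limited DFS: yield nodes exactly `depth` levels below `path`."""
    children = [c for c in list_children(path) if c not in ignore]
    if depth == 0:
        yield path, children
        return
    for child in children:
        yield from nodes_at_depth(path + (child,), depth - 1, list_children, ignore)


def collect(path: AttrPath, depth: int, list_children: ListChildren,
            ignore: FrozenSet[str]) -> List[Tuple[AttrPath, List[str]]]:
    # Collected into a list for simplicity; a real script would stream to a file.
    return list(nodes_at_depth(path, depth, list_children, ignore))


def to_line(path: AttrPath, children: List[str]) -> str:
    # One output line per attr path: the path plus its child attr names.
    return json.dumps({"path": ".".join(path), "children": children})


def explore(list_children: ListChildren, start: AttrPath = (),
            ignore: Iterable[str] = (), max_depth: int = 2,
            workers: int = 4) -> List[str]:
    """Iterative deepening: redo a depth-limited search for each depth limit."""
    skip = frozenset(ignore)
    top = [c for c in list_children(start) if c not in skip]
    lines = [to_line(start, top)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for limit in range(1, max_depth + 1):
            # One job per top-level child; each job re-walks its own subtree.
            jobs = [pool.submit(collect, start + (c,), limit - 1, list_children, skip)
                    for c in top]
            for job in jobs:
                lines.extend(to_line(p, ch) for p, ch in job.result())
    return lines
```

Each round re-walks the shallower levels, which trades extra time for not having to keep a whole BFS frontier in memory.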
NOTE: I think I used Nix v2.11 with it.
I think this is the only one that truly prints every attr path. It probably takes 24 hours to get to a depth of 4, so it’s still not practical for most things, but it’s progress.
It is really sad that it takes this much effort and this hacky of a solution to literally just iterate over all packages.