Evaluate all outPaths for every package in nixpkgs

Hello Nixers! I am hoping someone can point me in the right direction.

My goal is to evaluate the outPaths of every package (or close to it) in nixpkgs for a given system (e.g. x86-64), because I would like to build a reverse index from outPath → nixpkgs commit.

Ideally, the output would be in a machine-friendly format (e.g. JSON).

Here is what I’ve tried so far:

nix-env

The following works, but the output is not machine readable:

❯ nix-env -qaP --out-path -f default.nix > output.log

❯ cat output.log | head
all-cabal-hashes                                                         01a23b49c333c95167338433cd375e24fc60d66d.tar.gz                                                /nix/store/k17xwfdzsdmbvjlj9yqfc6mhddhjkqga-01a23b49c333c95167338433cd375e24fc60d66d.tar.gz
zeroadPackages.zeroad-unwrapped                                          0ad-0.0.25b                                                                                    /nix/store/6idaxvlm241fsbrn5irf0ii3k83z1fkm-0ad-0.0.25b
zeroadPackages.zeroad-data                                               0ad-data-0.0.25b 
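
One way to make that columnar output machine readable is to post-process it. The following is a minimal Python sketch, not a tested tool: it assumes each row is whitespace-separated as "attribute, name, optional store path", and that multi-output packages are printed as `out=/nix/store/...;dev=/nix/store/...` (an assumption about the format, so check it against your Nix version). The function names are made up for illustration.

```python
import json


def parse_nix_env_line(line):
    """Split one `nix-env -qaP --out-path` row into attr, name and outputs."""
    fields = line.split()
    if len(fields) < 2:
        return None  # blank or malformed row
    attr, name = fields[0], fields[1]
    outputs = {}
    # The third column may be missing. When present it is assumed to be either
    # a bare store path or `out=...;dev=...` for multi-output packages.
    if len(fields) >= 3:
        for part in fields[2].split(";"):
            if "=" in part:
                output_name, path = part.split("=", 1)
            else:
                output_name, path = "out", part
            outputs[output_name] = path
    return {"attr": attr, "name": name, "outputs": outputs}


def to_json_lines(text):
    """Convert the whole listing into one JSON document per row."""
    rows = (parse_nix_env_line(line) for line in text.splitlines())
    return [json.dumps(row, sort_keys=True) for row in rows if row is not None]


sample = (
    "zeroadPackages.zeroad-unwrapped  0ad-0.0.25b  "
    "/nix/store/6idaxvlm241fsbrn5irf0ii3k83z1fkm-0ad-0.0.25b\n"
    "zeroadPackages.zeroad-data       0ad-data-0.0.25b \n"
)
for json_line in to_json_lines(sample):
    print(json_line)
```

Rows without a store path column (like the last one above) end up with an empty `outputs` object rather than being dropped, so they can be retried separately.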

Various Nix expressions

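let
  pkgs = import <nixpkgs> {
    config.allowBroken = true;
    config.allowUnfree = true;
  };
  lib = import <nixpkgs/lib>;
  tryEval = builtins.tryEval;
in
  # For every top-level attribute, try to read its name and outPath.
  # tryEval only catches `throw` and failed assertions, not `abort`;
  # on failure, `.value` is `false`.
  lib.mapAttrs (k: v:
    let
      name = tryEval (v.name or "");
      out = tryEval (v.outPath or "");
    in {
      name = name.value;
      out = out.value;
    }
  ) pkgs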

This one fails with an odd error:

❯ nix eval -f example.nix
error: store path '/nix/store/sqpnqdjc2mpzixcy2kbbgkddf8awhq99-nixpkgs-21.05pre278688.c0e88185200' is not allowed to have references
(use '--show-trace' to show detailed location information)

nix-eval-jobs

I tried nix-eval-jobs, but it doesn’t emit outPath.
It was fairly easy to add via this pull request, but it doesn’t work with recursive package sets and hits some other errors:

error: OVMF-CSM has been removed in favor of OVMFFull
{"attr":"__splicedPackages.__splicedPackages.__splicedPackages.__splicedPackages.OVMF-CSM","error":"error: OVMF-CSM has been removed in favor of OVMFFull"}
error: OVMF-secureBoot has been removed in favor of OVMFFull

@matthewbauer had a great blog post about doing something very similar, which was good inspiration.

PS: If someone wants to help me with this project, please reach out :slight_smile: It’s in the same vein as “Searching and installing old versions of Nix packages” by Marcelo Lazaroni (Developing for the Interwebs).

nix build .#cowsay

nix path-info -rS ./result

is fun!

nix-instantiate --eval --json --strict -A cowsay

You may want to check out this project if you haven’t:

For your specific use case it would be:

  • Generate the list of commits with git
  • For each commit:
    • Check it out
    • Use nix-env --json and jq to compute all attributes offered by this nixpkgs set (something like this)
      • For each attribute, use nix-instantiate to get its out-path
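
The steps above can be sketched as a small Python loop that builds the reverse index. This is only an illustration under stated assumptions: `out_paths_for` is a hypothetical callable standing in for "check out the commit, list attributes with nix-env and jq, and ask nix-instantiate for each out-path", and here it is replaced by a fake so the sketch runs without Nix or git. The commit names and store paths are placeholders.

```python
def build_reverse_index(commits, out_paths_for):
    """Map each outPath to the first commit (in the given order) producing it."""
    index = {}
    for commit in commits:
        for attr, out_path in out_paths_for(commit).items():
            # setdefault keeps the earliest commit already recorded for a path
            index.setdefault(out_path, {"commit": commit, "attr": attr})
    return index


# Fake evaluation results: commit -> {attribute: outPath}. In a real run this
# data would come from evaluating each checked-out nixpkgs commit.
fake_results = {
    "commit-1": {"hello": "/nix/store/aaa-hello-2.10", "cowsay": "/nix/store/bbb-cowsay-3.04"},
    "commit-2": {"hello": "/nix/store/ccc-hello-2.12", "cowsay": "/nix/store/bbb-cowsay-3.04"},
}

index = build_reverse_index(["commit-1", "commit-2"], fake_results.__getitem__)
for out_path, hit in sorted(index.items()):
    print(out_path, "->", hit["commit"], hit["attr"])
```

Because unchanged packages keep the same outPath across commits, the index only grows when something actually changes, which keeps it much smaller than commits × attributes.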

Expect this to use a lot of storage, since what you want to compute is a lot of information, but you can implement a bisection algorithm or perform sampling and still get good enough results. You can exploit the fact that not all packages change in every commit.
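
A rough Python sketch of the bisection idea, under the assumption that if an attribute has the same outPath at both ends of a commit range, it did not change anywhere in between. That assumption is a heuristic: a change that is later reverted inside the range would be missed. `out_path_at` is a hypothetical callable that evaluates the attribute at commit index `i`; here it is faked with a list.

```python
def find_change_points(lo, hi, out_path_at, cache=None):
    """Return indices i in (lo, hi] where the outPath differs from commit i-1.

    Ranges whose endpoints agree are assumed unchanged and never evaluated.
    """
    cache = {} if cache is None else cache

    def get(i):
        # Evaluate each commit at most once; evaluation is the expensive part
        if i not in cache:
            cache[i] = out_path_at(i)
        return cache[i]

    if lo >= hi or get(lo) == get(hi):
        return []
    if hi - lo == 1:
        return [hi]
    mid = (lo + hi) // 2
    return (find_change_points(lo, mid, out_path_at, cache)
            + find_change_points(mid, hi, out_path_at, cache))


# Fake history: 100 commits, the outPath changes once at commit index 40
history = ["/nix/store/aaa-foo-1.0"] * 40 + ["/nix/store/bbb-foo-1.1"] * 60
evaluated = []


def fake_out_path_at(i):
    evaluated.append(i)
    return history[i]


changes = find_change_points(0, len(history) - 1, fake_out_path_at)
print("change points:", changes)
print("commits evaluated:", len(evaluated), "of", len(history))
```

The saving comes from the skipped ranges: the fewer times a package changes, the fewer commits need to be evaluated for it.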

Wow what an awesome search engine.

Basically, @kamadorueda, what I want to add is the outPath, so you can also search by that instead of by name or version.

Is this something I can contribute to your project?
Happy to discuss this over email or DM.

Yeah, sounds awesome, definitely

The constraints on this project are the 10 GB per repo that GitHub allows, and that we don’t have GitHub Actions at the moment. Also, the search engine (run client-side, stored on GitHub Pages) has a limit of 100 MB per page, so I haven’t been able to add features like “search by package output path (like /bin/nix, as nix-index does)” to the search index, because it would exceed the limit. The data is already in the repository, but not on the website.

So just take that into account if you decide to contribute. I’ll also be happy if you decide to fork it or just take some ideas from it. It’s free, libre and open source!

With enough resources (S3 buckets, some machines to run periodic updates, a proper search engine like Elasticsearch) one can definitely make something awesome out of this data. But I have not found a way to cover those expenses sustainably, which is why I decided to go with GitHub’s free tier.

I know how to get the output path for an individual attribute.

I was seeking help on how to generate it for all attributes in nixpkgs.

@kamadorueda thanks for the advice.
Do you know of a better way to generate the outPath in a single invocation or by evaluation of a nix script?
(like the one I was attempting in my original message).

It would just need to evaluate the derivation for the outPath and not actually build it…

I’ve tried to do it, and it’s not possible. There are a few builtins.abort calls and even a few syntax errors in Nixpkgs that cannot be handled inside a Nix expression, so you cannot evaluate it all. On the other hand, nix-env and Hydra know very well how to handle those, so that’s the only way, and thus computing what you want requires many steps.

Captured my approach so far here: https://fzakaria.com/2022/01/05/computing-all-output-paths-for-every-attribute-in-nixpkgs.html

It’s far from perfect and in fact very slow lol

Hard problems are the best ones!

Remember nothing is slow if you can distribute it over 1000 machines.

:slight_smile:

Ok, I had a brainwave.

Would a post-build hook help you?

https://nixos.org/manual/nix/stable/advanced-topics/post-build-hook.html

I don’t want to build everything though.

To calculate output paths is merely an evaluation…

You want to have your cake and eat it too!! :slight_smile: Hard problem… hope you get a solution.

I have just added support for recurseForDerivations to nix-eval-jobs in pull request #62 (“Add support for recurseForDerivations”, nix-community/nix-eval-jobs), which makes it actually possible to eval nixpkgs.

Example expression:

import <nixpkgs> {
  # allowAliases is a nixpkgs config option, so it goes under `config`
  config.allowAliases = false;
}

Edit:
@fzakaria In case it wasn’t clear, nix-eval-jobs now also emits output paths.
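
For consuming that output, here is a minimal Python sketch that splits a nix-eval-jobs stream into successes and failures. The `attr` and `error` fields are the ones visible in the error output quoted earlier in this thread; the `outputs` field (output name → store path) is an assumption about how output paths are reported, so verify it against the version you run. The function name is made up for illustration.

```python
import json


def split_results(lines):
    """Separate evaluated attributes from failed ones in a JSON-lines stream."""
    ok, failed = {}, {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text `error: ...` lines mixed into the stream
        record = json.loads(line)
        if "error" in record:
            failed[record["attr"]] = record["error"]
        else:
            # `outputs` is an assumed field name, see the note above
            ok[record["attr"]] = record.get("outputs", {})
    return ok, failed


stream = [
    'error: OVMF-CSM has been removed in favor of OVMFFull',
    '{"attr":"OVMF-CSM","error":"error: OVMF-CSM has been removed in favor of OVMFFull"}',
    '{"attr":"zeroadPackages.zeroad-unwrapped","outputs":{"out":"/nix/store/6idaxvlm241fsbrn5irf0ii3k83z1fkm-0ad-0.0.25b"}}',
]
ok, failed = split_results(stream)
print(json.dumps(ok, indent=2))
print(sorted(failed))
```

Keeping the failures in a separate map makes it easy to see how much of the package set did not evaluate, instead of silently dropping it.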

I somehow missed this thread earlier. The goal here sounds pretty similar to what ofborg does to get the number of rebuilds:

We also discussed this a bit here:

How to find all reverse transitive dependencies of a package?

I think I covered in the blog post why that script doesn’t work (it’s been a while, so my memory’s a bit hazy).

It’s a bit interesting that you need such a crazy script or tools just to evaluate all attributes of the set.
nix-env can do it but you can’t do it natively with the repl.

Something like this could be very useful for a binary cache mirror :thinking:

AFAIK no script, not even those used by nix-env and Hydra, has been able to fully list all derivations, which is why derivations like darwin.apple_sdks.frameworks.CoreServices have never shown up on search.nixos.org.

I’ve spent a LOT of time on this, and have a script for it!

But first I need to mention:

  • Even excluding the syntax errors and abort problems
  • Even excluding the equality-checking problem: Nix can’t tell whether two functions are equal (e.g. if a = x: x, then a == a is false), so it can’t tell whether two attr sets are deeply equal, and therefore can’t tell whether an attr has already been explored earlier in the tree (which makes a finite tree with back references look like an infinite tree)
  • Even then, the actual attr tree is a non-converging infinite tree; it’s like a fractal. As the recursion deepens, some hash values keep changing.
  • Even if we explore the tree using BFS with a hardcoded max depth (packages can occur at any depth), the amount of RAM required explodes (I maxed out a 256 GB machine)

So.

To get around all those, I made a Deno script that abuses the hell out of 40 concurrent nix repl subprocesses, and uses iterative deepening to get around the BFS memory problem.

It has some config vars at the top for:

  • which nixpkgs hash
  • a start attr-path (defaults to root)
  • child attr names to ignore (this list can be empty but having a few things in it makes the search MUCH faster)
  • number of concurrent subprocesses (don’t use too many or there will be Nix locking errors)

Finally:

  • It writes the attr path, child attr names, and some other info to a file, one line per attr-path.
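
To illustrate the iterative-deepening part only (this is not the Deno script, and it leaves out the concurrent nix repl subprocesses), here is a small Python sketch. It assumes the attr tree can be modelled as nested dicts, with `children_of` standing in for a query to Nix; all names and the sample tree are made up. The self-reference in the sample mimics the back references that make the real tree look infinite.

```python
def children_of(tree, path):
    """Stand-in for asking Nix which child attr names exist at `path`."""
    node = tree
    for name in path:
        node = node[name]
    return sorted(node) if isinstance(node, dict) else []


def iter_paths(tree, start=(), max_depth=4, ignore=frozenset()):
    """Yield attr paths level by level using repeated depth-limited DFS.

    Only the current path is held in memory, unlike BFS, which keeps the
    whole frontier. The price is that shallow levels are re-walked each round.
    """
    def walk(path, remaining):
        if remaining == 0:
            yield path
            return
        for name in children_of(tree, path):
            if name not in ignore:
                yield from walk(path + (name,), remaining - 1)

    for depth in range(max_depth + 1):
        yield from walk(tuple(start), depth)


# Tiny fake tree with a back reference (`pkgs` points at the root again)
tree = {"hello": {}, "darwin": {"apple_sdk": {"frameworks": {}}}}
tree["pkgs"] = tree

for path in iter_paths(tree, max_depth=2, ignore={"pkgs"}):
    print(".".join(path) or "<root>")
```

Putting `pkgs` in the ignore list prunes the back reference; without it, the walk is still bounded by `max_depth`, but the number of paths grows with every level.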

NOTE: I think I used nix v2.11 with it

I think this is the only one that truly prints every attr path. It probably takes 24 hours to get to a depth of 4, so it’s still not practical for most things, but it’s progress.

It is really sad that it takes this much effort and this hacky of a solution to literally just iterate over all packages.
