Pre-RFC Discussion: Making Flakes Discoverable

jeff-hykin · April 25, 2022, 1:11am

One thing to add to the "Nixpkgs's current development workflow is not sustainable" discussion is the discoverability of out-of-tree packages. Without it, a federated system stays in the realm of the impractical.

I’m making this post because I want to do something about it, and I’ve been thinking about this problem since I first posted about the version-discoverability issue.

I think we agree Flakes are an opportunity to solve the discoverability problem in a forward-compatible way. But to my knowledge, description is the only non-functional piece of data in a Nix flake, so unless it contains a JSON/YAML/TOML string, it is not going to hold anywhere near enough data for a search system to index and deliver to the average user.

So, before I work on an RFC, has there been any prior discussion on storing static indexable information in flakes?

I’m aware of nix flake metadata, but I mean data from the flake.nix that can be parsed without the need for nix evaluation. This would be information such as a package name, keywords, descriptions, homepage links, characteristics, etc.; anything a discovery system would want to know without evaluating the entire nixpkgs tree.

ryantm · April 25, 2022, 2:46am

Related to [RFC 0123] Flake names by schuelermine · Pull Request #123 · NixOS/rfcs · GitHub

jeff-hykin · April 28, 2022, 2:43pm

Quick update:

It seems the flake.lock would be a perfect fit for saving/generating indexable data.
(milahu’s idea from that linked thread above^)

If every package in nixpkgs, and other repos, had a corresponding flake.lock entry, then an efficient scraper could trivially put a database together, watch those files across a git timeline, and build an index that would be wicked fast to search.
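
As a rough sketch of what a scraper has to work with: a flake.lock today is plain JSON that records pinned inputs under `nodes`, so it can be read without evaluating any package. The snippet below uses Nix only to stay in the thread's language (any JSON parser would do), and the inline lock contents, including the all-zero `rev`, are made-up placeholders rather than a real lock file.

```nix
let
  # Hypothetical, shortened flake.lock contents. A real indexer would read an
  # actual lock file (e.g. with builtins.readFile) instead of this inline string.
  lockText = ''
    {
      "nodes": {
        "nixpkgs": {
          "locked": {
            "owner": "NixOS",
            "repo": "nixpkgs",
            "rev": "0000000000000000000000000000000000000000",
            "type": "github"
          },
          "original": { "owner": "NixOS", "repo": "nixpkgs", "type": "github" }
        },
        "root": { "inputs": { "nixpkgs": "nixpkgs" } }
      },
      "root": "root",
      "version": 7
    }
  '';

  lock = builtins.fromJSON lockText;

  # Every node except the root describes one pinned input.
  pinned = builtins.removeAttrs lock.nodes [ lock.root ];
in
# One flat record per pinned input, ready to be stored in an index.
builtins.mapAttrs (_name: node: {
  inherit (node.locked) type rev;
  source = "${node.locked.owner}/${node.locked.repo}";
}) pinned
```

Note that this only covers what lock files contain right now (inputs). Package names, keywords and similar metadata are not in there, which is the gap this thread is about.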

jeff-hykin · May 16, 2022, 3:16pm

I’ve been writing the RFC and building my own package index.

It’s boiled down to one-and-a-half problems:

We’ve got Schrodinger’s metadata: the name/version/dependencies of a package don’t exist until we look. And when I look and when you look, we can see different name/version/dependencies, because they depend on the operating system/CUDA situation/build choices, etc.
(This isn’t a new problem, but it plays a role in the RFC.)
(The half-problem that complicates things:) We think of packages as inputs and outputs, but in practice we don’t have one flake per package. A flake can have a lot of nested packages.
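
To make the first problem concrete, here is a minimal sketch. `mkPackage` and every name and dependency string in it are invented for illustration; the point is only that the metadata is the result of a function call, so it differs depending on who calls it and with what.

```nix
let
  # Hypothetical package function: its metadata is computed from its arguments,
  # so there is no single name or dependency list to index ahead of time.
  mkPackage = { system, cudaSupport ? false }: {
    name = "example" + (if cudaSupport then "-cuda" else "");
    dependencies =
      [ "libfoo" ]
      ++ (if cudaSupport then [ "cudatoolkit" ] else [ ])
      ++ (if system == "x86_64-darwin" then [ "darwin-frameworks" ] else [ ]);
  };
in
{
  # Same package, two observers, two different answers.
  linux = mkPackage { system = "x86_64-linux"; };
  darwinCuda = mkPackage { system = "x86_64-darwin"; cudaSupport = true; };
}
```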

How can we deal with this flexibility?

Some packages have a LOT of options. Computing every combination would be problematic, but also users probably don’t want to see every combination when they’re searching. They need to see the key aspects, and can mix-and-match those aspects on their own.
I think search-functionality is a way for authors/maintainers to inform the world about a package. So I think it makes sense that it’ll be up to the package maintainer to craft input-examples that represent the available/likely options. Those inputs can then be used to make lock files that a search system can index.

The most straightforward system could be something similar to this:

Have an examples.nix file like:
```
{
  linux_x86 = { };     # set #1 of flake-like inputs
  darwin_x86 = { };    # set #2 of flake-like inputs
  cudaAndFfmpeg = { }; # set #3 of flake-like inputs
}
```
A basic examples.nix can be auto-generated for every package, and maintainers can then edit it to show more options. The default file could have one example for each major operating system.

Then a script could be used to generate a lock file for each of those^ options.
For example:

```
examples.nix
flake.nix
lockdex/
     linux_x86.lock
     darwin_x86.lock
     cudaAndFfmpeg.lock
```
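
The mapping from example names to lock files is mechanical, which is what makes it scriptable. A tiny sketch, assuming the lockdex/ layout above (the inline `examples` set stands in for `import ./examples.nix`, and actually producing the lock files is not shown):

```nix
let
  # Stand-in for `import ./examples.nix`; names taken from the listing above.
  examples = {
    linux_x86 = { };
    darwin_x86 = { };
    cudaAndFfmpeg = { };
  };
in
# One lock file path per example name. builtins.attrNames returns the names
# in alphabetical order, so the result is stable between runs.
map (name: "lockdex/${name}.lock") (builtins.attrNames examples)
```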

The 1/2-problem

The above system is nice, but it falls apart because the ecosystem is not one-flake-per-package. How packages are built is completely unstandardized.

The simple option of moving the examples.nix data to be within a package definition also won’t work because it causes the Schrodinger metadata problem all over again (examples can change with different package inputs).
So a more external solution is needed.

Changing the examples.nix to mention which package it is making examples for is a start:

```
{
  # Names below picked for easy-explaining, not actual proposal names

  whichPackage = [ "parentAttribute1" "somePackage1" ];
  # ^ e.g.  flakeOutput.parentAttribute1.somePackage1

  jsonGenerator = { flakeObject, exampleInput }:
    (flakeObject.outputs exampleInput).parentAttribute1.somePackage1;
  # packages are often just a default.nix containing a function, but not always
  # ^this generator adds the flexibility to handle overrides/overlays and other edge-cases

  examples = {
    linux_x86 = { };     # generator input #1
    darwin_x86 = { };    # generator input #2
    cudaAndFfmpeg = { }; # generator input #3
  };
}
```

This is no longer really connected to flake lock files because of the required flexibility. But that’s okay, we can have a helper function recursively scan for all the examples.nix files. For each package, the examples could be fed into the generator, with the output being converted to JSON.
```
[
  { "attr": [ "parentAttribute1", "somePackage1" ], "examples": { "linux_x86": {}, "darwin_x86": {} } },
  { "attr": [ "parentAttribute2", "somePackage2" ], "examples": { "linux_x86": {}, "darwin_x86": {} } }
]
```
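
The generate step described above could look roughly like this in Nix. It is a sketch only: `flakeObject` is a made-up stand-in rather than a real flake, the `cuda` input and the `toIndexEntry` helper are hypothetical names, and the recursive search for examples.nix files is left out.

```nix
let
  # Hypothetical stand-in for a flake: `outputs` takes an input set and
  # returns nested packages, as in the examples.nix shown earlier.
  flakeObject = {
    outputs = inputs: {
      parentAttribute1.somePackage1 = {
        name = "somePackage1";
        cuda = inputs.cuda or false;
      };
    };
  };

  # Same shape as the examples.nix above, written inline instead of imported.
  spec = {
    whichPackage = [ "parentAttribute1" "somePackage1" ];
    jsonGenerator = { flakeObject, exampleInput }:
      (flakeObject.outputs exampleInput).parentAttribute1.somePackage1;
    examples = {
      linux_x86 = { };
      cudaAndFfmpeg = { cuda = true; };
    };
  };

  # Feed every example input through the spec's generator.
  toIndexEntry = spec: {
    attr = spec.whichPackage;
    examples = builtins.mapAttrs
      (_name: exampleInput: spec.jsonGenerator { inherit flakeObject exampleInput; })
      spec.examples;
  };
in
# With a real scan this list would hold one spec per examples.nix found.
builtins.toJSON (map toIndexEntry [ spec ])
```

Evaluating it (for instance with `nix-instantiate --eval --strict`) gives a JSON string in roughly the shape of the listing above, except that each example holds the generator's output instead of an empty object.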

Not a solution I’m very happy with, but one that would address making packages discoverable.