When should `meta` attributes be computed vs. raw data?

rhendric · April 6, 2024, 12:11pm

There are a small number of meta attributes that are not raw data but computed from various other parts of a derivation, but in fairly bullet-proof ways. It’s currently possible, for example, to collect a list of all package attribute paths in Nixpkgs and toJSON all of their metas, without error.

There is currently a proposal, implemented and then backed out after it broke evaluation, to add a meta.repository attribute that is computed based on src. src is not a bullet-proof source of data; many packages have src attributes that can fail. I think this is a bad idea, even in its current proposed implementation (which does not seem to break evaluation but would break a toJSON of all packages’ metas, among other potential use cases I’m not thinking of). My position is that meta attributes should be bullet-proof, either because they are raw data or because they are computed in ways that we can be certain won’t fail.

The reason that the proposer wants meta.repository to depend on src is so that it can be automatically populated when not specified. The logic is trivial; it simply uses src.meta.homepage in that case. I think a more robust design would be for consumers of meta.repository (the central example being search.nixos.org) to reference src.meta.homepage themselves, if they want to (and can handle the cases when src blows up).

The proposal is being discussed in Feature request: new meta attr for source code repository · Issue #293838 · NixOS/nixpkgs · GitHub; I’ve requested that the current implementation not be merged until more of the community has had a chance to look at this, as for the last several weeks the only real participants have been a small number of people without much experience in making Nixpkgs-wide architecture decisions. Please weigh in.

CCing @a-n-n-a-l-e-e; I can’t find lolbinarycat’s handle here, if they have one.

AndersonTorres · April 6, 2024, 8:57pm

As I said before, I prefer no computation inside Nix, especially when the thing is amenable to automation.

I can’t explain why, but I do not like to rely on Nix outside its builder description field.

About that particular case, I believe it can be done in an automated treewide:

fetchFrom.* can include meta.repository = [ . . . ];
fetch{git,hg,darcs} etc. too
treewide meta.repository = src.meta.repository; for cases like above
leftovers in a case-by-case basis

No serious argument besides “oh that’s too much commits” (followed by groundless urgency pressures from novices) were posed.

RaitoBezarius · April 7, 2024, 8:56am

Did you test getting all meta attributes of all packages for all architectures or are you just speculating?

Meta attributes already depend from src for changelogs.

Given the techniques we already use to evaluate the set of packages, I don’t really see the validity of the objections. toJSON can always work by repairing the string coercion of src in addition.

AndersonTorres · April 7, 2024, 6:15pm

This is valid for absolutely all packages? Or only for a small bunch of them, always locally?
Is there any piece of code in check-meta generating changelog links using doubtbul heuristics based in undocumented fringes of Nixpkgs code?

Or are you just especulating?

RaitoBezarius · April 7, 2024, 6:16pm

I didn’t say “all”, :).

AndersonTorres · April 7, 2024, 6:22pm

Any of them in check-meta?
Or all of them in their respective default.nix files?

Because there is a huge distance between changelog = . . . /${src.rev}/. . . locally and repo = src.meta.homepage globally.

(Or maybe you implicitly agreed with the second point…)

rhendric · April 8, 2024, 2:35am

AndersonTorres’s post is my question too. If an attribute like meta.changelog has a default value that depends on src, that’s my argument entirely sunk (but, having searched Nixpkgs for such a thing, I don’t think this is the case). But if individual packages define their meta.changelog by referencing their own src, that’s just a redundancy factored out—those packages can make sure that their own src never fails, whereas a default definition of meta.changelog could not (just like the proposed meta.repository can’t).

I guess ‘computed’ vs. ‘raw’ isn’t quite the right distinction; I’m not concerned with packages that compute their own meta attributes. My actual concern is fallibility, and as a proxy for that, attributes that are computed by code external to the package that makes assumptions that the package may not meet.

Here’s my test procedure: first run nix-env -qaP --no-name -f ./. --option system x86_64-linux > universe.txt from a checkout of Nixpkgs, then nix-instantiate --eval test.nix, where test.nix contains this script:

let
  pkgs = import ./. { system = "x86_64-linux"; };
  inherit (pkgs) lib;
  names = lib.attrNames pkgs;
  f = builtins.readFile ./universe.txt;
  paths = lib.init (lib.splitString "\n" f);
in
builtins.toJSON (map (p: (lib.getAttrFromPath (lib.splitString "." p) pkgs).meta) paths)

This works, but it doesn’t appear to work on alternate systems (on many of the ones I’ve tried, the first step fails; on aarch64-linux, the first step succeeds but there are individual packages that abort the second step). I’m a little concerned about that, but (A) it’s possible I’m screwing things up somehow and (B) I don’t think that’s license to make things even worse—you could still do quite useful things with such a script even if it only ever ran on x86_64-linux.

I don’t understand what this means; can you clarify?

jonringer · April 8, 2024, 2:39am

As I said before, I prefer no computation inside Nix, especially when the thing is amenable to automation.

I still find changelog links and similar “version or other metadata related” metadata to be useful, I think moderate computation is fine as long it’s likely to reflect upstream conventions and not be frequently prone to update/evaluation errors.

RaitoBezarius · April 8, 2024, 7:34am

I guess this is here where we disagree: what does it mean “package can make sure their own src never fails” ? In my experience of nixpkgs committer, this cannot be guaranteed.

More generally, anything that is not systematically checked by CI cannot be guaranteed.

As you made the experiment below:

You are not screwing up things, this is working as intended and what I hinted to.

Evaluation of nixpkgs doesn’t take place like this in Hydra because we know that it can fail, to perform this properly: ofborg/ofborg/src/outpaths.nix at 3fd6b66cd36ef2ec7adbb23370007604f02ebcfb · NixOS/ofborg · GitHub here’s a way to do this.

if src is a structured meta field that is not a string, src = src // { __toString = src.someStringOfSrc; } will give you a way to ensure that toString src == src.someStringOfSrc; and this will be used during toJSON.

(Disclaimer: I am a Tvix developer and I had my nose into toJSON few days ago.)

rhendric · April 8, 2024, 3:29pm

The sorts of packages that cause problems for this scheme have explicitly conditional srcs—they test for system configurations or other conditions and abort if they are not met, or they use requireFile, or something similar. It is easy to have a policy for package authors not to reference such srcs from meta—and yes, we can and should guarantee it by running an appropriate version of the toJSON test on meta in CI.

Right, yeah, an outpaths.nix-style generation of the universe is actually the sort of thing I meant. The nixpkgs-update infrastructure also has a similar way of finding all packages with update scripts. I’m aware it’s a messy process.

Again, that doesn’t mean we should make things worse by adding defaulting logic to meta that breaks more things. We shouldn’t give in to broken window syndrome.

But how is that relevant to meta.changelog values that reference src.rev, or the proposed meta.repository referencing src.meta.homepage? The point is to ensure that meta is toJSONable, not something that has src itself as an attribute.

RaitoBezarius · April 8, 2024, 3:54pm

It is unclear to me what this policy bring to us, especially given it requires one step before being enforced (policy without CI check is useless here): someone to line up and do the work for CI; are you willing to do so? I am very happy to see improvements in the precision of our meta, I did a lot of work as a tvix developer to fix many issues in nixpkgs, not only for meta but also spec violation of base64 encodings of SRI hashes, and I continue to do so.

BTW, FYI, abort is imprecise here. abort cannot be caught by Nix, only throw can be caught. Those two things are very different, thus. abort for fatal errors, throw for local errors.

It’s not a messy process IMHO.

I don’t think it’s a broken window syndrome, evaluation of a package requires some harnesses, that’s a given as we are operating.

Short of introducing a Result<Evaluation, Error> type in Nix, it seems that all attempts to improve this by policy are artificial limitations we impose ourselves for benefits I cannot grasp?

meta is always toJSONable as long as everything is coercible to strings in the leaves, I just provided a way to make any attribute set coercible to strings. But maybe, I misunderstood your point on meta.repository.

rhendric · April 8, 2024, 4:35pm

I’m sorry I’m failing to communicate the thing I think is important here.

Yes, evaluating an entire package requires a lot of context, and that’s unlikely to change.

Evaluating just meta should not. Currently does not. Nothing needs to change here except for, ideally, adding a test in CI to make sure that it doesn’t break going forward. If adding that test is the price of having this idea taken seriously, I’ll pay it.

As long as meta continues not to be fallible, automation infrastructure can build offline databases of packages and their meta information, which can be much faster to access and search than reevaluating all of Nixpkgs. That’s the benefit. I care about improving our automation infrastructure, so this isn’t questionable to me but a real approach I’ve experimented with for various prototypes. The currently fallible part of this sort of thing is accurately generating the list of package attribute paths that are accessible on a given system, which is a separate thing I’d like to make better but not what I’m trying to preserve right now.

Unlike the rest of a package definition, there shouldn’t really be a need for meta to depend on any amount of fiddly context. It’s metadata about the package. It should be made of data literals, or data computed from things we know are infallible. We don’t need one meta.changelog on one system and a different one on another. Leaving open the possibility for that is silly. And it means trading away the utility of meta as data that can be loaded into aforementioned offline databases.

The proposed meta.repository attribute is reasonable as a meta attribute that follows those rules. The proposed default implementation of meta.repository as something that accesses src would not do so by converting src into a string, but by accessing src.meta.homepage. It is reasonable for packages to use different srcs on different systems, which means it’s reasonable for src to be system-dependent and fallible. That means abandoning meta as system-independent, infallible data.

Does that adequately detail the benefit I’m trying to protect?

RaitoBezarius · April 8, 2024, 4:42pm

OK, that’s a goal I can understand.

OK, automation infrastructure does not work like this AFAIK, it knows that evaluation can fail and perform batch evaluation via things like nix-eval-jobs, so again, if that’s the reason for why you want to do the above, I fail to comprehend.

Do you have examples of automation infrastructure depending on this critically, e.g. cannot afford to do batch evaluations?

I can understand this philosophy.

So you say “system-independent”, but are you aware that the vast majority of meta is not system-independent, for reasons beyond meta.changelog ? I think this comes again to a problem of being sufficiently familiar with stdenv (or doing enough certain sorts of nixpkgs work?).

And I fail to see how you will make folks move from the system dependance of meta.

I understand the benefit you are aiming for. I cannot see this goal reachable on the short term without a large re-engineering of nixpkgs or splitting meta into two fields.

Or… just accepting that it’s fallible data and recover from this.

(I say this as someone who implemented incremental ingestion of evaluation results: nix-security-tracker/src/website/shared/evaluation.py at 5a72910c7a65a159d964c593e433ad3c10f5fa22 · Nix-Security-WG/nix-security-tracker · GitHub and logged Evaluation memoization service · Issue #355 · NixOS/infra · GitHub in case my background regarding this problem space is unclear.)

rhendric · April 8, 2024, 5:59pm

Well, here’s what nixos-search does:

github.com

NixOS/nixos-search/blob/0f14890c793f0ee4cf800231d20e4c64e52be471/flake-info/src/commands/nixpkgs_info.rs#L12-L24


      
          let mut command = Command::new("nix-env");
          command.add_args(&[
              "--json",
              "-f",
              "<nixpkgs>",
              "-I",
              format!("nixpkgs={}", nixpkgs.to_flake_ref()).as_str(),
              "--arg",
              "config",
              "import <nixpkgs/pkgs/top-level/packages-config.nix>",
              "-qa",
              "--meta",
          ]);

This grabs a universe of packages from Nixpkgs and converts their metas into JSON, along with some other derivation information.

Now, the reason I used my more complicated test script in my previous message instead of this simpler invocation of nix-env with --meta is because nix-env does exactly the thing you’re talking about: it recovers from evaluation failures. But it doesn’t do this well; when there are evaluation errors inside meta, parts of meta get dropped on the floor. Here is the result of nix-env --json -f ./. --arg config 'import pkgs/top-level/packages-config.nix' -qa --meta -A cplex from the meta.repository branch with the actual meta.repository logic patched out:

{
  "cplex": {
    "meta": {
      "available": false,
      "broken": false,
      "description": "Optimization solver for mathematical programming",
      "homepage": "https://www.ibm.com/be-en/marketplace/ibm-ilog-cplex",
      "insecure": false,
      "license": {
        "deprecated": false,
        "free": false,
        "fullName": "Unfree",
        "redistributable": false,
        "shortName": "unfree"
      },
      "maintainers": [
        {
          "email": "bernard.fortz@gmail.com",
          "github": "bfortz",
          "githubId": 16426882,
          "name": "Bernard Fortz"
        }
      ],
      "name": "cplex-128",
      "outputsToInstall": [
        "out"
      ],
      "platforms": [
        "x86_64-linux"
      ],
      "position": ".../pkgs/applications/science/math/cplex/default.nix:81",
      "sourceProvenance": [
        {
          "isSource": false,
          "shortName": "binaryNativeCode"
        }
      ],
      "unfree": true,
      "unsupported": false
    },
    "name": "cplex-128",
    "outputName": "out",
    "outputs": {
      "out": null
    },
    "pname": "cplex",
    "system": "x86_64-linux",
    "version": "128"
  }
}

And here it is with the new meta.repository attribute (cplex is one of the packages for which src evaluation is an issue):

{
  "cplex": {
    "meta": {
      "available": false,
      "broken": false,
      "description": "Optimization solver for mathematical programming",
      "homepage": "https://www.ibm.com/be-en/marketplace/ibm-ilog-cplex",
      "insecure": false,
      "license": {
        "deprecated": false,
        "free": false,
        "fullName": "Unfree",
        "redistributable": false,
        "shortName": "unfree"
      },
      "maintainers": [
        {
          "email": "bernard.fortz@gmail.com",
          "github": "bfortz",
          "githubId": 16426882,
          "name": "Bernard Fortz"
        }
      ],
      "name": "cplex-128",
      "outputsToInstall": [
        "out"
      ],
      "platforms": [
        "x86_64-linux"
      ],
      "position": ".../pkgs/applications/science/math/cplex/default.nix:81"
    },
    "name": "cplex-128",
    "outputName": "out",
    "outputs": {
      "out": null
    },
    "pname": "cplex",
    "system": "x86_64-linux",
    "version": "128"
  }
}

Notice how meta.sourceProvenance, meta.unfree, and meta.unsupported are missing! A different recover-from-failure algorithm might have dropped the entire package! Both of these are bad!

I’ll freely admit I’m an inexperienced dummy. Please lend me the benefit of your superior experience. From the list of all of the meta attributes that check-meta.nix permits, these three stand out to me as legitimately system-dependent.

available
broken
unsupported

But they’re simple booleans, and the logic that implements them seems to be pretty good at returning a default true or false as appropriate if anything is screwy.

Here are the remainder:

badPlatforms
branch
changelog
description
downloadPage
executables
homepage
hydraPlatforms
insecure
isBuildPythonPackage
isFcitxEngine
isGutenprint
isHydraChannel
isIbusEngine
knownVulnerabilities
license
longDescription
mainProgram
maintainers
maxSilent
name
outputsToInstall
pkgConfigModules
platforms
position
priority
schedulingPriority
sourceProvenance
tag
timeout
unfree
version

Which of these are system-dependent? Is it really a ‘vast majority’?

RaitoBezarius · April 8, 2024, 6:16pm

rhendric:

Now, the reason I used my more complicated test script in my previous message instead of this simpler invocation of nix-env with --meta is because nix-env does exactly the thing you’re talking about: it recovers from evaluation failures. But it doesn’t do this well; when there are evaluation errors inside meta, parts of meta get dropped on the floor. Here is the result of nix-env --json -f ./. --arg config 'import pkgs/top-level/packages-config.nix' -qa --meta -A cplex from the meta.repository branch with the actual meta.repository logic patched out:

Just FYI.

This is literally implementing a builtins.tryEval logic, you don’t need nix-env to do that and that’s an antipattern actually that meta is special-cased IMHO with a meta checker.

I asked:

Do you have examples of automation infrastructure depending on this critically, e.g. cannot afford to do batch evaluations?

nix-env is literally doing batch evaluation as you showed, thus, error recovery is built in from bad evaluations.

People who does this sort of work use nix-eval-jobs which is just an implementation focused on doing that and much easier to grok.

The fact that nix-env has failure modes which are dumb is unrelated to the problem space at hand, we should fix nix-env, not prevent those usecases !

rhendric:

I’ll freely admit I’m an inexperienced dummy. Please lend me the benefit of your superior experience. From the list of all of the meta attributes that check-meta.nix permits, these three stand out to me as legitimately system-dependent.

Apologies, my words were not precise enough, I meant that the vast majority of usage of meta involves system dependencies, e.g. broken, available and unsupported.

hydraPlatforms can, knownVulnerabilities could (I don’t think we had this in the past though so that’s an extreme case), position could (split architectures definition files), sourceProvenance could: not all platforms do possess the source or can afford to build the source, unfree hence can be influenced by that.

I do think that’s already a lot in terms of usage, and I do provided rationale for why at least 5 more could be used in that context.

Bottom line, the way I think things is in layers and in fundamentals.

Fundamentally, you can evaluate all meta for search, for any automation, even if that meta is broken, you can just completely skip the attribute if you implement the evaluation logic right in the interpreter, let’s not work around interpreter outright limitations by imposing limitations on nixpkgs. Nix should serve Nixpkgs is my vision of things. We already did a lot of workarounds in Nixpkgs for Nix bugs (disallowedReferences and what not comes to mind), let’s not build on the top of ideas like those further IMHO.

7c6f434c · April 9, 2024, 3:58pm

A treewide addition for simple cases does not, per se, guarantee the coverage going forward.

(ETA PS: I do generally want the least magic meta possible, but on the objective level there is a serious argument against the position I prefer)

AndersonTorres · April 9, 2024, 5:22pm

Whae is a “simple case” here? Updating the fetchers is not what I would call “trivial”.

7c6f434c · April 9, 2024, 6:18pm

If we say that fetchFromGitHub in src implies meta.repository, and apply this treewide, we also need something for the packages added later.

AndersonTorres · April 9, 2024, 7:13pm

Nope. It implies src.meta.repository.
Using it in meta should be explicit.

It can be enforced for future additions. That’s just one line.

The current piece of code trying to be imposed already pressuposes its obligatoriness “in cases of failure”.

binarycat · April 30, 2024, 3:17am

Nope. It implies src.meta.repository.

author of the meta.repository PR here. this is only the case in your hypothetical ideal implementation, not in the one that already exists.

the implementation that already exists (that i wrote) use src.meta.homepage, a field that fetchFromGitHub and friends already set.

there is no need to change the definitions of any fetchers.