Nixlang: how do you find all uses of a declaration?

amjoseph · March 26, 2022, 10:12pm

Hi, this is the same question I asked in this PR comment.

In nixlang, how exactly does one go about figuring out all the places in a codebase that reference a given identifier?

In the other languages I know well – Haskell, ML, Rust, Coq, C/C++, and Java – I’m accustomed to simply deleting a declaration and then recompiling to figure out “what uses this declaration?” The compiler will tell me all the places where it is used by producing a compilation error for each of those places.

How do I do this in nixlang?

I’m starting to wonder if it is, in fact, impossible. I used to consider myself an expert in Javascript (15 years ago when it was a much simpler language) and even wrote a JIT for it. I don’t know about today, but in Javascript as it existed back then it was impossible.

Despite having a PL background I’ve also realized that I don’t know of the technical name for this language property. My best guess would be “statically resolvable identifiers”.

If nixlang does in fact lack this property I can’t help but suspect that it will someday make maintenance of large-scale codebases extremely difficult. Perhaps that day has already arrived. Clearly a lightweight, dynamically-typed, simple language like nixlang is the right choice for writing package expressions, since these are (ultimately) “bash metaprograms” and bash is about as not-statically-anything as it gets. Given the overlap between the Haskell and nixlang communities, has anybody tried rewriting all of the inter-package stuff (i.e. nixpkgs/lib/ and nixpkgs/pkgs/stdenv/) in Haskell? Basically using Haskell to orchestrate single-file nix expressions. You’d also get the huge benefit of having real types to document and check interfaces at the inter-file boundaries, where there is the most risk of incorrect assumptions. Of course all the types still get erased when you call from Haskell to nixlang.

I’m sure nobody would be enthusiastic about having nix depend on GHC. OTOH Haskell98 was an incredibly simple language that is easy to write interpreters for.

/me ducks tomatoes thrown by audience

Artturin · March 26, 2022, 10:15pm

post a link to this in the nix repos issues

amjoseph · March 26, 2022, 10:21pm

Done: Nixlang: how do you find all uses of a declaration? · Issue #6325 · NixOS/nix · GitHub

sternenseemann · March 26, 2022, 10:29pm

Nix has some types of lexical bindings which you can statically reason about. However, they are inconvenient, since if you want to import something you have to pass anything in separately or you loose this property. Because of this we usually stuff things into attribute sets (e.g. lib is a big attribute set containing functions).

Attribute sets, crucially, are computed and even lazy, so you can’t statically reason about them. Additionally, these properties are crucial to how nixpkgs works.

This leaves us with an answer to your initial question: The simplest (or most reliable?) way is to actually evaluate everything in every way you think is relevant to figure out the repercussions of removing something. This is why ofborg has a multiple eval checks and nixpkgs contains many files for this purpose. Your case just slipped through the testing net as ofborg doesn’t seem to check cross evaluation currently.

amjoseph · March 26, 2022, 10:36pm

To be clear, there are two distinct forms of laziness here. Haskell is a lazy language, but still has statically-resolved identifiers.

My understanding here is that the reason this is crucial is that it lets you avoid having to parse every single file in all of nixpkgs simply in order to evaluate one expression.

I agree: that property is crucial and must be preserved.

This is why I suggest only lib and pkgs/stdenv should have statically-resolved identifiers (and perhaps types as well). You need to parse nearly all of that anyways right now in order to evaluate any one package expression.

I agree with all of this except the “most reliable” part. Clearly it didn’t work here. There are too many “entry points” into nixpkgs for this to be reliable. Testing is important, but it isn’t a replacement for types (and you could say that statically-resolved identifiers are the most basic form of types).

amjoseph · March 26, 2022, 10:40pm

By the way, are the expressions for these eval checks (and hydra’s as well) exhaustively specified in the nixpkgs repo somewhere?

I appreciate your answer here but I remain mystified as to where you looked in order to find that answer. Give a man a fish, feed him for a day; teach him to fish, feed him for a lifetime…

pennae · March 26, 2022, 10:56pm

it totally isn’t. not just because attrsets are lazy and indexable by arbitrary stringly expression, but also because any binding can leak into a derivation (and thus into an actual shell script). in many ways nix has the same problem of combinatorial explosion as the C preprocessor does when used as a configuration mechanism due to its laziness as well (which ofborg tries to exhaust, with varying degrees of succes).

amjoseph · March 27, 2022, 2:09am

That’s what I thought.

Well, I would be happy enough with a definition of “used” that included “antiquoted into any string which became part of a derivation” regardless of whether or not the derivation’s execution is actually affected by it. In other words, builder = "if false; then ${foo}; else true; fi" is considered to “use” foo.

But yeah, the big problem is that whereas C++ distinguishes between struct and boost::unordered_map, nixlang uses attrsets for both of these things.

vcunat · March 27, 2022, 7:16am

One practical issue is that NixPkgs is very large nowadays and thus unavoidably very expensive to fully evaluate. I don’t anticipate that much opportunity to cut down the costs by limited evaluation (if someone wrote code specifically for this purpose).

I’d probably approach this by sensible use of identifiers, so that grepping will get you what you want in most cases.

pennae · March 27, 2022, 11:55am

that would not be sufficient. derivation takes an attrset as a function argument, and every attr in there is exposed to the builder as an environment variable. try this and look at the output:

derivation {
  name = "drv";
  system = builtins.currentSystem;
  builder = "${runtimeShell}";
  args = [ "-c" "${coreutils}/bin/env > $out" ];
  something = 42;
}

you’d not just have to track identifier uses, you’d also have to track set merges and builtin functions to varying degrees of complexity and/or impossibility. take a simple definition of filterAttrs:

filterAttrs = fn: as: listToAttrs
  (concatMap
    (a: if fn a then [ { name = a; value = as.${a}; } ] else [ ])
    (attrNames as))

your definition of “used” would have to include potential uses of the name of an attribute to even have a chance of tracking uses through this function, and by then it’s broad enough to not be useful any more.

Ericson2314 · March 27, 2022, 1:23pm

Laziness, especially lazy imports come into play, but the fundamental issue (with Nix as well as any untyped) is that lexically scoped identifiers are tangled up what’s more properly thought of as “map keys”.

In all those typed languages you mentioned, you also have little way to assume that map lookups succeed in finding something in the map.

The problem isn’t so much that Nix is a “fucked up language” — e.g. getting rid of with doesn’t actually fix this problem — but in in dynamic languages it is simply too ergonomic to to use maps,

Thank goodness we don’t use builtins.scopedImport in practice. What that means is that the sort of bug that make-bootstrap-tools-cross: fix typo which prevents hydra eval · Pull Request #165733 · NixOS/nixpkgs · GitHub fixes should be findable by an evolution mode that “eagerly” follows all imports and just checks for (static) binding errors.

with still defeats such an analysis, forcing it to optimistically assume the variable is bound, but that is OK. Replacing with ...; with let inherit (...) <whitelist>; can help improve the ability of such an analysis.

amjoseph · March 28, 2022, 5:16am

Yes, I know. All of the attrs passed to the derivation are “used by” it. I ran your example and wasn’t shocked or surprised by anything in the output. I didn’t see anything unexpected “leaking” there.

Your comment about filterAttrs is a great example of (as you put it) the fact that “attrsets are lazy and indexable by arbitrary stringly expression”. This fact is definitely the source of all the problems.

Wow, I did not even know that existed. It isn’t documented in the nix manual. Probably because people shouldn’t use it. Yes, I too am thankful we aren’t using it.

sternenseemann · March 29, 2022, 12:41pm

Not just that: If you had a hypothetical strict Nix evaluator that could evaluate a nixpkgs worth of code instantly, it wouldn’t be able to evaluate nixpkgs, because it relies on laziness for building package sets. Especially in cross we are dealing with multiple mutually recursive fixed points that would cause any strict evaluator to lock up forever.

In that sense I can respond to your comment,

that while this is true to an extent, you really have to think of it as more of a design decision (at least at this point): We are actively relying on this property and can not be thought of as an objective “problem” of Nix we can solve by fundamentally changing how attribute sets work.

Interestingly, Nix is strict in one way: The attribute names present in a map are strict, i.e. if you evaluate a single key or check for a single key, Nix will evaluate everything necessary to determine which keys are in a attribute set. This property already causes trouble in some stdenv bootstrapping related portions of nixpkgs, so there would be some interest to make them lazy as well.

CI only exrpessions are to be found in the linked ofborg repository. Expressions that define Hydra jobsets are usually tracked in the nixpkgs repository, indeed. By convention these are called release*.nix. ofborg should probably use some of the release expression that are used outside pkgs/top-level/release.nix and nixos/release.nix.

I think one lesson we can take away from this is that we should make nixpkgs evaluation less complicated over time. There are indeed many entry points and different ways to evaluate different parts of it. One concret step towards this could be to make cross the normal entry point.

amjoseph · April 2, 2022, 7:27pm

I was not proposing to change the nix binary to use strict evaluation while leaving all the nixpkgs code exactly the way it is. Obviously that would break lots of things in many different ways.

Let me be more explicit; I should have said:

Back to your comment:

Yeah, I’m still somewhat undecided about this.

On one hand, I’ve spent the last week digging around in the internals of make-derivation.nix and am totally impressed with the way it approaches cross-compilation (specified using natural deduction no less!). The nested fixpoints-within-fixpoints (at least three levels deep, maybe more) involved are clearly an excellent showcase for a situation where a lazy-by-default language is a big win. It’s also exceptionally clean, organized, and well-documented code (this applies to pkgs/stdenv/* in general). Trying to rewrite this in ML with thunks would produce a fragile mess; the flow control is implicit and ought to be implicit.

On the other hand I spent most of this morning fighting with nixpkgs trying to make it not “encounter infinite recursion” when curl depends on file (this is a longstanding upstream dependency via ./configure which nixpkgs has been ignoring). This experience has made me less enthusiastic about the lazy-by-default situation. The boilerplate “Note: this package is used for bootstrapping fetchurl” slapped onto so many packages is a sign that maybe implicit laziness is causing headaches here.

Yes, I know that it’s a design decision. And I know that it won’t be changing anytime soon.

But I think it’s helpful to understand what the truly important uses of this decision are. Here’s my list so far:

Lazy parsing of .nix files, so we don’t need 24,000 calls to fopen() for every eval
Internally within stdenv, especially related to bootstrapping. This includes dependencies between multiple invocations of nixpkgsFun (stages.nix, cross-compilation, etc).
Between packages at top-level.nix – i.e. between the individual packages of a single invocation of nixpkgsFun

The first point is sort of unique to nixlang; most lazy languages (eg Haskell) don’t share this property.

I’m sold on laziness for the second point.

I’m not entirely sold on the last point. Leaving the inter-package dependency graph implicit is definitely tempting, but there are a lot of footguns in there like membership in the “fetchpatch clique” which is expressed in prose documentation rather than in code. When things go wrong it can be really hard to figure out where the infinite recursion is coming from.

None of this is going to change in the next decade, but understanding the “essential uses” of laziness is helpful to me in understanding the “zen of nix”.

Is there an example use case for this? I get the undefined // { x = 3; } example, but do people actually write code like this?

Ah, thank you so much. I assumed that release*.nix was just for making releases. Now that I understand that it is really more of a “this is what hydra does.nix” it makes more sense. Thanks!

I absolutely agree with this, 100%.

sternenseemann · April 12, 2022, 1:07pm

This fact is not to blame on lazy evaluation, but rather on Nix’s pure approach: If the outPath of the derivation depends on the complete build description of itself and all its dependencies, then it is clear that this computation can never finish in the case of an (indirect) circular dependency between file and curl, for example. This is not to blame on laziness, but would also happen with strict evaluation. Breaking up circular dependencies will always be a thing you need to do in bootstrapping such a package description system (in a way it’s a bootstrapping problem in and of itself) – in nixpkgs it takes the shape of the “fetchpatch clique”.

Laziness rather helps to avoid infinite recursion in cases where strict languages would cause one in absence of an actual circular dependency (e.g. a package taking itself as an input to use in passthru.tests which doesn’t influence the derivation itself).

I seem to remember that it would’ve been useful in some areas of stdenv bootstrapping, especially when dealing with cc-wrapper’s passthru attributes, but a concrete case escapes me. I also never tested this stuff with a lazy attribute names patch, so my intuition may very well be wrong there.

amjoseph · May 8, 2022, 6:48am

Yes, my point is that strict evaluation would have made the problem obvious much earlier in the development process. It forces evaluation order to be explicit. It would, effectively, force the “fetchpatch clique” to be a first-class language object (a set/list of packages) passed to the fetchpatch derivation, rather than a convention documented by a bunch of comments.

That said, strict evaluation is way more work. Laziness was the right choice for nix, but it isn’t without cost.

I’ve been very tempted lately to write a nixlang extraction for Coq. I’m surprised nobody has (there is no mention here). Obviously the Coq type for nixlang values would be coinductive since nixlang is lazy and its values are potentially-infinite. It would let you mix strict and strongly typed library/coordination code with lazy and dynamically typed packaging code, which I think uses each type of language where it is best suited.

I think would be a really nice match for deviously tricky routines, like the stdenv bootstrap and the cross-compilation stage shift, with the benefit of being able to get machine-checked proofs of key properties instead of simply documenting them and hoping nobody breaks invariants somewhere down the line. That said, I expect nixpkgs to accept a Coq as a prerequisite right around the time when pigs fly.

amjoseph · May 31, 2022, 7:41am

Posting here to answer my own question in case anybody else stumbles across this thread.

Yes! People do write code like this, and I just did without realizing it. Fortunately there was this wonderful comment here waiting to greet me (how thoughtful!) when I screwed up:

# An infinite recursion here can be caused by having the attribute names of expression e in .overrideAttrs(finalAttrs: previousAttrs: e) depend on finalAttrs. Only the attribute values of e can depend on finalAttrs.

The overrideAttrs mechanism is implemented with a fixpoint. Because of the strictness of the set of attribute names, functions composed to form the fixpointed functional must not use the existence of an attribute in the final result to decide whether or not to adjoin an additional attribute (not even a different one, because the entire set of attrnames is strict, forced as a single thunk rather than a separate thunk for each attribute name).

Now I understand why pkgs/stdenv/adapters.nix has extendMkDerivationArgs but doesn’t have overrideMkDerivationArgs.