How to extract signal from an almost useless error message

Puzzle

I wonder how long it would take a Nix expert to understand the problem reported by the following error message:

builder for '/nix/store/wc4g9sq091cbrih86wd83p947l6jzcxf-channel-rust-1.50.0.toml.drv' failed with exit code 1; last 7 log lines:
  
  trying https://static.rust-lang.org/dist/2021-02-26/channel-rust-1.50.0.toml
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  curl: (22) The requested URL returned error: 404 
  error: cannot download channel-rust-1.50.0.toml from any mirror
(use '--show-trace' to show detailed location information)

With a flakes-enabled Nix, you can generate the message for yourself with the following one-liner:

nix build github:jacg/naersk-test-drive/useless-error-message

which also indicates where the source code can be found.

Answer

A working version of the code can be seen in action here:

nix build github:jacg/naersk-test-drive/nightly-cargo-stable-rustc

Puzzle

I wonder how long it would take a Nix expert to understand the problem reported by the following error message:

builder for '/nix/store/wc4g9sq091cbrih86wd83p947l6jzcxf-channel-rust-1.50.0.toml.drv' failed with exit code 1; last 7 log lines:
 
 trying https://static.rust-lang.org/dist/2021-02-26/channel-rust-1.50.0.toml
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 curl: (22) The requested URL returned error: 404 
 error: cannot download channel-rust-1.50.0.toml from any mirror
(use '--show-trace' to show detailed location information)

What else do you expect this to say? It tries to download something that is not present upstream. It fails and reports exactly that.

Looking at the code, you are trying to use nightly instructions for stable Rust, and downloading the stable manifest from today’s directory on the upstream distribution server. That does not work, naturally. I do not even use nightly Rust, ever, but in five minutes I found out that «date=".";» actually works, for pretty natural reasons (it took me a DuckDuckGo search to find out where the stable channel-rust-*.toml is supposed to be in the first place). It is still a hack, of course, as you are using nightly procedures for stable releases, but it works well and for understandable reasons: per-release manifests are far less numerous than nightlies, so they can be obtained directly at a higher directory level, and «.» stays in the same directory. Unlike «..», which has much more annoying security implications, «.» is quite likely to work on static sites.
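To make the directory layout concrete, here is a minimal sketch of how the manifest URL appears to be put together. It is a simplified model, not the overlay's actual code: `manifestUrl` and `distRoot` are hypothetical names, and the URL scheme is inferred from the log line in the error message above.

```nix
# Simplified model of how the manifest URL seems to be assembled.
# `manifestUrl` is a hypothetical helper, not the overlay's real function.
let
  distRoot = "https://static.rust-lang.org/dist";

  manifestUrl = { channel, date ? null }:
    if date == null
    then "${distRoot}/channel-rust-${channel}.toml"
    else "${distRoot}/${date}/channel-rust-${channel}.toml";
in
{
  # No date: the per-release manifest sits directly under dist/.
  undated = manifestUrl { channel = "1.50.0"; };

  # Date from the failing example: this is the URL that returned 404.
  dated = manifestUrl { channel = "1.50.0"; date = "2021-02-26"; };

  # The «.» hack: "dist/./..." names the same directory as "dist/...".
  dotHack = manifestUrl { channel = "1.50.0"; date = "."; };
}
```

Saved as, say, `manifest-url.nix`, the three strings can be inspected with `nix-instantiate --eval --strict manifest-url.nix`, without downloading anything.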

With a flakes-enabled Nix, you can generate the message for yourself with the following one-liner:

nix build github:jacg/naersk-test-drive/useless-error-message

which also indicates where the source code can be found.

Answer

A working version of the code can be seen in action here:

nix build github:jacg/naersk-test-drive/nightly-cargo-stable-rust

So you consider a non-rebuildable expression that works only due to caches an answer? Please, if you miss some information — which is perfectly fine, that’s how people learn — do not claim a solution that requires basically «guessing» (I know, copying from another place) a hash is the correct answer.


Looks like you are requesting non-existent release. I would just omit the date argument.

The “answer” does not work for me either:

$ nix build github:jacg/naersk-test-drive/nightly-cargo-stable-rust
error: --- FileTransferError ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- nix
unable to download 'https://api.github.com/repos/jacg/naersk-test-drive/commits/nightly-cargo-stable-rust': HTTP error 422 ('')

response body:

{
  "message": "No commit found for SHA: nightly-cargo-stable-rust",
  "documentation_url": "https://docs.github.com/rest/reference/repos#get-a-commit"
}

Oh, it is even optional. Yeah, I should have guessed that…

Adjusting for the specific hashes used in the example, I expect it to say

hash mismatch in fixed-output derivation '/nix/store/hhvgs3wwribfvkgdff7pmfnb4vy801bs-channel-rust-nightly.toml.drv':
  specified: sha256-hTj47PwUeP276uF6+HLDzsHYoDvfJa+y9o+vmxZqV0Y=
     got:    sha256-hTj47PwUeP276uF6+HLDzsHYoDvfJa+y9o+vmxZqV0g=

… which is exactly what it says in identical situations in a slightly different context … and I still haven’t got to the bottom of what exactly the context change is which turns the message from spot-on to almost useless.

If you can suggest a better solution to this problem, I’d be most grateful.

WTF!!? I made sure to double-check that before posting. And now I get the same error as you do.

Ugh, sorry, copy-paste error:

nix shell github:jacg/naersk-test-drive/nightly-cargo-stable-rustc -c naersk-test-drive

(c missing at the end of the branch name … I’ll edit the original to fix.)

Adjusting for the specific hashes used in the example, I expect it to say

hash mismatch in fixed-output derivation '/nix/store/hhvgs3wwribfvkgdff7pmfnb4vy801bs-channel-rust-nightly.toml.drv':
 specified: sha256-hTj47PwUeP276uF6+HLDzsHYoDvfJa+y9o+vmxZqV0Y=
    got:    sha256-hTj47PwUeP276uF6+HLDzsHYoDvfJa+y9o+vmxZqV0g=

… which is exactly what it says in identical situations in a slightly different context … and I still haven’t got to the bottom of what exactly the context change is which turns the message from spot-on to almost useless.

When a thing exists upstream, but has a different hash than expected, that’s what is reported. When it doesn’t exist and the server returns 404, the 404 status is reported. Of course, when the hash differs it is easy to say what the current hash is, and if the thing is missing completely, it is pretty hard to say where it could be found instead.
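The two failure modes can be put side by side in a small sketch. It assumes `<nixpkgs>` is on `NIX_PATH` and that the undated manifest URL still exists upstream; the attribute names `wrongHash` and `missingFile` are made up for illustration.

```nix
# Two fixed-output fetches, both with a deliberately wrong placeholder hash.
# Assumption: the undated URL below is still served by the upstream site.
{ pkgs ? import <nixpkgs> { } }:
let
  distRoot = "https://static.rust-lang.org/dist";
in
{
  # The file exists but the hash is wrong: the download succeeds, Nix
  # compares hashes afterwards and can report both of them.
  wrongHash = pkgs.fetchurl {
    url = "${distRoot}/channel-rust-1.50.0.toml";
    sha256 = pkgs.lib.fakeSha256;
  };

  # The file does not exist: curl fails inside the builder with a 404,
  # so there is no downloaded file whose hash could be reported.
  missingFile = pkgs.fetchurl {
    url = "${distRoot}/2021-02-26/channel-rust-1.50.0.toml";
    sha256 = pkgs.lib.fakeSha256;
  };
}
```

Building each attribute separately (for example `nix-build fetch-demo.nix -A wrongHash`, with a file name of your choosing) should show the hash-mismatch report for the first and the relayed curl output for the second.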

If you can suggest a better solution to this problem, I’d be most grateful.

Above Jan correctly tells you to just drop the date parameter (which does effectively the same thing but in a reasonable way and without abusing the way the server works).
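As a rough sketch of what that looks like in an expression: the snippet below assumes `pkgs` was imported with the nixpkgs-mozilla Rust overlay applied, so that `pkgs.rustChannelOf` exists. The hashes are placeholders to be replaced with whatever the hash-mismatch message reports, and the nightly date is only an example.

```nix
# Assumes the nixpkgs-mozilla Rust overlay is applied to `pkgs`.
{ pkgs }:
{
  # Stable release: name the version and leave `date` out entirely.
  stable = pkgs.rustChannelOf {
    channel = "1.50.0";
    sha256 = pkgs.lib.fakeSha256; # placeholder, not the real hash
  };

  # Nightly: here the date selects which nightly manifest to fetch.
  nightly = pkgs.rustChannelOf {
    channel = "nightly";
    date = "2021-02-26"; # example date; use one with a published nightly
    sha256 = pkgs.lib.fakeSha256; # placeholder, not the real hash
  };
}
```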


WTF!!? I made sure to double-check that before posting. And now I get the same error as you do.

You might want to read Feature request: Warning when src changed but sha256 didn't. · Issue #2970 · NixOS/nix · GitHub and some linked issues

In my mental model for using rustChannelOf, I had a rule of “use date for unstable, but not for stable”, until, sometime in the last three days (I forget the details of how I stumbled upon it) some problem seemed to have been solved by adding a date when specifying a stable channel. Thus my model was polluted by a superstition that date-with-stable might have some meaning. If only there were some documentation that stated explicitly how these should be used, rather than having to infer behaviour from examples scattered about the web or reading through the source, there would be far less scope for such superstitions to take hold. Who knows how many other superstitions I pick up per hour of bumbling around Nix. (This irks me, because I spend much of my time encouraging and helping others to educate themselves in order not to fall victim to superstition.)

For someone who is not fully engaged in Nix, ‘read the source’ just doesn’t cut the mustard as documentation.

Nix experts can take all these glitches in their stride so it doesn’t much matter to them that 80% of the things one does in Nix are undocumented (or maybe they are, but finding the thing you need is very difficult in the huge-yet-sparse documentation), because they are familiar with the arcana and are accustomed to figuring out how things are supposed to work by diving into the source code. I wonder whether you can appreciate that (to take but one example) even though it is obvious to you that you should search for channel-rust-*toml, to me that’s voodoo. [edit: OK, I’ve now spotted builder for '/nix/store/fxiq5z0gr87h143rh4qm8ai82n1f7a5l-channel-rust-1.50.0.toml.drv' failed with exit code 1; in the error message itself.]

For ignorant fools like me, where everything here is alien (and I’m only visiting by taking out a huge debt on my time budget), solving a problem like this requires peeling off layer upon layer of ignorance and uncertainty. In this context, too many error messages send me completely the wrong way.

I appreciate that there is a logical explanation for such error messages, and with hindsight I can even understand many of them myself.

I also appreciate that for any single one of the errors that I come across on my Nix journey, had I dedicated more time and attention to that specific error, I could have done a better job of figuring out its meaning on my own.

The trouble is that these errors don’t appear in isolation: one error leads to some attempt which leads to another error, and so on. When you are in a stack of errors-and-attempted-resolutions that is 19 deep, you don’t have much time, energy or patience to analyze the latest error carefully.

The fundamental fact remains that Nix is almost impenetrable to the vast majority of mortals, and the all-too-often completely unhelpful error messages are a huge contributing factor. This is a great shame, because I firmly believe that Nix (or, at least the ideas it brings to the table, if not their implementation) is vitally important and would be a huge boon to far greater a proportion of the population if it were more accessible.

Is this an argument for a statically-typed Nix?

I’m sorry, I’m too tired to try to carefully analyse what this means. As a bit of background on where I’m coming from: I didn’t guess any hashes. I’m in the habit of using sha256 = "" when changing version of fixed-output derivations in order to discover the hash of the new version. I’ve been using this same trick, perhaps superstitiously, when changing versions in rustChannelOf and it appeared to be working OK.

I also (perhaps mistakenly, I’m not sure of anything in Nixland) believe that sha256 = "" is equivalent to using any valid but non-existent hash: thus the hash you see in the ‘question’ is meant to be a synonym for sha256 = "".

By ‘answer’ I was alluding to what I did to get from something that seemed to be working for me, to something that broke in a highly-unexpected way, rather than as a solution to the problem.

What this isolated example fails to convey is that I didn’t come across this error in isolation, but in the context of a ‘real’ case where the error appeared in a stack of problems 19 (or 23, or 42) deep, where I was expecting to generate an error message containing the new hash, but got something different instead. Then I invested time and effort into isolating the issue, to present it to the world at large without irrelevant distractions.

And I get responses which, to my poor Nix-battered soul, sound (even if not intended that way by the author) a bit like: you fool, you’re just too lazy to look.

Which brings me this much closer to concluding that perhaps I should stop wasting my time on Nix.

[With apologies: Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.]


In my mental model for using rustChannelOf, I had a rule of “use date for unstable, but not for stable”, until, sometime in the last three days (I forget the details of how I stumbled upon it) some problem seemed to have been solved by adding a date when specifying a stable channel.

The question is not how to avoid acquiring superstitions. The question is understanding that whatever Nix does, it does to drive some interaction with an upstream build system or, in this case, an upstream distribution server. It is a question of having a mental model of which part of the work Nix does (composing build instructions and making it harder for them to access what they should not), so that you can experiment with partially understood tricks, and with discarding these tricks, and make sense of the complaints.

For someone who is not fully engaged in Nix, ‘read the source’ just doesn’t cut the mustard as documentation.

Notice that you are using a thing that is not even a part of the NixOS or nix-community organisations.

Nix experts can take all these glitches in their stride so it doesn’t much matter to them that 80% of the things one does in Nix are undocumented (or maybe they are, but finding the thing you need is very difficult in the huge-yet-sparse documentation), because they are familiar with the arcana and are accustomed to figuring out how things are supposed to work by diving into the source code. I wonder whether you can appreciate that (to take but one example) even though it is obvious to you that you should search for channel-rust-*toml, to me that’s voodoo.

This is not a Nix problem. The problem that needed solving first is a basic getting-stuff problem: if a tool tells you a file is needed, and the tool tries to get it from «this URL» and the URL gives you status 404, the most natural question is: OK, where am I even supposed to get this file? This question, independently of the tool in question, is often answerable by searching for the file name (and the URL was included in the output). Then, once you know where a file can be had, you can look at the actually tool-specific problem of how to make it grab stuff from the correct place.

Is this an argument for a statically-typed Nix?

Hmm, this is exactly where I would not believe in type systems. The question is not whether you build the URL in a locally consistent way; the question is whether your idea of what URLs will be provided by the upstream is in sync with the upstream. I am not sure that is easier to encode in types than as a plain assertion that date should not be used with stable. (It is a question for nixpkgs-mozilla whether they have considered adding this assertion; that might be a good idea.)

As a bit of background on where I’m coming from: I didn’t guess any hashes. I’m in the habit of using sha256 = "" when changing version of fixed-output derivations in order to discover the hash of the new version. I’ve been using this same trick, perhaps superstitiously, when changing versions in rustChannelOf and it appeared to be working OK.

This makes perfect sense as long as the new URL is correct.

I also (perhaps mistakenly, I’m not sure of anything in Nixland) believe that sha256 = "" is equivalent to using any valid but non-existent hash: thus the hash you see in the ‘question’ is meant to be a synonym for sha256 = "".

That’s approximately correct (you need a correct number of characters, but indeed it does not matter if you use a bunch of zeros or any different wrong hash; a lot of people do that).

And I get responses which, to my poor Nix-battered soul, sound (even if not intended that way by the author) a bit like: you fool, you’re just too lazy to look.

Oh sorry, it’s definitely not about lazy, it’s about looking in the wrong layer completely. Basically you were comparing the reports from Nix itself being unhappy with reports from Nix running a tool and the tool being unhappy and, I don’t know, it felt like you were ignoring the distinction (or were unaware of the difference and of it having implications). Of course, if the source of unhappiness is inside Nix, Nix has an easier time printing the complaint in a readable form.

Which brings me this much closer to concluding that perhaps I should stop wasting my time on Nix.

That might be, unfortunately — you might be learning too many new tools (and in case of Nix using some components outside of their documented primary use) at once without having firm ground on any level to get some stability. But of course here I do not have any meaningful amount of information to advise.


At face value, and in isolation, this is very clear. Now:

    35|         packages.abracadabra = pkgs.stdenv.mkDerivation {
    36|           name = "abracadabra";
      |           ^
    37|           src = self;

cannot coerce a function to a string
(use '--show-trace' to show detailed location information)

If a tool tells you that a function cannot be coerced to a string and points at line 36, the most natural question is: OK, where is the function on line 36?.

In fact, it turned out that the function is on line 98. There is no way I would have understood this were it not for the fact that I had just changed line 98. Before this incident, this same error message in different contexts had simply caused me to grind to a halt on multiple occasions.

With this experience under my belt, my instinct will be to ignore the line where the message points, and look somewhere else. Give me a week of Nixing, and I’ll be able to recall another dozen situations where Nix has taught me to ignore what the message ostensibly says.

My point is that, after years and years of programming myself and teaching programming to others in a large variety of programming languages, it is clear to me that error messages sometimes/often/usually (depending on the language) are red herrings. When you’re up to your neck in layers of confusion and unfamiliarity and you’ve been bitten by red herrings multiple times in the recent past, it’s very difficult to resist the urge to look away from the ugly mess that has just appeared on your screen, and analyse it calmly and dispassionately. When you do overcome the urge to look away, your recent history has probably conditioned you to focus your attention on some patterns you (subconsciously) recognize in error messages. The deeper you are in the stack of confusion, the more likely it is that cold, calculating, calm and analytical reading of the error message is replaced by heuristics.

Perhaps I am especially sensitive to this because of the countless hours I have spent watching students struggle with error messages which are obvious to me, and going through the exercise of putting myself in their shoes.

Perhaps I spent too much time hanging out with early C++ in my youth, where about 80% of error message interpretation was of the form “30 screenfuls of line noise which resemble cat vomit usually mean that you’re trying to compile C++ with a C compiler, but if it looks a bit more like weasel vomit then you’ve probably written using namespace std; and one of those names clashes with one of your own; but this here looks like alpaca vomit, so you probably forgot the semicolon after class {...} … in a file that doesn’t even get mentioned in this error message.” This experience may have caused a form of brain damage which gives me a tendency to reach for heuristics too quickly and stick with them for too long, and Nix is certainly bringing this tendency to the fore.

Perhaps my problem is that my comfort zone is inside programming languages, and with Nix I’m mostly dealing with package management and system configuration and the like … nasty, dirty problems whose existence my programming language purist side would like to ignore.

Let’s see if I have vented enough of my frustration.


At face value, and in isolation, this is very clear. Now:

   35|         packages.abracadabra = pkgs.stdenv.mkDerivation {
   36|           name = "abracadabra";
     >           ^
   37|           src = self;

cannot coerce a function to a string
(use '--show-trace' to show detailed location information)

If a tool tells you that a function cannot be coerced to a string and points at line 36, the most natural question is: OK, where is the function on line 36?.

Erm, the part of the error message pointing to line 36 is lost (and I hope that --show-trace would show you which file it is in).

In fact, it turned out that the function is on line 98. There is no way I would have understood this were it not for the fact that I had just changed line 98. Before this incident, this same error message in different contexts had simply caused me to grind to a halt on multiple occasions.

Note that this is yet another context. The first case was Nix relaying an error from the build scripts. The second was Nix reporting an error during build (hash mismatch for fixed-output derivation). Now this is about the Nix evaluation phase. Yes, Nix evaluation errors indeed take much more getting used to before one can read them (and reading them «fluently» might require waiting for a change in Nix, not in you). Yes, they are often harder to read than they should be. Yes, there is some work ongoing and some cases have been improved, but far from all. I guess error "value is a list while a set was expected" is too vague · Issue #963 · NixOS/nix · GitHub is close to that specific problem… There is nix-errors-enhancement - error format demo by bburdette · Pull Request #3466 · NixOS/nix · GitHub (and related work) and I have no idea if Add a flag to start the REPL on evaluation errors by edolstra · Pull Request #3901 · NixOS/nix · GitHub ever gets merged but it would be useful.

But yes, debugging Nix evaluation errors is hard (one gets better at it with time, but this is not a pleasant time for sure) and I hope it will become easier some day.

With this experience under my belt, my instinct will be to ignore the line where the message points, and look somewhere else. Give me a week of Nixing, and I’ll be able to recall another dozen situations where Nix has taught me to ignore what the message ostensibly says.

I would say that with any tool the first thing I wonder about error messages is whether they are weirdly distinct types (like syntax errors and linker errors, which both can happen when running gcc), and which parts I can trust in which case. (On Nix-based systems linker messages in particular might need some interpretation when one does things interactively…)

The deeper you are in the stack of confusion, the more likely it is that cold, calculating, calm and analytical reading of the error message is replaced by heuristics.

I guess way before I came to Nix, a curl error message already was a pattern I would recognise before thinking what this error message does in the middle of my terminal emulator window. So my basic heuristics independent of Nix already work for build failures (which were the beginning of the discussion). Learning heuristics for reading evaluation traces was less pleasant…

Perhaps I spent too much time hanging out with early C++ in my youth, where about 80% of error message interpretation was of the form “30 screenfuls of line noise which resemble cat vomit usually mean that you’re trying to compile C++ with a C compiler, but if it looks a bit more like weasel vomit then you’ve probably written using namespace std; and one of those names clashes with one of your own; but this here looks like alpaca vomit, so you probably forgot the semicolon after class {...} … in a file that doesn’t even get mentioned in this error message.” This experience may have caused a form of brain damage which gives me a tendency to reach for heuristics too quickly and stick with them for too long, and Nix is certainly bringing this tendency to the fore.

You are discussing triaging the rough shape of the failure. Yes, that’s definitely needed with Nix because failures come from vastly different contexts.

Perhaps my problem is that my comfort zone is inside programming languages, and with Nix I’m mostly dealing with package management and system configuration and the like … nasty, dirty problems whose existence my programming language purist side would like to ignore.

Annoyingly, the programming part of Nix actually has worse error messages than the system administration part (i.e. evaluation has worse messages than builds)… On the building side, though, there is indeed an expectation that you know what a working build would look like and what failures from the build could look like (if you are writing an expression and not just using the existing ones).


Thank you for continuing to humour me, and replying to my whining with thoughtful information. I’ll try to go through your wisdom in a calmer moment to try to extract useful lessons from it.

However, there’s something among what you wrote that leaves me completely perplexed:

Erm, the part of the error message pointing to line 36 is lost

which comes just below you quoting me showing an error message in which a ^ clearly points at a binding of the attribute name which is clearly marked as being on line 36. (Curiously enough, your quote replaces a | in my original with a >.) So I don’t understand in what sense the part of the error message pointing to line 36 is lost.

Oops, sorry, I was mistaken, I thought you were providing the file for context.

It’s another story that I would try --show-trace just in case it has something more interesting.

That part of Nix would indeed be helped by type checking, because now you get a stack trace of the place of use of an incorrect value instead of place of introduction of the value. Which is like with Valgrind traces of memory errors, related and better than nothing but not exactly what you really need.

(Now the fact that the location of a derivation is defined as the place of definition of its name attribute — or pname, I guess — is a separate gotcha one can only get used to but not expect in advance; the reason is as far as I understand that otherwise you will always end up in the definition of mkDerivation that would probably be even less useful)

Wait … it’s pointing there because it’s name and not because it’s the first thing in the set? Experiment bears this out. Hmmmmm.

And it is pointing at the place where the value later used for the actual derivation construction call first appeared in its final form. It does tend to guess at least the relevant file more often than not, which is better than some superficially plausible alternatives, but…

It will take me a while to parse that :)

Hm. I might actually also be wrong, because there might be some special-casing for pname. But basically, the place where name is finally passed to the derivation constructor is deep inside stdenv, and you probably have not just introduced a bug there. So there is a bunch of heuristics to skip the boring and unchanging pass-the-name-along steps and find where it originally comes from. (I might have some comments if you show the --show-trace output, as I generally don’t even try to understand anything without reading the full trace.)

Oh, sorry, I’m not expecting you to try to debug this example for me!

I am ecstatically happy to have discovered that ‘cannot coerce function to a string’ on line X hasn’t got anything to do with line X. This is already a huge step forward in my overall ability to deal with these error messages.

What’s more, in this specific case I know exactly what the problem was: on line 98 (or wherever it was) I had passed too few arguments to a function.

[edit: so my interest in what exactly is going on is whimsically philosophical, rather than pragmatically urgent.]
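For anyone who wants to reproduce this class of error in isolation, here is a minimal sketch of the too-few-arguments situation. It is a made-up example (`greet` and `message` are hypothetical names), not the code from the flake discussed above.

```nix
# A curried two-argument function applied to only one argument is still
# a function, and nothing complains at the point where that happens.
let
  greet = greeting: name: "${greeting}, ${name}!";

  # The mistake is here: the second argument is missing.
  message = greet "hello";
in
  # The error is only raised here, where the value is used as a string.
  "result: ${message}"
```

Evaluating this (for example with `nix-instantiate --eval` on a file containing it) should fail with "cannot coerce a function to a string", raised where `message` is interpolated rather than where the argument was left out.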