Tracking string location in source code: my experiment

It has bugged me for a long time that we cannot easily find the location where a string was defined. This is needed to automate version string and source hash updates during (automated) package updates.

So today I finally got back to work on my old stashed stuff, and managed to get a working patch for nix. Everything is available at (permalink)

The patch keeps track of every trivial string (no antiquotations, no escape characters, etc.) in the values themselves. As long as the value is passed unchanged, we can recover its original location through any number of with, inherit, function calls and any other way to toss it around in nix.
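To make the idea concrete, here is a toy model in Python (not the patch's actual C++ code). The names Pos, Str, identity, concat and get_pos, the file name example.nix and the sample values are all made up for illustration. It only shows the rule described above: a trivial literal carries its position, passing the value around keeps it, and building a new string drops it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    """Where a literal was written (hypothetical model, not Nix's real Pos)."""
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class Str:
    """A string value that may remember where its literal came from."""
    text: str
    pos: Optional[Pos] = None  # only set for trivial literals


def identity(value: Str) -> Str:
    # Passing a value through a function returns the same value, position included.
    return value


def concat(a: Str, b: Str) -> Str:
    # A generated string has no single origin, so the result carries no position.
    return Str(a.text + b.text)


def get_pos(value: Str) -> Optional[Pos]:
    # Toy counterpart of builtins.unsafeGetPos in this model.
    return value.pos


# Hypothetical literals, as if parsed from a file called example.nix.
pname = Str("i3", Pos("example.nix", 7, 12))
version = Str("4.18", Pos("example.nix", 8, 14))

attrs = {"version": version}          # storing the value in an attribute set
passed = identity(attrs["version"])   # tossing it around unchanged
name = concat(concat(pname, Str("-")), version)  # a generated value
```

In this model get_pos(passed) still returns the position of the version literal, while get_pos(name) returns None because name was assembled from several pieces.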

Because an example speaks a thousand words, here is what I achieved.

$ ../nix/inst/bin/nix --experimental-features nix-command repl  
Welcome to Nix version 2.4. Type :? for help.

nix-repl> :l <nixpkgs>
Added 10996 variables.

nix-repl> i3.version

nix-repl> builtins.unsafeGetPos i3.version
{ column = 14;
  file = ".../pkgs/applications/window-managers/i3/default.nix";
  line = 8; }

nix-repl> i3.src.outputHash                       

nix-repl> builtins.unsafeGetPos i3.src.outputHash
{ column = 15;
  file = ".../pkgs/applications/window-managers/i3/default.nix";
  line = 12; }

We can show that builtins.unsafeGetPos is more accurate than the only other builtin I am aware of: builtins.unsafeGetAttrPos. The latter is off by a few characters in the case of version, and completely useless in the case of src.outputHash.

nix-repl> builtins.unsafeGetAttrPos "version" i3        
{ column = 3;
  file = ".../pkgs/applications/window-managers/i3/default.nix";
  line = 8; }

nix-repl> builtins.unsafeGetAttrPos "outputHash" i3.src
{ column = 18;
  file = ".../pkgs/build-support/fetchurl/default.nix";
  line = 135; }

By design, it does not work with generated values. For example, the name attribute cannot be retrieved because it is composed from pname and version in make-derivation.nix.

nix-repl> builtins.unsafeGetPos           

nix-repl> builtins.unsafeGetAttrPos "name" i3     
{ column = 11;
  file = ".../pkgs/stdenv/generic/make-derivation.nix";
  line = 191; }

For automated updates, we need nothing more. The patch could still be improved a bit, at least to support escape characters in strings, but that’s all for today.
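As a rough illustration of the automated-update use case, here is a minimal Python sketch of how an update script might use such a position to rewrite a string in place. The function replace_string_at and the sample file content are hypothetical. It assumes that line and column are 1-based and that the column points at the first character inside the quotes, and it only handles trivial strings (no escapes, no antiquotations), which is exactly what the patch tracks.

```python
def replace_string_at(source: str, line: int, column: int, new_text: str) -> str:
    """Replace the contents of the string literal located at (line, column).

    Assumptions (not confirmed by the patch): line and column are 1-based,
    and column points at the first character after the opening double quote.
    """
    lines = source.split("\n")
    text = lines[line - 1]
    start = column - 1  # 0-based index of the first character of the string

    # Sanity check: the character just before the position must be the opening quote.
    if start < 1 or start > len(text) or text[start - 1] != '"':
        raise ValueError("position does not point inside a string literal")

    # Trivial strings contain no escapes, so the next quote closes the literal.
    # str.index raises ValueError if there is no closing quote on this line.
    end = text.index('"', start)

    lines[line - 1] = text[:start] + new_text + text[end:]
    return "\n".join(lines)


# Hypothetical package file used only for this example.
SOURCE = (
    "{ stdenv }:\n"
    "stdenv.mkDerivation {\n"
    '  pname = "example";\n'
    '  version = "4.18";\n'
    "}\n"
)

updated = replace_string_at(SOURCE, 4, 14, "4.19")
```

Only the targeted literal changes, so the rest of the file (formatting, comments) is left untouched. The same call would work for a source hash, given its position.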

Feedback welcome, and have a nice weekend!

PS: I first wanted to implement this in HNix, but its dependencies were compiling the whole time I was developing this patch, and the compilation is still in progress :wink:

What’s the memory and performance overhead of this additional tracking?

Truth is, I did not run any benchmark :slight_smile:

At the moment it costs one Pos structure (line + column + file name) per string. Depending on how much of the position data is shared, it could be as little as a pointer and two ints per annotated string. There is also an extra Pos pointer in every Nix value, which may be a bit worrisome if this were ever considered for upstreaming.
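For a back-of-the-envelope feel, the figures above can be turned into a small Python calculation. This is not a measurement: the function names are made up, padding and allocator overhead are ignored, and the counts passed in are whatever you want to plug in. Sizes are taken from the platform the script runs on.

```python
import struct

# Native sizes on the machine running this script (platform dependent).
POINTER_SIZE = struct.calcsize("P")
INT_SIZE = struct.calcsize("i")


def per_string_overhead() -> int:
    """Best case described above: a pointer plus two ints per annotated string."""
    return POINTER_SIZE + 2 * INT_SIZE


def estimate_overhead(num_values: int, num_annotated_strings: int) -> int:
    """Rough extra bytes: one Pos pointer per value, plus the per-string data.

    Ignores struct padding and allocator overhead, so treat it as a lower bound.
    """
    return num_values * POINTER_SIZE + num_annotated_strings * per_string_overhead()


# Hypothetical counts, chosen only to show how the function is called.
extra_bytes = estimate_overhead(num_values=1_000_000, num_annotated_strings=200_000)
```

The per-value pointer dominates as soon as most values are not annotated strings, which is why that part is the more worrying one.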