buildNodeModules - The dumbest node to nix packaging tool yet!

A few days ago I was staring at a package-lock.json and had an epiphany:
What if I could just replace the package URLs with file:... dependencies?

That’s the entire premise of adisbladis/buildNodeModules on GitHub: an experiment in improving node packaging for nix. Dead simple.
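
Roughly, the idea can be sketched in plain Nix like this. It is a minimal sketch under some assumptions (a lockfileVersion 2/3 package-lock.json, registry tarball dependencies only, single integrity hashes), and rewriteLockFile is a hypothetical name for illustration, not the actual buildNodeModules code:

{ lib, fetchurl, writeText }:

let
  # Rewrite every registry URL in the lock file into a file: path pointing
  # at a Nix store path, so npm never needs to touch the network.
  rewriteLockFile = lockFile:
    let
      lock = lib.importJSON lockFile;
      rewriteDep = _name: pkg:
        if pkg ? resolved && lib.hasPrefix "https://" pkg.resolved then
          pkg // {
            # One small fixed-output fetch per dependency, reusing the
            # integrity (SRI) hash that npm already recorded.
            resolved = "file:${fetchurl {
              url = pkg.resolved;
              hash = pkg.integrity;
            }}";
          }
        else
          pkg;
    in
    writeText "package-lock.json" (builtins.toJSON (
      lock // { packages = lib.mapAttrs rewriteDep lock.packages; }
    ));
in
rewriteLockFile

The build then presumably just hands the rewritten lock file to npm and lets it install offline.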

Features:

  • No IFD.
  • Only fetches a node dependency once. No massive FODs.
  • Dead simple (the PoC was less than 100 lines, now it’s a little more but not much).
  • Support for URL, Git and path dependencies.
  • Mostly composes with npmHooks from nixpkgs.
  • Shell hooks to manage node_modules for development.
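
As an illustration of the last point, a dev-shell hook can be as simple as linking a Nix-built dependency tree into the working copy. This is a hypothetical sketch, not buildNodeModules’ actual hook API; nodeModules stands in for whatever derivation produces the node_modules tree:

pkgs.mkShell {
  packages = [ pkgs.nodejs ];
  shellHook = ''
    # Point tools that expect ./node_modules at the Nix-built tree.
    ln -sfn ${nodeModules}/node_modules ./node_modules
  '';
}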

I’ve already thrown it at some fairly large & complex projects, and it seems to mostly “just work”.

13 Likes

I would be interested in how it compares in terms of memory needed for evaluation, because since we started adding things like importing Cargo.lock files for a subset of packages into nixpkgs, we have seen a massive increase in memory use and repo size. For out-of-tree projects it’s probably a cool thing to have, though.

I would be interested in how it compares in terms of memory needed for evaluation.

I also wanted to know what the impact was, so I converted an arbitrary nixpkgs package (github-copilot-cli) that already has a package-lock.json checked into git.

Here are the results from NIX_SHOW_STATS=1 nix-instantiate ./. -A github-copilot-cli:

  • before.json
{
  "cpuTime": 0.8051069974899292,
  "envs": {
    "bytes": 16531624,
    "elements": 828921,
    "number": 618766
  },
  "gc": {
    "heapSize": 402915328,
    "totalBytes": 180765008
  },
  "list": {
    "bytes": 3000672,
    "concats": 21613,
    "elements": 375084
  },
  "nrAvoided": 864119,
  "nrFunctionCalls": 558479,
  "nrLookups": 201175,
  "nrOpUpdateValuesCopied": 4886502,
  "nrOpUpdates": 45948,
  "nrPrimOpCalls": 391174,
  "nrThunks": 1152515,
  "sets": {
    "bytes": 98766016,
    "elements": 5996044,
    "number": 176832
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 16,
    "Value": 24
  },
  "symbols": {
    "bytes": 386981,
    "number": 37425
  },
  "values": {
    "bytes": 36922296,
    "number": 1538429
  }
}
  • after.json
{
  "cpuTime": 0.34689000248908997,
  "envs": {
    "bytes": 6537208,
    "elements": 330113,
    "number": 243519
  },
  "gc": {
    "heapSize": 402915328,
    "totalBytes": 63770208
  },
  "list": {
    "bytes": 985664,
    "concats": 9249,
    "elements": 123208
  },
  "nrAvoided": 358265,
  "nrFunctionCalls": 218872,
  "nrLookups": 82271,
  "nrOpUpdateValuesCopied": 1712123,
  "nrOpUpdates": 18283,
  "nrPrimOpCalls": 171800,
  "nrThunks": 461449,
  "sets": {
    "bytes": 33343888,
    "elements": 2027896,
    "number": 56097
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 16,
    "Value": 24
  },
  "symbols": {
    "bytes": 246498,
    "number": 24916
  },
  "values": {
    "bytes": 13269432,
    "number": 552893
  }
}

So for single packages we’re looking very good.
But as the cost of the fetchNpmDeps implementation amortizes over more packages, performance starts to swing in its favour instead.
I converted a bunch of packages to get a more complete overview, using this test expression as my entrypoint:

let
  pkgs = import ./. { config.allowUnfree = true; };
in
pkgs.runCommand "all-tests" { } ''
  ${pkgs.yarn-lock-converter}
  ${pkgs.github-copilot-cli}
  ${pkgs.vencord}
  ${pkgs.osmtogeojson}
  ${pkgs.netlistsvg}
  ${pkgs.mongosh}
  ${pkgs.sunshine}
  ${pkgs.memos}
''
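
(Assuming the expression is saved as, say, all-tests.nix in the root of a nixpkgs checkout, the stats come from running NIX_SHOW_STATS=1 nix-instantiate ./all-tests.nix, analogous to the command above; with a reasonably recent Nix, NIX_SHOW_STATS_PATH=after.json writes them straight to a file.)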

With that expression the results look like:

  • before.json
{
  "cpuTime": 1.1940569877624512,
  "envs": {
    "bytes": 31789616,
    "elements": 1599978,
    "number": 1186862
  },
  "gc": {
    "heapSize": 402915328,
    "totalBytes": 275616976
  },
  "list": {
    "bytes": 5268080,
    "concats": 46200,
    "elements": 658510
  },
  "nrAvoided": 1649593,
  "nrFunctionCalls": 1070449,
  "nrLookups": 408846,
  "nrOpUpdateValuesCopied": 6321984,
  "nrOpUpdates": 92526,
  "nrPrimOpCalls": 759629,
  "nrThunks": 1922521,
  "sets": {
    "bytes": 130643984,
    "elements": 7866524,
    "number": 298725
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 16,
    "Value": 24
  },
  "symbols": {
    "bytes": 392306,
    "number": 37863
  },
  "values": {
    "bytes": 62970504,
    "number": 2623771
  }
}
  • after.json
{
  "cpuTime": 1.563081979751587,
  "envs": {
    "bytes": 41784936,
    "elements": 2155619,
    "number": 1533749
  },
  "gc": {
    "heapSize": 402915328,
    "totalBytes": 274688464
  },
  "list": {
    "bytes": 5585472,
    "concats": 81226,
    "elements": 698184
  },
  "nrAvoided": 2160066,
  "nrFunctionCalls": 1399176,
  "nrLookups": 601130,
  "nrOpUpdateValuesCopied": 4680108,
  "nrOpUpdates": 143881,
  "nrPrimOpCalls": 945377,
  "nrThunks": 2258018,
  "sets": {
    "bytes": 100134000,
    "elements": 5937452,
    "number": 320923
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 16,
    "Value": 24
  },
  "symbols": {
    "bytes": 469095,
    "number": 41044
  },
  "values": {
    "bytes": 70001928,
    "number": 2916747
  }
}

Of course evaluation performance isn’t everything.
There is a trade-off between faster evaluation and better incrementality & sharing.

1 Like

So 8 packages increased the overall memory by roughly 1M? We have 180 occurrences of buildNpmPackage + 310 packages in node-packages.json. With this unscientific interpolation that would be: (275616976 - 274688464) / 8 * (180 + 310) = 56871360.0 => 54.2M more RAM.

Given that eval needs several gigabytes, something on that order sounds acceptable.

What are acceptable file sizes for lock files committed to nixpkgs? We would probably at least double the size of the nixpkgs archive. I think I had estimations somewhere.

EDIT: See Repackage node2nix generated packages using buildNpmPackage (NixOS/nixpkgs issue #229475). Maybe something like 3 times the current nixpkgs size?

2 Likes

Now that’s a much harder pill to swallow. I think that even our current ~30M is quite large.

This is terrible for build time performance, especially when optimizing cores and jobs [1], and especially for remote building. Node projects usually have hundreds, but more often thousands, of dependencies. My entire system takes ~2400 downloads after a staging-next merge, and building the 250 derivations that just write evaluated text to disk is not instant but takes actual time. Having even one package with that many dependencies will increase build/copy time significantly and will take way longer than with the vendor blob. [2]
Also, since js people have eaten too much semver for breakfast, every trivial change is a major semver bump, fragmenting the ecosystem and reducing the possible reuse of packages. For the big js projects every update will probably download hundreds of dependencies.

In my experience, a derivation that is smaller than ~1 MB or takes less than 1 s to build gets eaten up by Nix’s poor build scheduling and start-up times, and things take longer in the end.

[1] Since we cannot differentiate between tiny builds, normal ones, and ones that require bigParallel, we can only optimise for one of them. So you either wait a significant amount of time for the sometimes 1000 node packages to be downloaded, or you OOM because you accidentally built too many things in parallel.
[2] It is the same story for naersk and node2nix.
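
To make [1] concrete, these are the global knobs in question; the values below are purely illustrative (nix.conf):

# nix.conf: the same settings apply to every derivation alike.
# Tuned for a few big builds, thousands of tiny node-package builds
# queue up behind the job slots; tuned for tiny builds, a couple of
# bigParallel builds landing together can OOM the machine.
max-jobs = 4   # how many derivations build concurrently
cores = 8      # exported to each build as NIX_BUILD_CORES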

2 Likes

*36 MB, and that’s only compressed as a .tar.gz. Unpacked we are at ~341 MB.

2 Likes

Hi @adisbladis @Mic92

I am trying to build the project using the repo, but I’m lost and can’t get it to build. Can you guide me on how to use it?