(Future) of npm packages in nixpkgs?

My (basically outsider) impression is that the buildGoModule and buildRustPackage functions revolutionized packaging for those ecosystems. I went from spending hours learning about Go paths and directories to package something to being able to package something in 5 minutes with buildGoModule. I’m strongly in favor of fetchNodeModules from a packager ease-of-use perspective.

If the npm output format changes, I expect we will be able to fix it in a somewhat automated way.

@andir, is the issue you mentioned with fetchNodeModules and multiple architectures not a problem with the other options?

Doesn’t the fetchNodeModules method keep the dependencies out of the nixpkgs git repo?

Hello,

I’ve noticed that there’s a new discussion about NPM packages in Nixpkgs. I’d like to also give my point of view about it and provide you some background information. Maybe this helps to decide in which direction we want to go.

Deploying NPM packages with Nix is a complicated problem – NPM is both a build and dependency manager, and obviously the latter aspect conflicts with Nix.

Bypassing NPM’s dependency management features is not very trivial – it has odd behaviour, certain dependency specifiers are more “loose” than Nix’s dependency addressing mechanism (with hashes), we must bypass the local cache (that in turn may cause downloads to be triggered), and we must deal with undeclared native dependencies. Furthermore, because NPM is also a build manager, we must still make sure that it can run the build scripts.

Furthermore, every major NPM release may introduce new features that could break the integration approach.

How the current NPM packaging approach came to be was mostly just an accident :slight_smile: . Long time Nixpkgs project members may probably already know that before we started using node2nix, there was npm2nix – for a while it worked well enough. Sometimes we ran into breakages when new major versions of NPM were released, but for a while problems seemed to be easily fixable.

At some point, packages with circular dependencies were introduced (I don’t think this is a feature intentionally supported by the developers of NPM, but they were there all of a sudden!). npm2nix couldn’t cope with these kinds of dependencies, and I got stuck with the same problem as well. Because Shea Levy (the npm2nix author) no longer had any time to work on npm2nix, I’ve decided to investigate.

After lots of experiments, I realized that there was a fundamental problem with npm2nix: just symlinking dependencies into the node_modules/ sub folder of each dependency no longer sufficed to cope with circular dependencies. Instead, we must memorize earlier deployed dependencies and copy them. When a dependency has been encountered previously, then it should be skipped. This basically required a fundamental rewrite of the entire tool.
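To illustrate the "memorize and skip" idea described above, here is a minimal sketch in Python. It is not node2nix's actual code: the graph, the package names, and the `deploy_order` helper are made up for illustration. It only shows why remembering previously deployed dependencies makes a cyclic graph safe to traverse.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> direct dependencies.
# "a" and "b" depend on each other, forming a cycle.
GRAPH: Dict[str, List[str]] = {
    "app": ["a"],
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
}


def deploy_order(root: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return the packages in the order they would be copied, each exactly once."""
    deployed: Set[str] = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in deployed:
            # Encountered before: skip it instead of recursing forever.
            return
        deployed.add(name)
        order.append(name)
        for dep in graph[name]:
            visit(dep)

    visit(root)
    return order


print(deploy_order("app", GRAPH))  # ['app', 'a', 'b', 'c']
```

Without the `deployed` set, the traversal would bounce between "a" and "b" indefinitely.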

In the beginning, my implementation was somewhat controversial. The first attempt relied on building derivations in the evaluation phase. Later I did a major revision that computed the entire dependency graph before the evaluation stage, solving this problem.

Still, my implementation was rejected by some people in the community (mostly because it looked too complicated, which was IMO “necessary evil” to cope with NPM’s peculiar behaviour).

Some of my findings were also integrated into npm2nix, making it possible to still deploy some packages with circular dependencies. Because this solution worked “well enough” for most people, npm2nix was kept and I decided not to push my implementation forward anymore. Nonetheless, I kept using it for my own projects and decided to call it node2nix. For quite some time (more than a year) npm2nix and node2nix co-existed.

Then roughly a year later, a new controversial feature was introduced in a new major release of NPM called dependency de-duplication. Basically, dependencies of a package are no longer deployed in the node_modules sub folder of each dependency, but “moved up” in the dependency tree until a conflict has been encountered.
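The "moved up until a conflict" behaviour can be sketched roughly as follows. This is a simplified model, not NPM's real algorithm: the `REGISTRY` contents and the `place` and `install` helpers are hypothetical, and a package is represented as a plain dict with a `version` and a nested `node_modules` map.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: (name, version) -> {dependency name: version}.
REGISTRY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("app", "1.0.0"): {"a": "1.0.0", "b": "1.0.0"},
    ("a", "1.0.0"): {"c": "1.0.0"},
    ("b", "1.0.0"): {"c": "2.0.0"},
    ("c", "1.0.0"): {},
    ("c", "2.0.0"): {},
}


def place(name: str, version: str, chain: List[dict]) -> Tuple[dict, int, bool]:
    """Put a dependency as high in the chain of node_modules maps as possible.

    chain[0] is the top-level node_modules, chain[-1] belongs to the
    package that requires the dependency.
    """
    target = None
    for index in range(len(chain) - 1, -1, -1):
        existing = chain[index].get(name)
        if existing is None:
            target = index  # free slot, keep moving up
        elif existing["version"] == version:
            return existing, index, False  # an identical copy is already visible
        else:
            break  # conflicting version: stop moving up
    if target is None:
        raise ValueError("conflict in the requiring package's own node_modules")
    node = {"version": version, "node_modules": {}}
    chain[target][name] = node
    return node, target, True


def install(name: str, version: str, chain: List[dict]) -> None:
    """Place all direct dependencies first, then recurse into the new ones."""
    placed = []
    for dep_name, dep_version in REGISTRY[(name, version)].items():
        node, index, is_new = place(dep_name, dep_version, chain)
        if is_new:
            placed.append((dep_name, dep_version, node, index))
    for dep_name, dep_version, node, index in placed:
        install(dep_name, dep_version, chain[: index + 1] + [node["node_modules"]])


root: dict = {}
install("app", "1.0.0", [root])
print(sorted(root))                                # ['a', 'b', 'c']
print(root["c"]["version"])                        # 1.0.0 (moved up to the top)
print(root["b"]["node_modules"]["c"]["version"])   # 2.0.0 (conflict, stays nested)
```

In this toy model, `c` 1.0.0 ends up at the top level, while `c` 2.0.0 stays nested under `b` because a different version already occupies the top-level slot.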

This major NPM change (again!) broke npm2nix – now all of a sudden npm2nix could no longer be used to deploy any package. As a result, the entire NPM package set in Nixpkgs was completely broken.

For a while, Nixpkgs appeared to be unfixable, until I gave node2nix a try. It seemed to work fine and I basically proposed to use it for Nixpkgs instead of npm2nix. This basically explains how node2nix was introduced into Nixpkgs, and why it was named node2nix, and not npm2nix. Basically, node2nix came about because there was not a better alternative :slight_smile:

Although node2nix mostly works for a large category of packages, its design also had to be overhauled several times. When package-lock.json files and offline installations were introduced, I had to do another major rewrite to bypass the local cache.

Sadly, NPM is still subject to evolution and certain changes in newer NPM versions have introduced new kinds of problems to node2nix. Furthermore, there are also new kinds of use cases for which node2nix was initially not “designed”.

From a functional perspective, we have the following NPM deployment use cases with Nix:

  • Deploying end-user packages (e.g. from the NPM registry)
  • Deploying local development projects
  • Deploying remote development projects (this is a typical Nix use case, NPM does not have an equivalent)
  • Deploying NPM dependencies in a non-NPM project

node2nix was only designed for the first two use cases. The latter two can only be done by creatively integrating parts of the generated code, which isn’t trivial at all.

Furthermore, there are other drawbacks as well:

  • For deploying end-user packages, you always have to regenerate the Nix expression for the generated package set as a whole. The advantage is that common dependencies are reused in the generated Nix expression (so that the amount of code churn is smaller), but the drawback is that regeneration is very time consuming, and typically does not map well to the Nixpkgs development workflow (in which each commit refers to a single, or well identified group of packages)
  • The node-env.nix has evolved into a very complicated beast. This is mostly caused by coping with lock files and bypassing the local cache – as a result, in its current state it is very hard to rewrite it in such a way that we can use it as a deployer for NPM dependencies in non NPM projects
  • Another result of a very complicated node-env.nix, is that it runs out of memory when the nesting of dependencies is too deep.
  • We also cannot easily implement support for deploying remote development projects.
  • node2nix’s implementation of the NPM dependency resolution algorithm is out of date. In newer versions, NPM makes a distinction between the origins of the packages. For example, async 0.2.2 from the NPM registry is considered a “different” dependency than async 0.2.2 deployed as a source tarball from an HTTPS URL. In older implementations, they were considered the same. Supporting these differences requires a substantial rewrite of the dependency resolution algorithm in node2nix.
  • node2nix does not use any newer features of Nix. For example, newer versions of Nix also support SRI hashes, but node2nix doesn’t use them.
  • some newer NPM features are not yet supported, e.g. workspaces
  • the build process isn’t tweakable and its overriding capabilities are somewhat limited. Again, the fact that node-env.nix is a mess is a major impediment.
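The origin distinction mentioned in the list above can be made concrete with a small sketch. The `DependencyKey` type, the two key functions, and the example.org URL are hypothetical. The point is only that adding the origin to the identity of a dependency turns one entry into two.

```python
from typing import NamedTuple, Tuple


class DependencyKey(NamedTuple):
    name: str
    version: str
    origin: str  # "registry", or the URL a tarball was fetched from


def old_key(name: str, version: str, origin: str) -> Tuple[str, str]:
    """Older behaviour: origin is ignored, only name and version count."""
    return (name, version)


def new_key(name: str, version: str, origin: str) -> DependencyKey:
    """Newer behaviour: the same name and version from another origin differs."""
    return DependencyKey(name, version, origin)


requests = [
    ("async", "0.2.2", "registry"),
    ("async", "0.2.2", "https://example.org/async-0.2.2.tgz"),  # made-up URL
]

print(len({old_key(*r) for r in requests}))  # 1: treated as the same dependency
print(len({new_key(*r) for r in requests}))  # 2: treated as different dependencies
```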

As you may probably observe, to implement these new use cases and address some of the fundamental flaws of node-env.nix, it is not possible to just implement a “quick fix”. Instead, a fundamental revision/rewrite is required for the base layer: node-env.nix.

I’ve been working on a new base layer for a while (so far only as a private PoC), but sadly it’s progressing very slowly, because I can only do the development in my spare time.

Basically, the idea behind the new base layer is that instead of a Nix expression (node-env.nix), you can run a tool that “tricks” NPM in such a way that all dependencies are present by providing the Nix store paths to them. It already works fine locally, but the remainder of the integrations are not done yet, and deploying from local sub directories is still unsupported.
A companion tool can take care of obtaining the dependencies (either via a lock file, or by another companion tool that performs a dependency resolution on an ordinary package.json file)

However, what I also realized is that with node2nix I have always been aiming for accuracy, and maybe this isn’t what everyone wants/needs. For Nixpkgs’ use cases, we may also be able to live with a less accurate, and more hackable approach.

From what I know from talking to people in the Nixpkgs community, I know that an incremental package deployment approach would be desirable.

I have also been thinking about a completely different integration approach:

  • Instead of making/maintaining our own implementation of NPM’s dependency resolution algorithm (which node2nix uses to deploy end-user packages), we can generate a package.json file with the package we intend to deploy as its only dependency. By running ‘npm install’ in an isolated environment, we can generate a package-lock.json file that contains the resolved dependency tree. Then the resulting package-lock.json file can be consumed by a Nix expression to deploy the dependencies. (For Git dependencies, we must still somehow compute the output hash, because these aren’t known by NPM).
  • We don’t run ‘npm install’ in the derivation anymore. If everything works out properly, the dependencies should already be there. However, the build steps (if any) must still be performed by other means
  • To support incremental deployments: we generate a data file in JSON format that can be easily read (e.g. with builtins.fromJSON) and updated, rather than generating Nix expressions. The disadvantage of using Nix expressions for incremental updates is that they are difficult to read, and need to be evaluated.
  • Alternatively, if we can live with the large amount of code churn, we can also save the package-lock.json files for each package that we want to deploy in the Nixpkgs repository. This prevents us from having to implement some kind of de-duplication method. Furthermore, this also maps better to our Nixpkgs development workflow in which each commit refers to a package.
  • We must make sure that the derivation can be easily “tweaked” with hooks and/or overrides, to correct potential deployment discrepancies

The above solution obviously has the disadvantage that it doesn’t deploy as quickly as node2nix (but this can be alleviated with incremental updates), does not support older NPM versions, and does not support build management facilities out of the box (but this can still be fixed with overrides).
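As a rough sketch of the lock-driven idea, the following Python snippet turns a package-lock.json into a flat JSON data file. It assumes the `packages` layout used by newer lock file versions. The lock content, the placeholder integrity value, and the helper names (`lock_to_sources`, `missing_hashes`, `update_data_file`) are made up for illustration.

```python
import json
from typing import Dict, List

# Hypothetical lock file content; the integrity value is a placeholder,
# not a real hash.
LOCK_TEXT = """
{
  "name": "wrapper",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "wrapper", "dependencies": {"left-pad": "1.3.0"}},
    "node_modules/left-pad": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
      "integrity": "sha512-PLACEHOLDER"
    },
    "node_modules/git-dep": {
      "version": "0.1.0",
      "resolved": "git+ssh://git@example.org/git-dep.git#0123abc"
    }
  }
}
"""


def lock_to_sources(lock: dict) -> Dict[str, dict]:
    """Extract version, URL and hash for every locked dependency."""
    sources = {}
    for path, entry in lock.get("packages", {}).items():
        if path == "":
            continue  # the root project itself, not a dependency
        sources[path] = {
            "version": entry.get("version"),
            "url": entry.get("resolved"),
            "hash": entry.get("integrity"),  # None when no hash was recorded
        }
    return sources


def missing_hashes(sources: Dict[str, dict]) -> List[str]:
    """Dependencies (e.g. Git ones) whose output hash must be computed separately."""
    return sorted(path for path, source in sources.items() if source["hash"] is None)


def update_data_file(existing: Dict[str, dict], new: Dict[str, dict]) -> Dict[str, dict]:
    """Incremental update: merge new entries into a previously generated data file."""
    merged = dict(existing)
    merged.update(new)
    return merged


sources = lock_to_sources(json.loads(LOCK_TEXT))
print(json.dumps(update_data_file({}, sources), indent=2, sort_keys=True))
print("needs a separately computed hash:", missing_hashes(sources))
```

A data file like this could then be read on the Nix side with builtins.fromJSON, as suggested above. How the entries map to fetcher calls is left open here.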

Maybe a different solution that is fully lock driven, incremental, and tweakable would be a better fit for Nixpkgs.

Anyway, regardless of the approach that we think is best, some major changes are required to make NPM deployments more future proof.

What do you think?

P.S. If you want even more context, I have written a number of blog posts about this subject. They have a ‘node2nix’ tag, and can be found here: Sander van der Burg's blog: node2nix. These will give you even more details on what I have done in the past.

(I created the fetchNodeModules PR)

Yes, that’s precisely the reason why scripts are disabled by default with --no-scripts in the PR.

i’m going to be doing some bits and pieces with node soon… so any improvements to this ecosystem are much appreciated.

whats this about node downloading random binaries … i thought it was javascript and c/c++ stuff?..

Some packages think it’s a good idea to download compiled (usually C/C++, but not always) libraries and binaries at build time so that their packages can link against them/use them in their build steps.

Instead of asking users to install them before using the package, it’s common practice to run a curl as part of the script hooks of your npm build. Usually these are custom builds hosted somewhere, so it’s not even trivial to check versions. Some are patched downstream.

I think this is a side effect of how there are always far too many dependencies, that probably means people never read installation instructions of dependencies deep in the tree, so packages that are more reasonable see little use.

You can work around this somewhat in node2nix, thankfully.

Yeah there’s been a couple packages I’ve wanted to add but when I went to regenerate the list it took forever so I just gave up :upside_down_face:

I propose we forget NPM and start manually packaging JS packages applying the patches for reproducibility. Objections?

This sounds absolutely insane and is already a big enough problem for Python, an ecosystem with far smaller dependency graphs.

I would suggest to adopt something like GitHub - nix-community/npmlock2nix: nixify npm based packages [maintainer=@andir] instead and let leaf packages deal with the package graphs.

Just in the interest of discussion, I made a yarn fetcher that handles both yarn.lock and package-lock.json.

it’s probably not 100% ready, but if we can get something to help with the js transition sooner (waiting for npmlock2nix) then I think it’s all the better.
Note that I haven’t had the time to test on different platforms. So in essence I don’t know if the sha256 is going to be dependent on the platform.

:wave: Hello everyone,

My name is Oleg, and I am relatively new to Nix. Allow me to add my 2 cents :slight_smile:

I think we can rely on the popular Node.js package managers here and get all the checksums out of package-lock.json or yarn.lock (or similar) files rather than download the whole world to infer checksums, and provide a mechanism to specify missing checksums if any (in some edge cases yarn doesn’t provide checksums, e.g. for a direct GitHub-hosted dependency) so we don’t infer anything behind the user’s back.

Hence, we don’t make any assumptions about if a package has been changed without a version bump (but I am not sure if this is possible with the current npm registry policy). Anyway, the majority of packages goes with checksums so a bit of extra manual work for the newly introduced Node.js based package should not be a problem but the approach can solve this issue of slowness here.
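A minimal sketch of such a "specify missing checksums" mechanism might look like this. The function name `resolve_checksums`, the package keys, and the placeholder hash strings are all hypothetical. The idea is only that nothing is inferred silently: a dependency without a checksum and without an explicit override is an error.

```python
from typing import Dict, Optional


def resolve_checksums(
    locked: Dict[str, Optional[str]],
    overrides: Dict[str, str],
) -> Dict[str, str]:
    """Combine lock file checksums with user-supplied ones, failing on gaps."""
    resolved: Dict[str, str] = {}
    missing = []
    for package, checksum in locked.items():
        if checksum is not None:
            resolved[package] = checksum  # the lock file value always wins
        elif package in overrides:
            resolved[package] = overrides[package]  # explicitly provided by the user
        else:
            missing.append(package)
    if missing:
        raise ValueError("no checksum for: " + ", ".join(sorted(missing)))
    return resolved


# Placeholder values, not real hashes.
locked = {"left-pad@1.3.0": "sha512-FROM-LOCKFILE", "github-dep@0.1.0": None}
overrides = {"github-dep@0.1.0": "sha256-PROVIDED-BY-USER"}
print(resolve_checksums(locked, overrides))
```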

If we still want to have Node.js based packages within nixpkgs in a way that they can be reused/shared between the other Node.js based packages and granularly cached, I think we still have to have such packages’ declaration (a lock file) somewhere. The same works for any other Node.js project, not just for nixpkgs. However, a reasonable question could be - if we really need all the packages we have, i.e. have them publicly cached? The same question should be asked when adding a non-Node.js package as well? That is, should we introduce a policy about how we introduce the new Node.js package into the nixpkgs?

These are both great projects and I have been playing around with them. The node2nix project is pretty useful when you need a binary that is provided within a Node.js project, and it works quite well for particular cases. However, I see a fundamental problem with this approach: they do not expose dependencies as individual Nix packages that can be reused, tweaked, or overridden, i.e. there is no granularity. They do share/cache the tarballs, but this is just a part of the solution. If we had a mechanism that turns a Node.js project dependency tree into a Nix dependency tree, that would be highly beneficial. This is roughly the approach @sander is describing.

There is a project that aims to implement this approach - GitHub - Profpatsch/yarn2nix: Build and deploy node packages with nix from yarn.lock files. Have you had a chance to check that?

two challenges:

  1. NPM allows circular dependencies, nix does not
  2. require and import follow symlinks, which changes the current workdir

to fix 2, i made nodejs-hide-symlinks, aka nodejs-chroot,
which allows lightweight composition of node_modules folders
by symlinking packages from a machine-level global store, in our case /nix/store

limitation: circular dependencies must be packed together
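One way to find out which packages would have to be "packed together" is to compute the strongly connected components of the dependency graph. Here is a sketch using Tarjan's algorithm. This is a standard graph technique, not necessarily what nodejs-hide-symlinks does, and the example graph is made up.

```python
from typing import Dict, List


def strongly_connected_components(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm; components are emitted dependencies-first."""
    index_of: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack = set()
    stack: List[str] = []
    components: List[List[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, []):
            if dep not in index_of:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index_of[dep])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop all of its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical graph: "a" and "b" form a cycle and must share one unit.
graph = {"app": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
print(strongly_connected_components(graph))  # [['c'], ['a', 'b'], ['app']]
```

Each component can then be treated as one unit, and the resulting graph of units is acyclic.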

Instead of using LD_PRELOAD, would it not be better to wrap node with the --preserve-symlinks flag?

i played with this 3 months ago …
as i remember, i tried node --preserve-symlinks
but i was not satisfied. maybe cos i use pnpm-style node_modules
so node should still resolve some symlinks, just not all symlinks.
sorry i have no better answer

let me do another guess ; )
i guess the problem are child dependencies

the naive approach would be
symlinking from /nix/store to ./node_modules

but when two packages require the same dependency,
but in a different version, this fails,
cos you have no write access to package contents,
and a shallow node_modules gives collision on package name

pnpm solves this by putting every package in a separate folder
for example ./node_modules/.pnpm/pkg1/node_modules/pkg1
which is a hardlink to ~/.pnpm-store

the hardlink is needed to stop npm from following symlinks
but hardlinks are not portable, so we cant use them for nix

pnpm exposes packages to ./node_modules
via symlinks to ./node_modules/.pnpm
which are resolved by node
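The layout described above can be sketched in Python as follows. This is a toy model, not pnpm itself: package contents are copied as a stand-in for pnpm's hardlinks, folder names use a name@version key to keep two versions apart, only unscoped package names are handled, and `build_layout` and `demo` are made-up helpers.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def build_layout(
    project: Path,
    store: Dict[str, Path],
    deps: Dict[str, List[str]],
    direct: List[str],
) -> None:
    """Create node_modules/.pnpm/<name>@<version>/node_modules/<name> folders."""
    virtual = project / "node_modules" / ".pnpm"
    # 1. Every package gets its own folder (copy stands in for a hardlink).
    for key, source in store.items():
        name = key.rpartition("@")[0]
        shutil.copytree(source, virtual / key / "node_modules" / name)
    # 2. Dependencies become siblings of the package, via symlinks.
    for key, children in deps.items():
        for child in children:
            child_name = child.rpartition("@")[0]
            link = virtual / key / "node_modules" / child_name
            link.symlink_to(virtual / child / "node_modules" / child_name,
                            target_is_directory=True)
    # 3. Direct dependencies are exposed at the top level, via symlinks.
    for key in direct:
        name = key.rpartition("@")[0]
        (project / "node_modules" / name).symlink_to(
            virtual / key / "node_modules" / name, target_is_directory=True)


def demo() -> Dict[str, str]:
    """Two packages that need different versions of the same dependency."""
    root = Path(tempfile.mkdtemp()).resolve()
    store = {}
    for key in ["pkg1@1.0.0", "pkg2@1.0.0", "dep@1.0.0", "dep@2.0.0"]:
        path = root / "store" / key
        path.mkdir(parents=True)
        (path / "index.js").write_text("// " + key + "\n")
        store[key] = path
    project = root / "project"
    build_layout(
        project,
        store,
        {"pkg1@1.0.0": ["dep@1.0.0"], "pkg2@1.0.0": ["dep@2.0.0"]},
        ["pkg1@1.0.0", "pkg2@1.0.0"],
    )
    seen = {}
    for name in ["pkg1", "pkg2"]:
        real = (project / "node_modules" / name).resolve()
        # The sibling "dep" next to the real package folder is what gets found.
        seen[name] = (real.parent / "dep" / "index.js").read_text().strip()
    return seen


print(demo())  # {'pkg1': '// dep@1.0.0', 'pkg2': '// dep@2.0.0'}
```

Because each package lives in its own folder, both versions of `dep` coexist without a name collision in a shallow node_modules.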

my motivation for pnpm-install-only:
pnpm is a crazy-complicated piece of software
and we need only a small part of that complexity.
my script is basically a wrapper around
snyk-nodejs-lockfile-parser (supports multiple lockfile formats)
and uses npm to run lifecycle script (npm run postinstall, etc)

Hmm, confusing that you call it pnpm-install-only and it uses ~/.pnpm but it doesn’t use pnpm-lock.yaml?

yes, because snyk-nodejs-lockfile-parser cannot parse pnpm lockfiles
and i have not yet added the workaround, see support pnpm lockfiles

Really enjoying this discussion. You guys obviously know quite a bit more than me on this topic.
Seems this discussion died during the winter months.
Do we have anything new on this topic so far?

I have a feeling a ‘pnpm’-like approach with linking of node modules could be the right step towards a solution.

In particular, I don’t like the current way nodePackages are added to nixpkgs.
There’s just too much unpredictable stuff going on with node2nix at this scale.

I think dream2nix project is where efforts are concentrating.

+1 on dream2nix. It has taken the right approach from the beginning, and nodejs has been the focus of its “1st class” support, due to its complexity.

Hey everyone, I’ve announced js2nix here: Announcing `js2nix` - scale your Node.js project builds with Nix. Please have a look. I think I managed to address all of @sander’s concerns in this project.

@sander, @Profpatsch, @DavHau, as you are the ones who are interested and have contributed a lot in this field, I would love to hear your feedback.
