Hello,
I’ve noticed that there’s a new discussion about NPM packages in Nixpkgs. I’d like to give my point of view as well, and provide you with some background information. Maybe this helps us decide in which direction we want to go.
Deploying NPM packages with Nix is a complicated problem – NPM is both a build and dependency manager, and obviously the latter aspect conflicts with Nix.
Bypassing NPM’s dependency management features is not trivial: it has odd behaviour, certain dependency specifiers are more “loose” than Nix’s dependency addressing mechanism (with hashes), we must bypass the local cache (that in turn may cause downloads to be triggered), and we must deal with undeclared native dependencies. Furthermore, because NPM is also a build manager, we must still make sure that it can run the build scripts.
Furthermore, every major NPM release may introduce new features that could break the integration approach.
How the current NPM packaging approach came to be was mostly just an accident. Long-time Nixpkgs project members probably already know that before we started using node2nix, there was npm2nix, and for a while it worked well enough. Sometimes we ran into breakages when new major versions of NPM were released, but for a while these problems seemed to be easily fixable.
At some point, packages with circular dependencies were introduced (I don’t think this is a feature intentionally supported by the developers of NPM, but they were there all of a sudden!). npm2nix couldn’t cope with these kinds of dependencies, and I got stuck with the same problem as well. Because Shea Levy (the npm2nix author) no longer had any time to work on npm2nix, I decided to investigate.
After lots of experiments, I realized that there was a fundamental problem with npm2nix: just symlinking dependencies into the node_modules/ sub folder of each dependency no longer sufficed to cope with circular dependencies. Instead, we must memorize earlier deployed dependencies and copy them. When a dependency has been encountered previously, then it should be skipped. This basically required a fundamental rewrite of the entire tool.
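To make the "memorize and skip" idea a bit more concrete, here is a minimal Python sketch. It is not the actual node2nix code: the graph, the package names, and the `deploy_order` function are made up for illustration, and a real deployer would copy package contents instead of just collecting names.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package name -> names of its dependencies.
# "a" and "b" depend on each other, which forms a cycle.
GRAPH: Dict[str, List[str]] = {
    "app": ["a"],
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
}


def deploy_order(root: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return the packages to copy, visiting each package at most once."""
    seen: Set[str] = set()   # packages that were already deployed
    order: List[str] = []    # order in which packages get copied

    def visit(name: str) -> None:
        if name in seen:
            # Encountered before: skip it, otherwise a cycle never terminates.
            return
        seen.add(name)
        order.append(name)
        for dep in graph[name]:
            visit(dep)

    visit(root)
    return order


print(deploy_order("app", GRAPH))  # ['app', 'a', 'b', 'c']
```

With plain symlinking there is no such `seen` set, so the a -> b -> a cycle cannot be expressed as a finite chain of store paths.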
In the beginning, my implementation was somewhat controversial. The first attempt relied on building derivations in the evaluation phase. Later I did a major revision that computed the entire dependency graph before the evaluation stage, solving this problem.
Still, my implementation was rejected by some people in the community (mostly because it looked too complicated, which was IMO a “necessary evil” to cope with NPM’s peculiar behaviour).
Some of my findings were also integrated into npm2nix, making it possible to still deploy some packages with circular dependencies. Because this solution worked “well enough” for most people, npm2nix was kept and I decided not to push my implementation forward anymore. Nonetheless, I kept using it for my own projects and decided to call it node2nix. For quite some time (more than a year) npm2nix and node2nix co-existed.
Then roughly a year later, a new controversial feature called dependency de-duplication was introduced in a new major release of NPM. Basically, dependencies of a package are no longer deployed in the node_modules/ sub folder of each dependency, but “moved up” in the dependency tree until a conflict has been encountered.
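As a rough illustration of the "move up until a conflict" behaviour, here is a simplified Python sketch. It only distinguishes two levels (the top-level node_modules/ folder versus nesting under the dependent package), whereas NPM's real algorithm is more involved. The package names, versions, and the `hoist` function are all hypothetical.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str]  # (name, version)

# Hypothetical input: every package mapped to its direct dependencies.
# "a" and "b" need conflicting versions of "c".
DEPS: Dict[Dep, List[Dep]] = {
    ("app", "1.0.0"): [("a", "1.0.0"), ("b", "1.0.0")],
    ("a", "1.0.0"): [("c", "1.0.0")],
    ("b", "1.0.0"): [("c", "2.0.0")],
    ("c", "1.0.0"): [],
    ("c", "2.0.0"): [],
}


def hoist(root: Dep, deps: Dict[Dep, List[Dep]]) -> Dict[str, str]:
    """Map node_modules paths to the version deployed at that path."""
    layout: Dict[str, str] = {}

    def place(pkg: Dep, parent_path: str) -> None:
        name, version = pkg
        top = f"node_modules/{name}"
        if layout.get(top) == version:
            return  # same version is already at the top level: share it
        if top not in layout:
            path = top  # no conflict: move the package all the way up
        else:
            # Conflict with another version: keep it below the dependent.
            path = f"{parent_path}/node_modules/{name}"
            if layout.get(path) == version:
                return
        layout[path] = version
        for dep in deps[pkg]:
            place(dep, path)

    for dep in deps[root]:
        place(dep, "")
    return layout


for path, version in hoist(("app", "1.0.0"), DEPS).items():
    print(path, version)
```

In this example c 1.0.0 ends up in node_modules/c, while c 2.0.0 stays in node_modules/b/node_modules/c because the top-level slot is already taken by a different version.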
This major NPM change (again!) broke npm2nix – now all of a sudden npm2nix could no longer be used to deploy any package. As a result, the entire NPM package set in Nixpkgs was completely broken.
For a while, Nixpkgs appeared to be unfixable, until I gave node2nix a try. It seemed to work fine, so I proposed to use it for Nixpkgs instead of npm2nix. This explains how node2nix was introduced into Nixpkgs, and why it was named node2nix and not npm2nix. Basically, node2nix came about because there was no better alternative.
Although node2nix mostly works for a large category of packages, its design also had to be overhauled several times. When package-lock.json files and offline installations were introduced, I had to do another major rewrite to bypass the local cache.
Sadly, NPM is still subject to evolution and certain changes in newer NPM versions have introduced new kinds of problems to node2nix. Furthermore, there are also new kinds of use cases for which node2nix was initially not “designed”.
From a functional perspective, we have the following NPM deployment use cases with Nix:
- Deploying end-user packages (e.g. from the NPM registry)
- Deploying local development projects
- Deploying remote development projects (this is a typical Nix use case, NPM does not have an equivalent)
- Deploying NPM dependencies in a non-NPM project
node2nix was only designed for the first two use cases. The latter two can only be done by creatively integrating certain parts of the generated code, which isn’t trivial at all.
Furthermore, there are other drawbacks as well:
- For deploying end-user packages, you always have to regenerate the Nix expression for the generated package set as a whole. The advantage is that common dependencies are reused in the generated Nix expression (so that the amount of code churn is smaller), but the drawback is that regeneration is very time consuming, and typically does not map well to the Nixpkgs development workflow (in which each commit refers to a single package, or a well-identified group of packages).
- The node-env.nix has evolved into a very complicated beast. This is mostly caused by coping with lock files and bypassing the local cache. As a result, in its current state it is very hard to rewrite it in such a way that we can use it as a deployer for NPM dependencies in non-NPM projects.
- Another result of a very complicated node-env.nix is that it runs out of memory when the nesting of dependencies is too deep.
- We also cannot easily implement support for deploying remote development projects.
- The NPM dependency resolution algorithm implemented by node2nix is no longer correct. In newer versions, NPM makes a distinction between the origins of the packages. For example, async 0.2.2 from the NPM registry is considered a “different” dependency than async 0.2.2 deployed as a source tarball from an HTTPS URL. In older implementations, they were considered the same. Supporting these differences requires a substantial rewrite of the dependency resolution algorithm in node2nix.
- node2nix does not use any newer features of Nix. For example, newer versions of Nix also support SRI hashes, but node2nix doesn’t use them.
- Some newer NPM features are not yet supported, e.g. workspaces.
- The build process isn’t tweakable and its overriding capabilities are somewhat limited. Again, the fact that node-env.nix is a mess is a major impediment.
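To illustrate the point above about package origins, here is a small Python sketch of the two notions of package identity. The `PackageId` type, the two comparison functions, and the tarball URL are hypothetical and only serve as an example. This is not how NPM or node2nix represent packages internally.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    name: str
    version: str
    origin: str  # where the package came from (assumed field, for illustration)


# Same name and version, but obtained from different places.
registry_async = PackageId("async", "0.2.2", "registry")
tarball_async = PackageId("async", "0.2.2", "https://example.org/async-0.2.2.tgz")


def same_by_name_version(a: PackageId, b: PackageId) -> bool:
    """Older notion of identity: the origin is ignored."""
    return (a.name, a.version) == (b.name, b.version)


def same_by_origin(a: PackageId, b: PackageId) -> bool:
    """Newer notion of identity: the origin is part of the key."""
    return a == b


print(same_by_name_version(registry_async, tarball_async))  # True
print(same_by_origin(registry_async, tarball_async))        # False
```

A resolver that keys its "already seen" table on name and version alone would wrongly share one of these two packages with the other, which is why the change touches the core of the resolution algorithm.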
As you can probably see, to implement these new use cases and address some of the fundamental flaws of node-env.nix, it is not possible to just implement a “quick fix”. Instead, a fundamental revision/rewrite is required for the base layer: node-env.nix.
I have been working on a new base layer for a while (that I only have as a private PoC), but sadly it’s progressing very slowly. This can be attributed to the fact that I can only do the development in my spare time.
Basically, the idea behind the new base layer is that instead of a Nix expression (node-env.nix), you can run a tool that “tricks” NPM in such a way that all dependencies are present by providing the Nix store paths to them. It already works fine locally, but the remainder of the integrations are not done yet, and deploying from local sub directories is still unsupported.
A companion tool can take care of obtaining the dependencies (either via a lock file, or by another companion tool that performs a dependency resolution on an ordinary package.json file).
However, what I also realized is that with node2nix I have always been aiming for accuracy, and maybe this isn’t what everyone wants/needs. For Nixpkgs’ use cases, we may also be able to live with a less accurate, and more hackable approach.
From talking to people in the Nixpkgs community, I know that an incremental package deployment approach would be desirable.
I have also been thinking about a completely different integration approach:
- Instead of making/maintaining our own implementation of NPM’s dependency resolution algorithm (which node2nix uses to deploy end-user packages), we can generate a package.json file with the package we intend to deploy as its only dependency. By running ‘npm install’ in an isolated environment, we can generate a package-lock.json file that contains the resolved dependency tree. Then the resulting package-lock.json file can be consumed by a Nix expression to deploy the dependencies. (For Git dependencies, we must still somehow compute the output hash, because these aren’t known by NPM).
- We don’t run ‘npm install’ in the derivation anymore. If everything works out properly, the dependencies should already be there. However, the build steps (if any) must still be performed by other means.
- To support incremental deployments: we generate a data file in JSON format that can be easily read (e.g. with builtins.fromJSON) and updated, rather than generating Nix expressions. The disadvantage of using Nix expressions for incremental updates is that they are difficult to read, and need to be evaluated.
- Alternatively, if we can live with the large amount of code churn, we can also save the package-lock.json files for each package that we want to deploy in the Nixpkgs repository. This prevents us from having to implement some kind of de-duplication method. Furthermore, this also maps better to our Nixpkgs development workflow in which each commit refers to a package.
- We must make sure that the derivation can be easily “tweaked” with hooks and/or overrides, to correct potential deployment discrepancies.
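The first and third points above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: `wrapper_manifest` and `lock_to_sources` are hypothetical helper names, the output field names are invented, the integrity value is a placeholder, and I only rely on the "packages" section with "resolved" and "integrity" fields that newer package-lock.json versions provide. Running ‘npm install’ in an isolated environment is left out.

```python
import json
from typing import Any, Dict, List


def wrapper_manifest(name: str, version: str) -> Dict[str, Any]:
    """Build a package.json whose only dependency is the package to deploy."""
    return {
        "name": f"{name}-wrapper",  # hypothetical naming convention
        "version": "1.0.0",
        "dependencies": {name: version},
    }


def lock_to_sources(lock: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the 'packages' section of a lock file into plain JSON data."""
    sources = []
    for path, entry in sorted(lock.get("packages", {}).items()):
        if path == "" or "resolved" not in entry:
            continue  # skip the root project and entries without a source
        integrity = entry.get("integrity", "")
        sources.append({
            "path": path,
            "url": entry["resolved"],
            "integrity": integrity,
            # e.g. Git dependencies: the hash must be computed separately
            "needs_hash": integrity == "",
        })
    return sources


# Made-up lock file content; the integrity string is a placeholder.
example_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "foo-wrapper", "version": "1.0.0"},
        "node_modules/foo": {
            "version": "1.0.0",
            "resolved": "https://registry.npmjs.org/foo/-/foo-1.0.0.tgz",
            "integrity": "sha512-PLACEHOLDER",
        },
        "node_modules/bar": {
            "version": "0.1.0",
            "resolved": "git+ssh://git@example.org/bar.git#0123abc",
        },
    },
}

print(json.dumps(wrapper_manifest("foo", "1.0.0"), indent=2))
print(json.dumps(lock_to_sources(example_lock), indent=2))
```

The resulting list is plain data, so a Nix expression could load it with builtins.fromJSON, and an incremental update only has to add or replace entries instead of regenerating a Nix expression.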
The above solution obviously has the disadvantage that it doesn’t deploy as quickly as node2nix (but this can be alleviated with incremental updates), does not support older NPM versions, and does not support build management facilities out of the box (but this can still be fixed with overrides).
Maybe a different solution that is fully lock driven, incremental, and tweakable would be a better fit for Nixpkgs.
Anyway, regardless of the approach that we think is best, some major changes are required to make NPM deployments more future proof.
What do you think?
P.S. If you want even more context, I have written a number of blog posts about this subject. They have a ‘node2nix’ tag, and can be found here: Sander van der Burg's blog: node2nix. These will give you even more details on what I have done in the past.