So, pnpm (nodejs package manager) is already a long way towards the nix philosophy. It creates per-package/source hashed entries in a store, and symlinks everything together. It handles circular dependencies by creating package groups for the specific versions requested.
I am wondering what some sort of nix compatibility would look like, to contribute to the project.
the sources and packages would go into the store
pnpm would use nix via exec so that there’s no binary dependency?
would it be querying via nix too, or just checking if the correct dirs exist?
node_modules would probably still be written as a dir with symlinks in it, node is pretty fickle about symlinks
is there more to it? Seems like a pretty small ask, no?
I am assuming you’ve read the manual? What extra do you want to see? I guess I’m asking from a place of ignorance; having occasionally packaged things that were built with pnpm, the existing support seems sufficient?
I guess one could replace the pnpm store with the Nix store, storing all packages within the Nix store and linking to it when creating node_modules.
I have actually implemented something like this in importPnpmLock.nix (GitHub Mirror), where I parse a pnpm-lock.yaml in Nix, store all dependencies in the Nix store and serve them as a fake NPM Registry using mitm-proxy for pnpm to download. This allows you to prepare a pnpm workspace in a reproducible manner.
Having this straight in pnpm would be great, but from my experience maintaining our pnpm toolchain in nixpkgs (fetchPnpmDeps), reproducibility doesn’t seem to be the main focus of pnpm. While it shares some of Nix store’s design, it does store some impure information like a lastCheckedAt as well as (platform-dependent) side-effects from build scripts in its store.
I think it would be great to have a package manager for Node that moves all install scripts into a Nix sandbox. This would tackle the most common supply chain attack by isolating scripts from the rest of the system. Issues immediately arise with scripts that try to download stuff from the internet, which a disturbing number of NPM packages do.
Ah - that’s new to me, didn’t know that nixpkgs had built-in support nowadays, thanks!
But, I see that it’s just a dumb copy of what pnpm builds.
I’m wishing for a store entry per package, with the hash being the content hash when there are no build scripts and otherwise the hash is the input hash of all dependencies, build platform etc.
(caveat, when packages form a cycle they must be put together in a single store path because of how nodejs resolves modules)
Basically, the current node_modules/.pnpm would be /nix/store, and the package’s node_modules is then just a bunch of symlinks.
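To make the hashing rule above concrete, here is a minimal sketch of how such a store key could be computed. The function name `store_key` and its parameters are hypothetical and are not taken from pnpm or Nix; real Nix store path hashing works differently in detail. The only point is the split between a content hash for pure packages and an input hash for packages with build scripts.

```python
import hashlib
import json


def store_key(name, version, tarball_bytes, has_build_script,
              dep_keys=(), platform=""):
    """Return a hex digest identifying one package's store entry (sketch)."""
    h = hashlib.sha256()
    if not has_build_script:
        # Pure package: the key depends only on the package contents,
        # so every platform and every project shares the same entry.
        h.update(b"content\0")
        h.update(tarball_bytes)
    else:
        # Package with build scripts: the key covers everything that can
        # influence the build output (source, dependency keys, platform).
        payload = {
            "name": name,
            "version": version,
            "source": hashlib.sha256(tarball_bytes).hexdigest(),
            "deps": sorted(dep_keys),
            "platform": platform,
        }
        h.update(b"input\0")
        h.update(json.dumps(payload, sort_keys=True).encode())
    return h.hexdigest()
```

With this scheme a package without build scripts gets the same key on every platform, while a package with build scripts gets a new key whenever one of its dependencies or the build platform changes.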
I should add that I did some work on this in dream2nix but it never got merged.
That’s why I’m thinking that maybe we can come up with some sort of “API” that we’d provide and propose “hooks” that pnpm would add, which would allow better separation of concerns.
An example of the net effect: You have direnv enabled, you enter a project folder, and the node_modules is instantly rebuilt with the correct packages.
Then, you switch your branch, and again, node_modules is instantly rebuilt.
Then, you go to another project, and it partially uses packages that the other project was using.
I have been thinking about this subject quite a lot and have also studied it quite a bit. I was pretty disappointed when I realized that you cannot really share the same version of a package between multiple apps without hard linking. And nix doesn’t support hard linking at build time.
PNPM relies heavily on hard linking.
The most nix-compatible approach for node.js dependencies probably is Yarn’s PnP system.
Instead of symlinking, this allows the app to have a single file that points to every dependency’s location, so that they can be organized and stored however the package manager wants.
The support for this feature is not very good though.
Having a nix-native package manager that downloads packages to the nix store and then generates this file would still be cool, though.
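As a rough illustration of the "single file that points to every dependency's location" idea, the sketch below builds such a table and looks up an import through it. The table layout, the function names, and the store paths are all made up for illustration; this is not Yarn's actual PnP file format.

```python
def build_resolution_map(packages):
    """Build a lookup table: importer id -> {dependency name: location}.

    `packages` maps "name@version" to a dict with a "location" and a
    "dependencies" mapping of name -> version (hypothetical input shape).
    """
    table = {}
    for pkg_id, info in packages.items():
        deps = info.get("dependencies", {})
        table[pkg_id] = {
            "location": info["location"],
            "dependencies": {
                dep: packages[f"{dep}@{ver}"]["location"]
                for dep, ver in sorted(deps.items())
            },
        }
    return table


def resolve(table, importer, request):
    """Return where `request` lives when imported from `importer`."""
    return table[importer]["dependencies"][request]


# Placeholder store paths, for illustration only.
packages = {
    "app@0.0.0": {"location": "/work/app",
                  "dependencies": {"left-pad": "1.3.0"}},
    "left-pad@1.3.0": {"location": "/nix/store/aaaa-left-pad-1.3.0",
                       "dependencies": {}},
}
table = build_resolution_map(packages)
```

Because the lookup never touches the directory hierarchy, each package can sit in its own store path without any symlinks or copies.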
There is a cli flag for node.js that tells it to not resolve symlinks and instead use the virtual path as base for resolving dependencies. This has a lot of different problems, the most notable being that you then can’t run the code without that flag.
dream2nix might use this strategy. I’m not sure though.
Anyway, the original idea of the OP was that pnpm would do all the file wrangling via the nix store with some sort of plugin system.
Although now that I’m thinking about it, there’s no way to make that secure without the nix store enforcing content hashes and each user having a trusted mapping from module to content hash
I think the hashing wouldn’t be a big problem. All JS package managers verify the hash and store it in their lock files. Instead of downloading and extracting the package archives itself, a package manager could generate a nix derivation and let nix do this part. Then it could continue with the nix store path.
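For reference, the integrity strings in lock files are typically in the Subresource Integrity form `<algorithm>-<base64 digest>`. A minimal sketch of the check a package manager performs might look like this; the function names are hypothetical, and only a single-hash integrity string is handled.

```python
import base64
import hashlib


def sri(data, algorithm="sha512"):
    """Compute an SRI-style string such as 'sha512-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data, integrity):
    """Check downloaded bytes against a lock file integrity string."""
    algorithm, _, _ = integrity.partition("-")
    return sri(data, algorithm) == integrity
```

Since fixed-output fetchers in nix also accept SRI-style hashes, the value from the lock file could in principle be passed along as-is, though that wiring is an assumption here and not something shown in this thread.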
The bigger problem is wiring the packages up. In Node.js, this normally requires a specific folder hierarchy, so that packages can find each other. But if you build this hierarchy with symlinks instead of hardlinks or copying the files, you’ll need the `--preserve-symlinks` flag.
pnpm already does everything with symlinks as much as possible. Basically, ./node_modules/.pnpm would be /nix/store.
A problem is that you can’t have cycles in nix store paths, so a cycle of packages needs to be stored as a single store path.
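Finding which packages have to share a store path amounts to computing the strongly connected components of the dependency graph. Below is a small sketch using Tarjan's algorithm; the function name `package_groups` and the input shape are assumptions for illustration, and this is not necessarily how pnpm groups packages. It is recursive, so a very deep graph would need an iterative variant.

```python
def package_groups(deps):
    """Group packages so that every dependency cycle lands in one group.

    `deps` maps a package name to the names it depends on. Groups are
    returned dependencies-first, which is also a valid build order.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                # Back edge: dep is part of the cycle being explored.
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:
            # node is the root of a component: pop its members.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            groups.append(sorted(group))

    for node in sorted(deps):
        if node not in index:
            visit(node)
    return groups


graph = {"app": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
print(package_groups(graph))  # [['c'], ['a', 'b'], ['app']]
```

Here `a` and `b` depend on each other, so they would go into one store path, while `c` and `app` each get their own.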
The hash verification is a problem, because basically anyone can write anything into the nix store. So if you just look if a path exists, you are running arbitrary code.
PNPM uses both symlinks and hardlinks. Without the hardlinks, it would also need either copying files, or the `--preserve-symlinks` flag.
It first hardlinks all packages completely flat into the project directory. Then, for every package, it adds symlinks to their dependencies. This really only works because of hard-linking.
You can tell PNPM not to use hard-links, but then it will copy packages instead of hard-linking. The symlink part stays the same in this case.
Ah, didn’t know that. I thought that only nix itself can write stuff there, and that has to be either a build result or come from a trusted cache.
But I mean, the PNPM store is also not protected in any way, so at least it’s not worse without nix.
The pnpm hardlinks are there just to get a low-cost local copy that can also be written to I think. (because many packages do this)
You can absolutely use symlinks, combined with real directories. Dream2nix did exactly that but later switched to copying (and I gave up on it).
Look at how .pnpm is organized, each package gets its own dir with node_modules in it, and the package itself is under that, with its dependencies symlinked into it. Then the app gets symlinks to each of those .pnpm/pkg/node_modules/pkg directories.
nodejs will dereference symlinks, so when looking for deps it will eventually end up in app/node_modules/.pnpm/pkg/node_modules/pkg and then look for deps in app/node_modules/.pnpm/pkg/node_modules.
(now that I think about it, I wonder why it doesn’t check app/node_modules/)
When a file imports a package, node.js starts at its real location, and then looks if there’s a node_modules folder next to it. If not, it goes up a folder until it finds a node_modules folder.
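The lookup described above can be sketched as a pure function that lists the directories Node would try, nearest first. This is a simplified model: it ignores global folders, `package.json` handling and file extensions, and the helper name `node_modules_candidates` is made up for this example.

```python
from pathlib import PurePosixPath


def node_modules_candidates(real_file, package):
    """List the locations tried for `package`, starting at the real
    (symlink-resolved) path of the importing file and walking upwards."""
    candidates = []
    directory = PurePosixPath(real_file).parent
    for current in [directory, *directory.parents]:
        # A directory that is itself named node_modules is skipped, so
        # the walk never probes node_modules/node_modules.
        if current.name == "node_modules":
            continue
        candidates.append(str(current / "node_modules" / package))
    return candidates


real = "/app/node_modules/.pnpm/foo@1.0.0/node_modules/foo/index.js"
for path in node_modules_candidates(real, "bar"):
    print(path)
```

In this model the second candidate is `/app/node_modules/.pnpm/foo@1.0.0/node_modules/bar`, which is where pnpm places the symlinks to a package's dependencies. `/app/node_modules/bar` also shows up further down the list, so a missing dependency can still fall through to the app's own node_modules.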
If we copy a package into the nix store without the node_modules folder, it cannot find any dependencies from there. We can copy the node_modules folder into the package itself, but then we lose deduplication, because we’d have to re-download the package if it gets different dependency versions, or if one of the dependencies gets other dependency versions in another app. This is almost always the case I assume.
But sure, if deduplication is not the goal, then it’s no problem at all. The question is what the benefit would be.
Right, that’s why there’s two node_modules under the .pnpm directories.
Pnpm already solved this. We just need a way to reproducibly generate missing store paths, and then generate the app node_modules directory as a bunch of symlinks to store/pkg-hash/node_modules/pkg.
The latter is a very fast operation provided we have a fast mapping from lockfile to pkg hashes. So then with direnv we’d have instant module updates as you switch git commits.
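A sketch of that last step, assuming the mapping from lockfile entries to store paths already exists: it only computes which symlinks the app's node_modules would need and does not touch the file system. The function name, the input shapes and the store paths are hypothetical.

```python
from pathlib import PurePosixPath


def symlink_plan(direct_deps, store_paths, app_dir):
    """Return {link path: target} for the app's top-level node_modules.

    `direct_deps` maps package name -> version (from the lockfile) and
    `store_paths` maps "name@version" -> store path of that package.
    """
    plan = {}
    for name, version in sorted(direct_deps.items()):
        store = PurePosixPath(store_paths[f"{name}@{version}"])
        link = PurePosixPath(app_dir) / "node_modules" / name
        # Point at store/pkg-hash/node_modules/pkg so that the package's
        # own dependencies are found next to it inside the store path.
        plan[str(link)] = str(store / "node_modules" / name)
    return plan


# Placeholder store paths, for illustration only.
stores = {"foo@1.0.0": "/nix/store/aaaa-foo-1.0.0",
          "@scope/bar@2.1.0": "/nix/store/bbbb-bar-2.1.0"}
plan = symlink_plan({"foo": "1.0.0", "@scope/bar": "2.1.0"}, stores, "/work/app")
```

Switching commits would then only mean recomputing this plan and replacing a handful of symlinks, as long as every store path it refers to already exists.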
So the goal is not to be storage efficient (because we can’t deduplicate packages, when we store them in the nix store). Is the goal to be fast? That won’t work either, at least not for dependency updates. If we change one single package in the tree, it results in needing to re-download almost all packages in the tree again, because we cannot hardlink.
But of course, after re-downloading is done and all duplicated packages are in the nix store, switching between commits is fast, or at least as fast as pnpm. But for updating dependencies, PNPM is much, much faster, because it doesn’t have to copy anything.
I’m not sure we’re talking about the same thing. If you look into the .pnpm directory, you see per-package directories that are then sub-symlinked into node modules.
Those directories are what I’d like to see in the nix store, and they’d be shared by everything.