Yarn Plug'n'Play and direnv/packaging

UPDATE: dream2nix creates node_modules from the lock files directly, provided the lock file has all the necessary info. With e.g. an npm v2 lock file, you just put dream2nix in flake.nix and you’ll have a devshell and build with the correct modules, no extra step needed.
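For reference, a minimal flake.nix along those lines might look roughly like the sketch below. The `makeFlakeOutputs` entry point and its argument names reflect the dream2nix API as I understand it at the time of writing and may well have changed, so treat this purely as a sketch and check the current dream2nix docs:

```nix
{
  inputs.dream2nix.url = "github:nix-community/dream2nix";

  outputs = { self, dream2nix }:
    # Scans `source` for lock files (e.g. an npm v2 package-lock.json)
    # and produces packages and devShells from them.
    dream2nix.lib.makeFlakeOutputs {
      systems = [ "x86_64-linux" ];
      config.projectRoot = ./.;
      source = ./.;
    };
}
```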


So I recently found out that yarn has a PnP mode which overrides Node.js’s standard resolution mechanism (to node_modules directories) and instead references a read-only compressed store of packages.

This is nice for disk space, of course, but also for install speed and reproducibility: there’s a lot of hashing and caching going on. Modules can now be built without requiring their dependencies to be part of the build.

In other words, the Node.js module system goes from “slightly deranged” to “very similar to Nix”.
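For anyone following along: switching an existing Yarn berry project over is a one-line setting in `.yarnrc.yml` (PnP is already the default linker in berry, but being explicit documents the intent):

```yaml
# .yarnrc.yml
nodeLinker: pnp
```

`yarn install` then generates the `.pnp.cjs` resolver and the compressed `.yarn/cache` store instead of a node_modules tree.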

I looked at yarn2nix but it doesn’t seem to handle PnP repos yet.

Has anybody looked at Yarn PnP yet, and does anyone have ideas on how to integrate it with Nix? It seems to me that most Node.js applications are reasonably trivial to convert to Yarn PnP, and with PnP being such a close match for Nix, that should result in stronger, better, faster Nix builds.

Ideally, this could be integrated with direnv so that just changing yarn.lock results in the PnP store being rebuilt automatically. That way, while developing, the modules would always match the commit you’re on.
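A minimal `.envrc` sketch of that idea, assuming a flake-based setup with nix-direnv; `watch_file` tells direnv to re-evaluate the environment whenever the watched file changes:

```shell
# .envrc
watch_file yarn.lock   # re-enter the environment when the lock changes
use flake
```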

3 Likes

Yes I have been down this rabbit hole for about three months and am about to release a new flake based package manager for Node.js.

I wrestled with Yarn extensively and did have “some success” with PnP; but before you get too excited you need to know: Yarn’s caches and hashing systems are non-deterministic and fundamentally incompatible with Nix’s approach to building.

Frankly, Yarn “berry” is poorly implemented. I found that it produces massive numbers of redundant tarballs because the core hashing system is based on randomly generated seeds; the inline comments explaining the rationale behind those seeds were based on an incorrect understanding of how filesystem synchronization works ( cough websh*ts cough ). I would strongly suggest that you avoid building large projects with Yarn, and bear in mind that you should not try to include any part of Yarn’s global cache in a build output: this will poison your closure.

There are also notable aspects of their telemetry “feature” which consume an inordinate amount of disk space and time after v2, parts of which appear to be intentionally obfuscated to mask the extent of Facebook’s data collection :frowning: Even with telemetry disabled you still pay the runtime cost of collection; you only prevent reporting ( that is, until you invoke yarn with telemetry enabled, at which point it sends the entire backlog… ). It’s fucking malware in my opinion, and I only say that after digging around its internals for several weeks.

Yarn is more workable with CA hashing; but honestly you’ll be pulling your hair out dealing with the dozens of other poor design decisions and bugs that throw a wrench in reproducible builds.

NPM: much better. It’s got performance issues because of filesystem IO, but aside from that I can’t gripe about it too much. Nix’s caching system and built-in fetchers give it a notable performance boost. If you manage a larger project you can cut hours-long cache initializations down to a few minutes, with a flexible UX for local dev AND CI which is missing from existing tools.

I’m excited to show off what I’ve been working on soon. But until then I’ll just advise you to “just use NPM with locks and registry packages” if you can, because most existing tools were designed with that in mind. Suffice it to say that NPM’s package-lock.json, flake.lock, and “flakeref” URIs are “almost directly compatible”, which I’ve been leveraging with a lot of success. You can almost directly feed a package-lock.json into fetchTree, which has an enormous performance impact compared to nixpkgs.fetch*.
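As a hypothetical illustration of what that mapping could look like ( this is a sketch, not anyone’s actual tool ): the lockfile fields used below, the v2 `packages` map with `resolved` and `integrity`, are the real npm format, but the function and output shape are assumptions. One genuine gap is that npm’s SRI `integrity` hashes the tarball, while Nix’s `narHash` hashes the unpacked tree, so the hash still has to be computed or locked on the Nix side:

```javascript
// Hypothetical sketch: walk an npm v2 package-lock.json and emit one
// fetchTree-style descriptor per fetched registry dependency.
// Caveat: npm's SRI `integrity` hashes the *tarball*, while Nix's
// narHash hashes the *unpacked tree*, so it cannot be reused verbatim.
function lockToFetchTreeArgs(lock) {
  const out = [];
  for (const [pkgPath, entry] of Object.entries(lock.packages || {})) {
    // Skip the root project and workspace entries; keep fetched deps.
    if (!entry.resolved || !pkgPath.startsWith('node_modules/')) continue;
    out.push({
      name: pkgPath.replace(/^.*node_modules\//, ''),
      type: 'tarball',            // fetchTree's tarball fetcher
      url: entry.resolved,        // registry tarball URL from the lock
      integrity: entry.integrity, // SRI hash; narHash computed separately
    });
  }
  return out;
}

module.exports = lockToFetchTreeArgs;
```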

3 Likes

Ok I’m super excited to see what you’ll unveil :slight_smile:

Thanks for the thorough problem analysis. I was actually thinking of making a plugin for Yarn (since PnP is itself a plugin) and having it interface with Nix that way, eventually resulting in a closure that includes a wrapped node, the PnP require handler, and all package binaries.

But the problems you point out look really painful :-/

I do think that the biggest problem for Node.js on Nix is that you can’t symlink modules: Node.js resolves symlinks to their real paths by default, which breaks node_modules trees built out of store links. PnP sidesteps this by using its own resolver, implemented as a module that loads first.

I’m confident that we can make a similar loader for Nix, implementing hierarchical or flat trees.

Then there is only the problem of going from package-lock.json to Nix, something that takes forever with node2nix and often fails.

Just as an update.

Previously, I was able to convert arbitrary NPM URIs to Nix fetchers, and I could create module trees with linked bins as either node_modules/ or “global”-style installations given structured inputs. I had a large collection of utilities to convert NPM or Yarn locks, and even package.json files with non-conflicting descriptors, into my “structured inputs”, and I even supported workspaces. This worked if you were willing to compose these tools together, but it wasn’t a “works out of the box” solution: you had to manually run any build steps and install phases, which was the largest gap to fill.

Today I had a big breakthrough using NPM v2 style locks with “complex” workspaces. The format of the lockfile made it simple to structure trees automatically, and I can Nixify all fetchers, and dump trees that are equivalent to --ignore-scripts invocations. My fetchers are abstracted to use either flake inputs, built-in fetchers, or Nixpkgs fetchers depending on which the user prefers; and I have a mechanism to force specific packages or pattern matched URIs to use a particular method ( this is necessary for a small number of tarballs which contain directory entries, since these will fail if builtin fetching is used ). The caching “works as expected” and will short circuit if it determines that fetched inputs match a stored build - this is NOT something most other utilities handled properly.

I still need to run the life-cycle scripts, but I understand exactly which get run for various resolutions; I imagine this will be done in the next week or two. So far, I never invoke NPM or Yarn. I temporarily have pacote as a stopgap for those obnoxious tarballs, and I have a utility that uses it to fetch and transform packuments/manifests, but this could be running as a full replacement in the future.

Notable highlights:

- Individual modules are cached and composed separately. This is not the approach taken by most tools, and it made an enormous impact on performance.
- Intermediate phases are all cached individually as well, so changing node versions or various inputs will not trigger a full rebuild of the closure.
- Memory consumption is less than half of either Yarn or NPM in equivalent phases. I haven’t done proper benchmarks yet, but I never exceeded 800 MB on --ignore-scripts installs that consume over 20 GB with Yarn and 16 GB with NPM. CPU usage never exceeded 10%. With an empty cache and built-in fetchers I clocked 12 minutes where NPM took 20 and Yarn took 50 ( honestly fuck Yarn, this time is largely spent collecting metrics and unzipping/rezipping tarballs… ). I shuffled and renamed the output directories several times to ensure nothing got rebuilt, and I produced new node_modules/ trees in under 3 seconds.

So still work to be done, but I’d say “things are looking good” :grin:

Honestly, shouts out to Eelco and the Tweag folks for pushing the new UI stuff. I’ll admit relearning things drove me crazy at first, and the docs leave a lot to be desired right now; but the various new types of caches and fetchers can absolutely fly if you leverage them well ( and can dedicate a lot of time to reading the sources ). This UI still has kinks to work out and a long road ahead; but “I get it” now. I think in a while the various lang2nix tools could be ~~less shit~~ potentially competitive and accessible to folks who don’t have much experience with Nix.

1 Like

That sounds wonderful! So you copy everything into a from-scratch node_modules? Would the install not be faster if we make a loader that understands nix and skip the copy, like pnp?

At some point we might even lift the resolution logic from npm and generate the lock files without downloading the dependencies?
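Partially possible today, for what it’s worth: npm can already resolve the full dependency tree and write the lockfile without populating node_modules ( it still fetches registry metadata, but not the package tarballs ):

```shell
npm install --package-lock-only
```

Lifting the resolver itself out of npm, e.g. via something like @npmcli/arborist, which npm uses internally, would be the next step.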

The overhead of PnP, the patches it requires, and the need to run Yarn make this a lot slower, and it carries the same cache non-determinism issues I mentioned before.

Ah - I meant using the same approach of intercepting the requires/resolves to point node directly at the files in the store instead of having to compose node_modules.

That would make it trivial to skip around between commits and always have the correct dependencies.

Have you tried this tool out yet?

Give it a shot, I think it does almost exactly what you’re describing. It’s good to see it implemented and how it plays out for your project’s needs. For your use case it might be fine; but we had issues using workspaces and composing builds together - and unfortunately it did not appear to “share cached tarballs between projects” as described ( again, the underlying non-determinism in Yarn’s hash keys is the problem, there was nothing the authors of this tool could realistically do about that ).

My initial approach with my tools was hacking parts of this, node2nix, npmlock2nix, and pacote together. I’ve accrued a large graveyard of approaches that extended these tools, and they were all essential references. And honestly, if you only care about building packages that are published in a registry - these tools totally cover you. My use case is for managing a large collection of unpublished modules, so we really needed “full coverage” for acting as a drop in replacement for Yarn or NPM to avoid hitting gaps/edge cases not handled in existing tools.

1 Like

How does the dream2nix framework fill the gap here?

1 Like

dream2nix looks interesting for my use case, I’ll give it a shot. After that the more complex yarn-plugin-nixify might indeed do the trick.

Seconding dream2nix. I’ve used it extensively to package extremely complex yarn-based monorepos with workspaces and a variety of other odd use cases, with close to a 100% success rate for nodejs-based packages. The Node.js ecosystem is pretty much solved for dream2nix. It has support for both pure (package-lock.json, yarn.lock) and impure translations, and provides many paths to solve various problems, such as package overrides, dependency injection, and building without devDependencies (so you build from the outside with Nix instead).

1 Like

I don’t see any indication on the project’s GitHub that this is related to Node.js. It looks like another Flake/Niv-style project management CLI tool.

I’ll dig into their docs, but if they have really solved Node.js in Nix they might want to put that front and center in the readme.

I dug deep to find every Node.js+Nix framework I could a few months ago. It would be a shame if I dumped all this effort into a new tool because an existing solution had a vague readme :sweat:

Edit: yup. I found the nodejs part of the repo. It’s less granular than mine and has way more abstraction but we have nearly identical routines for all the parts that matter… Feels bad. Spent literally months working through all of the nasty edge cases, writing tree walkers and closure resolvers. Pain.

2 Likes

Argh :frowning: this is really a problem with the nix ecosystem. I hadn’t heard of dream2nix before this thread. The project’s packages are even in the default flake registry :-/

1 Like

Just to let you know, derogatory comments about whole classes of developers are not welcome here.

1 Like

I feel a bit better. Dream2nix looks fantastic; it handles workspaces well. The documentation leaves a lot to be desired ( there are no Node.js docs right now, and the existing examples contain deprecated routines ), but it’s a WIP so I can’t fault them for that.

I think its biggest strength is how it organizes projects and the interfaces it has created for builders, fetchers, input parsers, and input “discoverers”.

I think my best path forward is to plug my builders and my registry fetcher into their API as a dream2nix extension. This would save me the headache of defining my own abstractions, which is a time-consuming process.

Building a known project isn’t actually that hard, but providing abstractions to allow overrides, dealing with exceptional packages that need special treatment, and giving users the ability to configure their build is basically covered by their API which is pretty dope.

With my builders that completely replace NPM, this thing is gonna fly :slight_smile:

1 Like

Yes! I’m glad you’ve taken a deep look at it. The flaws are pretty much what you pointed out, but it’s come a long way in a short time. Development chat is on matrix at #dream2nix:nixos.org

2 Likes

Do you have anything for dream2nix yet? I’m hitting the problem that it doesn’t resolve peerDependencies: Track nodejs progress · Issue #22 · nix-community/dream2nix · GitHub

Yeah I have it working on a real project, I just need to yank my routines and move them to my library.

Do you care about peerDependenciesMeta for optionals? I haven’t handled those yet because they are allowed to have dependency cycles which is a mess to deal with.

@Growpotkin I have a somewhat dirty PR up at nodejs: Use more pure symlink builds by wmertens · Pull Request #195 · nix-community/dream2nix · GitHub - it handles cyclic dependencies and simply includes all optionals for now.

1 Like