Initially I just wanted to quickly write down some of my thoughts, but it turned into a pseudo-blogpost. I’ll probably clean it up and publish it later as a series; I just don’t have time to expand on everything properly at the moment.
After thinking more about it, I believe a big disconnect we have is in the way people see their dependencies.
C and C++ lack a unified package manager for their language. This has traditionally been handled by distributions, given that they themselves consist largely of applications written in C, with C++ piggy-backing along, used mainly for UI parts and larger applications.
That means that for people using C/C++ it’s natural to expect their distribution to have all libraries they might want to use, but I don’t think this extends far beyond that community.
I’m generally biased towards the approach of having a dependency list for each application in nixpkgs, and I don’t think using nixpkgs as a general library package repository (other than maybe for C/C++, which lack alternatives) scales well in terms of human effort and repo size.
While it’s super convenient to be able to run a `nix-shell` with some version of a library you’d like to use without having to create any Nix file for it, I don’t think this benefit is what the majority of users value about Nix. I might be wrong about this, of course, but since none of the languages I use have their libraries in nixpkgs, I can’t say I’ve missed it much.
One solution to that is having fully automated repositories outside of nixpkgs, maintained by their respective communities. Then you’d instead write e.g. `nix-shell '<ruby-2.5>' -p nokogiri-1.8.2` (which in turn depends on `<nixpkgs>.libxml2`, etc.), and the only difference is that you’d have to subscribe to a specific Ruby channel or pass the full URL to that channel. I don’t think that’s an undue burden for such a feature, and it might actually boost our ecosystem by making development more distributed and promoting it outside of nixpkgs.
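To make that a bit more concrete, here’s roughly what the `shell.nix` equivalent of such a command could look like. This is only a sketch: the `<ruby-2.5>` channel does not exist, and the assumption that it is a function taking the nixpkgs set (so gems can reuse `libxml2` and friends from it) and exposes one attribute per gem version is made up for illustration.

```nix
# shell.nix: sketch assuming a hypothetical community channel registered
# under the name "ruby-2.5" (e.g. via `nix-channel --add <url> ruby-2.5`).
let
  pkgs = import <nixpkgs> { };
  # Hypothetical: the channel is assumed to be a function taking nixpkgs,
  # so gem derivations can depend on libraries like pkgs.libxml2.
  rubyPkgs = import <ruby-2.5> { inherit pkgs; };
in
pkgs.mkShell {
  # Attribute names containing dots have to be quoted in Nix.
  buildInputs = [ rubyPkgs."nokogiri-1.8.2" ];
}
```

Everything system-level still comes from nixpkgs; only the language-level package set lives in the separate channel.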
Since I started using Nix, I’ve written many half-assed Nix wrappers for a bunch of languages, and in all cases I favor using the dependency resolution of the package manager used by the application authors, simply because it’s sometimes very domain-specific and hard to get right. On top of that, Nix doesn’t provide any dependency discovery and resolution mechanism other than the `nix-prefetch-*` tools, which aren’t exactly performant or ergonomic.
A third way
I think we’re missing a third option here: a tool that generates the needed derivation on the fly, based on the existing language-specific lockfile, and either loads it into a `nix-shell` or writes it to a `default.nix` on demand.
Initially this would be slow, given that there are no caches right now, but I don’t think it’d be prohibitive given enough feedback to the user about progress.
What would be needed for that is an intermediary layer for each 3rd party package manager that does the dependency resolution and provides a solution to us.
Here’s where things get tricky, because in many languages you can’t easily get at this solution: either you let the package manager do its thing with full network and disk access, or it dies.
Now, there are a few things that could make life easier:
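Once the package manager has done the resolution and written a lockfile, the "on the fly" part can be fairly small. The sketch below assumes an npm-style `package-lock.json` (lockfileVersion 1) whose top-level `dependencies` map names to `resolved` URLs and `integrity` hashes; it ignores nested dependencies and only produces the pinned sources, not a working `node_modules`.

```nix
# lock-to-sources.nix: sketch assuming an npm-style package-lock.json
# (lockfileVersion 1) whose top-level "dependencies" map each package name
# to { version, resolved, integrity }. Nested dependencies are ignored.
{ pkgs ? import <nixpkgs> { }, lockfile ? ./package-lock.json }:
let
  inherit (pkgs) lib;
  lock = builtins.fromJSON (builtins.readFile lockfile);
  # Bundled or linked entries carry no URL/hash, so skip them here.
  fetchable = lib.filterAttrs (_: dep: dep ? resolved && dep ? integrity)
    (lock.dependencies or { });
  # The package manager already did the resolution; we only turn each
  # pinned entry into a fixed-output fetch. "integrity" is an SRI hash,
  # which fetchurl accepts through its `hash` argument.
  fetchDep = _name: dep: pkgs.fetchurl {
    url = dep.resolved;
    hash = dep.integrity;
  };
in
lib.mapAttrs fetchDep fetchable
```

The expensive part, talking to the registry and picking versions, never happens inside Nix here; it only consumes the result, which is exactly why having access to that result matters.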
The Lockfile
What I would propose is a common lockfile format for all languages, ideally written in `.nix` format so we can use it on ofborg. From what I’ve seen, they all address the same issues in slightly different ways, but all want the same result: a reproducible build with fixed-version dependencies.
In some languages the files look something like this:
```nix
{
  foo = {
    type = "github";
    repo = "example/foo";
    commit = "fff...";
    sha256 = "fff...";
    dependencies = ["bar"];
  };
  bar = {
    type = "rubygems";
    version = "1.0";
    sha256 = "fff...";
  };
}
```
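Consuming a file like that is mostly a matter of dispatching on `type`. Here’s a sketch of how that could work; the set of types, the rubygems.org download URL layout, and the `lock.nix` file name are assumptions for illustration, not an existing tool.

```nix
# fetch-lock.nix: sketch that turns the lockfile format above into source
# derivations. The supported `type`s and the rubygems.org URL layout are
# assumptions for illustration.
{ pkgs ? import <nixpkgs> { }, lock ? import ./lock.nix }:
let
  inherit (pkgs) lib;
  fetchers = {
    github = _name: entry:
      let
        # "example/foo" -> owner "example", repo "foo"
        parts = lib.splitString "/" entry.repo;
      in
      pkgs.fetchFromGitHub {
        owner = builtins.elemAt parts 0;
        repo = builtins.elemAt parts 1;
        rev = entry.commit;
        inherit (entry) sha256;
      };
    rubygems = name: entry: pkgs.fetchurl {
      url = "https://rubygems.org/gems/${name}-${entry.version}.gem";
      inherit (entry) sha256;
    };
  };
  fetchEntry = name: entry:
    if builtins.hasAttr entry.type fetchers
    then fetchers.${entry.type} name entry
    else throw "lockfile entry ${name}: unsupported type ${entry.type}";
in
lib.mapAttrs fetchEntry lock
```

Because `dependencies` stays a plain list of names, the lockfile itself never has to contain derivations; a language-specific builder can decide later how to wire the fetched sources together.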
In Haskell they look like this:

```nix
"Dust-crypto" = callPackage
  ({ ... }:
   mkDerivation {
     pname = "Dust-crypto";
     version = "0.1";
     sha256 = "112prydwsjd32aiv3kg8wsxwaj95p6x7jhxcf118fxgrrg202z9w";
     libraryHaskellDepends = [
       base binary bytestring cereal containers crypto-api cryptohash
       directory entropy ghc-prim network random random-extras random-fu
       random-source skein split threefish
     ];
     librarySystemDepends = [ openssl ];
     testHaskellDepends = [
       base bytestring cereal Dust ghc-prim HUnit QuickCheck
       test-framework test-framework-hunit test-framework-quickcheck2
       threefish
     ];
     description = "Cryptographic operations";
     license = "GPL";
     hydraPlatforms = stdenv.lib.platforms.none;
   }) {inherit (pkgs) openssl;};
```
As you might notice, that’s not a lockfile: it’s already a full derivation, not just the source, and all of its dependencies, both the Haskell ones and the system ones, must be in the same scope.
I would argue that this, while quite impressive, isn’t something people want to look at; they want a simple list of dependencies, not the dependencies of their dependencies and so on.
The “issue” here is that each derivation depends on other library derivations directly, so every entry has to actually be a derivation, and that requires this kind of complexity in the “lockfile”. On the plus side, your actual `buildInputs` can then simply refer to it as `Dust-crypto` and you’re done with it, which is quite nice.
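For comparison, this is roughly what the consumer side looks like. It’s a sketch: `Dust-crypto` is used only because it’s the example above, and since it is excluded from Hydra (`hydraPlatforms = none`) it may not be cached, may be marked broken, or may no longer exist in current nixpkgs; any other Hackage package works the same way.

```nix
# shell.nix: sketch of the consumer side. Every entry in haskellPackages is
# already a derivation, so using a library just means naming its attribute.
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  buildInputs = [
    # ghcWithPackages builds a GHC that can see the selected libraries.
    (pkgs.haskellPackages.ghcWithPackages (hp: [ hp.Dust-crypto ]))
  ];
}
```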
We approached that in Ruby using a global default configuration called `defaultGemConfig`, which lives in parallel to the `gemset.nix` and captures most of what people expect to happen by default for a gem to be usable (it can still be modified when passing it to `bundlerEnv`).
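As a sketch of the "can still be modified" part: an application can start from the shared defaults and only add what it needs. This assumes a `Gemfile`, `Gemfile.lock` and a bundix-generated `gemset.nix` next to the file; `my-app-env` and `my-native-gem` are hypothetical names.

```nix
# default.nix: sketch of overriding the shared defaults for one application.
# Assumes Gemfile, Gemfile.lock and a bundix-generated gemset.nix live next
# to this file; "my-app-env" and "my-native-gem" are hypothetical names.
{ pkgs ? import <nixpkgs> { } }:
pkgs.bundlerEnv {
  name = "my-app-env";
  gemdir = ./.;
  # Start from the shared defaults and only add what this app needs.
  gemConfig = pkgs.defaultGemConfig // {
    my-native-gem = attrs: {
      buildInputs = [ pkgs.libsodium ];
    };
  };
}
```

Since `gemConfig` already defaults to `defaultGemConfig`, most applications never need to pass it at all.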
Since we talked about nokogiri already, here’s the entry for it:
```nix
nokogiri = attrs: {
  buildFlags = [
    "--use-system-libraries"
    "--with-zlib-dir=${zlib.dev}"
    "--with-xml2-lib=${libxml2.out}/lib"
    "--with-xml2-include=${libxml2.dev}/include/libxml2"
    "--with-xslt-lib=${libxslt.out}/lib"
    "--with-xslt-include=${libxslt.dev}/include"
    "--with-exslt-lib=${libxslt.out}/lib"
    "--with-exslt-include=${libxslt.dev}/include"
  ] ++ lib.optional stdenv.isDarwin "--with-iconv-dir=${libiconv}";
};
```
Now, the thing is, Nokogiri probably won’t switch to a different way of configuring anytime soon; as far as I know it hasn’t changed in the past decade or so, and I expect it to stay that way unless they rewrite it from scratch. Most popular libraries are like that: they have one configuration that works, and little is needed beyond it.
For the other library dependencies of the gem, we simply look up the string keys in their dependencies list, which is a bit slower, but means that the dependency specification and the derivations can live apart from each other, and the derivations can be generated dynamically.
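The name-based lookup can be sketched in plain Nix. The lock entries below are made up, and this is not how `bundlerEnv` is implemented internally; it just shows that a flat set of entries listing dependency *names* is enough to compute the full closure on demand.

```nix
# deps-closure.nix: sketch of resolving dependencies by name. Entries only
# list the names of what they need; the closure is computed by looking
# those names up in the same attribute set. Entries are made up.
let
  lock = {
    foo = { version = "1.0"; dependencies = [ "bar" ]; };
    bar = { version = "2.0"; dependencies = [ "baz" ]; };
    baz = { version = "0.3"; };
  };
  depsOf = name: lock.${name}.dependencies or [ ];
  # builtins.genericClosure visits every reachable key exactly once,
  # so shared or cyclic dependencies don't cause infinite recursion.
  closure = name: map (item: item.key) (builtins.genericClosure {
    startSet = [ { key = name; } ];
    operator = item: map (dep: { key = dep; }) (depsOf item.key);
  });
in
closure "foo"
```

Evaluating it with `nix-instantiate --eval --strict deps-closure.nix` should list `foo` together with all of its transitive dependencies.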
I’m not sure there’s a good, fully general lockfile solution for all languages, but some convergence would be nice, so we can build better tools that are user-friendly and flexible.
I’m also not aware of any other language that has something like Haskell’s LTS system (Stackage). Many languages increasingly rely on decentralized dependency management, where dependencies are fetched directly from their source, so compiling a comprehensive list of them ranges from hard to impossible. And even then it wouldn’t cover packages that are only available privately.
Distributed language libraries
So the most pragmatic solution comes down to generating a lockfile for each application and distributing it alongside the derivation. I’ll go through some of the languages.
JavaScript
Number of libraries: ~700,000
That brings us to good old JavaScript, where applications tend to have thousands of dependencies, which hinders their adoption into nixpkgs. So in theory, if we had a single source of npm packages, we could simply add the application itself, say what it depends on, and be done.
However, the large number of libraries plus their frequent updates means we’d see a lot of churn if they got added to nixpkgs. I’m also not aware of an efficient way to get a list of all packages on npm, and I don’t think they provide a DB dump like RubyGems does.
We’ve got three major projects for dealing with JS applications: node2nix, yarn2nix, and yarn2nix (yeah, god knows how that happened). They each use different approaches, are compatible with different codebases, and are configured differently.
I think there’s a lot of room for improvement here, but it doesn’t help that JS has multiple package managers (like bower, jspm, component, duo, etc.), some of which are very hard to emulate in Nix.
There is also no shared configuration for packages, which matters especially for ones that require native dependencies or come with precompiled binaries that have to be patchelf’d for use on NixOS.
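A shared configuration for npm could look much like `defaultGemConfig`. This is only a sketch: the package name is hypothetical, and a builder that actually applies these per-package overrides is assumed, not something that exists today. `autoPatchelfHook` is Linux-only.

```nix
# node-config.nix: sketch of a shared per-package configuration for npm,
# modeled on defaultGemConfig. The package name is hypothetical, and a
# builder applying these overrides is assumed to exist.
{ pkgs }:
{
  # A package shipping a prebuilt ELF binary: autoPatchelfHook rewrites
  # its interpreter and RPATH so it can run on NixOS.
  some-prebuilt-tool = attrs: {
    nativeBuildInputs = (attrs.nativeBuildInputs or [ ]) ++ [ pkgs.autoPatchelfHook ];
    # Prebuilt binaries commonly link against libstdc++.
    buildInputs = (attrs.buildInputs or [ ]) ++ [ pkgs.stdenv.cc.cc.lib ];
  };
}
```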
Ruby
Number of libraries: ~143,000
I’ve experimented with creating derivations for every gem from their weekly DB dumps, but didn’t have enough time or reason to continue with it. The code should still be around somewhere, and in theory it might be useful to someone.
But the average application requires only a handful of those, and even the biggest Rails applications I know of use maybe 200 to 400. That makes the overhead of fetching all package definitions quite large, and it adds a lot of dead weight to nixpkgs that still has to be maintained.
We still have issues where people depend on the `bundler` gem in nixpkgs directly, without taking into account the version of `bundler` the application specifies, and that causes a lot of headaches. In hindsight, I think exposing it directly like that was a mistake.
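As a sketch of what "taking the version into account" could mean, the expression below reads the `BUNDLED WITH` section of a `Gemfile.lock` and warns when it differs from the bundler in nixpkgs. The file layout is assumed to be the usual one (header line, then the indented version on the next line), and the warning-only behavior is just one possible policy.

```nix
# check-bundler.nix: sketch that reads the "BUNDLED WITH" section of a
# Gemfile.lock and warns when it differs from the bundler in nixpkgs.
{ pkgs ? import <nixpkgs> { }, lockfile ? ./Gemfile.lock }:
let
  inherit (pkgs) lib;
  lines = lib.splitString "\n" (builtins.readFile lockfile);
  # Walk the lines until the header; the version sits on the next line,
  # indented, e.g. "   1.16.1".
  findVersion = ls:
    if builtins.length ls < 2 then null
    else if builtins.head ls == "BUNDLED WITH" then
      (let m = builtins.match "[[:space:]]*([^[:space:]]+).*" (builtins.elemAt ls 1);
       in if m == null then null else builtins.head m)
    else findVersion (builtins.tail ls);
  wanted = findVersion lines;
  provided = lib.getVersion pkgs.bundler;
in
if wanted == null || wanted == provided then provided
else builtins.trace
  "Gemfile.lock was bundled with bundler ${wanted}, nixpkgs provides ${provided}"
  provided
```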
Overall, I think the `bundix` + `bundlerEnv` approach has been a success so far: consolidating the effort definitely paid off and made everyone’s life easier.
Crystal
Number of libraries: ~3,240
Not a ton of libraries here, but also not many popular applications written in it. I’ve also written a wrapper for this in order to package the Mint language, which is written in Crystal.
I think we could offer big benefits to the Crystal community by having a tool for them that makes static compilation trivial, because right now it’s tough without Nix.
Elm
Number of libraries: ~1,072
While I haven’t made a separate project for this yet, I’ve packaged a few Elm applications using their lockfile and some simple prefetching, just like `bundix`. From there it’s simply a matter of building a directory tree that matches what their packaging system expects.
Addition to nixpkgs would be entirely reasonable were it not for the low demand.
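The directory-tree part can be sketched like this. It assumes an Elm 0.19 application whose `elm.json` lists `direct` and `indirect` dependencies, and that each package can be fetched from the GitHub tag matching its version; `elm-hashes.nix` is a hypothetical, separately prefetched map from package name to hash. The exact prefix Elm expects under `ELM_HOME` depends on the Elm version and is left out.

```nix
# elm-deps.nix: sketch assuming an Elm 0.19 application elm.json with
# "direct" and "indirect" dependencies, and GitHub tags named after each
# package version. elm-hashes.nix is a hypothetical prefetched hash map.
{ pkgs ? import <nixpkgs> { }
, elmJson ? ./elm.json
, elmHashes ? import ./elm-hashes.nix
}:
let
  inherit (pkgs) lib;
  deps = (builtins.fromJSON (builtins.readFile elmJson)).dependencies;
  allDeps = deps.direct // deps.indirect;
  fetchPkg = name: version: pkgs.fetchzip {
    url = "https://github.com/${name}/archive/${version}.tar.gz";
    sha256 = elmHashes.${name};
  };
in
# Lay the sources out as <author>/<name>/<version>; linkFarm creates the
# intermediate directories for names containing slashes.
pkgs.linkFarm "elm-packages" (lib.mapAttrsToList (name: version: {
  name = "${name}/${version}";
  path = fetchPkg name version;
}) allDeps)
```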