A Map of Nixpkgs

Hi all,

we have written a blog post about the dependency graph of Nixpkgs: https://www.tweag.io/posts/2019-02-06-mapping-open-source.html. The graph looks quite interesting and has statistical properties similar to those of other software repositories such as Debian or Maven.

Would love to hear some opinions on this …

Very beautiful. I really want to hang up a poster of it in my room :slight_smile:

I played with Gephi and the nixpkgs graph some time ago.

What I failed to do was distinguish build from runtime dependencies and then make suspicious dependencies look “unnatural” on the graph.
For example, a runtime dependency on gcc, which is only a build dependency for most packages; or a package with dependencies on almost unrelated clusters.
I can’t prepare the list of suspicious dependencies in advance; the visual irregularities of the graph visualisation should help to build that list.

Did you try something like this?

This is really cool! I particularly like that you shared the graphing tool and source code you used to generate these; I’m going to give some of these scripts a run on my own projects and see what clusters I can find.

I didn’t try this, but it is a nice use case. I thought about using the adjacency matrix, which is basically just the list of nodes with their dependencies, as input for outlier detection algorithms. Running UMAP on the adjacency matrix for clustering also gives some nice clusters. I haven’t investigated further so far. Maybe we can run some experiments on this dataset in this thread …
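To make the adjacency-matrix idea a bit more concrete, here is a minimal sketch in Python. It is not what was used for the blog post: the edge list is made-up toy data, the function names are hypothetical, and a plain z-score on the in-degree stands in for a real outlier detection algorithm.

```python
from statistics import mean, pstdev

# Hypothetical toy edges (package -> dependency); not real nixpkgs data.
edges = [
    ("hello", "glibc"),
    ("curl", "glibc"),
    ("curl", "openssl"),
    ("openssl", "glibc"),
    ("git", "curl"),
    ("git", "glibc"),
    ("git", "openssl"),
]


def adjacency_matrix(edges):
    """Return (sorted node names, dense 0/1 matrix) for a directed edge list."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        # Row = package, column = dependency.
        matrix[index[src]][index[dst]] = 1
    return nodes, matrix


def in_degree_outliers(nodes, matrix, threshold=1.5):
    """Flag nodes whose in-degree z-score exceeds `threshold` (an arbitrary cutoff)."""
    in_deg = [sum(row[j] for row in matrix) for j in range(len(nodes))]
    mu, sigma = mean(in_deg), pstdev(in_deg)
    if sigma == 0:
        return []
    return [n for n, d in zip(nodes, in_deg) if (d - mu) / sigma > threshold]


nodes, matrix = adjacency_matrix(edges)
outliers = in_degree_outliers(nodes, matrix)
print(outliers)
```

A dense matrix like this grows quadratically with the number of packages, so for anything close to the whole of nixpkgs a sparse representation would probably be needed.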

If you run it on the closure of a specific package, it is sufficient to use the nix-store --query --graph command to generate the dot file. That wasn’t sufficient to build the graph of the entire nixpkgs repository, which needed an adjacency matrix of 5-7 GB, if I remember correctly.
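For a single closure, the dot file can be turned into an edge list with a few lines of Python. This is only a sketch: the sample text below is invented and merely assumes edge lines in the usual DOT form `"a" -> "b" [...]`, and the function names are hypothetical. Real output from Nix may carry different attributes, and you should check which direction its edges point before interpreting them.

```python
import re
from collections import defaultdict

# Made-up DOT text for illustration; store path names are not real.
SAMPLE_DOT = '''
digraph G {
"aaaa-hello-2.10" [label = "hello-2.10", shape = box];
"bbbb-glibc-2.27" [label = "glibc-2.27", shape = box];
"bbbb-glibc-2.27" -> "aaaa-hello-2.10" [color = "black"];
"cccc-gcc-7.4.0" -> "aaaa-hello-2.10" [color = "red"];
"bbbb-glibc-2.27" -> "cccc-gcc-7.4.0" [color = "black"];
}
'''

# Matches quoted edges such as: "src" -> "dst"
EDGE_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def parse_dot_edges(text):
    """Return all (source, target) pairs found in a DOT string."""
    return EDGE_RE.findall(text)


def adjacency(edge_list):
    """Group edge targets by source node."""
    adj = defaultdict(set)
    for src, dst in edge_list:
        adj[src].add(dst)
    return dict(adj)


dot_edges = parse_dot_edges(SAMPLE_DOT)
graph = adjacency(dot_edges)
print(len(dot_edges), "edges,", len(graph), "source nodes")
```

To try it on real data, redirect the command's output into a file (for example `closure.dot`, a name chosen here for illustration) and pass the file's contents to `parse_dot_edges` instead of `SAMPLE_DOT`.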

The internal graph tool also doesn’t provide very meaningful node properties. Something like derivation size, eccentricity in nixpkgs, or attribute name would be nice. I think links are already colored by the internal tool, maybe to mark build-time vs. runtime dependencies.