Monorepos don't map to our social structure

Here are the first two sentences of the Pijul manual (emphasis added):

Pijul is the first distributed version control system to be based on a sound mathematical theory of changes. It is inspired by Darcs, but aims at solving the soundness and performance issues of Darcs.

I used Darcs for a long, long time. It was great, except that it was written by a physicist who did not spend enough time thinking about scaling issues.

I don’t think that is correct; there is no “change order”. A Pijul revision specifies an unordered set of (non-conflicting) patches. Using the unordered set of patches you can always reconstruct an exact snapshot of the pristine. Is there a more detailed example somewhere of this problem?

Regarding the comment in fetchpijul, we have the same problem with git: the contents of the .git directory after a git clone are not deterministic, because git doesn’t promise this. I don’t think pijul makes this promise for the database in .pijul/pristine/ either.

3 Likes

One of the things I didn’t like about Pijul when I tried it a year or two ago was that identity is tied to the host you’re using for your repositories. As far as I know, the only option currently is the Nest, and self-hosting Pijul repositories is not (yet?) supported.

While I don’t think splitting up nixpkgs is a good idea, I can agree with moving off GitHub. My contributing to nixpkgs is one of the only reasons I bother to keep a GitHub account instead of self-hosting my repos. Self-hosting a GitLab instance for nixpkgs might be one option, but I expect cost would be a significant factor.

3 Likes

This changed about a year ago (probably after you tried it), although the change had been in the works for years… and the new system is awesome. Your Pijul identity is now an ed25519 public key. This means that, unlike git, name changes are retroactive, so deadnames don’t appear in the version history. The mapping from public keys to usernames/real names is not (by default) part of the revision hash. Projects can choose to keep this mapping in-tree if they like.

Oh no, that’s definitely not true; you’ve always been able to host a repository on any HTTP(S) server.

However:

Yes, that is currently the situation for forge software with a web browser GUI.

This is a concern that I share. It’s also a bit of a touchy situation.

Pijul’s author has done an immense amount of awesome work, for which he has earned almost no money aside from NLNet grants. His current plan appears to be to run a “PijulHub” site (i.e. “the Nest”) as a business, sort of like github. I wish him the best and hope this works out. However, it creates a slightly awkward situation where people feel guilty writing competing “Forge” software for Pijul and open sourcing it, since that would sort of sabotage the author’s plans. So, understandably, nobody does this.

I don’t really know what to do about this situation. I do acknowledge that it is a negative for Pijul.

12 Likes

That’s good to know. At the time, everything seemed tied to the Nest. I just wanted to self-host and sign my own commits. There were other things I didn’t like about Pijul, but that was a big one for me.

Sorry, you’re right. I was thinking of forge software and not about plain repos (even though the latter is what I had wanted to self-host). I assume for nixpkgs that migrating to a repo without forge software would not be an option.

5 Likes

Perhaps this is getting too far into the weeds, but I’ve previously wondered how/whether pijul handles bisecting for a breaking change, which is (at least currently) something we need to be able to do.

A brief search again today suggests this still doesn’t exist, but it sounds like you have enough experience with it to have a sense of ground truth there?

3 Likes

Heh, thank you for putting that so gently. :smile:

Yes, due to its sheer scale, nixpkgs requires not just “a forge” but one that can deal with thousands of users hammering on it. And basically only github, gitlab, sourcehut and gitea/codeberg/forgejo are in that league today.

Let me be very clear: pijul is not going to solve any of nixpkgs’ problems tomorrow, this month, or realistically even this year. If github implodes tomorrow and we suddenly need a replacement, do not waste your time with pijul. But if we’re looking farther out – a year, two years from now – then on that timescale, and certainly beyond it, I think pijul is the most promising route.

An excellent question. Short answer: yes, you can, but it’s not automated yet.

Long answer: the state of a pijul repository (which is analogous to a git revision) is a set of patches. For git bisect to be usable, one of the two revisions must be an ancestor of the other, which in pijul terms means one of the corresponding states is a subset of the other. That means you can compute the difference between the two sets and do a bisection search on the patches in that difference: start by randomly cherry-picking half of them into the good/smaller/older state and test.

That’s the good news. The bad news is:

  • Currently none of this is automated. People do run scripts on the output of pijul log --state --output-format json, much like the early prototypes of git-bisect (roughly along the lines of the sketch after this list), but yeah, this functionality definitely should be in the tool itself.
  • The patches in that “difference” set will likely have internal dependencies, so if you randomly cherry-pick half of them, they’ll pull in their dependencies and you’ll end up with a lopsided bisect. Doing this optimally requires a topological sort and a lot of book-keeping.
  • Right now, in order to cherry-pick half of them you have to apply them one at a time, which is not particularly fast.
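
To make that concrete, here is a rough sketch of a single manual bisection step, assuming the good and bad states live on two channels named “good” and “bad”. This is a sketch under assumptions, not a documented recipe: the JSON field name ("hash"), the exact pijul fork / pijul channel switch / pijul apply spellings, and the test.sh script are all guesses that you would have to check against your pijul version.

# One manual bisection step. Assumes `pijul apply HASH` cherry-picks a
# change (plus its dependencies) into the current channel.
set -euo pipefail

# Changes present on the bad channel but missing from the good one.
comm -13 \
  <(pijul log --channel good --output-format json | jq -r '.[].hash' | sort) \
  <(pijul log --channel bad  --output-format json | jq -r '.[].hash' | sort) \
  > diff-set.txt

# Fork a scratch channel off the good state, then cherry-pick the first
# half of the difference into it, one change at a time (slow; dependencies
# may drag extra changes in - the "lopsided bisect" problem above).
pijul fork --channel good bisect
pijul channel switch bisect
head -n "$(( $(wc -l < diff-set.txt) / 2 ))" diff-set.txt |
  while read -r hash; do pijul apply "$hash"; done

# Test; repeat the whole procedure on whichever half still has the bug.
./test.sh && echo "bug is in the other half" || echo "bug is in this half"

Repeating this log₂(n) times narrows the difference set down to the first bad change, modulo the dependency problem above.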

I’m currently importing the nixpkgs history (all the way back to 2003) using the slow, single-threaded pijul git importer. It’s up to mid-2018 at the moment and will take at least a week to finish; I’ll post the result when it’s done, and hopefully an example of what a bisect would look like. It’s really interesting how some things (like adding commits) are a bit slower but other things are faster: for example, git blame pkgs/top-level/all-packages.nix takes 59 seconds, but pijul credit at the same point in history on the same hardware takes less than 2 seconds. Pijul’s internal representation is sort of like running git blame on every file in every commit and storing the output in a highly compressed format. So it takes more work to add new commits, but you can query certain things much more efficiently.
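
For reference, the commands being compared (the import invocation is my shorthand for the single-threaded pijul git importer mentioned above; the exact flags may vary by version):

$ pijul git <path-to-git-repo>   # import the git history, commit by commit
$ time git blame pkgs/top-level/all-packages.nix      # ~59s in this test
$ time pijul credit pkgs/top-level/all-packages.nix   # <2s, same hardware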

14 Likes

Would you convert it? There isn’t any reason to keep the old history in Pijul.

It appeared to me that this was just an experiment/demonstration, although maybe I misunderstood?

Why not? I’ve had to dig into history while working on Darwin stuff (to understand why things are the way they are currently). Not having that history would have prevented that, and having to dig it up in another repository would have sucked (to put it politely).

7 Likes

It would probably make more sense to make a combined git+pijul blame tool rather than “convert” the history.

Who’s going to write that, and will they be done before the importer finishes?

4 Likes

I want to add that I’m not trying to be dismissive or flippant, but waiting on the import is a one-time, known cost compared to the unknown cost of developing a tool that could bridge the history gap. Once the import is done, we’ll not only be able to see how Pijul scales to a repository the size of nixpkgs, but we’ll also be able to see how its tools for working with history in nixpkgs compare to Git’s. And without history, how would it even be possible to do an example of a bisect?

Here’s another example (aside from looking back at history): working from a Pijul repo and syncing changes back to Git. I’m working on an update to cctools and ld64 on Darwin. Because of the timing, I’m going to have to maintain that branch for the next few months. I’ve rebased it twice on staging, and both times I’ve had to fix merge conflicts. If Pijul can eliminate those conflicts, that would be a good demonstration of what it can bring to the table.

13 Likes

Yes, I think it has to be a monorepo in the sense of “single root package”, because the goal of Nixpkgs is to be the last step in integrating everything together. But we can pull things out and make Nixpkgs depend on them, like the lib, as @infinil says.

7 Likes

There is at least one problem with the polyrepo approach that I do not like, a thing I call the “second-class citizen repo”.

It usually happens when

  1. there is a set of first-class citizen repos that receive all the official support but are limited in scope: core, extra, multilib, etc.;
  2. there is another set of repos that receive little to no official support but where everything else can be found - including software like “managers of second-class repos (backdoor included; remove it with the --with-backdoor=no configure option)”.

The official maintainers provide support only for their “base system”. This way they are free to upgrade their repos without worrying about breaking the second-class repos. Second class is unsupported, after all.

I believe this model can be harmful to the community as a whole, since it invariably creates a set of “second-class citizen developers” who carry the heavy burden of keeping the second-class repos in sync with the first-class ones.

10 Likes

This is patently false, as many projects like FreeBSD (cvs → svn → git) and even the Linux kernel (BitKeeper → git) have shown in practice.

Also, there are many open source projects that are way older than I am, and they still receive fixes for bugs reported literally decades ago.

4 Likes

I broadly agree with this. Splitting the repo up also introduces more opportunities for things to get out of sync: PRs in one repo may depend on PRs in a different repo entirely, independent maintainers struggle to keep up with nixpkgs, packages receive even less scrutiny than before, and it becomes impossible to keep an eye on everything happening in this complex interconnected web of dependencies.

Packages do not exist in a vacuum, and neither do we.

8 Likes

FWIW the messages mentioning pijul in this topic should be split out into a separate topic. It’s kinda sad that discourse doesn’t have reply-to-thread-with-subject-line-change, which is the way of signaling a new subthread in email discussions. Sigh, yet another 1980s innovation that seems to have been forgotten. Anyways, the pijul discussion is not just orthogonal to the monorepo discussion; it sort of transcends it – there is no pijul submodule and no josh-for-pijul, because these things are unnecessary.

The joke goes: two git users are arguing, heatedly, about monorepo vs multirepo. A pijul user drifts by, saying

(image: the “there is no spoon” scene from The Matrix)

13 Likes

What exactly is needed? To me, two years is not a huge amount of time.

I don’t know how the conversation ended up on pijul when the OP was about eliminating the board and monorepo… but anyway, I don’t think pijul is a serious option.

On a tmpfs:

$ time pijul clone https://nest.pijul.com/pijul/pijul
Repository created at /tmp/tmp.dSyrVH5tU1/pijul
Downloading changes  [==================================================] 1053/1053 [00:00:12]
Applying changes     [==================================================] 1053/1053 [00:00:16]
Completing changes... done!
real	0m19.583s
user	0m8.949s
sys	0m0.618s

$ time pijul clone pijul pijul-copy
Repository created at /tmp/tmp.dSyrVH5tU1/pijul-copy
Downloading changes  [==================================================] 1053/1053 [00:00:03]
Applying changes     [==================================================] 1053/1053 [00:00:07]
Completing changes... done!
real	0m8.737s
user	0m8.447s
sys	0m0.454s

Even a local, in-tmpfs clone took nearly 9 seconds for only ~1k changes. And I think I have quite a decent CPU (Ryzen 5700X). Extrapolating this to the 640k commits that nixpkgs currently has, and assuming (optimistically!) linear scaling, that’d be about 1.48 hours for a local clone. Network speed and even disk I/O are huge variables on top of that, and will only make those times massively worse.
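
Spelling out that extrapolation (linear scaling assumed, as stated):

$ echo "scale=6; 8.737 / 1053 * 640000 / 3600" | bc
1.475022

i.e. roughly 1.48 hours of purely local, CPU-bound cloning.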

Meanwhile, with git:

$ time git clone file://$PWD/nixpkgs nixpkgs-copy
Cloning into 'nixpkgs-copy'...
remote: Enumerating objects: 4930209, done.
remote: Counting objects: 100% (4930209/4930209), done.
remote: Compressing objects: 100% (949117/949117), done.
remote: Total 4930209 (delta 3616600), reused 4928000 (delta 3614680), pack-reused 0 (from 0)
Receiving objects: 100% (4930209/4930209), 877.36 MiB | 72.31 MiB/s, done.
Resolving deltas: 100% (3616600/3616600), done.
Updating files: 100% (41804/41804), done.

real	2m40.512s
user	7m50.344s
sys	0m33.091s

About 33x faster. And even nowadays we consider the barrier to contribution quite high, as people don’t know about blobless clones or simply don’t have the network bandwidth and stability to download gigabytes of data.
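
For reference, the “blobless clone” mentioned above is standard git, and GitHub supports the required partial-clone protocol:

$ git clone --filter=blob:none https://github.com/NixOS/nixpkgs.git

This downloads commits and trees up front but fetches file contents lazily at checkout, which cuts the initial transfer dramatically.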

3 Likes

Well, it looks like a nice use case for improvements!