Solving slow man-cache the content-addressed way, without ca-derivations

tldr.

Instead of using Nix’s ca-drv, which isn’t generally usable yet, use a script to content-address the installed manpages and decide whether the man-db cache needs to be rebuilt.

Problem

Generating the man-db cache by enabling documentation.man.generateCaches is very slow. An average desktop NixOS install can easily have 5k+ manpages, and man-db takes at least 10 seconds to generate the cache for them from scratch. This is the method nixpkgs uses.

It’s irritating to see nixos-rebuild stuck on building man-cache from time to time :<


Other solutions

ca-derivation

By making documentation.man.man-db.manualPages a ca-drv, we can skip rebuilding the man-db cache if the packages didn’t change. This method is also recommended in the option’s documentation.

However, ca-derivations aren’t currently supported by either cachix or garnix AFAIK, so it’s a no-go for me since I depend on cachix heavily.

If ca-drv becomes stable and usable one day, the method here won’t be necessary anymore.

man-db.service

man-db ships with a systemd service and a timer that update the cache daily. Crucially, because man-db can update the cache incrementally, it can be really fast.

However, man-db’s incremental cache ability relies on mtime, which totally doesn’t work on NixOS where all files are dated back to 1970.

In my workflow, the service is hooked up with sysinit-reactivation.target so that each switch will trigger the service. It’s too wasteful to rebuild cache every time, regardless of whether manpages have actually changed.

If man-db could achieve incremental cache updates without relying on mtime, it would be a perfect solution on NixOS though.

My solution

Combining the two ideas, I made a prototype service that content-addresses the manpages directory. If the hash changes, meaning the manpages have changed, the cache is regenerated; otherwise the rebuild is skipped.

This way man-cache doesn’t block nixos-rebuild anymore, and I can still enjoy man with completions.
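The skip-or-regenerate decision itself is small. Here is a minimal sketch of it, not the actual service: it assumes the digest of the manpages has already been computed and is passed in as the first argument. The function name `regenerate_if_changed` and the stamp file are hypothetical names made up for illustration.

```shell
# regenerate_if_changed DIGEST STAMP CMD...
# Runs CMD only when DIGEST differs from the digest recorded in STAMP.
# (Hypothetical helper; names are illustrative, not from the real module.)
regenerate_if_changed() {
    digest="$1"
    stamp="$2"
    shift 2

    # Same digest as last time: the manpages did not change, so skip.
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$digest" ]; then
        echo "manpages unchanged, skipping cache rebuild"
        return 0
    fi

    # Record the new digest only after the command succeeds,
    # so that a failed rebuild is retried on the next run.
    "$@" && printf '%s\n' "$digest" > "$stamp"
}
```

CMD would be whatever command the service uses to rebuild the man-db cache. Writing the stamp only on success is a design choice of this sketch: a failed rebuild costs another attempt instead of leaving a stale cache marked as up to date.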

The content-addressing logic is:

  1. Checksum all *.gz files in /run/current-system/sw/share/man/
  2. Sort alphanumerically
  3. Join checksums into one long string
  4. Checksum this long string
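The four steps above can be sketched as a single pipeline. This is an illustration under assumptions, not the code from the linked module: SHA-256 via `sha256sum` is an arbitrary choice of checksum, the sort is applied to the per-file checksums, and `manpage_digest` is a hypothetical name.

```shell
# manpage_digest DIR
# Prints one digest summarising the contents of every *.gz file under DIR.
# (Hypothetical helper; sha256sum is an assumed choice of checksum.)
manpage_digest() {
    # 1. Checksum all *.gz files. -L follows symlinks, since the system
    #    profile consists of symlinks into the Nix store.
    # 2. Keep only the checksum column and sort it, so the result does
    #    not depend on the order in which find lists the files.
    # 3. Join the checksums into one long string.
    # 4. Checksum that string.
    find -L "$1" -type f -name '*.gz' -print0 \
        | xargs -0 -r sha256sum \
        | cut -d ' ' -f 1 \
        | LC_ALL=C sort \
        | tr -d '\n' \
        | sha256sum \
        | cut -d ' ' -f 1
}
```

Calling `manpage_digest /run/current-system/sw/share/man/` would then print a single hex digest. Because only file contents are hashed, renaming a manpage without changing it leaves the digest untouched in this sketch.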

The full service module can be found at Nuran/nixos/documentation/default.nix at 6a4708cb2c4b46c97fd1ee98fd8f2b77df13bfe6 · MidAutumnMoon/Nuran · GitHub

Edit. 1

Deleting unneeded manpages can also help speed up building the cache.

The snippet I’m using deletes multilingual manpages as well as man section 3 which contains mainly C functions that are not useful for me personally.
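For illustration only, a pruning step along those lines might look like the sketch below. It is not the snippet referred to above. It assumes the usual man directory layout, where sections live in `man1`, `man3`, … directly under the man directory and translations live in per-locale subdirectories such as `de/`. `prune_manpages` is a hypothetical name.

```shell
# prune_manpages MANDIR
# Removes the section 3 directory and every per-locale subdirectory.
# (Hypothetical helper; assumes the standard man directory layout.)
prune_manpages() {
    for entry in "$1"/*; do
        [ -d "$entry" ] || continue
        case "$(basename "$entry")" in
            man3)      rm -rf -- "$entry" ;;  # section 3 (mostly C functions)
            man[0-9]*) ;;                     # every other section: keep
            *)         rm -rf -- "$entry" ;;  # anything else is treated as a locale directory
        esac
    done
}
```

On a real system the profile under /run/current-system is read-only, so a step like this would have to run while the system environment is being built rather than afterwards.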

With this trick, I’ve been able to shave the time of a full cache build down to about 5s.


Wonderful! Do you plan on eventually upstreaming this to nixpkgs?

Are you sure it’s actually this? IME the thing you see logged in the Nix CLI here isn’t always what you’re actually waiting for.

I have also never noticed this issue and I’m quite picky about rebuild time.

Do you have documentation.nixos.enable enabled? That generates a huge man page with all NixOS options which I’ve disabled because I don’t need it and it takes a while to generate. I could see that making a significant difference.


Ah, the difference is that I don’t have documentation.man.generateCaches set which is off by default.

Maybe I’ll do it in the future. But I still have some crazy ideas in flight, this solution is not in its final form yet.


Yes, and this option has to be enabled for fish shell to generate completions. That was my motivation for digging into this problem.

Trust me, that thing is slow.

Speaking of which, I suspect this could be much faster if the generation were split at the package level: build smaller caches per package and combine them at the end, like fish completion files. I don’t know if this is possible, though; I would have to study the format of the cache.


Yes! This is one of the ideas I’m thinking of.

man-db uses Berkeley DB under the hood. After some experiments I found it is trivial to merge multiple databases using the Berkeley DB CLI:

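# Dump each index to db_dump's text format
db_dump -f ls_index.dump ls_index.bt
db_dump -f cat_index.dump cat_index.bt

# Merge: load each dump in turn into the same database
# (db_load reads a single input file per invocation)
db_load -f ls_index.dump index.bt
db_load -f cat_index.dump index.bt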

It’s absolutely doable to generate a cache database while each package is built and merge all of them during system activation.

One downside of this approach would be that man-db might be pulled into the bootstrap chain.


Thanks for sharing this, I’ve recently switched to fish myself and had to disable the man-cache for the same reason.

Been struggling to ensure bash is the login shell and fish is launched afterwards too when using the wiki example, but I spotted you seem to be handling that differently too - so double thanks, going to give your approach a try :blush:
