Given that the chunks are still quite big, and I suspect that many files change only because the Nix store paths they contain have changed, how about pre-processing all stored files as follows:
The idea is to move the Nix store paths (only up to the first path component after the prefix) into a separate list and remove them from the file. A file `F` is then replaced with a tuple `(Fx, L)`, where `Fx` is the binary contents of the file with every sequence matching `/nix/store/[^/]+` removed, and `L` is a list of `(position, path)` tuples.
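To make this concrete, here is a minimal sketch of the splitting step, assuming Rust with the `regex` crate; `encode` and `PathRef` are made-up names, positions are taken to be offsets into `Fx` (i.e. where each path has to be re-inserted), and the whole file is buffered here for brevity even though the prose version streams. The store prefix is a parameter rather than hard-coded, anticipating the configurability point below:

```rust
use regex::bytes::Regex;

struct PathRef {
    position: usize, // offset in Fx where the path was cut out
    path: Vec<u8>,   // the removed bytes, e.g. b"/nix/store/<hash>-<name>"
}

fn encode(f: &[u8], prefix: &str) -> (Vec<u8>, Vec<PathRef>) {
    // `(?-u)` makes `[^/]` match arbitrary non-`/` bytes, since F is binary.
    let re = Regex::new(&format!("(?-u){}[^/]+", regex::escape(prefix))).unwrap();
    let mut fx = Vec::with_capacity(f.len());
    let mut l = Vec::new();
    let mut last = 0;
    for m in re.find_iter(f) {
        fx.extend_from_slice(&f[last..m.start()]); // keep bytes between matches
        l.push(PathRef {
            position: fx.len(), // insertion point, measured in Fx
            path: m.as_bytes().to_vec(),
        });
        last = m.end(); // drop the matched path from Fx
    }
    fx.extend_from_slice(&f[last..]);
    (fx, l)
}
```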
This can be encoded in a streaming manner, and decoded in a streaming manner provided you have access to the list `L`.
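Decoding can then stream `Fx` back out while splicing the recorded paths in at their offsets, so the whole file never has to be held in memory. Again a hypothetical sketch (same made-up `PathRef` as above), assuming `L` is sorted by position:

```rust
use std::io::{self, Read, Write};

struct PathRef {
    position: usize, // offset in Fx where the path is spliced back in
    path: Vec<u8>,
}

fn decode<R: Read, W: Write>(mut fx: R, l: &[PathRef], out: &mut W) -> io::Result<()> {
    let mut offset = 0usize; // bytes of Fx copied so far
    let mut buf = [0u8; 8192];
    let mut refs = l.iter().peekable();
    loop {
        // Emit every path whose insertion point is the current offset
        // (adjacent matches in F end up with identical offsets in Fx).
        while refs.peek().map_or(false, |r| r.position == offset) {
            out.write_all(&refs.next().unwrap().path)?;
        }
        // Copy Fx up to the next insertion point, or to EOF.
        let until = refs.peek().map_or(usize::MAX, |r| r.position);
        let n = fx.read(&mut buf[..buf.len().min(until - offset)])?;
        if n == 0 {
            break;
        }
        out.write_all(&buf[..n])?;
        offset += n;
    }
    // Paths whose insertion point is the very end of Fx.
    for r in refs {
        out.write_all(&r.path)?;
    }
    Ok(())
}
```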
`L` can be compressed better by making `position` relative to the end of the last match, and making `path` an index into a list of found paths. So then we get `Lrel`, a list of `(relPosition, pathIndex)` tuples, and `P`, a list of paths, so `F` becomes `(Fx, Lrel, P)`.
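Note that if positions are measured in `Fx` (as in the encode sketch above), the "relative to the end of the last match" delta is simply the difference between consecutive offsets, because the matches themselves contribute no bytes to `Fx`. A hypothetical sketch of this second step:

```rust
// Delta-encode positions and replace each path with an index into a
// deduplicated table P.
fn compress(l: &[(usize, Vec<u8>)]) -> (Vec<(usize, usize)>, Vec<Vec<u8>>) {
    let mut p: Vec<Vec<u8>> = Vec::new(); // P: unique paths, first-seen order
    let mut lrel = Vec::new();            // Lrel: (relPosition, pathIndex)
    let mut prev = 0usize;
    for (pos, path) in l {
        // Linear scan for clarity; a HashMap would make this O(1).
        let idx = match p.iter().position(|q| q == path) {
            Some(i) => i,
            None => {
                p.push(path.clone());
                p.len() - 1
            }
        };
        // Positions are offsets in Fx (paths already removed), so the delta
        // equals the number of kept bytes since the end of the last match.
        lrel.push((pos - prev, idx));
        prev = *pos;
    }
    (lrel, p)
}
```

The inverse is trivial: walk `Lrel`, accumulate the deltas back into absolute offsets, and look each `pathIndex` up in `P`.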
This result should be way better at being chunked. I am hoping that many rebuilt files will have the same `Fx` and `Lrel`, and that only `P` will differ.
For future-proofing etc., the `/nix/store/` part should be configurable.
What do you think, @zhaofengli?