Reconsider reusing upstream tarballs

Maybe we should spend more time working on actual code and less on discussion.

Here’s something I have been thinking of doing, in WIP form:

I believe the TODO list is to:

(1) provide a diffoscope version for developers so we can analyze the diffs.
(2) provide “normalizers”, e.g. normalize line endings, normalize PO translation files, etc. We are bound to have divergence between release tarballs and git sources. Which normalizers are acceptable is an interesting question. Ideally, normalizing should not increase the chance of executable code hiding in the wild, but if, say, someone hides a binary in a “processed” .po translation and loads it, that would be annoying.
(3) provide “reproducers”, i.e. ways to reproduce certain generated files. I know that autoconf leaks its exact version into the generated files; this is a challenge to overcome, not a big problem. For example, .gmo files could be reproduced. At that point it also becomes a reproducible-builds question.
(4) sprinkle that in slowly and surely to build confidence in our bootstrap chain.
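
To make (2) a bit more concrete, here is a minimal sketch of what a normalizer pass could look like, assuming we have an unpacked release tarball and a git checkout side by side. All names here (`NORMALIZERS`, `normalized_digest`, `diverging_files`) are made up for illustration and are not part of any existing tool, and the only normalizer shown is line endings. Files without a registered normalizer are compared byte for byte, so nothing gets “normalized away” unless we explicitly allowed it for that file type.

```python
import hashlib
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical allow-list: file suffix -> normalizers we accept for it.
# Anything not listed here is compared byte for byte.
NORMALIZERS = {
    ".po": [normalize_line_endings],
    ".txt": [normalize_line_endings],
}


def normalized_digest(path: Path) -> str:
    """SHA-256 of the file contents after applying the allowed normalizers."""
    data = path.read_bytes()
    for normalize in NORMALIZERS.get(path.suffix, []):
        data = normalize(data)
    return hashlib.sha256(data).hexdigest()


def diverging_files(tarball_dir: Path, git_dir: Path) -> list[str]:
    """Relative paths that differ, or exist on one side only, after normalization."""

    def index(root: Path) -> dict[str, str]:
        return {
            p.relative_to(root).as_posix(): normalized_digest(p)
            for p in root.rglob("*")
            if p.is_file()
        }

    a, b = index(tarball_dir), index(git_dir)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Whatever `diverging_files` still reports after normalization is what would then be fed to diffoscope (1) or handled by a reproducer (3).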

I won’t be able to finish all of that by myself, and I am pretty sure this will be much more tractable with focused community effort and upstream collaboration. I have heard that upstreams such as autoconf would be open to working towards reproducible builds for their output, or to moving away from m4 macros.
