Java, Nix & Reproducibility

https://fzakaria.com/2021/06/27/java-nix-reproducibility.html

What’s the case against having strip-nondeterminism in fixupPhase by default?

Personally, I think needing strip-nondeterminism indicates an upstream problem, so ideally I’d rather fix it upstream than add it to all our builds (increasing complexity, some chance of problems and resource usage). You quote:

I’m not sure if that is really so common anymore.

I think this is a very interesting question. I’m not sure it should be tracked in Maven Central itself, but it would be good to track it somewhere.

There is the reproducible-central project, but it mixes the concerns of ‘actually rebuilding’ and ‘recording that rebuilding happened’. I think it might be more feasible to keep those concerns separate.

It would be very cool to have some tool that, for a given artifact, can check multiple repositories (e.g. both Maven Central and a 3rd-party repo) and see if the artifacts are identical. Or, rather than checking full artifacts, it could compare the checksums in ‘buildinfo’ attestations to the real artifacts.
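A minimal sketch of the comparison step such a tool might perform, assuming the artifact bytes have already been downloaded from each repository. The function names (`maven_path`, `group_by_digest`, `is_consistent`) and the repository labels are hypothetical; only the path layout is the standard Maven one, as seen in the plexus-velocity URL quoted in this thread.

```python
import hashlib
from collections import defaultdict


def maven_path(group_id, artifact_id, version, ext="jar"):
    """Relative path of an artifact in the standard Maven repository layout."""
    group_dir = group_id.replace(".", "/")
    return f"{group_dir}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


def group_by_digest(copies):
    """Map each SHA-256 digest to the sorted repository labels that serve it.

    `copies` maps a repository label (hypothetical, e.g. "central") to the
    raw bytes that repository returned for the same coordinates.
    """
    groups = defaultdict(list)
    for repo, data in sorted(copies.items()):
        groups[hashlib.sha256(data).hexdigest()].append(repo)
    return dict(groups)


def is_consistent(copies):
    """True when every repository served byte-identical content."""
    return len(group_by_digest(copies)) <= 1
```

The bytes could be fetched with something like `urllib.request.urlopen(base_url + "/" + maven_path(...)).read()`; a real tool would also want to handle redirects, missing artifacts, and the published `.sha1` sidecar files.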

@raboof how do you fix the upstream problem that JARs are non-deterministic?

Have Gradle or Maven apply it by default?
I’m not sure why we care about mtime in ZIP files anyway.
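To illustrate why the mtime matters for hashes, here is a small sketch using Python's `zipfile` module rather than a real Maven or Gradle build. The helper name `build_jar` and the timestamps are made up for the example. It builds the same single-entry archive twice, changing only the entry's modification time.

```python
import hashlib
import io
import zipfile


def build_jar(mtime):
    """Build a one-entry archive in memory; only the entry's mtime varies."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        info = zipfile.ZipInfo("META-INF/MANIFEST.MF", date_time=mtime)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, "Manifest-Version: 1.0\n")
    return buf.getvalue()


# Same content, "built" two seconds apart (ZIP stores time at 2-second resolution).
first = build_jar((2021, 6, 27, 12, 0, 0))
second = build_jar((2021, 6, 27, 12, 0, 2))

digest_first = hashlib.sha256(first).hexdigest()
digest_second = hashlib.sha256(second).hexdigest()

# The archives unpack to identical files but are not byte-identical,
# so any content hash over the archive itself changes between builds.
print(digest_first == digest_second)
```

The timestamp is stored in each entry's header, so it feeds directly into any checksum taken over the JAR as a whole.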

As for having multiple SHAs for the same artifact:
I’ve run into it enough times when attempting a new mvn2nix solution that it made the attempted solution (walking the generated .m2 directory) infeasible.
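For readers unfamiliar with the approach, "walking the .m2 directory" could look roughly like the sketch below. This is an illustration under assumptions, not mvn2nix's actual implementation; the function name `hash_local_repository` and the choice of SHA-256 are mine.

```python
import hashlib
from pathlib import Path


def hash_local_repository(repo_root, suffixes=(".jar", ".pom")):
    """Return {relative path: SHA-256 hex digest} for artifacts under repo_root.

    repo_root is assumed to be a local Maven repository such as
    ~/.m2/repository after a build has populated it.
    """
    root = Path(repo_root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

The weakness is that each recorded digest only describes the copy Maven happened to download. If another repository serves different bytes for the same coordinates, a fixed-output derivation pinned to that digest fails with a hash mismatch.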

I don’t really see the difference between strip-nondeterminism and the other actions we run in the fixup phase (my 2 cents).

Stdenv should make it easier to port things to Nix, and ideally be hermetic.

Do you have an example of a particular instance of this problem?

Here is the issue that led me to investigate: “hash mismatch in fixed-output derivation” (Issue #29, fzakaria/mvn2nix on GitHub).

Basically, not only is each rebuild non-deterministic, but sometimes a developer may use a different Maven profile when uploading to a different Maven repository (Central vs. JFrog).

It turned out to happen often enough in practice that it was a problem my original solution (walking the .m2 directory) didn’t solve.

That issue seems to be about the pom (https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-velocity/1.1.2/plexus-velocity-1.1.2.pom), so there strip-nondeterminism wouldn’t help anyway, right?

In this case yes.
It’s about how a publisher likely built with a different Maven profile when uploading to a different repo.

The high-level problem is that Maven’s federated system is itself problematic for the reproducibility effort.

First off, thanks for your continued work on Java & Nix! :)

I love the idea of adding strip-nondeterminism or something like it to the standard fixup phase.

To me, normalizing timestamps inside archive files is a very natural extension of the normalization of timestamps that nix already does.

The ZIP format is used in other contexts besides Jar files in the Java ecosystem, after all.
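As a rough illustration of what normalizing timestamps inside an archive means, here is a sketch in Python. It is not what strip-nondeterminism actually does (that tool is written in Perl and handles more cases); the function name `normalize_zip` and the choice of fixed timestamp are assumptions made for the example.

```python
import io
import zipfile

# 1980-01-01 is the earliest date the ZIP format can store; picking it as the
# fixed value is an arbitrary choice for this sketch.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(data):
    """Rewrite an archive with sorted entries and a fixed mtime on every entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Keep permission bits from the original entry.
            info.external_attr = src.getinfo(name).external_attr
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives that differ only in entry order and timestamps should come out byte-identical after this pass when processed in the same environment. The compressed bytes can still depend on the compression library in use, so this covers timestamps and ordering only.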

@Jerith do you have experience adding a setup hook?

I was thinking something super simple like: strip-nondeterminism.sh (on GitHub)

However, I think strip-nondeterminism would then need to be added to some bootstrap tooling?
(I think; I kind of got stuck here.)

I don’t have that experience, no.

I do know that it’s a positive that it’s written in Perl, since Perl has a very small closure and is already used in other low-level parts of nixpkgs (e.g. the buildenv build script).

I thought the plan was to remove the dependency on Perl. Is that no longer the case?

Googling for “nixpkgs remove perl” turned up “Get rid of the Perl dependency” (Issue #341, NixOS/nix on GitHub), but it’s long since completed; I don’t know if there are any plans beyond that.

This plan was to remove the Perl dependency from Nix itself, not from nixpkgs.