Hydra security model discussion

Is there an alternative cache (like NUR repo, community-fed) that uploads built packages from the participating users so others can benefit? I wonder if that would be a security problem since packages are hashed etc.

(asking since it’s been a day and cache still not populated, I’d like to build and contribute, at least for one arch)

Well, if you are root it is trivial to modify the build while it runs from outside the sandbox. You need to make a deliberate effort, but the bar is not high.

Cross-checking the results from different users could kind of help, as long as the submitters are working independently and the package is bit-perfectly reproducible. The latter will be verified automatically by such a process, of course, but the former is a complicated question.

It is a security problem. There are two hashes:

  1. The output path contains the hash of the derivation. It provides no guarantee that the output was not tampered with. Moreover, it doesn’t guarantee that the inputs were not tampered with either: one could change the input, build, and then rename the output path so that it carries the hash of the (expected) unmodified input.
  2. The narinfo contains a hash of the data. But one could just tamper with the output and update the narinfo to contain the revised hash.

For this reason, you should only use trusted binary caches. Or even better, use binary caches with narinfo signed by someone that you trust.
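To make point 2 concrete, here is a minimal Python sketch (not how Nix itself is implemented) of a cache that offers nothing but a content hash. The `make_narinfo` and `integrity_ok` helpers and the dictionary layout are made up for illustration; the store path is the thunderbird one quoted later in this thread.

```python
import hashlib


def nar_hash(data: bytes) -> str:
    """SHA-256 of the package contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_narinfo(store_path: str, data: bytes) -> dict:
    # Toy stand-in for a narinfo: the store path plus a hash of the contents.
    return {"StorePath": store_path, "NarHash": nar_hash(data)}


def integrity_ok(narinfo: dict, data: bytes) -> bool:
    # The client-side check: does the download match the advertised hash?
    return narinfo["NarHash"] == nar_hash(data)


store_path = "/nix/store/80bgzpp1lz9z93aiwjp8h5585nlk9zhm-thunderbird-78.4.0"
honest = b"honest build output"
tampered = b"build output with a backdoor"

honest_info = make_narinfo(store_path, honest)
# A malicious cache swaps the contents AND regenerates the hash to match.
forged_info = make_narinfo(store_path, tampered)

print(integrity_ok(honest_info, tampered))  # False: corruption is detected
print(integrity_ok(forged_info, tampered))  # True: the forgery passes
```

The hash catches accidental corruption, but whoever controls the cache controls both the data and the hash, so the check says nothing about who produced the output.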

The story would get a bit better with fully reproducible builds, because then several independent parties could build a derivation and the output hashes could be compared.

Hmm, I thought Nix was locally checking integrity by actually hashing the downloaded thing, not just reading the narinfo. Then yeah, it’s a no-go. I was thinking of a somewhat NUR-like “beware of the risks” cache with a submitter- and comparison-based trust ranking, but that would be naive in this state.

It is checking the integrity. But hashing alone does not provide security. If someone tampers with the output, they can update the hash as well. You need signatures + a trust model (decide which signers to trust). If you set up such a community cache, even with nar signing, you need to trust everyone in the community who is able to upload and sign.
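A rough Python sketch of "signatures + a trust model" follows. Loud caveat: it uses HMAC from the standard library as a stand-in for a signature, which is a shared-secret scheme, whereas real binary caches use public-key signatures. The key names, secrets, and message layout are all invented; the only point is the decision "is this signer in my trusted set, and does the signature verify?".

```python
import hashlib
import hmac
from typing import Dict


def sign(key: bytes, store_path: str, nar_hash: str) -> str:
    """Toy 'signature' over the store path and content hash (HMAC stand-in)."""
    message = f"{store_path};{nar_hash}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(trusted_keys: Dict[str, bytes], signer: str,
           store_path: str, nar_hash: str, signature: str) -> bool:
    """Accept only if the signer is trusted AND the signature checks out."""
    key = trusted_keys.get(signer)
    if key is None:
        return False  # unknown signer: a valid-looking signature means nothing
    expected = sign(key, store_path, nar_hash)
    return hmac.compare_digest(expected, signature)


# Hypothetical trust configuration: one signer we decided to trust.
trusted_keys = {"trusted-cache-1": b"secret known to the trusted builder"}
attacker_key = b"secret known only to the attacker"

path = "/nix/store/80bgzpp1lz9z93aiwjp8h5585nlk9zhm-thunderbird-78.4.0"
good_hash = hashlib.sha256(b"honest build output").hexdigest()
evil_hash = hashlib.sha256(b"build output with a backdoor").hexdigest()

good_sig = sign(trusted_keys["trusted-cache-1"], path, good_hash)
evil_sig = sign(attacker_key, path, evil_hash)

print(verify(trusted_keys, "trusted-cache-1", path, good_hash, good_sig))  # True
print(verify(trusted_keys, "attacker-1", path, evil_hash, evil_sig))       # False
print(verify(trusted_keys, "trusted-cache-1", path, evil_hash, evil_sig))  # False
```

Note what this does not solve: anyone holding a trusted key can sign a tampered output, which is why a community cache means trusting every member who can upload and sign.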

Sorry for rookie questions, apparently I have some fundamental gaps on the model so trying to clarify.
I wasn’t trying to provide the input and I thought the output is essentially defined by the input only? Inputs from official repo could be the source of truth, so the expected output hashes will be already known.

E.g. we have the thunderbird definition on official 20.09, which (I assumed) is expected to create 80bgzpp1lz9z93aiwjp8h5585nlk9zhm-thunderbird-78.4.0 before the build, so tampering on the cache side would be at least hard (except for finding a nice sha256 collision, which of course would be enough to label the whole thing “insecure” by itself).

Edit: Wait, after thinking a bit more on it, I realized my mistake here. There is no way you can guarantee what will be the hash of the (ingredients of the) built package, it’s just the name of the output we can be sure… :man_facepalming: Thanks for the answers.
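The realization above can be shown with a toy model in Python. This is deliberately not Nix's real path computation: `toy_output_path` and the input strings are invented, and the only property it is meant to share with the real thing is that the path is derived from the build inputs, never from the bytes the build produced.

```python
import hashlib
from typing import List


def toy_output_path(name: str, inputs: List[str]) -> str:
    """Toy rule: the path depends only on the (sorted) build inputs."""
    digest = hashlib.sha256("\n".join(sorted(inputs)).encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


# Hypothetical input descriptions for some package build.
inputs = ["source: upstream tarball", "dependency: compiler", "script: builder.sh"]

# The path is known before anyone builds anything.
expected_path = toy_output_path("thunderbird-78.4.0", inputs)

# Two builders produce different bytes, yet both claim the same path.
store = {}
store[expected_path] = b"honest build output"
honest_digest = hashlib.sha256(store[expected_path]).hexdigest()
store[expected_path] = b"build output with a backdoor"
tampered_digest = hashlib.sha256(store[expected_path]).hexdigest()

print(expected_path == toy_output_path("thunderbird-78.4.0", inputs))  # True
print(honest_digest == tampered_digest)                                # False
```

So the name tells you which recipe an output claims to come from, not what is actually inside it.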

We choose to go to the moon, not because it is easy, but because it is hard. Never underestimate hardness, especially with regard to well-resourced, dedicated actors with infinite time and other motivations.

The folks we love over at Debian have been trying to solve this ‘problem’: that the sum of a build’s inputs will always produce the same byte-for-byte output. This is somewhat difficult.

https://wiki.debian.org/ReproducibleBuilds

If a build is byte-for-byte reproducible, you could get all your trusted friends to build software configurations for you and compare all the outputs with each other; if they all match, you know that it’s probably what the author of the software intended…

Maybe the forthcoming IPFS integrations will bring us a step closer to distributed, trustless build nirvana. Or maybe not.

Welcome to NixOS, the ecosystem that makes you ask questions about how software is developed, built, packaged, and delivered…

Even bit-perfect reproducible builds do not remove the question of whether all the submitted outputs are in fact controlled by sockpuppets of a single person trying to make a point or something.

Notice I said ‘friends’… the fact that some of my friends are actual ‘real life’ sockpuppets may be related.

Trust artifacts would need to be exchanged out-of-band with the non-sockpuppet friends. What could that possibly (and practically) look like in an IPFS scenario so that it could scale?

Maybe a mixed model with well-known trust anchors is needed?
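One way to picture the sockpuppet worry and the trust-anchor idea together is the Python sketch below. Everything in it is hypothetical: the submitter names, the hash labels, and both acceptance rules are just an illustration, not a proposal for how such a cache should actually decide.

```python
from collections import Counter
from typing import Dict, Optional, Set


def majority_hash(reports: Dict[str, str]) -> Optional[str]:
    """Naive rule: accept whichever output hash most submitters report."""
    if not reports:
        return None
    return Counter(reports.values()).most_common(1)[0][0]


def anchored_hash(reports: Dict[str, str], anchors: Set[str],
                  quorum: int) -> Optional[str]:
    """Count only reports from trust anchors; need `quorum` matching ones."""
    votes = Counter(h for who, h in reports.items() if who in anchors)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count >= quorum else None


# Submitter -> reported output hash. Three accounts belong to one person.
reports = {
    "alice": "hash-good",
    "bob": "hash-good",
    "sock1": "hash-evil",
    "sock2": "hash-evil",
    "sock3": "hash-evil",
}
anchors = {"alice", "bob"}  # identities verified out-of-band

print(majority_hash(reports))                     # hash-evil
print(anchored_hash(reports, anchors, quorum=2))  # hash-good
print(anchored_hash(reports, anchors, quorum=3))  # None
```

Creating accounts is cheap, so the naive majority loses; the anchored rule only moves the problem to how the anchors were chosen and verified.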

I had a fleshed-out proposal after @blaggacao proposed the iot button. I will finish it up, submit it to NLnet, and ask what their policy around public contributions is.

Reproducible builds are great! While indeed Debian has been taking the lead here, it’s good to see people with other backgrounds (notably Arch, openSUSE, Guix) joining in:

While we’re well-positioned to add similar validations, we have some catching-up to do. You can already easily do a (basic) reproducibility check for a package with ‘nix-build --check’. While many packages are indeed already bit-by-bit reproducible, there is also work to be done. Focusing on the nixos-minimal iso, this is a list of packages that still need work:
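For anyone wanting to script that check, here is a small Python wrapper as a sketch. The `hello` attribute is only an example package, and the helper names are made up. It assumes `nix-build` is on the PATH, that the package has already been built or fetched once (as far as I know `--check` rebuilds and compares against the existing store path), and that a mismatch shows up as a non-zero exit code.

```python
import shutil
import subprocess
from typing import List


def check_command(attr: str) -> List[str]:
    """Build the argv for a basic reproducibility check of one nixpkgs attribute."""
    return ["nix-build", "<nixpkgs>", "-A", attr, "--check"]


def run_check(attr: str) -> bool:
    """Run the check; True means the rebuild matched the existing output."""
    if shutil.which("nix-build") is None:
        raise RuntimeError("nix-build not found on PATH")
    return subprocess.run(check_command(attr)).returncode == 0


# Only prints the command here; call run_check("hello") on a machine with Nix.
print(" ".join(check_command("hello")))
```

A single matching rebuild on the same machine is a weak signal; differences that depend on hardware, time, or kernel may only show up on another builder.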

Some of the python packages are expected to be fixed when pip 20.2 makes it in (on its way, pythonPackages.pip: make reproducible by zimbatm · Pull Request #102222 · NixOS/nixpkgs · GitHub). A PR for the dbus docs problem is also in progress (dbus: docs: make id's reproducible by raboof · Pull Request #102961 · NixOS/nixpkgs · GitHub).

@zimbatm did a Twitch live stream on this on Friday 6/11/2020 and takes you through some themes around reproducibility. The FIPS140 stuff in the nss package is very interesting… reproducibility is hard, especially when software wants to be non-reproducible (optimisation, security, etc.).

https://www.twitch.tv/videos/793797914

I just discovered this, which has some nice thoughts on trust.