If I only request chunks known to be theoretically buildable by Hydra or
present in the r-ryantm Cachix cache, I am not very likely to leak a chunk
containing something specific to me.
Hmm… I think an adapted P2P system would work too? (cf. the
hash(hash(…)) idea above) Maybe one already exists, but IIRC current P2P
systems just don’t scale enough to handle nixpkgs, so we’d need a new
one anyway.
My point was that a single filter would let us use any protocol
as-is. Actually, ed2k did scale quite well back in the day…
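A minimal sketch of such a filter, assuming chunks are identified by a hex SHA-256 digest and that a set of publicly known chunk IDs is available locally (for example, hypothetically derived from what Hydra builds or what the r-ryantm Cachix cache serves). The names `chunk_id`, `build_public_index` and `filter_requests` are illustrative only; the point is that the filter sits in front of whatever transport is used.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    # Hypothetical chunk identifier: hex SHA-256 of the chunk contents.
    return hashlib.sha256(chunk).hexdigest()


def build_public_index(public_chunks) -> set:
    # Assumption: whoever builds the index has the contents of publicly
    # buildable or cached paths, so their chunk IDs can be listed.
    return {chunk_id(c) for c in public_chunks}


def filter_requests(wanted_ids, public_ids):
    # Only chunks that are already public go out to the network, whatever
    # protocol is used; everything else must be built or fetched another way.
    return [cid for cid in wanted_ids if cid in public_ids]


public = build_public_index([b"libfoo", b"libbar"])
wanted = [chunk_id(b"libfoo"), chunk_id(b"my-secret-config")]
print(filter_requests(wanted, public))  # only the libfoo ID survives
```

Because the filter only ever removes requests, the protocol behind it does not need any privacy properties of its own for the filtered-in chunks.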
That sounds more or less similar to the hash(hash(…)) idea above? To
make the full protocol I had in mind more explicit (to check it’s the
same as the one you were thinking of):
- User searches the DHT for hash(hash(chunk))
- The peers who claim to have the chunk answer
- For each of these peers:
  - A secure connection is established between the user and the peer
  - Peer sends a nonce
  - User answers with hash(hash(chunk) || nonce)
  - User has now proven he knows the chunk’s hash and is therefore
    allowed to download; Peer sends the chunk
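The steps above can be sketched in a single process, assuming SHA-256 as the hash (the discussion does not fix one) and leaving out the DHT lookup and the secure connection itself. `Peer` and `download` are hypothetical names; this only shows the challenge/response logic.

```python
import hashlib
import hmac
import secrets


def h(data: bytes) -> bytes:
    # Assumed hash function; the protocol sketch does not pin one down.
    return hashlib.sha256(data).digest()


class Peer:
    def __init__(self, chunk: bytes):
        self.chunk = chunk
        self.key = h(chunk)         # hash(chunk): what the user must prove it knows
        self.dht_key = h(self.key)  # hash(hash(chunk)): what is published in the DHT
        self.nonce = None

    def challenge(self) -> bytes:
        # The peer sends a fresh random nonce.
        self.nonce = secrets.token_bytes(32)
        return self.nonce

    def respond(self, proof: bytes):
        # Release the chunk only if proof == hash(hash(chunk) || nonce).
        if self.nonce is None:
            return None
        expected = h(self.key + self.nonce)
        self.nonce = None  # one proof per challenge
        return self.chunk if hmac.compare_digest(proof, expected) else None


def download(chunk_hash: bytes, peer: Peer):
    # The user found this peer under hash(hash(chunk)) and knows hash(chunk).
    if peer.dht_key != h(chunk_hash):
        return None
    nonce = peer.challenge()
    chunk = peer.respond(h(chunk_hash + nonce))
    # The user checks that it received the chunk it asked for.
    if chunk is not None and h(chunk) != chunk_hash:
        return None
    return chunk
```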
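Not the same one. This protocol, by the way, has an obvious and cheap
MITM attack (e.g. a malicious peer can forward an honest peer’s nonce to
the user and relay the user’s answer back, getting the chunk without
knowing its hash). That is why I said that a secure two-party random
string generation is needed: both client and server must be able to
ensure that the nonce is good.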
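One cheap man-in-the-middle on the peer-chosen-nonce protocol is a relay, sketched below. The countermeasure shown, deriving the nonce from the secure connection's own key exchange (channel binding), is only one assumed way to get a value both sides contribute to; it is not necessarily the two-party construction meant above. `Connection` is a hypothetical stand-in that models the per-connection secret as fresh random bytes, and SHA-256 is again an assumption.

```python
import hashlib
import hmac
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed hash function


class Connection:
    """Hypothetical secure connection. We assume its key exchange gives both
    endpoints a fresh shared secret no third party learns; modeled here as
    random bytes created per connection."""
    def __init__(self):
        self.secret = secrets.token_bytes(32)


class Peer:
    def __init__(self, chunk: bytes, channel_bound: bool):
        self.chunk, self.key = chunk, h(chunk)
        self.channel_bound = channel_bound

    def challenge(self, conn: Connection) -> bytes:
        if self.channel_bound:
            # Nonce taken from this connection's key exchange.
            self.nonce = conn.secret
        else:
            # Original scheme: the peer alone picks the nonce.
            self.nonce = secrets.token_bytes(32)
        return self.nonce

    def respond(self, proof: bytes):
        ok = hmac.compare_digest(proof, h(self.key + self.nonce))
        return self.chunk if ok else None


def relay_attack(victim_chunk_hash: bytes, honest: Peer):
    # The attacker knows only hash(hash(chunk)) from the DHT and pretends to
    # have the chunk when the victim asks.
    victim_conn = Connection()    # victim <-> attacker
    upstream_conn = Connection()  # attacker <-> honest peer
    forwarded = honest.challenge(upstream_conn)
    if honest.channel_bound:
        # The victim uses the nonce of its own connection, which differs.
        victim_nonce = victim_conn.secret
    else:
        # The victim accepts whatever nonce "its" peer sends: the forwarded one.
        victim_nonce = forwarded
    # The victim answers; the attacker forwards the answer upstream.
    return honest.respond(h(victim_chunk_hash + victim_nonce))
```

With `channel_bound=False` the relay yields the chunk; with `channel_bound=True` the forwarded proof no longer matches.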
Of course, if you request just a single chunk, the attacker knows its
structure, and it contains a weak enough password, the leak helps with
offline brute-forcing on a GPU. Simply replaying the exchange to the
server won’t work, though, because an honest server will make sure that
the random salt agreement ends up with a different salt on replay (and
a dishonest server being able to provide you with a file containing your
password is bad enough without a second attacker).
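To make the offline brute-forcing concern concrete, here is a toy sketch. It assumes the observed transcript is the salt plus hash(hash(chunk) || salt), that the chunk follows a hypothetical known template with a password slot, and a tiny alphabet so it runs instantly; a real attacker would do the same search on a GPU over a much larger space.

```python
import hashlib
import itertools


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed hash function


# Hypothetical chunk structure known to the attacker, with a password slot.
TEMPLATE = b'password = "%s"\n'


def proof_for(chunk: bytes, salt: bytes) -> bytes:
    # What leaks from one exchange: hash(hash(chunk) || salt).
    return h(h(chunk) + salt)


def brute_force(observed_proof: bytes, salt: bytes, alphabet: str, max_len: int):
    # Enumerate candidate passwords offline; no further queries are needed.
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            guess = "".join(combo).encode()
            if proof_for(TEMPLATE % guess, salt) == observed_proof:
                return guess
    return None


salt = b"\x01" * 16
leaked = proof_for(TEMPLATE % b"ab1", salt)
print(brute_force(leaked, salt, "ab1", 3))
```

The salt does not slow this search down; it only prevents precomputation and, as noted above, replaying the captured answer.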
That is an interesting other threat model… but here I’m not sure there’s
much that can be done against it. The inner hash could, however, be a
deliberately slow password-hashing function or anything similar, which
would make brute-forcing harder.
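A slower inner hash can be illustrated with PBKDF2 from the standard library. The fixed public salt and the iteration count below are assumptions: the inner hash must stay deterministic because everyone has to derive the same hash(hash(chunk)) DHT key, so it cannot use a per-user salt. Honest users pay the cost once per chunk, whereas a brute-forcing attacker pays it once per guess.

```python
import hashlib

# Assumptions: a fixed, public domain-separation salt and a tunable work factor.
INNER_SALT = b"hypothetical-chunk-protocol-v1"
INNER_ITERATIONS = 200_000


def inner_hash(chunk: bytes) -> bytes:
    # Deliberately slow replacement for the plain inner hash(chunk).
    return hashlib.pbkdf2_hmac("sha256", chunk, INNER_SALT, INNER_ITERATIONS)


def dht_key(chunk: bytes) -> bytes:
    # hash(slow_hash(chunk)): still cheap to look up once computed.
    return hashlib.sha256(inner_hash(chunk)).digest()


def proof(chunk: bytes, salt: bytes) -> bytes:
    # hash(slow_hash(chunk) || salt): each password guess now costs a full
    # PBKDF2 evaluation instead of a single SHA-256.
    return hashlib.sha256(inner_hash(chunk) + salt).digest()
```

`hashlib.scrypt` would be a memory-hard alternative from the same module, when Python is built against a recent enough OpenSSL.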
Also, maybe something from the zero-knowledge proof domain could help
here? I’m not familiar with it, though.
Well, there are secure multi-party computation protocols that reveal
only the desired outputs. But we have a lot of paths to request and a
lot of served paths, so it would be capital-E Expensive.
I am not sure how to do such an oblivious search in a reasonably secure
and efficient way. I guess I could ask a few people whether the state of
the art has improved nowadays, but I doubt it scales well.