MD5 deprecated?

tricktron · March 27, 2020, 4:10pm

Nix has deprecated support for md5 in fetchurl. I am no security expert, but as far as I know md5 is still secure for checking file integrity.

I am asking because when I opened a new pull request to simply update the amazon-ecs-cli derivation, I noticed that Amazon also publishes the md5 hashes of the version (see this link). So it seemed natural to use the official md5 hash in fetchurl, which is unfortunately not possible.

So what is the official position regarding md5 for nix?

Moredread · March 27, 2020, 4:50pm

Checking if a file matches a certain hash - so that we can be certain that the content of a file isn’t changed later - is one of the use cases that is definitely broken for md5; see e.g. MD5 - Wikipedia

I don’t think arbitrary collisions are possible yet, but an attacker can e.g. produce a version that has a backdoor and one that doesn’t with the same MD5 hash, release the latter first, wait a while and replace it with the backdoored one.

tricktron · March 27, 2020, 6:45pm

Thanks for the answer. That makes a lot of sense.

But isn’t the case I mentioned above different? I mean, I already have the official md5 from Amazon and therefore I trust that md5. Isn’t it then about second-preimage resistance, and against such an attack md5 is still secure as far as I know?

So for the use case: if there is an official and thus trustworthy hash available, md5 would still be a good option as it is both safe and fast. I mean, why would Amazon otherwise provide official md5 hashes?

primeos · March 27, 2020, 7:28pm

MD5 can still be used for checksums to protect against accidental file corruption (bit flips, transmission errors, etc.), but from a security standpoint it has been completely broken for many years now and should not be used anymore. See e.g. MD5 - Wikipedia. Even SHA-1 shouldn’t be used for security anymore (e.g. https://shattered.io/ and subsequent attacks - some usages are theoretically still fine, but IMO there’s simply no point if there are better alternatives and transitioning often takes time).
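To make the checksum-versus-security distinction concrete, here is a minimal sketch (not something from the thread) of how a packager could still use a published MD5 as a corruption check while pinning the file with SHA-256. The function names `file_digests` and `sha256_if_md5_matches` are hypothetical helpers invented for this illustration; only Python's standard `hashlib` is assumed.

```python
import hashlib

def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) of a file, reading it in chunks."""
    md5 = hashlib.md5()        # only useful against accidental corruption
    sha256 = hashlib.sha256()  # the digest one would actually pin
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()

def sha256_if_md5_matches(path, published_md5):
    """Hypothetical helper: check the upstream MD5, then hand back the SHA-256."""
    md5_hex, sha256_hex = file_digests(path)
    if md5_hex != published_md5.strip().lower():
        raise ValueError("MD5 mismatch: download is corrupted or was altered")
    return sha256_hex
```

In this sketch the MD5 comparison only confirms that the download matches what upstream published; the returned SHA-256 is what would then go into the `sha256` attribute of fetchurl.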

Yes, but that isn’t always the case in Nixpkgs (e.g. we have a lot of insecure fetchers and methods that aren’t protected well against MITM attacks). Even for that reason alone we shouldn’t allow MD5 IMO (accidental misuse would be very likely). Additionally, the Hydra builds would provide less additional protection, and for https://tarballs.nixos.org/ this would also cause problems.

I’m not a cryptography expert either, but from some very quick research it seems like MD5 is still sufficiently preimage resistant (but there are known theoretical attacks and I wouldn’t rely on it).

But in any case, from a security standpoint I consider MD5 dead by now (unfortunately it still isn’t, but it should be).

That’s a good question… (and I don’t like these cases)
Could be to check the files for accidental corruption, legacy reasons, laziness, etc.

Edit: The documentation doesn’t seem to discuss why MD5 is used, but from a quick look it seems to be only(/mainly) intended to protect against transmission errors (though this isn’t explicitly stated):

Check the integrity of an object uploaded to Amazon S3 | AWS re:Post
- “If the upload request is signed with Signature Version 4, then a Content-MD5 is not calculated. Instead, the AWS CLI uses the x-amz-content-sha256 header as a checksum instead of Content-MD5.”
AWS CLI S3 FAQ — AWS CLI 1.32.39 Command Reference
Working with Content-MD5 checksums
- “Amazon Marketplace Web Service (Amazon MWS) calculates the MD5 checksum and compares it to the hash value you sent to ensure that the received feed has not been corrupted in transmission. The process is reversed when Amazon MWS sends a report; the Content-MD5 header is sent with the report and you calculate the MD5 checksum and compare it to the header Amazon sent to make sure the report you received has not been corrupted in transmission”
https://support.microsoft.com/en-us/help/841290/availability-and-description-of-the-file-checksum-integrity-verifier-u
- “The File Checksum Integrity Verifier (FCIV) is a command-prompt utility that computes and verifies cryptographic hash values of files. FCIV can compute MD5 or SHA-1 cryptographic hash values. These values can be displayed on the screen or saved in an XML file database for later use and verification.”
https://docs.amazonaws.cn/en_us/redshift/latest/dg/r_MD5.html
- “MD5 cryptographic hash function” tbh at this point I’m surprised that CRC32 isn’t documented as a cryptographic hash function as well - I’d be interested if it failed the cryptographic or the hash function requirement (sorry I couldn’t resist :D)

vcunat · March 29, 2020, 8:48am

Yes, it is mainly about second preimage. If the best attack is still above 2^100, I’d consider that practically safe. But… as far as security goes, people prefer to have some margin.
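For a sense of scale of the 2^100 figure, here is a rough back-of-the-envelope sketch. The rate of 10^12 hash evaluations per second is purely an assumed number for illustration, not a measurement, and `years_of_work` is a hypothetical helper. The 2^64 line is the generic birthday bound for any 128-bit hash; the practical MD5 collision attacks mentioned elsewhere in this thread are far cheaper than that.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_of_work(log2_operations, ops_per_second):
    """Years needed to perform 2**log2_operations steps at a given rate."""
    return 2 ** log2_operations / ops_per_second / SECONDS_PER_YEAR

ASSUMED_RATE = 1e12  # hypothetical attacker: 10^12 hash evaluations per second

# Second-preimage work at the 2^100 level discussed above.
second_preimage_years = years_of_work(100, ASSUMED_RATE)

# Generic birthday bound for a collision on a 128-bit hash (2^64 steps).
birthday_collision_years = years_of_work(64, ASSUMED_RATE)

print(f"2^100 steps: about {second_preimage_years:.1e} years")
print(f"2^64 steps:  about {birthday_collision_years:.2f} years")
```

Under that assumed rate the 2^100 case comes out on the order of 10^10 years, while even the generic collision bound takes less than a year, which is one way to see why collisions, not second preimages, are the realistic worry.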

Example theoretical scenario: nixpkgs mainly packages open-source stuff, so what if the attacker could somehow be involved in upstream and thus manipulate the upstream hash as well? Just sending patches might not make this feasible, but still… md5 seems very vulnerable against creating two versions with the same hash – one might be completely innocuous and served to almost everyone, and a second, malicious version could be served to specific targets that “they” want to attack.

tricktron · March 29, 2020, 8:56am

Thanks for your lengthy answer.

I see you have a strong opinion about md5, which I like.

I agree with you that md5 has too many possible vulnerabilities which are much more significant than my mentioned use case. So it makes sense to deprecate it.

7c6f434c · March 29, 2020, 9:25am

Example theoretical scenario: nixpkgs mainly packages open-source stuff, so what if the attacker could somehow be involved in upstream and thus manipulate the upstream hash as well? Just sending patches might not make this feasible, but still…

… and we should remember that getting a collidable SSL certificate from an unsuspecting (and oblivious about MD5 deprecation) CA did happen.

Note that if your collision is early enough in the tarball, the later parts do not matter. By now I would expect any 256 independent binary choices to be sufficient for a collision, even if you gzip later (I would not bid to do it, but I would not bet against someone who promises to find a collision in such circumstances). As many of the upcoming changes are discussed and even developed in the open anyway, it might be possible to submit a patch likely to lead to a collision-friendly release. It would probably require some social engineering, but this is not an attack people who still use MD5 (in 2020!) would actively try to prevent…
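The remark that the later parts of the tarball do not matter follows from MD5 being an iterated (Merkle-Damgård) hash: once two equal-length prefixes reach the same internal state, any common suffix keeps the digests equal. Below is a toy sketch of that property only. It is not MD5: `toy_hash` is a hypothetical construction with a deliberately tiny 16-bit state (built from truncated SHA-256) so that a collision can be brute-forced instantly.

```python
import hashlib
from itertools import count

BLOCK = 8        # toy block size in bytes
STATE_BYTES = 2  # 16-bit chaining state, small enough to brute-force

def compress(state, block):
    """Toy compression function: SHA-256 of state||block, cut to 16 bits."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_hash(message):
    """Merkle-Damgard iteration with length padding, like MD5 in structure."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += len(message).to_bytes(8, "big")
    state = b"\x00" * STATE_BYTES
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision():
    """Search for two distinct first blocks that reach the same state."""
    seen = {}
    iv = b"\x00" * STATE_BYTES
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(iv, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

first, second = find_block_collision()
suffix = b"rest of the tarball, identical in both versions"

# Different prefixes, same suffix, same toy digest.
print(first != second, toy_hash(first + suffix) == toy_hash(second + suffix))
```

The two messages differ only in their first block, yet whatever is appended afterwards, the toy digests stay equal, which is the structural reason an early collision is enough.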