No hashes starting with 'e', 't', 'o' or 'u' in /nix/store?

I don’t have any hashes in my Nix store that start with one of these letters: ['e','o','t','u']. I checked a few of my NixOS machines and found the same thing on all of them. I checked both macOS and Linux machines. Do you have any idea what’s going on with Nix store hashing?

% for i in {{a..z},{0..9}}; do echo -n "Number of hashes starting with letter '$i' " ; ls | grep -e "^$i.*"| wc -l; done
Number of hashes starting with letter 'a'     285
Number of hashes starting with letter 'b'     274
Number of hashes starting with letter 'c'     314
Number of hashes starting with letter 'd'     289
Number of hashes starting with letter 'e'       0
Number of hashes starting with letter 'f'     288
Number of hashes starting with letter 'g'     314
Number of hashes starting with letter 'h'     293
Number of hashes starting with letter 'i'     287
Number of hashes starting with letter 'j'     279
Number of hashes starting with letter 'k'     299
Number of hashes starting with letter 'l'     310
Number of hashes starting with letter 'm'     280
Number of hashes starting with letter 'n'     282
Number of hashes starting with letter 'o'       0
Number of hashes starting with letter 'p'     260
Number of hashes starting with letter 'q'     326
Number of hashes starting with letter 'r'     264
Number of hashes starting with letter 's'     337
Number of hashes starting with letter 't'       0
Number of hashes starting with letter 'u'       0
Number of hashes starting with letter 'v'     285
Number of hashes starting with letter 'w'     300
Number of hashes starting with letter 'x'     262
Number of hashes starting with letter 'y'     278
Number of hashes starting with letter 'z'     325
Number of hashes starting with letter '0'     292
Number of hashes starting with letter '1'     291
Number of hashes starting with letter '2'     282
Number of hashes starting with letter '3'     293
Number of hashes starting with letter '4'     287
Number of hashes starting with letter '5'     300
Number of hashes starting with letter '6'     283
Number of hashes starting with letter '7'     275
Number of hashes starting with letter '8'     309
Number of hashes starting with letter '9'     312

Huh, curious, absolutely the same for me.

[sondre@neptune:/nix/store]$ for i in {{a..z},{0..9}}; do echo -n "Number of hashes starting with letter '$i' " ; ls | grep -e "^$i.*"| wc -l; done
Number of hashes starting with letter 'a' 1097
Number of hashes starting with letter 'b' 1149
Number of hashes starting with letter 'c' 1129
Number of hashes starting with letter 'd' 1125
Number of hashes starting with letter 'e' 0
Number of hashes starting with letter 'f' 1100
Number of hashes starting with letter 'g' 1081
Number of hashes starting with letter 'h' 1094
Number of hashes starting with letter 'i' 1112
Number of hashes starting with letter 'j' 1149
Number of hashes starting with letter 'k' 1106
Number of hashes starting with letter 'l' 1142
Number of hashes starting with letter 'm' 1083
Number of hashes starting with letter 'n' 1088
Number of hashes starting with letter 'o' 0
Number of hashes starting with letter 'p' 1092
Number of hashes starting with letter 'q' 1091
Number of hashes starting with letter 'r' 1133
Number of hashes starting with letter 's' 1114
Number of hashes starting with letter 't' 0
Number of hashes starting with letter 'u' 0
Number of hashes starting with letter 'v' 1090
Number of hashes starting with letter 'w' 1096
Number of hashes starting with letter 'x' 1151
Number of hashes starting with letter 'y' 1078
Number of hashes starting with letter 'z' 1179
Number of hashes starting with letter '0' 1099
Number of hashes starting with letter '1' 1126
Number of hashes starting with letter '2' 1095
Number of hashes starting with letter '3' 1100
Number of hashes starting with letter '4' 1110
Number of hashes starting with letter '5' 1125
Number of hashes starting with letter '6' 1100
Number of hashes starting with letter '7' 1157
Number of hashes starting with letter '8' 1106
Number of hashes starting with letter '9' 1087

To answer: I asked about this on IRC and clever found the solution extremely quickly:

sondr3 | No hashes starting with ‘e’, ‘t’, ‘o’ or ‘u’ in /nix/store?                                   
       | https://discourse.nixos.org/t/no-hashes-starting-with-e-t-o-or-u-in-nix-store/4906            
sondr3 | anyone have any clue why this is? I'm extremely curious                                       
clever | src/libutil/hash.cc:const string base32Chars = "0123456789abcdfghijklmnpqrsvwxyz";              
clever | sondr3: its not just that it cant start with an e, but it wont have an e anywhere in the hash 
sondr3 | oh that's even more of a fun fact                                                              
sondr3 | do you know why?                                                                              
clever |  72 // omitted: E O U T                                                                       
clever |  73 const string base32Chars = "0123456789abcdfghijklmnpqrsvwxyz";                            
 monty | sondr3: Those just happen to be the characters dropped. Otherwise there'd be too many for       
       | base32                                                                                        
clever | and i think they omited letters needed for certain words, so you cant spell bad words         

So yeah, in line 72 you can see it.
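
The alphabet clever quotes can be rebuilt from that comment alone: take the 36 digits and lowercase letters and delete the four omitted ones. A minimal sketch (the variable names are mine, not taken from the Nix source):

```shell
# All 36 lowercase alphanumerics, digits first as in the quoted base32Chars.
all_chars=0123456789abcdefghijklmnopqrstuvwxyz

# Drop the four letters named in the "omitted: E O U T" comment.
alphabet=$(printf '%s' "$all_chars" | tr -d 'eotu')

echo "$alphabet"      # 0123456789abcdfghijklmnpqrsvwxyz
echo "${#alphabet}"   # 32
```

Dropping exactly four characters is what brings 36 down to 32, so each character can carry a whole number of bits (five).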

Thank you very much @sondr3.

If people could spell bad words at choice, they’d be bitcoin trillionaires (=Zimbabwe trillionaires)

Well, you need just 5 bits per character (the alphabet has 32 symbols), and we don’t use anything deliberately slow, so a four-character word costs about 2^20, a mere million attempts; this is laughably easy. 8 characters can be got in a day, I think.

A stupid proxy blocking Hydra access because you tried to download a file with the name starting with two bad words in a row would be annoying.

Now, if you could encode a 32-letter message, that would be indeed surprising.
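
The brute-force estimate in this post can be made concrete with a little shell arithmetic. This is only a sketch: `attempts_for_prefix` is a name I made up, and it assumes every attempt produces an independent, uniformly distributed hash over the 32-symbol alphabet.

```shell
# Expected number of attempts to hit one specific k-character prefix:
# 32^k, written as a shift because each character carries 5 bits.
attempts_for_prefix() {
    echo $(( 1 << (5 * $1) ))
}

for k in 4 6 8; do
    printf '%d characters: about %d attempts\n' "$k" "$(attempts_for_prefix "$k")"
done
```

For 8 characters this gives roughly 1.1 * 10^12 attempts, which at a hypothetical ten million attempts per second is a bit over a day. A full 32-character message would need about 2^160 attempts, which is why that case would be surprising.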

Not only will you not find any hashes starting with those letters, but you will not see the letters anywhere inside the hashes either.

Nix encodes the binary string produced by the hash function into an ASCII string using a custom base-32 scheme. The scheme avoiding vowels was chosen in order to reduce the chance of the string containing swear words (Eelcoʼs thesis, page 88).
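
To illustrate, here is a rough shell sketch of such an encoder. It is modeled on my recollection of how Nix walks the hash bytes (5 bits at a time, last group first); the bit order is an assumption not confirmed in this thread, and `byte_at` and `nix_base32` are hypothetical helper names. Only the alphabet comes from the source quoted above.

```shell
# Print the decimal value of byte number $2 (0-based) of hex string $1.
byte_at() {
    printf '%d\n' "0x$(printf '%s' "$1" | cut -c"$(( 2 * $2 + 1 ))-$(( 2 * $2 + 2 ))")"
}

# Encode a hex-encoded hash ($1) with the 32-character alphabet.
# Plain POSIX sh, so variables are global; fine for a sketch.
nix_base32() {
    hex=$1
    chars=0123456789abcdfghijklmnpqrsvwxyz
    size=$(( ${#hex} / 2 ))                 # number of input bytes
    len=$(( (size * 8 - 1) / 5 + 1 ))       # number of output characters
    out=
    n=$(( len - 1 ))
    while [ "$n" -ge 0 ]; do
        b=$(( n * 5 )); i=$(( b / 8 )); j=$(( b % 8 ))
        c=$(( $(byte_at "$hex" "$i") >> j ))
        if [ $(( i + 1 )) -lt "$size" ]; then
            # The 5-bit group may straddle two bytes; pull in the next one.
            c=$(( c | ($(byte_at "$hex" "$(( i + 1 ))") << (8 - j)) ))
        fi
        out=$out$(printf '%s' "$chars" | cut -c"$(( (c & 31) + 1 ))")
        n=$(( n - 1 ))
    done
    printf '%s\n' "$out"
}

# A 20-byte (160-bit) input yields 32 characters.
nix_base32 "$(printf '%040d' 0)"   # 32 zeros
```

Whatever the input, every output character is picked from the 32-character alphabet, so e, o, u and t cannot appear at any position.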

Edit: sorry for posting superfluous message, GMail for Android does not seem to respect In-Reply-To headers, so I did not notice this thread was already solved.

Damn shit, that was the reason?

Sounds like cachix needs an easter egg to find swear words.

This explains how nuke-references[1] is implemented. It replaces hashes with “eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee”, which is a completely invalid hash under the scheme above and so can never collide with a real store path.

[1]: See https://github.com/NixOS/nixpkgs/blob/eab4adc1875af7c86dd109eaa429d0f7a63c2137/pkgs/build-support/nuke-references/builder.sh.
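
The idea can be sketched with a single sed substitution. This is not the linked builder script, just an illustration under my own assumptions; `nuke_hashes` is a hypothetical name and the `/nix/store` prefix is hard-coded.

```shell
# Rewrite the hash part of every store path on stdin to 32 "e" characters.
nuke_hashes() {
    valid=0123456789abcdfghijklmnpqrsvwxyz
    dummy=$(printf '%032d' 0 | tr 0 e)      # 32 times "e"
    sed -E "s|/nix/store/[$valid]{32}-|/nix/store/$dummy-|g"
}

# The hash part of this path is replaced by 32 "e" characters.
echo '/nix/store/0123456789abcdfghijklmnpqrsvwxyz-hello/bin/hello' | nuke_hashes
```

Because "e" is not in the alphabet, the pattern never matches an already rewritten path, so running the filter twice changes nothing.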

People do crack hashes to build words or names. See tripcode cracking.

So it’s totally possible to create a Nix derivation whose hash contains the words you want. You just have to try many candidates automatically. Could be a fun exercise.

e, t, a, o are the top four letters by frequency in English, so this seems to be a strategy of reducing the possibility of spelling any words, except that a was replaced by u… You already know why.

Here’s an amusing list of swearwords, many of which can show up in hashes :wink:

Found a nice example of finding hashes with words you prefer: the tool “masscan” asks for donations to a bitcoin wallet with the word “MASSCAN” in it (see the robertdavidgraham/masscan repository on GitHub).

Hmm…

$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | wc -l
451
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep '[0123456789abcdefghijklmnopqrstuvwxyz]' | wc -l
451
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[eotu]' | wc -l
77
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aket]' | wc -l
69
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aeot]' | wc -l
67
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aeio]' | wc -l
52
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aiou]' | wc -l
32

So e, o, t, u is pretty good, but you can get fewer by just taking a, i, o, u. Maybe there are some Dutch swears with a bunch of t’s I don’t know about though.

$ curl -s https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt > badwords.txt && for a in {{a..z},{0..9}}; do for b in {{a..z},{0..9}}; do for c in {{a..z},{0..9}}; do for d in {{a..z},{0..9}}; do echo -e $a$b$c$d\\t$(grep -v "[$a$b$c$d]" badwords.txt | wc -l); done; done; done; done | sort -nrk2
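
The counting step used in these commands can be isolated into a small helper, which makes it easy to compare candidate letter sets without re-downloading the list. A sketch only: `spellable_without` is a name I made up, and the sample file below holds harmless stand-in words rather than the real list.

```shell
# Count the words in file $2 that contain none of the letters in $1,
# i.e. the words that could still be spelled if only those letters were dropped.
spellable_without() {
    grep -vc "[$1]" "$2" || true    # grep exits 1 when the count is 0
}

words=$(mktemp)
printf '%s\n' darn heck drat grr pfft > "$words"
spellable_without eotu "$words"    # 2 (darn, grr)
spellable_without aiou "$words"    # 3 (heck, grr, pfft)
rm -f "$words"
```

Note that the nested loop above visits every ordered 4-tuple, so each set of four letters is tested 24 times and tuples with repeated letters are included; enumerating unordered combinations would be much faster.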

Tezos also does this with protocol changes. For instance the codename is carthage and the protocol hash is “PtCarthavAMoXqbjBPVgDCRd5LgT7qqKWUPXnYii3xCaHRBMfHH” [1]

Probably the best approach would be to remove all the vowels aeiouy and add some punctuation to get back to 32 characters. Most languages don’t have words without vowels. - and _ should be allowed in all filesystems.

Of course then you can still have a file hash like grr-_-grr-_-grr :wink:

fsckn-btrfs !

I guess this would be a partially incompatible change anyway, so we might as well move to a multi-level directory structure, and then we could even afford a slightly longer hash while we are at it.

At the same time, l33tsp34k teaches us o=0 i=1 e=3 a=4 t=7, so the majority of the vowels and the currently excluded t are back anyway.

(I guess that, for the purposes of the thesis, a literal swear word showing up at an official demonstration is a small risk but awkward if it happens, whereas with l33tsp34k cursing everyone involved can always plausibly pretend not to notice that the hash is readable.)
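
The digit substitutions listed in this post are easy to undo mechanically, which is all a word filter would need to do before checking a hash. A tiny sketch (`unleet` is a hypothetical name, and it only covers the five mappings mentioned here):

```shell
# Map the digits back to letters: 0=o 1=i 3=e 4=a 7=t.
unleet() {
    tr '01347' 'oieat'
}

echo 'd34dc4t' | unleet    # deadcat
```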

Curses, foiled again :wink: c4f3b4b3_b00b13s_d34dc4t_etc is indeed valid under that scheme.

To go on a tangent: why would you want a multi-level directory structure? Modern filesystems can handle millions of files in a single directory just fine, since they use a tree structure internally. By using a multi-level directory structure you’re actually making that tree structure less efficient.

This would probably be true if Nix didn’t insist on a 0o555 store.

From time to time some program decides to readdir() the store, and maybe also stat() each result, and it is a bit annoying to keep track of what not to do in order to avoid triggering such behaviour.

(One example is Zsh Tab completion, which has a very convenient optional feature that also happens to readdir() every directory along the path being completed.)
