No hashes starting with 'e', 't', 'o' or 'u' in /nix/store?

If people could spell bad words at choice, they’d be bitcoin trillionaires (=Zimbabwe trillionaires)

Well, you need just 6 bits per character, we don’t use anything deliberately slow, and 2^24 is a mere 16 million, so this is laughably easy. 8 characters can be had in a day, I think.

A stupid proxy blocking Hydra access because you tried to download a file with the name starting with two bad words in a row would be annoying.

Now, if you could encode a 32-letter message, that would be indeed surprising.
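
For rough numbers on the prefix estimate above, here is a small sketch. Nix's base-32 digits carry 5 bits each, so 6 bits per character is a generous upper bound. `expected_tries` is a made-up helper name, and the figures are expected counts for uniformly random hashes, not guarantees.

```shell
# Expected number of random hashes to try before one starts with a chosen
# prefix of $1 characters, for an alphabet of $2 symbols (default 32).
expected_tries() {
    awk -v n="$1" -v a="${2:-32}" 'BEGIN { printf "%.0f\n", a ^ n }'
}

expected_tries 4      # 32^4 = 1048576
expected_tries 8      # 32^8 = 1099511627776
expected_tries 4 64   # 64^4 = 16777216, the 2^24 figure quoted above
```

At 32^8 tries, getting 8 characters within a day works out to roughly 12.7 million hashes per second.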

1 Like

Not only will you not find any hashes starting with those letters, you will not see those letters anywhere inside the hashes either.

Nix encodes the binary string produced by the hash function into an ASCII string using a custom base-32 scheme. The alphabet leaves out e, o, u and t in order to reduce the chance of the string containing swear words (Eelcoʼs thesis, page 88).
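
As a quick illustration of that alphabet, here is a sketch that checks whether a string could be the hash part of a store path. `is_store_hash` is a hypothetical helper, and it only checks length and characters, not whether any real path has that hash.

```shell
# The 32 digits of Nix's base-32: 0-9 plus a-z without e, o, t and u.
NIX_BASE32='0123456789abcdfghijklmnpqrsvwxyz'

# Succeeds if $1 consists of exactly 32 characters from that alphabet.
is_store_hash() {
    printf '%s\n' "$1" | grep -Eqx "[$NIX_BASE32]{32}"
}

is_store_hash 0123456789abcdfghijklmnpqrsvwxyz && echo valid
# A string of 32 'e's has the right length but uses an excluded letter.
is_store_hash "$(printf '%32s' '' | tr ' ' e)" || echo invalid
```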

Edit: sorry for posting a superfluous message. GMail for Android does not seem to respect In-Reply-To headers, so I did not notice this thread was already solved.

7 Likes

Damn shit, that was the reason?

16 Likes

Sounds like cachix needs an easter egg to find swear words.

5 Likes

This makes more sense of how nuke-references[1] is implemented. It replaces hashes with “eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee”, a completely invalid hash as per the above spec.

[1]: See https://github.com/NixOS/nixpkgs/blob/eab4adc1875af7c86dd109eaa429d0f7a63c2137/pkgs/build-support/nuke-references/builder.sh.
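
The idea can be sketched as a stream filter. This is not the linked script, just a minimal approximation assuming store paths look like /nix/store/<32 characters>-name; `nuke_hashes` is a made-up name.

```shell
# Build the 32-character replacement programmatically to avoid miscounting.
E32=$(printf '%32s' '' | tr ' ' e)

# Sketch only: overwrite every store-path hash on stdin with 32 'e's.
# The replacement has the same length as a real hash, so the overall
# length of the data does not change.
nuke_hashes() {
    sed -E "s|/nix/store/[0-9a-z]{32}-|/nix/store/${E32}-|g"
}

echo 'RPATH=/nix/store/0123456789abcdfghijklmnpqrsvwxyz-glibc/lib' | nuke_hashes
```

Since 'e' never occurs in a real hash, the rewritten path cannot collide with an existing store path.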

7 Likes

People do crack hashes to build words or names. See tripcode cracking

So it’s totally possible to create a Nix derivation whose hash contains the words you want. You just have to try many candidates automatically. Could be a fun exercise.

e, t, a, o are the top four letters by frequency in English, so this seems to be a strategy of reducing the possibility of spelling any words, except that a was replaced by u… You already know why.

1 Like

Here’s an amusing list of swearwords, many of which can show up in hashes :wink:

1 Like

Found a nice example of finding hashes with words you prefer: the tool “masscan” asks for donations to a bitcoin wallet with the word “MASSCAN” in it (see robertdavidgraham/masscan on GitHub).

3 Likes

Hmm…

$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | wc -l
451
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep '[0123456789abcdefghijklmnopqrstuvwxyz]' | wc -l
451
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[eotu]' | wc -l
77
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aket]' | wc -l
69
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aeot]' | wc -l
67
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aeio]' | wc -l
52
$ curl https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt | grep -v '[aiou]' | wc -l
32

So e, o, t, u is pretty good, but you can get fewer by just taking a, i, o, u. Maybe there are some Dutch swears with a bunch of t’s I don’t know about, though.

$ curl -s https://raw.githubusercontent.com/RobertJGabriel/Google-profanity-words/master/list.txt > badwords.txt && for a in {{a..z},{0..9}}; do for b in {{a..z},{0..9}}; do for c in {{a..z},{0..9}}; do for d in {{a..z},{0..9}}; do echo -e $a$b$c$d\\t$(grep -v "[$a$b$c$d]" badwords.txt | wc -l); done; done; done; done | sort -nrk2
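
A slightly stricter variant of the grep -v counts above, as a sketch: instead of counting words that avoid four letters, count words made up only of characters from a given alphabet. If the list contains entries with spaces, punctuation or uppercase, the two counts will differ. `count_spellable` is a hypothetical helper, and the alphabet must not contain characters that are special inside a bracket expression (such as ] or a leading ^).

```shell
# Count the lines in file $2 that consist solely of the characters in $1,
# i.e. words that could appear literally in a string over that alphabet.
count_spellable() {
    grep -Ecx "[$1]+" "$2"
}

# Usage, assuming badwords.txt was downloaded as in the command above:
#   count_spellable '0123456789abcdfghijklmnpqrsvwxyz' badwords.txt
```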
4 Likes

Tezos also does this with protocol changes. For instance the codename is carthage and the protocol hash is “PtCarthavAMoXqbjBPVgDCRd5LgT7qqKWUPXnYii3xCaHRBMfHH” [1]

1 Like

Probably the best approach would be to remove all the vowels aeiouy and add some punctuation to get back to 32 characters. Most languages don’t have words without vowels. - and _ should be allowed in all filesystems.

Of course then you can still have a file hash like grr-_-grr-_-grr :wink:

fsckn-btrfs !

I guess this would be a partially incompatible change anyway, so we might as well move to a multi-level directory structure, and then we can even afford a slightly larger length while we are at it.

At the same time, l33tsp34k teaches us o=0 i=1 e=3 a=4 t=7, so the majority of vowels and the currently excluded t are back anyway.

(I guess that, for the purposes of the thesis, a literal swear word showing up at an official demonstration was a small risk, but an awkward one if it happened. With l33tsp34k cursing, everyone involved can always plausibly pretend not to notice that the hash is readable.)

1 Like

Curses, foiled again :wink: c4f3b4b3_b00b13s_d34dc4t_etc is indeed valid under that scheme.
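
To see what a digit-heavy hash "says", the l33tsp34k mapping quoted above can be undone with tr. A minimal sketch; `deleet` is a made-up name and only the five substitutions mentioned in this thread are handled.

```shell
# Map the l33t digits back to letters: 0=o 1=i 3=e 4=a 7=t.
deleet() {
    printf '%s\n' "$1" | tr '01347' 'oieat'
}

deleet c4f3b4b3   # cafebabe
deleet d34dc4t    # deadcat
```

Piping de-l33ted hashes through a word-list grep would then catch these too.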

To go on a tangent: why would you want a multi-level directory structure? Modern filesystems can handle millions of files in a single directory just fine, since they use a tree structure internally. By using a multi-level directory structure you’re actually making that tree structure less efficient.

1 Like

This would probably be true if Nix didn’t insist on a 0o555 store.

From time to time some program decides it might as well readdir() the store, and maybe stat() each result too, and it is a bit annoying to keep track of what not to do to avoid triggering such behaviour.

(One example is Zsh Tab completion that has a very convenient optional feature that also happens to readdir() all directories along the path being completed)

1 Like

:thinking: actually, I think that’s a good thing. Regular services shouldn’t try to readdir() /nix/store, and if an interactive shell hangs because it’s listing a huge directory, that’s a bug that also slows down that shell in large directories…

Large directories are completely avoidable. There are many convenient things that require enumerating some directory (not just the Nix store); Tab-completing a Nix store path by the first 2 to 4 characters of its hash is sometimes convenient; and it is supposed to be possible to readdir() /nix/store, as evidenced by Nix explicitly asserting that the store has permissions that allow enumeration.

1 Like

Hmmm still not convinced…

  • On my servers and laptop I seem to have several thousand files, which tab-complete quickly
  • On hydra storage, I would assume that if you’re looking for a certain hash it’s a quick copy and paste away
  • If you want to quickly access one of several perl packages, there could be an additional directory with a directory for each name in the store, filled with symlinks to the actual packages. So you’d visit /nix/links/perl/<tab><tab> to see all the ones you want; the symlinks could even have embedded data like the install date in their name

OTOH, if we switch to a multi-layer store, all packages need rebuilding, and there will be lots of almost empty directories. On one of my servers, looking at the distribution over the first 2 letters by running

(cd /nix/store; ls | cut -c1-2) | sort | uniq -c | sort -n

gives me 998 buckets, with the biggest having 12 items, and 76 directories with a single item.
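
For comparison across machines, the same summary can be computed in one go. A sketch, with `bucket_stats` as a made-up helper name. Note that with a 32-character alphabet there are at most 32^2 = 1024 two-character buckets, so 998 means nearly all of them are in use.

```shell
# Read one store entry name per line on stdin and summarise how the names
# spread over buckets keyed by their first two characters.
bucket_stats() {
    cut -c1-2 | sort | uniq -c |
        awk '{ n++; if ($1 > max) max = $1; if ($1 == 1) single++ }
             END { printf "buckets=%d largest=%d singletons=%d\n", n, max, single }'
}

# Usage: ls /nix/store | bucket_stats
printf 'aa1-foo\naa2-bar\nab1-baz\nzz9-qux\n' | bucket_stats
```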

Your nix store auto completes fast? Which shell do you use?