Nix sha256 is a bug, not a feature. solution: a global /cas filesystem

problem: source files and build files are stored in the same CAS format


… which is a LOSSY transformation,
since /nix/store is messing with file permissions

solution: store source files in their original format,
to get a LOSSLESS local copy of the source files



nix should “do one thing and do it well”:
manage build instructions and build artifacts (“makefiles and buildfiles”)

cas = content-addressable storage (wikipedia)

/cas/git = content is in “git” format
/cas/nix = content is in “nix” format

existing solutions?

simple question: do we already have an implementation of a global /cas prefix for the filesystem hierarchy standard (FHS)? aka a “meta content-addressable storage, providing one interface to many CAS backends”

we have gitfs, but it does not have the interface that i would expect

but even if this “thing” does not yet exist, it “should” be easy to build, since all the parts exist already, and it’s just a matter of “connecting the dots”. this could be called “nix light”, since we would use the same nix build system, and just get rid of the sha256 pinning “non-feature”. instead, we would pin source files ONLY by their native cas hash (git commit, sha256 of source tarball, …)

relevant xkcd comic:

we have 14 standards?? ridiculous!

we should create one universal standard, that covers EVERYone’s use cases.


soon: there are 15 competing standards.

concept for a global /cas filesystem

hh/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh = 40 char git COMMIT hash (not tree hash)

the first 2 chars are used as the directory name, to make directory listings smaller.
git uses the same layout for its own object store.

since hashes have high entropy (randomness),
this is a good way to partition many hashes into smaller groups of hashes.

tttttttt-tttttt = human-readable time of commit in UTC timezone, for example 20211031-084210

space and time

so we have location (hash) and time,
which are the two universal properties of any object.

pname = optional package name

1.2.3 = optional package version (“stable release? what is that? just shut up and give me the git HEAD!”)
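the path layout above can be sketched as a small path builder. this is a hypothetical illustration, not part of any existing tool; `cas_path` and all names are made up:

```python
def cas_path(backend, full_hash, time=None, pname=None, version=None):
    """Build a /cas path from the components described above."""
    # the first 2 chars of the hash become a subdirectory;
    # time, pname and version are optional human-readable metadata
    parts = [full_hash[2:]]
    for extra in (time, pname, version):
        if extra:
            parts.append(extra)
    return "/cas/%s/%s/%s" % (backend, full_hash[:2], "-".join(parts))

print(cas_path("git", "a" * 40,
               time="20211031-084210", pname="hello", version="1.2.3"))
```

the human-readable suffix makes the entry verifiable “in plain sight”, while the 2-char prefix keeps each directory listing small.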

one interface, many backends

the global /cas filesystem gives access
to many different content addressable stores

for example


npm packages are pinned by their hash, see package-lock.json.
generally, anything that is “pinned by hash” can be integrated into this system
(“we are the borg … we incorporate all your stuff”)

all content must be explicitly added to the /cas store
so it is not possible to load unknown content only by its hash

when storing content, we must add at least one “remote”
from where the content can be fetched.
we can add multiple remotes, to use mirrors or p2p networks
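the “at least one remote” rule could look like this. a minimal sketch, assuming an in-memory dict stands in for the real /cas store; `cas_add` is a made-up name:

```python
def cas_add(store, content_hash, remotes):
    """Register content in the store; refuse entries without a remote."""
    if not remotes:
        raise ValueError("refusing to store content without a remote")
    entry = store.setdefault(content_hash, {"remotes": []})
    # multiple remotes act as mirrors (or p2p peers) for later fetching
    for url in remotes:
        if url not in entry["remotes"]:
            entry["remotes"].append(url)
    return entry

store = {}
cas_add(store, "a" * 40, ["https://github.com/example/repo.git"])
cas_add(store, "a" * 40, ["https://mirror.example.org/repo.git"])
```

because every entry is added explicitly, the store never has to resolve an unknown hash out of thin air.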

what exactly is the problem with sha256?

the problem is the “reinvention of the wheel”.

most source files are content-addressable already.
by forcing these files into the “nix” format,
we create overhead by introducing additional hashes
(which probably give the impression of “more security”,
since we all “know” that “sha1 is unsafe” …)
in short: the extra hashes exist to avoid collisions.

but there are much cheaper solutions
for the problem of “collision avoidance”:
simply add more metadata!
especially human-readable metadata,
which is easy to verify “in plain sight”.

every source file has a human-readable name and time.
so let’s just use these “natural options”
to make our hashes more collision-safe.


if you say “now that megacorp has quantum-crypto,
megacorp can produce fake sha1 hashes and hack my system”
then you probably ignore the fact that many of your tools
have an “oh-so-convenient” autoupdate feature,
which in the old days we called computer-virus,
and you probably ignore the fact that your hardware,
especially “your” processor and network controller,
are closed-source machines with backdoors,
which already give megacorp full access to your digital privacy.

note: autoupdate was aggressively normalized
by such “trustworthy” players as microsoft and google …
so now, projects like the “brave” browser
(users must be “brave” to trust that piece of software)
can easily get away with their “autoupdate by default” dogma,
acting as if “there is a new zeroday exploit in SSL every day,
so we must update every day to be on the safe side …”

south park episode S17E02: Informative Murder Porn

Randy: our content is being blocked and we need it now!

Cable Guy: I’m sorry sir. If you need it now,
perhaps you should switch to another cable company.
[tauntingly] Ohhh there’s not another cable company, is there?
[begins to rub his nipples in circles]
Ohhh, that’s right, we’re the only one in town.

sha256 is overhead

we create overhead by introducing additional hashes, in this case, the infamous sha256 in nix files

in reality, this is just a useless pain in the ass.

consider how we update packages in nix:
we must change both the commit hash AND the sha256 of the source.
why? cos as we all “know”:
“sha1 is unsafe”, “sha1 is unsafe”, “sha1 is unsafe” …

my point is:
sha256 is just another version of “security by obscurity”.

real security would require AUDITING of source code, aka “peer review” in science.
but this is the same problem as with any “fine print”
(terms of use, end user license agreement, manmade laws in the legal system, …)
who the fuck actually reads all this crap?

most of this stuff was specifically designed to be unreadable (to hide backdoors),
and even if you can “read”, there will always be someone,
who will have a different interpretation of the same text
(keywords: class justice, unwritten laws)
in the domain of IT, closed source hardware represents the unwritten laws.

south park episode S15E01: HumancentiPad

Kyle is kidnapped after agreeing to an iTunes user agreement,
and forced to become part of a “revolutionary new product”.

This episode parodies reports
about tracking software built into Apple’s iPads and iPhones,
and also the tediously long end-user license agreements.

south park episode S13E03: Margaritaville

“Margaritaville” reflected Parker and Stone’s belief
that most Americans view the economy in the same way as religion,
in that it is seldom understood [obscurity]
but seen as an important, elusive entity.

/nix/store is lossy


source files stored in /nix/store have different permissions
than the original source files,
so storing source files in /nix/store is a LOSSY transformation,
so it is not trivial to calculate the git TREE hash from the stored source files


preserve the original file permissions,
AND also store the raw commit object,
to get a LOSSLESS copy of the source files,
which later can be re-used (deduplication, sharing)

challenge: the raw commit object is NOT available in the github API.
the github API is lossy at this point,
cos the TIMEZONE of the commit time is missing!
potential workaround: use github’s graphQL API to get the timezone.
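to illustrate why the commit metadata (including the timezone) matters: a git commit hash is the sha1 of the raw commit object, which embeds the tree hash plus the author/committer lines. a simplified sketch (real commits may carry extra headers like gpgsig or encoding):

```python
import hashlib

def git_commit_hash(tree, author, committer, message, parents=()):
    """Compute a git commit hash from tree hash + commit metadata."""
    # layout of a raw commit object: header lines, blank line, message
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode()
    # git hashes "commit <size>\0" + body
    return hashlib.sha1(
        b"commit " + str(len(body)).encode() + b"\0" + body
    ).hexdigest()

tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's well-known empty tree
a = git_commit_hash(tree, "Jane <j@example.com> 1635669730 +0000",
                          "Jane <j@example.com> 1635669730 +0000", "msg\n")
b = git_commit_hash(tree, "Jane <j@example.com> 1635669730 +0200",
                          "Jane <j@example.com> 1635669730 +0200", "msg\n")
```

same tree, same absolute time, different timezone: a different commit hash. this is exactly why a lossy API (missing timezone) breaks validation.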

read only

to make the store “read only”,
we can use a virtual filesystem (FUSE)
to provide the files with their original metadata (permissions, attributes),
but we simply block ALL write operations to the filesystem.
(could be solved cheaper with a read-only bind-mount)

  1. There are ongoing efforts to make the Nix store content-addressed.
  2. Not all sources are addressable by a hash (like, e.g., Git or SVN), and simply using the sha256 of the tree is a useful abstraction.

Nice blog post. Unsubscribe.


I very much do like the idea of trusting identities provided by tools like git, and namespacing like this could easily solve issues such as:

  1. Discoverability by end-user since it provides more information
  2. Secrets persistence
  3. Hash collision

But I got several concerns:
How does one generate “the first 2 chars are used as directory name, to make directory-listings smaller”? You have not noted that part as being content-addressable. Why does one even require such an additional namespace?

How does one solve issues like a git commit reference resolving to two different results? For example, one with the whole history, another with --depth 1. What if git sources are fetched as a tarball - do we identify them by commit reference anyway?

Then we got the whole thing on “is it necessary to preserve permissions”? Sure, I had a couple of issues with it, nothing major. Might be a good idea to preserve permissions if there was a better solution than a fuse mount.

And what is the difference between /cas/{sha256,docker,oci} - as far as I know all of these are being addressed as sha256 hashes?

Also, may I note that the main issue being solved here is how we fetch sources, and there is a better idea suggested by @Ericson2314 - Use since it already provides normalized content-addressable identifiers out of the box


yepp, fuse can be unstable : /
probably too unstable, so the core /cas filesystem would be just regular files,
with deduplication via hardlinks.
extra features could be implemented with a fuse-mount overlay
(problem with regular files: storing “a million small files” is a waste of inodes)

yepp: Content-addressed Nix − call for testers

the opposite of CAS is LAS (location-addressed storage).
for LAS, the additional sha256 is required to “pin the source”

but for CAS sources,
the additional sha256 does not give better security,
only more work for maintainers.
collision avoidance (“sha1 is unsafe”) is achieved by adding pname,
assuming that “local” collisions inside one pname have probability zero
(“what if pnames change? what if pnames collide?” - hmm …)

nope, how we STORE sources in the local filesystem - lossy or lossless

nice! yes, this is useful to fetch the sources

that’s just an implementation detail …
/cas could be a virtual filesystem (fuse mount)
so we can implement “variable prefix listing” such as

ls /cas/git/abcd/

to list all known hashes with the prefix abcd
if we know the full hash, we can just say

ls /cas/git/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh/

or

ls /cas/git/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh-pname-1.2.3/

to additionally verify the pname and version of the source
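the prefix listing could be resolved like git resolves abbreviated hashes. a sketch over an in-memory stand-in for /cas/git; `ls_prefix` and the sample entries are made up:

```python
def ls_prefix(entries, prefix):
    """List store entries whose name starts with the given hash prefix."""
    return sorted(e for e in entries if e.startswith(prefix))

# hypothetical entry names: <40-char hash>-<pname>-<version>
entries = {
    "abcd1111" + "0" * 32 + "-hello-1.2.3",
    "abcd2222" + "0" * 32 + "-world-0.1.0",
    "ffff0000" + "0" * 32 + "-other-2.0.0",
}

print(ls_prefix(entries, "abcd"))  # two matches for the short prefix
```

a real fuse mount would do the same lookup inside its readdir callback.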

it could be useful to group git hashes by type, for example


… since these hashes have different algorithms

the /cas/git filesystem (concept) can deduplicate git objects,
so the shallow-clone version (--depth 1) is part of the deep-clone version

practically, the tarball is fetched by the commit hash.
but this is a lossy operation:
from the source files, we can only compute the tree hash.
to compute the commit hash, we need the tree hash + commit metadata.
so, to validate the source files, we must fetch the commit metadata,
for example from the github API (or gitlab, gitea, cgit, …).
using only the tree hash is not practical (time is only stored in the commit, etc)

different hash algorithms. oci is pure sha256, so oci would be just an alias of sha256.
docker calls sha256 on the contents of the tar.gz files (command tarsum),
which is much slower than just hashing the tar.gz files
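the two strategies can be contrasted in a few lines. this is not docker’s actual tarsum algorithm, just a simplified illustration of “hash the archive bytes” vs “hash the member contents”:

```python
import hashlib
import io
import tarfile

def hash_archive(data):
    """Hash the tar.gz bytes directly (cheap, one pass over the file)."""
    return hashlib.sha256(data).hexdigest()

def hash_members(data):
    """Hash the decompressed member contents (tarsum-like, slower)."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        for member in tf:
            if member.isfile():
                h.update(tf.extractfile(member).read())
    return h.hexdigest()

# build a tiny tar.gz in memory for demonstration
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    payload = b"hello world\n"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
data = buf.getvalue()
```

the two functions produce different hashes for the same archive, so the namespace matters even when both backends use sha256.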

nixpkgs almost always fetches tarballs. Fetching Git is way too slow, and implementing and maintaining a gazillion backends for every content-addressed source delivery doesn’t scale. Also, mirroring tarballs is much easier than mirroring various VCS, which usually need a service running. I don’t want to dismiss these great ideas, but unless you can show an actual working implementation, this just amounts to useless bikeshedding.


So I personally don’t have much interest in preserving file permissions since I do not see much value in that.

As for CAS - I like the concept and there is already quite a bit of work being done on that, but there is a distinction between CAS and an XYZ-provided content identifier, which is mentioned here. The latter one requires moving our fetcher implementations into nix itself (which I am all for, but that topic is for a different blogpost :wink: )

The thing that I very much do like is the store namespaces (like the mentioned git, nix, sha256, oci, docker, etc). This by design reduces the chance of hash collisions and could be used on a per-derivation-builder basis (if a drv results in git sources /nix/store/git, if it results in a python package /nix/store/python, if built by trivial-builders.nix /nix/store/trivial, etc).

And on top of such namespaces, one could quite easily implement secret management from within the store. Yes, a feature that Nix has been lacking since 2003

Or at least these are my two cents
