yepp, fuse can be unstable :/
probably too unstable, so the core /cas filesystem would be just regular files, with deduplication via hardlinks.
extra features could be implemented with a fuse-mount overlay.
(problem with regular files: storing “a million small files” is a waste of inodes)
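a rough sketch of the "regular files + hardlinks" idea. the `sha256/<digest>` layout and the names `store_file` and `demo` are made up for illustration, this is not how any existing store does it:

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    """Return the hex sha256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def store_file(cas_root: str, src_path: str) -> str:
    """Make src_path share one inode with cas_root/sha256/<digest> (hypothetical layout)."""
    digest = sha256_file(src_path)
    dest = os.path.join(cas_root, "sha256", digest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        # first copy of this content: the store entry is a hardlink to it
        os.link(src_path, dest)
    elif not os.path.samefile(src_path, dest):
        # duplicate content: swap the file for a hardlink to the stored copy
        tmp = src_path + ".cas-tmp"
        os.link(dest, tmp)
        os.replace(tmp, src_path)
    return digest


def demo() -> bool:
    """Store two identical files and report whether they now share an inode."""
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"same content\n")
        cas = os.path.join(d, "cas")
        store_file(cas, a)
        store_file(cas, b)
        return os.path.samefile(a, b)
```

hardlinks only work inside one filesystem, and every distinct file still costs one inode, so this does not help with the "million small files" case.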
yepp: Content-addressed Nix − call for testers
the opposite of CAS is LAS (location-addressed storage).
for LAS, the additional sha256 is required to “pin the source”.
but for CAS sources, the additional sha256 does not give better security, only more work for maintainers.
collision avoidance (“sha1 is unsafe”) is achieved by adding pname, assuming that “local” collisions inside one pname are practically impossible
(“what if pnames change? what if pnames collide?” - hmm …)
nope, how we STORE sources in the local filesystem - lossy or lossless
nice! yes, this is useful to fetch the sources
that's just an implementation detail …
/cas could be a virtual filesystem (fuse mount), so we can implement “variable prefix listing” such as
ls /cas/git/abcd/
to list all known hashes with the prefix abcd
if we know the full hash, we can just say
ls /cas/git/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh/
or
ls /cas/git/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh-pname-1.2.3/
to additionally verify the pname and version of the source
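the lookup rules above, sketched over a plain list of entry names instead of a real fuse mount. the `<hash>-<pname>-<version>` naming and the function `lookup` are assumptions for illustration:

```python
def lookup(entries: list[str], query: str) -> list[str]:
    """Resolve a query against entries named '<hash>-<pname>-<version>'.

    A bare hex query is treated as a hash prefix.
    A query containing '-' must match hash, pname and version exactly.
    """
    if "-" in query:
        # full name given: this also verifies pname and version
        return [e for e in entries if e == query]
    # hex hashes contain no '-', so everything before the first '-' is the hash
    return sorted(e for e in entries if e.split("-", 1)[0].startswith(query))


# hypothetical store contents
ENTRIES = [
    "abcd1111-foo-1.2.3",
    "abcd2222-bar-0.1",
    "ffff0000-foo-1.2.4",
]
```

with these entries, `lookup(ENTRIES, "abcd")` returns both `abcd…` names, while `lookup(ENTRIES, "abcd1111-foo-9.9")` returns an empty list because the version does not match.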
it could be useful to group git hashes by type, for example
/cas/git/tree/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh/
/cas/git/blob/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
/cas/git/commit/hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
… since the object type is part of what git hashes (same algorithm, but a different header per type)
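for reference, this is how a sha1 git object id is computed: the type and the size go into a header in front of the body. the function name `git_object_id` is mine, and sha256 repos are ignored here:

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the sha1 object id git assigns to a raw object body."""
    # header is "<type> <size in bytes>\0", followed by the body
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# the same (empty) body gets a different id per object type
EMPTY_BLOB = git_object_id("blob", b"")
EMPTY_TREE = git_object_id("tree", b"")
```

`EMPTY_BLOB` is the well-known `e69de29b…` and `EMPTY_TREE` is `4b825dc6…`, so identical bytes stored as blob and as tree never share an id.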
the /cas/git filesystem (concept) can deduplicate git objects,
so the shallow-clone version (--depth 1) is part of the deep-clone version
practically, the tarball is fetched by the commit hash.
but this is a lossy operation:
from the source files, we can only compute the tree hash.
to compute the commit hash, we need the tree hash + commit metadata.
so, to validate the source files, we must fetch the commit metadata,
for example from the github API (or gitlab, gitea, cgit, …).
using only the tree hash is not practical (time is only stored in the commit, etc)
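a minimal sketch of "tree hash + commit metadata = commit hash", for sha1 repos. it only handles the plain tree/parent/author/committer headers (no gpgsig, no encoding header), and the function names are made up:

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """sha1 over '<type> <size>\\0' + body, as git does for loose objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list[str], author: str,
              committer: str, message: str) -> str:
    """Rebuild a simple commit object and return its id.

    author/committer look like 'Name <mail> <unix time> <tz offset>';
    message should end with a newline, as git stores it.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", "\n".join(lines).encode())


# hypothetical metadata, e.g. as returned by a forge API
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the empty tree
WHO_T0 = "A U Thor <author@example.org> 0 +0000"
WHO_T1 = "A U Thor <author@example.org> 1 +0000"
C0 = commit_id(TREE, [], WHO_T0, WHO_T0, "init\n")
C1 = commit_id(TREE, [], WHO_T1, WHO_T1, "init\n")
```

`C0` and `C1` share the same tree but differ, because the timestamp only lives in the commit. so validation would be: compute the tree hash from the unpacked files, plug it into the fetched metadata, and compare the result with the pinned commit hash.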
different hash algorithms. oci is pure sha256, so oci would be just an alias of sha256.
docker calls sha256 on the contents of the tar.gz files (command tarsum),
which is much slower than just hashing the tar.gz files
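to show the difference between the two approaches: hashing the archive bytes is one pass over the file, while hashing the contents means decompressing and walking the tar. `content_hash` below is a simplified stand-in that only covers names and file bodies, it is NOT the real tarsum algorithm, and all names are made up:

```python
import hashlib
import io
import tarfile


def make_targz(files: dict[str, bytes], mtime: int) -> bytes:
    """Build a small tar.gz in memory, stamping every member with mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def archive_hash(blob: bytes) -> str:
    """Cheap: sha256 over the compressed archive bytes as-is."""
    return hashlib.sha256(blob).hexdigest()


def content_hash(blob: bytes) -> str:
    """Expensive: decompress, then sha256 over member names and file bodies."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            if member.isfile():
                h.update(member.name.encode() + b"\0")
                h.update(tar.extractfile(member).read())
    return h.hexdigest()


FILES = {"a.txt": b"hello\n", "b.txt": b"world\n"}
OLD = make_targz(FILES, mtime=0)
NEW = make_targz(FILES, mtime=1)
```

`OLD` and `NEW` carry the same files but different timestamps, so `archive_hash` differs while `content_hash` stays the same. that stability is what the extra work buys.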