I’m working on packaging some proprietary code that ships as a 90GB .tar.gz file. I’d like to avoid copying it to the Nix store if at all possible, as I could end up with several copies on disk. Is it possible to reference a file outside the store (providing its hash so it’s known to be “the file” and hasn’t changed)? Obviously it’ll need to be decompressed for the package to build and install it, but I’m hoping to avoid having both the compressed and uncompressed data in the store.
You need at least one copy of whatever artifacts you intend to deploy in the Nix store; no way around that.
Do you really need to deploy all 90GB of it though or perhaps only a subset of that?
If it’s just a subset, you should delete the unnecessary parts inside your FOD (fixed-output derivation).
Yes, but that file must be present in the Nix store to build any directly dependent derivation.
The exception is if all directly dependent derivations are already realised and don’t contain references to that file.
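For reference, one way to refer to a manually supplied file by hash is nixpkgs’ requireFile. This is only a sketch: the file name is hypothetical and lib.fakeSha256 is a placeholder to be replaced with the tarball’s real hash. Note that it does not avoid the store copy; the tarball still has to be added to the store once by hand before anything that depends on it can build.

```nix
{ requireFile, lib }:

# Fixed-output reference to a file that the user adds to the store manually.
requireFile {
  # Hypothetical file name; use the real tarball's name.
  name = "vendor-release.tar.gz";
  # Placeholder; replace with the actual sha256 of the tarball.
  sha256 = lib.fakeSha256;
  # Shown to the user when the file is not yet in the store.
  message = ''
    The vendor tarball cannot be downloaded automatically.
    Add it to the Nix store yourself, for example with:
      nix-store --add-fixed sha256 vendor-release.tar.gz
  '';
}
```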
Then uncompress and install in one step. That only works if all steps required to do so are bit-reproducible, though. Decompression is typically very reproducible, however (modulo NAR constraints, i.e. only what the NAR format can represent ends up in the hash).
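As a rough sketch of the “one step” idea, assuming the tarball can be fetched from a URL: fetchzip downloads and unpacks inside a single fixed-output derivation, so only the unpacked tree is stored, and postFetch can delete what you don’t need. The URL and the directory names docs and samples are made up for illustration, and lib.fakeHash is a placeholder for the real hash of the unpacked result.

```nix
{ fetchzip, lib }:

fetchzip {
  name = "vendor-unpacked";
  # Hypothetical URL; the compressed tarball itself is not kept in the store.
  url = "https://example.invalid/vendor-release.tar.gz";
  # Placeholder; the hash covers the unpacked tree after postFetch has run.
  hash = lib.fakeHash;
  # Runs after unpacking: drop the parts that are not needed.
  # "docs" and "samples" are hypothetical directory names.
  postFetch = ''
    rm -rf "$out/docs" "$out/samples"
  '';
}
```

By default fetchzip expects a single top-level directory in the archive and strips it (stripRoot = true); set stripRoot = false if the archive is laid out differently.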
Also consider that you can always just reference files via symlinks instead of copying them. For example, in the proton-ge-bin derivation, we just download and unpack the rather large prebuilt tarball using fetchzip and the actual derivation merely symlinks its contents into the correct place and installs a few additional files.
Maybe a good question is whether you need to modify any large files after unpacking. If not, then it does look like you need fetchzip for fetching + unpacking (it does support tarballs too), then symlinking for most of the files and copying the files that need modification.
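A minimal sketch of the symlink-plus-copy approach, not the actual proton-ge-bin expression: the package name, URL and config.ini are hypothetical, and lib.fakeHash is a placeholder. It assumes the unpacked tree has a config.ini at its top level that needs modification.

```nix
{ stdenvNoCC, fetchzip, lib }:

let
  # Unpacked vendor tree; URL is hypothetical, hash is a placeholder.
  vendorSrc = fetchzip {
    url = "https://example.invalid/vendor-release.tar.gz";
    hash = lib.fakeHash;
  };
in
stdenvNoCC.mkDerivation {
  pname = "vendor-tool";
  version = "0";

  # Nothing to unpack; the install phase only links into vendorSrc.
  dontUnpack = true;

  installPhase = ''
    runHook preInstall
    mkdir -p "$out"

    # Symlink every top-level entry instead of copying it.
    for entry in ${vendorSrc}/*; do
      ln -s "$entry" "$out/$(basename "$entry")"
    done

    # Replace the symlink for the one file we modify with a writable copy.
    rm "$out/config.ini"
    cp ${vendorSrc}/config.ini "$out/config.ini"
    chmod u+w "$out/config.ini"

    runHook postInstall
  '';
}
```

The output then contains only symlinks plus the copied file, so the large data exists once on disk, at the cost of a runtime dependency on the vendorSrc store path.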
Though if you’re modifying some files, and all of the files are coming from the same source, it’s probably better to copy all of them, so that the resulting store path doesn’t also depend on the source store path (which will still contain the unmodified versions of the files you modify) at run time.