Symlinking vs copying into $out/

I want to include some input files in my build output. I know of two options.

  • ln -s input $out/
  • cp input $out/

What are the implications of using each of these?

  • Will the linked ones use less storage?
  • Would linked ones be garbage collected when copied ones wouldn’t?
  • What else?

What is input? Is it another store path? If not, definitely copy because otherwise your output is impure.

If it is a store path, linking will save space that would have otherwise been used in the copy (unless your filesystem does automatic deduplication), but it also means the input store path is now referenced by yours. So if you symlink a single file from e.g. a 1GB build output, you’re now pinning that entire 1GB build output such that it can’t be garbage-collected.
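
A minimal sketch of that difference, using temporary directories as hypothetical stand-ins for store paths (the `aaaa-input` style names are made up, and nothing here talks to Nix itself). As far as I know, Nix finds references by scanning the output for store path hashes, and a symlink's target string is part of what gets scanned, which is why the symlinked variant keeps the input alive:

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for store paths; real ones live under /nix/store.
store=$(mktemp -d)
input="$store/aaaa-input"
out_link="$store/bbbb-out-symlink"
out_copy="$store/cccc-out-copy"
mkdir -p "$input" "$out_link" "$out_copy"
echo "hello" > "$input/data.txt"

ln -s "$input/data.txt" "$out_link/"   # output holds only a pointer to the input
cp "$input/data.txt" "$out_copy/"      # output holds its own copy of the bytes

# The symlink target embeds the input's full path.
link_target=$(readlink "$out_link/data.txt")
echo "symlink points at: $link_target"

# The copy is a regular file that never mentions the input path.
if [ -L "$out_copy/data.txt" ]; then copy_kind=symlink; else copy_kind=regular; fi
echo "copy is a $copy_kind file"

# Removing the input (roughly what garbage collection would do if it were
# allowed to) breaks only the symlinked output.
rm -r "$input"
if [ -e "$out_link/data.txt" ]; then link_ok=yes; else link_ok=no; fi
if [ -e "$out_copy/data.txt" ]; then copy_ok=yes; else copy_ok=no; fi
echo "after removing input: symlink readable=$link_ok, copy readable=$copy_ok"
```

In a real store the garbage collector will not delete the input while the symlinked output exists, so the practical effect is the pinning described above rather than a dangling link.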

Also, Nix has an option to detect identical files and hardlink them to save space (the auto-optimise-store setting, or a one-off nix-store --optimise run). It’s not done by default, presumably because it’s slower (the initial pass definitely takes a while, and I don’t know offhand whether it can work incrementally), but it can be handy, especially if you’re running out of inodes.

You could also just try hardlinking the file yourself. I don’t generally see this done in derivations but I’m not really sure why not. My only real thought is if there’s a chance the filesystem doesn’t support hardlinks, but I don’t know if that’s likely to come up in practice. It looks like cp has a flag -l to hardlink files automatically so if disk space is a concern with your input files then using cp -l might be the best approach.
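
A quick way to check what `cp -l` actually does, as a sketch that assumes GNU coreutils `cp` and a scratch directory on a single filesystem (the directory layout is made up for the example):

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir "$work/input" "$work/out"
echo "payload" > "$work/input/data.txt"

# GNU cp: -l creates a hard link instead of copying the bytes.
cp -l "$work/input/data.txt" "$work/out/data.txt"

# ls -i prints the inode number first; equal inodes mean one file on disk.
inode_in=$(ls -i "$work/input/data.txt" | awk '{print $1}')
inode_out=$(ls -i "$work/out/data.txt" | awk '{print $1}')
echo "input inode: $inode_in, output inode: $inode_out"

# Unlike a symlink, removing the original name leaves the data reachable.
rm "$work/input/data.txt"
content=$(cat "$work/out/data.txt")
echo "output still reads: $content"
```

Because the hard link carries no path to the input, it saves space without making the output refer to the input the way a symlink does.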

My recollection is that the last time I tried hardlinking directly, it failed in the sandbox, because Nix’s sandbox relies on bind-mounting each store path, and one can’t hardlink across two different mounts (even if underneath they refer to the same filesystem).

Now, maybe things have changed in the past few years? :)
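
If you want to try hardlinking anyway, one defensive pattern is to fall back to a plain copy when the link fails. This is only a sketch: `link_or_copy` is a hypothetical helper, not something Nix or nixpkgs provides, and the demo runs in a scratch directory rather than in a sandboxed build.

```shell
#!/bin/sh
set -eu

# link_or_copy SRC DST: hypothetical helper, not part of Nix or nixpkgs.
link_or_copy() {
  # ln fails (for example with a cross-device error) when SRC and DST sit
  # on different mounts; in that case fall back to copying the bytes.
  if ln "$1" "$2" 2>/dev/null; then
    echo linked
  else
    cp "$1" "$2"
    echo copied
  fi
}

work=$(mktemp -d)
echo "payload" > "$work/src.txt"
how=$(link_or_copy "$work/src.txt" "$work/dst.txt")
echo "result: $how"
```

Either way the destination ends up as a regular file with the same contents, so the output looks the same whether or not the link succeeded.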

The hardlinking optimization works within the local filesystem, but it doesn’t help when store paths are transferred between systems, for example via binary caches.

On the other hand, symlinks tend to trade less bandwidth for more network round trips, because retrieving a store path from a cache typically takes two HTTP requests. Nix can only proceed sequentially along the dependency depth (at least as long as it is designed to minimize the number of requests).

So, as usual, it depends :)

Oh interesting. I generally use Nix on macOS where it can’t use bind mounts like that so I hadn’t run into that, but that would explain why packages generally copy instead of hard-linking.