Using NixOS in an isolated environment

Hi! I will be working on a scientific base in the Kerguelen Islands for a full year starting next November. There, internet access is provided through a low-bandwidth VSAT connection and is reserved for professional activities, so I won’t be able to use it to install software for my personal projects.

I want to use this constraint as an opportunity to study the use of NixOS in an isolated environment. My goal is to be able to update my configuration, install new software and deploy NixOS machines without relying on internet access.

What I already know

So far, I know that setting up a local NixOS channel is as easy as getting https://nixos.org/channels/nixos-19.03/nixexprs.tar.xz. I can also copy the corresponding binary cache locally by running:

# mirror every store path of the channel from cache.nixos.org
# into a local file:// binary cache
curl -L https://nixos.org/channels/nixos-19.03/store-paths.xz \
    | xz -d \
    | xargs \
      nix copy --store https://cache.nixos.org/ \
               --to file:///path/to/nixos-19.03/
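
To then consume these local copies, I expect something like this would work (the paths are placeholders, and nix.binaryCaches is the NixOS option name as of 19.03):

nix-channel --add file:///path/to/channel nixos   # directory containing nixexprs.tar.xz
nix-channel --update

# and in configuration.nix, point substitution at the local mirror:
#   nix.binaryCaches = [ "file:///path/to/nixos-19.03/" ];

Since nix copy keeps the existing cache.nixos.org signatures, the default trusted public key should keep working offline.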

Now, that works great if I want to install something that is in the binary cache. And this is where my questions start.

What I want to know

  1. What exactly is included in the binary cache for a full channel, like nixos-19.03?

On this first question, I know that any variant of a derivation—one with different compile flags, for instance—is not included. However, its source is cached so I can build it myself, so that’s not an issue.

  2. For what is not included, are the sources cached?

    If a source is cached, I can build it in an isolated environment, so that’s great.

  3. How can I cache the full set of sources?

This would make it possible to install anything that is referenced in nixpkgs, which would be pretty neat. However, I don’t know whether it is possible, or how much space it would require.

  4. What about the different architectures?

How are the different architectures, like x86_64 vs aarch64, managed? I would like to be able to set up aarch64 machines too if I want.

Any insight on these subjects is much appreciated :slight_smile:

9 Likes

I would be really interested in how big that cache would be (without sources and with sources).

The sources are also just derivations, and as such they are also cached. One possible starting point could be to write a Nix function that walks over all derivations/attributes in the package set and extracts src.outPath from each of them. You can then create a list and feed that to your nix copy solution above. However, this might miss some fetch* statements that are not directly placed as src attributes.
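
A rough, untested sketch of such a function, only looking one level deep into the top-level set and only catching srcs that are themselves derivations (allowUnfree/allowBroken are enabled so tryEval has fewer throws to swallow):

nix-instantiate --eval --strict --json -E '
  let
    pkgs = import <nixpkgs> { config = { allowUnfree = true; allowBroken = true; }; };
    inherit (pkgs) lib;
    # for every top-level attribute, return its src store path, or null
    srcOf = name: value:
      let r = builtins.tryEval
        (if lib.isDerivation value && lib.isDerivation (value.src or null)
         then value.src.outPath
         else null);
      in if r.success then r.value else null;
  in builtins.filter (p: p != null) (lib.mapAttrsToList srcOf pkgs)
'

The resulting list could be fed straight into the nix copy invocation from the first post. Recursing into nested package sets (python3Packages and friends) is left out here.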

Another possibility may be to generate a list of the packages that nixpkgs contains and then, for each package, evaluate it (without building, read-only) with a modified fetchurl function that outputs the URL and hash of the source. You can then fetch these paths with a script.
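
For instance, with a hypothetical overlay that wraps fetchurl (forcing one package’s src here just to make the trace fire):

nix-instantiate --eval -E '
  let pkgs = import <nixpkgs> {
    overlays = [ (self: super: {
      # report the URLs and hash of every fetchurl call on stderr
      fetchurl = args: builtins.trace
        "${toString (args.urls or [ args.url ])} ${args.sha256 or "?"}"
        (super.fetchurl args);
    }) ];
  };
  in pkgs.hello.src.outPath
'

Combined with a walk over the package set, the traced URL/hash pairs could then be downloaded by a plain script, outside of Nix.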

I’ve finished simple nix copy runs for nixos-19.03-small and nixos-19.03; the sizes are below, showing it is not that big:

NAME                                    REFER  COMPRESS  RATIO
helios/test/cache                       1,11M       lz4  1.01x
helios/test/cache/nixos-19.03           71,0G       lz4  1.01x
helios/test/cache/nixos-19.03-small      572M       lz4  1.02x

However, I don’t know (yet) the proportion of what is not included.

@tilpner sent me some scripts on #nix:matrix.org, and notably this repository. I’ll dig into it; maybe there are some interesting things in there.

:open_mouth: 71 GB sounds pretty small, but those are all xz-compressed NAR files. Looking at the output paths, this seems to contain only final build products and no sources (which should be sufficient to install packages on a running system?).

I am actually in a similar situation (I want to set up a local cache for a location with a horrible internet connection). I wondered if I should mirror tarballs.nixos.org as well. What would be the preferred method for that?

1 Like

I think it would be good to have one indeed. The full tarballs.nixos.org must be really big, though: what we need is only the tarballs for a given version of nixpkgs. For this, we want to get all the external files referenced by all the derivations we want to cache, then download them.

I haven’t worked on it lately as I was on vacation, and now I still have a lot of other things to do. I am trying to advance on this subject in parallel with the other parts of my preparation.

The blocking points for me right now are:

  1. For a given derivation, how do we know its external dependencies?
  2. What is the set of derivations that we want to include? For instance, many packages in texlive.combined.scheme-full are not in the binary cache, but this is typically something that would be interesting to have in the tarballs. Same for beam.packages.erlangRxx.*: not everything is pre-compiled with every available Erlang version, but I’d like to have the sources so I can compile them offline.
  1. For a given derivation, how do we know its external dependencies?

I would do a nix-store -q -R to find all the dependencies, then do nix-store -q -b outputHash to check which ones are fixed-output. Hopefully Nixpkgs checkout + all fixed-output dependencies should be enough.
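
An untested sketch of that, for a single package (hello stands in for whatever you want buildable offline):

drv=$(nix-instantiate '<nixpkgs>' -A hello)
for dep in $(nix-store -q -R "$drv" | grep '\.drv$'); do
    # only fixed-output derivations carry an outputHash binding,
    # so the query fails (and we skip) for everything else
    if nix-store -q -b outputHash "$dep" > /dev/null 2>&1; then
        echo "$dep"
    fi
done

The paths that survive the filter, plus the Nixpkgs checkout, are what you would realise and copy to the local cache.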

  2. What is the set of derivations that we want to include? For instance, many packages in texlive.combined.scheme-full are not in the binary cache, but this is typically something that would be interesting to have in the tarballs. Same for beam.packages.erlangRxx.*: not everything is pre-compiled with every available Erlang version, but I’d like to have the sources so I can compile them offline.

I guess you would need to write a more aggressive version of release.nix… Basically, you tryEval the nixpkgs import (this should succeed) and find out all the attribute names in the package set. Then for every attribute you tryEval it and check whether it has an outPath (if yes: most likely one more derivation found) and also list all its subattributes to check later. Rinse, repeat. Of course, you’d better check that the values are attribute sets before recursing.
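
An untested sketch of that rinse-repeat loop, bounded to a fixed depth and only recursing into sets that opt in via recurseForDerivations (the same convention release.nix relies on):

nix-instantiate --eval --strict --json -E '
  let
    pkgs = import <nixpkgs> { config = { allowUnfree = true; allowBroken = true; }; };
    inherit (pkgs) lib;
    # collect the .drv path of every derivation reachable from a set
    walk = depth: set:
      lib.concatLists (lib.mapAttrsToList
        (name: value:
          let
            found =
              if lib.isDerivation value then [ value.drvPath ]
              else if depth > 0 && lib.isAttrs value
                      && (value.recurseForDerivations or false)
                then walk (depth - 1) value
              else [ ];
            # deepSeq so that failures surface here, inside the tryEval
            r = builtins.tryEval (builtins.deepSeq found found);
          in if r.success then r.value else [ ])
        set);
  in walk 2 pkgs
'

tryEval only catches throw and assert, so the odd hard evaluation error can still abort the whole run; a production version would need to be more defensive.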

Until one needs a new package, for example to set up a VPN to get access to the network (I guess that is exactly the case in the recent Rebuild NixOS Offline post, as I have hit this problem a few times as well).

Finding ALL possible fixed-output derivations (even those not included in release.nix) looks doable. Challenging, but doable.

And besides solving the problem with isolated environments, it would be useful to have a bot that checks whether fixed-output derivations still have a valid hash.
(I bet that fetchurl { url = http://download.processing.org/reference.zip; sha256 = ... } does not, as it is updated monthly. And there must be many others like this. It would be useful to have a list of problem spots, published on Hydra or via the r-ryantm bot.)
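
Nix can already do the checking part: realising a fixed-output derivation with --check re-downloads it even if it is already in the store, and fails when the upstream content no longer matches the pinned hash. Untested, with hello.src standing in for any fixed-output derivation:

drv=$(nix-instantiate '<nixpkgs>' -A hello.src)
nix-store --realise --check "$drv"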

Adding here to the idea to use tryEval over nixpkgs:

  1. use nix-instantiate to instantiate all derivations. This should give a list of all relevant derivations.
  2. iterate with nix-store -qR over all drv files to get all input/dependency paths (which includes all sources).
  3. Download all of the store paths from step 2 with nix-store --realise from the cache.

That should put everything in the store that is needed to rebuild all packages offline(?)
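
An untested sketch of the three steps for a single attribute (hello as a placeholder; the real run would loop over every derivation found by the tryEval walk):

drv=$(nix-instantiate '<nixpkgs>' -A hello)    # step 1: instantiate
deps=$(nix-store -q -R "$drv")                 # step 2: .drv closure, sources included
nix-store --realise $deps                      # step 3: substitute from the cache, or build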

1 Like

I’m told the sum total of derivations in the caches is 180 TB. It’s never been garbage collected. :smiley:
Don’t forget to have a checkout of the nixpkgs repository too!
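
The channel tarball already unpacks to a Nixpkgs tree, but a full clone also gives you the history to move between revisions offline:

git clone https://github.com/NixOS/nixpkgs.git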

  2. iterate with nix-store -qR over all drv files to get all input/dependency paths (which includes all sources).
  3. Download all of the store paths from step 2 with nix-store --realise from the cache.

Erm, you really want to select only the fixed-output derivations — there are quite a few long-build packages that are also not in the cache.

Actually, it is a good question whether one wants to include the sources of previously-working broken packages…

I am afraid a single nix-instantiate run would not be enough; some FODs may be excluded by conditionals like
if stdenv.isAarch64 then fetchpatch {}
or
if config.use_pulseaudio then fetchpatch {}

The task is closer to test coverage or fuzzing
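
One partial mitigation: repeat the whole evaluation per platform, since Nixpkgs accepts a system argument (a config sweep would work the same way, via --arg config):

for system in x86_64-linux aarch64-linux; do
    # re-instantiate per platform so system-conditional sources,
    # like the stdenv.isAarch64 patch above, are also seen
    nix-instantiate '<nixpkgs>' -A hello --argstr system "$system"
done

This would, incidentally, also cover the aarch64 question from the original post.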

1 Like

That would address one of the original questions: how do you get everything into your store so that you can rebuild a random package of your choice? For that I need all FODs, right?

OK, so even this one may still miss some FODs.

Sidenote: this could be useful in high-security contexts where you don’t ever want to connect a machine to the network?

2 Likes