Is my S3 cache setup sensible?

I want to set up a publicly available cache for R packages.

The whole R package set contains more than 20'000 packages, and packages get fixed slowly but surely. What I want to do is backport fixes and offer older versions of packages with the fixes applied. I have made pretty good progress, and have a set of core packages working on both x86_64-linux and aarch64-darwin going back to August 2021. I have these packages working for half a dozen dates per year.

Because compiling certain packages takes a lot of time, I set up an experimental, publicly available cache. I would like your opinion on my approach, because I want to avoid doing something that is wrong or less efficient than it could be.

Here is the workflow:

  • packages get built on GitHub Actions for Linux and Darwin
  • they get pushed to an S3 bucket (hosted on DigitalOcean)

The build part on GitHub Actions is performed like this:

I use secrets to set up my credentials:

      - name: Create AWS Credentials File
        run: |
          mkdir -p ~/.aws
          echo "[digitalocean]" > ~/.aws/credentials
          echo "aws_access_key_id=${{ secrets.AWS_ACCESS_KEY_ID }}" >> ~/.aws/credentials
          echo "aws_secret_access_key=${{ secrets.AWS_SECRET_ACCESS_KEY }}" >> ~/.aws/credentials

I also have the AWS config file hosted on GitHub and move it into place:

      - name: Move config file to right place
        run: |
          mv aws/config ~/.aws
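
For reference, a minimal `aws/config` for that profile might look like this; the region name is an assumption based on the fra1 endpoint used below:

    [profile digitalocean]
    region = fra1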

Using secrets again, I set up my private key to sign packages:

      - name: Create nix cache secret key file to sign packages
        run: |
          echo "${{ secrets.NIX_CACHE_SECRET_KEY }}" > cache-priv-key.pem

      - name: Sign all store paths
        run: nix sign-paths --all -k cache-priv-key.pem
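
To check that the signatures were actually attached, one can inspect a path's closure; a quick sketch using the `./result` symlink from the build:

    # List the signatures on the build output and everything in its closure.
    nix path-info --sigs -r ./result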

Finally, I push a package (`lolhello`):

      - name: Push lolhello
        run: |
          nix copy $(nix-store --query --requisites --include-outputs $(nix-store --query --deriver ./result)) \
            --to 's3://rstats-on-nix-cache?profile=digitalocean&scheme=https&endpoint=fra1.digitaloceanspaces.com' \
            --option narinfo-cache-positive-ttl 0

If you’re curious, you can find the repository here.

For now, I'm not yet pushing R packages, just experimenting with a single modification of the hello package as mentioned. Once I move to real packages, instead of pushing a single package I would need to push all of the store contents, along the lines of the sketch below.
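
A possible shape for that step, assuming the whole local store should be copied to the same bucket:

    # Sketch: copy every path in the local store to the bucket.
    nix copy --all \
      --to 's3://rstats-on-nix-cache?profile=digitalocean&scheme=https&endpoint=fra1.digitaloceanspaces.com'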

My questions are the following: is this approach sensible? Could it be made better? What priority would you suggest for the cache? cache.nixos.org is at 40 if I'm not mistaken; would my cache need a higher priority?

Also, in my tests, the cache is very slow. Is there some kind of option in nix-cache-info that I could set to make querying faster?
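
For context, the `nix-cache-info` file at the root of a binary cache only holds a few fields, along these lines (the Priority value here is just an example):

    StoreDir: /nix/store
    WantMassQuery: 1
    Priority: 30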

Is it slower than your bandwidth between your host and the caching machine?

What I meant is that downloading from the cache from another machine is very slow, and querying it also takes quite some time. Much slower than other caches I have used.

Try hosting a large-ish (~200 MB) file on the machine hosting your Nix cache and download it from your host computer (I prefer aria2). If the download speed is just as slow, the bandwidth between your cache and your host is to blame; contact your ISP.

If the download speed is faster, you can try bumping up the priority of your cache. Since the first-party binary cache (cache.nixos.org) is pre-configured by Nixpkgs, I override it like so:

    nix.settings.substituters = lib.mkForce [
      "s3://rstats-on-nix-cache?profile=digitalocean&scheme=https&endpoint=fra1.digitaloceanspaces.com&priority=10"
      "https://cache.nixos.org?priority=40"
    ];
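
For the signed paths to actually be accepted, clients also need to trust the cache's public key alongside the default one; a sketch, with a placeholder for the custom cache's key:

    nix.settings.trusted-public-keys = [
      # Placeholder: replace with the public key matching cache-priv-key.pem.
      "rstats-on-nix-cache-1:REPLACE_WITH_PUBLIC_KEY"
      "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
    ];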

Also, pkgs.nix-serve-ng has some performance improvements, so make sure you're using that over pkgs.nix-serve.
