Cache.nixos.org responds with 503

Right now, any command that has to connect to cache.nixos.org fails with:

warning: unable to download 'https://cache.nixos.org/nar/0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk.nar.xz': HTTP error 503; retrying in 320 ms
Following the link in a browser reveals the following:

Error 503 Response object too large

Response object too large

Guru Mediation:

Details: cache-muc13957-MUC 1645387388 1282033041
Varnish cache server

I ran into this while trying to update my machine to the latest unstable.
I do not know where else to report this; I hope this place is okay.
Can I do anything about it?

Can confirm the issue with that file. Did you notice the note on the cache.nixos.org homepage? There's a script you can run for diagnostics, and then you can file an issue in the same repo.

Thanks for linking me to those scripts.
I have opened a bug report in the repo.
https://github.com/NixOS/nixos-org-configurations/issues/207

What package or narinfo references that NAR?

From the logs it looks like it is z4r9j5ld8fx3ksgyb53hp7nwdxy3zjpd-cuda_10.2.89_440.33.01_linux
Full logs are here:

Yes, seems to be it:

$ curl cache.nixos.org/z4r9j5ld8fx3ksgyb53hp7nwdxy3zjpd.narinfo
StorePath: /nix/store/z4r9j5ld8fx3ksgyb53hp7nwdxy3zjpd-cuda_10.2.89_440.33.01_linux.run
URL: nar/0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk.nar.xz
Compression: xz
FileHash: sha256:0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk
FileSize: 2613830740
NarHash: sha256:0l4jm51iilswgifkibrqsh6nsj2z9y7qkv0lfzw8lgls55pasc7k
NarSize: 2645419504
References: 
Deriver: bpdv0hmi10b55absdcww3s3j1d8jvbjf-cuda_10.2.89_440.33.01_linux.run.drv
Sig: cache.nixos.org-1:CU00nI34U3TO8jJV28hbzYRJ6Eus7RTsjZzWoXkC0d7VZmAiJTWHRkCVU3EkjSLXqiuE+0XONZwl4QVOkDx5BA==
CA: fixed:sha256:04fasl9sjkb1jvchvqgaqxprnprcz7a8r52249zp2ijarzyhf3an
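
The same lookup can also be done through Nix itself, without composing the narinfo URL by hand (a sketch; assumes the nix command from Nix 2.4+ with nix-command enabled):

$ nix path-info --json --store https://cache.nixos.org \
    /nix/store/z4r9j5ld8fx3ksgyb53hp7nwdxy3zjpd-cuda_10.2.89_440.33.01_linux.run
# prints the cache's metadata for that store path (url, narSize, references, ...)
# as JSON, or fails if the cache has no narinfo for it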

Turns out in my case it was the package nvtop which depends on cuda.
Removing it from my system build resolved the issue.

Still an issue; it prevents me from upgrading my system.

This seems quite big. Is there any workaround?

Disable use of the binary cache, wait until the affected package has been built locally, then cancel and restart with the binary cache enabled.

That should help at least temporarily.
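
Concretely, that could look something like this (a sketch; sudo and the switch action are assumptions about how you normally rebuild, and build-use-substitutes is the flag spelling used later in this thread):

$ sudo nixos-rebuild switch --option build-use-substitutes false
# wait until the large cuda .run store path has been built locally,
# then interrupt (Ctrl-C) and rerun with the binary cache enabled again:
$ sudo nixos-rebuild switch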

In my case, I could

  1. Run nixos-rebuild
  2. Note the name of the .drv file for the .run file (e.g. /nix/store/bpdv0hmi10b55absdcww3s3j1d8jvbjf-cuda_10.2.89_440.33.01_linux.run.drv)
  3. Build it without cache (e.g. $ nix build --option build-use-substitutes false /nix/store/bpdv0hmi10b55absdcww3s3j1d8jvbjf-cuda_10.2.89_440.33.01_linux.run.drv)
  4. Run nixos-rebuild

So it seems like cache.nixos.org is trying to store the entire .run file, which isn't necessary, especially since cache.nixos.org refuses to store the actual cudatoolkit build.

Is there any way we can get cache.nixos.org to just not store the massive .run file?

What's the earliest package version this will work with?

Also, cache.nixos.org needs to provide a cache miss as opposed to failing.

Honestly, I have no idea… I'm not sure that there is one, since these .run files are massive.

Is there a test that I could do locally to check whether or not it would work? That way I could bisect to find you a commit if there is one.
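
I guess one way to check a given path against the cache, without going through a full rebuild, would be something like this (using the hashes from earlier in this thread; plain curl, nothing Nix-specific):

$ curl -s https://cache.nixos.org/z4r9j5ld8fx3ksgyb53hp7nwdxy3zjpd.narinfo
# the FileSize field shows how large the compressed NAR is
$ curl -s -o /dev/null -D - https://cache.nixos.org/nar/0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk.nar.xz
# prints the response headers; right now this is the 503 from above
# (if the cache ever starts serving it, this would download the full ~2.6 GB,
# so interrupt once the status line appears)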

Is there a way to get Cudatoolkit 10 on NixOS?

I am on the NixOS 21.11 channel. The "stable" Nvidia driver there is 495.44. There is some sort of issue with this driver that produces the following kernel log messages:

[    3.331741] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
[    4.236993] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    4.237039] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    5.785493] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    5.785597] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    6.317957] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    6.318101] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    7.498050] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    7.498195] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    8.031296] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    8.031370] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    9.198974] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    9.199111] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
[    9.732844] NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0x56:1479)
[    9.732950] NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0

I have tried switching to the legacy_470 driver, but CUDA Toolkit 11 complains of compatibility issues (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE = 804).

I would like to try cudatoolkit_10 to see if it will work with the 470 driver, but this 503 error means I can't download it.
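
Presumably the workaround from earlier in this thread applies here as well: build it locally instead of substituting it (a sketch; assumes cudatoolkit_10 is the attribute name in the 21.11 channel):

$ nix-build '<nixpkgs>' -A cudatoolkit_10 --option build-use-substitutes false
# should fetch the .run installer straight from NVIDIA and unpack it locally,
# instead of pulling the oversized NAR from cache.nixos.org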

Out of curiosity, what model GPU are you trying to use?

GTX 1050 Ti

I am connecting it to the ExpressCard slot of a ThinkPad T430 as an eGPU, so maybe that is part of the problem. It always shows up in lspci with Kernel driver in use: nvidia. It appears to work fine with legacy_470 and legacy_390; it can at least run OpenGL applications, but I really got it to use with CUDA for TensorFlow. With the 495 driver and the above errors, however, the system falls back to integrated graphics and nvidia-smi reports 'No devices were found'. The other option may be to try NixOS unstable and the 510 driver available there, but I don't know if I can do that "in-place" without messing up my development system, while still being able to roll back to 21.11.

Nvidia phases out support for certain models in successive driver updates. You can check your device/driver version constraints on their website. I wouldn't be surprised if there were an upper bound on the driver version you can use.

In other words, I don't think you're doing anything wrong here.

As far as I know the GTX 1050 Ti is supported by the 510.54 driver.
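
If you want to try the 510 driver from unstable without committing your whole system to it, something along these lines should be reasonably safe (a sketch; it relies on NixOS generations, so you can pick the previous 21.11 generation from the boot menu if anything breaks):

$ sudo nixos-rebuild boot -I nixpkgs=channel:nixos-unstable
# builds the system against unstable and makes it the default for the next
# boot, without touching the currently running generation
$ sudo reboot
# if the eGPU still misbehaves, select the previous generation in the boot
# loader menu (or run: sudo nixos-rebuild switch --rollback)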

This 503 should no longer happen as of a while ago, thanks to @zimbatm.
