I recently bought a GTX 1060 to get CUDA acceleration for Ollama.
I read that services.ollama.acceleration falls back to "cuda" when nixpkgs.config.cudaSupport is set to true, so I tried setting that, hoping for some system-wide benefits as well.
I have also added the cuda-maintainers cache via cachix, but it did not seem to speed up the build.
After some hours of building (~10k/14k jobs done?) it OOMed.
I find that weird since I have ~31 GB of RAM and 32 GB of swap.
Right now I am building it once again but I suspect it will fail.
I checked the systemd-oomd logs and it did not kill anything…
I have set services.ollama.acceleration to "cuda" and something compiled.
I think the acceleration works now.
I have decided to try and enable nixpkgs.config.cudaSupport.
Here’s what happens:
# nixos-rebuild test
building Nix...
building the system configuration...
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
trace: warning: cudaPackages.autoFixElfFiles is deprecated, use pkgs.autoFixElfFiles instead
trace: warning: cudaPackages.autoAddOpenGLRunpathHook is deprecated, use pkgs.autoAddDriverRunpathHook instead
activating the configuration...
setting up /etc...
reloading user units for egycobra...
restarting sysinit-reactivation.target
the following new units were started: dev-disk-by\x2duuid-<UUID>.device, srv-nfs-outer.mount
If I disable cudaSupport the warnings disappear.
Also I have possibly encountered another problem:
# nixos-option nixpkgs.config.cudaSupport
error: error: At 'cudaSupport' in path 'nixpkgs.config.cudaSupport': error: Attribute not found
An error occurred while looking for attribute names. Are you sure that 'nixpkgs.config.cudaSupport' exists?
When I set nixpkgs.config.cudaSupport = true, services.ollama.acceleration does not seem to detect it:
# nixos-option services.ollama.acceleration
Value:
null
Default:
null
Type:
"null or one of false, \"rocm\", \"cuda\""
Example:
"rocm"
Description:
''
What interface to use for hardware acceleration.
- `null`: default behavior
if `nixpkgs.config.rocmSupport` is enabled, uses `"rocm"`
if `nixpkgs.config.cudaSupport` is enabled, uses `"cuda"`
otherwise defaults to `false`
- `false`: disable GPU, only use CPU
- `"rocm"`: supported by most modern AMD GPUs
- `"cuda"`: supported by most modern NVIDIA GPUs
''
Declared by:
[ "/nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/services/misc/ollama.nix" ]
Defined by:
[ "/nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/services/misc/ollama.nix" ]
You need to build and switch to your system once with the cache enabled before it takes any effect, so you should only enable CUDA for packages after you switch.
To do this, you can use nixos-rebuild test, which will switch the system but won’t add an entry to your bootloader menu.
However, even if the CUDA cache is enabled (which definitely helps), you’d still need to compile the packages that aren’t cached.
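For reference, the cache can also be declared directly in the NixOS configuration instead of through the cachix CLI. This is a minimal sketch, assuming the cache is served at https://cuda-maintainers.cachix.org. The public key is deliberately left as a placeholder and has to be copied from the cache's own page:

```nix
{
  nix.settings = {
    # Extra binary cache; NixOS keeps cache.nixos.org in the list as well.
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    # Placeholder: replace <public-key> with the key published for the cache.
    trusted-public-keys = [ "cuda-maintainers.cachix.org-1:<public-key>" ];
  };
}
```

Rebuild and switch once with only this change, and enable CUDA in a second rebuild, so that the new substituter is already active when the CUDA packages are fetched.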
This is only the case if services.ollama.acceleration is null (which is the default). If you set it to "cuda", then that should be enough to enable CUDA for ollama.
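As a sketch, this is what the explicit setting might look like in configuration.nix, leaving nixpkgs.config.cudaSupport untouched (the enable line is an assumption about the rest of your config):

```nix
{
  services.ollama = {
    enable = true;
    # Explicit value, so the global cudaSupport flag is not consulted.
    acceleration = "cuda";
  };
}
```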
With this enabled globally, your system will try to compile every package that has CUDA support, which is resource-intensive and may be what caused your system to run out of memory.
What I recommend is to just set this on a per-package basis. For example, if ollama didn’t have the acceleration attribute, you’d override the package with:
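A sketch of such an override, assuming the ollama package in your nixpkgs revision accepts an acceleration argument and that the module exposes services.ollama.package (both may differ between releases):

```nix
{ pkgs, ... }:
{
  # Hypothetical per-package override: only ollama is built with CUDA,
  # everything else keeps using the regular (cached) builds.
  services.ollama.package = pkgs.ollama.override {
    acceleration = "cuda";
  };
}
```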
In general, you need to check how CUDA is enabled in the derivation. Sometimes there is a cudaSupport attribute, other times it’s config.cudaSupport or you might even find an option like services.ollama.acceleration that makes this much easier.
If CUDA were not enabled, enableCuda would have been false and cuda_nvcc wouldn’t have been added to the nativeBuildInputs, but we can see that it’s working here.
I had not heard of nix repl before, so thanks for introducing it to me.
I see that you are loading a flake there and I do not use flakes.
How should I load the variables?
I have managed to get these variables with nixos-option:
# nixos-option nixpkgs.config
Value:
{
allowUnfree = true;
cudaSupport = true;
}
# nixos-option services.ollama.acceleration
Value:
null
Default:
null
Type:
"null or one of false, \"rocm\", \"cuda\""
Example:
"rocm"
Description:
''
What interface to use for hardware acceleration.
- `null`: default behavior
if `nixpkgs.config.rocmSupport` is enabled, uses `"rocm"`
if `nixpkgs.config.cudaSupport` is enabled, uses `"cuda"`
otherwise defaults to `false`
- `false`: disable GPU, only use CPU
- `"rocm"`: supported by most modern AMD GPUs
- `"cuda"`: supported by most modern NVIDIA GPUs
''
I am not that good at reading nix and I appreciate nixos-option for interpreting the code that you linked.
That’s obvious and it works.
This made me rethink what I thought before.
If these conditions are met, ollama enables CUDA but leaves acceleration = null, right?
This seems useful, but I don’t think it works with flakes.
I think you can just use :l . without the f in the repl, or run nix repl -f ~/nixos-config from the command line. Afterwards, you can hit tab and it will show you the available attributes you can access.
Alternatively, there are useful tools that allow you to visualize options in a TUI, like nix-inspect, which also allows you to set bookmarks for frequent paths (like nixosConfigurations.nixos.config in my case).
Indeed. It just uses acceleration to check what it needs to enable, but it doesn’t change it. It’s up to the user to do that.
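To make the fallback concrete, here is a small standalone sketch of the decision described in the option text. It is illustrative only and not the actual nixpkgs code; the resolve function and the file name are made up for this example:

```nix
# resolve-acceleration.nix (hypothetical file, not part of nixpkgs)
let
  # Mirrors the option description: an explicit value wins, otherwise
  # fall back to the global rocmSupport / cudaSupport flags.
  resolve =
    { acceleration ? null, rocmSupport ? false, cudaSupport ? false }:
    if acceleration != null then acceleration
    else if rocmSupport then "rocm"
    else if cudaSupport then "cuda"
    else false;
in
{
  explicit = resolve { acceleration = "cuda"; };        # "cuda"
  fromGlobalFlag = resolve { cudaSupport = true; };     # "cuda"
  cpuOnly = resolve { };                                # false
  forcedOff = resolve { acceleration = false; cudaSupport = true; }; # false
}
```

It can be evaluated with nix-instantiate --eval --strict resolve-acceleration.nix. Note that the option itself still reads null in the second case; only the effective behaviour changes.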
Having an option like this is mainly useful for people who just want to enable CUDA for ollama without globally enabling cudaSupport and without having to override the package attributes (pkgs.ollama.override { ... };).
From playing around with it I notice that I cannot see more than with nixos-option.
I would like to do it but I don’t know how to load any of my config files since they begin with { ... }:. Here’s what happens:
nix-repl> :l .
error: opening file '/etc/nixos/default.nix': No such file or directory
nix-repl> :l configuration.nix
error:
… from call site
at «none»:0: (source not available)
error: function 'anonymous lambda' called without required argument 'config'
at /etc/nixos/configuration.nix:1:1:
1| { config, lib, pkgs, ... }:
| ^
2|
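The error above comes from configuration.nix being a module, that is, a function expecting config, lib and pkgs, so it cannot be loaded on its own. A possible approach without flakes is to go through <nixpkgs/nixos>, which evaluates the whole system using the configuration found via NIXOS_CONFIG or the nixos-config entry in NIX_PATH. A sketch, assuming the default channel setup (the file name inspect.nix is made up):

```nix
# inspect.nix (hypothetical helper file)
let
  # Evaluates the full NixOS configuration; the result exposes
  # config, options and pkgs.
  nixos = import <nixpkgs/nixos> { };
in
{
  # Value of the option as set by the modules (null if left at the default).
  acceleration = nixos.config.services.ollama.acceleration;
  # Falls back to false when cudaSupport was never set.
  cudaSupport = nixos.config.nixpkgs.config.cudaSupport or false;
}
```

Evaluate it with nix-instantiate --eval --strict inspect.nix. The same attributes should be browsable interactively with nix repl '<nixpkgs/nixos>' (newer Nix versions may want nix repl --file '<nixpkgs/nixos>'), after which config, options and pkgs are in scope.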
I am afraid that the issue still persists.
The nixos-upgrade service was unsuccessful (its return value indicated that it was OOMed).
Now I have started the upgrade manually and it is not going well.
As of right now it is frozen at [ 55%] Building CXX object modules/wechat_qrcode/CMakeFiles/opencv_wechat_qrcode.dir/src/zxing/common/decoder_result.cpp.o and htop shows 30.3/30.7G RAM and 32/32G swap in use.
Is there a way to limit the worker count? I think it could help.
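For reference, Nix has two settings that bound build parallelism: max-jobs (how many derivations are built at the same time) and cores (how many cores each individual build is told to use). A sketch with arbitrary example values, which would need tuning for the machine:

```nix
{
  nix.settings = {
    # Build at most one derivation at a time.
    max-jobs = 1;
    # Each build is told to use up to 4 cores (e.g. for make -j).
    cores = 4;
  };
}
```

The same limits can be tried for a single run with nixos-rebuild test --max-jobs 1 --cores 4. Worst-case memory use scales roughly with max-jobs times cores, so lowering either one trades build time for a lower peak.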
This was printed to the terminal:
FAILED: CMakeFiles/magma.dir/magmablas/zgetf2_kernels_var.cu.o
/nix/store/fa8aq5yhk7hnd8bhg2r564iz3l41hs7q-cuda_nvcc-12.2.140/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/nix/store/c6wk0nxbrwb5hamgxlfqsgi37gcn9752-gcc-wrapper-12.3.0/bin/c++ -I/build/magma-2.7.2/build/include -I/build/magma-2.7.2/include -I/build/magma-2.7.2/control -I/build/magma-2.7.2/magmablas -I/build/magma-2.7.2/sparse/include -I/build/magma-2.7.2/sparse/control -I/build/magma-2.7.2/testing -isystem /nix/store/17yql84kfrzd3pxsw6agj5dk2gqrzxlf-cuda_nvcc-12.2.140-dev/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_60,code=[compute_60,sm_60]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_75,code=[compute_75,sm_75]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90a,code=[compute_90a,sm_90a]" --compiler-options -fPIC,-DADD_ -MD -MT CMakeFiles/magma.dir/magmablas/zgetf2_kernels_var.cu.o -MF CMakeFiles/magma.dir/magmablas/zgetf2_kernels_var.cu.o.d -x cu -c /build/magma-2.7.2/magmablas/zgetf2_kernels_var.cu -o CMakeFiles/magma.dir/magmablas/zgetf2_kernels_var.cu.o
nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
nvcc error : 'cicc' died due to signal 11 (Invalid memory reference)
nvcc error : 'cicc' core dumped
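The nvcc command above generates code for every architecture from compute_60 up to compute_90a. nixpkgs has a cudaCapabilities setting that restricts this list; a GTX 1060 is compute capability 6.1. The following is a sketch only, and whether it avoids the cicc crash is untested:

```nix
{
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
    # Only build kernels for the GTX 1060 (Pascal, compute capability 6.1).
    cudaCapabilities = [ "6.1" ];
  };
}
```

The trade-off is that packages built with a non-default capability list get different store paths, so they would most likely no longer be served by a binary cache and everything CUDA-related would be compiled locally.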
For now I have decided to scale down my config until I get nixpkgs.config.cudaSupport = true; to work.
I have also disabled limiting max-jobs and cores.
Paperless
The first compilation was triggered by paperless-ngx, and RAM usage stayed at around 5G or below most of the time.
It froze at
[ 97%] Linking CXX executable ../../bin/opencv_test_stitching
[ 97%] Built target opencv_test_stitching
[ 97%] Linking CXX shared library ../../lib/libopencv_cudaobjdetect.so
[ 97%] Built target opencv_cudaobjdetect
[ 97%] Building CXX object modules/cudaobjdetect/CMakeFiles/opencv_test_cudaobjdetect.dir/test/test_objdetect.cpp.o
[ 97%] Building CXX object modules/cudaobjdetect/CMakeFiles/opencv_test_cudaobjdetect.dir/test/test_main.cpp.o
[ 97%] Linking CXX executable ../../bin/opencv_test_cudaobjdetect
[ 97%] Built target opencv_test_cudaobjdetect
There was no CPU load and no disk I/O for a while, and then a single cicc process maxed out one core.
Then it finished fine.
Ollama
No compilation, it just gets pulled in from the cuda-maintainers cache, I guess.
open-webui
Here’s the trouble.
15:27 - nixos-rebuild start time
15:27 - [1/3430] things start, 8 GB RAM and 600 MB swap used
15:30 - [298/3430] 12 GB RAM used, swap still at 600 MB
15:35 - [359/3430] 29.2 GB RAM and 9.5 GB swap used
Next day it was already built. nixos-rebuild test showed errors about failing to connect to dbus but rebooting fixed this.
Summary / solution
Turning on nixpkgs.config.cudaSupport led to OOM errors during nixos-rebuild.
I managed to get a successful rebuild by turning on the components of my config one by one, so that their builds don’t happen in parallel.
Thank you to all the people that engaged with this post.