ng0177
March 11, 2026, 1:25pm
1
The question is how to get loadModels to access the unstable channel (services.ollama.loadModels may or may not be related).
sudo nix-channel --add https://nixos.org/channels/nixos-unstable unstable
sudo nix-channel --update
configuration.nix
...
# https://wiki.nixos.org/wiki/Ollama
services.ollama = {
  enable = true;
  unstable.loadModels = ["gemma3:27b-it-qat"];
  acceleration = "rocm";
};
environment.systemPackages = with pkgs; [
  ...
  unstable.ollama-rocm
  ...
];
ng0177
March 12, 2026, 9:15am
2
I would like to add a feature request to enhance the documentation at https://wiki.nixos.org/wiki/Ollama, as most Ollama models do not run with the “stable” version. Any proposals on how to file such a request? Thanks.
There is no reason to do this, and it is not possible. The models are defined and pulled from an external service, not a nixpkgs cache.
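In other words, loadModels is just a list of model names that the running ollama daemon pulls from the Ollama registry at service start-up; there is no per-channel variant of the option. A minimal sketch with the module as-is:

```nix
services.ollama = {
  enable = true;
  # These names are resolved by the ollama daemon against the Ollama
  # registry when the service starts, not by nixpkgs or any channel.
  loadModels = [ "gemma3:27b-it-qat" ];
};
```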
ng0177
March 12, 2026, 1:29pm
5
magicquark:
What errors do you see?
[nixos:~]$ ollama --version
ollama version is 0.12.11
Warning: client version is 0.17.7
In the first line, we have the “stable” version; in the second, the “unstable” version, due to unstable.ollama-rocm in environment.systemPackages.
Set this:
services.ollama.package = unstable.ollama-rocm;
ng0177
March 12, 2026, 2:16pm
7
To sum up:
sudo nix-channel --add https://nixos.org/channels/nixos-unstable unstable
sudo nix-channel --update
and in configuration.nix
...
# https://wiki.nixos.org/wiki/Ollama
services.ollama.package = unstable.ollama-rocm;
services.ollama = {
  enable = true;
  loadModels = ["gemma3:27b-it-qat"];
  acceleration = "rocm";
};
environment.systemPackages = with pkgs; [
  ...
  unstable.ollama-rocm
  ...
];
You can simplify your configuration.
You do not need to explicitly set environment.systemPackages as the service does it for you.
Also, acceleration is deprecated in nixpkgs-unstable, and in nixpkgs-stable it is recommended that you use package instead.
Thus your config becomes this:
services.ollama = {
  enable = true;
  package = unstable.ollama-rocm;
  loadModels = ["gemma3:27b-it-qat"];
};
ng0177
March 13, 2026, 7:22am
9
To finally complete the summary:
sudo nix-channel --add https://nixos.org/channels/nixos-unstable unstable
sudo nix-channel --update
configuration.nix:
{ config, pkgs, ... }:
let
  # Import the unstable channel
  unstable = import <unstable> { config = { allowUnfree = true; }; };
in
{
  # https://wiki.nixos.org/wiki/Ollama
  services.ollama = {
    enable = true;
    package = unstable.ollama-rocm;
    loadModels = ["gemma3:27b-it-qat"];
  };
}
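For a flake-based system the same idea carries over; here is a rough sketch, assuming an extra nixpkgs-unstable input (the input names, stable branch, and hostname below are illustrative, not from this thread):

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11"; # whichever stable release the system tracks
    nixpkgs-unstable.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { nixpkgs, nixpkgs-unstable, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        {
          services.ollama = {
            enable = true;
            # Pull only the ollama package from the unstable input.
            package = nixpkgs-unstable.legacyPackages.x86_64-linux.ollama-rocm;
            loadModels = [ "gemma3:27b-it-qat" ];
          };
        }
      ];
    };
  };
}
```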
ng0177
March 13, 2026, 7:23am
10
Sorry, the frequently used “preformatted” text option does not seem very user friendly.
ng0177
March 13, 2026, 7:25am
11
Fine. I propose to update https://wiki.nixos.org/wiki/Ollama, as this is a very concise and neat way of using an up-to-date version that other users will benefit from.
You can switch it in the top left to just use pure markdown.
ng0177
March 13, 2026, 10:20am
13
Looking good now. Appreciate the hint.
You should be able to make the update yourself if you register for an account with the wiki.
ng0177
March 20, 2026, 9:29am
15
I have created an account, but will only get around to contributing in about one year’s time, when retired. I will not forget.
I’ve stopped using ollama because it is very slow, but here’s my config.
This was described in this YouTube video: https://youtu.be/mUrVC7oo2_g?si=z2vC84h5K3-a5S7D
# nix/checks.nix
#
# CI checks for nix flake check.
# Runs go vet, tests, and validates Nix expressions.
#
# Reference: documentation/nix_microvm_design.md lines 4808-4881
#
{ pkgs, lib, src }:
let
  constants = import ./constants.nix;
  # Go 1.26 with greenteagc (default) and jsonv2 experimental
  goPackage = pkgs.go_1_26;
  # Experimental features to enable
  goExperiment = lib.concatStringsSep "," constants.go.experimentalFeatures;
  # Common Go environment setup (use vendored dependencies for sandbox builds)
  goEnv = ''
(file truncated)
I’m using llama.cpp instead now, mostly because I have a crappy old AMD MI60 card that vllm doesn’t support. There is a pending PR to add multiple-card support.
master ← randomizedcoder:llama-cpp-gfx906
opened 11:25PM - 07 Feb 26 UTC
# nixos/llama-cpp: add multi-instance support
## Summary
Refactor the llama-cpp NixOS module to support multiple named instances, each with independent configuration. This allows running multiple llama-cpp servers simultaneously with different models, ports, and GPU configurations, e.g. GPUs with different amounts of VRAM running different models.
Also bumped the llama-cpp package from b7898 to b7951 and added NixOS VM tests (CPU-based only, obviously).
### Key changes
- Replace single-service model with `services.llama-cpp.instances.<name>`
- Add per-instance `rocmGpuTargets` option for AMD GPU architecture targeting
- Add automatic `gpuLayers` detection based on package GPU support (99 for GPU packages, 0 for CPU-only)
- Add `hfRepo` and `hfFile` options for Hugging Face model downloads
- Add GPU-backend-aware DeviceAllow rules (ROCm, CUDA, Vulkan)
- Add typed options: `flashAttention`, `contextSize`, `parallel`, `slots`, etc.
- Add `environment` option for per-instance environment variables
- Add NixOS VM tests for module evaluation and service configuration
### Motivation
This pull request has taken me a few days to put together and test. The original intention was to get my MI50 working, but then I found it was hard to run with both graphics cards (the other card is a baby).
Therefore, I've added support for multiple instances, e.g. run one bigger model on a bigger 32GB card and a little model on the baby 8GB card.
This will be a "breaking change", so I would understand if people aren't so keen. The change to config should be small, but if you guys would prefer to make a different service with "-multi" or similar, I'd be happy to do that. Happy to discuss the best way forward.
Testing this took a long time, because the compiling takes so long, and I tightened the systemd security, which also took time to get correct. This is about as well tested as I can do with the hardware I have. It would be nice to test on machines with fancy/expensive GPUs (that I don't have). I have tested on AMD and Nvidia, but all these cards are pretty old.
Another thing I noticed is that our nixpkgs nix is pretty out of sync with the llama.cpp repo itself. I will try to create a pull request for llama.cpp to see if we can more closely align them. (But it's cool to see the Meta team using nix! woot woot ;) )
Anyway - I'm very excited to be able to run models locally, and so hopefully this pull request helps!
### Security hardening
All settings verified working with GPU workloads:
- DynamicUser, PrivateUsers, PrivateTmp, ProtectSystem=strict
- DevicePolicy=closed with GPU-specific DeviceAllow rules
- MemoryDenyWriteExecute, SystemCallFilter, RestrictNamespaces
- UMask=0077, ProtectHome, ProtectKernelTunables
- Security score: **1.4 OK** (systemd-analyze security)
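As a rough sketch, the hardening listed above corresponds to systemd unit settings along these lines (the unit name here is hypothetical; the module generates the unit itself, and these values mirror the ROCm case):

```nix
systemd.services."llama-cpp-mi50".serviceConfig = {
  DynamicUser = true;
  PrivateUsers = true;
  PrivateTmp = true;
  ProtectSystem = "strict";
  ProtectHome = true;
  ProtectKernelTunables = true;
  DevicePolicy = "closed";
  DeviceAllow = [ "char-drm rw" "char-kfd rw" ]; # ROCm devices
  SupplementaryGroups = [ "video" "render" ];    # required for GPU access
  MemoryDenyWriteExecute = true;
  RestrictNamespaces = true;
  SystemCallFilter = [ "@system-service" "@resources" "~@privileged" ];
  UMask = "0077";
};
```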
This was tested for the cards listed below, and took quite some time to get correct.
### Example configuration
```nix
services.llama-cpp.instances = {
  # Large model on MI50 (32GB VRAM)
  mi50 = {
    enable = true;
    rocmGpuTargets = [ "gfx906" ];
    port = 8090;
    contextSize = 32768;
    flashAttention = "on";
    enableMetrics = true;
    hfRepo = "unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF";
    environment.ROCR_VISIBLE_DEVICES = "1";
  };

  # Small model on W7500 (8GB VRAM)
  w7500 = {
    enable = true;
    rocmGpuTargets = [ "gfx1102" ];
    port = 8091;
    contextSize = 8192;
    flashAttention = "on";
    enableMetrics = true;
    hfRepo = "Qwen/Qwen2.5-3B-Instruct-GGUF";
    hfFile = "qwen2.5-3b-instruct-q4_k_m.gguf";
    environment.ROCR_VISIBLE_DEVICES = "0";
  };
};
```
---
## Testing performed (x86_64-linux)
### Hardware tested
| Machine | GPU | Architecture | VRAM | Backend |
|---------|-----|--------------|------|---------|
| l | AMD MI50 | gfx906 | 32GB | ROCm |
| l | AMD Radeon Pro W7500 | gfx1102 | 8GB | ROCm |
| l2 | NVIDIA RTX 3070 | sm_86 | 8GB | CUDA |
### Test results
| Test | Configuration | Result | Notes |
|------|---------------|--------|-------|
| **Single-instance (ROCm)** | MI50 + W7500, both cards | ✅ PASS | Single instance using both AMD GPUs |
| **Single-instance (CUDA)** | RTX 3070 | ✅ PASS | Single instance, full GPU offload |
| Single GPU (ROCm) | MI50 only, Qwen3-30B | ✅ PASS | Full GPU offload, ~88 tok/s |
| Single GPU (ROCm) | W7500 only, Qwen2.5-3B | ✅ PASS | Full GPU offload, ~62 tok/s |
| Single GPU (CUDA) | RTX 3070, Qwen2.5-3B | ✅ PASS | Full GPU offload, ~150 tok/s |
| Multi-instance (ROCm) | MI50 + W7500 separate instances | ✅ PASS | Separate ports, GPU isolation |
| Mixed architectures | gfx906 + gfx1102 | ✅ PASS | Via ROCR_VISIBLE_DEVICES |
| Security hardening | All settings | ✅ PASS | Score: 1.4 OK |
| Auto gpuLayers | GPU/CPU detection | ✅ PASS | 99 for GPU, 0 for CPU |
| Model caching | CacheDirectory | ✅ PASS | Persists across restarts |
### Systemd security verification
All hardening settings tested and verified compatible with both ROCm and CUDA:
| Setting | Value | GPU Impact | Result |
|---------|-------|------------|--------|
| DynamicUser | true | None | ✅ PASS |
| PrivateDevices | false (GPU) / true (CPU) | Required false for GPU | ✅ PASS |
| DevicePolicy | closed | None | ✅ PASS |
| DeviceAllow | char-drm, char-kfd (ROCm) / char-nvidia* (CUDA) | Required for GPU | ✅ PASS |
| SupplementaryGroups | video, render | Required for GPU access | ✅ PASS |
| MemoryDenyWriteExecute | true | None | ✅ PASS |
| PrivateUsers | true | None | ✅ PASS |
| ProtectSystem | strict | None | ✅ PASS |
| SystemCallFilter | @system-service @resources ~@privileged | @resources needed for GPU | ✅ PASS |
| ProcSubset | all (GPU) / pid (CPU) | Required all for GPU | ✅ PASS |
### Performance results
**AMD W7500 (gfx1102) - Qwen2.5-3B-Instruct:**
```
Prompt processing: 308.8 tokens/sec
Generation: 62.5 tokens/sec
```
**AMD MI50 (gfx906) - Qwen3-Coder-30B:**
```
Prompt processing: ~50 tokens/sec
Generation: ~88 tokens/sec
```
**NVIDIA RTX 3070 - Qwen2.5-3B-Instruct:**
```
Prompt processing: 743 tokens/sec
Generation: 150 tokens/sec
```
---
## Things done
- Built on platform:
  - [x] x86_64-linux
  - [ ] aarch64-linux
  - [ ] x86_64-darwin
  - [ ] aarch64-darwin
- Tested, as applicable:
  - [x] [NixOS tests] in [nixos/tests].
  - [x] [Package tests] at `passthru.tests`.
  - [ ] Tests in [lib/tests] or [pkgs/test] for functions and "core" functionality.
- [ ] Ran `nixpkgs-review` on this PR. See [nixpkgs-review usage].
- [x] Tested basic functionality of all binary files, usually in `./result/bin/`.
- Nixpkgs Release Notes
  - [x] Package update: when the change is major or breaking.
- NixOS Release Notes
  - [ ] Module addition: when adding a new NixOS module.
  - [x] Module update: when the change is significant.
- [x] Fits [CONTRIBUTING.md], [pkgs/README.md], [maintainers/README.md] and other READMEs.
[NixOS tests]: https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests
[Package tests]: https://github.com/NixOS/nixpkgs/blob/master/pkgs/README.md#package-tests
[nixpkgs-review usage]: https://github.com/Mic92/nixpkgs-review#usage
[CONTRIBUTING.md]: https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md
[lib/tests]: https://github.com/NixOS/nixpkgs/blob/master/lib/tests
[maintainers/README.md]: https://github.com/NixOS/nixpkgs/blob/master/maintainers/README.md
[nixos/tests]: https://github.com/NixOS/nixpkgs/blob/master/nixos/tests
[pkgs/README.md]: https://github.com/NixOS/nixpkgs/blob/master/pkgs/README.md
[pkgs/test]: https://github.com/NixOS/nixpkgs/blob/master/pkgs/test
---
## Breaking changes
This is a **breaking change** for existing users. The single-instance configuration:
```nix
# Old (no longer works)
services.llama-cpp = {
  enable = true;
  model = "/path/to/model.gguf";
};
```
Must be migrated to:
```nix
# New
services.llama-cpp.instances.default = {
  enable = true;
  model = "/path/to/model.gguf";
};
```
Or use the new Hugging Face download feature:
```nix
services.llama-cpp.instances.default = {
  enable = true;
  hfRepo = "Qwen/Qwen2.5-3B-Instruct-GGUF";
  hfFile = "qwen2.5-3b-instruct-q4_k_m.gguf";
};
```