Gradient is a Nix-CI System (kinda like Hydra, but less Perl and more Rust).
Gradient is quite young and has been in heavy development for the past 3 years.
We can finally say that it has reached a state where it is deployable, and we are now happy to receive more issues to improve overall stability and performance.
We would love to see you push Gradient to its limits and find every kind of flaw.
If there are any issues with deployments, I am happy to receive PMs and find ways to improve the documentation or any other tooling that is needed.
At the current state of development, Gradient isn’t able to replace Hydra. Hydra is a mature project that has been tested against most of the edge cases you’d hit when you try to coordinate hundreds of builders.
That said, Gradient has the freedom to experiment with new technologies. While Hydra still evaluates everything locally and assigns jobs almost randomly to builders, Gradient can already evaluate on multiple workers and assign build jobs based on questions like “How much would this worker have to substitute?”. The main NixOS Hydra is currently hitting the limit of SSH connections it can open simultaneously. Gradient has its own protocol using WebSockets and transfers zstd-precompressed NARs, while Hydra only uses SSH’s on-the-fly compression.
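To make the “how much would a worker have to substitute?” idea concrete, here is a minimal sketch in Python. It is not Gradient's actual implementation: the `Worker` class, `missing_bytes`, `pick_worker` and the store paths are all hypothetical names made up for illustration, and a real scheduler would also weigh load, architecture and more.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record: a name plus the store paths it already has."""
    name: str
    store_paths: set = field(default_factory=set)


def missing_bytes(worker, closure):
    """Total size of the closure paths this worker would still have to substitute."""
    return sum(size for path, size in closure.items()
               if path not in worker.store_paths)


def pick_worker(workers, closure):
    """Pick the worker that has to fetch the fewest bytes (ties broken by name)."""
    return min(workers, key=lambda w: (missing_bytes(w, closure), w.name))


# Made-up closure of one build job: store path -> NAR size in bytes.
closure = {
    "/nix/store/aaa-glibc": 30_000_000,
    "/nix/store/bbb-openssl": 5_000_000,
    "/nix/store/ccc-src": 1_000_000,
}
workers = [
    Worker("w1", {"/nix/store/aaa-glibc"}),
    Worker("w2", {"/nix/store/bbb-openssl", "/nix/store/ccc-src"}),
]

if __name__ == "__main__":
    print(pick_worker(workers, closure).name)
```

Here `w1` only misses the two small paths, while `w2` would have to fetch the large one, so the job goes to `w1`.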
It is pretty hard to compare the two projects, but Gradient’s goals are clear:
Sorry, I must have missed that. But Hydra also just opens Nix daemon connections via the gRPC protocol? So things like evaluations on remote builders aren’t possible?
Not to mention (from what I’ve skimmed of the readme) OIDC and streaming logs. And GitHub integration, which currently only ofborg does, in nixpkgs at least. Also, the frontend seems a lot snappier on the demo instance.
Do I understand correctly that Gradient will dispatch builds so as to minimize network traffic and rebuilds between workers? This is my biggest issue with Nix CI systems right now. Essentially, you have 3 options:
Have ~1 builder per system. Ensures no unnecessary builds and no unnecessary closure copying. Obviously not ideal for scaling.
Have as many builders as you want, and use something like pipeline matrices. Ensures maximum parallelism at the output level, but causes tons of rebuilds if e.g. different outputs share common dependencies.
Use something like Hercules or just a single “main” builder with a bunch of remote builders. No rebuilds, but tons of network traffic, as closures are constantly copied between builders.
None of them are good, and oftentimes for people without much excess hardware, like myself, the best option is to just have a single builder, which ends up being faster and more efficient than trying to scale horizontally.
There is no “perfect” scheduling, and you will always have tradeoffs somewhere. Gradient tries to minimize those tradeoffs by minimizing network traffic, but there is more room to improve, and Gradient will need better statistics to improve the scheduling algorithm. It’s uncertain, for example, how long a buildJob should wait for a busy worker that wouldn’t need to substitute anything.
Gradient will try to first build the derivations that unblock many other derivations, so all workers can be fully utilized. There is a specific scoring system that assigns scores to buildJobs. I am specifically calling them buildJobs because a single buildJob consists of multiple derivations: Gradient will try to build chained derivations directly on a single worker, so the chain doesn’t have to go through the scheduling loop for every link. If you are interested in how things work, I can point you to the Gradient Proto documentation.
Gradient is able to detect whether a substituting cache (also called an upstream cache; cache.nixos.org, for example, is configured as the default entry) is a Gradient Cache that can speak the Gradient Proto. There are plans to transfer information about almost-done / currently building jobs, so that the local scheduler can delay the build and query the remote (external) Gradient Cache again later.
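The “build first what unblocks the most” idea mentioned above can be sketched roughly like this. This is only an illustration under my own assumptions, not Gradient's real scoring system: the function names and the example graph are hypothetical, and the score here is simply the number of derivations that transitively depend on a candidate.

```python
def transitive_dependents(deps):
    """deps maps each derivation to the set of derivations it depends on.

    Returns, for every derivation, the set of derivations that directly or
    transitively need it. Assumes the graph is a DAG, as build graphs are.
    """
    # Invert the edges: input -> derivations that use it directly.
    direct = {drv: set() for drv in deps}
    for drv, inputs in deps.items():
        for inp in inputs:
            direct.setdefault(inp, set()).add(drv)

    result = {}

    def visit(drv):
        if drv not in result:
            acc = set()
            for child in direct[drv]:
                acc.add(child)
                acc |= visit(child)
            result[drv] = acc
        return result[drv]

    for drv in direct:
        visit(drv)
    return result


def ready_queue(deps, built):
    """Derivations whose inputs are all built, most-unblocking first."""
    dependents = transitive_dependents(deps)
    ready = [d for d, inputs in deps.items()
             if d not in built and inputs <= built]
    return sorted(ready, key=lambda d: (-len(dependents[d]), d))


# Made-up dependency graph for illustration.
DEPS = {
    "glibc": set(),
    "zlib": {"glibc"},
    "openssl": {"glibc", "zlib"},
    "curl": {"openssl", "zlib"},
    "hello": set(),
}

if __name__ == "__main__":
    print(ready_queue(DEPS, set()))
```

With nothing built yet, `glibc` and `hello` are both ready, but `glibc` is queued first because three other derivations are waiting on it.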
I want to point out that there are many more features (some of them are not fully done):
The gradient build command: works like nix build (but only on flakes), but it will also evaluate on builders and copy the outputs back to your machine. There is no need to have Nix or a Nix store installed on the system.
The Gradient Cache: it’s S3 compatible, but will also work without S3. It can return different cache priority values based on the IP range you are accessing it from, so if you have multiple devices at home, they will prefer the fast cache in the local network, but if you’re not at home, they will use the other configured caches. The cache also acts as a pull-through cache, so if cache.nixos.org is configured, it will also return results for cache.nixos.org and temporarily cache them on its own instance. Also, if S3 is configured, workers won’t pull cached results through Gradient itself; they will fetch them via presigned S3 URLs instead.
An API from which everything can be controlled.
The Gradient server itself doesn’t need Nix or a Nix store, so it can run in a VM without one.
Fully open source and free under AGPL-3.0-only, forever.
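As a rough illustration of the IP-range-based cache priority from the feature list above, here is a small Python sketch. It is not how Gradient implements it; the rule table, the default value and the function names are assumptions for the example. What is standard Nix behaviour is that a binary cache advertises a `Priority` in its `nix-cache-info` file and that lower numbers are preferred (cache.nixos.org uses 40).

```python
import ipaddress

# Hypothetical rules: (client network, priority to advertise).
# Lower numbers are preferred by Nix.
RULES = [
    (ipaddress.ip_network("192.168.0.0/16"), 10),
    (ipaddress.ip_network("10.0.0.0/8"), 20),
]
DEFAULT_PRIORITY = 60  # assumed fallback for clients outside all ranges


def cache_priority(client_ip):
    """Return the priority to advertise to a client with this IP address."""
    addr = ipaddress.ip_address(client_ip)
    for network, priority in RULES:
        if addr in network:
            return priority
    return DEFAULT_PRIORITY


def nix_cache_info(client_ip, store_dir="/nix/store"):
    """Render a nix-cache-info body tailored to the requesting client."""
    return (
        f"StoreDir: {store_dir}\n"
        "WantMassQuery: 1\n"
        f"Priority: {cache_priority(client_ip)}\n"
    )


if __name__ == "__main__":
    print(nix_cache_info("192.168.1.5"))
```

A client on the home network sees a priority below 40 and prefers this cache over cache.nixos.org, while the same client on the road sees 60 and falls back to its other caches first.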
@DerDennisOP I played around a bit with gradient and it seems pretty cool!
One thing that confused me, though, were evaluations. In my mind, an evaluation is a single output of my flake. For example #packages.x86_64-linux.app1. But in gradient it seems to be the result of the entire flake (with filters applied).
It’d be nice if I could see something like what nix flake show gives me when I click an evaluation, and then be able to click a package, for example, to jump to its build.
This is also confusing in the CLI. Here I expected to be able to download a specific flake output, instead of accessing it by index. So for example gradient download --project org/project #packages.x86_64-linux.app1
That leads me to a very specific use case I had in mind when testing gradient:
Use gradient as a plugin in argocd to build kubernetes resources out of a flake and apply them to the cluster. The argocd plugin would just have the gradient CLI installed, and on every commit it just executes `gradient download --project org/project #packages.x86_64-linux.kubernetes-resources`.
It might also be cool if the CLI would allow downloading the result of evaluating a flake output as JSON. So instead of having to turn something into a derivation first, with writeText for example, gradient could just let me download the result as JSON.
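For reference, this is roughly what plain Nix already offers locally via `nix eval --json`, sketched in Python. The helper names and the `kubernetesResources` attribute are hypothetical, and this only shows the local behaviour the request is modelled on, not anything Gradient provides today. It assumes a Nix with the `nix-command` and `flakes` features enabled.

```python
import json
import subprocess


def eval_command(flake_ref, attr):
    """Build the argv for evaluating one flake attribute to JSON."""
    return ["nix", "eval", "--json", f"{flake_ref}#{attr}"]


def eval_output_as_json(flake_ref, attr):
    """Run the evaluation locally and parse the JSON that Nix prints."""
    proc = subprocess.run(
        eval_command(flake_ref, attr),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # "kubernetesResources" is a made-up attribute name holding plain data.
    print(eval_command(".", "kubernetesResources"))
```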
If you want, you can create a GitHub issue where further discussion about the gradient download command can happen. Also, the newest master just got a Gradient CLI refactor, so you might want to check that out too (if you haven’t already).
You can always use the API directly; that’s actually the recommended way of interacting if you plan to integrate Gradient somewhere. I will take a --json flag into consideration, since it seems like an important feature.
From Gradient’s point of view it is a single evaluation. I can see that this leads to a bit of confusion, but I can’t really think of an alternative name. Also, most testers quickly understood what is meant by evaluation in Gradient. In Gradient terminology, what you think of as an evaluation is called an Entry Point.
A nix flake show-like feature is already planned, but isn’t high priority.
If you had problems or found any minor bugs, I would be happy to see your issues on GitHub; that’s one way to improve the overall experience for everyone.