Nix, Flox, Nvidia Opening Up CUDA Redistribution on Nix

Just got back from NixCon; somehow it gets more beautiful every year.

I know that many of us in the Nix community dreamed of a world where Nix’s virtues of reproducibility, determinism, and portability would extend to CUDA projects.

We knew this was possible. We also knew it would require alignment with, and approval from, NVIDIA.

The team at Flox has been working with NVIDIA over the past months on an agreement to redistribute CUDA Toolkit packages, in partnership with the NixOS Foundation, Nix Steering Committee (SC), Nix CUDA team, and a bunch of other teams and individuals across the community!

As of today, you can pull prebuilt, prepatched CUDA packages (including the CUDA Toolkit, PyTorch, TensorFlow, TensorRT, OpenCV, ffmpeg, and more) from Flox’s binary cache at cache.flox.dev. These are unmodified builds from canonical Nixpkgs definitions, so CUDA packages that once took hours to build are ready to use as soon as they finish downloading.

There are a few nuances to understand, though, so keep reading :slight_smile:

How to Use

Flox is opening its binary cache to the Nix community, so users of both the Nix package manager and NixOS will be able to pull and run prebuilt, prepatched user-space CUDA packages. Users of NixOS can get prepatched NVIDIA CUDA display drivers, too.

You just need to add cache.flox.dev, along with its public key, to Nix’s/NixOS’s list of extra-substituters.

With extra-substituters, Nix always checks cache.nixos.org first, and only uses Flox’s cache for packages not found upstream. For the Nix package manager, these settings live in nix.conf (typically /etc/nix/nix.conf); on NixOS, they are set through the nix.settings option in configuration.nix. For nix.conf, add the following:

extra-substituters = https://cache.flox.dev
extra-trusted-public-keys = flox-cache-public-1:7F4OyH7ZCnFhcze3fJdfyXYLQw/aV7GEed86nQ7IsOs=
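On NixOS, nix.conf is generated from configuration.nix, so the same settings go under the nix.settings option rather than into nix.conf directly. A minimal sketch, assuming a standard NixOS module: the URL and public key are copied from the lines above, and extra-substituters (unlike trusted-substituters, which only whitelists a cache) actually adds the cache to the list Nix queries, appending to the defaults rather than replacing them.

```nix
# configuration.nix (or any NixOS module imported from it)
{ ... }:
{
  nix.settings = {
    # Rendered into /etc/nix/nix.conf as "extra-substituters = ...",
    # which Nix appends to its default substituter list.
    extra-substituters = [ "https://cache.flox.dev" ];
    # Signing key used to verify paths downloaded from cache.flox.dev.
    extra-trusted-public-keys = [
      "flox-cache-public-1:7F4OyH7ZCnFhcze3fJdfyXYLQw/aV7GEed86nQ7IsOs="
    ];
  };
}
```

`sudo nixos-rebuild switch` applies the change. With the standalone Nix package manager on a multi-user install, the nix-daemon usually needs a restart after editing /etc/nix/nix.conf before the new substituter is used.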

There are some nuances here. If you’re declaring specific versions of upstream CUDA dependencies, or if you’re tracking a moving channel (like nixos-unstable), you’ll need to take extra steps. The Flox blog has a helpful rundown on how this works and what you need to do.

This is just the first phase, released as early access. Over the next few weeks we’ll be shipping improvements. Your feedback, especially from teams using CUDA for commercial purposes, will guide what we do next. If you run into problems, have questions, or want to see something specific, reach out and let us know.

It Takes a Community

This is the culmination of years of selfless work by the Nix CUDA team, non-stop encouragement from users of Nix, and months of discussion with leadership at NVIDIA. It’s finally happened: NVIDIA is recognizing Nix as one of the preferred paths for CUDA!

NVIDIA’s recognition makes Nix an officially supported path for CUDA, and the partnership between NVIDIA and Flox, together with the NixOS Foundation and the SC, makes it available to Nix users everywhere. Flox commits to maintaining this partnership, funding the required infrastructure, and doing the ongoing work of running the binary cache.

Link to NVIDIA’s announcement: “Developers Can Now Get CUDA Directly from Their Favorite Third-Party Platforms” (NVIDIA Technical Blog)

There’s still a lot more to do, and plenty of ways to get involved. The Nix CUDA team is at the core of this effort, and contributions are welcome. If you’re interested in volunteering for, supporting, or helping to fund the critical work the Nix CUDA team is doing, please reach out to them directly or through the SC/Foundation channels.

This is just another step toward a Sustainable Nix.

A huge thank you to everyone who packaged, tested, contributed to, supported, advocated for, and/or pushed for this. And a shout-out to NVIDIA for recognizing the critical role Nix plays in how teams build and run CUDA workloads.

Additional big :heart: to @SergeK @ConnorBaker @samuela @Ericson2314 @Infinisil @tomberek @winter @jtojnar @lassulus @ryantrinkle @edef @roberth @ra33it0
To everyone else who helped: just drop me a note and I’ll add your name!

That’s a welcome development!

Does this have any relation to, or implications for, the existing Nix Community cache, which added CUDA caching less than a year ago?

This is amazing news, thanks to all the different parties involved for the hard work they’ve put in!

Woohoo! That’s awesome!

Any chance this agreement can be extended to the NixOS organization, so that we can redistribute hydra.nixos.org builds through cache.nixos.org directly?

Exactly where I hope this can lead us. There are a few steps along the way, and this is the first one. We have also been thinking about the steps after that becomes the standard, including going further into testing.
If you’re using CUDA, the CUDA team and I would love to hear more.

This is super! Thank you!

Hi! Not unlike Flox, Nix-Community remains an independent project, “the community playground”, and keeps pioneering the CUDA and “hardware acceleration” CI efforts. I do not mean to spoil future announcements, but, right at this moment, the Nix-Community infrastructure team is doing some of the crucial work towards deploying Nixpkgs’ first GPU-enabled CI!

There’s an important point to be made, also relating to Nix-Community Cachix. NVIDIA’s current recognition of Flox, rather than the Foundation, is still less than what any of us (including, perhaps, NVIDIA itself) truly needs for long-term sustainability. This is, however, a remarkable step forward from the grey area of resolute individuals having to keep things cached and tested in spite of legal ambiguity. Surely, only the first step!

To sum up, I’d personally describe the Flox-NVIDIA deal as a political advancement: it makes us more “legible” and palatable to sponsors, and it opens new communication channels. We’re more “mainstream” now, for better and for worse. Meanwhile, Nix-Community is effectively a reconnaissance mission.

Thank you Ron, and thank you everyone involved. I particularly appreciate that, in addition to relentless work, there’s also been a lot of compromise and risk involved. And thank you to Nix-Community, especially @zimbatm, @zowoq, and @Mic92.

EDIT: Expanded to give more recognition to Nix-Community. All opinions are personal and do not represent other parties.

Woo, congrats! Having a prebuilt cache and getting us closer to out-of-the-box hydra.nixos.org support is a step forward!

Congrats to everyone involved!
As a regular contributor to the Python deep-learning stack in nixpkgs, I’m thrilled that the infrastructure around CUDA is getting better.

It’s always a pleasure to collaborate with Connor (and Serge too!). I was happy to meet the other CUDA team members at NixCon this year.
It’s such a great thing that Connor gets to work on making the CUDA stack better as part of his day job. Efforts like this are welcomed.

This is great! Thanks for the work put into getting this set up and the compute time!

I do feel that some extra information would be helpful. For example, in the Nix-community Nvidia Cache announcement, there were instructions like

that made it pretty clear you needed to update your derivations as well by modifying the nixpkgs config and which also helped you understand what exactly was being cached (since I’m not aware of a good way to introspect most binary caches).

Is the Flox Nvidia cache the same, in that the only real difference in generating the derivation is adding the cudaSupport flag to nixpkgs.config?

Can we get more clarity on what this partnership involved? While I understand why this can be considered a first step towards the binary packages making it to cache.nixos.org, it still isn’t clear to me why the Foundation and the SC had to be involved in a matter concerning a third-party cache (cache.flox.dev).

As a further reason this feels odd, the only mention of Nix in the NVIDIA announcement points to the unofficial wiki instead of wiki.nixos.org.

Yes, you are right: to trigger the substitution of CUDA-enabled packages, nixpkgs needs to be configured to evaluate to packages with CUDA enabled.
When using Flox, this is already done for packages in the flox-cuda/* catalog; Nix users need to set it themselves.
We’ll update the docs, thanks for flagging!
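A minimal sketch of that nixpkgs configuration in a flake-based setup. cudaSupport and allowUnfree are standard nixpkgs config options (CUDA is unfree); the nixos-unstable input, the x86_64-linux system, and PyTorch as the example package are illustrative assumptions. A path is only substituted from cache.flox.dev if Flox has built that exact nixpkgs revision.

```nix
# flake.nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      # Import nixpkgs with CUDA enabled so derivations evaluate to the
      # CUDA-enabled store paths a CUDA binary cache can hold.
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true; # CUDA libraries are unfree
          cudaSupport = true; # select CUDA-enabled package variants
        };
      };
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        packages = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];
      };
    };
}
```

On NixOS, the equivalent is setting nixpkgs.config.cudaSupport = true; and nixpkgs.config.allowUnfree = true; in configuration.nix.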

To avoid the cross reference, the same instructions as for the nix-community cache apply:

Note that Flox tracks nixpkgs’ “nixos-unstable” channel in github:flox/nixpkgs, so if your nixpkgs reference evaluates to a revision Flox has not yet built, you might need to follow github:flox/nixpkgs/unstable instead.
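For a non-flake setup such as a shell.nix, following Flox’s branch could look like the sketch below. It assumes GitHub’s standard archive URL for the unstable branch of flox/nixpkgs and reuses the cudaSupport/allowUnfree config; with flakes, the analogous change is pointing the nixpkgs input at github:flox/nixpkgs/unstable.

```nix
# shell.nix
let
  # Unpinned: fetches whatever the "unstable" branch of flox/nixpkgs points
  # to (Nix caches the download for tarball-ttl, then refetches).
  floxNixpkgs = builtins.fetchTarball
    "https://github.com/flox/nixpkgs/archive/unstable.tar.gz";

  pkgs = import floxNixpkgs {
    config = {
      allowUnfree = true; # CUDA libraries are unfree
      cudaSupport = true; # evaluate to CUDA-enabled package variants
    };
  };
in
pkgs.mkShell {
  packages = [ pkgs.cudaPackages.cudatoolkit ];
}
```

Running nix-shell in the same directory enters the environment. Pinning a specific commit (passing url and sha256 to fetchTarball) instead of the branch name makes the result reproducible.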

It’s a legal requirement on NVIDIA’s side. They have agreements with customers that someone is legally liable, so when they allow other vendors to ship their software, those vendors need to be liable as well. In particular, they want a for-profit company that can be sued in case something goes wrong with the packaging.

Definitely. Two reasons (and I’ll keep it short :sweat_smile:).
First, I’m a big believer in open, transparent collaboration with the community. As soon as this started to look real, I wanted the right folks looped in (the Foundation, the SC, the Nix CUDA team) so we could include their opinions and ship something that actually works for users. Where possible, we set Flox up to carry the legal/financial/infra lift so the first step would be easy for everyone else. In my mind, this is also the testing ground for future plans around working with orgs and the community. (As always, all feedback is welcome!)
Second, when a company the size of NVIDIA puts an “official” stamp on something, it can create pressure on the ecosystem. We have to be thoughtful about that. Bringing the community groups in early kept us aligned, avoided surprises for the CUDA team, and made them true partners in the work (they did a ton).
Hope that gives a clearer view into my thinking.

Also, great catch on the link. I just asked them to change that.

I must say, @ron, when you came on the scene and were suddenly inducted into the Foundation, I was sceptical about why an outsider was given such prominence.

While I have reservations about flox in particular, having you around has helped the community, and I’m thankful for the way this particular task was accomplished. Well done!

Means a lot @payas, thanks.

Thank you for the succinct reply.

Having re-read the roles of the Foundation and the SC, I think they align 100% with what you wrote. In this light, I agree it is a good idea to get people involved early in the process.

I had missed how much focus the board and the SC are supposed to give to partnerships with the private sector according to their responsibilities, hence my initial question.

This is what tripped me up initially: why the Foundation had to be involved if things would be taken care of by Flox. Mic92’s comment answered this for me, and your point on making the first step easier also makes sense in this context.

Agreed. I think Serge put it nicely, this is a political advancement, both for Flox and the Foundation.

It does; again, thank you.

Shouldn’t this be added to the CUDA page on the NixOS Wiki?
