CUDA Team Roadmap and Call for Sponsors

tl;dr The Nixpkgs CUDA maintainers are looking for support in the form of funding and technical infrastructure to help make Nixpkgs the first choice for users of CUDA-related software. More about our mission and members here: CUDA Team on NixOS.org.

Summary

GPU acceleration has become a fundamental prerequisite to machine learning work and scientific computing more generally. Nix, Nixpkgs, and NixOS give us the tools we need to build reproducible and robust packages and environments for these applications. It is the goal of the CUDA-maintainers team to improve the state of support for CUDA, one of the leading platforms for GPU-accelerated applications, in Nixpkgs.

On NixOS, using import nixpkgs { config.cudaSupport = true; } with a recent Nixpkgs and the CUDA-maintainers’ binary cache (graciously provided by @domenkozar and Cachix) is enough for most CUDA applications to “just work”. The situation outside NixOS is less rosy, but workable. Our Hercules CI builds a subset of Nixpkgs every 12 hours and populates the binary cache with the results. In combination with nixpkgs-upkeep, which automatically files GitHub issues on build failures, we have a robust system for catching regressions, allowing us to attend to errors within a matter of days.
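As a minimal sketch of the setup described above, a `shell.nix` along these lines enables CUDA support for the whole package set. The `allowUnfree` setting is included because the CUDA packages carry an unfree license. The choice of `python3` with `torch` is only an illustration, and binary cache configuration is not shown.

```nix
# shell.nix: an illustrative sketch, not an official template.
let
  pkgs = import <nixpkgs> {
    config = {
      # CUDA packages are unfree, so they must be explicitly allowed.
      allowUnfree = true;
      # Ask every CUDA-aware package in the set to build with CUDA.
      cudaSupport = true;
    };
  };
in
pkgs.mkShell {
  # Example payload only: a Python interpreter with PyTorch available.
  packages = [
    (pkgs.python3.withPackages (ps: [ ps.torch ]))
  ];
}
```

Entering the environment with `nix-shell` should then either fetch the CUDA-enabled builds from a configured binary cache or build them locally.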

However, Nixpkgs’ support for CUDA does not realize its full potential. For this reason, the CUDA-maintainers team is proposing a roadmap towards a better experience using and maintaining CUDA-enabled software in Nixpkgs. We are seeking support from community contributors and industry sponsors who share our goals to make this work possible.

Motivation

Driven largely by the explosion in machine learning applications, CUDA-enabled packages have grown in number and capability. Building these packages from source is typically a time- and resource-intensive process involving tricky dependency management and myriad build systems. As such, releases of these packages often reach users in the form of binaries built to support a wide range of CUDA architectures, distributed through package repositories like PyPI and Anaconda. PyTorch, for example, makes binaries available for different operating systems, architectures, and several CUDA releases; users on unsupported platforms must build from source.

Nixpkgs is more than just a package repository: it is a repository of package recipes. Nixpkgs’ declarative approach to package management allows us to build packages from source in a reproducible and robust manner, even when we modify the recipes! Where PyTorch and other large projects might resort to inventing their own ad-hoc infrastructure, Nixpkgs gives us the freedom to modify any component, upstream or downstream, of PyTorch in a principled manner. This allows us to create bespoke binaries optimized for performance or specific hardware, without having to master the build systems used by the package or its dependencies.
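To illustrate what modifying an upstream component can look like, here is a small sketch that uses an overlay to swap the default CUDA package set and then rebuilds PyTorch against it. It assumes the Nixpkgs revision in use provides `cudaPackages_11_8` and that PyTorch takes `cudaPackages` from the top-level package set; the overlay name `pinCuda` is made up for this example.

```nix
let
  # Hypothetical overlay: replace the default CUDA package set.
  pinCuda = final: prev: {
    # Assumption: this Nixpkgs revision ships cudaPackages_11_8.
    cudaPackages = prev.cudaPackages_11_8;
  };

  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
    overlays = [ pinCuda ];
  };
in
# Packages that read cudaPackages from the top-level set are rebuilt
# from source against the replaced CUDA version.
pkgs.python3Packages.torch
```

No change to PyTorch's own build scripts is needed here; the recipe is re-evaluated with a different input.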

Related work

The efforts of the Nixpkgs CUDA Team are narrow: its goals are limited to ensuring most CUDA-enabled applications work correctly and are reasonably easy to consume. In contrast, the higher-level motivation of the community is to support an entire spectrum of HPC and Scientific Computing workflows, including but not limited to “deep learning” on NixOS hosts. Similarly, we wish we supported a wider range of accelerators, extending to ROCm, Apple’s Metal, XLA devices, and, somewhat ironically, modern CPUs. As these desiderata are beyond the scope of the CUDA team’s focus, we do not try to satisfy them here. However, it is important to note that there are relevant efforts occurring in parallel across a number of communities and devices.

Goals

The singular focus of the CUDA-maintainers team is to make Nixpkgs the first choice for users of CUDA-related software. Though not a new team, we are renewing our efforts to improve the experience of using and maintaining CUDA-enabled software in Nixpkgs.

The most up-to-date version of things we are working on can be found in the CUDA Team Project, which includes breakdowns by both epic and feature.

Improve User Experience

Ensure Sustainable Maintenance

  • Lower the bar for contributions.
    • Document best practices for creating CUDA-enabled packages.
    • Unify CUDA and CUDA-enabled package configuration interfaces across Nixpkgs.
    • Create a tutorial for packaging a CUDA-enabled application with Nixpkgs on nix.dev.
  • Tend to build and test infrastructure.
    • Find more permanent build infrastructure to replace volunteer CPU-time to better populate our current Cachix binary cache.
    • Investigate options like escaping the sandbox or including impure tests in passthru to enable tests which are indicative of runtime behavior.
  • Set up reliable processes for package creation, updates, and distribution.
    • Create, document, and harmonize CUDA Nixpkgs configuration options.
    • Investigate and document current maintainer workflows to record organizational knowledge.
    • Investigate and apply design patterns to increase package consistency and maintainability.
    • Document best practices for building and using CUDA-enabled packages to ease adoption.
  • Increase Nixpkgs adoption within the scientific computing community.
    • Speak at conferences and meetups to increase awareness of Nixpkgs’ scientific computing capabilities.
    • Work with the NixOS marketing team to reach out to industry partners and better communicate the value of Nixpkgs to the scientific computing community.

Roadmap

PDT Partners and Tweag have generously offered to get this effort started. @connorbaker can thus begin work on the following tasks, but more support is needed to implement them fully. If you or your company are interested in contributing to this work, please see the Call for Sponsors section below for more information.

The overarching strategy is to deliver value to Nixpkgs’ users as quickly as possible. For that reason, our roadmap has an upfront focus on improving user experience and package quality. Tree-wide changes, like standardizing CUDA options, will focus on high-profile packages like pytorch and tensorflow first to reduce time-to-value for users. As we get further along the roadmap, the team’s focus will shift to support tooling, as well as CI and caching infrastructure.

Please note that the roadmap is not exhaustive, but provides a starting point for directing efforts towards accomplishing our goals.

Timeline

| Task Area | Estimate (Weeks) | Funding (Weeks) |
|---|---|---|
| Create Multiple-Output Derivations for CUDA-Redist Packages | 5 | PDT Partners: 5 |
| Remove Magma from PyTorch’s Runtime Closure | 2 | PDT Partners: 2 |
| Migrate PyTorch Closure from CUDA Toolkit to CUDA-Redist | 12 | PDT Partners: 4 |
| Standardize CUDA Options | 3 | 0 |
| Automate Runpath Patching of CUDA-Enabled Packages | 8 | 0 |
| Patch CUDA-Enabled Packages to Build Requested Targets | 12 | 0 |
| TOTAL | 42 | 11 |

Task 1: Create Multiple-Output Derivations for CUDA-Redist Packages (Funded by PDT Partners)

The CUDA toolkit, hereafter cudaPackages.cudatoolkit, is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications, packaged in a multi-gigabyte monolithic installer. Since CUDA 11.4, NVIDIA has maintained CUDA redistributables (“CUDA-redist”): individually packaged components, such as cudaPackages.cudart, meant to facilitate redistribution and inclusion in downstream projects.

Historically, CUDA-enabled software in Nixpkgs has leveraged cudaPackages.cudatoolkit, resulting in needlessly large build and runtime closures. Recently, NixOS’ CUDA-maintainers group has made efforts to switch such software over to CUDA-redist, allowing for smaller closures. However, the transition is far from complete and the CUDA-redist packages themselves often provide unused components, like static libraries or header files, in their default output.

Deliverables

This task area seeks to improve the state of the ecosystem by splitting the outputs of each CUDA-redist package into its component parts (e.g., bin, include, or static). The CUDA-redist packages should have (when appropriate) outputs for binaries, dynamic libraries, static libraries, and headers. These new outputs will co-exist with the current output out, which contains all package files, effectively making these slimmer outputs “opt-in”, for backwards-compatibility. In particular, this involves assisting the efforts described in both cudaPackages: split outputs and multiple-outputs.sh: Install static libraries into $static->$dev->$out.
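The split-output layout described above can be sketched with a toy derivation. Everything here is a placeholder: the name `example-redist`, the file names, and the exact set of outputs are assumptions for illustration, not the layout the CUDA-redist packages actually use.

```nix
{ pkgs ? import <nixpkgs> { } }:

# Toy stand-in for a CUDA-redist package with split outputs.
pkgs.runCommand "example-redist-0.0.0"
  {
    # `out` stays the default output; the others are opt-in.
    outputs = [ "out" "lib" "static" "dev" ];
  }
  ''
    # Slim outputs: one kind of file each.
    mkdir -p "$lib/lib" "$static/lib" "$dev/include"
    echo "placeholder shared library" > "$lib/lib/libexample.so"
    echo "placeholder static archive" > "$static/lib/libexample.a"
    echo "/* placeholder header */" > "$dev/include/example.h"

    # Backwards-compatible `out` that still contains every file.
    mkdir -p "$out/lib" "$out/include"
    cp "$lib/lib/libexample.so" "$static/lib/libexample.a" "$out/lib/"
    cp "$dev/include/example.h" "$out/include/"
  ''
```

A consumer that only needs the shared library would then depend on the `lib` output alone (for example `(import ./example-redist.nix { }).lib`) and keep the static archive and headers out of its closure.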

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Familiarization with CUDA-redist packages | Low | 1 |
| Refactor Nixpkgs CUDA-redist packaging | Medium | 2 |
| Testing and review | Medium | 2 |
| TOTAL | | 5 |

Task 2: Remove Magma from PyTorch’s Runtime Closure (Funded by PDT Partners)

As one of the premier Machine Learning libraries available in Nixpkgs, the quality of PyTorch’s packaging has an outsized impact on how users view Nixpkgs’ Machine Learning readiness. Unfortunately, it has several shortcomings, not least of which is its closure size.

PyTorch’s outsized closure, in particular its runtime closure, is due in part to the inclusion of libraries from the Magma package. Magma libraries are a build-time dependency and should not pollute the runtime closure. The end result for users is a runtime closure much larger than it should be.

Deliverables

This task area involves a rewrite of the PyTorch package to ensure Magma is not referenced in its runtime closure.
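One way to guard against Magma creeping back into the closure is Nix's `disallowedRequisites` attribute, which fails the build if a listed store path appears anywhere in the output's closure. The sketch below assumes that `pkgs.magma` is the same derivation the PyTorch build actually links against, which may not hold if PyTorch uses a CUDA-specific Magma variant.

```nix
{ pkgs ? import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  }
}:

pkgs.python3Packages.torch.overrideAttrs (old: {
  # Fail the build if Magma is reachable from any output.
  # Assumption: pkgs.magma is the Magma build used by torch.
  disallowedRequisites = (old.disallowedRequisites or [ ]) ++ [ pkgs.magma ];
})
```

With such a check in place, a regression surfaces as a build failure rather than as a silently larger closure.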

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Refactor PyTorch | Low | 1 |
| Testing and review | Low | 1 |
| TOTAL | | 2 |

Task 3: Migrate PyTorch Closure from CUDA Toolkit to CUDA-Redist (Partially Funded by PDT Partners)

As one of the premier Machine Learning libraries available in Nixpkgs, the quality of PyTorch’s packaging has an outsized impact on how users view Nixpkgs’ Machine Learning readiness. Unfortunately, it has several shortcomings, not least of which is its closure size.

PyTorch’s outsized closure is due in part to the use of the monolithic CUDA Toolkit instead of CUDA-redist. The end result for users is a runtime closure much larger than it should be.

Deliverables

This task area involves a rewrite of the PyTorch package and any of the packages in its closure which depend on the CUDA Toolkit. Resulting packages should instead use individual packages from CUDA-redist, thus removing the CUDA Toolkit monolith from the closure.
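For a single package, the migration described above amounts to replacing one dependency on the monolith with dependencies on the individual components it really uses. The package below is hypothetical (`example-cuda-app` and its `src` are placeholders), and which components a real package needs has to be determined case by case.

```nix
# Intended to be used with callPackage; a sketch, not a real package.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-app"; # hypothetical package
  version = "0.0.0";
  src = ./.;

  # Before: buildInputs = [ cudaPackages.cudatoolkit ];

  # After: only the compiler at build time...
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];

  # ...and only the libraries that are actually linked.
  buildInputs = [
    cudaPackages.cuda_cudart
    cudaPackages.libcublas
  ];
}
```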

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Refactor NCCL | Medium | 2 |
| Refactor PyTorch | High | 4 |
| Refactor remainder of closure | High | 4 |
| Testing and review | Medium | 2 |
| TOTAL | | 12 |

Task 4: Standardize CUDA Options (Unfunded)

Currently, CUDA-enabled packages in Nixpkgs have a variety of configuration options, which are often inconsistent between packages. Understandably, this makes it difficult for users to understand how to configure CUDA-enabled packages, and for maintainers to understand how to package them.

Deliverables

This task area involves standardizing configuration options for CUDA-enabled packages in Nixpkgs. A consistent and relevant set of options for CUDA-enabled packages should be created and documented, and existing packages should be updated to use it.
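As a sketch of what a uniform interface could look like, the hypothetical package below takes a single `cudaSupport` argument that defaults to the global `config.cudaSupport` setting and derives all CUDA-specific inputs from it. The package name and the `--enable-cuda` configure flag are invented for illustration.

```nix
# Intended to be used with callPackage; a sketch, not a real package.
{ lib
, stdenv
, config
, cudaPackages
  # One switch, defaulting to the Nixpkgs-wide setting.
, cudaSupport ? config.cudaSupport or false
}:

stdenv.mkDerivation {
  pname = "example-lib"; # hypothetical package
  version = "0.0.0";
  src = ./.;

  nativeBuildInputs = lib.optionals cudaSupport [ cudaPackages.cuda_nvcc ];
  buildInputs = lib.optionals cudaSupport [ cudaPackages.cuda_cudart ];

  # Hypothetical flag; real packages expose their own switches.
  configureFlags = lib.optionals cudaSupport [ "--enable-cuda" ];
}
```

Users can then enable CUDA globally through `config.cudaSupport`, or per package with `example-lib.override { cudaSupport = true; }`.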

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Create and document standard options | Low | 1 |
| Update Nixpkgs to use standard options | Low | 1 |
| Testing and review | Low | 1 |
| TOTAL | | 3 |

Task 5: Automate Runpath Patching of CUDA-Enabled Packages (Unfunded)

Unlike other distributions, NixOS does not install libraries or packages into global locations like /usr/lib. Instead, NixOS installs each package under its own prefix in the Nix store, and relies on the RPATH and RUNPATH of binaries to find dependencies. This is a powerful feature, as it allows for multiple versions of the same library to be installed simultaneously, and for packages to be installed without root privileges. CUDA-enabled packages in Nixpkgs have historically used the addOpenGLRunpath function to patch the RUNPATH of binaries to include the location of relevant libraries and drivers. Unfortunately, this is a manual process, and is not always done correctly.

Deliverables

This task area involves automating the runpath patching of CUDA-enabled packages in Nixpkgs by migrating from addOpenGLRunpath to autoAddOpenGLRunpathHook tree-wide, where possible.
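The difference between the two approaches can be sketched on a hypothetical package. The example assumes `autoAddOpenGLRunpathHook` is available as a `callPackage` argument in the Nixpkgs revision in use; its exact location in the package set may differ.

```nix
# Intended to be used with callPackage; a sketch, not a real package.
{ stdenv, autoAddOpenGLRunpathHook }:

stdenv.mkDerivation {
  pname = "example-cuda-tool"; # hypothetical package
  version = "0.0.0";
  src = ./.;

  # Manual approach being replaced: every binary must be listed by hand.
  #   nativeBuildInputs = [ addOpenGLRunpath ];
  #   postFixup = "addOpenGLRunpath $out/lib/libexample.so";

  # Automated approach: the hook patches the installed ELF files
  # during fixup, so nothing has to be enumerated.
  nativeBuildInputs = [ autoAddOpenGLRunpathHook ];
}
```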

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Refactor Nixpkgs to use autoAddOpenGLRunpathHook | High | 4 |
| Testing and review | High | 4 |
| TOTAL | | 8 |

Task 6: Patch CUDA-Enabled Packages to Build Requested Targets (Unfunded)

Some packages have a hard-coded list of architectures and platforms for which they can be built. Others, like PyTorch, go a step further and prevent extensions built against configurations outside an allowed list from being loaded. While a good way to prevent users from attempting to build for platforms which are truly unsupported, or to ensure program extensions target the same platform as the base application, this is problematic for Nixpkgs. Owing to granular control over dependencies and the ability to ensure programs consistently target the same platform, such restrictions are unnecessary and prevent users from building for their own platforms.

Deliverables

This task area involves patching CUDA-enabled packages in Nixpkgs to build requested targets, rather than a hard-coded list of targets. It also involves patching packages to remove functionality which prevents extensions built against the requested targets from being loaded.
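From the user's side, requesting specific targets could look like the following sketch, which uses the Nixpkgs `cudaCapabilities` and `cudaForwardCompat` configuration options. The capability `8.6` is just an example value; whether a given package honors the request depends on the patching work described above.

```nix
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
      # Build only for the requested compute capability (example value).
      cudaCapabilities = [ "8.6" ];
      # Do not additionally emit PTX for newer, unlisted architectures.
      cudaForwardCompat = false;
    };
  };
in
pkgs.python3Packages.torch
```

Restricting the target list generally shortens build times and shrinks the resulting binaries, at the cost of portability across GPUs.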

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Patch Nixpkgs to build requested targets | High | 4 |
| Patch Nixpkgs to remove target restrictions | High | 4 |
| Testing and review | High | 4 |
| TOTAL | | 12 |

Future Work

Upstream CUDA packages move very quickly in terms of features and breaking changes. Continued maintenance will be easier once the outlined work is complete, but it will still be required to provide up-to-date and working CUDA-enabled software.

Future work requires we have sustainable, community-operated computational resources to keep builds and tests running in order to catch and fix breakages in a timely manner.

This is not accounted for in the current plan, and this is where we will need the most support from stakeholders.

Risks

Build Infrastructure

The current build infrastructure consists of compute generously donated by the CUDA maintainers themselves. Unfortunately, this is not sustainable in the long term. We need to find a way to provide compute resources for the CUDA maintainers to use, either through a community-operated build farm or a cloud provider.

This is perhaps one of the largest risks to the project, as an inability to reliably and quickly build and review changes slows maintenance, chills community involvement, and reduces our ability to populate our binary cache for users. Additionally, several foundational packages, such as OpenCV and, transitively, GStreamer, use CUDA. As a result, changes to core CUDA packages or those which rely on them can trigger “rebuild-the-world” events.

Funding to support build infrastructure is critical to ensure the long-term success of the project.

Test Infrastructure

Unlike most other packages in Nixpkgs, CUDA-accelerated packages require a GPU to be adequately tested. Normally the Nix sandbox is set up such that the builders, and thus the tests run during checkPhase, do not have access to the CUDA devices. As a result, the success of the checkPhase does not indicate whether a CUDA-accelerated package will work at runtime. It is possible to conditionally expose the GPU for specially marked derivations. We could maintain a collection of tests that ensure basic GPU functionality of our packages, and we could run these tests in an external CI. Maintaining such a collection in-tree in Nixpkgs would facilitate better synchronization between the packages and the tests.

Funding work to let the sandbox access host hardware during checkPhase, or to provide community-operated testing infrastructure, would help us ensure the quality of CUDA-accelerated packages.

Community Involvement

The CUDA maintainers are a small group of volunteers, and the current CUDA ecosystem in Nixpkgs is the result of their hard work. However, the current processes are not viable in the long term without more community involvement and funding.

In the short term, funding members like @connorbaker allows a faster pace of work and feature delivery. However, funding for documentation, tooling, and community outreach would help us attract new maintainers, grow the community, and enjoy a more sustainable pace of work.

Call for Sponsors

The CUDA-maintainers team is seeking sponsors to help fund the work outlined in this document and supply the required technical infrastructure. The current tasks can be parallelized well across multiple maintainers in order to reduce time to delivery.

If you or your organization:

  • are impacted by these issues,
  • have related issues with Nixpkgs CUDA support,
  • would like to prioritize or accelerate certain work,

please consider supporting this effort directly or through the NixOS Foundation. Reach out via GitHub @connorbaker, or email connor.baker@tweag.io to get involved.


It might be helpful to get an explicit project section in Open Collective similar to the documentation project.


I think I can find some time to review the changes if they don’t get out of hand.


Your reports are always amazing :star_struck:

New update!

CUDA Team Roadmap Update (2023-08-29).
