tl;dr The Nixpkgs CUDA maintainers are looking for support in the form of funding and technical infrastructure to help make Nixpkgs the first choice for users of CUDA-related software. More about our mission and members here: CUDA Team on NixOS.org.
Before we begin, a thank-you to those who have funded these efforts and enabled @ConnorBaker to work on this full-time.
A special thank-you to my colleagues on the CUDA Team: @SergeK and @samuela. Your support and feedback have been invaluable, and when I work with you, I feel like anything is possible.
For our last update, see CUDA Team Roadmap and Call for Sponsors.
Recap
Many thanks to PDT Partners for funding all of the work below. Without your support, it could not have happened anywhere near as quickly as it did.
- Create Multiple-Output Derivations for CUDA-Redist Packages.
  - Completed by cudaPackages: multiple outputs for redistributables #240498.
  - Numbers (details in PR):
    - cuDNN closure and NAR size went from 2.4G to 1.1G (-1.3G), as does every package downstream of it which uses the new split output.
    - Magma closure size went from 2.9G to 1.6G (-1.3G).
    - PyTorch closure size went from 9.7G to 8.4G (-1.3G).
- Remove Magma from PyTorch’s Runtime Closure.
  - Completed by python3Packages.torch: statically link against magma #238465.
  - Numbers (details in PR):
    - The `torch-lib` NAR increases from 1006.5M to 1.3G (+300M), but the closure size decreases from 12.3G to 9.1G (-3.2G).
- Migrate PyTorch Closure from CUDA Toolkit to CUDA-Redist.
  - Completed by python3Packages.torch: migrate to CUDA redist from CUDA Toolkit #249259.
  - Numbers (details in PR):
    - The `torch-lib` NAR increases from 475.3M to 534.5M (+59M), but the closure size decreases from 11.3G to 4.9G (-6.4G).
New Sponsor: Anduril Industries
Many thanks to Anduril Industries for sponsoring this new round of work!
Timeline
| Task Area | Estimate (Weeks) |
|---|---|
| Multi-Platform and Cross-Compilation Support for Nixpkgs’ CUDA Package Set | 8 |
| Fix Standard C/C++ Library Linker Errors Caused by Nixpkgs’ CUDA Package Set | 4 |
| Migrate Derivations from CUDA Toolkit to CUDA Redistributables | 6 |
| TOTAL | 18 |
Task 1: Multi-Platform and Cross-Compilation Support for Nixpkgs’ CUDA Package Set
Context
NVIDIA’s CUDA redistributables provide support for multiple platforms. However, Nixpkgs currently hardcodes support for x86_64-linux, and does not support cross-compilation. This poses a number of challenges:
- Nixpkgs is unable to support Jetson, or generally any platform the CUDA redistributables support aside from `x86_64-linux`.
- Downstream users must either fork Nixpkgs to reimplement CUDA redistributable handling or repackage NVIDIA’s releases for other operating systems, like Debian.
- An inability to cross-compile packages necessitates creating binaries on the same systems on which they are to be run.
These can be resolved by generalizing the way Nixpkgs handles CUDA redistributables and introducing support for splicing (Nixpkgs’ cross-compilation implementation) to Nixpkgs’ CUDA package set.
Scope
Each platform supported by the CUDA redistributables maps to a system supported by Nixpkgs:
| Redist Architecture Name | Nixpkgs System Name | Nixpkgs Cross Attribute (`pkgsCross`) |
|---|---|---|
| `linux-ppc64le` | `powerpc64le-linux` | `powernv` |
| `linux-x86_64` | `x86_64-linux` | `x86_64-multiplatform` |
| `linux-aarch64` | `aarch64-linux` (Jetson) | `aarch64-multiplatform` |
| `linux-sbsa` | `aarch64-linux` | `aarch64-multiplatform` |
| `windows-x86_64` | `x86_64-windows` | `mingwW64` |
Enabling multi-platform support involves generalizing Nixpkgs’ handling of the CUDA redistributables. Additionally, introducing splicing to Nixpkgs’ CUDA package set provides the ability to cross-compile packages for embedded systems on larger, x86-based servers. Care should be taken to ensure that OpenCV and Magma both run on NVIDIA Jetson.
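As a rough illustration of what this would enable, the sketch below shows how a user might cross-compile a CUDA-enabled package for Jetson from an `x86_64-linux` builder once splicing support lands. The `pkgsCross` attribute follows the table above; the capability value and the choice of OpenCV are examples, not part of any committed interface.

```nix
# Hypothetical usage sketch, assuming the CUDA package set gains splicing
# (cross-compilation) support. Evaluated on an x86_64-linux machine, this
# would build OpenCV for aarch64-linux (Jetson).
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;            # CUDA packages are unfree
      cudaSupport = true;            # request CUDA-enabled variants
      cudaCapabilities = [ "7.2" ];  # e.g. Jetson Xavier (example value)
    };
  };
in
pkgs.pkgsCross.aarch64-multiplatform.opencv
```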
A best effort should be made to backport the relevant changes to the 23.05 release branch.
Timeline
| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Add multi-platform support to Nixpkgs’ CUDA-redist handling | Medium | 2 |
| Add cross-compilation support to Nixpkgs’ CUDA package set | High | 4 |
| Testing and review | Medium | 2 |
| TOTAL | | 8 |
Task 2: Fix Standard C/C++ Library Linker Errors Caused by Nixpkgs’ CUDA Package Set
Context
Each release of NVIDIA’s CUDA Compiler (NVCC) is compatible with a limited range of GCC and Clang releases. Unfortunately, these typically lag behind the releases Nixpkgs’ standard environment uses to build packages. Linking libraries produced by different versions of these compilers can result in missing standard C/C++ libraries or unresolved symbol errors.
Scope
As suggested by cudaPackages: extend cc-wrapper and rewrite cudaStdenv to properly solve libstdc++ issues #226165, this task involves rewriting Nixpkgs’ CUDA-specific standard environment to use an NVCC-compatible compiler together with the same version of the standard C/C++ library used by Nixpkgs’ standard environment. This will likely involve updating Nixpkgs’ `cc-wrapper` derivation.
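As a rough sketch of the intended direction (not the final design), the expression below assembles a standard environment whose compiler is an older, NVCC-compatible GCC while the C/C++ standard libraries are taken from the default `stdenv`; the specific GCC version is an assumption chosen only for illustration.

```nix
# Hypothetical sketch of an NVCC-compatible stdenv that still links against
# the default stdenv's libstdc++. The real implementation would derive the
# GCC version from NVCC's documented compatibility range.
{ pkgs }:

let
  nvccCompatibleCC = pkgs.wrapCCWith {
    # An older GCC accepted by the targeted NVCC release (gcc12 assumed here).
    cc = pkgs.gcc12.cc;
    # Reuse the default stdenv's compiler for the standard libraries so that
    # objects built here link cleanly against the rest of Nixpkgs.
    useCcForLibs = true;
    gccForLibs = pkgs.stdenv.cc.cc;
  };
in
# overrideCC swaps the compiler of an existing stdenv for the wrapped one.
pkgs.overrideCC pkgs.stdenv nvccCompatibleCC
```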
A best effort should be made to backport the relevant changes to the 23.05 release branch.
Timeline
| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Rewrite `cudaStdenv` to use standard libraries consistent with `stdenv` | Medium | 2 |
| Testing and review | Medium | 2 |
| TOTAL | | 4 |
Task 3: Migrate Derivations from CUDA Toolkit to CUDA Redistributables
Context
Introduced alongside the 11.4 release of the monolithic CUDA Toolkit installer, CUDA redistributables provide a more fine-grained way to pull in dependencies on NVIDIA’s libraries. Whereas the CUDA Toolkit installer provides a number of libraries in a single package, CUDA redistributables offer individually-packaged components, and generally remove the need to wrangle with the CUDA Toolkit installer monolith. Additionally, CUDA redistributables provide split outputs, allowing the inclusion of only that which a package requires: binaries, shared libraries, static libraries, headers, or any combination thereof.
Unfortunately, many of the CUDA-enabled packages offered by Nixpkgs still use the older monolithic CUDA Toolkit installer or otherwise include it in their closure. As such, any closure-size benefits gained by switching to CUDA redistributables are not realized until the CUDA Toolkit installer is fully removed from the closure.
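For illustration, a minimal sketch of what such a migration tends to look like for an imaginary package follows; the package itself and its dependency list are assumptions, but the `cudaPackages` attributes shown are the kind of individually packaged redistributable components this task would move derivations onto.

```nix
# Hypothetical derivation for an imaginary CUDA-enabled package, depending on
# individual redistributables rather than the monolithic cudaPackages.cudatoolkit.
{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # placeholder name
  version = "1.0";
  src = ./.;

  nativeBuildInputs = [
    cmake
    cudaPackages.cuda_nvcc # just the compiler, not the whole toolkit
  ];

  buildInputs = [
    # Individually packaged redistributables; because they have split outputs,
    # only the headers and shared libraries actually used end up in the closure.
    cudaPackages.cuda_cudart
    cudaPackages.libcublas
    cudaPackages.cudnn
  ];
}
```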
Scope
This task area involves rewriting the Nixpkgs derivations of CUDA-enabled software to leverage CUDA redistributables. PyTorch is excluded, as that work has been funded separately: CUDA Team Roadmap and Call for Sponsors - Announcements - NixOS Discourse.
A best effort should be made to backport the relevant changes to the 23.05 release branch.
Timeline
| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Migrate usages of the CUDA Toolkit installer to CUDA Redistributables | High | 4 |
| Testing and review | Medium | 2 |
| TOTAL | | 6 |
Future Work
In our original roadmap, we recognized two high-level areas for improvement: user experience and sustainable maintenance. We have made progress on both fronts, but there is still more to do! Many of the concerns and pain-points raised there are still relevant, and we hope to address them in the future.
Improve User Experience
- Provide new releases faster.
  - Automate runpath patching of CUDA-enabled packages to reduce errors and breakages caused by package changes.
  - Add proper FindCUDAToolkit.cmake support to simplify packaging.
  - Find more permanent build infrastructure to replace volunteer CPU-time to better populate our current Cachix binary cache.
  - Discuss binary distribution, via Hydra or otherwise.
  - Reduce source build time and binary size significantly.
- Improve runtime performance.
  - Patch package sources which hard-code supported CUDA versions or architectures to use Nixpkgs’ configuration.
  - Target user-requested CUDA architectures by leveraging `cudaCapabilities` (see the configuration sketch at the end of this list).
  - Leverage newer CUDA features like separable compilation (to carry around device-specific object files as outputs) and LTO (to improve runtime performance).
- Offer more, systematic customization options.
  - Standardization of `cudaCapabilities`.
  - Introduction of package-specific, CUDA-related configuration options.
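As context for the `cudaCapabilities` items above, the sketch below shows roughly the user-facing Nixpkgs configuration these options build on; the capability values are examples for common GPUs, not recommendations.

```nix
# Sketch of the existing user-facing knobs that the customization work would
# standardize and extend. Capability values are examples only.
import <nixpkgs> {
  config = {
    allowUnfree = true;                  # CUDA packages are unfree
    cudaSupport = true;                  # build CUDA-enabled variants where available
    cudaCapabilities = [ "7.5" "8.6" ];  # e.g. Turing and Ampere GPUs
    cudaForwardCompat = true;            # also emit PTX for newer architectures
  };
}
```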
Ensure Sustainable Maintenance
- Lower the bar for contributions.
  - Document best practices for creating CUDA-enabled packages.
  - Unify CUDA and CUDA-enabled package configuration interfaces across Nixpkgs.
  - Create a tutorial for packaging a CUDA-enabled application with Nixpkgs on nix.dev.
- Tend to build and test infrastructure.
  - Find more permanent build infrastructure to replace volunteer CPU-time to better populate our current Cachix binary cache.
  - Investigate options like escaping the sandbox or including impure tests in `passthru` to enable tests which are indicative of runtime behavior.
- Set up reliable processes for package creation, updates, and distribution.
  - Create, document, and harmonize CUDA Nixpkgs configuration options.
  - Investigate and document current maintainer workflows to record organizational knowledge.
  - Investigate and apply design patterns to increase package consistency and maintainability.
  - Document best practices for building and using CUDA-enabled packages to ease adoption.
- Increase Nixpkgs adoption within the scientific computing community.
  - Speak at conferences and meetups to increase awareness of Nixpkgs’ scientific computing capabilities.
  - Work with the NixOS marketing team to reach out to industry partners and better communicate the value of Nixpkgs to the scientific computing community.
Call for Sponsors
The CUDA maintainers team is seeking sponsors to help fund the work outlined in this document and to supply the required technical infrastructure. The current tasks parallelize well across multiple maintainers, which would reduce time to delivery.
If you or your organization:
- are impacted by these issues,
- have related issues with Nixpkgs CUDA support,
- would like to prioritize or accelerate certain work,
please consider supporting this effort directly or through the NixOS Foundation. Reach out via GitHub @connorbaker, or email connor.baker@tweag.io to get involved.