CUDA Team Roadmap Update (2023-08-29)

tl;dr The Nixpkgs CUDA maintainers are looking for support in the form of funding and technical infrastructure to help make Nixpkgs the first choice for users of CUDA-related software. More about our mission and members here: CUDA Team on NixOS.org.

Before we begin, a thank-you to those that have funded these efforts and enabled @ConnorBaker to work on this full-time:

A special thank-you to my colleagues on the CUDA Team: @SergeK and @samuela. Your support and feedback have been invaluable, and when I work with you, I feel like anything is possible.

For our last update, see CUDA Team Roadmap and Call for Sponsors.

Recap

Many thanks to PDT Partners for funding all the below work. Without your support, this could not have happened anywhere near as quickly as it did.

New Sponsor: Anduril Industries

Many thanks to Anduril Industries for sponsoring this new round of work!

Timeline

| Task Area | Estimate (Weeks) |
|---|---|
| Multi-Platform and Cross-Compilation Support for Nixpkgs’ CUDA Package Set | 8 |
| Fix Standard C/C++ Library Linker Errors Caused by Nixpkgs’ CUDA Package Set | 4 |
| Migrate Derivations from CUDA Toolkit to CUDA Redistributables | 6 |
| TOTAL | 18 |

Task 1: Multi-Platform and Cross-Compilation Support for Nixpkgs’ CUDA Package Set

Context

NVIDIA’s CUDA redistributables provide support for multiple platforms. However, Nixpkgs currently hardcodes support for x86_64-linux, and does not support cross-compilation. This poses a number of challenges:

  • Nixpkgs is unable to support Jetson, or generally any platform the CUDA redistributables support aside from x86_64-linux.
  • Downstream users must either fork Nixpkgs to reimplement CUDA redistributable handling or repackage NVIDIA’s releases for other operating systems, like Debian.
  • Without cross-compilation, binaries must be built on the same kind of system they are meant to run on.

These can be resolved by generalizing the way Nixpkgs handles CUDA redistributables and introducing support for splicing (Nixpkgs’ cross-compilation implementation) to Nixpkgs’ CUDA package set.

Scope

Each platform supported by the CUDA redistributables maps to a system supported by Nixpkgs:

| Redist Architecture Name | Nixpkgs System Name | Nixpkgs Cross Attribute (pkgsCross) |
|---|---|---|
| linux-ppc64le | powerpc64le-linux | powernv |
| linux-x86_64 | x86_64-linux | x86_64-multiplatform |
| linux-aarch64 | aarch64-linux (Jetson) | aarch64-multiplatform |
| linux-sbsa | aarch64-linux | aarch64-multiplatform |
| windows-x86_64 | x86_64-windows | mingwW64 |
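
To make the mapping concrete, here is a minimal sketch of how it could be written as plain Nix data. The names `redistArchToSystem`, `systemToRedistArch`, and `isJetson` are hypothetical and are not part of Nixpkgs. The point it illustrates is that `aarch64-linux` corresponds to two redist architectures, so the reverse lookup needs an extra piece of information.

```nix
# Hypothetical helpers, not part of Nixpkgs; a sketch of the table above.
let
  # Forward direction: redist architecture name -> Nixpkgs system name.
  redistArchToSystem = {
    "linux-ppc64le" = "powerpc64le-linux";
    "linux-x86_64" = "x86_64-linux";
    "linux-aarch64" = "aarch64-linux"; # Jetson
    "linux-sbsa" = "aarch64-linux";
    "windows-x86_64" = "x86_64-windows";
  };

  # Reverse direction: aarch64-linux is ambiguous, so the caller must say
  # whether the target is a Jetson device (assumed flag: isJetson).
  systemToRedistArch = { system, isJetson ? false }:
    if system == "aarch64-linux" then
      (if isJetson then "linux-aarch64" else "linux-sbsa")
    else if system == "x86_64-linux" then "linux-x86_64"
    else if system == "powerpc64le-linux" then "linux-ppc64le"
    else if system == "x86_64-windows" then "windows-x86_64"
    else throw "no CUDA redistributable for system ${system}";
in
{
  inherit redistArchToSystem systemToRedistArch;
  example = systemToRedistArch { system = "aarch64-linux"; isJetson = true; };
}
```

Evaluating the `example` attribute of this expression yields `"linux-aarch64"`; with `isJetson` left at its default it would yield `"linux-sbsa"`.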

Enabling multi-platform support involves generalizing Nixpkgs’ handling of the CUDA redistributables. Additionally, introducing splicing to Nixpkgs’ CUDA package set provides the ability to cross-compile packages for embedded systems on larger, x86-based servers. Care should be taken to ensure that OpenCV and Magma both run on NVIDIA Jetson.
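
As a rough illustration of the intended end state, the sketch below requests a CUDA-enabled OpenCV for aarch64 from an x86_64 build machine. It is an assumption about how the result would be used, not a description of current behavior: until splicing lands in the CUDA package set, an expression like this may fail to evaluate or build. A real Jetson target would likely need further configuration that is omitted here.

```nix
# Sketch only: shows the desired usage once cross-compilation is supported.
let
  pkgs = import <nixpkgs> {
    system = "x86_64-linux"; # build platform: a larger x86-based server
    config = {
      allowUnfree = true; # CUDA packages are unfree
      cudaSupport = true; # enable CUDA in packages that support it
    };
  };
in
# Host platform: aarch64-linux, via the cross attribute from the table above.
pkgs.pkgsCross.aarch64-multiplatform.opencv
```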

A best-effort attempt should be made to backport the relevant changes to the 23.05 branch.

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Add multi-platform support to Nixpkgs’ CUDA-redist handling | Medium | 2 |
| Add cross-compilation support to Nixpkgs’ CUDA package set | High | 4 |
| Testing and review | Medium | 2 |
| TOTAL | | 8 |

Task 2: Fix Standard C/C++ Library Linker Errors Caused by Nixpkgs’ CUDA Package Set

Context

Each release of NVIDIA’s CUDA Compiler (NVCC) is compatible with a range of GCC and Clang releases. Unfortunately, these typically lag behind the release Nixpkgs’ standard environment uses to build packages. Linking libraries produced by different versions of these compilers can result in missing standard C/C++ libraries, or symbol errors.

Scope

As suggested in Nixpkgs issue #226165 (“cudaPackages: extend cc-wrapper and rewrite cudaStdenv to properly solve libstdc++ issues”), this task rewrites Nixpkgs’ CUDA-specific standard environment to use an NVCC-compatible compiler together with the same version of the standard C/C++ library that Nixpkgs’ standard environment uses. This will likely involve updating Nixpkgs’ cc-wrapper derivation.

A best-effort attempt should be made to backport the relevant changes to the 23.05 branch.

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Rewrite cudaStdenv to use standard libraries consistent with stdenv | Medium | 2 |
| Testing and review | Medium | 2 |
| TOTAL | | 4 |

Task 3: Migrate Derivations from CUDA Toolkit to CUDA Redistributables

Context

Introduced alongside the 11.4 release of the monolithic CUDA Toolkit installer, CUDA redistributables provide a more fine-grained way to pull in dependencies on NVIDIA’s libraries. Whereas the CUDA Toolkit installer provides a number of libraries in a single package, CUDA redistributables offer individually-packaged components, and generally remove the need to wrangle with the CUDA Toolkit installer monolith. Additionally, CUDA redistributables provide split outputs, allowing the inclusion of only that which a package requires: binaries, shared libraries, static libraries, headers, or any combination thereof.

Unfortunately, many of the CUDA-enabled packages offered by Nixpkgs use the older monolithic CUDA Toolkit installer or otherwise include it in their closure. As such, any closure size benefits gained by switching to CUDA redistributables are withheld until the CUDA Toolkit installer is fully removed from the closure.

Scope

This task area involves rewriting the Nixpkgs derivations of CUDA-enabled software to leverage CUDA redistributables. PyTorch is excluded from this effort, as that work has been funded separately; see CUDA Team Roadmap and Call for Sponsors.
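
A minimal sketch of what such a migration might look like for a single package follows. The package itself (`my-cuda-app`, its version, and its source) is a placeholder, and the choice of components is an assumption for illustration; a real package would list whichever redistributable components it actually uses.

```nix
# Hypothetical package expression, intended for use with callPackage.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # placeholder name
  version = "0.1.0"; # placeholder version
  src = ./.; # placeholder source

  # Before: the monolithic toolkit brings every CUDA library into the closure.
  # buildInputs = [ cudaPackages.cudatoolkit ];

  # After: depend only on the individual components the package needs.
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  buildInputs = [
    cudaPackages.cuda_cudart
    cudaPackages.libcublas
  ];
}
```

The closure only shrinks once no dependency of the package still pulls in the CUDA Toolkit installer, so a migration like this has to be carried through the whole dependency chain.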

A best-effort attempt should be made to backport the relevant changes to the 23.05 branch.

Timeline

| Task | Complexity | Estimate (Weeks) |
|---|---|---|
| Migrate usages of CUDA Toolkit installer to CUDA Redistributables | High | 4 |
| Testing and review | Medium | 2 |
| TOTAL | | 6 |

Future Work

In our original roadmap, we recognized two high-level areas for improvement: user experience and sustainable maintenance. We have made progress on both fronts, but there is still more to do! Many of the concerns and pain-points raised there are still relevant, and we hope to address them in the future.

Improve User Experience

Ensure Sustainable Maintenance

  • Lower the bar for contributions.
    • Document best-practices for creating CUDA-enabled packages.
    • Unify CUDA and CUDA-enabled package configuration interfaces across Nixpkgs.
    • Create a tutorial for packaging a CUDA-enabled application with Nixpkgs on nix.dev.
  • Tend to build and test infrastructure.
    • Find more permanent build infrastructure to replace volunteered CPU time and better populate our current Cachix binary cache.
    • Investigate options like escaping the sandbox or including impure tests in passthru to enable tests which are indicative of runtime behavior.
  • Set up reliable processes for package creation, updates, and distribution.
    • Create, document, and harmonize CUDA Nixpkgs configuration options.
    • Investigate and document current maintainer workflows to record organizational knowledge.
    • Investigate and apply design patterns to increase package consistency and maintainability.
    • Document best-practices for building and using CUDA-enabled packages to ease adoption.
  • Increase Nixpkgs adoption within the scientific computing community.
    • Speak at conferences and meetups to increase awareness of Nixpkgs’ scientific computing capabilities.
    • Work with the NixOS marketing team to reach out to industry partners and better communicate the value of Nixpkgs to the scientific computing community.

Call for Sponsors

The CUDA-maintainers team is seeking sponsors to help fund the work outlined in this document and supply the required technical infrastructure. The current tasks can be parallelized well across multiple maintainers in order to reduce time to delivery.

If you or your organization:

  • are impacted by these issues,
  • have related issues with Nixpkgs CUDA support,
  • would like to prioritize or accelerate certain work,

please consider supporting this effort directly or through the NixOS Foundation. Reach out via GitHub @connorbaker, or email connor.baker@tweag.io to get involved.
