The `pytorch` package marks itself as broken if compiled with `rocmSupport=true`, “because `rocmPackages.hipblaslt` is unpackaged.”
What needs to happen to resolve that? Was it packaged in nixpkgs at some point in the past and has since bitrotted? If so, could I pin pytorch to a previous version of nixpkgs where it still works? What needs to happen for `hipblaslt` to be packaged again, and is it something I could reasonably teach myself to do?
Is there anything that would work as an interim workaround (e.g. enabling pytorch’s CUDA features and somehow passing it ZLUDA instead of CUDA)? If so, how would I do that?
vcunat
March 7, 2025, 7:17am
2
I think I’ve spotted similar discussion around this PR:
NixOS:staging
← LunNova:rocm-update
opened 04:55PM - 23 Dec 24 UTC
Fixes #337159
Fixes #383836
Fixes #379354
Bump to 6.3.3 for rocmPackages_6 … package set and associated updates in packages which depend on changed or newly introduced ROCm packages.
## Upstream PRs/issues Raised
- rocMLIR compile error with LLVM libc++ - https://github.com/ROCm/rocMLIR/pull/1708
- RCCL UB on non-RDMA kernels - https://github.com/ROCm/rccl/pull/1470 https://github.com/ROCm/rccl/issues/1469
- RCCL UB in ncclProxyNewConnection - raised as issue because not sure about fix - https://github.com/ROCm/rccl/issues/1468
- CLR UB in new Semaphore() - https://github.com/ROCm/clr/issues/118
- ROCr-Runtime UB due to signed left shift overflow - https://github.com/ROCm/ROCR-Runtime/pull/273 https://github.com/ROCm/ROCR-Runtime/issues/271
- ROCr-Runtime UB getting number of available devices - https://github.com/ROCm/ROCR-Runtime/issues/272 https://github.com/ROCm/ROCR-Runtime/pull/274
- composable_kernel build fails for gfx908 if built without optimization flags https://github.com/ROCm/composable_kernel/issues/1759
- pytorch inductor composable_kernel GEMM backend is very slow https://github.com/pytorch/pytorch/issues/143687
- hipcc UB https://github.com/ROCm/llvm-project/issues/182 https://github.com/ROCm/llvm-project/pull/183
- hipBLASlt build failure in TensileCreateExtOpLibraries https://github.com/ROCm/hipBLASLt/issues/1571
  - We currently have TensileCreateExtOpLibraries patched out so hipBLASlt is missing some ops
## TODO List
- [x] Fix rocmcxx GCC prefix
- [x] Contemplate trying to make a normal Nix style CC wrapper work again instead of this sysroot style mess and then don't because I spent 2 weeks on it already (please someone fix this)
- [x] Expand GPU targets list for *blas libraries
- [x] Maybe? expand GPU targets list for CK
  - CK seems to be ~untested on anything other than MI200/MI300 series so might be safer not to
  - Trying this out we can reduce the list if breakage is reported.
- [x] Hack cuda backend out of triton 3.2 so we can build torch for ROCm without deps on unfree cudart
- [x] Get compression of offload ~~and msgpack~~ working for hipblaslt. **10GB derivation is not ok.**
- [x] Apply https://github.com/ROCm/hipBLASLt/pull/1374
- [x] Remove debug info / dontStrip settings
- [x] Clean up the triton mess in rocm-modules/6/default.nix
- [x] Turn traces into TODO items in this list
- [x] Upstream patches
- [x] Resurrect binary compatibility patches for new COMGR (gfx1036 -> uses gfx1030 if 1036 not available)
- [x] Confirm patches are working correctly with "new" unbundler path which we have enabled
- [x] ~~Make use of working LLVM packages .override to simplify LLVM~~ overrideScope isn't present and is needed.
- [x] Import minimal set of pytorch changes to build with rocm 6.3 from https://github.com/LunNova/ml.nix/blob/main/pytorch-rocm.nix
- [x] ~~Allow better build parallelism by creating -minimal versions of some of the huge packages built for no gfx arches~~ Too difficult, not doing in this PR
- [x] Clean up hacks related to build parallelism
- [x] Document clang-ocl, rocm-thunk going away.
- [x] Fix migraph packages
- [x] Remove Tensile parallelism patches
- [ ] Convert in-tree patches to fetchpatch usage where possible
## Things done
- Built on platform(s)
  - [x] x86_64-linux
  - [ ] aarch64-linux
  - [ ] x86_64-darwin
  - [ ] aarch64-darwin
- For non-Linux: Is sandboxing enabled in `nix.conf`? (See [Nix manual](https://nixos.org/manual/nix/stable/command-ref/conf-file.html))
  - [ ] `sandbox = relaxed`
  - [x] `sandbox = true`
- [ ] Tested, as applicable:
  - [NixOS test(s)](https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests) (look inside [nixos/tests](https://github.com/NixOS/nixpkgs/blob/master/nixos/tests))
  - and/or [package tests](https://github.com/NixOS/nixpkgs/blob/master/pkgs/README.md#package-tests)
  - or, for functions and "core" functionality, tests in [lib/tests](https://github.com/NixOS/nixpkgs/blob/master/lib/tests) or [pkgs/test](https://github.com/NixOS/nixpkgs/blob/master/pkgs/test)
  - made sure NixOS tests are [linked](https://github.com/NixOS/nixpkgs/blob/master/pkgs/README.md#linking-nixos-module-tests-to-a-package) to the relevant packages
- [ ] Tested compilation of all packages that depend on this change using `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see [nixpkgs-review usage](https://github.com/Mic92/nixpkgs-review#usage)
- [ ] Tested basic functionality of all binary files (usually in `./result/bin/`)
- [25.05 Release Notes](https://github.com/NixOS/nixpkgs/blob/master/nixos/doc/manual/release-notes/rl-2505.section.md) (or backporting [24.11](https://github.com/NixOS/nixpkgs/blob/master/nixos/doc/manual/release-notes/rl-2411.section.md) and [25.05](https://github.com/NixOS/nixpkgs/blob/master/nixos/doc/manual/release-notes/rl-2505.section.md) Release notes)
  - [ ] (Package updates) Added a release notes entry if the change is major or breaking
  - [ ] (Module updates) Added a release notes entry if the change is significant
  - [ ] (Module addition) Added a release notes entry if adding a new NixOS module
- [ ] Fits [CONTRIBUTING.md](https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md).
(but I haven’t been watching these topics)
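
If you want to experiment with that PR's branch before it is merged, something along these lines might work as a starting point. This is an untested sketch, not a known-good configuration. It assumes you have checked out the PR branch (`rocm-update`) into a local directory (the path below is a placeholder), and that the `torch` package in that checkout picks up the nixpkgs-wide `config.rocmSupport` setting. Expect a long from-source build, since binary caches are unlikely to have these outputs.

```nix
# shell.nix: untested sketch for trying torch with ROCm from a PR checkout.
let
  # Hypothetical path: point this at your local checkout of the PR branch.
  nixpkgsSrc = /path/to/nixpkgs-rocm-update;

  pkgs = import nixpkgsSrc {
    # Ask nixpkgs to enable ROCm support in packages that offer it.
    config.rocmSupport = true;
  };
in
pkgs.mkShell {
  packages = [
    # A Python interpreter with torch built against the ROCm package set.
    (pkgs.python3.withPackages (ps: [ ps.torch ]))
  ];
}
```

Running `nix-shell` in the directory containing this file should then either build torch or fail at evaluation with the "marked as broken" message, which tells you whether the checkout you pointed at actually contains the hipblaslt packaging.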