Booting from an external Thunderbolt 3/4 SSD enclosure

Hello,

I’ve been trying to install NixOS onto an external Thunderbolt 3/4 SSD and boot it from a Framework 13 AMD 7040 laptop.

The install succeeds and the thunderbolt kernel module is properly included in stage 1, but something still seems to be missing to create the /dev node and perhaps to authorize the device. The boot just sits there waiting for the /dev node to appear.

Here’s an excerpt from my configuration.nix:

  # Use the systemd-boot EFI boot loader.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  boot.initrd.services.udev.packages = [ pkgs.bolt ];
  boot.initrd.services.udev.binPackages = [ pkgs.bolt ];
  boot.initrd.systemd.packages = [ pkgs.bolt ];
  boot.initrd.systemd.enable = true;

  services.hardware.bolt.enable = true;

I’m basically trying to replicate what services.hardware.bolt.enable does, but in stage 1 of the boot (the initrd), hoping it will automatically authorize the device and create the /dev/nvme* node, which happens properly on an already booted system.

I’ve gotten into the emergency shell with boot.shell_on_fail and systemd.setenv=SYSTEMD_SULOGIN_FORCE=1, and it seems to be in an odd state. The systemd unit isn’t started when the SSD is plugged in, and while find /nix/store -name "*bolt*" lists bolt in the store, that path doesn’t exist when I ls it.
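
For anyone reproducing this: the two parameters above can be typed at the boot menu, or set persistently. A minimal sketch, assuming the standard boot.kernelParams option is used (it is not part of the excerpt above):

```nix
{
  # Debugging aid only: drop to a shell when stage 1 fails.
  boot.kernelParams = [
    "boot.shell_on_fail"
    "systemd.setenv=SYSTEMD_SULOGIN_FORCE=1"
  ];
}
```

Both settings should be removed again once the boot works, since they open an unauthenticated shell on failure.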

What can I do to get booting from this external thunderbolt SSD?

EDIT:
There seem to be two issues here:

  1. Missing thunderbolt authorization.
    a) This can probably be done in some way similar to the above, but boltd seems to need /usr.
    b) This can be done with a manual udev rule:
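boot.initrd.services.udev.rules = ''
    # Replace the placeholder with the unique_id that
    # udevadm info --attribute-walk reports for the enclosure.
    ACTION=="add|change", SUBSYSTEM=="thunderbolt", \
    ATTR{unique_id}=="get from udevadm info --attribute-walk", \
    ATTR{authorized}="1"
'';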
  2. There seems to be a kernel or motherboard bug with this particular external SSD enclosure. For some reason the drive needs to be power cycled, otherwise I’m hit with:
nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0

I tried patching the kernel with NVME_QUIRK_DELAY_BEFORE_CHK_RDY, even increasing the delay amount, and unbinding and rebinding the nvme driver, but the failure seems to persist until the enclosure is power cycled.

Hi, I know this is an old topic, but maybe this will still help somebody:

If your system has an IOMMU and it’s enabled, you can replace the functionality of boltd with a single udev rule:

ACTION=="add", SUBSYSTEM=="thunderbolt", \
ATTRS{iommu_dma_protection}=="1", ATTR{authorized}=="0", \
ATTR{authorized}="1"

(Keep the backslashes or write everything on a single line.)
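
On NixOS the rule could be wired in roughly like this. This is only a sketch: it reuses the boot.initrd.services.udev.rules option from the first post for stage 1, and assumes services.udev.extraRules is an acceptable place for the same rule on the booted system. I have not tested it on the Framework 13.

```nix
{
  # Stage 1 (initrd): authorize Thunderbolt devices when the domain
  # reports IOMMU DMA protection, so the NVMe node can appear early.
  boot.initrd.services.udev.rules = ''
    ACTION=="add", SUBSYSTEM=="thunderbolt", \
    ATTRS{iommu_dma_protection}=="1", ATTR{authorized}=="0", \
    ATTR{authorized}="1"
  '';

  # Stage 2 (assumption): the same rule for the running system,
  # in case boltd is not enabled there.
  services.udev.extraRules = ''
    ACTION=="add", SUBSYSTEM=="thunderbolt", \
    ATTRS{iommu_dma_protection}=="1", ATTR{authorized}=="0", \
    ATTR{authorized}="1"
  '';
}
```

The rule contains no `$` characters, so it can sit in a Nix indented string without extra escaping.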

The rule comes straight from the Linux kernel documentation: USB4 and Thunderbolt — The Linux Kernel documentation
The reasoning: explicit user authorization is required for Thunderbolt because, without an IOMMU, devices can access any of the system memory via DMA. That is exactly one of the things an IOMMU solves: a device can only access memory that the operating system’s drivers have explicitly mapped for it.

In any case, boltd does the same thing as the udev rule for IOMMU-protected devices. From its documentation:

‘IOMMU’ support: if the hardware and firmware support using the input–output
memory management unit (IOMMU) to restrict direct memory access
to certain safe regions, boltd will detect that feature and change its
behavior: As long as iommu support is active, as indicated by the
iommu_dma_protection sysfs attribute of the domain controller, new devices
will be automatically enrolled with the ‘iommu’ policy and existing
devices with ‘iommu’ (or ‘auto’) policy will be automatically authorized
by boltd without any user interaction. When iommu is not active, devices
that were enrolled with the ‘iommu’ policy will not be authorized
automatically. The status of iommu support can be inspected by using
boltctl domains.

(OK, it also has a nice CLI and saves a history of the Thunderbolt devices connected to the system, but you don’t need that to use Thunderbolt devices without manual authorization on IOMMU-enabled systems.)

Oops, I overlooked that you had already considered authorizing the device via udev, sorry for that.
But if your system has an IOMMU (which I’m pretty sure it has), the udev rule I posted would probably still be both more secure and more generic.

One more idea:
I’ve added the pci=realloc kernel parameter because my kernel previously printed: pci_bus 0000:00: Some PCI device resources are unassigned, try booting with pci=realloc
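
In configuration.nix that looks roughly like the following (a minimal sketch; merge it with any kernel parameters you already set):

```nix
{
  # Ask the kernel to reassign PCI resources left unassigned by firmware.
  boot.kernelParams = [ "pci=realloc" ];
}
```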

I think this fixed USB daisy chaining for me (Monitor USB hub attached to TB3 dock). I’m not sure though, because I had given up on it a while ago after I always had to run echo 1 | sudo tee /sys/bus/pci/rescan, and then often either the dock’s or the monitor hub’s USB ports would stop working.