Hey,
maybe a more generic forum would be better for this but I don’t really know of
one that I’d like to join, so I am posting this here. Hope that’s alright with
you all.
Today I learned how to keep the temperature of my NVMe SSD in my laptop in check
without having to do hardware modifications for extra cooling. I didn’t find any
good post that would practically explain this for Linux, so I figured it might
be useful for someone.
There is a good post that explains the temperature values conceptually:
Technology Power Features - NVM Express
(https://nvmexpress.org/resource/technology-power-features/)
The tl;dr is that NVMe devices have configuration options that can be queried
using nvme get-feature --human-readable /dev/nvmeX. One of these standard
options sets the temperatures at which the SSD should throttle down rather than
keep getting hotter. Not every SSD has a reasonable default for this. If your
SSD is not cooled very well, as is often the case in a laptop, you may want to
set your own values.
Of course, using this throttling means that performance will degrade if the
limits are hit. Personally, I prefer this over potential premature hardware
failure and a 90+ degree hot SSD just a few centimeters away from a lithium-ion
battery. You really can hit temperatures like that when you tax the SSD, for
example when doing a BTRFS scrub.
Here is a short example script that persistently sets the throttle
temperatures to 50 and 65 degrees. Feel free to adapt those values to your
preference, of course:
#!/bin/sh
set -eux

nvme_dev="$1"

# the lower and upper thermal throttle management temperatures in degrees
# celsius. for explanation see:
# https://nvmexpress.org/resource/technology-power-features/
temp1=50
temp2=65

celsius_to_kelvin() {
    celsius="$1"
    echo $((celsius + 273))
}

# the value for the "Host Controlled Thermal Management" feature. mash together
# the two temperatures into one integer in hexadecimal representation.
hctm=$(printf '0x%04x%04x' "$(celsius_to_kelvin "$temp1")" "$(celsius_to_kelvin "$temp2")")

nvme set-feature "$nvme_dev" --feature-id=0x10 --value="$hctm" --save
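
Before running the script, it may be worth checking that the two temperatures
are something the drive is likely to accept. As far as I understand the spec,
the lower threshold has to be below the upper one, and both should lie within
the range the controller advertises in the mntmt and mxtmt fields (in Kelvin)
of nvme id-ctrl /dev/nvmeX --human-readable. The sketch below is only an
illustration of that check: check_hctm_range is a hypothetical helper name, and
the limits are passed in by hand rather than parsed from nvme-cli output.

```shell
#!/bin/sh
# Sketch: sanity-check two throttle temperatures before writing them.
# check_hctm_range is a hypothetical helper, not part of nvme-cli.

celsius_to_kelvin() {
    echo $(($1 + 273))
}

# Usage: check_hctm_range TMT1_CELSIUS TMT2_CELSIUS MNTMT_KELVIN MXTMT_KELVIN
# Returns 0 if the pair looks acceptable, 1 otherwise.
check_hctm_range() {
    t1=$(celsius_to_kelvin "$1")
    t2=$(celsius_to_kelvin "$2")
    mntmt="$3"
    mxtmt="$4"

    # assumption: the lower threshold must be strictly below the upper one
    if [ "$t1" -ge "$t2" ]; then
        echo "temp1 must be lower than temp2" >&2
        return 1
    fi

    # assumption: both thresholds must lie within the advertised range
    if [ "$t1" -lt "$mntmt" ] || [ "$t2" -gt "$mxtmt" ]; then
        echo "temperatures are outside the range the drive advertises" >&2
        return 1
    fi

    return 0
}
```

For example, with made-up limits of 273 K and 358 K, check_hctm_range 50 65 273
358 succeeds, while swapping the two temperatures makes it fail.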
With these settings, the SSD still reaches up to 70 degrees during a BTRFS
scrub, but I think that’s more tolerable than 90 degrees. The scrub takes a
couple of minutes longer because of the throttling, but it’s not time-critical
anyway. Maybe I’ll tune those values a bit more down the line.
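
To see what the drive actually does under load, one option is to read the
temperature from sysfs while the scrub runs. The sketch below assumes a kernel
built with NVMe hwmon support, which exposes the drive as a hwmon device named
nvme with the composite temperature in temp1_input (millidegrees Celsius).
nvme_temps is a hypothetical helper name; the optional argument exists only so
the function can be pointed at a different directory.

```shell
#!/bin/sh
# Sketch: print the composite temperature of every NVMe hwmon device.
# nvme_temps is a hypothetical helper; it assumes the kernel exposes NVMe
# drives under /sys/class/hwmon with the name "nvme".

# Usage: nvme_temps [HWMON_ROOT]
nvme_temps() {
    root="${1:-/sys/class/hwmon}"
    for dir in "$root"/hwmon*; do
        # skip entries that are missing, unreadable, or not NVMe devices
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = "nvme" ] || continue
        [ -r "$dir/temp1_input" ] || continue

        # temp1_input is in millidegrees celsius
        millideg=$(cat "$dir/temp1_input")
        echo "$(basename "$dir"): $((millideg / 1000)) C"
    done
}
```

Calling nvme_temps in a loop with a sleep in between, in a second terminal,
should show whether the temperature levels off near the upper limit.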
Alternatively, if you want this to be managed at run time and not persistently
somewhere in the specific storage controller’s firmware, you could write a udev
rule for NixOS somewhat like this:
ACTION=="add|change", SUBSYSTEM=="nvme", \
ATTRS{vendor}=="XXX", ATTRS{device}=="YYY", \
RUN+="${pkgs.nvme-cli}/bin/nvme set-feature $devnode --feature-id=0x10 --value=0x01430152"
Replace XXX and YYY with the vendor and device IDs from lspci -nn for the
particular device that you want to keep cool. Note that sysfs reports these IDs
with a 0x prefix, so an ID that lspci shows as 1234 is matched as 0x1234. The
hex --value= is what you would get from the script example; again, feel free to
adapt it to your needs. The only real difference is that we’re not using the
--save argument.
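
Since the udev rule carries the value as a bare hex number, it can help to
decode it back into degrees before pasting it in. The sketch below reverses what
the script does: the upper 16 bits are the lower threshold and the lower 16 bits
are the upper threshold, both in Kelvin. decode_hctm and kelvin_field are
hypothetical helper names, and treating a field of 0 as "disabled" is my reading
of the feature, not something nvme-cli prints.

```shell
#!/bin/sh
# Sketch: decode a Host Controlled Thermal Management value into celsius.
# decode_hctm and kelvin_field are hypothetical helper names.

# print one 16-bit threshold; assumption: 0 means the threshold is not set
kelvin_field() {
    if [ "$1" -eq 0 ]; then
        echo "disabled"
    else
        echo "$(($1 - 273))C"
    fi
}

# Usage: decode_hctm VALUE   (VALUE may be hex like 0x01430152 or decimal)
decode_hctm() {
    value=$(($1))
    tmt1=$(( (value >> 16) & 0xffff ))  # upper 16 bits: lower threshold
    tmt2=$(( value & 0xffff ))          # lower 16 bits: upper threshold
    echo "TMT1=$(kelvin_field "$tmt1") TMT2=$(kelvin_field "$tmt2")"
}
```

Running decode_hctm 0x01430152 prints TMT1=50C TMT2=65C, which matches the
values from the script.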
I personally use a Framework Laptop (11th gen) with a WD SN850X. That SSD gets
very hot because its highest power state consumes 9 watts and it ships without
any default throttle temperature. It does not even set a high limit such as 105
degrees; the value is simply 0.