Hi everyone,
My NVMe drive is no longer being detected. The problem appeared today, even though I’ve been using the system without any trouble for about a month while learning NixOS. This morning, when I tried to boot my Dell XPS 9560 laptop, it failed with this error:
nvme nvme0: Identify Controller failed (-4)
To my surprise, I couldn’t find my NVMe drive. My initial thought was, “No worries, I’ll just roll back to a previous generation.” When I tried, the only generation that worked was the very first one the system ever created.
Deciding to take a more drastic measure, I opted to reinstall NixOS. However, the NVMe drive did not show up in the Calamares installer, GParted, or lsblk during a live boot session. Intriguingly, when I booted a Linux Mint live CD, the drive was detected perfectly. As a sanity check, I went ahead with the Linux Mint installation, and it completed without a hitch.
Here’s a list of the troubleshooting steps I’ve tried:
- Reinstalled NixOS.
- Re-downloaded a brand new image and recreated my USB live boot. (I tried GNOME, KDE, and the minimal variants)
- Reset my BIOS to default and then retested.
- Carefully checked BIOS settings:
  - Disabled fast boot
  - Disabled secure boot
  - Set SATA drives to AHCI (instead of RAID)
  - Disabled C-states
  - Disabled speed step
  - Disabled any power-saving options
  - Attempted to disable TPM 2.0
- Reseated the NVMe drive, RAM, and battery.
- Ran modprobe nvme during the live CD session.
Some additional observations:
- The drive appears in the lspci output during the live-boot session.
- The drive is visible in the lsblk output only before Calamares finishes loading. I noticed a message along the lines of “loading 1 module”. The drive stays visible until the Calamares installer finishes loading that module. After that, it disappears.
I checked the SMART data for the drive:
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 92%
Available Spare Threshold: 50%
Percentage Used: 0%
Data Units Read: 6,135,172 [3.14 TB]
Data Units Written: 7,358,838 [3.76 TB]
Host Read Commands: 102,455,992
Host Write Commands: 86,450,036
Controller Busy Time: 3,882
Power Cycles: 1,021
Power On Hours: 577
Unsafe Shutdowns: 134
Media and Data Integrity Errors: 49
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 32 Celsius
Temperature Sensor 2: 39 Celsius
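For anyone who wants to scan this SMART output programmatically, here is a minimal sketch in Python. It assumes the plain "Key: value" layout shown above. The function names and the choice of watched fields are purely illustrative and not part of any tool, and a non-zero counter is only a hint to look closer, not a diagnosis.

```python
# Illustrative selection of counters where a non-zero value deserves a look.
# This list is an assumption made for the sketch, not an official one.
WATCHED_FIELDS = (
    "Critical Warning",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
    "Unsafe Shutdowns",
)

# A few lines copied from the SMART output above.
SAMPLE = """\
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 92%
Data Units Read: 6,135,172 [3.14 TB]
Unsafe Shutdowns: 134
Media and Data Integrity Errors: 49
Error Information Log Entries: 0
"""


def parse_smart(text):
    """Collect 'Key: value' lines into a dict of raw string values."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Parse the first token of a value such as '6,135,172 [3.14 TB]' or '0x00'."""
    token = value.split()[0].replace(",", "").rstrip("%")
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token)


def nonzero_watched(fields):
    """Return the watched counters that are present and non-zero."""
    flagged = {}
    for name in WATCHED_FIELDS:
        if name in fields:
            number = leading_int(fields[name])
            if number != 0:
                flagged[name] = number
    return flagged


print(nonzero_watched(parse_smart(SAMPLE)))
```

On the values above, this flags the 49 media and data integrity errors and the 134 unsafe shutdowns, while the critical warning and error log entries stay at zero.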
The closest I’ve come to identifying the root cause was when I tried running fdisk. The process seemed to hang for a while before spitting out the following error:
Unable to change power state from D3cold to D0, device inaccessible.
I/O error on dev nvme0n1, logical block 0, async page read
System Info:
Dell XPS 9560
nixos-23.05.2975
CPU: i7-7700HQ
GPU: GTX 1050
RAM: 16G DDR4-2400MHz
Storage: SKHynix PC401 NVMe 1TB revision: 80002E00
I’m really at my wit’s end here and any guidance or suggestions would be immensely appreciated.
Edit: Some information I forgot to add.
When I ran lspci, the drive showed up and was recognized as an SKHynix NVMe drive, but with the wrong size. lspci said the drive was 256G, while the only drive I have installed is 1TB.