Nix corrupted even after multiple reinstalls

For the past several months I have been having recurring issues with NixOS that make little to no sense to me, and I am hoping that someone here might be able to make sense of them.

It started when using certain programs (specifically playing Minecraft and others I can’t remember right now) as well as upgrading my system: the whole system would hang, requiring a restart. The freezes were few and far between, but always tied to specific actions I was doing on my computer (mostly nixos-rebuild). I thought this issue was related to bad configuration I had set at the time, so I tried many things to fix it, such as disabling zram, etc. I’m not sure if these freezes are even related to the main issue, but could they be signs of hardware issues?

At some point I started getting build errors when running nixos-rebuild, usually various SEGFAULT errors or missing files. These errors happened completely randomly, and running the command multiple times in a row yielded a different error every time. This baffled me since, from my understanding of NixOS, this sort of thing should not happen.

This led me to the conclusion that this must be either a hardware issue or something corrupted in the Nix store or similar. I decided to do a full reinstall of NixOS to fix this issue, but upon calling nixos-install I received similar strange errors related to cmake builds failing and missing files. At this point I also tried several combinations of different configs, drives, and filesystems, all resulting in the same errors.

Since NixOS continues building from where it left off, I was able to run nixos-install over and over until my system completed building, and it was completely usable for quite some time. I was still getting the freezes from before, and nixos-rebuild still failed randomly most of the time. Eventually (probably because the Nix store was corrupted by the faulty install and updating), rebuilds and building new software just stopped working, resulting in even more strange errors (I lost the screenshots, but something like: derivation for *** failed to evaluate, expected ‘H’ but got ‘C’).

This was today. I realized that if I ever wanted to install a new package on my system, I would have to either leave NixOS or reinstall it now and try to figure it out, so here I am.

Below is everything I HAVE tried, and below that are various error messages from TTY that I do have pictures of (sorry for the low quality; a lot of my screenshots are on my current system, which is not currently bootable, but I can get them if requested):
• Full reinstall of nixos, losing all files and starting fresh on same hardware, same fs (zfs) and same config
• Full reinstall of nixos on another disk using ext4
• Installing other Linux distributions on the same PC (I didn’t run into any of the same problems, but not tested extensively)
• Full reinstall of nixos with default config on ext4 on another disk, on both stable and unstable
• Updating firmware in another distro (no updates available)
• Using the nix package manager after loading Arch on another disk under btrfs (got the same error even on another distro)
• Trying a new ISO image, in case the original was corrupted and was the cause of the issues across multiple installs
• Running nixos-rebuild build of this configuration on another machine which works

I am leaning towards the idea that this is a hardware issue, and not specific to the Nix package manager as I’ve seen problems in other programs. I don’t understand which hardware component would cause these faults, or even where to begin on debugging this. Here are the error codes from various things:





This one is from right now on Arch, trying to nix run fastfetch.
If I run the command again I get a different error, but still “internal compiler error: Segmentation fault signal”.

I am worried that if I make this post much longer, it could become even more confusing than it already is, so please ask me clarifying questions or things for me to try.

Current hardware:
13700k
32GB DDR5 RAM
RTX 3080 GPU
pls ask for anything else

If it interests you, my config is hosted here, although since the problem persists on a newly generated config, I highly doubt the config is the cause. I also use this same config on a laptop and it works exactly as I would expect it to, with no issues whatsoever.

Have you tested your RAM? Random errors like your descriptions sound like faulty RAM.

I ran MemTest86 from UEFI and got 0 errors. Not too sure what else it could be though.

The RAM is still suspicious to me. Try running Prime95’s RAM torture test (“Large FFTs” torture test). In the past, I had RAM errors that went undetected by MemTest86 but were exposed by Prime95.

How long did you run memtest for? Also, did you run the proprietary MemTest86 or the libre Memtest86+? With my freedom hat on I’d recommend the free one, but the proprietary one gets more results. I’d say at least 24h is needed, since heat can cause failures to surface.

Are you sure that you ran it correctly / long enough?

I had the EXACT same errors as you have been getting and it was extremely confusing. I tried just about everything and it ended up being a bad RAM stick. Took about an hour for memtest to find it running the more thorough tests.

Yeah, I had a case where it would only show errors after 18 hours of memtest :grimacing:

I ran the P95 RAM torture test in Large FFTs mode for 30+ hours and got 0 errors. I’ll try MemTest86+ again for a longer time as well, and try to swap in new RAM tonight, but I’m not sure if it’ll fix it.

I have had issues with high temps as well, so I could see that as a culprit. Maybe the test didn’t produce high enough temperatures to show a negative effect, whereas nixos-rebuild makes my CPU very hot.
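
One way to check the temperature theory is to sample the CPU sensors while a rebuild is running. This is a minimal sketch, assuming a Linux kernel that exposes sensors under /sys/class/hwmon with readings in millidegrees Celsius; the function names are made up for illustration, and which sensor corresponds to the CPU package varies by board.

```python
import glob
import os


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a raw sysfs reading (millidegrees C, as text) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_temps(root: str = "/sys/class/hwmon") -> dict:
    """Return {sensor file path: degrees C} for every readable temperature sensor."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        try:
            with open(path) as f:
                temps[path] = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            # Some sensor files exist but cannot be read or parsed; skip them.
            continue
    return temps


# Take a single sample; rerun it (or wrap it in a loop) during nixos-rebuild.
for sensor, celsius in read_temps().items():
    print(f"{sensor}: {celsius:.1f} C")
```

Readings that sit near the CPU’s rated maximum during a build, but not during a memory test, would support the cooling explanation.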

Tried completely new memory and got the same errors.
Any other ideas? Currently thinking cpu/motherboard, but I can’t try out a different one that easily.

I am pretty confident now that the problem is related to my cooler either not functioning or not being configured correctly. High CPU temps are likely causing cascading errors, leading to instability. Running the blend test in the mprime torture test, which puts high load on both CPU and RAM, caused errors almost immediately. I’ll try to get reasonable temps and see if that solves my problem.
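
The principle behind this kind of check can be shown with a toy consistency test: repeat a deterministic computation and compare every result with the first one. This is only a sketch of the idea and not a substitute for mprime; `count_mismatches` is a hypothetical helper, and a single-threaded hash loop produces far less heat than a real torture test.

```python
import hashlib
import os


def count_mismatches(data: bytes, rounds: int) -> int:
    """Hash the same buffer `rounds` times; count digests that differ from the first."""
    reference = hashlib.sha256(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches += 1
    return mismatches


# 8 MiB of random bytes hashed repeatedly. The input never changes, so any
# mismatch means the machine computed a different answer for the same work.
buffer = os.urandom(8 * 1024 * 1024)
print("mismatches:", count_mismatches(buffer, rounds=20))
```

On stable hardware the count should stay at zero. A nonzero count, like a different compiler segfault on every run, points at the machine and not at the software.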

Is your computer properly grounded (connected to earth)?

Installed a new CPU cooler today, and this seems to have fixed the issue. I’ll mark this as the solution. Thanks to everyone who helped out however they could.