I’ve spent days flashing images onto my Raspberry Pi 4 (1 GB), and I always end up with one of two outcomes:
An error saying the block size doesn’t match and an fsck is required (on a freshly dd’d card, this is just bizarre).
After going through the full NixOS stage 1 / stage 2 boot process, just as I should be getting a login prompt, the screen goes black, the display says no input, and the Pi flashes the green LED twice (which apparently doesn’t correspond to any documented status code).
I’ve flashed the Raspbian image onto the card and used it to update the bootloader and firmware.
No change.
Trying to build my own image (following the linked guides) results in:
$ nix-build '<nixpkgs/nixos>' -A config.system.build.sdImage --argstr system aarch64-linux -I nixos-config=./pi.nix
error: attribute 'sdImage' in selection path 'config.system.build.sdImage' not found
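One likely cause, offered as an assumption since pi.nix isn't shown: `config.system.build.sdImage` only exists when the evaluated configuration imports the SD-image module. A minimal sketch of pi.nix that does this follows. The module path is assumed from the NixOS 22.05 tree, so check it against your nixpkgs checkout.

```nix
# pi.nix: hypothetical minimal configuration for the SD image build
{ ... }:

{
  imports = [
    # This module (via sd-image.nix) defines config.system.build.sdImage.
    # Path assumed from NixOS 22.05; verify it in your nixpkgs.
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];

  # Your own settings (users, networking, SSH keys, ...) go here.
}
```

With the import in place, the nix-build command above should be able to find the attribute. Building from an x86_64 machine also needs aarch64 emulation on the host, for example `boot.binfmt.emulatedSystems = [ "aarch64-linux" ];` in the host's NixOS configuration.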
I’m at my wits’ end.
I’m going to put Raspbian back on. I’ve burnt so much time trying to get Nix/NixOps/etc. working.
I wanted to raise this because no one else seems to be discussing it, or at least no one is having the same issues I am.
The SD card is fine. I’ve put Raspbian on it and it’s running with no problems.
Running fsck on the SD card immediately after dd’ing the image across (before first boot):
$ fsck.fat /dev/sdd2
fsck.fat 4.2 (2021-01-31)
Logical sector size is zero.
$ fsck.fat /dev/sdd1
fsck.fat 4.2 (2021-01-31)
There are differences between boot sector and its backup.
This is mostly harmless. Differences: (offset:original/backup)
65:01/00
1) Copy original to backup
2) Copy backup to original
3) No action
[123?q]? 3
FATs differ - using first FAT.
Orphaned long file name part "adau1977-adc.dtbo"
1) Delete
2) Leave it
[12?q]? 2
/overlays/Ç
Bad short file name (Ç).
1) Drop file
2) Rename file
3) Auto-rename
4) Keep it
[1234?q]? 4
/overlaysÇ
Bad short file name Ç).
1) Drop file
2) Rename file
3) Auto-rename
4) Keep it
[1234?q]? 4
Reclaimed 1159 unused clusters (593408 bytes).
Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
1) Remove dirty bit
2) No action
[12?q]? 2
Free cluster summary wrong (454536 vs. really 455695)
1) Correct
2) Don't correct
[12?q]? 2
*** Filesystem was changed ***
The changes have not yet been written, you can still choose to leave the
filesystem unmodified:
1) Write changes
2) Leave filesystem unchanged
[12?q]? 2
/dev/sdd1: 42 files, 60495/516190 clusters
I just tried it on a smaller monitor, and it now boots to a terminal prompt.
I had been using my 4K TV, as it’s the only spare display I have.
It seems the image has problems when it detects the higher resolution and tries to switch to it.
Do you see any interesting failures in dmesg or journalctl -b related to the 4K failure?
The fsck problem is a red herring. The second partition is ext4, so checking it with fsck.fat is expected to fail. When each partition is checked with the matching tool, fsck succeeds for both partitions on both image files.
Once this occurs, sudo reboot no longer seems to reboot the system.
Specifically, the SSH session hangs and the system never becomes reachable over SSH again, which makes me think it can’t finish shutting down.
I captured another set of logs from a boot with the TV plugged in but in standby.
The issue still occurs in this situation, which suggests to me that HDMI communication is still going on.
This is turning into a fun rabbit hole. Do you intend to run ZFS on your RPi? (Exploring ZFS on RPi is on my personal project list.) You might have to disable ZFS to try newer kernels.
The crash log looks like the kernel is probing the newly connected video hardware. The NULL dereference is clearly a kernel bug, and it may well be fixed in a newer kernel. Can you try building the image from nixos-unstable, or with boot.kernelPackages = pkgs.linuxPackages_latest; to get kernel 5.18?
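For reference, here is a minimal sketch of that change as a NixOS module, assuming it goes into the configuration the image is built from. It also drops ZFS. The SD image's base profile enables ZFS, and ZFS often lags behind the newest kernel. The filesystem list below is an assumption: it mirrors the 22.05 base profile with "zfs" removed.

```nix
{ lib, pkgs, ... }:

{
  # mkForce so this wins over any other module that also sets the kernel
  # (for example a nixos-hardware Pi 4 profile; an assumption about your setup).
  boot.kernelPackages = lib.mkForce pkgs.linuxPackages_latest;

  # Override the base profile's list without "zfs", so the build doesn't try
  # to compile ZFS against a kernel it may not support yet (list assumed).
  boot.supportedFilesystems =
    lib.mkForce [ "btrfs" "reiserfs" "vfat" "f2fs" "xfs" "ntfs" "cifs" ];
}
```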
I particularly like this boot message
[ 10.773827] vchiq: module is from the staging directory, the quality is unknown, you have been warned.
I am not familiar with how a modern RPi gets GPU firmware updates. If they are not part of the SoC firmware (loaded from /boot), checking for newer GPU firmware may be fruitful.
The same crash occurs at 13.583324 when booting with the TV in standby, at the point where the kernel switches from its internal framebuffer to the RPi graphics driver.
Other than trying different kernels, I’m not sure what more to do here. You might check which kernel version works for Raspbian or Ubuntu.
My guess is that something related to HDMI is not being initialized properly. There may be RPi boot config parameters that help. Comparing the contents of /boot/firmware/config.txt (and any files it includes) between working and non-working OS images might reveal more to try. I’d like to know if you find anything useful.
This unrelated GitHub discussion shows how many things there are to tweak for the RPi HDMI connection.
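One such tweak is sketched here as an untested workaround, not a known fix. The image boots fine on the smaller monitor, so forcing the KMS driver to a lower mode on the TV might sidestep the failing path. The connector name HDMI-A-1 is an assumption. The real names appear, prefixed by the card name, under /sys/class/drm on the Pi.

```nix
{ ... }:

{
  # Ask the DRM/KMS driver to use 1920x1080 at 60 Hz on the first HDMI port
  # instead of the TV's preferred 4K mode. Connector name is assumed.
  boot.kernelParams = [ "video=HDMI-A-1:1920x1080@60" ];
}
```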
It looks like the kernels are:
Raspbian: 5.15.30
NixOS 22.05: 5.15.32
I would assume the Raspbian kernel carries more Pi-specific patches.
The change notes above mention “Remove 4kp60 option from Raspberry Pi Configuration”, but I can’t see anything that screams “bug fix for 4K”.
Re: using latest kernel.
I’m using the RPi 4 Nix configs from nixos-hardware.
Specifically:
I got a response on the GitHub issue saying that this particular kernel module has had some updates.
There doesn’t seem to be a “latest” variant for this, and I suspect using the vanilla latest kernel would cause other issues.
I’m not versed enough to switch NixOps over to unstable without doing the same for my host system, and I haven’t had much luck applying kernel patches in the past.
NixOS used to build images with linuxPackages_rpi4, but that was dropped in favor of the mainline kernel. For what it’s worth (which isn’t much), I don’t encounter this issue with linuxPackages_rpi4, but I’m also not running any graphical programs (I only have it connected to my monitor in case I need to troubleshoot it).
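Switching back to the downstream Raspberry Pi kernel is a one-line change. This is a minimal sketch that assumes pkgs.linuxPackages_rpi4 is still available in the nixpkgs you build from. lib.mkForce is there in case another module, such as a nixos-hardware profile, also sets the kernel.

```nix
{ lib, pkgs, ... }:

{
  # Use the Raspberry Pi downstream kernel instead of mainline.
  boot.kernelPackages = lib.mkForce pkgs.linuxPackages_rpi4;
}
```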
This change is a very strong candidate for fixing the failure we’ve been discussing. Unfortunately, it does not appear until kernel 5.19, so that could be a while.
I would try older kernels, in the hope that one predates the bug (in vc4_hdmi_enable_scrambling()). Maybe experiment with linuxKernel.kernels.linux_5_10, linuxKernel.kernels.linux_5_4 and linuxKernel.kernels.linux_4 in your NixOS config. You might also be able to try 5.18 this way.
With all the changes to that area of the code, it’s entirely possible that one of these older kernels will work for you while you wait for 5.19 to land in NixOS.
Also, if you haven’t tried this yet, boot the build that gives you the blank screen and let it sit for 20 minutes to see whether it starts working. Some of the 4K display problems resolved themselves this way. (They did not involve a bad dereference, so this may not work for you, but the cost of trying is low.)
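Note that boot.kernelPackages expects a kernel package set, not a bare kernel. A linuxKernel.kernels.* entry therefore needs wrapping before it can be used there. Below is a minimal sketch using 5.10 as the example. The pre-built pkgs.linuxKernel.packages.linux_5_10 set should be equivalent. Whether a given older version still exists depends on your nixpkgs revision.

```nix
{ lib, pkgs, ... }:

{
  # Wrap the bare kernel into a package set with linuxPackagesFor;
  # mkForce overrides any kernel chosen by other modules.
  boot.kernelPackages =
    lib.mkForce (pkgs.linuxPackagesFor pkgs.linuxKernel.kernels.linux_5_10);
}
```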