Error while building kernel remotely

Hi all. I’ve encountered “Error 2” while building a custom kernel remotely. My device requires a specific custom kernel from nixos-hardware, so the kernel has to be recompiled every time there’s a kernel update. I’ve set up a remote builder on another system running Arch. My full configuration can be seen here.
The compilation works fine locally, but it takes a very long time. Compiling remotely fails with “Error 2” about 20 minutes into the build every time.
The last few lines of the log look like this:

  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp_ddc.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp_log.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp_psp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp1_execution.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp1_transition.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp2_execution.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp2_transition.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_isp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/isp_v4_1_0.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.o
  LD [M]  drivers/gpu/drm/amd/amdgpu/amdgpu.o
  AR      drivers/gpu/built-in.a
  AR      drivers/built-in.a
make[1]: *** [/build/linux-6.12.19/Makefile:1944: .] Error 2
make: *** [../Makefile:224: __sub-make] Error 2

Can anyone please help me troubleshoot this? Thank you.

I think the true error is probably higher up in the log. There should be a line saying builder for '/nix/store/.....-linux-$version.drv' failed. You can use nix-store --read-log or nix log on that .drv path to show the full log and see what failed higher up.
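
As a rough sketch of that step, here is a small shell helper around nix-store --read-log. The function name show_build_log is made up for illustration, and the .drv path in the usage comment is a placeholder for whatever path your own failure message prints.

```shell
# Print the full build log for a derivation.
# Usage (placeholder path, copy the real one from the
# "builder for ... failed" line):
#   show_build_log '/nix/store/.....-linux-$version.drv' | less
show_build_log() {
    # nix-store --read-log works without experimental features;
    # `nix log <drv>` is the newer equivalent if nix-command is enabled.
    nix-store --read-log "$1"
}
```

Piping the output through less, or redirecting it to a file, makes it easier to scroll back to the first real failure instead of the last few lines.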

EDIT: I was able to build that kernel fine. So I’m guessing the builder is running out of memory and you need to reduce its cores setting?

Thank you for testing it. I was able to build the kernel locally too; it only fails when building remotely. I’ve tried setting cores (which defaults to 16) on the builder to 1, 2, 4, and 8, and it always fails with the same error message. I’ve also been closely monitoring resource utilization, and there’s always at least 10GB of available RAM at the time of failure. Is it possible that the error is not memory related? Are there any other factors that may be causing it?

It’s impossible to say much more without seeing more of the build log.

I’ve uploaded the full build log here: nix build log · GitHub
Can you please take a look? Thank you very much.

Are you sure that’s the whole thing? It looks cut off. It doesn’t contain the error at the end from the original post.

EDIT: Or maybe it’s this error from early on somehow? nix build log · GitHub

BusyBox v1.36.1 () multi-call binary.

Usage: xz -d [-cfk] [FILE]...

Decompress FILEs (or stdin)

	-d	Decompress
	-c	Write to stdout
	-f	Force
	-k	Keep input files
	-t	Test integrity
tar: kernel/kheaders_data.tar.xz: Wrote only 4096 of 10240 bytes
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make[3]: *** [../kernel/Makefile:159: kernel/kheaders_data.tar.xz] Error 2
make[3]: *** Deleting file 'kernel/kheaders_data.tar.xz'
make[2]: *** [../scripts/Makefile.build:478: kernel] Error 2
make[2]: *** Waiting for unfinished jobs....

Why on earth is it using busybox?

Looks like GitHub’s web UI is truncating the file, but the full file can be viewed in raw mode.
I have no idea why it’s using busybox. I have a pretty simple setup with just one configuration.nix and one flake.nix. Busybox is not in my config.

I can’t reproduce the error, even if I build directly from your flake. Does the builder have sandboxing disabled? Maybe that’s where it’s getting busybox from?

I found out that busybox comes from a package called nix-busybox, which is a dependency of the nix package manager on Arch.
I had never explicitly configured the sandbox on the builder. I have now enabled it on both the NixOS system and the builder, and I get the exact same error message (full log). Busybox still appears in this log.

It shouldn’t matter. The Nix environment should clear any system packages out of PATH, and the sandbox should make system packages outright invisible.

So my Arch system was not running in sandbox mode? This is the current content of the /etc/nix/nix.conf file:

build-users-group = nixbld
trusted-users = rui
sandbox = true

I’ve restarted the nix daemon service and rebooted the system after adding sandbox = true. How else can I ensure sandbox mode is enabled?

That really ought to do it, so I have no idea what’s going on.
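
If you want to double-check, one possible way is to ask Nix which settings it actually resolved rather than reading nix.conf by eye. This is only a sketch that I have not run against your setup: the helper name effective_sandbox is hypothetical, and it assumes a Nix recent enough to have the nix config show subcommand (older releases spell it nix show-config).

```shell
# Print the sandbox setting as Nix resolved it from its config files.
# Assumption: `nix config show` is available; on older Nix versions,
# replace `config show` with `show-config`.
effective_sandbox() {
    # The flag enables the nix-command feature for this one call only.
    # The anchored pattern skips related keys such as sandbox-paths.
    nix --extra-experimental-features nix-command config show | grep '^sandbox = '
}
```

Running effective_sandbox on the builder should print a single line with the sandbox value. It reflects what the nix client reads from the config, so the daemon still needs a restart to pick up any change.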

Thank you for looking into this. 🥲

For what it’s worth, I am having a similar issue trying to build a custom aarch64 kernel, although I have been able to reproduce the error relatively consistently. There is also no output other than the Makefile Error 2. This is the kernel derivation I am trying to build: nix-parcels/packages/linuxPackages-pinenote/default.nix at 2f4d70094c213238672556b1f4d871fdcb1b952b · jzbor/nix-parcels · GitHub

EDIT: Here is my log for comparison

That’s not how you debug make logs (or most build systems). You have to look further up to find the errors.

In this case it’s lines 16692 to 16772:

/build/cc9HKvOR.s: Assembler messages:
/build/cc9HKvOR.s:136: Error: selected processor does not support `aese v19.16b,v21.16b'
/build/cc9HKvOR.s:137: Error: selected processor does not support `aesmc v19.16b,v19.16b'
...

Usually the easiest way to find the actual error is to search for a string like “error:” (applies to many build systems) or “waiting for unfinished jobs” (in parallel builds, and specific to make).
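
A minimal sketch of that search, assuming the log has been saved to a file. The function name find_build_errors and the 20-line cap are arbitrary choices for illustration.

```shell
# List the first lines of a build log that look like a real failure,
# prefixed with their line numbers.
# Usage: find_build_errors build.log
find_build_errors() {
    # -n prints line numbers, -i ignores case, -E enables alternation.
    # "error:" with a colon skips make's generic "Error 2" summary lines.
    grep -n -i -E 'error:|waiting for unfinished jobs' "$1" | head -n 20
}
```

The first hit is usually the one to read. Everything after it tends to be make unwinding the remaining jobs.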

Thanks a lot. I did look manually, but it seems like I missed that.