Application crashes on x86_64-darwin only when built through Nix

Hello everyone,

at $WORK I have introduced Nix when we ported an application to Linux. It’s worked wonderfully and provides the development environment everyone on the team uses. The application is also bundled as a package and works without a hitch.

Now I wanted to build the application on macOS, as well. We must link against proprietary binaries, which were built for a minimum target of 10.15, which initially caused some issues. I was able to resolve these simply by using a toolchain based on the macOS 11 SDK. The application now builds successfully, but crashes immediately with a SEGFAULT. Attaching lldb yields the following information:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0

This seems peculiar to me, like we never even jumped to _start. This may be related to these warnings that are also printed:

warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x0000000100000000 maps to more than one section: app.__TEXT and app.__TEXT
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001035a7000 maps to more than one section: app.__DATA_CONST and app.__DATA_CONST
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001037cc000 maps to more than one section: app.__DATA and app.__DATA
Process 12440 launched: '/Users/zimmermann/development/development/result/bin/app' (x86_64)
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x0000000100000000 maps to more than one section: app.__TEXT and app.__TEXT
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001035a7000 maps to more than one section: app.__DATA_CONST and app.__DATA_CONST
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001037cc000 maps to more than one section: app.__DATA and app.__DATA
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x0000000100000000 maps to more than one section: app.__TEXT and app.__TEXT
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001035a7000 maps to more than one section: app.__DATA_CONST and app.__DATA_CONST
warning: (x86_64) /nix/store/pavi51i5f5j5cl9s2rq1q4kqd58vxg2j-app-x86_64-apple-darwin-90/bin/app(0x0000000100000000) address 0x00000001037cc000 maps to more than one section: app.__DATA and app.__DATA

Looking at otool -l, I cannot see any duplicate load commands. When building the application with the native toolchain and dependencies installed through brew, it runs as expected and attaching lldb does not yield any such errors.
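For anyone who wants to repeat that check, here is a rough sketch of how duplicate segment load commands could be searched for mechanically. The helper name `dup_segments` is made up for illustration, and it assumes the usual `otool -l` text layout, where `segname` directly follows `cmdsize` inside an `LC_SEGMENT_64` command.

```shell
# Hypothetical helper: reads `otool -l` output on stdin and prints the name of
# every segment that has more than one LC_SEGMENT / LC_SEGMENT_64 load command.
# Assumes the usual otool layout (cmd, cmdsize, segname, ...); sections also
# carry a "segname" line, but it follows "sectname", so it is skipped here.
dup_segments() {
  awk '
    $1 == "cmd" { in_seg = ($2 == "LC_SEGMENT_64" || $2 == "LC_SEGMENT") }
    in_seg && prev == "cmdsize" && $1 == "segname" { print $2 }
    { prev = $1 }
  ' | sort | uniq -d
}

# Intended use on macOS (not executed here):
#   otool -l result/bin/app | dup_segments
```

Empty output means no segment name appears in more than one segment load command, which matches what I saw by eye.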

I have never seen this behaviour before and have not been successful in Googling what could cause this issue. The warnings lead me to believe it may be related to linker problems, but I don’t understand why that would behave so differently between the native and Nix-provided toolchain.

The application is built using some decently complicated CMake files, which I am unfortunately not at liberty to share. I will try to recreate a minimal example that reproduces the issue. In the meantime, does anyone have any pointers for potential causes?

Thank you for your time!

Does your app use C++? Are the proprietary binaries linking against the system libc++? If yes to both, can you use install_name_tool -change to point the libc++ reference in the primary binary at the one the rest of your app is using?

My app does use C++. The proprietary code in question is a library, let’s call it libfoo, which we link into the application statically. libfoo provides an interface written in C++, including methods that accept stringstreams. I assume it was built against the system libc++ on whatever build host the vendor uses.
However, otool does not list any dynamic load commands for libfoo.a.

The completed app binary has load commands for /nix/store/1b9w7bv8rdcjxr1a7947j1i4rrdinshs-libcxx-16.0.6/lib/libc++.1.0.dylib and /nix/store/zh76ka9jq97lsshp7n3p5a8madg5x8zb-libcxxabi-16.0.6/lib/libc++abi.1.dylib, but not the system libc++. If I build the application using the standard Xcode toolchain, I have a load command for /usr/lib/libc++.1.dylib.

Is it possible there’s an ABI incompatibility at play?

Fundamentally, I am happy to use a solution that involves patching binaries. I tried to swap the Nix-provided libc++ for the system libc++ in the final binary, but that does not resolve the problem. The crash still occurs, although it is now caused by an access to address 2 rather than 0.
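For reference, a swap like the one described above could be sketched as follows. This is an illustration rather than a record of the exact commands used: the helper name `libcxx_change_args` is made up, and mapping libc++abi to `/usr/lib/libc++abi.dylib` is an assumption about the system layout.

```shell
# Hypothetical helper: reads `otool -L` output on stdin and prints the
# install_name_tool arguments that redirect the Nix-provided libc++ and
# libc++abi to the system copies. The /usr/lib targets are assumptions.
libcxx_change_args() {
  awk '
    $1 ~ /^\/nix\/store\/.*\/libc\+\+abi\.[0-9.]*dylib$/ {
      print "-change", $1, "/usr/lib/libc++abi.dylib"; next
    }
    $1 ~ /^\/nix\/store\/.*\/libc\+\+\.[0-9.]*dylib$/ {
      print "-change", $1, "/usr/lib/libc++.1.dylib"
    }
  '
}

# Intended use on macOS (not executed here). Store paths are read-only, so
# the binary is copied first, and re-signed ad hoc after being modified:
#   cp result/bin/app ./app-patched && chmod u+w ./app-patched
#   install_name_tool $(otool -L ./app-patched | libcxx_change_args) ./app-patched
#   codesign --force --sign - ./app-patched
```

Running `otool -L ./app-patched` afterwards should show whether the references were actually rewritten.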

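If there is indeed an ABI incompatibility, then after that swap my own code, which was compiled against the Nix-provided libc++ headers, would be the part running against the wrong library. Since libfoo is a static archive and has no dynamic load commands, it does not seem I can use install_name_tool on it.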

I have since been able to confirm the same behaviour on aarch64-darwin. I believe I have ruled out an ABI incompatibility, however. I have created a minimal C++ program that links against the proprietary libfoo. I can build and run it just fine with Nix. Thus, I conclude that the error is somehow a result of the much more complicated CMakeLists of the original project. However, I do not understand why the error manifests only with the combination of Nix and Darwin. Everything works on either Darwin with the system toolchain or Linux with a Nix-based toolchain.