I’m working on a project in which I’m implementing the vulkan-tutorial.com examples in Haskell: https://github.com/rotaerk/vulkanTest. It’s set up to use Nix both for building the project and for providing the nix-shell environment, with the specific version of nixpkgs locked down.
A couple weeks ago or so, I decided to upgrade to the latest version of nixpkgs. After doing so, if I built my project using cabal from within a nix-shell environment, the resulting program would give an error at runtime indicating that there was an undefined symbol in libGL.so.1. Below, I will refer to the working version before the upgrade as the “good” version of my project, and the broken version after the upgrade as the “bad” version.
After some investigation, I learned the following:
The good version was loading libGL from /run/opengl-driver/lib, while the bad version was loading it from some nix store path.
The good version was discovering the load path from the LD_LIBRARY_PATH environment variable, while the bad version was discovering the load path from the RPATH dtag built into the binary.
The good version’s binary contained a RUNPATH dtag, while the bad version’s binary contained an RPATH dtag.
According to https://en.wikipedia.org/wiki/Rpath, RPATH overrides LD_LIBRARY_PATH, while LD_LIBRARY_PATH overrides RUNPATH.
RUNPATH is built into the binary instead of RPATH if the linker is provided the flag --enable-new-dtags.
Thus, a change to nixpkgs was made somewhere along the line that caused the new-dtags feature to be disabled.
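To make the precedence rules above concrete, here is a minimal sketch of the lookup order in Python. It is a simplified model and not the real loader: it ignores the ld.so cache and the default system directories, and it drops empty path entries. The function names and the HASH placeholder in the store path are made up for illustration.

```python
def search_order(rpath, ld_library_path, runpath):
    """Return the directories in the order a glibc-style loader would try them.

    Simplified model (assumption): RPATH is consulted first, but only when no
    RUNPATH is present; then LD_LIBRARY_PATH; then RUNPATH. Each argument is a
    colon-separated string or None.
    """
    def split(value):
        # Empty entries are dropped here for simplicity.
        return [d for d in (value or "").split(":") if d]

    order = []
    if not runpath:
        order += split(rpath)
    order += split(ld_library_path)
    order += split(runpath)
    return order


def resolve(lib, dirs, exists):
    """Return the first dir/lib for which exists() is true, or None."""
    for d in dirs:
        candidate = f"{d}/{lib}"
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    store = "/nix/store/HASH-libGL/lib"   # hypothetical placeholder path
    driver = "/run/opengl-driver/lib"
    present = {f"{store}/libGL.so.1", f"{driver}/libGL.so.1"}

    # "Bad" build: the store path is baked in as RPATH, so it wins.
    print(resolve("libGL.so.1", search_order(store, driver, None), present.__contains__))
    # "Good" build: the store path is a RUNPATH, so LD_LIBRARY_PATH wins.
    print(resolve("libGL.so.1", search_order(None, driver, store), present.__contains__))
```

With the same two directories available, the only thing that changes which libGL gets picked is whether the store path was recorded as RPATH or as RUNPATH.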
I then spent some time (over a week >_>) git bisecting nixpkgs to identify where this break occurred. It turns out that two changes contributed to this:
In the binutils package, there is a patch file that changes binutils such that new-dtags is enabled by default. However, a commit in Feb 2018 upgraded the binutils package from version 2.28.1 to 2.30, and didn’t account for changes in ldmain.c that invalidated this patch file. This is why RPATH is used instead of RUNPATH.
However, even though this change caused my project to build a binary containing RPATH instead of RUNPATH, my program still worked. This is because the libGL version pointed to by RPATH happened to be one that worked the same as the one specified by LD_LIBRARY_PATH.
Later, the libGL package was changed such that a different one ended up in RPATH, thus producing the symptoms of the bad version of my project. Unfortunately, I’ve lost track of which specific commit made this change to libGL.
I have created a pull request to fix the first issue. With this fix in place, when I cabal build my project from within a nix-shell, it now works fine. I also note that the RUNPATH is set instead of RPATH in the binary.
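For anyone who wants to check which of the two tags their own binary ended up with, here is a rough Python sketch that reads the dynamic section via readelf -d. It assumes binutils' readelf is on the PATH and that its output uses the usual "(RPATH) Library rpath: [...]" / "(RUNPATH) Library runpath: [...]" line format; the helper names are made up for illustration.

```python
import re
import subprocess

# Matches the RPATH / RUNPATH lines in `readelf -d` output.
_DTAG = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def parse_dtags(readelf_output):
    """Map 'RPATH' / 'RUNPATH' to their directory lists, for tags that are present."""
    tags = {}
    for line in readelf_output.splitlines():
        match = _DTAG.search(line)
        if match:
            tags[match.group(1)] = [d for d in match.group(2).split(":") if d]
    return tags


def dtags_of(binary_path):
    """Run `readelf -d` on a binary and parse the result (needs readelf installed)."""
    result = subprocess.run(
        ["readelf", "-d", binary_path],
        check=True, capture_output=True, text=True,
    )
    return parse_dtags(result.stdout)
```

Calling dtags_of on the built executable should return a dictionary with a "RUNPATH" key when new-dtags is in effect, and an "RPATH" key otherwise.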
However, if (within that same nix-shell) I run my program with cabal repl or ghci, it gives me a similar undefined symbol error (though a slightly different one):
can't load .so/.DLL for: /nix/store/bmlp2ppjxxfsd15fgh1jw44l17p4iw6a-libGL-1.0.0/lib/libGL.so (/nix/store/bmlp2ppjxxfsd15fgh1jw44l17p4iw6a-libGL-1.0.0/lib/libGL.so: undefined symbol: __GLXGL_CORE_FUNCTIONS).
Apparently GHCi uses its own linker, and this behavior appears to indicate that it ignores LD_LIBRARY_PATH and relies entirely on whichever paths would get specified in RUNPATH. Thus, unless GHCi is modified to look at LD_LIBRARY_PATH, getting this to work again requires resolving the second issue: the libGL package needs to be fixed.
Any thoughts on what’s wrong with the current libGL package, and why it might produce that error?