libGL: undefined symbol __GLXGL_CORE_FUNCTIONS

Rotaerk · July 14, 2018, 9:04pm

I’m working on a project in which I’m implementing the vulkan-tutorial.com examples in Haskell: GitHub - Rotaerk/vulkanTest. It’s set up to use nix for building the project and for setting up nix-shell, with the specific version of nixpkgs locked down.

A couple weeks ago or so, I decided to upgrade to the latest version of nixpkgs. After doing so, if I built my project using cabal from within a nix-shell environment, the resulting program would give an error at runtime indicating that there was an undefined symbol in libGL.so.1. Below, I will refer to the working version before the upgrade as the “good” version of my project, and the broken version after the upgrade as the “bad” version.

After some investigation, I learned the following:

The good version was loading libGL from /run/opengl-driver/lib, while the bad version was loading it from some nix store path.
The good version was discovering the load path from the LD_LIBRARY_PATH environment variable, while the bad version was discovering the load path from the RPATH dtag built into the binary.
The good version’s binary contained a RUNPATH dtag, while the bad version’s binary contained an RPATH dtag.
According to rpath - Wikipedia, RPATH overrides LD_LIBRARY_PATH, while LD_LIBRARY_PATH overrides RUNPATH.
RUNPATH is built into the binary instead of RPATH if the linker is provided the flag --enable-new-dtags.

Thus, a change to nixpkgs was made somewhere along the line that caused the new-dtags feature to be disabled.
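The precedence described above can be sketched in a few lines of Python. This is a simplified model based on the ld.so manual page, not the loader itself: it leaves out the ld.so cache and secure-execution mode, the function name `search_order` is made up for illustration, and the store path is a placeholder.

```python
def search_order(rpath, runpath, ld_library_path, defaults=("/lib", "/usr/lib")):
    """Return the directories searched for a needed library, in order.

    Simplified model: DT_RPATH comes first, but only when the binary has
    no DT_RUNPATH; then LD_LIBRARY_PATH; then DT_RUNPATH; then defaults.
    """
    order = []
    if rpath and not runpath:
        # DT_RPATH is ignored entirely when DT_RUNPATH is present.
        order.extend(rpath)
    order.extend(ld_library_path)
    order.extend(runpath)
    order.extend(defaults)
    return order


# Placeholder paths; the real store path contains a hash.
STORE_LIBGL = "/nix/store/HASH-libGL/lib"
SYSTEM_LIBGL = "/run/opengl-driver/lib"

# "Bad" binary: the store path is baked in as RPATH and wins.
bad = search_order([STORE_LIBGL], [], [SYSTEM_LIBGL])

# "Good" binary: the same path as RUNPATH loses to LD_LIBRARY_PATH.
good = search_order([], [STORE_LIBGL], [SYSTEM_LIBGL])
```

With these inputs, `bad` starts with the store path while `good` starts with `/run/opengl-driver/lib`, which matches the two load locations observed above.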

I then spent some time (over a week >_>) git bisecting nixpkgs to identify where this break occurred. It turns out that two changes contributed to this:

In the binutils package, there is a patch file that changes binutils such that new-dtags is enabled by default. However, a commit in Feb 2018 upgraded the binutils package from version 2.28.1 to 2.30, and didn’t account for changes in ldmain.c that invalidated this patch file. This is why RPATH is used instead of RUNPATH.

However, even though this change caused my project to build a binary containing RPATH instead of RUNPATH, my program still worked. This is because the libGL version pointed to by RPATH happened to be one that worked the same as the one specified by LD_LIBRARY_PATH.
Later, the libGL package was changed such that a different one ended up in RPATH, thus producing the symptoms of the bad version of my project. Unfortunately, I’ve lost track of which specific commit made this change to libGL.

I have created a pull request to fix the first issue. With this fix in place, when I cabal build my project from within a nix-shell, it now works fine. I also note that the RUNPATH is set instead of RPATH in the binary.
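One way to check which tag a binary carries is to read the dynamic section with `readelf -d`. The sketch below assumes that binutils' `readelf` is on PATH and that its output contains the usual `(RPATH) Library rpath: [...]` or `(RUNPATH) Library runpath: [...]` lines; the function names are illustrative.

```python
import re
import subprocess

# Matches the RPATH / RUNPATH entries in `readelf -d` output.
DTAG_LINE = re.compile(
    r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*?)\]"
)


def parse_search_dtags(readelf_output):
    """Map 'RPATH' / 'RUNPATH' to the list of directories it names."""
    tags = {}
    for line in readelf_output.splitlines():
        match = DTAG_LINE.search(line)
        if match:
            name, value = match.group(1), match.group(2)
            tags[name] = value.split(":") if value else []
    return tags


def read_search_dtags(binary_path):
    """Run `readelf -d` on a binary and parse its search-path tags."""
    result = subprocess.run(
        ["readelf", "-d", binary_path],
        capture_output=True, text=True, check=True,
    )
    return parse_search_dtags(result.stdout)
```

A binary linked with new-dtags enabled should come back with a `RUNPATH` key and no `RPATH` key.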

However, if (within that same nix-shell) I run my program with cabal repl or ghci, it gives me a similar undefined symbol error (though a slightly different one): can't load .so/.DLL for: /nix/store/bmlp2ppjxxfsd15fgh1jw44l17p4iw6a-libGL-1.0.0/lib/libGL.so (/nix/store/bmlp2ppjxxfsd15fgh1jw44l17p4iw6a-libGL-1.0.0/lib/libGL.so: undefined symbol: __GLXGL_CORE_FUNCTIONS).

Apparently GHCi uses its own linker, and this behavior appears to indicate that it ignores LD_LIBRARY_PATH and relies entirely on whichever paths would get specified in RUNPATH. Thus, barring GHCi being modified to look at LD_LIBRARY_PATH, if this is to work again, the second issue needs to be resolved; the libGL package needs to be fixed.

Any thoughts on what’s wrong with the current libGL package, and why it might produce that error?

guibou · July 14, 2018, 10:06pm

Are you running on NixOS? Is the pinned nixpkgs version of your package the same as the one used by NixOS?

If you answered no to either of the two previous questions, you may try GitHub - nix-community/nixGL: A wrapper tool for nix OpenGL application [maintainer=@guibou]. Disclaimer: I’m the author of nixGL.

NixGL ensures that the libGL you use is compatible with the project you are building.

I wanted to try it with your project, but nix-shell in the root of your repository started to build gcc from source, and I was not motivated enough to wait for that. Is that normal? I’ll try tomorrow once GCC has been built.

Rotaerk · July 14, 2018, 10:52pm

The reason GCC tries to build from scratch is that I made my binutils fix on top of a nixpkgs commit that doesn’t have a build in the nixpkgs binary cache. I would have used the latest commit in nixpkgs master, but that breaks my build for an entirely unrelated reason… (Some hspec test fails.)

Anyway, I’m using NixOS, and yes, generally my project will not target the same version of nixpkgs as NixOS. I can try out nixGL later, but really, even if that works, wouldn’t it just be a workaround for a flaw in nixpkgs?

Rotaerk · July 16, 2018, 10:18pm

I just git bisected again, and the breaking change is one of these three commits:

https://github.com/NixOS/nixpkgs/commit/03a6766a6d080e064bea374f442a9f36a9a18b31
https://github.com/NixOS/nixpkgs/commit/803e87aa1e5bd071713276fd13e55854f7e5e385
https://github.com/NixOS/nixpkgs/commit/6bf1421f13d667c2997b67728cf777c6a70716a5

Rotaerk · July 17, 2018, 3:04am

Okay, the issue is resolved, and it was not a bug in those commits. Rather, it’s due to me building my project against a version of nixpkgs containing those commits while running a NixOS version based on a version of nixpkgs from before those commits. They’re incompatible.

Since I was using the nixos-18.03 channel, and these commits don’t exist in it yet, I switched to the nixos-unstable-small channel. After the upgrade, the above “undefined symbol” error disappeared.

guibou · July 17, 2018, 2:41pm

@Rotaerk

(Sorry for the time my answer took; this tab stayed open for a while… I had trouble explaining the issue succinctly, so I finally decided to send my draft. Sorry, it’s a bit rough.)

(Note: I use OpenGL in the following message, but that’s exactly the same idea with Vulkan.)

Indeed, your issue is due to an incompatibility between the library used in your package and the libraries available on the system. nixGL fixes that; you should try it.

You are right, that’s a flaw in nixpkgs, but it’s actually not that simple: Vulkan / OpenGL / CUDA / … are libraries which depend on the target machine’s hardware. When you build in isolation with nix, you cannot know in advance which library will be available on the target system, so you cannot build for it.

Let’s use an example to understand the issue. Imagine you build a program which depends on a bunch of libraries, for example libc and libOpenGL. By definition, the libOpenGL will be provided by the system running the program, because it depends on the hardware of the machine.

On a “traditional” build setup, you’ll build your executable program and dynamically link it with all the other libraries provided by your system. It will work flawlessly because you build from source with total knowledge of the system on which the executable will be run. Now imagine that you want to ship your program to another system. You will ship the program and hope that the other system ships a libc with an ABI compatible with your program. Most of the time it will run: if you are only using one public function (say fopen) from libc, the ABI has been stable for many versions, so the risk of incompatibility is really small (i.e. limited to the small fopen symbol).

However, this situation is totally unacceptable, because:

It may fail because one of the libraries is not available on the system.
libc changed its ABI and is not compatible with your expectations. That’s especially important knowing that libc has a compatibility policy where a program built with an older libc will run on a newer libc, but the opposite is not true. In my former company we were building on Ubuntu 10.04 to ensure libc compatibility.
Everything links correctly and runs correctly, but there is a small difference between the libc version you built and tested with and the one on the system. This small difference will slowly lead to a corruption of the internal state of your program, which will eventually lead to the end of modern civilization.
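The one-directional libc policy in the second point can be written down as a tiny check. This is only a sketch of the rule of thumb, assuming plain dotted version strings; the function names and the version numbers in the usage note are illustrative, and a real check would look at the versioned symbols the binary actually references.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.27' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def can_run(built_against, runtime):
    """Rule of thumb: the runtime libc must be at least as new as the build libc."""
    return parse_version(runtime) >= parse_version(built_against)
```

For example, `can_run("2.11", "2.27")` is true while `can_run("2.27", "2.11")` is false, which is why building on an old distribution is the conservative choice.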

This is why nix was created: you know exactly which bytes of which library will be used for the build and at runtime. So:

No missing library
No ABI incompatibility
No incompatible library behaviors. That’s especially important when library runMissilles has changed its C API from void launchMissile(bool dryrun) to void launchMissile(bool realRun), if you see what I mean.

That’s perfect, we should all use nix!

However, there is the libOpenGL issue, which is not known in advance, which depends on the system on which your executable will be run. And now we have a big problem:

program depends on libc, provided by your nix store.
program depends on libOpenGL provided by the system, it will be alright, the OpenGL symbols are well known, the ABI is stable, no issue at all.
libOpenGL on the system, depends on libc from the system!

And now we have an issue, because the libc of your system and the one from the nix store may be totally incompatible. Both libraries will be loaded in the same address space and will share the same symbols, so there will be conflicts and incompatibilities; that’s a nightmare. libc is an example here, but many other libraries are in the same situation. In the best case, it fails immediately; in the worst case, it runs and gives weird results.
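The situation can be modeled as a dependency graph in which the same library name is reachable from two different locations. The sketch below is a simplified model of the mismatch described above, not of the loader: a real dynamic loader generally keeps a single copy per library name, which is exactly why the "losing" side can end up with a libc it was not built for. All paths and function names are hypothetical.

```python
from collections import defaultdict


def load_closure(root, deps):
    """Return every node reachable from root in the dependency graph."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(deps.get(node, []))
    return seen


def conflicting_names(paths):
    """Group paths by file name and keep names provided by several paths."""
    by_name = defaultdict(set)
    for path in paths:
        by_name[path.rsplit("/", 1)[-1]].add(path)
    return {name: sorted(ps) for name, ps in by_name.items() if len(ps) > 1}


# Hypothetical graph: the program uses libc from the nix store, while the
# system's libGL was built against the system's libc.
DEPS = {
    "program": ["/nix/store/HASH-glibc/lib/libc.so.6", "/usr/lib/libGL.so.1"],
    "/usr/lib/libGL.so.1": ["/usr/lib/libc.so.6"],
}

conflicts = conflicting_names(load_closure("program", DEPS))
```

Here `conflicts` reports `libc.so.6` with both the store path and the `/usr/lib` path, the pair that cannot safely coexist.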

We have a really simple solution: set LD_LIBRARY_PATH=/usr/lib so that your program loads everything from the system. This way it will be compatible with the libOpenGL from the system, but we lose every guarantee provided by nix.

So the question is: is it possible to provide a libOpenGL that is compatible with my program (i.e. using the same libc, …) and also compatible with the hardware of the system on which the program is run? To do that, we need two pieces of information:

The nixpkgs version used to build program
The OpenGL driver and the version used on the target machine. The version is important because of the binary interface between the userland driver and the kernel space driver.

The quick answer is no: obviously, you don’t know at build time whether your user will use Nvidia / Intel / AMD… However, the user can install a tool which will install the right libOpenGL, knowing their system and the nixpkgs version used to build the program they want to run. And this tool is nixGL.

This is a hack for a complicated situation. Most other solutions need a way to separate the “library space” of libOpenGL from the “library space” of your program. There is work to do that (e.g. https://git.collabora.com/cgit/user/vivek/libcapsule.git/tree/README ). You can also write a client / server solution (we’ll call it libGLNetwork), where each computer has a running server which accepts JSON representing OpenGL commands, and your program links with the libGLNetworkClient from the nix store. As long as the JSON protocol used by both ends is the same, it will work. You can also have a software implementation of libOpenGL, so that it is provided by your nix store. But we don’t want any of these solutions, for performance reasons.

There is no satisfying solution for now. nixGL has its issues too: it breaks every month, it does not support AMD GPUs for now (I don’t have the hardware, feel free to contribute), and I, as main and only contributor, will slowly lose interest in it: I don’t have Nvidia hardware anymore and I’m not working in the computer graphics industry anymore.

Good luck.

vcunat · August 3, 2018, 6:41pm

It’s actually more complex, e.g. libGL depends on more libraries than just libc. We had such a problem with some wayland library a couple years ago.