Using nix infrastructure to reliably generate `compile_commands.json`

danielbarter · September 24, 2022, 4:40pm

Hey everyone,

I have been digging around in the core stdenv infrastructure recently and have come up with a robust solution for generating compile_commands.json files on nix.

First, let me explain some background. In the C++ world, the most popular LSP server is clangd. For clangd to function correctly, it needs a compile_commands.json, which is essentially just a list of all the compilation commands which were executed during a build process. This sounds simple, but because of the complexity of modern build systems, it can sometimes be hard to generate a compile_commands.json. Currently, there are three approaches:

1. Cmake will generate a compile_commands.json if asked. Unfortunately, this only works for cmake projects and, in my experience, is not super reliable. I have had projects in the past where cmake just refuses to generate a compile_commands.json. With nix it doesn’t work great either, because we hide some of our compile flags inside the compiler wrappers. This can be worked around by reading environment variables from nix-support.
2. The project bear tracks system calls to try to figure out exactly which compile commands are being used. This is the only choice for build systems other than cmake. It works quite well on debian based systems, but not on nix, again because of the compiler wrappers. It is something they are actively working on, but it is just a difficult problem to solve generally.
3. clang has the pretty much undocumented flag -MJ, which generates compile_commands.json files, but they need to be spliced together. I have never managed to get this approach working correctly.
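For the -MJ approach, the splicing step might look roughly like the sketch below. It assumes each fragment written by -MJ is a single JSON object followed by a trailing comma; the two fragments are hand-written stand-ins, and `splice_fragments` is a hypothetical helper name, not part of clang or any existing tool.

```python
import json


def splice_fragments(fragments):
    """Join -MJ style fragments (assumed: one JSON object plus a trailing
    comma each) into a single list of compile command entries."""
    body = "".join(fragments).strip()
    # Drop the trailing comma left behind by the last fragment.
    if body.endswith(","):
        body = body[:-1]
    return json.loads("[" + body + "]")


# Hand-written stand-ins for the per-file fragments.
fragments = [
    '{ "directory": "/tmp/test", "file": "a.cc", "output": "a.o", '
    '"arguments": ["clang++", "-c", "a.cc"]},\n',
    '{ "directory": "/tmp/test", "file": "b.cc", "output": "b.o", '
    '"arguments": ["clang++", "-c", "b.cc"]},\n',
]

entries = splice_fragments(fragments)
print(json.dumps(entries, indent=2))
```

In a real build the fragments would be read from whatever files the build passed to -MJ, which is the part that tends to be awkward to wire into an arbitrary build system.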

On nix we religiously wrap our C compilers. This interferes with cmake and bear’s ability to capture compile commands accurately. I personally think the compiler wrappers are great, because they allow us to set up custom compiler environments very easily. It is also fairly straightforward to instrument the compiler wrappers so that they can be used to generate compile_commands.json files reliably! Here is a PR doing that:

https://github.com/NixOS/nixpkgs/pull/192694

It works as follows: it adds support for a post-wrapper-hook.sh in the nix-support of the wrapper. The post-wrapper-hook.sh can be used to generate a compile_commands.json for any project using mini-compile-commands:

You use mini_compile_commands_client.py in the post-wrapper-hook.sh to extract the compile commands, which get sent through a unix socket to a running mini_compile_commands_server.py, which stores them and then writes them to an output file. The client-server architecture is required here because usually multiple compiler invocations are happening in parallel.
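To make the client-server idea concrete, here is a minimal sketch of the pattern. It is not the code from the PR: the socket path, the `serve` and `send_entry` helpers, and the one-JSON-object-per-connection protocol are all assumptions made up for illustration.

```python
import json
import os
import socket
import tempfile
import threading


def serve(sock, expected, entries):
    """Accept `expected` connections and store one JSON entry from each."""
    for _ in range(expected):
        conn, _addr = sock.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # client closed its end
                    break
                chunks.append(data)
        entries.append(json.loads(b"".join(chunks)))


def send_entry(path, entry):
    """Client side: roughly what a wrapper hook could do per compiler call."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(json.dumps(entry).encode())


workdir = tempfile.mkdtemp()
sock_path = os.path.join(workdir, "commands.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen()

entries = []
server_thread = threading.Thread(target=serve, args=(server, 3, entries))
server_thread.start()

# Three pretend compiler invocations reporting in parallel.
clients = [
    threading.Thread(
        target=send_entry,
        args=(sock_path, {"directory": workdir, "file": f"f{i}.cc",
                          "arguments": ["clang++", "-c", f"f{i}.cc"]}),
    )
    for i in range(3)
]
for client_thread in clients:
    client_thread.start()
for client_thread in clients:
    client_thread.join()
server_thread.join()
server.close()

# Only the server writes the output file, so parallel clients never race on it.
out_path = os.path.join(workdir, "compile_commands.json")
with open(out_path, "w") as out:
    json.dump(entries, out, indent=2)
```

The point of the design is that a single process owns the output file, so concurrent compiler invocations cannot interleave their writes.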

All of the code linked above is still very rough, but it is working. I have tested it on several large projects. Here is an example shell.nix which creates an environment with mini_compile_commands_client.py hooked into the compiler wrapper.

with (import <nixpkgs> {});
let llvm = llvmPackages_latest;
in (mkShell.override {stdenv = ( mini-compile-commands.wrap llvm.stdenv );}) {
   buildInputs = [ cmake gtest ];
}

When the compiler is called, if a mini_compile_commands_server.py is running, the hook will send the compile commands through to it.

Note, if you do want to try it out, you will need to rebuild large chunks of nixpkgs locally, so be careful!

I am writing this post because I think this is a very nice feature. In particular, it would allow us to have IDE integration with clangd for working on nix from nixos! I am pretty committed to getting it merged and I would love some community feedback about implementation details. As I mentioned above, this is still a work in progress. There are a lot of obvious nits, and nothing is well documented right now. I am more interested in feedback about the approach in general.

Thanks for reading

ElvishJerricco · September 24, 2022, 11:48pm

I know absolutely nothing about bear, but I’m quite surprised that sniffing syscalls isn’t enough to deal with Nixpkgs’ wrappers. At the end of it all, the wrapper still invokes the compiler with a determined set of flags, so that syscall should be quite evident.

danielbarter · September 24, 2022, 11:55pm

I think the main issue ends up being that there are a lot of ways to invoke the compiler. The nix wrapper uses exec and, for whatever reason, bear doesn’t handle this case particularly well. Regardless, bear is quite a large project at this point. There is already quite a lot of logic in bear for trying to distinguish between wrappers and compilers. In theory it should be fixable, but on every platform I have ever used bear on (mostly exotic supercomputer environments and debian based systems), there have always been issues. Hence why I decided to just produce a nix specific solution.

vcunat · September 25, 2022, 8:24am

1b. meson also generates compile_commands.json (or maybe ninja underneath does it). But in my experience it’s not complete and I have to add some extra search paths from the nix-shell that I’m using (some of the paths added in our wrappers – and regenerate occasionally to keep up with their updates).

danielbarter · September 25, 2022, 4:22pm

Ah yeah, I forgot about meson. Sounds like it is similar to cmake, in that it will produce something, but you need to augment it using the flag files in nix-support to make sure clangd knows where the C++ stdlib is. Needing to regenerate is another issue, which I don’t think is solvable.
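As a rough illustration of what "augmenting" means here, the sketch below inserts extra flags into every entry of an already generated compile_commands.json. The flag text and store paths are made-up placeholders, and `augment` is a hypothetical helper; in practice the extra flags would be read from the flag files in the wrapper’s nix-support directory.

```python
import shlex


def augment(entries, extra_flags):
    """Return copies of the entries with extra_flags inserted right after
    the compiler executable; the original entries are left untouched."""
    result = []
    for entry in entries:
        args = entry["arguments"]
        result.append({**entry, "arguments": [args[0], *extra_flags, *args[1:]]})
    return result


# Placeholder contents of a flags file (hypothetical store paths).
flags_text = (
    "-isystem /nix/store/example-libcxx/include/c++/v1 "
    "-isystem /nix/store/example-glibc-dev/include"
)
extra = shlex.split(flags_text)

entries = [{"directory": "/tmp/source", "file": "main.cc",
            "arguments": ["clang++", "-c", "main.cc"]}]
patched = augment(entries, extra)
print(patched[0]["arguments"])
```

This has to be redone whenever the toolchain or its dependencies change, which is the regeneration problem mentioned above.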

Anyway, the point I am trying to make is that our approach to compiler wrapping means that on nix, we could have a method for generating compile_commands.json which works the same for all C++ projects. No need to remember which flags to pass to the make file generator, or how to use bear etc.

vcunat · September 26, 2022, 8:13am

I regenerate the bulk by running a command inside the corresponding nix-shell, producing a screenful of text to be pasted in the right place in .clangd:

echo "$NIX_CFLAGS_COMPILE" | sed 's/ -isystem /\n-I/g' | sed -e 's/^/    - /'

Though it’s overkill, as many of the dependencies aren’t used as a C library in my case.

EDIT: and yes, that command misses glibc and gcc paths. I don’t know how to get those so nicely. Usually I compile one file with NIX_DEBUG=1 and find the paths inside… but all that is because I rarely actually need regeneration.
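For anyone who prefers a script to the sed pipeline, a rough Python equivalent could look like this. It assumes the variable holds whitespace-separated flags in which include directories appear as `-isystem DIR` pairs; the sample value and the `clangd_flag_lines` name are made up for illustration.

```python
import shlex


def clangd_flag_lines(cflags):
    """Turn a NIX_CFLAGS_COMPILE-style string into indented YAML list
    items, rewriting each '-isystem DIR' pair as '-IDIR'."""
    tokens = shlex.split(cflags)
    lines = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-isystem" and i + 1 < len(tokens):
            lines.append("    - -I" + tokens[i + 1])
            i += 2
        else:
            # Any other flag is passed through unchanged.
            lines.append("    - " + tokens[i])
            i += 1
    return lines


# Made-up sample; inside a nix-shell you would pass
# os.environ["NIX_CFLAGS_COMPILE"] instead.
sample = " -isystem /nix/store/example-a-dev/include -isystem /nix/store/example-b-dev/include"
for line in clangd_flag_lines(sample):
    print(line)
```

Like the sed version, this only covers what is in that one variable, so the glibc and gcc paths still have to come from somewhere else.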

danielbarter · September 26, 2022, 6:45pm

@vcunat: yeah, that is pretty much what I currently do. All the flags are listed in the nix-support folder in the wrapper, but I am getting sick of extracting them each time I want to have a working IDE in some unfamiliar C++ project.

I spent some time this morning cleaning everything up, and refactoring. The ergonomics are massively improved. You now create an environment as follows:

with (import /home/danielbarter/nixpkgs {});
let llvm = llvmPackages_latest;
in (mkShell.override {stdenv = ( mini-compile-commands.wrap llvm.stdenv );}) {
   buildInputs = [ cmake gtest ];
}

danielbarter · October 8, 2022, 4:57pm

OK, bumping this because I want to get it into nixpkgs so I can start using it! Since it requires a small change to cc-wrapper, using it without cache support is expensive.

Here is a video of using it to get IDE features working for nix itself: https://youtu.be/FB1aLN_MUuY. As far as I am aware, there isn’t a straightforward way to do this currently.

nrdxp · October 8, 2022, 11:50pm

I spent some time trying to generate a compile_commands.json for the Nix codebase itself, and I realized that it is quite important to use a version of bear cut from the same nixpkgs as the rest of the toolchain.

I originally tried nix run nixpkgs#bear but that gave me a bunch of random errors, so what I ended up doing was just adding bear to the devshell for Nix (to ensure it comes from the same nixpkgs as the compiler, etc.) and it worked just fine. Perhaps it didn’t track everything properly, I never paid close enough attention, but it seemed to work well enough with clangd while I was exploring the codebase.

danielbarter · October 9, 2022, 12:29am

I just tried this, and bear is only detecting the invocations of the compiler wrappers, so the resulting compile commands are missing all the flags related to the C/C++ standard libraries.

Here is an example entry generated using bear, where I added bear to the buildInputs of the nix derivation using overrideAttrs:

  {
    "arguments": [
      "/nix/store/nyn8hpjrdi6qix9bi0hc9iwn3xy6bdmc-clang-wrapper-14.0.6/bin/clang++",
      "-c",
      "-O3",
      "-fPIC",
      "-g",
      "-Wno-deprecated-declarations",
      "-g",
      "-Wall",
      "-include",
      "config.h",
      "-std=c++17",
      "-I",
      "src",
      "-I/nix/store/crd6z8q9mw8b9qw7fgwrbwjwv3h5gl1j-lowdown-1.0.0-dev/include",
      "-I/nix/store/9ljgq99z6wsanbz58cf3h6wd7isi94zi-boehm-gc-8.2.2-dev/include",
      "-I/nix/store/rn933kx7qcl2pwc2hq8637grm3ljghq6-libseccomp-2.5.4-dev/include",
      "-I/nix/store/4f7sh6hkx6rk7g6g2s3zrg59przrvjkq-libcpuid-0.5.1/include/libcpuid",
      "-I/nix/store/7gk042i5ph2n58881441vy8vnccad17z-brotli-1.0.9-dev/include",
      "-I/nix/store/w1l3kzsh8n9zw8l54sw9wdlgc2m0irwl-libsodium-1.0.18-dev/include",
      "-I/nix/store/4wndgrwxxc2gfcz7mx9skg0l8n2jqs6k-editline-1.17.1-dev/include",
      "-I/nix/store/v0afq0pydak5q898gff3pvcikvclcmmd-curl-7.85.0-dev/include",
      "-I/nix/store/v0dls0i84zr4b0jyclcj8mip69pg1cgk-sqlite-3.39.3-dev/include",
      "-I/nix/store/8y9cbb1lqc0qk50rskkdrk2w0c7wygwp-libarchive-3.6.1-dev/include",
      "-I/nix/store/0yi321ikhmjzy4f17mwwx2vz1ifg8ack-openssl-3.0.5-dev/include",
      "-o",
      "src/libutil/config.o",
      "src/libutil/config.cc"
    ],
    "directory": "/tmp/source",
    "file": "/tmp/source/src/libutil/config.cc",
    "output": "/tmp/source/src/libutil/config.o"
  },

When I try to load up a file, clangd complains that it has encountered too many errors (as a result of not being able to locate where all the C++ stdlib symbols are defined), and gives up. Were you using vscode by any chance, @nrdxp? I think it is smart enough to provide clangd with fallback C/C++ standard libraries.

doronbehar · October 9, 2022, 1:56pm

What is the equivalent way of creating such a stdenv with gcc?

danielbarter · October 9, 2022, 3:24pm

I simplified things a lot since that first post. To get an env with gcc you would do this:

with (import /home/danielbarter/nixpkgs {});
(mkShell.override {stdenv = ( mini-compile-commands.wrap stdenv );}) {
   buildInputs = [ cmake gtest ];
}

I have updated the initial post!

doronbehar · October 9, 2022, 3:31pm

Your PR would benefit from such example code in the top comment, and even better, from being documented somewhere in Nixpkgs, perhaps even in stdenv.chapter.md.

nrdxp · October 9, 2022, 4:32pm

No, I used the clangd lsp in helix, a vim-like terminal editor that has lsp integration built in. I didn’t receive any errors from the server either. I’ll give it a shot again later and report back, maybe with a screen recording so you can see what I’m doing.

danielbarter · October 9, 2022, 5:18pm

That would be super useful @nrdxp! Three things I would be interested in seeing are

What the compiler invocations recorded in the compile_commands.json look like.
Where you get sent when you jump into a C++ standard library header.
Maybe a dump of the environment variables and command line arguments for the running clangd process:

strings /proc/$(pidof clangd)/environ
strings /proc/$(pidof clangd)/cmdline
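The strings trick works because both of those /proc files are NUL-separated. If a structured dump is more convenient, a small sketch along these lines would do the same job; `split_nul` and `read_proc` are hypothetical helper names, and the sample bytes are invented.

```python
def split_nul(raw):
    """Split NUL-separated /proc data into a list of strings."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def read_proc(pid, name):
    """Read /proc/<pid>/environ or /proc/<pid>/cmdline (Linux only)."""
    with open(f"/proc/{pid}/{name}", "rb") as handle:
        return split_nul(handle.read())


# Invented sample in the same layout as /proc/<pid>/cmdline.
sample = b"clangd\0--background-index\0--log=verbose\0"
print(split_nul(sample))
```

On a Linux machine, `read_proc(pid, "environ")` with the pid reported by pidof clangd gives one `NAME=value` string per environment variable.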

nrdxp · October 9, 2022, 8:29pm

https://asciinema.org/a/VMpTJzih4fmeu3EpmbFfMwgXq
Here is the compile_commands.json that was generated:
cc.json.tar.gz
I have to run but I’ll see if I can capture the env later.

danielbarter · October 9, 2022, 8:56pm

@nrdxp: no need to post anymore, there is more than enough in the compile_commands.json! Bear seems to be detecting both the wrapped compiler calls and the unwrapped compiler calls, which I have never noticed before. I wonder if this is a gcc thing? I pretty much always use clang.

danielbarter · October 9, 2022, 9:40pm

OK, here is a simple test program:

#include <iostream>
#include <stdio.h>
int main() {
    std::cout << "hello from cpp\n";
    printf("hello from c\n");
}

In an environment

nix-shell -E "with (import <nixpkgs> {}); (mkShell.override { stdenv = llvmPackages.stdenv;}) { buildInputs = [bear];}"

running bear -- $CXX test.cc -o test does capture both the wrapper and the compiler call:

[
  {
    "arguments": [
      "/nix/store/cjlm0g395i329qm9wb2s6nwl5sikcd51-clang-wrapper-11.1.0/bin/clang++",
      "-c",
      "-o",
      "test",
      "test.cc"
    ],
    "directory": "/tmp/test",
    "file": "/tmp/test/test.cc",
    "output": "/tmp/test/test"
  },
  {
    "arguments": [
      "/nix/store/sddis0ibg2vrxqxd541746lwjllg9fvf-clang-11.1.0/bin/clang-11",
      "-cc1",
      "-triple",
      "x86_64-unknown-linux-gnu",
      "-emit-obj",
      "-disable-free",
      "-disable-llvm-verifier",
      "-discard-value-names",
      "-main-file-name",
      "-mrelocation-model",
      "pic",
      "-pic-level",
      "2",
      "-mframe-pointer=none",
      "-fmath-errno",
      "-fno-rounding-math",
      "-mconstructor-aliases",
      "-munwind-tables",
      "-target-cpu",
      "x86-64",
      "-fno-split-dwarf-inlining",
      "-debugger-tuning=gdb",
      "-nostdsysteminc",
      "-resource-dir",
      "/nix/store/cjlm0g395i329qm9wb2s6nwl5sikcd51-clang-wrapper-11.1.0/resource-root",
      "-idirafter",
      "/nix/store/bjhfs0gqi3p5zswg7r9bxjyn0iywq79g-glibc-2.34-210-dev/include",
      "-isystem",
      "/nix/store/507896pwv9ghpzb0rwd456613l9mar43-compiler-rt-libc-11.1.0-dev/include",
      "-isystem",
      "/nix/store/507896pwv9ghpzb0rwd456613l9mar43-compiler-rt-libc-11.1.0-dev/include",
      "-isystem",
      "/nix/store/65v2c245h5qa9mpc7dxhqkfjinl6phx0-gcc-11.3.0/include/c++/11.3.0",
      "-isystem",
      "/nix/store/65v2c245h5qa9mpc7dxhqkfjinl6phx0-gcc-11.3.0/include/c++/11.3.0/x86_64-unknown-linux-gnu",
      "-D",
      "_FORTIFY_SOURCE=2",
      "-internal-isystem",
      "/nix/store/cjlm0g395i329qm9wb2s6nwl5sikcd51-clang-wrapper-11.1.0/resource-root/include",
      "-O2",
      "-Wformat",
      "-Wformat-security",
      "-Werror=format-security",
      "-fdeprecated-macro",
      "-fdebug-compilation-dir",
      "/tmp/test",
      "-ferror-limit",
      "19",
      "-fwrapv",
      "-stack-protector",
      "2",
      "-stack-protector-buffer-size",
      "4",
      "-fgnuc-version=4.2.1",
      "-fcxx-exceptions",
      "-fexceptions",
      "-fcolor-diagnostics",
      "-vectorize-loops",
      "-vectorize-slp",
      "-faddrsig",
      "-x",
      "c++",
      "-o",
      "/run/user/1000/test-92e997.o",
      "test.cc"
    ],
    "directory": "/tmp/test",
    "file": "/tmp/test/test.cc",
    "output": "/run/user/1000/test-92e997.o"
  }
]

but when I open my editor and navigate to the file, clangd complains that it can’t find headers for the C/C++ standard libraries. If I delete the first entry, which corresponds to the wrapper, then everything works. So it seems that clangd is parsing the list and only taking the first entry that corresponds to the file, which makes sense: from the compiler’s perspective, why would you compile a file twice?
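Deleting the wrapper entries by hand gets tedious, so here is a sketch of automating it. It assumes wrapper executables can be recognised by `-wrapper-` in their store path, as in the clang-wrapper path in the output above; that is a heuristic for illustration, not something bear or clangd provides, and `drop_wrapper_entries` is a hypothetical helper.

```python
import json


def drop_wrapper_entries(entries):
    """Keep only entries whose executable does not look like a Nix
    compiler wrapper (heuristic: '-wrapper-' in the executable path)."""
    return [e for e in entries if "-wrapper-" not in e["arguments"][0]]


# Abbreviated versions of the two entries bear recorded above.
entries = [
    {"arguments": ["/nix/store/cjlm0g395i329qm9wb2s6nwl5sikcd51-clang-wrapper-11.1.0/bin/clang++",
                   "-c", "-o", "test", "test.cc"],
     "directory": "/tmp/test", "file": "/tmp/test/test.cc"},
    {"arguments": ["/nix/store/sddis0ibg2vrxqxd541746lwjllg9fvf-clang-11.1.0/bin/clang-11",
                   "-cc1", "-x", "c++", "test.cc"],
     "directory": "/tmp/test", "file": "/tmp/test/test.cc"},
]

kept = drop_wrapper_entries(entries)
print(json.dumps(kept, indent=2))
```

In practice you would load the list from compile_commands.json with `json.load` and write the filtered list back with `json.dump`.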

danielbarter · October 19, 2022, 10:09pm

Bumping this again, to try and get some more eyeballs. I got some really great documentation feedback from fricklerhandwerk which I have tried to address. I have added some more examples demonstrating how to use mini-compile-commands to generate compile_commands.json files for both the linux kernel and nix itself.

The main thing which needs to be vetted is the 5 lines being added to cc-wrapper.sh. At this point I am fairly convinced that it isn’t an issue, since analogous code already exists in ld-wrapper.sh. I think I am probably going to need to change the name post-wrapper-hook.sh to something like compiler-wrapper-hook to be more consistent with the post-link-hook that is used by ld-wrapper.

https://github.com/NixOS/nixpkgs/pull/192694

danielbarter · November 13, 2022, 6:53pm

Just a quick update on this. It is now finished! A simpler PR with the required hook has been merged into master, so now mini compile commands doesn’t require a full nixpkgs rebuild. I have been using it for the past week and am very happy with it. It makes standing up IDE support for projects with exotic build systems really easy. Overall, I would say the user experience is not as nice as bear, but IMO it is significantly more reliable.