How do you create a Nix package that builds in the sandbox when the upstream build script downloads files?

Hi all,
I am working on porting HHVM to Nix, and successfully make it work when sandboxing is disabled. HHVM was a package in nixpkgs however it was removed because the derivation was broken so I hope I could add it back to nixpkgs.

Unfortunately, the build script of HHVM downloads its dependencies from the internet, which prevents it from building in the sandbox. Is there a general solution for dealing with this situation?

Here is one idea I would propose as a more general solution: a special HTTP proxy that records the URLs and hashes of everything the build downloads.

As a package maintainer, the workflow would look like this:

  1. Set up the HTTP proxy as part of the derivation.
  2. Add a passthru.updateScript that builds the derivation with --no-sandbox.
  3. Run the updateScript; the special proxy records the files that are downloaded and updates the Nix file to include the recorded URLs and hashes as inputs.
  4. Build the derivation again with --sandbox. This time the special proxy redirects HTTP requests to the recorded files, which are already present locally because they are derivation inputs.

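Step 4 of the proposal could be sketched roughly like this. This is only an illustration under my own assumptions: the recorded list, the URL, the lookup-key scheme, and the idea that the proxy serves from a link farm are all hypothetical, not part of any existing tool.

```nix
# Hypothetical sketch: recorded URL/hash pairs become fetchurl inputs,
# and a local mirror directory is assembled for the proxy to serve from.
{ fetchurl, linkFarm, lib }:

let
  # In the proposal, the updateScript would regenerate this list.
  recorded = [
    { url = "https://example.com/dep.tar.gz"; hash = lib.fakeHash; }  # hypothetical URL; fill in the real hash
  ];
in
linkFarm "recorded-downloads" (map (r: {
  # Key each file by a hash of its URL so the proxy can look it up
  # when it intercepts the same request inside the sandbox.
  name = builtins.hashString "sha256" r.url;
  path = fetchurl { inherit (r) url hash; };
}) recorded)
```

The resulting store path could then be passed to the proxy as its mirror root; how the proxy maps intercepted requests onto these names is left open here.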
I just wonder whether anyone has attempted a similar approach, or whether there is a better way to deal with upstream build scripts that download files.


That approach is cute, and would certainly work in general. You run into problems with when and how you collect those artifacts, and how you persist them, though, especially with proprietary applications where you might not be allowed to redistribute the downloaded artifacts.

It also really doesn’t help if the build script downloads binaries that need to be patched before they work on NixOS, which is usually when these scripts are actually problematic.

The best option is to (very nicely and patiently) ask upstream to make their scripts not require downloads at build time, either through a switch or just in general, by permitting a pre-download step whose individual downloads can then be predicted and provided by fetchurl before the script ever runs. Bonus points if they expose what they want to download in a standard file format, à la an npm lock file.
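With such a pre-download step, the packaging side might look something like this. Everything here is an assumption for illustration: the URL, the `deps` directory layout, and the `--deps-dir` switch are hypothetical, standing in for whatever interface upstream actually provides.

```nix
# Sketch: predict each build-time download up front and let Nix fetch it,
# then hand the files to the (hypothetically patched) build script.
{ stdenv, fetchurl, lib }:

let
  deps = [
    (fetchurl {
      url = "https://example.com/some-dep-1.0.tar.gz";  # hypothetical URL
      hash = lib.fakeHash;                              # fill in the real hash
    })
  ];
in
stdenv.mkDerivation {
  pname = "some-package";
  version = "1.0";
  src = ./.;  # placeholder

  postPatch = ''
    # Place the pre-fetched archives where the build script expects them.
    mkdir -p deps
    ${lib.concatMapStringsSep "\n" (d: "cp ${d} deps/") deps}
  '';

  # Hypothetical upstream switch that disables downloading at build time.
  buildFlags = [ "--deps-dir=deps" ];
}
```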

I think this is preferable, because these download-at-build strategies need to end, and every upstream we cause to at least think about it is a step in the right direction. If we get as much mindshare as Debian one day, this will have a huge beneficial effect on the reproducibility of builds in general.

If that fails, or you’re too hesitant to be that ideological about your packages, or you just need something in the meantime, I think the correct solution is to patch the build scripts to no longer download things at build time, and to manage the downloads by hand or by parsing their build scripts.

You can also fall back to fixed-output derivations (FODs) with a suitably neutered version of the script that only downloads but doesn’t build. There are a lot of problems with this, but I think it’s the most common approach.
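A minimal sketch of that fallback, assuming a hypothetical `download-deps.sh` that is the neutered, download-only step (the name, its `--dest` flag, and the source placeholder are all invented here):

```nix
# Sketch: a fixed-output derivation (FOD) that runs only the download
# phase of the upstream build script.
{ stdenv, lib, cacert }:

stdenv.mkDerivation {
  name = "hhvm-build-deps";
  src = ./.;  # placeholder for the real source

  # FODs are granted network access inside the sandbox, in exchange for
  # pinning the hash of whatever they produce; the downloaded tree must
  # therefore be bit-for-bit reproducible.
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = lib.fakeHash;  # replace with the real hash after one build

  # TLS certificates so the script can talk to https mirrors.
  SSL_CERT_FILE = "${cacert}/etc/ssl/certs/ca-bundle.crt";

  buildPhase = ''
    # Hypothetical: run only the download step of the upstream script.
    ./download-deps.sh --dest "$out"
  '';
  dontInstall = true;
}
```

The main derivation can then take this output as an input and point the real build at it, keeping the build itself fully sandboxed.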

Unfortunately this is a hard problem.


I once tried the proxy approach but it didn’t work because most of the web uses https, which means the proxy can see the server and port, but not the full URL.

You must provide a self-signed CA so that your proxy can decrypt the HTTPS traffic; the proxy then uses the normal CA bundle to talk to the internet.


It seems this idea has been implemented in fzakaria/mvn2nix on GitHub (“Easily package your Maven Java application with the Nix package manager”), but not generalized to arbitrary HTTP GET requests.

There is also a tweet about the same idea: https://twitter.com/AlesyaHuzik/status/1356970175577219073
