`callPackage` increases closure size x70 versus using `import`

Is this normal, or is there a known gotcha where the closure of a derivation created with callPackage ends up much larger (about 70 times) than its import equivalent? The closure grows from 38MB to 2.6GB and contains GHC and everything else that is needed at build time but not at run time.

./nix/pkgs.nix:

compiler:
let
  config = {
    allowUnfree = true;
  };

  overlays = import ./overlays.nix compiler;

  stable1903 =
    fetchGit (builtins.fromJSON (builtins.readFile ./pkgs-rev.json));
in
  import stable1903 { inherit config overlays; }

And then the relevant part from default.nix:

...
compiler = "ghc864";
pkgs = import ./nix/pkgs.nix compiler;
myDrv = import ./somewhere { inherit pkgs compiler; };
...

versus

...
compiler = "ghc864";
pkgs = import <nixpkgs> {};
myDrv = pkgs.callPackage ./somewhere { inherit pkgs compiler; };
...

The derivation is:

{ pkgs, compiler, ... }:

let
  src = pkgs.lib.sourceByRegex ./.
    [ "app"
      "app/.*"
      "resources"
      "resources/.*"
      "src"
      "src/.*"
      "my-project.cabal"
      "CHANGELOG.md"
      "Makefile"
      "LICENSE"
    ];
in (pkgs.haskell.lib.dontHaddock
     (pkgs.haskell.packages.${compiler}.callCabal2nix "my-project" src {})
     ).overrideAttrs (oldAttrs: {
       configureFlags = ["--ghc-option=-Werror"] ++ oldAttrs.configureFlags;
     })

I have multiple projects each with its own derivation and this problem is reproduced with all of them. So it’s probably not an issue with the derivation itself.

I can just fix the problem with import as I don’t really benefit from using callPackage in this simple scenario but I’d like to learn what is going on.

Any help is greatly appreciated!

What’s the content of overlays.nix?

overlays.nix:

compiler:
let
  forBrittany = self: super:
    let
      inherit (self.haskell.lib) overrideCabal dontCheck;
      markUnbroken = drv: overrideCabal drv (drv: { broken = false; });
    in {
      haskell = super.haskell // {
        packages = super.haskell.packages // {
          "${compiler}" = super.haskell.packages."${compiler}".override {
            overrides = hsSelf: hsSuper: rec {
              #depends on multistate (which was marked as broken)
              butcher = markUnbroken hsSuper.butcher;
              #only tests are broken due to depending on an older hspec than
              #what is in nixpkgs.
              multistate = dontCheck (markUnbroken hsSuper.multistate);
              #v0.12 (ghc 8.6 support, whereas stable-19.03 has v0.11)
              brittany =
                hsSelf.callPackage extra-haskell-deps/brittany.nix {};
            };
          };
        };
      };
    };

  profiling = self: super: {
    profiledHaskellPackages = self.haskell.packages."${compiler}".override {
      overrides = hsSelf: hsSuper: {
        mkDerivation = args: hsSuper.mkDerivation (args // {
          enableLibraryProfiling = true;
        });
      };
    };
  };

in [forBrittany profiling]

It looks to me like your overlay is duplicating all Haskell modules, then overwriting a ton of them with the merge operator. If my observation is right, and this is being done in multiple files, you might be duplicating every Haskell module an exponential number of times.

I’m having difficulty following this web of derivations/expressions here (maybe that’s just me though). I wouldn’t mind poking around and testing this if you could upload a gist.

One thing that might be worth checking out is super.haskellPackages.extend. I didn’t know about it until recently, and it made my global Haskell overlay much cleaner.
This might not be relevant at all, but here is a snippet of my global overlay that might help. Haskell overlays are always a bit tricky compared to most other package additions/overrides.

# `/etc/nixos/mypkgs/overlay.nix`
self: super:
let
  hlib = super.haskell.lib;
  addHaskPack = path: hargs: hlib.dontCheck
    (hlib.dontHaddock (self.haskellPackages.callPackage path hargs));
in {
  # Haskell Modules
  haskellPackages = super.haskellPackages.extend (hself: hsuper: {
    xmobar = addHaskPack ./pkgs/development/haskell-modules/xmobar {};
    gutenhasktags = addHaskPack
        ./pkgs/development/haskell-modules/gutenhasktags {};
    gitlib-libgit2 = hlib.dontCheck hsuper.gitlib-libgit2;
  });

  /* ... other derivations ... */
}

Thanks for the suggestion. I’ll try looking into haskellPackages.extend.
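
For reference, here is a rough, untested sketch of how the forBrittany overlay from overlays.nix might look with extend in place of override { overrides = ...; }. It assumes the pinned nixpkgs revision exposes extend on haskell.packages.<compiler>. The outer // merges are kept because they are still needed to reach the per-compiler package set.

```nix
# overlays.nix (sketch only): the same three overrides as forBrittany,
# layered on with `extend` rather than `override { overrides = ...; }`.
compiler:
let
  forBrittany = self: super:
    let
      inherit (self.haskell.lib) overrideCabal dontCheck;
      markUnbroken = drv: overrideCabal drv (_: { broken = false; });
    in {
      haskell = super.haskell // {
        packages = super.haskell.packages // {
          # Assumption: `extend` is available on this package set.
          "${compiler}" = super.haskell.packages."${compiler}".extend
            (hsSelf: hsSuper: {
              butcher = markUnbroken hsSuper.butcher;
              multistate = dontCheck (markUnbroken hsSuper.multistate);
              brittany =
                hsSelf.callPackage extra-haskell-deps/brittany.nix {};
            });
        };
      };
    };
in [ forBrittany ]
```

Whether this changes the closure size at all is an open question; it only tidies up how the overrides are layered.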

It looks to me like your overlay is duplicating all Haskell modules, then overwriting a ton of them with the merge operator.

Would you mind elaborating on which of the overlays (if any) is duplicating?
While it is true that profiling defines entirely different derivations, they are used only when profiling is enabled (Nix is lazy, after all), and these 2 overlays are present both when using callPackage and when using import, so I don’t suspect the overlays themselves to be the source of the closure size increase. This is why I didn’t post the overlays originally.
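
The laziness argument can be illustrated with a tiny self-contained expression. The attribute names below are made up for the illustration and are not taken from the real project:

```nix
# An attribute that would fail if evaluated does no harm as long as
# nothing forces it.
let
  pkgs = {
    myProject = "my-project-1.0";
    # Hypothetical stand-in for the profiling package set.
    profiledHaskellPackages = throw "only forced when profiling is requested";
  };
in
  pkgs.myProject
```

Evaluating this (for example with nix-instantiate --eval) yields "my-project-1.0" and never triggers the throw, because profiledHaskellPackages is never referenced.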

I would like to reproduce a minimal example but I’m extremely busy with work at the moment.
When I have a breather I shall upload a minimal example for us all to test our theories on.

Yeah I can elaborate a bit. As a preface/disclaimer this is a theory.

callPackage does more than a plain import. You can check out the relatively short definition in nixpkgs/lib/customisation.nix. The important bit is that it makes its target “overrideable”, which in this case is not necessary, and potentially problematic since you’re “calling” all Haskell packages, not a single one. Additionally, I think that the use of // in the overlay instead of extend is causing all Haskell packages to be listed as unevaluated “thunks” twice, and then twice again in the outer layer in the closure’s environment. This is a rare case where lazy evaluation is killing your space efficiency.
I also have some concerns about nesting overlays inside of overlays which I see happening 4 levels deep, potentially producing 64 (4^3) unnecessary clones of every Haskell derivation.

The TL;DR is that you have a lot of nested overlays and merges going on, and callPackage is making things even worse by trying to make every single attribute in the mix overrideable.
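
To make the “overrideable” part concrete, here is a minimal sketch using lib.makeOverridable, which is what callPackage wraps its target in. It assumes a <nixpkgs> channel is available on NIX_PATH; the function f and its arguments are invented for the example.

```nix
let
  lib = import <nixpkgs/lib>;

  # Hypothetical "package" function: takes its inputs as an attribute set.
  f = { greeting, name }: { text = "${greeting}, ${name}"; };

  # The result is f's output plus an `override` attribute that re-runs f
  # with some of the original arguments replaced.
  r = lib.makeOverridable f { greeting = "hello"; name = "nix"; };
in
  [ r.text (r.override { name = "world"; }).text ]
```

With nix-instantiate --eval --strict this evaluates to [ "hello, nix" "hello, world" ]. Note that the wrapper only adds attributes at evaluation time; the sketch does not by itself show why a built closure would grow.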


First of all, thank you very much for the elaboration. Even if it’s just a
theory, I am looking (really hard) for opportunities to learn and get better
at this, and I think that these kinds of discussions are beneficial to this
goal, even if the theory ends up not applying here.

If you don’t mind, I will ask a couple more questions.

The important bit is that it makes its target “overrideable”, which in this case is not necessary, and potentially problematic since you’re “calling” all Haskell packages, not a single one.

What do you mean by calling all Haskell packages?

callPackage was applied to specific Haskell projects, so why does it amount to
“calling” all packages?

The TL;DR is that you have a lot of nested overlays and merges going on, and callPackage is making things even worse by trying to make every single attribute in the mix overrideable.

I tried removing the overlays completely:

pkgs.nix:

compiler:
let
  config = {
    allowUnfree = true;
  };

  #overlays = import ./overlays.nix compiler;
  overlays = [];

  stable1903 =
    fetchGit (builtins.fromJSON (builtins.readFile ./pkgs-rev.json));
in
  import stable1903 { inherit config overlays; }

They were needed for the profiling environment, and for getting Brittany to build.

If I’m just building one of the Haskell projects, I need neither so it works
when I remove the overlays. It did not even re-build anything as it got
the same hash as when the overlays were enabled, so just to make sure I added
a blank line to one of the Haskell source files to cause a rebuild.
The bloated size (with callPackage) and dependencies did not change even
after the aforementioned change. I guess the next step is for me to prepare
a reproducible example.

Rereading this, I think that I misunderstood the structure of this build, particularly your default.nix file. When I set out to write a small test version of the situation based on the description, I realized that I had misunderstood where callPackage was being used. Why is callPackage in your default.nix? It’s definitely not unheard of, I just need context.

To be honest, there’s no practical reason to use callPackage to get the project’s derivation, except that I saw it being used often.
Derivations are themselves defined in their own files, as functions of their dependencies, and callPackage automatically passes the dependencies from nixpkgs.
In this specific case, the number of dependencies that come from nixpkgs is practically 0, so there’s no advantage to using callPackage over import for me, but I just wanted to be closer to convention.
Perhaps I’m wrong and it’s actually a bad idea to use callPackage in this simple context? If so, I would be glad to be told why, as I don’t remember coming across it in any of the manuals or blog posts.
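
As a rough illustration of the “automatically passes the dependencies” part, the sketch below mimics the argument-filling step of callPackage using only builtins. callPackageSketch, fakePkgs and myPackage are hypothetical names; the real implementation lives in nixpkgs’ lib and additionally makes the result overridable.

```nix
let
  # Simplified stand-in for callPackage: look at which arguments the
  # function declares, pick those out of autoArgs, then let explicitly
  # passed args take precedence.
  callPackageSketch = autoArgs: fn: args:
    let
      f = if builtins.isFunction fn then fn else import fn;
      auto = builtins.intersectAttrs (builtins.functionArgs f) autoArgs;
    in
      f (auto // args);

  # Hypothetical miniature "package set" and "package".
  fakePkgs = { stdenv = "stdenv"; zlib = "zlib-from-pkgs"; openssl = "openssl"; };
  myPackage = { stdenv, zlib, ... }: "built with ${stdenv} and ${zlib}";
in
  callPackageSketch fakePkgs myPackage { zlib = "zlib-custom"; }
```

This evaluates to "built with stdenv and zlib-custom": stdenv is filled in automatically, zlib is taken from the explicit argument, and openssl is ignored because the function never asks for it. When every argument is passed explicitly, as in the default.nix above, the automatic filling adds nothing over a plain import.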

The default.nix just gets everything together to return a set of derivations that are used for the project.
These are 2 Haskell projects, their variants used for profiling, a diagrams project for the documentation’s diagrams, and the nixpkgs set that is used, for occasional debugging purposes.

Is callPackage a bad move in simple contexts?