Status of lang2nix approaches

I missed buildBazelPackage, whose fetchAttrs.sha256 is the hash of a directory of vendored deps, similar to cargoSha256.

It is not a lang2nix, but it suffers from the same problem: the hash drifts too, because something changes on the servers it downloads from. For example, python3.pkgs.tensorflow_2's fetchAttrs.sha256 in nixpkgs master is no longer valid:

hash mismatch in fixed-output derivation '/nix/store/wmnf16gin2pcqgawjk26rggprnxfbdb8-tensorflow-gpu-2.4.2-deps'
  wanted: sha256:10m6qj3kchgxfgb6qh59vc51knm9r9pkng8bf90h00dnggvv8234
  got:    sha256:1xjmfp743vmr6f36d15dlmkgiin89g31f68hhfzgk3sm1xpk1mj2

BTW, snapshotting of PyPI and Conda on a 12-hour basis is already here:

The H variant (self-modification of Nix code from builtins.exec when the lock file is outdated) can be extended to manage the sha256 of fetch* functions, removing this burden from people too:

If the URL passed to a fetch* function exists in <nixpkgs/fetch.lock>, the recorded hash is used; otherwise the hash is calculated and added.
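A minimal sketch of the lookup half of that idea, in pure Nix. Everything here is an assumption made for illustration: the shape of the lock (a plain attrset from URL to hash), the name `resolve`, the example.org URLs and the placeholder hash. The "calculated and added" half needs the self-modifying machinery of the H variant and is not shown.

```nix
let
  # Hypothetical content of a fetch.lock: URL -> recorded hash.
  # The hash is a placeholder string, not a real digest.
  fetchLock = {
    "https://example.org/foo-1.0.tar.gz" = "<hash recorded on first fetch>";
  };

  # Use the recorded hash when the URL is known; otherwise report that the
  # hash still has to be calculated (and, in the proposal, written back).
  resolve = url:
    if builtins.hasAttr url fetchLock
    then { inherit url; sha256 = fetchLock.${url}; locked = true; }
    else { inherit url; sha256 = null; locked = false; };
in
  map resolve [
    "https://example.org/foo-1.0.tar.gz" # present in the lock
    "https://example.org/bar-2.0.tar.gz" # absent: would trigger a calculation
  ]
```

Evaluating this with `nix-instantiate --eval --strict` lists which URLs are already pinned, which is also the information an offline "source closure" would need.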

Also, fetch.lock would be the single source of truth for everything that nixpkgs downloads, which helps offline installation (there is demand for it: Offline build "source closure", Using NixOS in an isolated environment, …)

Deleting fetch.lock would trigger a mass test for dead and changed links.

A related case: NixOS’s system.requiredKernelConfig, which is currently broken (it neither enables nor checks the kernel options).
Enabling a kernel config option affects other options, a process that resembles Maven dependency resolution.
Implementing system.requiredKernelConfig confronts us with a choice:

  1. always recompile the kernel, adding the requested options, whenever a relevant NixOS config setting (zram.enable or swapDevices or adding "amdgpu-pro" to services.xserver.videoDrivers, …) changes, even if the requested option is already implicitly enabled in the default kernel.
  2. use IFD to read the final kernel options
  3. memoize somewhere the result of the resolving process (requested options → final options), similar to (requested artifacts → final artifacts); if the lang2nix problem can be solved in a general way, its solution can also serve system.requiredKernelConfig
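To illustrate the third choice, here is a rough sketch of how a memoized "final options" set could answer the question "do we need to rebuild the kernel?" without IFD. The attrset contents and the names `finalOptions`, `requested` and `satisfied` are made up for the example; a real memo would be produced by the kernel's own config resolution.

```nix
let
  # Hypothetical memo: the final options of the default kernel,
  # as recorded after one resolving run.
  finalOptions = { ZRAM = "m"; SWAP = "y"; };

  # Options requested by the current NixOS configuration (also made up).
  requested = { ZRAM = "m"; DRM_AMDGPU = "m"; };

  # A request is satisfied when the memoized final config already has
  # exactly the requested value; options absent from the memo count as "n".
  satisfied = name: (finalOptions.${name} or "n") == requested.${name};

  missing = builtins.filter (name: !(satisfied name)) (builtins.attrNames requested);
in {
  inherit missing;                 # → [ "DRM_AMDGPU" ]
  needsRebuild = missing != [ ];   # → true
}
```

With `requested = { ZRAM = "m"; }` the same expression reports no missing options, so the default kernel could be reused.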

BTW, if we are heading toward adopting SWHIDs, we should retire fetchFromGitHub (and its tarball-downloading friends for other git hostings) in favor of fetchgit.

Tarballs from https://github.com/$owner/$repo/archive/$revision.tar.gz miss .gitignore files, and their contents can be rewritten by git-archive.

Example: https://github.com/cryfs/cryfs/blob/3f66c7ceda4f934a78a6a83d0d735f911aaaecf8/src/gitversion/_version.py#L21-L27. The files in the tarball and in the repository differ:

index 207dc691b..1c24d6422 100644
--- a/nix/store/k8vzbf7isybzyqh48fhsbf5hnn8rzdlc-fetchgit-rcf3023406969b14610df03a043fca8a078c9c195-2019-06-08/src/gitversion/_version.py
+++ b/nix/store/6m0ain6g8pwrp63676ymd62xgxw92n95-source/src/gitversion/_version.py
@@ -23,8 +23,8 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = "$Format:%d$"
-    git_full = "$Format:%H$"
+    git_refnames = " (tag: 0.10.2)"
+    git_full = "cf3023406969b14610df03a043fca8a078c9c195"
     keywords = {"refnames": git_refnames, "full": git_full}
     return keywords

If we fall back to downloading sources from softwareheritage.org, where these macros are not expanded, the build will fail. So buildPhase should not rely on the sources having been pre-processed by git-archive.

Also:

  1. Those $Format macros could even expand to the current time (see "Git keyword expansion" in the Git Memo v1.1 documentation), making the tarball content volatile. That is, not only could the tarball itself vary with tar/gzip upgrades or command-line switches on the git hosting's cloud, but the files inside the tarball could also change in the next instance of the tarball, with the same commit-id and tree-id.
  2. It seems that the choice of which files to leave out of the tarball, and whether to preprocess them, depends on an ad hoc decision of the git hosting and could change. Moving to stable ids like SWHIDs means we can no longer download tarballs and rely on the preprocessors of git hostings.
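As a small illustration of the difference shown in the diff, the following pure-Nix sketch tells an unexpanded git-archive placeholder apart from an expanded one. The helper name `hasUnexpandedFormat` and the two sample strings are my own; only the `$Format:%H$` placeholder and the commit id come from the diff above.

```nix
let
  # True when `text` still contains an unexpanded git-archive placeholder
  # such as `$Format:%H$`. `[$]` matches a literal dollar sign.
  hasUnexpandedFormat = text:
    builtins.length (builtins.split "[$]Format:" text) > 1;

  # The same source line as seen in a git checkout and in the tarball.
  fromCheckout = "git_full = '$Format:%H$'";
  fromTarball  = "git_full = 'cf3023406969b14610df03a043fca8a078c9c195'";
in {
  checkout = hasUnexpandedFormat fromCheckout; # → true
  tarball  = hasUnexpandedFormat fromTarball;  # → false
}
```

A build that behaves the same for both inputs is one that does not depend on git-archive preprocessing.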

I’m really excited about this idea.

Below is a small demo. The demo is not about lang2nix: first, to avoid language-specific objections (“rust has lock files”, “there is an sbt plugin for that”, …), and second, because I have not yet implemented it for language frameworks. By the time that happens, I will probably have gotten rid of bash, and I will have trouble showing a working demo in a Nix we can all read.

The example is shaderc, which has vendored dependencies published in a separate branch https://github.com/google/shaderc/tree/known-good shortly before or after each release; in nixpkgs they are maintained manually: https://github.com/NixOS/nixpkgs/blob/67c4132368dd7612d5226a99ec8a2e3c1af68b76/pkgs/development/compilers/shaderc/default.nix

The setting is very similar to lang2nix, isn’t it?

{ lib, stdenv, fetchgit, cmake, python3, pkgsCurrent }:

let
  version = "2021.2";
  #         -^^^^^^- to upgrade, just change this
  #                  (and even that can be automated)

  src = fetchgit {
    url      = "https://github.com/google/shaderc";
    rev      = "v${version}";
    memoFile = ./.memo.nix; # memoFile is optional; there is a global default
    #
    # Look, ma, no `sha256`.
    #
    # There is a magic inside `fetchgit` which is explained below on example
    # of `known-good`.
    #
    # `fetchgit` is a bit more complex, there are 2 memoization steps:
    #   1. `rev`     -> `fullRev`
    #   2. `fullRev` -> (`sha256`, `commitTime`, `narsize`)
    #
    # In short, if `sha256` is in `import ./.memo.nix`, it is just used, without
    # any IFD. Otherwise, we pause here, run `git` in a sandbox and mutate
    # `./.memo.nix`.
    #
    # It lacks parallelism of @Ericson2314's .drv.drv, but has the advantage
    # that the memo files are local, and can (should) be placed under version
    # control, similar to the ubiquitous .lock files.
    #
    # Actually, `./.memo.nix`'s attrset is maintained in memory and
    # flushed to disk once on Nix's exit, so this is just another
    # obstacle to parallelism.
    #
  };

  # Tolerate "known-good" branch updated within a day after the release.
  # `builtins.timeToString` and `builtins.timeFromString` are guests
  # from the future.  Nothing magical here: just pure functions which
  # could be implemented in pure Nix
  commitTime-nextday =
    builtins.timeToString (builtins.timeFromString src.commitTime + 86400);
  # But look, ma, there is not only auto-maintained `src.sha256`,
  # but also `src.commitTime`.
  # and `src.fullRev`
  # and could be auto-maintained `src.swhid`, `src.ipfs`, `src.magnet`, ....

  known-good =
    builtins.head (lib.memoize {
      # `memoFile` is optional. the global default is usually a good choice
      memoFile   = ./.memo.nix;
      # `memoFile` is a Nix file with attrs set inside.
      # Here we define the keys of that attrset we are interested in
      # There could be more than 1 key (e.g. `fetchurl`'ing from multiple urls)
      memoKeys   = [ "version=${version} fullRev=${src.fullRev}" ];
      # Either Nix function (on top of functions like `builtins.fetchGit`) or
      # the code to run in sandbox when `memoFile` has no requested `memoKeys`
      # (`lib.memoize` also has `mode` which could be "all" or "any"  to tell
      # if we need values for all the `memoKeys` or for any one) or `memoKeys`
      # are obsolete (there is also `memoRevision`  to tell if it is desirable
      # to try to calculate the value again; useful for `pkgs.geoip-database`)
      calcValues =
        # `pkgsCurrent` is defined next to `pkgsi686Linux`
        # overriding `system=builtins.currentSystem`
        # this is usually `x86_64-linux` even if we build for/on something else.
        pkgsCurrent.stdenvNoCC.mkDerivation {
          # it should be actually not an IFD-derivation,
          # but `builtins.sandboxedExec`, which is not yet implemented.
          # Creating a derivation in the Nix Store is needless, and
          # reusing existing results from the Nix Store is undesirable
          name = "known-good-${toString builtins.currentTime}.nix";
          # The derivation is not FOD, so let's allow networking explicitly
          # `__allowNetworking` - another guest from the future -
          # works only in IFD-derivations.  again: it is actually
          # `builtins.sandboxedExec` simulated via an IFD-derivation
          __allowNetworking = true;
          GIT_SSL_CAINFO = "${pkgsCurrent.cacert}/etc/ssl/certs/ca-bundle.crt";
          buildInputs = [ pkgsCurrent.gitMinimal ];
          # Get the newest https://github.com/google/shaderc/tree/known-good
          # but not newer than (`src.commitTime`+1day) and then
          # store `known_good.json`'s content to `./.memo.nix`'s attrset
          # under key `memoKeys`
          buildCommand = ''
            git init
            git remote add origin ${lib.escapeShellArg src.url}
            git fetch origin known-good
            git checkout $(git rev-list -n1 --before=${commitTime-nextday} \
                           origin/known-good)

            # emit a list of the same size as `memoKeys`
            # each value corresponds to a key
            # (with memoMode="any", it is possible to return `null` for some)
            echo "[ { json = '''$(cat known_good.json)'''; # FIX: proper escape
                    } ]"  > $out
          '';
        };
    });

# and the rest is trivial...

in stdenv.mkDerivation rec {
  pname = "shaderc";
  inherit version src;

  outputs = [ "out" "lib" "bin" "dev" "static" ];

  patchPhase =
  let
    # parse JSON of
    # https://github.com/google/shaderc/blob/ee00a6bc9388acbc332b1ef2290ff6481b78b2cf/known_good.json
    p             = lib.listToAttrs (
                      map (args: lib.nameValuePair args.name args)
                          (builtins.fromJSON known-good.json).commits
                    );
    glslang       = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.glslang      .subrepo}";
                      rev = p.glslang      .commit;
                    };
    spirv-tools   = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.spirv-tools  .subrepo}";
                      rev = p.spirv-tools  .commit;
                    };
    spirv-headers = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.spirv-headers.subrepo}";
                      rev = p.spirv-headers.commit;
                    };
  in ''
    mkdir -p ${p.glslang      .subdir}
    mkdir -p ${p.spirv-tools  .subdir}
    mkdir -p ${p.spirv-headers.subdir}
    #
    # `fetchgit` by default produces tarballs, so `tar xf` instead of `cp`
    #
    tar xf ${glslang      } --strip-components=1 -C ${p.glslang      .subdir}
    tar xf ${spirv-tools  } --strip-components=1 -C ${p.spirv-tools  .subdir}
    tar xf ${spirv-headers} --strip-components=1 -C ${p.spirv-headers.subdir}
  '';

  nativeBuildInputs = [ cmake python3 ];

  postInstall = ''
    moveToOutput "lib/*.a" ${placeholder "static"}
  '';

  cmakeFlags = [ "-DSHADERC_SKIP_TESTS=ON" ];
}
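The read-only half of `lib.memoize` described in the comments above (look up `memoKeys`, honour a mode of "all" or "any") could be sketched in pure Nix roughly as follows. This is not the actual implementation from the demo: the function name `lookup`, the use of `null` to signal "run `calcValues`", and the memo content with its placeholder strings are assumptions for illustration, and the write-back to `./.memo.nix` is left out.

```nix
let
  # Hypothetical memo content, standing in for `import ./.memo.nix`.
  memo = {
    "version=2021.2 fullRev=<full revision>" = { json = "<content of known_good.json>"; };
  };

  # Return one value per key (null for a missing key), or null for the whole
  # lookup when the memo cannot satisfy the request and `calcValues` must run.
  lookup = { memo, memoKeys, mode ? "all" }:
    let
      values = map (key: memo.${key} or null) memoKeys;
      hits   = builtins.filter (value: value != null) values;
      enough =
        if mode == "all"
        then builtins.length hits == builtins.length memoKeys
        else hits != [ ];
    in
      if enough then values else null;

  known   = "version=2021.2 fullRev=<full revision>";
  unknown = "version=2021.3 fullRev=<other revision>";
in {
  hit  = lookup { inherit memo; memoKeys = [ known ]; };                       # one value
  miss = lookup { inherit memo; memoKeys = [ known unknown ]; };               # null
  any  = lookup { inherit memo; memoKeys = [ known unknown ]; mode = "any"; }; # value and null
}
```

Only the `miss` case would need the sandboxed calculation; the other two are answered from the memo file without any IFD.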

I hope [RFC 109] Allow "import from derivation" in Nixpkgs, simply-stupidly, and safely (Pull Request #109 in NixOS/rfcs, by Ericson2314) can stimulate the development and adoption of lang2nix work.


During Summer of Nix, @DavHau started work on dream2nix, which is a framework for wrapping up the various lang2nix tools in an easy-to-use and easy-to-implement manner: DavHau/dream2nix on GitHub ("A generic framework for 2nix tools").

This is still in the early phases, but I am hopeful it will simplify the lang2nix ecosystem and make it much easier to create new tools. Along with the RFCs from @Ericson2314 (like [RFC 0092] Computed derivations, Pull Request #92 in NixOS/rfcs, and [RFC 109] Allow "import from derivation" in Nixpkgs, simply-stupidly, and safely, Pull Request #109 in NixOS/rfcs), this could be a nice improvement for Nixpkgs and the wider Nix ecosystem.


The problem with IFD is not (only) its incompatibility with CI.

IFD is basically an eval-time computation cached in the Nix Store. Thus, it can be GC’ed at any moment, forcing a re-run on the next eval that generates fresh Nix code to import. No one reviews that code, no one controls its re-evaluation cycles, and there is no way to undo to the old code.
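For readers who have not met IFD, a minimal sketch of what is meant, assuming `<nixpkgs>` is available on the Nix path (the derivation name and its content are invented for the example):

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # A derivation whose output is a Nix expression. The generated text is
  # not in any repository; it only exists in the Nix store.
  generated = pkgs.runCommand "generated.nix" { } ''
    echo '{ answer = 6 * 7; }' > $out
  '';
in
  # Importing the output forces the build during evaluation: this is the IFD.
  (import generated).answer
```

If the store path of `generated` is garbage-collected, the next evaluation has to build it again before it can continue.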

@voltth I think the proposal covers that: allImportedDerivations is manual rooting of all imported derivations, so nothing need be GC’d, and everything can and should be reviewed.

and there is no way to undo to the old code

I don’t get this? The imported derivations ought to be deterministic, as we always strive for.

Don’t you plan to allow networking for IFD computations?

Only fixed output ones – just like normal.