Status of lang2nix approaches

I missed buildBazelPackage, whose fetchAttrs.sha256 is the hash of a directory of vendored dependencies, similar to cargoSha256.

It is not a lang2nix tool, but it suffers from the same problems: the hash drifts too, as something keeps changing on the servers it downloads from (for example, python3.pkgs.tensorflow_2's fetchAttrs.sha256 in nixpkgs' master is no longer valid):

hash mismatch in fixed-output derivation '/nix/store/wmnf16gin2pcqgawjk26rggprnxfbdb8-tensorflow-gpu-2.4.2-deps'
  wanted: sha256:10m6qj3kchgxfgb6qh59vc51knm9r9pkng8bf90h00dnggvv8234
  got:    sha256:1xjmfp743vmr6f36d15dlmkgiin89g31f68hhfzgk3sm1xpk1mj2

BTW, snapshotting of PyPI and Conda on a 12-hour basis is already here:

The H variant (self-modification of Nix code via builtins.exec when the lock file is outdated) can be extended to manage the sha256 of the fetch* functions, removing this burden from people too:

If the url passed to a fetch* function exists in <nixpkgs/fetch.lock>, the recorded hash is used; otherwise it is calculated and added.
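A sketch of what such a fetch.lock could look like (the format, field names, and hash below are all hypothetical):

```nix
# <nixpkgs/fetch.lock> (hypothetical): one attrset, keyed by url.
# fetch* functions look their url up here; on a miss the hash is
# computed once in the sandbox and a new entry is written back.
{
  "https://github.com/google/shaderc/archive/v2021.2.tar.gz" = {
    sha256  = "0000000000000000000000000000000000000000000000000000";
    narsize = 1234567;
  };
  # ... one entry per url that nixpkgs downloads
}
```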

Also, fetch.lock would be the single source of truth for everything that nixpkgs downloads, enabling offline installation (which is in demand: Offline build "source closure", Using NixOS in an isolated environment, …)

Deleting fetch.lock would initiate a mass test for dead and changed links.

A related case: NixOS's system.requiredKernelConfig, which is currently broken (it neither enables nor checks the kernel options).
Enabling a kernel config option can affect other options, a resolution process reminiscent of Maven's.
Implementing system.requiredKernelConfig confronts us with a choice:

  1. always recompile the kernel, adding the requested options, whenever a relevant NixOS config setting (zram.enable or swapDevices or adding "amdgpu-pro" to services.xserver.videoDrivers, …) is changed, even if the requested option is already implicitly enabled in the default kernel.
  2. use IFD to read the final kernel options
  3. memoize somewhere the result of the resolution process (requested options → final options), similar to (requested artifacts → final artifacts); if the lang2nix problem can be solved in a general way, its solution could also serve system.requiredKernelConfig
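For option 3, the memo could be as simple as a reviewed file mapping each requested option set to its resolved result; the option names and format below are made up for illustration:

```nix
# hypothetical kernel-config memo: requested options -> final options
{
  "ZRAM=y LZO_COMPRESS=y" = {
    ZRAM         = "y";
    LZO_COMPRESS = "y";
    CRYPTO       = "y"; # pulled in implicitly by dependency resolution
  };
}
```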

BTW, if we are heading to adopt SWHIDs, we should retire fetchFromGitHub (and its tarball-downloading friends for other git-hostings) in favor of fetchgit.

Tarballs from https://github.com/$owner/$repo/archive/$revision.tar.gz miss files that the repository excludes from archives (via .gitattributes), and their contents can be patched by git-archive's keyword expansion
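The expansion is driven by the export-subst attribute in the repository's .gitattributes (standard git-archive behavior; the path below is the one from the cryfs example):

```
# .gitattributes
src/gitversion/_version.py  export-subst
# `git archive` (and hence the hosting's tarball) expands $Format:...$
# placeholders in files marked this way; a plain `git clone` does not.
```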

Example https://github.com/cryfs/cryfs/blob/3f66c7ceda4f934a78a6a83d0d735f911aaaecf8/src/gitversion/_version.py#L21-L27 - the files in the tarball and in the repository differ:

index 207dc691b..1c24d6422 100644
--- a/nix/store/k8vzbf7isybzyqh48fhsbf5hnn8rzdlc-fetchgit-rcf3023406969b14610df03a043fca8a078c9c195-2019-06-08/src/gitversion/_version.py
+++ b/nix/store/6m0ain6g8pwrp63676ymd62xgxw92n95-source/src/gitversion/_version.py
@@ -23,8 +23,8 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = "$Format:%d$"
-    git_full = "$Format:%H$"
+    git_refnames = " (tag: 0.10.2)"
+    git_full = "cf3023406969b14610df03a043fca8a078c9c195"
     keywords = {"refnames": git_refnames, "full": git_full}
     return keywords

if we fall back to downloading sources from softwareheritage.org, where these macros are not expanded, the build will fail. So buildPhase should not rely on the sources having been pre-processed with "git-archive".

Also:

  1. those $Format macros could even expand to the current time (see "Git keyword expansion" in the Git Memo v1.1 documentation), making the tarball content volatile. Not only could the tarball itself vary with tar/gzip upgrades or command-line switches on the git hosting's cloud, the files inside the tarball could also change in the next instance of the tarball, with the same commit-id and tree-id.
  2. It seems that the choice of which files to exclude from the tarball, and whether to preprocess them, is an ad hoc decision of the git hosting, and could change. Moving to stable ids like SWHIDs means we can no longer download tarballs and rely on the hostings' preprocessors

I’m really excited about this idea.

Below is a small demo. The demo is not about lang2nix (first, to avoid language-specific objections such as "Rust has lock files" or "there is an sbt plugin for that", and second, I have not yet implemented it for language frameworks, and by the time that happens I will probably have gotten rid of bash and will have trouble showing a working demo on a Nix we can all read).

An example about shaderc, whose vendored dependencies are published in a separate branch (GitHub - google/shaderc at known-good) shortly before or after each release, and are maintained in nixpkgs manually: https://github.com/NixOS/nixpkgs/blob/67c4132368dd7612d5226a99ec8a2e3c1af68b76/pkgs/development/compilers/shaderc/default.nix

The setting is very similar to lang2nix, isn't it?

{ lib, stdenv, fetchgit, cmake, python3, pkgsCurrent }:

let
  version = "2021.2";
  #         -^^^^^^- to upgrade, just change this
  #                  (and even that can be automated)

  src = fetchgit {
    url      = "https://github.com/google/shaderc";
    rev      = "v${version}";
    memoFile = ./.memo.nix; # memoFile is optional; there is a global default
    #
    # Look, ma, no `sha256`.
    #
    # There is a magic inside `fetchgit` which is explained below on example
    # of `known-good`.
    #
    # `fetchgit` is a bit more complex, there are 2 memoization steps:
    #   1. `rev`     -> `fullRev`
    #   2. `fullRev` -> (`sha256`, `commitTime`, `narsize`)
    #
    # Shortly, if `sha256` is in `import ./.memo.nix`, it is just used, without
    # any IFD. Otherwise, we pause here, run `git` in sandbox and mutate
    # `./.memo.nix`
    #
    # It lacks parallelism of @Ericson2314's .drv.drv, but has the advantage
    # that the memo files are local, and can (should) be placed under version
    # control, similar to the ubiquitous .lock files.
    #
    # Actually, `./.memo.nix`'s attrset is maintained in memory and
    # flushed to disk once on Nix's exit, so this is just another
    # obstacle to parallelism.
    #
  };

  # Tolerate "known-good" branch updated within a day after the release.
  # `builtins.timeToString` and `builtins.timeFromString` are guests
  # from the future.  Nothing magical here: just pure functions which
  # could be implemented in pure Nix
  commitTime-nextday =
    builtins.timeToString (builtins.timeFromString src.commitTime + 86400);
  # But look, ma, there is not only auto-maintained `src.sha256`,
  # but also `src.commitTime`.
  # and `src.fullRev`
  # and could be auto-maintained `src.swhid`, `src.ipfs`, `src.magnet`, ....

  known-good =
    builtins.head (lib.memoize {
      # `memoFile` is optional. the global default is usually a good choice
      memoFile   = ./.memo.nix;
      # `memoFile` is a Nix file with attrs set inside.
      # Here we define the keys of that attrset we are interested in
      # There could be more than 1 key (e.g. `fetchurl`'ing from multiple urls)
      memoKeys   = [ "version=${version} fullRev=${src.fullRev}" ];
      # Either Nix function (on top of functions like `builtins.fetchGit`) or
      # the code to run in sandbox when `memoFile` has no requested `memoKeys`
      # (`lib.memoize` also has `mode` which could be "all" or "any"  to tell
      # if we need values for all the `memoKeys` or for any one) or `memoKeys`
      # are obsolete (there is also `memoRevision`  to tell if it is desirable
      # to try to calculate the value again; useful for `pkgs.geoip-database`)
      calcValues =
        # `pkgsCurrent` is defined next to `pkgsi686Linux`
        # overriding `system=builtins.currentSystem`
        # this is usually `x86_64-linux` even if we build for/on something else.
        pkgsCurrent.stdenvNoCC.mkDerivation {
          # it should actually not be an IFD derivation
          # but `builtins.sandboxedExec`, which is not yet implemented.
          # Creating a derivation in the Nix store is needless, and
          # reusing existing results from the Nix store is undesirable
          name = "known-good-${toString builtins.currentTime}.nix";
          # The derivation is not FOD, so let's allow networking explicitly
          # `__allowNetworking` - another guest from the future -
          # works only in IFD-derivations.  again: it is actually
          # `builtins.sandboxedExec` simulated via an IFD-derivation
          __allowNetworking = true;
          GIT_SSL_CAINFO = "${pkgsCurrent.cacert}/etc/ssl/certs/ca-bundle.crt";
          buildInputs = [ pkgsCurrent.gitMinimal ];
          # Get the newest https://github.com/google/shaderc/tree/known-good
          # but not newer than (`src.commitTime`+1day) and then
          # store `known_good.json`'s content to `./.memo.nix`'s attrset
          # under key `memoKeys`
          buildCommand = ''
            git init
            git remote add origin ${lib.escapeShellArg src.url}
            git fetch origin known-good
            git checkout $(git rev-list -n1 --before=${commitTime-nextday} \
                           origin/known-good)

            # emit a list of the same size as `memoKeys`
            # each value corresponds to a key
            # (with memoMode="any", it is possible to return `null` for some)
            echo "[ { json = '''$(cat known_good.json)'''; # FIX: proper escape
                    } ]"  > $out
          '';
        };
    });

# and the rest is trivial...

in stdenv.mkDerivation rec {
  pname = "shaderc";
  inherit version src;

  outputs = [ "out" "lib" "bin" "dev" "static" ];

  patchPhase =
  let
    # parse JSON of
    # https://github.com/google/shaderc/blob/ee00a6bc9388acbc332b1ef2290ff6481b78b2cf/known_good.json
    p             = lib.listToAttrs (
                      map (args: lib.nameValuePair args.name args)
                          (builtins.fromJSON known-good.json).commits
                    );
    glslang       = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.glslang      .subrepo}";
                      rev = p.glslang      .commit;
                    };
    spirv-tools   = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.spirv-tools  .subrepo}";
                      rev = p.spirv-tools  .commit;
                    };
    spirv-headers = fetchgit {
                      memoFile = ./.memo.nix;
                      url = "https://github.com/${p.spirv-headers.subrepo}";
                      rev = p.spirv-headers.commit;
                    };
  in ''
    mkdir -p ${p.glslang      .subdir}
    mkdir -p ${p.spirv-tools  .subdir}
    mkdir -p ${p.spirv-headers.subdir}
    #
    # `fetchgit` by default produces tarballs, so `tar xf` instead of `cp`
    #
    tar xf ${glslang      } --strip-components=1 -C ${p.glslang      .subdir}
    tar xf ${spirv-tools  } --strip-components=1 -C ${p.spirv-tools  .subdir}
    tar xf ${spirv-headers} --strip-components=1 -C ${p.spirv-headers.subdir}
  '';

  nativeBuildInputs = [ cmake python3 ];

  postInstall = ''
    moveToOutput "lib/*.a" ${placeholder "static"}
  '';

  cmakeFlags = [ "-DSHADERC_SKIP_TESTS=ON" ];
}

I hope [RFC 0109] Nixpkgs Generated Code Policy by Ericson2314 · Pull Request #109 · NixOS/rfcs · GitHub can stimulate the development and adoption of lang2nix work.


During Summer of Nix, @DavHau started work on dream2nix, which is a framework for wrapping up the various lang2nix tools in an easy-to-use and easy-to-implement manner: GitHub - nix-community/dream2nix: Simplified nix packaging for various programming language ecosystems [maintainer=@DavHau]

This is still in the early phases, but I am hopeful this will simplify the lang2nix ecosystem and make it much easier for new tools to be created. Along with the RFCs from @Ericson2314 (like [RFC 0092] Computed derivations by Ericson2314 · Pull Request #92 · NixOS/rfcs · GitHub and [RFC 0109] Nixpkgs Generated Code Policy by Ericson2314 · Pull Request #109 · NixOS/rfcs · GitHub), this could be a nice improvement for Nixpkgs and the wider Nix ecosystem.


The badness of IFD is not (only) the incompatibility with CI.

IFD is basically an eval-time computation cached in the Nix store. Thus, it can be GC'ed at any moment, forcing re-evaluation on the next eval: it generates fresh Nix code to import which no one reviews, no one controls its re-evaluation cycles, and there is no way to roll back to the old code.


@voltth I think the proposal covers that: allImportedDerivations is manual rooting of all imported derivations, so nothing need be GC’d, and everything can and should be reviewed.

and there is no way to undo to the old code

I don't get this? The imported derivations ought to be deterministic, as we always strive for.

Don't you plan to allow networking for IFD computations?

Only fixed output ones – just like normal.

I’ve been following this thread, and I really want to get involved to help with this. I’m primarily a Rust and TypeScript programmer. When I’m developing an application, I use a shell.nix to just bring the normal language toolchains into scope, but then eventually I have to face the lang2nix problem in order to make a NixOS-acceptable installer. The extreme difficulty of this right now is a major pet peeve of mine, and one of the reasons I don’t actually try to encourage others to use NixOS.

So, I’m not sure how I can contribute, but if it’s just as simple as wrangling discussions into a coherent piece of documentation or applying experimental tools to projects I already have on hand, I’ll help.


bump

Let’s get this project moving again.

Can anyone describe to me some next steps that I could take right now? Would it be helpful if I provided an essay describing the use cases I have for this?

Let’s say I want to stabilize Rust development by a) being able to pin exact compiler versions (e.g. via oxalica's overlay) and b) being able to build everything from a Cargo.lock file. Is there work I can do to help that along, and has anyone thought about how to do the same thing with package-lock.json and other lock file formats?

How about for multi-lingual projects that may need multiple distinct build steps?


IMHO, the silent consensus is that Nix and Nixpkgs are too much of a Jenga tower, built by piling on features while trying not to break 15-year-old compatibility, so adding even a small feature is a big challenge (just look at poetry2nix and mach-nix: the majority of their code is not business logic but hooks and overlays adapting to the existing codebase, and that glue locks both the lang2nix projects and nixpkgs even harder).
So there is no easy answer to how to move forward. Maybe build a Nix-Lite first, as a playground sandbox for experimental features?


I’d be interested in that. We could target certain languages as a proof-of-concept (say, Python, Go, Rust, Haskell) and then RFC it.

What would be the strategy? I don’t know what the consensus is from reading the discussion. Myself, I would be most inclined towards a strategy that reads the language’s lockfile directly into Nix and evaluates from there. But I’ve had better luck with tools like crate2nix. OTOH, I also see the comment that danielkd wrote about how Cargo.lock doesn’t actually contain all of the necessary information (I’m not actually sure that’s the case… I have a project that includes reqwest, with json, and I see serde_json included in the Cargo.lock).
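For what it's worth, reading Cargo.lock directly at evaluation time is already possible with builtins.fromTOML (in Nix since 2.1). A minimal sketch, assuming the usual lockfile format where crates.io packages carry a checksum field (path and git dependencies do not, and are skipped here):

```nix
{ fetchurl }:
let
  lock = builtins.fromTOML (builtins.readFile ./Cargo.lock);
  # only registry packages have a `checksum` (a hex sha256 of the .crate file)
  registryCrates = builtins.filter (p: p ? checksum) lock.package;
in
map (p: fetchurl {
  name   = "${p.name}-${p.version}.tar.gz";
  url    = "https://crates.io/api/v1/crates/${p.name}/${p.version}/download";
  sha256 = p.checksum;
}) registryCrates
```

This only fetches the crate sources; wiring them into a build (vendoring, .cargo/config) is where the real lang2nix work starts.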

This is the kind of thing that I would experiment with, but I need some idea at first of what direction we should go, and some idea that the community would be interested if I got started.


the majority of their code is not the business logic but hooks and overlays to adapt to existing codebase

I disagree with this statement.

Most of the poetry2nix code base is overrides which are mostly fixing the fact that Python dependencies are not aware of native dependencies. This is about half of the code base.

The rest of the code is split between business logic domains (rough estimates):

  • About 10% is API surface
  • Python environment markers, wheel parsing, external fetchers etc accounts for about 20-25%
  • Hooks are another 5% or so
  • Another few percent are shared across CI, small random utilities and so on.

As you can see that’s nowhere near “the majority”.


a “blame the others” solution: normalize lockfiles

so, missing lockfiles would be an upstream bug

we make a pull-request to add the lockfile,
and in nixpkgs we reference our pull-request.
(cos we depend on github anyway …)

or, to relax the github dependency, we make a separate repo nixlocks
and reference the lockfiles directly, like nixlocks:filepath#sha256
to keep this noise out of nixpkgs
(ideally use git-sha256 for the nixlocks repo … git.nixos.org?)
(or we say sha1 is good enough for fetchurl, and use git-sha1)

both nixpkgs and nixlocks repos can be mirrored away from github

option Z: don't use nix. this works for simple projects,
where i just have to run (for example)
npm install && npm run build … and it “just works”
(but only until the next breaking change of some API)

let's just move everything to nickel ; )

note that pypi-deps-db does not have file checksums,
only dependency versions, like a-1.2.3 needs b>=2.3.4

immutable snapshots

also the crates.io index is hosted on github,
and the main repo is continuously squashed.
cargo is smart enough to fetch git deltas;
after a squash, cargo starts a new clone

problem with this top-down-solution: size
see https://libraries.io/
assuming we store only the sha256 hash (32 bytes)
and assuming 15 releases per package (on average) …

1200MB for 2.34M npm Packages
 200MB for 472K Maven Packages
 200MB for 442K PyPI Packages
 160MB for 377K NuGet Packages
 180MB for 369K Go Packages
 170MB for 357K Packagist Packages
  85MB for 178K Rubygems Packages
  40MB for 87.4K CocoaPods Packages
  40MB for 82.3K Cargo Packages
  33MB for 69.5K Bower Packages
  20MB for 39.1K CPAN Packages
...
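As a sanity check of the first figure (plain arithmetic, evaluable with nix-instantiate --eval):

```nix
# 2.34M npm packages * ~15 releases * 32 bytes per raw sha256 hash
2340000 * 15 * 32   # = 1123200000 bytes, i.e. roughly the 1200MB above
```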

… and as soon as we start filtering packages,
we need hybrid solutions to support adding missing hashes

X-post from (Future) of npm packages in nixpkgs? - #29 by olebedev

Hey everyone, I’ve announced js2nix here: Announcing js2nix - scale your Node.js project builds with Nix. Please have a look. I think I managed to address all of @sander’s concerns in this project.

@sander, @Profpatsch, @DavHau, as you are the ones who are interested and have contributed a lot in this field, I would love to hear your feedback.