`fetchFromGitHub` with a `postFetch` acts weird

A Python package I maintain distributes testing data that is not tracked in its Git repository. I want to download that data using a postFetch in src, so that I won’t have to maintain an additional hash:

  src = fetchFromGitHub {
    owner = "scipy";
    repo = pname;
    rev = "v${version}";
    hash = "sha256-kSq5KVWNlNz8kM5stZ9KEhnSNGOpwhTQEytAduII130=";
    # For simplicity, let's assume the following downloads the data into $out 
    postFetch = ''
      python $out/scipy/datasets/_download_all.py
    '';
  };

However, I observe weird behavior when I try to build the package with the test data included. No matter how many times I verify the output hash with the additional data downloaded, every time I run nix build -Lf. python3.pkgs.scipy it attempts to build the source again, as if it doesn’t trust the hash it calculated.

$ nix build -Lf. python3.pkgs.scipy.src && nix build -Lf. python3.pkgs.scipy.src
source> trying https://github.com/scipy/scipy/archive/v1.10.1.tar.gz
source>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
source>                                  Dload  Upload   Total   Spent    Left  Speed
source>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
source> 100 23.5M    0 23.5M    0     0  5785k      0 --:--:--  0:00:04 --:--:-- 7088k
source> unpacking source archive /build/v1.10.1.tar.gz
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-ascent/main/ascent.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/ascent.dat'.
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-ecg/main/ecg.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/ecg.dat'.
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/face.dat'.
source> trying https://github.com/scipy/scipy/archive/v1.10.1.tar.gz
source>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
source>                                  Dload  Upload   Total   Spent    Left  Speed
source>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
source> 100 23.5M    0 23.5M    0     0  6005k      0 --:--:--  0:00:04 --:--:-- 6884k
source> unpacking source archive /build/v1.10.1.tar.gz
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-ascent/main/ascent.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/ascent.dat'.
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-ecg/main/ecg.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/ecg.dat'.
source> Downloading data from 'https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat' to file '/nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source/scipy-data/face.dat'.

Thankfully, it at least doesn’t fail, so the hash is indeed consistent. I wouldn’t have minded the repeated fetching, except that when I try to build the actual derivation I also get the following error:

Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing pip-build-hook
Using pipBuildPhase
Using pipShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
unpacking source archive /nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source
do not know how to unpack source archive /nix/store/39lan2shgi5hscl66dkqqi1s1mykbmrq-source

What’s weird is that if I remove my postFetch from src (and recalculate the hash, of course), I observe neither behavior, and the build doesn’t fail. I even tried setting an empty unpackPhase, but it didn’t help.

The branch with the above changes applied is available here:


Greetings. Indeed, something strange is going on, and I also have no idea what it is.

I think it is something in the Python download code.

This works:

  src = fetchFromGitHub {
    owner = "scipy";
    repo = pname;
    rev = "v${version}";
    hash = "sha256-sHps+W9M99sAL6/wcFfG+En3iiWO0BxI/Nbshq3jkRA=";
    postFetch = ''
      mkdir $out/scipy-data
      curl https://raw.githubusercontent.com/scipy/dataset-ascent/main/ascent.dat -o $out/scipy-data/ascent.dat
      curl https://raw.githubusercontent.com/scipy/dataset-ecg/main/ecg.dat -o $out/scipy-data/ecg.dat
      curl https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat -o $out/scipy-data/face.dat
    '';
  };


Looks good! But unfortunately I get:

curl: (77) error setting certificate file: /no-cert-file.crt

I got that too. Empty your hash, e.g. hash = "";

Actually, I just tried clearing my Nix store and I get the same error.

If I make the hash empty and run nix-build, I get the same hash, which suggests it is the same behavior we get from the Python download code.

Sorry for the multiple replies (even Discourse is angry at me).

Here is the relevant code in the fetchurl derivation:

  SSL_CERT_FILE = if (hash_.outputHash == "" || hash_.outputHash == lib.fakeSha256 || hash_.outputHash == lib.fakeSha512 || hash_.outputHash == lib.fakeHash)
                  then "${cacert}/etc/ssl/certs/ca-bundle.crt"
                  else "/no-cert-file.crt";
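
To make the quoted condition easier to follow, here is a minimal standalone sketch of the same decision. The name certFileFor and the placeholder bundle path are hypothetical and exist only for illustration; the comparison against "", lib.fakeSha256, lib.fakeSha512 and lib.fakeHash is taken from the snippet above. It assumes a <nixpkgs> channel is available to provide lib.

```nix
# cert-file-sketch.nix: illustration only, not the actual fetchurl code.
{ lib ? (import <nixpkgs> { }).lib }:
let
  # Hypothetical helper: returns the value SSL_CERT_FILE would get
  # for a given output hash, following the condition quoted above.
  certFileFor = outputHash:
    if builtins.elem outputHash [ "" lib.fakeSha256 lib.fakeSha512 lib.fakeHash ]
    # Placeholder standing in for "${cacert}/etc/ssl/certs/ca-bundle.crt".
    then "<cacert>/etc/ssl/certs/ca-bundle.crt"
    else "/no-cert-file.crt";
in
{
  # Empty (or fake) hash: a real CA bundle is configured.
  withEmptyHash = certFileFor "";
  # Real hash: the nonexistent path that curl complained about.
  withRealHash = certFileFor "sha256-sHps+W9M99sAL6/wcFfG+En3iiWO0BxI/Nbshq3jkRA=";
}
```

Evaluating this with nix-instantiate --eval --strict cert-file-sketch.nix should show that only the empty hash gets a CA bundle, which matches the curl error appearing as soon as a real hash is filled in.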

This can be bypassed with:

  src = fetchFromGitHub {
    owner = "scipy";
    repo = pname;
    rev = "v${version}";
    hash = "sha256-sHps+W9M99sAL6/wcFfG+En3iiWO0BxI/Nbshq3jkRA=";
    postFetch = ''
      mkdir $out/scipy-data
      curl --insecure https://raw.githubusercontent.com/scipy/dataset-ascent/main/ascent.dat -o $out/scipy-data/ascent.dat
      curl --insecure https://raw.githubusercontent.com/scipy/dataset-ecg/main/ecg.dat -o $out/scipy-data/ecg.dat
      curl --insecure https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat -o $out/scipy-data/face.dat
    '';
  };
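
An alternative to --insecure that might be worth trying is to point curl at a CA bundle explicitly. This is an untested sketch: it assumes cacert is in scope (for example, added to the package's function arguments), reuses the ${cacert}/etc/ssl/certs/ca-bundle.crt path from the fetchurl snippet above, and assumes the hash stays the same as long as the downloaded files are identical.

```nix
  # Sketch only; assumes `cacert` has been added to the function arguments.
  src = fetchFromGitHub {
    owner = "scipy";
    repo = pname;
    rev = "v${version}";
    hash = "sha256-sHps+W9M99sAL6/wcFfG+En3iiWO0BxI/Nbshq3jkRA=";
    postFetch = ''
      mkdir $out/scipy-data
      # Same three datasets as above, fetched in a loop.
      for name in ascent ecg face; do
        # --cacert keeps TLS verification on by naming the bundle directly,
        # instead of relying on SSL_CERT_FILE.
        curl --cacert ${cacert}/etc/ssl/certs/ca-bundle.crt \
          "https://raw.githubusercontent.com/scipy/dataset-$name/main/$name.dat" \
          -o "$out/scipy-data/$name.dat"
      done
    '';
  };
```

Since the output hash already pins the content, this mostly changes how a failure shows up: a certificate problem would surface as a curl error rather than as a hash mismatch.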

This works :rocket:. I wonder what the reason behind that certificate file logic is…

I think that using --insecure aligns with that logic, according to git blame:


It seems to me that this logic backfires:

fetchurl: only allow empty hash when cacert is available

To me it looks like the opposite: CA certificates are only allowed when the hash is empty, so when you have a hash but the file is not in the store, you can never fetch it.

I could be wrong about that interpretation. I still think something else is going on with the Python download code, so I would try to bisect it.