How to build a development version of Python scikit-learn from source?

Hi,
I am trying to build scikit-learn from its GitHub repo. Loading scikit-learn through the Nix Python package works without issues, but I want to have my own development version. I am very new to this, so I am not even sure I am on the right track. So far, I have made a nix-shell with some dependencies, mostly based on nixpkgs/doc/languages-frameworks/python.section.md (at commit 346de1fb8c3273ea74a0d15ff0580b0d6610a476 of NixOS/nixpkgs):

with import <nixpkgs> { };

let
  py = python312Packages;
in pkgs.mkShell rec {
  name = "impurePythonEnv";
  venvDir = "./.venv";
  propagatedBuildInputs = [
    # A python interpreter including the 'venv' module is required to bootstrap
    # the environment.
    py.python

    # This executes some shell code to initialize a venv in $venvDir before
    # dropping into the shell
    py.venvShellHook

    # Those are dependencies that we would like to use from nixpkgs, which will
    # add them to PYTHONPATH and thus make them accessible from within the venv.
    py.numpy
    py.scipy
    py.joblib
    py.threadpoolctl
    py.cython
    py.meson-python
    py.ninja
    py.pytest
    py.setuptools
    py.wheel
    #py.pillow
    #py.pythonRelaxDepsHook
    # In this particular example, in order to compile any binary extensions they may
    # require, the python modules listed in the hypothetical requirements.txt need
    # the following packages to be installed locally:
    git
    gcc
    libgcc
    glibc
    gfortran
    stdenv.cc.cc
    glibcLocales
  ];
  LD_LIBRARY_PATH = "$LD_LIBRARY_PATH:${pkgs.stdenv.cc.cc.lib}/lib";
}

I then tried to follow the scikit-learn documentation and build it with pip, either as

pip install --editable . --verbose --no-build-isolation --config-settings editable-verbose=true --upgrade

or

pip install -e . 

It fails at a compilation step with a very long error dump that I cannot make sense of:

  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Preparing editable metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [110 lines of output]
      + meson setup --reconfigure /home/brick/Code/scikit-learn /home/brick/Code/scikit-learn/build/cp312 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/home/brick/Code/scikit-learn/build/cp312/meson-python-native-file.ini
      Cleaning... 0 files.
      WARNING: Regenerating configuration from scratch.
      Reason: Coredata file '/home/brick/Code/scikit-learn/build/cp312/meson-private/coredata.dat' references functions or classes that don't exist. This probably means that it was generated with an old version of meson.
      The Meson build system
      Version: 1.5.1
      Source dir: /home/brick/Code/scikit-learn
      Build dir: /home/brick/Code/scikit-learn/build/cp312
      Build type: native build
      Project name: scikit-learn
      Project version: 1.6.dev0
      C compiler for the host machine: gcc (gcc 13.2.0 "gcc (GCC) 13.2.0")
      C linker for the host machine: gcc ld.bfd 2.41
      C++ compiler for the host machine: g++ (gcc 13.2.0 "g++ (GCC) 13.2.0")
      C++ linker for the host machine: g++ ld.bfd 2.41
      Cython compiler for the host machine: cython (cython 3.0.10)
      Host machine cpu family: x86_64
      Host machine cpu: x86_64
      Compiler for C supports arguments -Wno-unused-but-set-variable: YES
      Compiler for C supports arguments -Wno-unused-function: YES
      Compiler for C supports arguments -Wno-conversion: YES
      Compiler for C supports arguments -Wno-misleading-indentation: YES
      Library m found: YES
      Program python found: YES (/home/brick/Code/scikit-learn/.venv/bin/python3.12)
      Run-time dependency OpenMP for c found: YES 4.5
      Did not find pkg-config by name 'pkg-config'
      Found pkg-config: NO
      Run-time dependency python found: YES 3.12
      Build targets in project: 111
      
      scikit-learn 1.6.dev0
      
        User defined options
          Native files: /home/brick/Code/scikit-learn/build/cp312/meson-python-native-file.ini
          buildtype   : release
          b_ndebug    : if-release
          b_vscrt     : md
      
      Found ninja-1.11.1 at /nix/store/p5y20shjad8an1jhcby82sz843lirvwi-ninja-1.11.1/bin/ninja
      + /nix/store/p5y20shjad8an1jhcby82sz843lirvwi-ninja-1.11.1/bin/ninja
      [1/145] Linking target sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so
      FAILED: sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so
      gcc  -o sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_ensemble__hist_gradient_boosting_histogram.pyx.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC -lm -fopenmp
      /nix/store/7v7g86ml0ri171gfcrs1d442px5bi1p3-binutils-2.41/bin/ld: /nix/store/llmjvk4i2yncv8xqdvs4382wr3kgdmvp-gcc-13.2.0/lib/libgomp.a(barrier.o): relocation R_X86_64_TPOFF32 against hidden symbol `gomp_tls_data' can not be used when making a shared object
      /nix/store/7v7g86ml0ri171gfcrs1d442px5bi1p3-binutils-2.41/bin/ld: failed to set dynamic section sizes: bad value
      collect2: error: ld returned 1 exit status
      [2/145] Copying file sklearn/utils/_typedefs.pxd
      [3/145] Copying file sklearn/utils/_heap.pxd
      [4/145] Copying file sklearn/utils/_random.pxd
      [5/145] Copying file sklearn/utils/_sorting.pxd
      [6/145] Copying file sklearn/utils/_vector_sentinel.pxd
      [7/145] Copying file sklearn/metrics/__init__.py
      [8/145] Copying file sklearn/metrics/_pairwise_distances_reduction/_classmode.pxd
      [9/145] Copying file sklearn/__init__.py
      [10/145] Copying file sklearn/utils/_openmp_helpers.pxd
      [11/145] Copying file sklearn/metrics/_pairwise_distances_reduction/__init__.py
      [12/145] Copying file sklearn/utils/__init__.py
      [13/145] Copying file sklearn/utils/_cython_blas.pxd
      [14/121] Copying file sklearn/_loss/_loss.pxd
      [15/117] Compiling C object sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_cluster__hdbscan__reachability.pyx.c.o
      FAILED: sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_cluster__hdbscan__reachability.pyx.c.o
      gcc -Isklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p -Isklearn/cluster/_hdbscan -I../../sklearn/cluster/_hdbscan -I../../../../../../tmp/pip-build-env-m39z5bzz/overlay/lib/python3.12/site-packages/numpy/_core/include -I/nix/store/k5i0778pfpqazsms6bk1pkmqc4bkq57n-python3-3.12.4/include/python3.12 -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -fPIC -DNPY_NO_DEPRECATED_API=NPY_1_9_API_VERSION -MD -MQ sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_cluster__hdbscan__reachability.pyx.c.o -MF sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_cluster__hdbscan__reachability.pyx.c.o.d -o sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/meson-generated_sklearn_cluster__hdbscan__reachability.pyx.c.o -c sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/sklearn/cluster/_hdbscan/_reachability.pyx.c
      In file included from /nix/store/k5i0778pfpqazsms6bk1pkmqc4bkq57n-python3-3.12.4/include/python3.12/Python.h:38,
                       from sklearn/cluster/_hdbscan/_reachability.cpython-312-x86_64-linux-gnu.so.p/sklearn/cluster/_hdbscan/_reachability.pyx.c:16:

Any idea how to make it work?

Why not simply nix-shell '<nixpkgs>' -A python3Packages.scikit-learn?

It has all the things required to build scikit-learn as it’s the exact same env as is used to build the package in Nixpkgs.

pip will not work on NixOS. You don’t need it though as you already have all the python deps you need provided by the nix-shell.

As far as I understand, pip doesn’t work out of the box, but it should work inside a venv, which is what that shell sets up. I would also like to use pip because the scikit-learn documentation uses it.

How would I build sklearn in the shell you’ve provided without pip?

Nix replaces pip. Dependencies are provided using Nix. Your role while using Nix is to use it to specify the dependencies.

I am fine with that; it would be my preferred solution anyway.

Mirror what the Nix build does internally.

I’m not familiar with the package but, glancing at the definition, it uses meson-python to build. My intuition would be to attempt to build it like any other meson build.
That intuition might be wrong and you might have to build via pip (just to drive the build, not to install deps).

When in doubt, just run the phases of the nix build in the nix-shell (e.g. mesonConfigurePhase).

The deps are provided by Nix but Nix does not replace pip as a build tool, only the package manager part.

That’s not the issue at hand: The deps are already present in the nix-shell.

Looking into the build log, it appears to be built using the pypaBuildPhase.

So I deleted my scikit-learn git folder and cloned it again, and I wrote my shell like this to get all the dependencies (I know it is a mess, and I don’t really know what I am doing):

{ pkgs ? import <nixpkgs> {} }:

let
  py = pkgs.python312Packages;
  sklearn = py.scikit-learn;
in
pkgs.mkShell {
  # Reuse every input list of the nixpkgs scikit-learn derivation,
  # then add the extra tools needed for a venv-based workflow.
  buildInputs =
    sklearn.buildInputs
    ++ sklearn.nativeBuildInputs
    ++ sklearn.dependencies
    ++ sklearn.build-system
    ++ sklearn.propagatedBuildInputs
    ++ [
      py.pip
      py.virtualenv
      py.venvShellHook
      py.meson
      py.meson-python
      py.pytest
    ];

  venvDir = "./.venv";

  prePatch = ''
    substituteInPlace pyproject.toml \
      --replace-fail "numpy==2.0.0rc1" "numpy"
  '';
}

With that shell, the pip command from the scikit-learn documentation worked without errors:

pip install --editable . \
    --verbose --no-build-isolation \
    --config-settings editable-verbose=true

Everything also seems to work from within Python, although a plain pip install -e . did not work and gave an error saying that importing the numpy C-extensions failed.

>>> import sklearn
>>> sklearn.__version__
'1.6.dev0'
>>> sklearn.__check_build
<module 'sklearn.__check_build' from '/home/brick/Code/scikit-learn/scikit-learn/sklearn/__check_build/__init__.py'>
>>> 
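
The session above shows that `sklearn` and the pure-Python package `sklearn.__check_build` import. A further check one could run in the same interpreter is to ask the import system whether the compiled extension `sklearn.__check_build._check_build` can be located, and from where. The following is only a minimal sketch; the helper name `describe_module` is made up for illustration and is not part of scikit-learn or Nix.

```python
import importlib.util


def describe_module(name):
    """Return (found, origin) for a dotted module name.

    Looking up a submodule imports its parent packages, so an ImportError
    raised by a parent (for example a failing build check) is reported
    here as "not found" instead of propagating.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return (False, None)
    if spec is None:
        return (False, None)
    return (True, spec.origin)


if __name__ == "__main__":
    # The second name is the compiled extension that the build check imports.
    for name in ("sklearn", "sklearn.__check_build._check_build"):
        print(name, describe_module(name))
```

With an editable install, the origin of the extension may point into a build directory rather than into the source tree. If the extension is reported as not found, that interpreter cannot see the compiled build.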

However, if I try to run the unit tests in the scikit-learn repo, I get an error saying that sklearn was not built correctly, or maybe not built at all:

[nix-shell:~/Code]$ pytest scikit-learn/scikit-learn/sklearn/
ImportError while loading conftest '/home/brick/Code/scikit-learn/scikit-learn/sklearn/conftest.py'.
scikit-learn/scikit-learn/sklearn/__init__.py:83: in <module>
    from . import (
scikit-learn/scikit-learn/sklearn/__check_build/__init__.py:54: in <module>
    raise_build_error(e)
scikit-learn/scikit-learn/sklearn/__check_build/__init__.py:35: in raise_build_error
    raise ImportError(
E   ImportError: No module named 'sklearn.__check_build._check_build'
E   ___________________________________________________________________________
E   Contents of /home/brick/Code/scikit-learn/scikit-learn/sklearn/__check_build:
E   __pycache__               meson.build               __init__.py
E   _check_build.pyx
E   ___________________________________________________________________________
E   It seems that scikit-learn has not been built correctly.
E
E   If you have installed scikit-learn from source, please do not forget
E   to build the package before using it. For detailed instructions, see:
E   https://scikit-learn.org/dev/developers/advanced_installation.html#building-from-source
E
E   If you have used an installer, please check that it is suited for your
E   Python version, your operating system and your platform.

Maybe the problem is not with building, but with my mental model of how building and unit testing should work if I install it this way.
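
One possible explanation, which nothing in this thread confirms, is that the `pytest` found on PATH belongs to a different Python environment than the venv that pip installed into, so the test run never sees the editable install and imports the unbuilt source tree instead. The sketch below prints which interpreter is running and where `pytest` comes from; the helper names `is_under` and `report` are hypothetical.

```python
import shutil
import sys
from pathlib import Path


def is_under(path, directory):
    """True if path is located at or below directory.

    Symlinks are deliberately not followed, because a venv's interpreter
    is usually a symlink to a Python that lives elsewhere.
    """
    path = Path(path).absolute()
    directory = Path(directory).absolute()
    return path == directory or directory in path.parents


def report():
    # sys.prefix differs from sys.base_prefix when running inside a venv.
    in_venv = sys.prefix != sys.base_prefix
    pytest_path = shutil.which("pytest")
    print("interpreter:", sys.executable)
    print("running inside a venv:", in_venv)
    print("pytest on PATH:", pytest_path)
    if pytest_path is not None:
        print("pytest lives in this environment:",
              is_under(pytest_path, sys.prefix))


if __name__ == "__main__":
    report()
```

If `pytest` turns out to live outside the venv, running `python -m pytest` with the venv's interpreter would at least use the same environment that pip installed into, provided pytest is importable from there.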

I also tried meson setup builddir && cd builddir && meson compile, which does something without crashing, but it doesn’t get me any further. Sorry, I do not understand build systems; I am just used to copy-pasting commands from the docs.

Simply do

nix-shell '<nixpkgs>' -A python3Packages.scikit-learn

go to your source code checkout and then run

$ patchPhase
$ pypaBuildPhase
...
$ export out=/some/absolute/path/ 
$ pypaInstallPhase
...
$ pytestCheckPhase
...

It produces a wheel in dist/ and puts something that looks like a Python program distribution in the specified path. Then I can run the tests on that.

I don’t know Python well enough to know whether this is what you want, but I know build systems and this sounds pretty complete to me.

Almost there, but there is an error after running patchPhase:

substituteStream() in derivation nix-shell: ERROR: pattern numpy==2.0.0rc1 doesn't match anything in file 'pyproject.toml'

Maybe that’s because the scikit-learn Nix package has these lines in it

  # Avoid build-system requirements causing failure
  prePatch = ''
    substituteInPlace pyproject.toml \
      --replace-fail "numpy==2.0.0rc1" "numpy"
  '';

and there is no numpy==2.0.0rc1 in the pyproject.toml.

Also, I don’t know why, but right after I clone the repo, pyproject.toml has content, and after I try to build with patchPhase (I think it was even happening before with pip or meson) it is empty.

Your tree might be newer and simply not have that line, or have it slightly altered.
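
To check this against a checkout before running patchPhase, a small script can report whether pyproject.toml is empty and whether it contains the literal string that `--replace-fail` expects. This is only a sketch under the assumption that it is run from the root of the scikit-learn checkout; the helper names `matching_lines` and `summarize` are made up for illustration.

```python
from pathlib import Path


def matching_lines(text, pattern):
    """Return (line_number, line) pairs for lines containing pattern literally."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern in line
    ]


def summarize(text, pattern):
    """Describe whether text is empty or contains pattern."""
    if not text.strip():
        return "file is empty"
    hits = matching_lines(text, pattern)
    if not hits:
        return "pattern not found"
    return "pattern found on line(s) " + ", ".join(str(n) for n, _ in hits)


if __name__ == "__main__":
    path = Path("pyproject.toml")
    if path.exists():
        text = path.read_text(encoding="utf-8")
        # The exact pattern from the prePatch snippet quoted above.
        print(summarize(text, "numpy==2.0.0rc1"))
        # Show every numpy requirement actually present, for comparison.
        for number, line in matching_lines(text, "numpy"):
            print(number, line.strip())
    else:
        print("no pyproject.toml in the current directory")
```

If the file is reported as empty, `git status` should show it as modified, and `git checkout -- pyproject.toml` restores the committed version.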