Need help enabling grammars in treesitter (Python)

I currently have a Python project where I want to use various grammars in treesitter, but I have no idea how to enable the different grammars and I could really use some help.

The project in question is here:

The requirements that I have: it needs to be usable on distros other than NixOS (I am using Fedora for example) using nix-shell. Some nudges into the right direction would be very much appreciated.

1 Like

So I looked into that:

(1) You add py-tree-sitter as a Python library
(2) Then, you need to expose the treesitter grammars we have

(2) is hard because we wrap AFAIK no treesitter grammar as a Python package, so there would be some adjustments to nixpkgs to be done to wrap all of them which do support Python and thus you can just add them to your Python project and they will be available. (https://github.com/tree-sitter/tree-sitter-python/blob/b8a4c64121ba66b460cb878e934e3157ecbfb124/setup.py packaging this sort of stuff).

Thanks. Would adding this tree-sitter-languages · PyPI to nixpkgs also help?

That’d be the easy way out, adding this package, patchelf’ing completely to the right libraries of Nixpkgs and then consuming in would solve a bunch of things, yep.

1 Like

Unfortunately it seems that the maintainer of that particular package no longer wants to maintain the project. I guess that means wrapping all the individual grammars? I could really use some help there.

Yeah, tree-sitter-languages is just a simple wrapper around a deprecated API that is going away. You could easily just copy it verbatim into your source code:

import pathlib
import sys
import os
from tree_sitter import Language, Parser

treesitter_path = os.environ.get("TREESITTER_PATH")


def get_language(language):
    if sys.platform == "win32":
        filename = f"{language}.dll"
    else:
        filename = f"{language}.so"

    binary_path = str(pathlib.Path(treesitter_path) / filename)
    language = Language(binary_path, language)
    return language


def get_parser(language):
    language = get_language(language)
    parser = Parser()
    parser.set_language(language)
    return parser


if __name__ == '__main__':
    parser = get_parser("python")
    tree = parser.parse(
        bytes(
            """
def foo():
    if bar:
        baz()
""",
            "utf8",
        )
    )
    print(tree)
let
  # pkgs = import <nixpkgs> {};
  pkgs = import ../. {};
in
pkgs.mkShell {
  buildInputs = [
    (pkgs.python3.withPackages (pp: [
      pp.tree-sitter
    ]))
  ];

  env = {
    # Gotta catch 'em all.
    TREESITTER_PATH = pkgs.tree-sitter.withPlugins (gg: builtins.attrValues gg);
  };
}

The recommended way of installing is using one Python package per language. So we would need to package all of those.

But looking at the Python bindings wheel, the package is just a simple native extension that exposes a single method simply wrapping the C API so we could probably just re-create it in Nixpkgs for all tree-sitter grammars automatically:

2 Likes