Llama-index: advice on how to get it working?

I am so glad to see llama-index included in nixpkgs unstable. I am curious whether any of you have managed to get it working, and if so, whether you have a recipe for it.

My first attempt was simply to add the llama-index package, but that collides with llama-index-cli, so I tried llama-index-core instead, and that works fine.

I did need to set the NLTK_DATA environment variable so that nltk would not attempt to write to the Nix store.

It feels like there are other gotchas, though, with llama-index not playing well with the read-only store. For instance, after a call to VectorStoreIndex.from_documents, I receive:

Read-only file system: '/nix/store/zlsx2pa9mhp1rvnq575hqpdgffip3vr8-python3-3.12.2-env/lib/python3.12/site-packages/llama_index/core/_static/tiktoken_cache/9b5ad71b2ce5302211f9c61530b329a4922fc6a4.addd3212-e92b-4c65-9d5b-cd081afd8099.tmp'

Are there other recommended settings or environment variables people have used to keep llama-index from attempting to write to the Nix store?

Well, I should have just thought to read the Python code. The environment variable I needed was appropriately named TIKTOKEN_CACHE_DIR.
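If you would rather configure this from Python than from the shell environment, a minimal sketch looks like the following. The cache paths here are arbitrary writable directories I picked for illustration; nothing in llama-index mandates these particular names.

```python
import os
import pathlib

# Redirect the caches to a writable location instead of the read-only /nix/store.
# The directory names are arbitrary; any writable path works.
cache_root = pathlib.Path.home() / ".cache"
os.environ["NLTK_DATA"] = str(cache_root / "nltk_data")
os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_root / "tiktoken")

# Set these before importing llama_index: tiktoken reads
# TIKTOKEN_CACHE_DIR when it first looks up its cached BPE files.
```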

My shell.nix now looks as follows, and this allows me to do the starter tutorial.

let
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-unstable";
  pkgs = import nixpkgs { config = {}; overlays = []; };
in

pkgs.mkShellNoCC {
  packages = with pkgs; [
    (python312.withPackages (ps:
      with ps; [
        llama-index-core
        llama-index-llms-openai
        llama-index-program-openai
        llama-index-readers-file
        llama-index-embeddings-openai
      ]))
  ];

  NLTK_DATA = ".nltk_data";
  TIKTOKEN_CACHE_DIR = ".tiktoken";
  OPENAI_API_KEY = "sk-blablabla"; # placeholder; substitute your real key
}
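
If you would rather keep the caches out of the project directory, the same effect can be had with a shellHook. This is just a sketch of an alternative, assuming XDG-style cache paths; the packages list is the same as above and is elided here.

```nix
pkgs.mkShellNoCC {
  # ...same packages as above...

  # Equivalent to the NLTK_DATA / TIKTOKEN_CACHE_DIR attributes,
  # but pointing under the XDG cache directory instead of $PWD.
  shellHook = ''
    export NLTK_DATA="''${XDG_CACHE_HOME:-$HOME/.cache}/nltk_data"
    export TIKTOKEN_CACHE_DIR="''${XDG_CACHE_HOME:-$HOME/.cache}/tiktoken"
  '';
}
```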

Hoping this might help someone else!
