nix-workflow for scientific workflows

Thank you for sharing. I’m lacking some more background. What’s the motivation? Why are existing tools not satisfactory? What does this do, then? And so on.

Thanks for the question. Existing tools fall short in three areas:

  1. No pinning of step runtimes. Tools like DVC, MLflow, Metaflow, and Snakemake rely on conda, pip, or uv, which pin Python packages but not the entire runtime, including system libraries, CUDA, or dependencies in other languages. Docker gets closer, but its imperative nature makes it fundamentally less reproducible and composable than Nix.
  2. The workflow definition language is not functional. Some tools, like Snakemake and DVC, are declarative but not functional. A functional language lets steps be precisely addressed and makes them easier to reason about, discover, and reuse: each step is a pure value (an attribute set) rather than an ad-hoc name or a mix of expressions and statements. In addition, nix-workflow uses Nix both to define workflows and to pin runtimes.
  3. No automatic dependency tracking. The tool adopts Nix’s string interpolation with context: referencing one step inside another automatically tracks the dependency and its lineage. In MLflow, W&B, or Neptune, lineage must be recorded through explicit API calls. In nix-workflow, dependency management is a property of the language itself, via string context.
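Point 3 can be illustrated with plain Nix, which nix-workflow builds on (a minimal sketch using nixpkgs' runCommand, not nix-workflow's own API; the step names are made up):

```nix
with import <nixpkgs> { };

let
  # A "step" producing some cleaned data.
  preprocess = runCommand "preprocess" { } ''
    echo "cleaned data" > $out
  '';
in
runCommand "train" { } ''
  # Interpolating ''${preprocess} below carries string context, so Nix
  # records that "train" depends on "preprocess" and builds it first.
  # No explicit lineage API call is needed.
  cat ${preprocess} > $out
''
```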

Why not just use Nix directly? Nix builds packages; nix-workflow extends that model to workflows, where the process needs to be interruptible. Scientific workflows are lengthy, and you need ergonomic support for pinning an environment after a process finishes or resuming from a partially completed step. This is what nix-workflow aims to address, with more planned.

This project is still in the idea and demo phase. Feedback, skepticism, and questions are very welcome and valuable.

What’s the difference between lib.output "mycmd" and runCommand "mycmd" {} "mkdir $out; cd $out; mycmd"?

Also, just a note, but builtins.fetchTarball is considered evil. You may want to rework your example to use fetchurl so that it is guarded by a hash, or assume the repo is already checked out locally.

The main difference right now is that lib.output normalizes commands. For example, it sorts arguments when their order does not matter so that semantically equivalent commands do not produce different experiment recipes.
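To make the normalization idea concrete, here is a hypothetical sketch of the kind of canonicalization described above (not the actual lib.output source; normalizeFlags and renderCommand are illustrative names):

```nix
let
  # Sort flags whose order does not matter, so that semantically
  # equivalent invocations render to the same recipe string.
  normalizeFlags = flags: builtins.sort (a: b: a < b) flags;

  renderCommand = cmd: flags:
    builtins.concatStringsSep " " ([ cmd ] ++ normalizeFlags flags);
in
  # Both orderings produce the identical string
  # "mycmd --epochs=3 --lr=0.1":
  renderCommand "mycmd" [ "--lr=0.1" "--epochs=3" ]
```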

At a higher level, the goal of lib.output is to translate experiment recipes defined in a Nix environment into the nix-workflow environment.

The underlying motivation for this tool is the assumption that workflow or experiment steps require a different build model from the standard Nix build model. If that premise holds, then lib.output acts as a mapping function that converts recipes from Nix into a representation suitable for nix-workflow, and nix-workflow can then execute the steps using those converted recipes.

Thanks for the suggestion about fetchurl. I understand that builtins.fetchTarball can also be used with a fixed hash. If both builtins.fetchTarball and pkgs.fetchurl are used with a hash, which is the better choice for fetching Nix source that will be imported during evaluation? Is the main distinction simply that builtins.fetchTarball runs at evaluation time, while pkgs.fetchurl runs at build time?

Yes, in that case it may be better to use fetchTarball with a hash, which runs at eval time and doesn’t count as “IFD”, whereas importing from pkgs.fetchurl does, I believe, count as IFD.
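The distinction looks roughly like this (URL and sha256 are placeholders, not real values):

```nix
# Eval-time fetch, pinned by hash — no import-from-derivation:
let
  src = builtins.fetchTarball {
    url = "https://example.org/repo.tar.gz";
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
in
import src

# By contrast, importing a build-time fetch forces that derivation to be
# realised during evaluation, which is import-from-derivation (IFD):
#
#   import (pkgs.fetchurl { url = "..."; sha256 = "..."; })
```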

Looks cool. How does it differ from bionix and rixpress?

When building something like this a few years ago, I found it helpful to define fetching the original data from storage as a FOD (fixed-output derivation), chunked and piecemeal if needed, with a final impure derivation to push the result back into long-term storage.

It looks pretty nice, thanks for the post!

> Looks cool. How does it differ from bionix and rixpress?

Thanks! I didn’t know about those tools, appreciate you mentioning them.

I think the fundamental difference is that nix-workflow’s primary use case is machine learning, whereas BioNix and rixpress focus on biotech and data science. In ML, it’s hard to run experiments in a fully sandboxed environment like a native Nix build, since ML workflows commonly depend on tools that need internet access and GPUs. So nix-workflow uses a looser, optional sandbox.

We do lose some reproducibility, but there’s still a lot of value in getting as close to functional as possible: tracking environments and inputs through content addressing even when full hermeticity isn’t feasible. You still get environment pinning, dependency tracking via string context, and content-addressed caching for the steps that can be pure.

It also provides helpers for partial progress. For example, you can lock the result of a long-running training step without needing the entire pipeline to complete, so downstream steps can build on it immediately.

Thanks a lot! I’m currently developing features very close to what you’ve described. My general approach to tracking and caching is input-addressed recipes with content-addressed outputs. For now I’m working on static files with hash verification; FODs for remote files are important but further down the roadmap.

On the long-running process side, I’m also thinking about sliceable results, where checkpoints at different stages can be consumed individually by downstream steps. Haven’t figured out the exact API for that yet though.

Really appreciate the feedback.
