Easy source filtering with file sets

Have you ever tried to filter a derivation source using functions like builtins.path or lib.cleanSourceWith? If so, you’ve probably already written your own helper function to make it easier, because it’s really hard to get right!

Sponsored by Antithesis :sparkles:, I’ve been developing a new approach to filter local sources, with the goal of making it easier, safer and more flexible to do so.

With this post I’m asking for feedback to figure out whether the interface of the current draft achieves those goals while satisfying the necessary use cases, or whether some changes are still necessary. To that end I’ll give a brief introduction here and show you how you can try it out.
This idea is implemented here: [WIP] File set combinators by infinisil · Pull Request #222981 · NixOS/nixpkgs · GitHub

Trying it out

The PR can be loaded into a nix repl as follows. We also set fs = lib.fileset for convenience.

nix repl -f https://github.com/tweag/nixpkgs/tarball/file-sets

Welcome to Nix 2.15.1. Type :? for help.

Loading installable ''...
Added 19162 variables.

nix-repl> fs = lib.fileset

Flakes

To temporarily override the default nixpkgs input in your flake.nix:

nix build --override-input nixpkgs github:tweag/nixpkgs/file-sets

I then recommend also defining fs = inputs.nixpkgs.lib.fileset for convenience.

Overview

The PR implements a file set abstraction which, as you might expect, represents sets of files. Common set operations are supported, including:

  • Union: fs.union a b / fs.unions [ ... ]
  • Intersection: fs.intersect a b / fs.intersects [ ... ]
  • Difference: fs.difference a b
  • Filtering: fs.fileFilter predicate a

Examples:

let

  # The file ./Makefile and recursively all files in the ./src directory
  a = fs.union ./Makefile ./src;

  # Recursively all files in the ./. directory that are not in the ./tests directory
  b = fs.difference ./. ./tests;

  # Recursively all Nix files in the ./. directory
  c = fs.fileFilter (file: file.ext == "nix") ./.;

  # Recursively all Nix files in the ./src directory
  d = fs.intersect ./src c;

in null

To see which files are included in a file set, you can use fs.trace:

nix-repl> fs.trace {} (fs.union ./Makefile ./src) null
trace: /home/user/my/project
trace: - Makefile (regular)
trace: - src (recursive directory)
null

Notably none of these operations actually import these files into the Nix store!
Instead the only way to get the files to be imported, and therefore usable in derivations, is to use the toSource function:

# Can be used as the `src =` of a derivation
fs.toSource {
  root = ./.;
  fileset = fs.unions [
    ./Makefile
    (fs.fileFilter (file: file.ext == "c") ./src)
  ];
}
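
As a rough sketch of how this might look inside a full derivation: the draft branch URL and the file set functions are the ones from this post, while the package name and the assumption that the Makefile has an install target honouring PREFIX are made up for illustration.

```nix
# default.nix: a minimal sketch against the draft branch, not a definitive setup
let
  # Assumption: the draft branch mentioned in this post is still available
  pkgs = import (fetchTarball "https://github.com/tweag/nixpkgs/tarball/file-sets") { };
  fs = pkgs.lib.fileset;
in
pkgs.stdenv.mkDerivation {
  # Hypothetical package name
  name = "my-project";

  # Only ./Makefile and the C files under ./src are imported into the store
  src = fs.toSource {
    root = ./.;
    fileset = fs.unions [
      ./Makefile
      (fs.fileFilter (file: file.ext == "c") ./src)
    ];
  };

  # Assumption: the Makefile has an `install` target that honours PREFIX
  makeFlags = [ "PREFIX=$(out)" ];
}
```

Since only the selected files end up in src, changing any other file in the project (a README, for example) should not cause a rebuild.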

These are some of the core functions, but more are available. The best way to explore them is to build the manual locally and open the lib.fileset reference section in your browser:

nix-build '<nixpkgs/doc>' -I nixpkgs=https://github.com/tweag/nixpkgs/tarball/file-sets

firefox result/share/doc/nixpkgs/manual.html#sec-functions-library-fileset

Goals, limitations and alternatives

The goal of this abstraction is to be able to precisely specify which files should have an effect on your derivation builds. Doing this should be straightforward, with obvious semantics, explanatory error messages and good performance.

In order to achieve this, some limitations are imposed:

  • Only local files at evaluation time are supported. Files in Nix store paths are not supported.
    Rationale: The path expression-based interface would be hard to use; it might require IFD (import from derivation); and without content addressing (CA), the original files would still be imported.
    Alternative: Use build-time tools to create a new derivation with the desired layout.

  • Empty directories cannot be represented.
    Rationale: It’s not obvious what the semantics should be if this were allowed, and it could no longer be explained in terms of set operations.
    Alternative: fs.toSource supports an extraExistingDirs argument which can be used to ensure certain directories exist in the resulting Nix store path.

File sets are intended as a replacement for builtins.path-based filtering and the lib.sources functions.
In contrast, file sets are not a replacement for functions like pkgs.nix-gitignore, gitignore.nix, Flakes’ tracked-by-git filtering or fetchGit.
However, file sets can serve as a more performant and composable foundation to implement such functions on top of.
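
For comparison, here is a rough sketch of selecting ./Makefile plus the C files under ./src with builtins.path directly. It assumes `<nixpkgs>` is available on NIX_PATH for the lib string helpers, and the store path name "source" is arbitrary.

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH; only lib's string helpers are used
  lib = import <nixpkgs/lib>;
  root = ./.;
in
builtins.path {
  path = root;
  name = "source";
  # The filter is called with the absolute path (as a string) and the file type
  filter = pathString: type:
    let
      # Path relative to the root, e.g. "src/main.c"
      rel = lib.removePrefix (toString root + "/") pathString;
    in
      rel == "Makefile"
      # The src directory itself must be let through, or it is never traversed
      || rel == "src"
      || (lib.hasPrefix "src/" rel
          # Subdirectories also have to be let through so their files are visited
          && (type == "directory" || lib.hasSuffix ".c" rel));
}
```

Note how the filter has to reason about absolute path strings and let directories through just so that their contents are visited, which also means subdirectories of src without any C files end up as empty directories in the result.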


If this is something you could benefit from, please give it a try and use this thread for any questions or feedback about the interface!
For the implementation, see the draft pull request.


Nice, it looks like a more featureful version of GitHub - numtide/nix-filter: a small self-container source filtering lib

In terms of UX, it might be worth adding a glob function. It could make those expressions a lot shorter for the common case. Eg:

fs.toSource {
  root = ./.;
  fileset = fs.globs [
    "Makefile"
    "**/*.nix"
    "!nix/source.nix" # negative filter
  ];
}

Note that docs are now published at https://tweag.github.io/nixpkgs/file-sets/manual.html#sec-functions-library-fileset.


With such a proposed function, cases where the result isn’t clear quickly arise. E.g. should this include foo/bar or not?

fs.globs [
  "!foo"
  "foo/bar"
]

And is foo in the local directory relative to the current Nix file or somewhere else?

We could have a very similar interface like this which avoids these problems:

fs.globDifference ./.
  [
    "Makefile"
    "**/*.nix"
  ]
  [ 
    "nix/source.nix"
  ]

However, this still wouldn’t be very composable, since it requires a path as the first argument and lists of strings as the second and third. It also requires learning the globbing string DSL.

Instead we can get the same result using simple combinators and a convenience function fs.fileExtFilter (not introduced in the PR yet):

fs.difference
  (fs.unions [
    ./Makefile
    # Currently the PR only supports this with
    # (fs.fileFilter (file: file.ext == "nix") ./.)
    (fs.fileExtFilter [ "nix" ] ./.)
  ])
  (fs.unions [
    ./nix/source.nix
  ])
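
Until something like that exists, such a helper could be defined locally. This is only a sketch: fileExtFilter is the hypothetical name from above, it assumes file.ext holds the extension without the leading dot (as in the fs.fileFilter examples), and it again relies on the draft branch being available.

```nix
let
  # Assumption: the draft branch mentioned in this thread is still available
  pkgs = import (fetchTarball "https://github.com/tweag/nixpkgs/tarball/file-sets") { };
  fs = pkgs.lib.fileset;

  # Hypothetical helper: keep files whose extension is one of `exts`
  fileExtFilter = exts: fs.fileFilter (file: builtins.elem file.ext exts);
in
fs.difference
  (fs.unions [
    ./Makefile
    (fileExtFilter [ "nix" ] ./.)
  ])
  ./nix/source.nix
```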

No, as traversal of foo is already forbidden.

Swapping items though would include foo/bar and then exclude everything else from foo.

This at least is how it works for most backup programs I have worked with.

While we can define semantics for !, it always leads to tricky cases. Here’s an excerpt from man gitignore, giving an example for ! semantics:

Example to exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):

$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar

I’d argue this is definitely not obvious, and just a simple mistake of forgetting the /* leads to the wrong result. In comparison, here’s the same with the proposed file set interface:

fs.difference ./. ./foo/bar

I’d argue against adding more syntax simply because parsing opens up a huge surface for implementation errors and therefore introduces another source of complexity. We have enough quirks and moving parts to deal with as it is.

While I understand that supporting gitignore’s globbing (or a subset or variant) may add complexity, it would be a massive usability boost, which Nix could certainly benefit from :slight_smile: When I try to introduce Nix to new coworkers, this is usually one of the things people frown upon the most: how complex it is to just get the right files into the derivation.

Actually, you can already use globbing to exclude files today using pkgs.nix-gitignore, and that can make sense when people are already familiar with globbing and its complexity, and composability is not needed.

File set combinators are good for when that’s not the case. File set combinators are also well suited as a foundation for implementing functions like pkgs.nix-gitignore, with the added benefit of composability. Does that make sense?

Here’s an example of what I mean regarding composability: Say you’re writing the Nix build for a component under ./some/project of a larger project. There is a .gitignore at the repo root you want to use, but you also only want files from the component directory.

If pkgs.nix-gitignore used file sets underneath, this would be possible:

fs.intersect ./.
  (pkgs.nix-gitignore [] ../..)

Furthermore, if you now wanted to add one file back from some other path in your project, you could do that using

fs.unions [
  (fs.intersect ./.
    (pkgs.nix-gitignore [] ../..)
  )
  ../../some/file/anywhere.sh
]

Comparatively this would be tricky using globbing.


Update: After talking with @roberth, he agreed for me to just go ahead and start incrementally merging PRs implementing this. The first one is merged now, though it’s a very limited interface:

The second one is much more interesting, but I just opened it, reviews appreciated!
