Easy source filtering with file sets

Nice, it looks like a more featureful version of numtide/nix-filter (a small self-contained source filtering lib).

In terms of UX, it might be worth adding a glob function. It could make those expressions a lot shorter for the common case, e.g.:

fs.toSource {
  root = ./.;
  fileset = fs.globs [
    "Makefile"
    "**/*.nix"
    "!nix/source.nix" # negative filter
  ];
}

Note that the docs are now published in the Nixpkgs manual.


With such a proposed function, cases where the result isn’t clear quickly arise. E.g. should this include foo/bar or not?

fs.globs [
  "!foo"
  "foo/bar"
]

And is foo in the local directory relative to the current Nix file or somewhere else?

We could have a very similar interface like this which avoids these problems:

fs.globDifference ./.
  [
    "Makefile"
    "**/*.nix"
  ]
  [
    "nix/source.nix"
  ]

However, this still wouldn’t be very composable, since it requires a path as the first argument and lists of strings as the second and third. It also requires learning the globbing string DSL.

Instead we can get the same result using simple combinators and a convenience function fs.fileExtFilter (not introduced in the PR yet):

fs.difference
  (fs.unions [
    ./Makefile
    # Currently the PR only supports this with
    # (fs.fileFilter (file: file.ext == "nix") ./.)
    (fs.fileExtFilter [ "nix" ] ./.)
  ])
  (fs.unions [
    ./nix/source.nix
  ])
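
For reference, here is a minimal sketch of the same selection using only functions available in lib.fileset in recent Nixpkgs. It assumes `<nixpkgs>` is on NIX_PATH and that ./Makefile and ./nix/source.nix exist (file set functions fail on missing paths unless wrapped in fs.maybeMissing). It also uses file.hasExt where the PR draft above had file.ext.

```nix
let
  # Assumption: <nixpkgs> resolves to a Nixpkgs checkout that ships lib.fileset.
  lib = import <nixpkgs/lib>;
  fs = lib.fileset;
in
fs.toSource {
  root = ./.;
  fileset = fs.difference
    (fs.unions [
      ./Makefile
      # All files below ./. whose name ends in ".nix"
      (fs.fileFilter (file: file.hasExt "nix") ./.)
    ])
    # The file to exclude again
    ./nix/source.nix;
}
```

The result is a source value that can be passed as `src` to a derivation.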

No, as traversal of foo is already forbidden.

Swapping items though would include foo/bar and then exclude everything else from foo.

This at least is how it works for most backup programs I have worked with.

While we can define semantics for !, it always leads to tricky cases. Here’s an excerpt from man gitignore, giving an example for ! semantics:

Example to exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):

$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar

I’d argue this is definitely not obvious, and a simple mistake like forgetting the /* leads to the wrong result. In comparison, here’s the same with the proposed file set interface:

fs.difference ./. ./foo/bar
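
To make the two readings of that .gitignore concrete, here is a small sketch with the released lib.fileset API. The names `ignored` and `kept` are illustrative only, and ./foo/bar is assumed to exist: the first set is what the .gitignore excludes, the second is what remains.

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH.
  lib = import <nixpkgs/lib>;
  fs = lib.fileset;

  # Everything under ./. except ./foo/bar (the files the .gitignore excludes)
  ignored = fs.difference ./. ./foo/bar;

  # Only ./foo/bar (the files the .gitignore leaves in place);
  # a plain path is implicitly treated as a file set
  kept = ./foo/bar;
in
{
  ignoredSource = fs.toSource { root = ./.; fileset = ignored; };
  keptSource = fs.toSource { root = ./.; fileset = kept; };
}
```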

I’d argue against adding more syntax in any case: parsing opens up a huge surface for implementation errors and so introduces another source of complexity. We have enough quirks and moving parts to deal with as it is.

While I understand that supporting gitignore’s globbing (or a subset or variant of it) may add complexity, it would be a massive usability boost, which Nix could certainly benefit from. When I introduce Nix to new coworkers, this is usually one of the things people frown upon the most: how complex it is to just get the right files into the derivation.

Actually, you can already use globbing to exclude files today using pkgs.nix-gitignore, and that can make sense when people are already familiar with globbing and its complexity, and composability is not needed.

File set combinators are good for when that’s not the case. They are also a good foundation for implementing functions like pkgs.nix-gitignore on top, with the added benefit of composability. Does that make sense?

Here’s an example of what I mean regarding composability: Say you’re writing the Nix build for a component under ./some/project of a larger project. There is a .gitignore at the repo root you want to use, but you also only want files from the component directory.

If pkgs.nix-gitignore used file sets underneath, this would be possible:

# Hypothetical: assumes gitignoreSource returned a file set
fs.intersection ./.
  (pkgs.nix-gitignore.gitignoreSource [] ../..)

Furthermore, if you now wanted to add one file back from some other path in your project, you could do that using

# Hypothetical: assumes gitignoreSource returned a file set
fs.unions [
  (fs.intersection ./.
    (pkgs.nix-gitignore.gitignoreSource [] ../..)
  )
  ../../some/file/anywhere.sh
]

By comparison, this would be tricky with globbing.
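
A similar composition can be sketched with functions that exist in lib.fileset today. Note the swap: this uses fs.gitTracked (files tracked by Git) instead of a gitignore-based set, which is related but not identical. It assumes `<nixpkgs>` is on NIX_PATH, that ../.. is the root of a local Git working tree, and that ../../some/file/anywhere.sh exists.

```nix
let
  lib = import <nixpkgs/lib>;
  fs = lib.fileset;
in
fs.toSource {
  # The root must contain every selected file, so it is the repository root here.
  root = ../..;
  fileset = fs.unions [
    # Git-tracked files, restricted to the component directory
    (fs.intersection ./. (fs.gitTracked ../..))
    # One extra file from elsewhere in the repository
    ../../some/file/anywhere.sh
  ];
}
```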


Update: After talking with @roberth, he agreed that I should just go ahead and start incrementally merging PRs implementing this. The first one is merged now, though it’s a very limited interface:

The second one is much more interesting, but I just opened it, reviews appreciated!


Having worked on this for the past months, the file set library is fairly usable now! See the file set library tracking issue (NixOS/nixpkgs issue #266356) for status and updates, or to file feature requests.


I’ve wanted something like fs.union(s) for so long… Blacklisting stuff is a pain


This is an awesome library.

I’m wondering: Is support for renaming of files and/or folders planned?

I’m pondering a use-case in which collecting a set of files and folders would improve if I could apply renaming rules :thinking:

@don.dfh We’re limited by what the underlying builtins.path/builtins.filterSource primitives can do, and they don’t support renaming files.

However, I’d also argue it shouldn’t be in scope, because the job of lib.fileset (or the builtins) is to pick exactly the files that should be able to influence derivations. Past that, you can use derivations to further transform them in any way necessary, including renaming, changing and adding files. None of these operations change which files are selected.

We do lack a nice general function to do that, but it would be fairly easy to add: check out NixOS/nixpkgs issue #264541 (“Function for transforming store path contents”), where I’m proposing a pkgs.transformStorePath.
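
Independently of that proposal, the "select first, transform in a derivation" approach can be sketched with a plain pkgs.runCommand. The directory ./docs and the target name `documentation` are purely illustrative, and `<nixpkgs>` is assumed to be on NIX_PATH.

```nix
let
  pkgs = import <nixpkgs> { };
  fs = pkgs.lib.fileset;

  # Step 1: pick exactly the files that should influence the build.
  src = fs.toSource {
    root = ./.;
    fileset = ./docs; # hypothetical directory
  };
in
# Step 2: rename inside a derivation; the selection above is unaffected.
pkgs.runCommand "renamed-docs" { } ''
  mkdir -p $out
  cp -r ${src}/docs $out/documentation
''
```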


Is there something I could use to get a file set from a glob? I understand how to use the filters, but it would be really convenient to be able to pass */**/*.{ml,mli} or load up a ./.ignore file from the project root that uses the globbing syntax.

One of the goals of the fileset library is for functions to have obvious semantics, and I think globs are a bit out there regarding that.

Globs are a separate syntax, so they require a parser implemented in Nix, they require people to understand that additional language, and they come with edge cases such as how to handle file names containing * or {. Should ?, ! or other features be implemented too? How would one debug those (lib.fileset.trace wouldn’t work)? Etc.
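
For comparison, file sets built from combinators can be inspected with lib.fileset.trace, which prints the contents of a file set during evaluation and then returns its second argument. A minimal sketch, assuming `<nixpkgs>` is on NIX_PATH and ./foo/bar exists (the name `sources` is illustrative):

```nix
let
  lib = import <nixpkgs/lib>;
  fs = lib.fileset;

  sources = fs.difference ./. ./foo/bar;
in
# Prints the files in `sources` as a trace, then evaluates to the source value.
fs.trace sources (fs.toSource {
  root = ./.;
  fileset = sources;
})
```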

So while I don’t think it’s a good idea for the lib.fileset library to implement that, it’s a really good foundation for other libraries to be built upon, as it takes care of all the obscurity of builtins.path underneath and exposes an easy and safe interface on top.

Notably, there are already gitignore.nix and pkgs.nix-gitignore, which take care of gitignore-style glob filtering. These are still based on lib.sources for now, but they could be adapted to return file sets instead for improved composability. In the meantime, you can use lib.fileset.fromSource to convert any lib.sources-based value to a file set.

Oh and if you want to get Git-tracked files instead, you can use lib.fileset.gitTracked too.
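
As a small sketch of the fromSource route, here a lib.sources-based value (lib.sources.cleanSource is used as a stand-in for any such filter) is converted to a file set and then combined further. It assumes `<nixpkgs>` is on NIX_PATH; ./README.md is a hypothetical file that must exist.

```nix
let
  lib = import <nixpkgs/lib>;
  fs = lib.fileset;

  # Convert an existing lib.sources-based value into a file set
  cleaned = fs.fromSource (lib.sources.cleanSource ./.);
in
fs.toSource {
  root = ./.;
  # From here on, the usual combinators apply
  fileset = fs.difference cleaned ./README.md;
}
```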


Is there a reason that lib.fileset functions aren’t aliased to provide access from the top-level of lib? For most sub-level functions, like lib.lists.singleton, they’re also accessible via lib.singleton. Maybe I’m missing a paradigm here.

Yeah that’s intentional, because without fileset it would either be confusing what these functions do, or it could be assumed they do something else. E.g. I might think that lib.union should be a union of lists, removing duplicates. lib.{from,to}Source wouldn’t make any sense (to/from what is the value being converted?). lib.trace could be confused with builtins.trace (though arguably there should be lib.trace that can trace arbitrary values). lib.difference might be interpreted as difference between integers. Etc.
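
In practice a short local alias keeps call sites compact without losing the namespace. A minimal sketch, assuming `<nixpkgs>` is on NIX_PATH and that ./Makefile and ./src exist (both hypothetical):

```nix
let
  lib = import <nixpkgs/lib>;
  # Alias the namespace once; `fs.union` stays unambiguous at the call site
  fs = lib.fileset;
in
fs.toSource {
  root = ./.;
  fileset = fs.union ./Makefile ./src;
}
```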


Hi zimbatm, this has now been manifested into reality!


Excellent. With that, I can retire numtide/nix-filter.
