However, this still wouldn’t be very composable, since it requires a path as the first argument, and a list of strings as the second and third. Additionally, it requires learning the globbing string DSL.
Instead we can get the same result using simple combinators and a convenience function fs.fileExtFilter (not introduced in the PR yet):
    fs.difference
      (fs.unions [
        ./Makefile
        # Currently the PR only supports this with
        # (fs.fileFilter (file: file.ext == "nix") ./.)
        (fs.fileExtFilter [ "nix" ] ./.)
      ])
      (fs.unions [
        ./nix/source.nix
      ])
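For readers trying this against a released Nixpkgs, here is a minimal sketch of the same selection wrapped in `lib.fileset.toSource`. It assumes the released `lib.fileset` rather than the draft in the PR: `fs.fileExtFilter` is replaced by a plain `fileFilter` with a suffix check on `file.name`, and `./Makefile` and `./nix/source.nix` are assumed to exist next to the expression.

```nix
{ lib }:
let
  fs = lib.fileset;
in
fs.toSource {
  # Store paths in the result are laid out relative to this root
  root = ./.;
  fileset = fs.difference
    (fs.unions [
      ./Makefile
      # Every file below ./. whose name ends in .nix
      (fs.fileFilter (file: lib.hasSuffix ".nix" file.name) ./.)
    ])
    # A single path is coerced to a file set, so no unions wrapper is needed
    ./nix/source.nix;
}
```

The result can be passed as `src` to a derivation; only the selected files end up in the store.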
While we can define semantics for !, it always leads to tricky cases. Here’s an excerpt from man gitignore, giving an example for ! semantics:
Example to exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
    $ cat .gitignore
    # exclude everything except directory foo/bar
    /*
    !/foo
    /foo/*
    !/foo/bar
I’d argue this is definitely not obvious, and just a simple mistake of forgetting the /* leads to the wrong result. In comparison, here’s the same with the proposed file set interface:
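The snippet from the original post is not preserved here. As a minimal sketch, assuming the `lib.fileset` API as later released in Nixpkgs and a directory `./foo/bar` next to the expression, selecting only that directory could look like this:

```nix
{ lib }:
lib.fileset.toSource {
  root = ./.;
  # A path is coerced to the set of all files beneath it,
  # so nothing outside foo/bar is included and no negation is needed.
  fileset = ./foo/bar;
}
```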
I’d already argue against adding more syntax on the basis of parsing opening up a huge surface for implementation errors, and therefore introducing another source of complexity. We have enough quirks and moving parts to deal with as it is.
While I understand that supporting gitignore’s globbing (or a subset or variant) may add complexity, it would lead to a massive usability boost, which Nix could certainly benefit from. When I try to introduce Nix to new coworkers, this is usually one of the things people frown upon the most: how complex it is to just get the right files into the derivation.
Actually, you can already use globbing to exclude files today using pkgs.nix-gitignore, and that can make sense when people are already familiar with globbing and its complexity, and composability is not needed.
File set combinators are good for when that’s not the case. File set combinators are also well suited as a foundation for implementing functions like pkgs.nix-gitignore on top, with the benefit of composability. Does that make sense?
Here’s an example of what I mean regarding composability: Say you’re writing the Nix build for a component under ./some/project of a larger project. There is a .gitignore at the repo root you want to use, but you also only want files from the component directory.
If pkgs.nix-gitignore used file sets underneath, this would be possible:
    fs.intersect ./.
      (pkgs.nix-gitignore [] ../..)
Furthermore, if you now wanted to add one file back from some other path in your project, you could do that using
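The snippet that followed is not preserved here. The sketch below illustrates the idea under stated assumptions: it uses the released `lib.fileset` names (`intersection` rather than the draft `intersect`), it substitutes `fs.gitTracked` for the hypothetical file-set-returning `pkgs.nix-gitignore` (tracked files are not exactly the same as non-ignored files), and `../../LICENSE` is a made-up path standing in for the file being added back. It is meant to live in `./some/project` of a local Git checkout.

```nix
{ lib }:
let
  fs = lib.fileset;
  repoRoot = ../..;
  # Stand-in for a gitignore-aware file set over the whole repository
  notIgnored = fs.gitTracked repoRoot;
in
fs.toSource {
  # The root must contain every selected file, hence the repo root
  root = repoRoot;
  fileset = fs.union
    # Only the component's own files, minus whatever is excluded
    (fs.intersection ./. notIgnored)
    # One extra file from elsewhere in the repository (hypothetical path)
    ../../LICENSE;
}
```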
Update: After talking with @roberth, he agreed for me to just go ahead and start incrementally merging PRs implementing this. The first one is merged now, though it’s a very limited interface:
The second one is much more interesting, but I just opened it, reviews appreciated!
@don.dfh We’re limited by what the underlying builtins.path/builtins.filterSource primitive can do, which doesn’t support renaming files.
However, I’d also argue it shouldn’t be in scope, because the job of lib.fileset (or the builtins) is to pick exactly the files that you want to be able to influence derivations. Past that, you can use derivations to further transform them in any way necessary, including renaming, changing and adding files. None of these operations change which files were selected.
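A minimal sketch of that split, assuming the released `lib.fileset.toSource` and a hypothetical file `./config.example.toml`: the file set decides which files enter the store, and a separate derivation does the renaming.

```nix
{ pkgs ? import <nixpkgs> { } }:
let
  fs = pkgs.lib.fileset;
  # Step 1: select exactly the files the build may depend on
  src = fs.toSource {
    root = ./.;
    fileset = ./config.example.toml;  # hypothetical file name
  };
in
# Step 2: transform the selection in an ordinary derivation
pkgs.runCommand "renamed-config" { } ''
  mkdir -p $out
  cp ${src}/config.example.toml $out/config.toml
''
```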
Is there something I could use to get a file set from a glob? I understand how to use the filters, but it would be really convenient to be able to pass */**/*.{ml,mli} or load up a ./.ignore file from the project root which uses the globbing syntax.
One of the goals of the fileset library is for functions to have obvious semantics, and I think globs are a bit out there regarding that.
Globs are a separate syntax, so they require a parser to be implemented in Nix, they require people to understand that additional language, and they come with edge cases like how files with * or { in their names should be handled. Should ?, ! or other features be implemented too? How would one debug those (lib.fileset.trace wouldn’t work)? Etc.
So while I don’t think it’s a good idea for the lib.fileset library to implement that, it’s a really good foundation for other libraries to be built upon, as it takes care of all the obscurity of builtins.path underneath and exposes an easy and safe interface on top.
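For the concrete glob from the question, a filter-based equivalent is short. This is only a sketch, assuming the released `lib.fileset.fileFilter`, whose predicate receives an attribute set with the file’s `name`; the helper name `isOCaml` is made up for illustration.

```nix
{ lib }:
let
  fs = lib.fileset;
  # Roughly what **/*.{ml,mli} expresses: any file, at any depth,
  # whose name ends in .ml or .mli
  isOCaml = file:
    lib.hasSuffix ".ml" file.name || lib.hasSuffix ".mli" file.name;
in
fs.toSource {
  root = ./.;
  fileset = fs.fileFilter isOCaml ./.;
}
```

Because the result of `fileFilter` is an ordinary file set, it can be inspected with `lib.fileset.trace` and combined with other sets, which a glob string cannot offer directly.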
Notably, there’s already gitignore.nix and pkgs.nix-gitignore that take care of gitignore-style glob filtering. These are still based on lib.sources for now, but could be adapted to return filesets instead for improved composability. In the meantime, you can use lib.fileset.fromSource to convert any lib.sources-based value to a fileset.
Oh and if you want to get Git-tracked files instead, you can use lib.fileset.gitTracked too.
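To illustrate the conversion, here is a minimal sketch using only functions from Nixpkgs’ `lib`. The `cleanSourceWith` call is a stand-in for whatever a lib.sources-based filtering library returns, its filter is arbitrary, and `./src` is a hypothetical directory.

```nix
{ lib }:
let
  fs = lib.fileset;
  # Stand-in for the result of a lib.sources-based library
  filtered = lib.sources.cleanSourceWith {
    src = ./.;
    # Arbitrary example filter: drop anything named "result"
    filter = path: type: baseNameOf path != "result";
  };
in
fs.toSource {
  root = ./.;
  # Once converted, the source composes like any other file set
  fileset = fs.intersection (fs.fromSource filtered) ./src;
}
```

Replacing `fs.fromSource filtered` with `fs.gitTracked ./.` would select Git-tracked files instead, assuming `./.` is the root of a local Git working tree.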
Is there a reason that lib.fileset functions aren’t aliased to provide access from the top-level of lib? For most sub-level functions, like lib.lists.singleton, they’re also accessible via lib.singleton. Maybe I’m missing a paradigm here.
Yeah, that’s intentional, because without the fileset prefix it would either be confusing what these functions do, or it could be assumed they do something else. E.g. I might think that lib.union should be a union of lists, removing duplicates. lib.{from,to}Source wouldn’t make any sense (to/from what is the value being converted?). lib.trace could be confused with builtins.trace (though arguably there should be a lib.trace that can trace arbitrary values). lib.difference might be interpreted as the difference between integers. Etc.