2022-11-28 Nixpkgs Architecture Team Meeting #19

Infinisil · November 29, 2022, 8:08pm

Past meeting notes
Recording
Matrix announcement
Recorded by: @infinisil
Led by: @infinisil
Meeting notes by: @growpotkin
Present members: @infinisil @growpotkin @roberth @Ericson2314

Notes

Issues

Simple Package Paths

@infinisil created a PR on shard directory names.
- https://github.com/nixpkgs-architecture/simple-package-paths/pull/20
- Collected stats on how many packages fall into each substring category.
Plan: use `builtins.substring 0 4 <attribute>` to derive the shard directory.
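
As a rough illustration of the rule above, the following Python sketch mirrors what `builtins.substring 0 4` does: the first four characters of the attribute name become the shard directory. The `pkgs/unit` prefix is taken from the paths used later in these notes. The function name `shard_path` is made up for this example, and whether names are normalized (for instance lowercased) before sharding is not decided here.

```python
def shard_path(attribute: str, prefix: str = "pkgs/unit") -> str:
    """Return the sharded directory for a package attribute name.

    Like `builtins.substring 0 4 attribute`, a name shorter than four
    characters is used whole as its own shard.
    """
    shard = attribute[:4]
    return f"{prefix}/{shard}/{attribute}"


if __name__ == "__main__":
    for name in ["hello", "gnumake", "gcc"]:
        print(shard_path(name))
```

With this rule, `hello` lands in `pkgs/unit/hell/hello` and `gnumake` in `pkgs/unit/gnum/gnumake`, matching the example paths in the sections below.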
Compatibility Layer
- Emits warning for legacy style package names.
- Should we use symlinks for legacy support?
  - @roberth okay with ignoring support for it
  - @blaggacao: If you build it, they will come. If you allow legacy names with symlinks, people will depend on that.
- We don’t need it!
  - @infinisil: +1
  - @Ericson2314: +1
  - @growpotkin +1
  - @roberth: +1

Is the tree an API?

There’s pkgs.path
```
# Given a path to a Nixpkgs checkout:
nixpkgs:
let
  pkgs = import nixpkgs { };
in
pkgs.path == nixpkgs
```
Probably an anti-pattern: it doesn’t work as expected with overlays or config.
There’s meta.position → for debugging and tooling
modulesPath → NixOS module specific, but definitely part of the API, should have symlinks when/if we change those

numbers.nix

@infinisil created a utility for collecting statistics about shard-directory members/tree: https://github.com/nixpkgs-architecture/simple-package-paths/pull/21
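
The kind of statistic such a utility collects can be sketched in a few lines of Python. This is not the code from the PR. It only counts how many names share each four-character prefix, and the sample names are a hypothetical stand-in for the real list of Nixpkgs attributes.

```python
from collections import Counter


def shard_stats(names, width=4):
    """Count how many package names fall into each shard of `width` characters."""
    counts = Counter(name[:width] for name in names)
    return {
        "shards": len(counts),
        # (shard, size) of the fullest shard, or None for empty input
        "largest": counts.most_common(1)[0] if counts else None,
        "counts": dict(counts),
    }


if __name__ == "__main__":
    # Hypothetical sample; real input would be all Nixpkgs attribute names.
    sample = ["hello", "hello-wayland", "help2man", "gnumake", "gnupg", "gcc"]
    stats = shard_stats(sample)
    print(stats["shards"], stats["largest"])
```

Numbers like the size of the largest shard are what matter for the directory-size discussion further down the thread.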

`pkgs/unit/hell/hello/unstructured/test.nix`

https://github.com/nixpkgs-architecture/simple-package-paths/issues/19

Protected namespace for misc files associated with a package.
Makes existing files with common names distinct from new “standardized” filenames.
- Example: test.nix might exist in a package today, but may not follow standardized structure for “new” pkgs/unit/<SHARD>/<PNAME>/test.nix checks.
  - NOTE: I have no idea if test.nix is a prospective “standardized” file, I’m just providing an example.
- @roberth: Since for now we only have packages, let’s use a pkg subdirectory for all extra package-related files
  - @infinisil: +1
  - @growpotkin: Spell out package to avoid annoying completion?
    - @infinisil: pkg/fun.nix instead of pkg-fun.nix?
      - @roberth: Right direction. Goes into removing initial pkgs nesting, but we agreed to not do that
    - pfunk.nix???
  - @roberth: (implicit +1)
- @Ericson2314: Missing default.nix a bit, who uses which subdirectory?
- @roberth: pkg-fun/pkg-fun.nix?
- @infinisil: Does it make sense to have a single file to import all dirs? I don’t think so

Proposals:

pkgs/unit/gnum/gnumake
- pkg-fun
  - pkg-fun.nix
  - foo.patch (imported by pkg-fun.nix)
  - test.nix (imported by pkg-fun.nix)
pkgs/unit/gnum/gnumake
- pkg-fun.nix
- pkg
  - foo.patch (imported by …/pkg-fun.nix)
  - test.nix (imported by …/pkg-fun.nix)
@Ericson2314: defer more conventions for later, incubate in nixpkgs, before making it a more standardized convention
pkgs/unit/gnum/gnumake
- pkg-fun.nix
- foo.patch (imported by pkg-fun.nix)
- test.nix (imported by pkg-fun.nix)
- @growpotkin: +1
- @roberth: +1
- @Ericson2314: +1
- @infinisil: +1
@infinisil: In the future, add versioning:
pkgs/unit/gnum/gnumake
- .unit-version → Should be on a higher level, not package-specific, pkgs/.unit-version
- pkg-fun.nix
- pkg
  - foo.patch (imported by …/pkg-fun.nix)
  - test.nix (imported by …/pkg-fun.nix)
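
The flat layout that collected the +1s above (a `pkg-fun.nix` directly in the package directory, with other files next to it) lends itself to a mechanical check. The Python sketch below is only an illustration under two assumptions: the shard is the first four characters of the package name, and every package directory must contain `pkg-fun.nix`. The function name `check_unit_paths` is invented for this example.

```python
from pathlib import PurePosixPath


def check_unit_paths(paths, width=4):
    """Return a list of problems found in paths under pkgs/unit/<shard>/<name>/."""
    problems = []
    packages = {}  # (shard, name) -> file names directly inside the package dir
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 5 or parts[:2] != ("pkgs", "unit"):
            problems.append(f"{raw}: not under pkgs/unit/<shard>/<name>/")
            continue
        shard, name = parts[2], parts[3]
        if shard != name[:width]:
            problems.append(f"{raw}: shard {shard!r} does not match {name[:width]!r}")
        files = packages.setdefault((shard, name), set())
        if len(parts) == 5:
            files.add(parts[4])
    for (shard, name), files in sorted(packages.items()):
        if "pkg-fun.nix" not in files:
            problems.append(f"pkgs/unit/{shard}/{name}: missing pkg-fun.nix")
    return problems
```

Feeding it the three `gnumake` paths from the accepted proposal yields no problems. A wrong shard or a package directory without `pkg-fun.nix` is reported.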

Relative File References

@infinisil doesn’t want to create tooling that automatically fixes up relative path imports and other references.
@infinisil recommends a strict boundary for references with a few rules:
- No import ../<PATH>
- References to “stuff” outside of the package dirs need to go through function args.
- The automated tooling will ignore anything that currently requires fixing up ../ paths by hand, leaving it to be manually rewritten later.
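
To get a feel for how many files would be affected by these rules, a crude textual scan is enough. The Python sketch below flags lines containing a `../` path. It does not parse Nix, it strips `#` comments naively, and the name `find_parent_refs` is made up for this illustration, so treat the results as a first approximation only.

```python
import re

# A `../` that is not part of a longer run of dots such as `.../`.
PARENT_REF = re.compile(r"(?<!\.)\.\./")


def find_parent_refs(nix_source: str):
    """Return (line_number, stripped_line) pairs that contain a `../` path."""
    hits = []
    for number, line in enumerate(nix_source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop end-of-line comments
        if PARENT_REF.search(code):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical package file, for illustration only.
    sample = "\n".join([
        "{ lib, stdenv }:",
        "stdenv.mkDerivation {",
        "  patches = [ ./foo.patch ];",
        "  passthru.tests = import ../../tests/hello.nix;",
        "  # ../ inside a comment is ignored",
        "}",
    ])
    for number, line in find_parent_refs(sample):
        print(number, line)
```

Files flagged this way are the ones the automated move would leave for manual rewriting.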

Tooling

@infinisil: Tooling can be run on a commit history with git filter-branch
We don’t need to worry about old PRs too much
@growpotkin: Dry-run this!

No symlinks and no file changes are great for not triggering merge conflicts in lots of PRs

fricklerhandwerk · November 30, 2022, 8:38pm

What is the rationale for sharding directories other than GitHub not displaying all of them on one page? (I just hope it’s not the only reason, because that would be a pretty arbitrary constraint.)

Infinisil · November 30, 2022, 9:57pm

While we have yet to benchmark this, we have some reports that git performs pretty poorly with huge flat directories. git status could be slow and fetching updates could get expensive. The main reason is that whenever an entry in a directory updates, git needs to write a git object containing all the entries with their hashes.

Briefly checking the flat directory of the WIP implementation, its object file is 376041 bytes with 10476 subdirs in it. So any update to that directory writes another ~0.4MB file. If we limit it to only 1000 subdirs with sharding, the size will be limited to like 0.04MB for these object files, but you have two levels of them, so double that.
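
The estimate above can be reproduced with a small Python calculation. It uses only the two figures quoted in this post and assumes that every directory entry costs about the same number of bytes. The helper name `sharded_write_bytes` is made up for this sketch.

```python
# Figures quoted in the post above.
FLAT_TREE_BYTES = 376041  # size of the flat directory's tree object
FLAT_ENTRIES = 10476      # subdirectories listed in that object

# Assumption: every entry costs roughly the same number of bytes.
BYTES_PER_ENTRY = FLAT_TREE_BYTES / FLAT_ENTRIES


def sharded_write_bytes(entries_per_tree: int, levels: int = 2) -> float:
    """Estimate bytes written when one tree object per level is rewritten."""
    return BYTES_PER_ENTRY * entries_per_tree * levels


if __name__ == "__main__":
    print(f"{BYTES_PER_ENTRY:.1f} bytes per entry")
    print(f"{sharded_write_bytes(1000) / 1e6:.3f} MB for two levels of 1000 entries")
```

That works out to roughly 36 bytes per entry, so two levels of at most 1000 entries each stay well under 0.1MB per update.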

Infinisil · November 30, 2022, 10:01pm

Or with a simpler example, if updating a single file out of 1’000’000 files in a flat directory requires git to write X bytes, sharding those files into 1’000 directories would only require git to write X/1000*2 bytes, so 500 times less. And if we shard it into 100x100 directories (with 100 files in each), it would take X/10000*3 bytes, so about 3333 times less.
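
The same arithmetic as a short Python sketch, counting directory entries instead of bytes. It assumes all entries are the same size and ignores the changed file's own blob. The function name `entries_written` is made up for this illustration.

```python
def entries_written(entries_per_level):
    """Entries git rewrites for one file change: one tree object per level."""
    return sum(entries_per_level)


flat = entries_written([1_000_000])            # one directory with every file
two_level = entries_written([1_000, 1_000])    # 1000 shards of 1000 files
three_level = entries_written([100, 100, 100]) # 100x100 shards of 100 files

if __name__ == "__main__":
    print(flat / two_level)
    print(flat / three_level)
```

The ratios come out as 500 and about 3333, matching the figures in the post.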

Infinisil · November 30, 2022, 10:06pm

git cat-file can be used to get such numbers:

```
❯ git cat-file -p HEAD
tree a00c7f0cb26f5f098e0acad8bad323606c2a7729
parent 84c0723a6e159ce1f6a30595edb6381a7abcb6f6
author Silvan Mosberger <contact@infinisil.com> 1668199038 +0100
committer Silvan Mosberger <contact@infinisil.com> 1668199106 +0100

Some fixes
❯ git cat-file -p a00c7f0cb26f5f098e0acad8bad323606c2a7729
100644 blob a4f216d71a22d419c08668e21429b01350672580	.editorconfig
100644 blob bf350b0cf8b339d940d14e0ef1a2a9a6b540ca41	.git-blame-ignore-revs
100644 blob 4862e0eab93c9de01feaf673d1633679d8d75db5	.gitattributes
040000 tree 9fac0551b688bae6672111ecb59a7c50a6dd7605	.github
100644 blob 3bd5fe66df498d8e91a28cf3b6bd8e19474755ae	.gitignore
100644 blob f07cdfac44f194a03972733abd6c0ad48ab5c4b9	.version
100644 blob 4c4bea0ae25281ca679e8b816ed33592bf9a69a5	CONTRIBUTING.md
100644 blob 65ac1feaf010f656a21706a1b0ed31bcad348e42	COPYING
100644 blob c7e14f6934957af789c9cd58d18498fc1892e8af	README.md
100644 blob faed7e26354037f783701ebfee695757bd8f34da	default.nix
040000 tree e5c6c657c882cc2ffd5f64f396ea498d75e5f0d7	doc
100644 blob 67ecfc6eb08410b9662c5f15d057714aa6be1ea8	flake.nix
040000 tree ac83d13daf49a6d29de53d48db481a6be8050ba3	lib
040000 tree 29959033935e14fa3a4ea095ffa05333587fbea1	maintainers
040000 tree 5d4768caed5a2f3bceffc9115d9e27f6afffbbc1	nixos
040000 tree db0d27807518367ba33d8570461ca4de426d54e8	pkgs
❯ git cat-file -s a00c7f0cb26f5f098e0acad8bad323606c2a7729
593
```
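
The reported size can be cross-checked from the listing itself. In a SHA-1 repository, each raw tree entry consists of the mode without leading zeros, a space, the file name, a NUL byte and the 20-byte binary hash. The Python sketch below sums that over the listing above. The function name `tree_object_size` is made up for this example, and a SHA-256 repository would need 32 instead of 20.

```python
# Listing copied from the `git cat-file -p` output above (whitespace-separated).
LISTING = """\
100644 blob a4f216d71a22d419c08668e21429b01350672580 .editorconfig
100644 blob bf350b0cf8b339d940d14e0ef1a2a9a6b540ca41 .git-blame-ignore-revs
100644 blob 4862e0eab93c9de01feaf673d1633679d8d75db5 .gitattributes
040000 tree 9fac0551b688bae6672111ecb59a7c50a6dd7605 .github
100644 blob 3bd5fe66df498d8e91a28cf3b6bd8e19474755ae .gitignore
100644 blob f07cdfac44f194a03972733abd6c0ad48ab5c4b9 .version
100644 blob 4c4bea0ae25281ca679e8b816ed33592bf9a69a5 CONTRIBUTING.md
100644 blob 65ac1feaf010f656a21706a1b0ed31bcad348e42 COPYING
100644 blob c7e14f6934957af789c9cd58d18498fc1892e8af README.md
100644 blob faed7e26354037f783701ebfee695757bd8f34da default.nix
040000 tree e5c6c657c882cc2ffd5f64f396ea498d75e5f0d7 doc
100644 blob 67ecfc6eb08410b9662c5f15d057714aa6be1ea8 flake.nix
040000 tree ac83d13daf49a6d29de53d48db481a6be8050ba3 lib
040000 tree 29959033935e14fa3a4ea095ffa05333587fbea1 maintainers
040000 tree 5d4768caed5a2f3bceffc9115d9e27f6afffbbc1 nixos
040000 tree db0d27807518367ba33d8570461ca4de426d54e8 pkgs
"""

HASH_BYTES = 20  # binary SHA-1; an assumption about the repository format


def tree_object_size(listing: str) -> int:
    """Compute a tree object's size in bytes from `git cat-file -p` output."""
    total = 0
    for line in listing.splitlines():
        if not line.strip():
            continue
        mode, _type, _hash, name = line.split(None, 3)
        # "<mode> <name>\0<hash>": the mode is stored without leading zeros.
        total += len(mode.lstrip("0")) + 1 + len(name.encode()) + 1 + HASH_BYTES
    return total


if __name__ == "__main__":
    print(tree_object_size(LISTING))
```

For the 16 entries above this gives the same 593 bytes that `git cat-file -s` reports, which shows why the size of a tree object grows linearly with the number of entries in the directory.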

fricklerhandwerk · November 30, 2022, 10:30pm

Thanks, this is plausible. Please document this where it’s immediately visible (e.g. the README in pkgs), otherwise I fear it will be the first thing people get annoyed over if they don’t know there is a very good but non-obvious reason to do it that way.