How many people are paid to work on Nix/Nixpkgs?

blaggacao · July 29, 2020, 2:53am

Btw if anyone can lift me past the hurdle for making nix-review more useful…
https://github.com/Mic92/nixpkgs-review/pull/117#issuecomment-665389853

It’s kind of blocking my attempt to make the PR templates more user friendly and add more automatism - for the occasional leaf packages only.

ryantm · July 29, 2020, 3:36am

I look at every r-ryantm PR and read all the comments on them, and I think probably about 5 to 15% of them are bad in some way. The issues include: dependency changes, runtime breakages, important reverse dependencies failing, Darwin/ARM failures, needing to be updated as part of a group, already part of another more comprehensive PR, etc.

I think it really does take human knowledge and intuition to decide if they are okay to merge.

I could see this changing if we had more package level tests. For example, I’d expect it to be safe to auto-merge one where the package and all its reverse dependencies had tests and the tests passed.

Ryan

abathur · July 29, 2020, 6:56am

Yeah. I’ve wished we had a way to nudge people using under-tested packages for test cases (and for them to submit, and for us to collect…)

doronbehar · July 29, 2020, 8:35am

Personally, for all the packages I maintain, I’ve added sanity checks so I’d be able to stay confident that @r-ryantm’s PRs haven’t broken anything (e.g this).

I once sketched in my head a policy we could adopt, where packages could hold a meta field that will enable PRs automatically merge by ofborg once they pass it’s tests along with @r-ryantm’s reverse dependencies tests. Thus maintainers would hopefully be encouraged to add this meta field as long is there’s an appropriate check phase of some sort.

RFC 50 wanted to address that but it got stall because of disagreements upon the kind of policy the bot would implement.

Even in the current state, we are ranked as the top updated repo at repology. Thanks @ryantm .

danieldk · July 29, 2020, 8:48am

I don’t know. This gives upstream maintainers more or less the power to directly push into nixpkgs by creating a tag. It only requires one malicious actor or compromised GitHub account to push a backdoor or trojan into nixpkgs.

When reviewing PRs I try to (where humanly possible) do a diff between upstream versions to quickly glance if there are no fishy things going on. I also run binaries in a very restrictive bubblewrap sandbox to check if they attempt require strange things (I also plan to automatically extract a list of accessed paths, attempted connections, etc.).

I think there should be a human in the loop, so that the chance to catch something malicious (or just a stupid error) is increased.

IMO this is one of the tasks of a Linux distribution, to ensure that the users get vetted, working, packages. If that is not doable anymore, then perhaps we are packaging more than we can manage.

I completely agree that it would be great if maintainers could merge PRs against their packages. I think this speeds up the process and will keep more maintainers around, since they are more empowered to actually maintain their packages.

doronbehar · July 29, 2020, 11:03am

I think the fact we package mostly free software, means developers of packages we ship won’t do such a thing. Though I still agree that this is a big risk, as even if the developer is trusted, their infrastructure could be compromised or perhaps the developer’s credentials to the hosting system.

Yea I guess you are right. To my view, this all comes down to my view to our way of managing human resources…

blaggacao · July 29, 2020, 3:24pm

This might be maybe part of what you’re looking for: cached-nix-shell/nix-trace at master · xzfc/cached-nix-shell · GitHub

This is exactly what I’m trying to do on my pr to nix-review. But when I think about it, your approach is superior. Wouldn’t this call for a mass PR to try to add a --help check to every package? - and merge those where it’s successful?

/cc @Mic92 ? What do you think with regards to my PR vs @doronbehar 's approach? Both?

cdepillabout · July 29, 2020, 3:43pm

The ship has already sailed on this to some extent.

99% of the Haskell packages in nixpkgs are updated automatically with no (or little) human interaction.

I’m not necessarily saying that this is good or bad, just that the Haskell package set in nixpkgs is not checked to the same standards you’re describing. One may argue that it not good to have different standards for different parts of nixpkgs.

jonringer · July 29, 2020, 4:05pm

Someone is still looking at breakages. I think this is the nixpkgs sweet spot. If the packages have a enough logic to validate that they work “under normal” conditions as part of the build, then there’s not much reason for human involvement.

For python packages, I generally recommend people to include unit tests, and import the module. This gives good assurance that the package is in a usable state.

EDIT: Since python packages aren’t directly imported from a package repository, it’s still necessary for a human to review that nothing malicious is being introduced into the code, but I would consider this “minimal” human interaction.

danieldk · July 29, 2020, 4:22pm

Quite some commands do not have a --help option. Most notably, the BSD tradition doesn’t even use long options.

I will probably use something bpf-based. LD_PRELOAD is too easy to work around by malicious software.

danieldk · July 29, 2020, 5:33pm

Agreed. The same is true for many Rust, Go, and Node projects, that pull in so many dependencies that it becomes infeasible to review everything. So, even if the application is vetted, the transitive dependencies are just ack’ed more or less automatically.

I think in the end we (as a field) would probably want heuristics or perhaps models that recognize suspicious patterns and could point people to it. But I don’t think we are there yet. I guess another technical solution would be to sandbox applications as much as possible.

In general, if we’d have automatic merges across the whole package set, I would be scared as hell to deploy NixOS/nixpkgs inside an organization or for a customer. It opens quite a large surface for general or targeted attacks [1]. I think there are at least (for me) two preferable options:

Accept that (outside CVEs) software is sometimes a few revisions behind. It is better to have slightly older software that is somewhat vetted than software that is automatically updates but anything could happen; or
accept that it is maybe not feasible to incorporate every possible package in nixpkgs and focus on a smaller set of packages that are needed by a large set of people. Then have niches covered by (groups of) people who maintain their own package sets/flakes.

[1] Of course, at this point you also have to trust other reviewers and committers. But it is an additional layer of defense.

7c6f434c · July 29, 2020, 7:11pm

I’m not necessarily saying that this is good or bad, just that the Haskell package set in nixpkgs is not checked to the same standards you’re describing. One may argue that it not good to have different standards for different parts of nixpkgs.

It might be a good idea to maintain some parts of Nixpkgs to a higher standard than others, but then I guess we need documentation which part has which standard.

(I do not believe in human skimming of upstream diffs of many central packages like freetype, definitely not all of the core packages; if some packages are heroically reviewed up to tracing syscalls and should not be merged until such a check is done, this is definitely worth documenting — a new meta entry for maintenance standards?)

blaggacao · July 29, 2020, 7:20pm

That sounds like a concealed approval… - Since a whole lot do. And this would give @r-ryantm maybe the extra bit of confidence it needs.

7c6f434c · July 29, 2020, 7:22pm

In general, if we’d have automatic merges across the whole package set, I would be scared as hell to deploy NixOS/nixpkgs inside an organization or for a customer. It opens quite a large surface for general or targeted attacks [1]. I think there are at least (for me) two preferable options:

I … do not believe we are even close to having any procedures capable to avoid a targeted attack by an attacker capable to make one upstream release and keep it up and out of security news for a few of weeks. (Note that r-ryantm tracks other repositories, so there is some delay here, and then we definitely need one week window for «wait, no»). A targeted attack surely will be an easily remotely exploitable weakness on par with median submissions to Underhanded Code Contests buried in a related bugfix. It takes time to find it by manual and automatic analysis, and it does not affect the behaviour immediately upon launch…

blaggacao · July 29, 2020, 7:27pm

Are you suggesting a “Wait a week or two and see if a CVE pops up somewhere, then automerge - if the scope allows it?” - (Heavily simplified)

7c6f434c · July 29, 2020, 7:56pm

Are you suggesting a “Wait a week or two and see if a CVE pops up somewhere, then automerge - if the scope allows it?” - (Heavily simplified)

Wiat a week or two in case something that humans might notice appears («oops the rev-dep conditions are set up wrong»), then, if the package is marked as auto-mergeable and passthru.tests build fine, auto-merge.

(I guess we will end up having some logic what rev-dep impact is allowed for rev-deps not in passthru.tests…)

Of course I hope that when we get auto-merge capabilities in bots it will initially be something to be triggered by non-committer package maintainers, but I will be grateful to whoever implements the full-auto merge for r-ryantm bumps once our custom merge-bot is considered a tested technology, support an RFC to that effect, etc.

danieldk · July 29, 2020, 8:24pm

I think that would go a long way of avoiding problems with high profile projects. Some cases were caught rather quickly (IIRC Handbrake and Transmission). Though for the Webmin backdoor it took a while.

But every extra day without CVEs makes the merge safer. Of course, lesser known projects do not receive the same amount of scrutiny.

danieldk · July 29, 2020, 8:42pm

I meant targeted in the sense of people who know that automatic merging in nixpkgs exists and exploits that as a weakness. Of course, you have to get some people to install the software. But you could introduce a package in nixpkgs without any malicious code and then later get a flavor merged that does whatever it needs to do. Since this would only affect nixpkgs, it wouldn’t be caught by someone else and covered in any security news. Until it comes out of course…

danieldk · July 29, 2020, 8:48pm

Well, you could make it opt in like the Python imports check and let the writer of the derivation decide what commands to invoke and how.

timokau · July 29, 2020, 9:08pm

You might be interested in [RFC] nixpkgs/tests: create nixos-independent PoC tests by 7c6f434c · Pull Request #36842 · NixOS/nixpkgs · GitHub, I’d stlil like to see that idea revived at some point (soon most of the replies on that issue will probably be the discourse bot listing all the times I’ve plugged it somewhere). Easy sanity and cheap sanity checks for packages would be great.

Personally, I think we should care more about avoiding build-time failures and less about avoiding run-time failures. That in itself is an incentive for maintainers to add checks and is basically my understanding of what the “unstable” in nixpkgs-unstable should mean. If the diff looks good and all reverse dependencies build & test okay, I think that should be good enough for nixpkgs-unstable. The human testing comes with the stable channels.

Also I think the “malicioius upstream maintainer” attack vector is not really one we can defend against. Upstream has infinitely many possibilities to introduce subtle vulnerabilities, we could not possibly catch them all. If you use a package, you have to trust the distribution and the upstream maintainer to some degree.

Completely automatic merges have other downsides and attack vectors though, I’d rather rather keep a human in the loop but make it very low-friction to handle those PRs.