How should we handle software created with LLMs?

If by “internal” you mean “code that lives in nixpkgs/”, then any PR opened by a human must be 100% written by a human, with all the appropriate co-authored-by: tags. We don’t need more automated PRs, we need fewer, and r-ryantm is enough. Upstream projects that are known to merge LLM-generated code must be marked as such, and we should maintain a list of exceptions for projects such as the kernel.

LLM usage is a much more important marker of the risks and complexity a piece of software brings into the closure than meta.license or meta.insecure ever were, and we should maintain an up-to-date view of exactly how compromised we are.

Keeping Nixpkgs uncontaminated is also how you avoid tanking its value at a time when every other project’s source is becoming increasingly useless as training data.

There is way too much low-hanging fruit[1] along the lines of data analysis, visualization, and automation for us to even think about LLMs.


[1] ideas/2024: propose an analytics project (time budgeted builds) by SomeoneSerge · Pull Request #16 · NixOS/GSoC · GitHub
