Surely there would be irony if one way to enable this work (and alleviate the limited capacity of nix volunteers) would be to rely on… LLM-assisted changes in nixpkgs itself.
What do you mean? MIT licensed code is FOSS.
I think licenses like MIT aren’t considered FOSS because they are more permissive. You can take MIT code, use it for your own publicly available project without publishing the code.
I think you’re thinking of copyleft. Copyleft refers to GPL-style licenses that e.g. require people making derivative works to release their source code. MIT isn’t copyleft, but it is still open-source. (There’s a helpful table at Wikipedia saying MIT is approved by e.g. the FSF, the Open Source Initiative, and Debian).
I will note, that I have already said “sorry” for the tone, but the content was and remains entirely true - saying LLM usage does not in general correlate wit worse quality software is either deluding oneself or others ¯\_(ツ)_/¯
EDIT: INB4 “muh proofs” I will note that “fair”, but that you have also not presented your proofs that this new factor in software development is either totally orthogonal or even beneficial, should be very easy to furnish one if it’s like you say
Oh, sure, everybody has a different way of working. For what it’s worth - as surprising as it may seem in the context, I actually do use LLMs quite a bit these days, now that Opus 4.6 turned out to be competent enough. They are a great thing to stop oneself from wanting to kill themselves as an useless piece of shit they are, for example : ) So I’m not going to throw the first stone if someone just has to use models or they will KMS. But if they life does not depend on that, I reserve the right to be judgemental P:
That said, some ways of working are objectively worse than others - for example ways of working in which 348 people end up dead are worse than those where those people are still alive. And unrestricted use of LLMs has more in common with the former, rather than latter - at least those humans had the potential capacity to understand what they were doing wrong, even if they dun goof’d terribly. A model has at best an ability to generalise pretty well over 200k (sometimes more) tokens of non-persistent short-term memory. It will be scarily competent in one context window, and then in the next will make puzzling mistakes on things it had shown to have “mastered” in the previous one. This is all well and good as long as somebody pays attention to what they’re doing. But if they are surrendering all their authority over their code to a a very fancy multivariate non-linear regression model with a bit of randomness injected in so it “feels more human”, then what trust I can have in the code being well done at all? It’s not “ableism” to say “you should understand the code written as if you had done it yourself”. If you want to accept code nobody actually understands anymore (human never did, model stopped having what passes as understanding when the context window went away) then why we not just abolish PRs and CI? Gonna be even more efficient that way
I’m also not sure how I can be a “jury, judge and executioner”, if I hold absolutely zip zilch zero null nada power in this community.
I’m just making noises because I don’t want my homelab to get pwned because somebody’s decided we’re being inclusive of slop now. I don’t really mind people writing it or even publishing it, I just mind it being default on, so there’s more opportunities for supply chain attacks. I suppose I could love with the flag being default on, but disabled for builds in Hydra, I usually rebuild half the world each update anyway…
Agreed. I’d hate NixOS to become less secure. This might be a good opportunity to discuss how nixpkgs tests the security of submitted packages regardless of whether they’re LLM-generated or not.
Ah yes, it seems like i was confusing it with free software vs open source software, which seems to be more of an ideological difference, and i thought MIT wasn’t considered free, but it looks like it is.
At this point I don’t think projects where AI iterates on its own feedback loops, or projects that are led by people who are inexperienced in programming, or any other form of projects that lack supervision, are reliable.
But at the same time, it is undoubtedly impossible to ban all types of AI-assisted (completions, test coverage, or human-refined code) contributions while allowing a series of popular software (Linux kernel, FireFox, Python, LLVM, etc) to stay, which may be as ridiculous as the Romanian typewriter law. Even within communities where AI contributions are strictly prohibited, I don’t think you should be 100% sure that there really isn’t any such contribution.
Still, I don’t think AI necessarily writes worse code. If it is on its own, this may indeed be the case, but just as management of projects by senior developers is extremely important in software engineering, the control of people on AI is also extremely important now. Believe it or not, the rate of hallucinations in humans is also ridiculously high. If you don’t believe it, you can take a look at the pull requests before 2024. LLM also makes mistakes, but the good thing is that the mistakes it makes are somewhat complementary to humans, so a senior programmer + AI may still be a stronger combination.
Regarding copyright, I believe that training LLM is transfer learning (and thus is not restricted to redistribution limitations), but I also agree that it should not be able to spit out the original text easily, and should not be trained with explicitly pirated data. That being said, even if the previous two conditions are not met, I do not think a project assisted by such a model must be plagiarized, unless there is relevant evidence on the code itself (must bear the burden of proof).
I know that many folk have a more profound concern about AI, such as its impact on social structure, environment, and culture. But I don’t think any long discussions in this area can achieve any real results, as I am increasingly aware that different people in this community have essential differences in values due to their growth and educational experiences, and those things that are believed as “any human should recognize” are not necessarily so obvious.
My opinion on our policy is,
- Internal contributions cannot be completely submitted by an agent, including other types of automated tools, which must be agreed by main maintainers of the community.
- It is necessary to disclose what role LLM plays in the contribution. This should be consistent with the current academic requirements of most universities.
- Repeated submissions of low-quality content generated by LLM will be banned. This is already written in our Code of Conduct.
- For external projects, we don’t judge whether it uses LLM, but we should judge their completion (i.e., how much it feels like a stable, usable project), code quality, and its recognition/popularity.
The other points are fine (and well-articulated), but I disagree with this in particular as gatekeeping. We should judge external projects (if we must!) based on a) do people want to be able to install them from nixpkgs and b) is somebody hopeful enough to volunteer to maintain them and keep them reasonably up-to-date.
That’s it. That’s all. We’re not here to run popularity contests and play “your code is bad”. There is plenty of questionable-quality code out there that nonetheless is extremely useful to people.
This topic is becoming more important and I think we need to give it some attention. It has had some async discussion in the SC. I’ll add this to next week’s SC meeting agenda to get the time for a policy proposal or to get an indication of direction. For what it’s worth, I’m finding myself generally aligned with how @Aleksanaa described the situation.
Question for the community; much of the discussion has been about figuring out ways to minimize the harm that can come from LLMs. Additinally, are there ways we can leverage LLMs to safely reduce toil, are there benefits that we can purposefully enable? Are there ways for us to be forward-looking?
I’d say yes, but only iff we build/script an agent that does exactly what we want, I don’t think it makes sense to try doing it with just the prompts and hoping for the best. Since I couldn’t really find a good scriptable agent runtime (which is why I’m making one for myself, but who knows how long it will take), I guess the only option is writing an agent, which someone would have to do and then maintain it.
For the people that care about it, I think an overlay like the one linked in the thread makes a lot of sense: Hachyderm.io
This overlay can be maintained externally without needing nixpkgs to choose a position.
My two cents is: wanting to avoid software that has LLM-powered contributions will become a completely untenable position over the next year. The naughty list linked from that thread already has Firefox and systemd in it and I’d be very surprised if the chromium developers don’t use Gemini, since they work at Google. So if you want to avoid those projects you either can’t boot your NixOS system and browse the web, or freeze your system to nixos-25.05. You might not like either option, but this is reality.
Isn’t this a matter more suited to @nixpkgs-core? While I’m positive they’d appreciate input from the SC, it does appear to be much more squarely in their purview.
There are already conventions around how a project should indicate which license it uses (look for LICENSE.md in the repo root). And nixpkgs already consumes that info and makes it available for users to filter based on.
Maybe there needs to be a more generic DISCLOSURES.md convention so that nixpkgs can just be a medium through which users express their preferences, this would sidestep the risk of anybody think that nixpkgs maintainers are acting as AI detectives re: the code they package. I imagine it could look something like this:
nixpkgs.config.allowPackagePredicate = pkg:
let
disclosures = pkg.meta.disclosures or;
isVibeCoded = builtins.elem "ai:vibeCoded" disclosures;
hasTelemetry = builtins.elem "telemetry" disclosures;
isUnfree =!(pkg.meta.license.free or true);
shouldFail = isVibeCoded | hasTelemetry | isUnfree;
in
(!shouldFail);
That way the effort to come to consensus about what ai:vibeCoded actually means doesn’t end up being fragmented between nix users and non-nix users.
I believe the answer is yes to both.
- I think there’s a chance to see just how good LLMs actually are with common Nix/NixOS tasks. I’d pitched that as a fun GSoC project, but having that information in hand would do a lot to keep discussions grounded in what’s actually possible.
- I think there’s similarly a good chance to create guidelines for LLMs to help with usage of Nix/NixOS and giving back to the community. Again, I proposed this as a fun GSoC project–but there’s nothing to say folks can’t experiment on their own and blog about it.
- There are interview-driven workflows that are pretty common to Claude Code; figuring out how to leverage those or something similar would be a great way of helping people create modules and packages.
- We have no shortage of issues in the backlog. One thing LLMs are acceptable at is summarization and basic “these things are similar shapes” sort of work–getting a rough pass through the backlog and suggesting tags for “hey this issue is already addressed” that then can make human review easier would be great.
- We have a lot of documentation issues and cleanups (possibly including translation) that might benefit from a pass through LLMs. This is a place where human oversight is essential, but it’s a lot easier to edit than to write in many cases.
- We have some packages (favorite example of this is the Clickhouse service) that have configuration options that are either underexplained or incompletely ported to nix expressions. That’s another place where LLMs could be helpful.
- The Nix implementation itself is long, storied CPP codebase. Anything we can do to document it and help people hack on it reduces bus numbers, and that’s important.
There’s a saying that robots should be used for the “3 Ds: work that is dull, dangerous, or dirty”. There’s no shortage of any of those three categories of work in the maintenance we have.
In general, the forward looking philosophy I suggest is “Let’s figure out how to use LLMs to lower barriers to entry for people to contribute useful code to nixpkgs and nix, and let’s find ways of making it so that mistakes with them are caught as readily as mistakes by unaided humans.”
If by “internal” you mean “code that lives in nixpkgs/”, then any PR open by a human must be 100% written by a human with all the appropriate co-authored-by: tags. We don’t need more automated PRs, we need less, and r-ryantm is enough. Upstream projects that are known to merge code generated by LLMs must be marked as such, and we should maintain a list of exceptions for projects s.a. the kernel.
LLM usage is an a lot more important marker about what risks and what complexity a piece of software brings into the closure, than meta.license or meta.insecure ever were, and we should maintain an up-to-date view of exactly how compromised we are.
Keeping Nixpkgs uncontaminated is also how you avoid tanking its value at the time when every other project’s source is becoming increasingly useless as training data.
There is way too much low hanging fruit[1] along the lines of data analysis, visualization, and automation for us to even think about LLMs.
- Internal contributions cannot be completely submitted by an agent, including other types of automated tools, which must be agreed by main maintainers of the community.
To enforce these it could be helpful to hijack AGENTS.md to add guidance surrounding how the LLM should respond to its context window, e.g. see Helium which has used it to enforce their policy of strictly no LLM contributions.
This is why I feel at a minimum a single dimensional numeric spectrum would be good - at least as a user I can set the rough level at which I’m prepared to ‘compromise’ with an increasingly grim reality in order to have a complete system.
(A bit OT, but: I’m increasingly seeing very skilled people with strong senses of ethics expressing a desire to just walk away from the tech world completely, mostly because of the AI madness. This doesn’t bode well for the long term health of the FLOSS ecosystem. It’ll be annoying if the more militant and niche free/libre software ecosystem ends up as the only viable lifeboat!)