How should we handle software created with LLMs?

The vibe-coding craze uniquely combines a whole bundle of moral, ethical, quality, and legal issues in one, though. That feels like a good enough reason to make an exception for it.

1 Like

Maintainers or contributors of packages would check this, just as they check the license.

Since this work is done by volunteers, all of that is best effort. So the field can be left empty if unsure. But for packages where we know, we can add that information.

I hope NixOS users who rant on social media about LLM use in software also make PRs to add the information.

That’s why I created this discussion instead of letting it happen in the chardet update PR. We should discuss it with the whole community and create a clear policy (if needed).

I think there is a fundamental difference between code written by humans and code written by LLMs. LLMs make mistakes no human would, and for that reason their mistakes are also hard to notice in reviews.

@ledettwy do you have any evidence for that? I feel like most veteran free software developers who maintain those projects don’t use LLMs.

@crertel so we should remove the unfree and insecure flag?

NixOS is about having control over your system. A flag for LLM use would enable the user to be intentional about what software they use. Like the mandatory AI disclosure on Steam.

I have not suggested forbidding LLM-created software by default, by the way. Users who care can set the flag to false. Everyone else is not even affected by the change.

This is not about morality.
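As a sketch, such a flag could mirror the existing `allowUnfree` / `allowUnfreePredicate` mechanism in a user’s nixpkgs config. To be clear, the option names below are hypothetical; nothing like them exists in nixpkgs today:

```nix
# Hypothetical sketch: neither allowLLMGenerated nor any LLM flag in package
# metadata exists in nixpkgs; this mirrors allowUnfree / allowUnfreePredicate.
{
  nixpkgs.config = {
    # Refuse packages whose (hypothetical) metadata marks them as LLM-generated.
    allowLLMGenerated = false;

    # Or, analogous to allowUnfreePredicate, allow specific packages anyway:
    allowLLMGeneratedPredicate = pkg: builtins.elem (pkg.pname or "") [
      "some-tool-i-trust"
    ];
  };
}
```

Users who never set the option would see no change, exactly as with `allowUnfree` today.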

@truh If you don’t act on the information, why bother at all? If you want to be intentional about something, I think some friction helps.

If I understand software correctly, a single character can create a security issue. And since LLMs choose characters (or tokens) by probability (weights) and randomness (temperature), there is a higher chance that one of them makes your whole system vulnerable than when a human chooses each character intentionally.

so: any

Sure, we have to. Policies apply to all packages.

This is a very relevant question here. If 99.9% of software has LLM-generated code, we can’t have a working system without such software.

Yes, instead of one binary flag we could have categories, like:

  • No LLM use
  • Responsible LLM use (following best practices)
  • Irresponsible LLM use
  • Fully vibe-coded (no human has looked at the code)

Technically, I think we can implement that easily in nixpkgs, like the different licenses that you can also selectively allow.
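To illustrate the license analogy: nixpkgs keeps license metadata in `lib.licenses` and packages reference it from `meta.license`. A comparable, entirely hypothetical `llmUse` attribute for the categories above might look like this:

```nix
# Hypothetical sketch; lib.llmUse and meta.llmUse do not exist in nixpkgs.
# The categories mirror the list above, analogous to lib.licenses entries.
let
  llmUse = {
    none          = { level = 0; description = "No LLM use"; };
    responsible   = { level = 1; description = "Responsible LLM use (best practices)"; };
    irresponsible = { level = 2; description = "Irresponsible LLM use"; };
    vibeCoded     = { level = 3; description = "Fully vibe-coded, no human review"; };
  };
in
{
  # Inside a package definition, like meta.license = lib.licenses.mit:
  meta.llmUse = llmUse.responsible;
}
```

A user-side allowlist predicate could then filter on this attribute the same way license predicates filter on `meta.license`.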

But we would have to come up with useful categories that are clearly defined. I think the optimal solution will become clearer as this discussion goes on, especially in the broader free software and Linux user community.

That’s a great idea.

I think the next steps to make progress here can be:

  • collect evidence about LLM use in FOSS projects (especially those we have packaged)
  • try to sort them in multiple clearly separated categories
  • test implementation in nixpkgs

Here is a pad that everyone can edit: LLM use in FOSS projects - HedgeDoc

I think we also need more clarity about what exactly the issue is that we are trying to address, so we can find the best solution. Especially when LLM use becomes the norm and avoiding it is not an option.

5 Likes

It’s just speculation, on the basis that any open source project that doesn’t actively enforce a no-LLM-generated-code policy is going to have AI-generated contributions if it’s popular enough. I think it’s fair speculation. We know for sure that some people are using it for kernel development.

1 Like

Might be worth collecting links to other similar attempts as well:

4 Likes

I think maintaining such a list is a good idea, and we could also write a linter, which could be used by those who do not want to run such software. This could be external to nixpkgs.

1 Like

Comparing government backdoors and racism to LLM slop is an insane take. Take a moment to reflect on your trolling, then stop. No one is interested.

It’s a shame we don’t have moderators anymore to stop such bigoted nonsense.

10 Likes

All of those are groups of people who, no matter the purpose or motive of the software they created (not talking about countries, of course), have an actual incentive and (oftentimes) responsibility to keep the software working well and produce good-quality code. At the very least, they actually know and fully understand the code they wrote, which cannot be said of AI. The morals of the code and the quality of the code are two completely different discussions. The problem with LLMs is that, even putting all morals aside, the code they produce is just frankly not good. They will absolutely produce mistakes, and if there isn’t anyone around to catch those mistakes, things will break. Having completely vibe-coded software that no human has reviewed is just a security nightmare or outage waiting to happen. Just look at the recent AWS outage due to vibe-coded slop making it to production.

I think that the more metadata nixpkgs can have about different packages, the better. We already have sourceProvenance and licenses, to name a few; why not add e.g. authorshipProvenance? It could be a list of the sources of the code, e.g. human, LLM, or both. The filtering stuff would be nice to have as well, but purely from a metadata perspective, gaining visibility into how AI is affecting open source is a huge benefit in itself, in my opinion.
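A minimal sketch of what that could look like, modeled on the existing `meta.sourceProvenance` / `lib.sourceTypes` mechanism in nixpkgs. The `authorshipTypes` set and `authorshipProvenance` attribute here are hypothetical:

```nix
# Hypothetical sketch modeled on meta.sourceProvenance / lib.sourceTypes;
# authorshipTypes and meta.authorshipProvenance do not exist in nixpkgs.
let
  authorshipTypes = {
    human = { shortName = "human"; isLLMGenerated = false; };
    llm   = { shortName = "llm";   isLLMGenerated = true;  };
  };
in
{
  # A list, since a codebase can mix both sources:
  meta.authorshipProvenance = [ authorshipTypes.human authorshipTypes.llm ];
}
```

Like `sourceProvenance`, this would stay purely descriptive unless a user opts into filtering on it.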

2 Likes

I think you misunderstand my point. The fact that you dismiss it as bigoted nonsense is what I’m getting at.

If the problem is low-quality code, we should tag for that. If the problem is license issues, we should tag for that. Those problems are not limited to LLMs (or indeed, even indicative of LLMs).

If the problem is…it came from a source we don’t like or that we’d like to dismiss as low-quality or we consider gauche or are economically threatened by, that starts to look a bit less defensible. And if the problem is that it represents a supply-chain risk, that’s a risk that already exists.

I picked somewhat inflammatory examples because they’re obviously something that we’d go “Wait, that’s bigoted nonsense if we saw it going on”.

2 Likes

We already have people writing code making mistakes.

We already have security nightmares and outages waiting to happen.

These are not new (or even rare) problems.

2 Likes

Yes, but those are mistakes people make. The difference between LLMs and people is that people learn from their mistakes. I like to compare LLMs to gambling, but with code. You are essentially pulling a slot machine (slop machine) and seeing what you win, whereas in reality the odds are completely against you, and the only party that actually wins is the casino (the LLM provider), since you pay them money. Tangent aside, you are correct that people create and introduce security vulnerabilities. But with LLMs, security vulnerabilities and bugs are much more likely, because the LLM has no idea what it’s doing outside of its context window. It has no idea about the implications of the code, and as soon as the context is gone, it doesn’t even remember it wrote it. You can ask a person why they wrote that code, why something is done a certain way, or about the flaws of the code. You can’t get an honest answer from an LLM to those questions, because it doesn’t think. It can try to guess, based on patterns in the code and what it was trained on, but it has zero knowledge and, more importantly, zero responsibility for that code.

In other words, who owns the code the LLM generated? Not from a copyright perspective (that is a whole other can of worms), but from an engineering perspective. If a problem is found in that code, who do you talk to? The LLM? Good luck getting a response unless you somehow manage to dig up the conversation that was happening when it was produced. The person that committed it? Well, here is the problem: did they review the code? If not, then that code might as well have been written by someone who has just vanished from existence, and that is a liability in and of itself.

9 Likes

Tell me you’re an American without telling me you’re an American xDDD In $CURRENT_YEAR literally any of those apart from Indians is well deserved.

It might surprise you, but people who have a strong stance against the use of LLMs on moral grounds would often also say that supporting genocidal nations and their violence apparatuses is bad, actually. So you know, either someone already agrees with your laissez-faire attitude, or they will think you’re disingenuously trolling by putting an ethnic group alongside obviously evil entities.

I don’t really care either way about it — I only really want to reduce the blast radius — but it feels like you’re starting your 2026 election-baiting campaign kind of early this year. And I’m not going to be sorry for that one, because using those as examples is such obvious bait. You could’ve used any other nationalities for the examples instead and made the argument stronger. And yet you chose ones that are actively killing people for the lulz (with the exception of India, I think?), curious.

We already have people writing code making mistakes.

People can’t write 10 fully-functional projects in under a month. A well-driven Opus 4.6 can. And this is when I cared about the output being good enough. Imagine how many more you could slop out if you didn’t. This is not a question of who can or cannot make mistakes. It’s a problem of scale — both in the volume of code in need of auditing being produced, and in how easy it now is to automate writing supply-chain exploits at scale with LLMs. As long as there is no good way to provide assurance for software written chiefly by LLMs, we should take care not to widen the attack surface.

9 Likes

After some checking, it becomes clear that you already can’t have a Linux system without some LLM-generated code, since Linux itself and also systemd contain it. Nix, too, has some LLM code.

I shared that conclusion also on the Fediverse: davidak: "After some research today it becomes clear that y…" - chaos.social

From a very quick check, FreeBSD, NetBSD and OpenBSD have no git commits co-authored by Claude. So you don’t have to stop using computers if you can’t accept AI.

So, which LLM-generated code is a real problem, and how do we handle that specifically? Maybe don’t package problematic software, and accept everything else and hope the software maintainers are responsible.

Whether maintainers follow best practices when using LLMs or accepting contributions can be checked, just like security best practices can be checked, but that is not something NixOS has to be involved in.

2 Likes

I don’t really know if I agree with crertel or not, but I do not think that it is accurate to say that crertel is trolling.

That’s not accurate. I am interested in what crertel has to say.

I think that part of what you wrote here is also inaccurate. Specifically, I do not think that what crertel wrote is bigoted nonsense.

3 Likes

More info is good. But AI is a tricky thing to filter by. Before even deciding how to categorize AI use in projects we first need to categorize AI use in general.

  • Autocomplete (e.g. Supermaven)
  • Conversational (e.g. Copilot, Amazon Q)
  • Vibes (e.g. Cursor, Kiro)

You could call vibes agentic, but that’s not strictly true. Really, what we mean is no-code AI workflows where the author does not, and never has, understood the code.

After that I think the most accurate thing would be to have

  • confirmed used
  • confirmed not used
  • unknown

for each category of a particular project. For the purposes of putting it on a linear scale, we could assign the following numbers:

  1. Completely unknown
  2. No AI at all
  3. Possible autocomplete
  4. Confirmed autocomplete
  5. Possible conversational
  6. Confirmed conversational
  7. Suspected vibe coded
  8. Confirmed vibe coded

This conveniently fits in 3 bits, which means nothing but pleases my C sense of aesthetics. In addition, if a project has an AI policy we could attach that.

Basically everything would be a 4 or 5, including the Linux kernel. Almost everything with an AI ban is going to be a 2 or 3. 1 would pretty much just be solo projects by luddites like myself.
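As a sketch of how such a linear scale could be consumed, assuming a hypothetical numeric `meta.aiLevel` attribute on each package (it does not exist in nixpkgs), filtering reduces to a single comparison:

```nix
# Hypothetical sketch: meta.aiLevel (1-8, as in the scale above) does not
# exist in nixpkgs. Filtering is then just a numeric threshold check.
let
  # "Accept anything up to confirmed conversational use":
  maxAiLevel = 6;

  # Treat missing metadata as level 1 ("completely unknown"):
  allowPackage = pkg: (pkg.meta.aiLevel or 1) <= maxAiLevel;
in
{
  allowed = allowPackage { meta.aiLevel = 6; };  # true
  blocked = allowPackage { meta.aiLevel = 8; };  # false (confirmed vibe-coded)
}
```

Since Nix is programmable, the same attribute could just as easily feed a search-tool display as a blocklist.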

It’s also worth noting that metadata isn’t just for blocking software but also for documenting it in package search tools.

Nix is a programmable system. If I have structured data I can do with it whatever I want.

I think blocking evaluation for packages that were developed with LLMs would be a huge pain. It’s a lot worse than for unfree packages, since unfree packages are usually leaf packages in a dependency tree; it’s pretty uncommon for FOSS packages to depend on unfree packages.

2 Likes

Also, don’t forget that the packaged software can be vibe-coded and the packaging process itself can be vibe-coded too; we would need to track those separately for anyone who considers this something that needs to be known.

Given the limited capacity of Nix volunteers to implement metadata infrastructure (including some already-approved and uncontroversial cases, like categories), I think we should have a strong bias towards inaction and scope minimalism. If we don’t need to do something about this, we shouldn’t.

To the extent that this is driven by concern about code quality, therefore, we should only act once we have clear examples of this being a problem for us: for me, this means multiple examples of packages being broken in a way that seems related to them being LLM-written or LLM-assisted, and which our existing QA process is having a hard time dealing with.

To the extent that this is a morally-driven concern, it’s a bit more difficult to propose a criterion for action, because you don’t often get external signals that you’re making a mistake. In that arena I think discussing it here is reasonable, but I don’t want a discussion that pretends to be about the pragmatics when it’s really about the principle.

8 Likes

allowSlop = mkEnableOption "Allow LLM-contaminated software"

2 Likes

The same way you handle spam in email. By marking it as such.

2 Likes