How should we handle software created with LLMs?

This topic came up on the Fediverse:

Is there an AI slop problem in NixOS?

The situation is that more and more developers use LLMs to generate code and at the same time, there is more and more backlash from users towards GenAI, especially in gaming.

A recent example is the library chardet, which was rewritten with the help of LLMs and had its license changed. The original author disapproved of the license change and said it is not legal.

Someone working at Nvidia said they are not comfortable using the new version for legal reasons. If we handle this topic reasonably, it could become an advantage for NixOS adoption at companies.

The current maintainer said:

If you want pre-any-LLM assistance chardet, you’d need to go back to 5.2.0

We are still at 5.2.0, but this topic becomes relevant when someone updates the package.

Another example is Fluxer.

I think LLM use can be seen as a red flag for quality, security, and licensing/legal reasons. So it would be good if users knew which software was created with LLMs and had the choice to disallow it.

We could add metadata to the packages and options similar to allowUnfree and allowUnfreePredicate, but this adds more work for maintainers and reviewers. It is also not always clear if developers use LLMs.
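As a minimal sketch of what that could look like — purely hypothetical, none of these `allowLLMGenerated*` options or a `meta.llmGenerated` attribute exist in nixpkgs today; they just mirror the real allowUnfree / allowUnfreePredicate pair:

```nix
# Hypothetical sketch only: these options do NOT exist in nixpkgs today.
# They are modeled on the real allowUnfree / allowUnfreePredicate pair.
{ lib, ... }:
{
  nixpkgs.config = {
    # Blanket switch, analogous to allowUnfree:
    allowLLMGenerated = false;
    # Selective allow-list, analogous to allowUnfreePredicate:
    allowLLMGeneratedPredicate = pkg:
      builtins.elem (lib.getName pkg) [ "chardet" ];
  };
}
```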

What do you think? Would you use such options?

Do you care if software was created with LLMs?
  • I don’t care and trust the maintainers.
  • I want to know and selectively allow them.
  • I don’t want to use any software created with LLMs!
  • I don’t know.
14 Likes

It’s quite reasonable to hide it behind a flag, and in the current situation very unreasonable to not have the software at all; that would only cut off some potentially valuable apps.

Of course, fully automated slop generated without human oversight would be unwise to even consider including, but should it be the maintainer’s job to judge the code? I don’t know; you need to be something of an expert in both the language and LLMs to spot it.

So a good balance, in my eyes, would be to have the flag only for repos that clearly state that the majority of the code is written by an agent.

5 Likes

I want to at least know, but I’ve also created new stuff with the help of LLMs, both in my field of expertise and in programming areas I’m not as familiar with. In my fields, it’s been pretty great. Some odd stuff sometimes, but it has done a better job at security than most of the public code I’ve seen written by people.

For me, it more often just gets stuck in a loop and doesn’t work at all than produces something bad.

I clearly can’t speak as much to fields I’m not a 20-year veteran in, but when I’ve followed through the code, learned more about what it’s doing, and then referenced documentation, it still seems pretty good.

If people know, they can make their own decisions.

7 Likes

There’s basically no real visibility without representation and polls like these just create self-amplifying loops of narrow-interest brigades who believe they are majority activists until an actual vote is held. Sorry for not being diplomatic. Just in a hurry.

8 Likes

But how do we define such metadata? It is a spectrum, from a developer taking some help from a chatbot to a completely vibe-coded project. And there is no proof, only a smell, unless the developer is upfront about it.

It definitely is useful information, but is including it in the packaging metadata the best option? Or are there other databases that can keep track better?

19 Likes

The poll is not intended to vote on a decision or be representative. I thought it would be the easiest way to give feedback.

4 Likes

I don’t really care if something was done by a model or a human on, let’s call it, moral grounds. I care that this correlates with people not reading the code, higher potential for bugs/exploits and potential licensing issues. So yeah, this probably should have a flag like unfree.

10 Likes

This is not very objective and I immediately have those things on my mind:

  • Does asking ChatGPT while developing count as LLM usage? How do we know if that is not disclosed?
  • People can use LLM locally and also never disclose it.
  • What if a contributor initially used Claude in their PR, but it then got refactored to the point that no initial code is left, and it got merged with full history? Does my project then get forever tainted?
  • Does that only apply if commits are co-authored by Claude? Would that incentivize people to lie and remove the co-author line?
  • Does generating semi-random test data count if it never touched actual code?

And my final thought is: who is going to judge that? Unlike most of the other meta attributes, this is not very objective, and it makes an easy source of conflict, nitpicking, and endless discussion.

42 Likes

There has always been bad-quality code on GitHub and there have always been licensing issues. Shall we also add a stackoverflow metadata attribute for any project that contains copy-pasted code the authors do not understand?

8 Likes

“There’s dust on my floor already. Might as well smear shit on the walls.”

9 Likes

Unfortunately, I think a lot of core software is now LLM assisted, so I don’t think such a flag would be useful if you still want to obtain a functioning NixOS system (or a package with any dependencies).

10 Likes

People want to run software.

NixOS is useful to people mainly due to the degree in which it lets them run software.

Anything we do to limit the ability of NixOS to run that software is counterproductive–and bluntly, there is nothing magically more secure or safer by mere dint of having been written by human hands.

Enough moralizing, keep nix boring.

18 Likes

Magically? No. There’s nothing magical that we have and models don’t. But if you’re pretending using LLMs to write code does not correlate with a certain level of disregard for the quality of the resulting software, then you are either lying to yourself or us. And apart from that, while I think the only really special thing humans have over LLMs is way richer input modalities, the models also don’t have any good form of long-term memory and learning (INB4: note that I said “good”, I doubt anything you’ll try to counter this with qualifies), so they will keep making the same mistakes over and over when the context expires. We, at least, have the theoretical capacity to learn from our mistakes (even if we often don’t).

And yes, I have only anecdata on me, not hard publication receipts, but I can tell what effect this had on someone who never hated the models (Stanford’s Junior powersliding in a reverse sweep into a parking spot will never not be cool), but always hated the then-baseless hype, thought SWE agents started being useful at all maybe last June or something and only started using it a lot with Opus 4.6, because I might as well get some side projects done before it eats my job as well.

And that effect is that I do care less. This is a complex effect, because I’m broken in a complex way — another facet is that I actually stopped feeling like a useless piece of shit wanting to die, because I could overpower my executive dysfunction by leveraging Claudius Opposum 4.6 — but I can at the very least tell that much.

I have still never released any code I haven’t thoroughly turned upside down to understand it and structure it the way I like. Hopefully never will; that would make current me very disappointed in future me. But my personal projects basically run themselves (don’t judge, I’m using this as an executive dysfunction therapeutic crutch) and I don’t care that much about the code - between it being written in Rust, built with Nix, heavily tested (including NixOS VM tests enhanced with “computer use” for things like “if I add a bookmark in Firefox on one computer, will it sync to another computer going through my Firefox Account re-implementation?”) and, like, just working as advertised, such “slopengineering” is good enough for throwaway/personal stuff. If lives depended on it, I would review thoroughly, but they don’t (and in a perverse way, one may be contingent on not having to look at it, actually).

And honestly, I think most people are less autistic than me. They’re just gonna slop it out, publish and not care. It’s not about moralising, it’s about not painting a bigger supply chain attack target on us than necessary, and a default off config option sounds reasonable here.

Now, it’s a good question on how to reliably check whether software is (unacceptably) slopped, but that’s orthogonal to whether it’s desirable and I would argue it is. For me the point of models is I can slop something for myself if I need to and can’t find it in me to do this, not perpetuate the Dead Internet Theory even further.

7 Likes

I want to know but I don’t want to have to allow them.

allowUnfree and permittedInsecurePackages are some of my least favorite nixpkgs features.
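For readers who haven’t used them, these two are real nixpkgs escape hatches; in NixOS module syntax they look roughly like this (the insecure package name is just an illustrative example):

```nix
# The two existing escape hatches being discussed (NixOS module syntax).
{
  # Allow all unfree packages:
  nixpkgs.config.allowUnfree = true;
  # Explicitly permit packages marked as insecure, by name-version:
  nixpkgs.config.permittedInsecurePackages = [
    "openssl-1.1.1w"  # illustrative example
  ];
}
```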

9 Likes

But if you’re pretending using LLMs to write code does not correlate with a certain level of disregard to the quality of the resulting software, then you are either lying to yourself or us.

That’s a large assertion to make without any evidence to back it up, even before you get to the accusing me of lying part. There are a lot of people who care about the quality of the resulting software who use LLMs specifically because it helps them write better software–and as these tools have dropped the costs for trying to fix things the quality of software is likely to increase.

Consider: would you prefer a pile of untouched bug reports because the humans are all swamped or burned out, or would you prefer somebody attempting to fix those bug reports (and likely failing! Even a success rate of 10% is a net win if we get 11x more attempts!)?

They’re just gonna slop it out, publish and not care. It’s not about moralising, it’s about not painting a bigger supply chain attack target on us than necessary, and a default off config option sounds reasonable here.

That supply chain attack target already existed, and it’s trivial to obfuscate the use of these tools anyways. It’s just adding a friction point that’ll be used to further fracture and divide and promote unfriendly behavior (for example, your calling me a liar just now).

Even assuming that adding such a tag is a good idea (which, again, I think is an incorrect assumption but let’s walk the garden path…), we still have this unanswered problem:

“How much LLM content earns the tag?”

Like, on one extreme, you have something like Gastown, which is all vibes and just totally bonkers. On the other extreme, you have projects that won’t accept any computer assistance (except for, of course, linkers, and compilers, and SAT solvers, and profilers, and…).

How many bugfixes with Claude as a co-author make a project get the slop tag? If Firefox merges one do we now flag Firefox with that? How about the Linux kernel? For how many years, given industry trends, is this even going to be a relevant question?

6 Likes

I think this is the crucial point from a policy perspective:

There’s no way to make a good objective measure, whether you want to or not. It really is no different from overly heavy stackoverflow use in the past.

I do sympathize, and actually agree that generally heavy LLM use is a smell and leads to long-term stagnation - I see it second-hand daily, at work and in the wild. But it’s not really different from any other poor practice (besides copyright implications, but who actually believes the de-facto authoritative US courts will rule against nvidia in 2026?).

Bad code is bad code, and nixpkgs currently has no quality assessment criteria for packages. It’d be weird to selectively introduce one just for this; there are honestly more egregious flaws that are totally permissible.

On specifically chardet...

The only difference here is the much clearer copyright risk, but as long as the upstream stays alive I don’t think it should be treated any differently than grey-area emulators and such.

As much as I hate what the chardet maintainer is doing, ultimately it’d be wrong to treat them differently simply because I dislike their particular moral interpretation of copyright law. We could argue that it goes against the spirit of FOSS, but that’d still just be selective application of policy, and sets a bad precedent. Besides, the nix ecosystem is mostly MIT licensed, so we’re not exactly a FOSS community.

Yay to emulators means yay to chardet, at least until they’re hit by a DMCA claim.

13 Likes

Guess why I was making disclaimers about anecdata? Because I am aware that I do not know for sure if that’s true. Still, I’d rate it as having a high enough probability of being true, that I’d rather have a default-off switch for this than not. I’m gonna readily admit thinking it’s likely true paints me as very cynical about other carbon-based lifeforms, but it is what it is.

For the rest of the argument for that quote — I do not disagree. I’m actually even aiming to write a STEP B grant proposal to just that effect (DMs welcome if somebody’s interested xD). I just don’t think that’s most developers. Just some. And that’s not enough.

Yes, which — as you may have noticed — I have explicitly called out in my post. Figuring out a good rubric for this will be hard, but that does not mean we should just punt on it. Even having a “this is a naughty list you get on when you monumentally fuck up with LLM usage in your project” approach will be better than nothing.

Yes, and we have allowInsecurePredicate already, don’t we? Actually, now that I think about it, I’d be fine with just lumping projects that have shown themselves cavalier about the potential security/supply chain issues LLMs can bring (exact bar to clear to be bikeshedded) into allowInsecurePredicate, and they don’t get to leave until they have demonstrated a meaningful improvement in security posture. I assume some people would prefer to be able to differentiate those two groups, however.
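For illustration, reusing the existing predicate for this purpose would look roughly like the following (chardet chosen purely as an example, not a verdict):

```nix
# Sketch: piggybacking on the real allowInsecurePredicate option.
# The package named here is purely illustrative.
{ lib, ... }:
{
  nixpkgs.config.allowInsecurePredicate = pkg:
    builtins.elem (lib.getName pkg) [ "chardet" ];
}
```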

Ok, fair. This was rhetorical, not judgement of character, but sorry about that. I do still think a blanket assertion that LLM usage by a project changes nothing is not true.

Then we can, like, just retire this option then? I’d be more than happy to have the LLM do things for me in an auditable and trustworthy enough manner I don’t need such an option and literally don’t have to work anymore. And hopefully some rich asshole won’t eat the UBI payout I would get from this P:

That’s a good point. But I’d rather have those options, and maybe think about having some ~/.config/nixpkgs/default.nix where you can set this globally once? It certainly has vexed me enough times when trying to nix shell something unfree that I would also appreciate that.
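Worth noting: non-flake nixpkgs evaluation already reads a per-user `~/.config/nixpkgs/config.nix` for exactly this kind of global setting (flake-based commands deliberately ignore it for purity reasons):

```nix
# ~/.config/nixpkgs/config.nix — picked up by non-flake evaluation
# such as nix-env and nix-shell -p; flakes ignore it for purity.
{
  allowUnfree = true;
}
```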

Right, but EU courts are likely to, especially given how the us of a is being right now. We’re still a European project as far as I know, so I would assume we will need to care.

3 Likes

Won’t generally work, impure. All the same problems as with unfree/insecure packages.

I think developers going rogue like this is more than just a legal and ethical risk. I would feel more comfortable sticking to earlier versions or removing the package, but I can’t say I’ve looked into the impact, given that it’s quite a common library. But in the end I’d leave that decision to the nixpkgs Python maintainers.

What do you mean? MIT licensed code is FOSS.

Right, but you have to trade some purity for some convenience, unfortunately. I think for end-user machine usage that should be a good enough solution. When you import nixpkgs in your *.nix files it’s not that big a deal to write it once at the import point. Very annoying with CLI use. YMMV, but strategically trading local devshell purity for convenience sounds like a good enough deal to me — git hooks and CI are gonna catch it where it matters anyway.

1 Like

Realistically, it’s not going to be possible to (a) know exactly what upstream has been doing, nor (b) represent that information perfectly. But I do think we need to do something. My intuition would be to aim for one single-dimensional spectrum, with various levels defined somehow. LLMs used only to generate test cases would worry me a lot less than fully LLM-generated PRs merged with minimal human review, for example.

(I mean, ideally I’d like to avoid all of it, on both moral and security grounds, but I’m not naive enough to imagine that’s going to be possible, sadly.)

Personally for the moment I’d like to see it in the metadata, so it could be warned about if over a user-configured threshold. Initially though perhaps we need to experiment in a way that doesn’t impact nixpkgs - is it possible for overlays to add metadata? If so, perhaps such an overlay could be auto-generated from a simple database? (CSV file even? json snippets in a git repo?) Maybe we could tie up with other projects trying to track this situation.
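As a sketch of the overlay idea — everything here is invented for illustration (the `llm-provenance.json` database and the `meta.llmUse` field are not real):

```nix
# Hypothetical overlay grafting LLM-provenance metadata onto packages.
# llm-provenance.json and meta.llmUse are invented for illustration;
# the file might contain e.g. { "chardet": "heavy", "someapp": "vibe-coded" }.
final: prev:
builtins.mapAttrs
  (name: level:
    prev.${name}.overrideAttrs (old: {
      meta = (old.meta or { }) // { llmUse = level; };
    }))
  (builtins.fromJSON (builtins.readFile ./llm-provenance.json))
```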

2 Likes