(26) The principles of data protection should apply to any information concerning an identified or identifiable natural person. […] The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.
The upstream link to the legislation is down ATM so please make do with an archive link.
Anonymous data is totally legal to collect without prior consent under GDPR. IP addresses or device IDs can be considered personal data and need to be properly anonymised, not merely pseudonymised, to fall outside the protection requirements (the same recital treats pseudonymised data as personal data).
In this particular case, there doesn’t seem to be anything like that. IP addresses could be collected, but there’s no evidence that they are.
I think people would feel more comfortable if there were a first-run prompt like “This AI query will send such-and-such data, do you agree to this? If not, set this envvar.” And for unattended situations there could be an explicit flag, e.g. --send-ai-telemetry, to opt in.
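The consent flow suggested above could look roughly like the following sketch. It is not devenv’s code: `--send-ai-telemetry` is the flag proposed in the post, `NO_AI_TELEMETRY` is a hypothetical environment variable for declining, and the caller is assumed to persist the first answer so the question is only asked once.

```python
from typing import Callable, Mapping, Optional, Sequence

OPT_IN_FLAG = "--send-ai-telemetry"  # flag name proposed in the post
OPT_OUT_ENV = "NO_AI_TELEMETRY"      # hypothetical env var for declining


def may_send_ai_data(
    argv: Sequence[str],
    env: Mapping[str, str],
    interactive: bool,
    stored_answer: Optional[bool],
    ask: Callable[[str], bool],
) -> bool:
    """Decide whether an AI query may upload user data. The default is 'no'."""
    if OPT_IN_FLAG in argv:
        return True  # explicit opt-in, also usable in unattended runs
    if env.get(OPT_OUT_ENV, "") not in ("", "0"):
        return False  # user declined via the environment
    if stored_answer is not None:
        return stored_answer  # already asked on an earlier (first) run
    if not interactive:
        return False  # nobody to ask, so never upload silently
    return ask(
        "This AI query will send your prompt and generated devenv config "
        "to a remote service. Do you agree? [y/N] "
    )
```

In CI, for example, `may_send_ai_data(["devenv", "generate"], {}, False, None, lambda _: True)` returns `False`: without the flag, an unattended run never uploads anything.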
Yes, but the prompts and generated devenv configs are being collected. This might include personal data without any pseudonymisation applied. This is not just telemetry but rather a dump of user data.
@fpletz and @j-k, while I think the discussion on whether devenv is GDPR compliant is valid, I don’t think it will lead to much in this space. The same goes for the question of copyright on the uploaded data.
I would prefer to keep focus on the “moral” side of the issue, not the legal side, I think it’s more actionable.
Here’s my unsolicited IMHO: honestly, I think we should start formulating policies for nixpkgs, regardless of this issue. Instead of derailing this thread into dorm-room philosophizing about morals, ethics and law, we could discuss whether nixpkgs wants to, e.g., ship all packages in as close to their default state as possible, or actively disable some features. If the former: should certain packages be hidden behind an attribute? Should pull requests made by a package’s author be reviewed by someone else? And so on.
Honestly, it feels like even a 5-sentence global policy document would save every one of us hours of time and some nerve cells down the line, regardless of which exact decisions end up in it.
I agree with that. One goal I have with this discussion is to test the waters and find out which way the community is leaning.
The next step would then probably be formulating an RFC for that.
I didn’t want to shut down the discussion entirely, and could have phrased it less harshly :)
Not sure if this should go here, in a separate thread, or directly into an RFC pull request, but here are my two cents:
First, how to proceed with this is not an easy question: writing well-rounded contribution guidelines is no easy feat (here’s Fedora’s, for example). Right now Nixpkgs only has a “how to write Nix code” section scattered between CONTRIBUTING.md and pkgs/README.md, which has almost nothing to do with actual packaging (i.e. how to deal with optional dependencies, which initial config to ship, when it’s acceptable to patch, etc.). The easiest route would be to draw heavy inspiration from something like this; it doesn’t matter which distro.
Second, we need enforcement. As far as I understand, pull requests are already required to be reviewed before merge anyway. And reviewing your own pull request is a bit insane, tbh; it kind of defeats the purpose of code review (or is it just me?). What I’m trying to say is that, IMHO, there are some positive changes that could be made already and that shouldn’t require a full RFC process.
I can at least speak to the PR that disabled telemetry in Mattermost (I share no affiliation with them other than packaging and hosting it): there were a couple of telemetry options, one that enables notifications for security updates and one that sends diagnostic data to Mattermost.
The former seemed reasonable to default to true but be controllable, and the latter was causing 10-15 second stalls in NixOS tests while the DNS resolution timed out. It also seemed counterintuitive to a free software project. Looking back on it, this is consistent with the values:
Free software is our priority, but we also support our users’ needs to use non-free software, when practical.
The Values link to the GNU Project’s definition of free software, which doesn’t go into detail on telemetry specifically, but they have some other thoughts on the topic. We’re not really talking about proprietary software in this thread, but telemetry does put the developers in a position of power over the users, so it seems somewhat inconsistent with free software.
My take is to disable it by default but give users the option. We want to try to prioritize free software, but people can re-enable it if they’d like.
Yes, I think self-merges with no reviews are a bit obnoxious. Unfortunately most committers don’t seem to think so, so even this change would probably require an RFC.
EDIT: or if not an RFC, at least the group in charge of granting committer access should somehow be convinced that self-merges hurt the project, and mete out corresponding consequences when such guidelines are violated.
The reason I think an RFC would make sense is that this specific topic is quite controversial. Having it rubber-stamped by a body of authority would make enforcement easier and hopefully prevent endless debates in every single case.
Personally, I feel the default should be to match upstream unless there is a policy saying otherwise (which there isn’t in this case, so the revert seems warranted). I find that having a package maintained by its creator is a good thing.
Tagging packages for those kinds of (anti-)features à la F-Droid, as mentioned in the merge request, would be a nice way to make users more aware, as would system-wide opt-in/opt-out flags.
Personally, I feel relatively consistent behavior across packages would be ideal, so that I don’t need to comb through everything for potential anti-features by hand.
That said, I expect that with the ridiculous number of packages available in nixpkgs, and the relatively lax QA, it’d be quite unlikely that we’ll ever manage to track every package that does telemetry and disable that by default. Introducing system-wide options, or having an explicit policy, might lull users into a false sense of security.
It’d probably be difficult to provide things like this in a useful way without also adopting a “core” package policy or suchlike.
Personally, I would like to see a nixpkgs.config.allowUserDataUploadingPredicate as something that includes but isn’t limited to telemetry.
I’m not a devenv user, so I haven’t been paying attention to its development and didn’t realize that uploading user data to an AI service had become part of its functionality. Knowing that changes me from ‘somewhat curious potential future devenv user’ to ‘never touching or recommending devenv’, with or without an honor-system DO_NOT_TRACK. I would like Nixpkgs to support users who can’t allow their data to be leaked in this way, and prevent us from running nix-shell -p devenv one day if we get curious enough (unless we explicitly decide to compromise on a per-package basis, just like we can with allowUnfree).
(Obviously metadata like this can’t be expected to be perfect, and when trade secrets are on the line you need more than just this to prevent exfiltration. But every layer helps.)
I agree, so it’s not like I would be against a policy to try to disable telemetry by default, but matching what upstream does is at least consistent in its own way, and it’s what you’ll likely get for most packages unless there’s a conscious effort to address it.
I also brought it up because I don’t think having the creator of a project (commercial or not) be the maintainer of the package in nixpkgs needs to be seen as a conflict of interest, as suggested in the original post:
The simple fact of the matter is that we’re going to eventually run into packages that can’t easily be bundled in such a way as to remove telemetry, and we’re doing a disservice to our community by saying “no, no, nixpkgs knows best, we won’t carry this for you.”
Having a simple way of marking a package as “hey, this might send telemetry” (which is something that even a newbie contributor could flag via PR without having to know enough to patch the software, assuming it can even be patched!) and then letting users opt-in would be perfectly cromulent.
The predicate should probably warn the user but not block evaluation (unlike allowUnfree). The user can then elect to disable the telemetry (using package settings where available) and/or suppress the warning by setting the predicate.
Requiring the telemetry logic to be ripped out of every package seems like a pretty big burden to put on package maintainers. Users are free to submit or use patches that do this, but it should not be mandatory.
```sql
CREATE TABLE `runs`(
  `id` INTEGER NOT NULL PRIMARY KEY,
  `source` TEXT NOT NULL,
  `duration_sec` INTEGER NOT NULL,
  `finished_at` TIMESTAMP NOT NULL,
  `devenv_nix` TEXT NOT NULL,
  `devenv_yaml` TEXT NOT NULL
);
```
What is actually stored in this table? If you are collecting plaintext code snippets (prompts or generated code) without explicit consent, that could be a huge liability both for you and your users. Even if it’s just configuration boilerplate.
Or do you believe we have ill intentions? If so, which?
Not OP, but I assume good faith. I think telemetry is a useful tool that can be implemented in privacy-preserving ways.
That said, you should specify the extent of the data collected, how it is (hopefully not!) linked to PII (IPs, UUIDs), and how you use it in your product: is it only for this one GenAI pipeline? Do you train other models on it? Any analytics? Do you share it with third parties? You get the gist.
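On the question of how collected data is linked to IPs: one common way to limit that linkage, sketched here assuming the service only needs to tell clients apart within a short window, is to store a keyed hash (HMAC-SHA256) of the IP instead of the IP itself, and to periodically rotate and discard the key. The class name and rotation schedule are hypothetical. While the key exists this is only pseudonymisation, which GDPR Recital 26 still treats as personal data, and it does nothing for the contents of prompts or generated configs.

```python
import hashlib
import hmac
import secrets


class RotatingPseudonymiser:
    """Replaces client identifiers (e.g. IPs) with keyed hashes before storage."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def rotate(self) -> None:
        # Drop the old key (e.g. once a day); tokens issued under it can no
        # longer be recomputed from an IP address.
        self._key = secrets.token_bytes(32)

    def pseudonymise(self, identifier: str) -> str:
        # Same identifier + same key -> same token, so distinct clients can
        # still be counted without storing the raw IP.
        return hmac.new(self._key, identifier.encode(), hashlib.sha256).hexdigest()
```

The raw IP never reaches the `runs`-style table; only the token does, and tokens from different key periods cannot be joined.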
On a personal note: telemetry should always be opt-in, even more so in FOSS projects.
As a user, I’d like to have a governance model for this in nixpkgs (like we have for allowUnfree).
Could an idea be to make it not system-wide but per organisation? What I mean is: the user may want to send data upstream for the OS, like KDE and GNOME do for crashes, or like KDE’s telemetry slider. The user may trust them but not an individual application, or vice versa.
Would it be possible to have a setting that allows the distro but not other applications?
The other thing is: would most users check each individual Git repository to see what changed in its commits? To me, at least, that’s like expecting users to read mailing lists.
I’m quite the outsider so please ignore if this makes no sense.
If this is added, it probably should block evaluation. Once the software in question is deployed and running it’s already too late to opt-out, data will have been collected, and it’s normally incredibly hard to reverse that.
As in the case at hand, many organizations are simply not aware enough, or don’t have the necessary capacity in place, to reverse accidents, even if we assume good faith. And sure, I could file a complaint with the EDPS and likely get devenv into all kinds of trouble, but that’s a lot of effort compared to how easy it is to forget to opt out (ask me how I know), and that agency is swamped, so it’d take months to years to resolve. Non-EU citizens largely don’t even have this option.
If we think that protecting users against anti-features is something nixpkgs should do, doing so with an opt-out that warns you after you’ve been exposed just seems silly. And for what it’s worth, the EU seems to agree; I actually think the GDPR is an excellent piece of legislation to base your data privacy policy on regardless of whether you serve EU citizens, and it explicitly demands opt-in.
As a point of clarity, allowUnfree = false does block evaluation. At the same time, evaluation does not deploy software; that would be realization. However, I don’t know whether we have any mechanism, or whether it is even possible, to have a flag that blocks realization.
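The two behaviours debated above for the proposed allowUserDataUploadingPredicate, warning and continuing versus refusing to evaluate the way allowUnfree = false does, can be sketched as follows. This is an illustration in Python, not nixpkgs code (there the check would be written in Nix next to the existing unfree check); the `Package` type, the `uploads_user_data` tag and `UserDataUploadError` are all hypothetical.

```python
import warnings
from dataclasses import dataclass
from typing import Callable, Literal


@dataclass(frozen=True)
class Package:
    name: str
    # Hypothetical meta tag that any contributor could set via a PR.
    uploads_user_data: bool = False


class UserDataUploadError(Exception):
    """Hypothetical error, analogous to evaluation failing on an unfree package."""


def check_package(
    pkg: Package,
    allow: Callable[[Package], bool],
    mode: Literal["block", "warn"] = "block",
) -> Package:
    """Gate a tagged package behind a user-supplied predicate."""
    if not pkg.uploads_user_data or allow(pkg):
        return pkg  # untagged, or explicitly allowed by the user
    msg = (f"{pkg.name} may upload user data; allow it with "
           "allowUserDataUploadingPredicate (hypothetical option)")
    if mode == "block":
        # Stops before anything is built or run, like allowUnfree = false.
        raise UserDataUploadError(msg)
    # 'warn' lets evaluation continue; by the time the user reads this,
    # the package may already have been deployed and run.
    warnings.warn(msg)
    return pkg
```

A per-package opt-in then mirrors allowUnfreePredicate, e.g. `allow = lambda p: p.name == "devenv"`.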