Pre-RFC: Add Telemetry Meta Attribute

djacu · February 15, 2025, 9:40am

Hi everyone!

In light of the recent discussion surrounding telemetry in nixpkgs, I thought it best to start drafting a pre-RFC to discuss the possibility of a new meta attribute as a solution and work out the details before submitting it as a PR.

I have left some fields without detail, as I think they might not be necessary or we can come to a consensus on them here.

Summary

Collect and maintain a new meta attribute in packages allowing users to easily identify and manage their preference for packages that collect telemetry.

Motivation

Different users have different expectations from a software distribution.
We acknowledge this through the collection of license information and the existence of the allowUnfree nixpkgs option, much as Debian maintains a separate non-free repository.

Similarly, there are a number of different reasons users may have to disfavour packages that collect telemetry:

Privacy Concerns – Users may not want their personal data, usage patterns, or system information shared with a third party.
Security Risks – Sending data to external servers could expose them to potential data breaches or interception.
Lack of Transparency – Many software vendors do not clearly explain what data is collected, how it is used, or whether it is shared with third parties.
Performance Impact – Telemetry collection may introduce additional resource usage, affecting system performance, network bandwidth, or battery life.
Regulatory Compliance – Organizations handling sensitive data may need to restrict telemetry collection to comply with privacy laws or industry regulations (e.g., GDPR, HIPAA).

Detailed design

For packages that collect telemetry, add a new meta attribute, collectsTelemetry.
The value of this attribute would be a boolean.
Packages that do not specify this attribute can be left as-is, on the assumption that a missing attribute is equivalent to collectsTelemetry = false.

Add a mechanism that lets ~/.config/nixpkgs/config.nix specify allowTelemetry = true to allow use of these packages, in a similar manner to allowUnfree.
An allowTelemetryPredicate parameter would allow the distinction to be customized in the same way that allowUnfreePredicate does.
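
A minimal sketch of how the user side could look, assuming the attribute and option names proposed above. None of them exist in nixpkgs today, and "some-tool" is a placeholder package name:

```nix
# Package side (hypothetical): the maintainer marks the package with
#   meta.collectsTelemetry = true;

# User side, e.g. ~/.config/nixpkgs/config.nix.
# allowTelemetry and allowTelemetryPredicate are the names proposed in this pre-RFC.
{
  # Either allow every package that collects telemetry ...
  # allowTelemetry = true;

  # ... or allow only selected packages, mirroring allowUnfreePredicate.
  allowTelemetryPredicate = pkg: builtins.elem (pkg.pname or "") [ "some-tool" ];
}
```

The predicate form keeps the default restrictive while letting a user opt in package by package.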

Add the required logic to pkgs/stdenv/generic/check-meta.nix to check the validity of the meta attribute at evaluation time.
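
The check itself could be sketched roughly as below. This is a standalone illustration under the assumptions of this pre-RFC, not the actual contents of check-meta.nix; the function and argument names are made up for the example:

```nix
# Standalone sketch, not the real check-meta.nix.
# `config` is the nixpkgs config attrset; `attrs` is a package's attribute set.
{ config }:
let
  allowTelemetry = config.allowTelemetry or false;
  allowTelemetryPredicate = config.allowTelemetryPredicate or (_: false);

  # A missing attribute is treated as collectsTelemetry = false.
  collectsTelemetry = attrs: attrs.meta.collectsTelemetry or false;
in
{
  # True when evaluation of this package should be refused.
  isBlocked = attrs:
    collectsTelemetry attrs
    && !allowTelemetry
    && !(allowTelemetryPredicate attrs);
}
```

A package is blocked only when it is flagged and neither the global switch nor the predicate allows it, which is the same shape as the existing unfree handling.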

Examples and Interactions

TBD from discussions.

Drawbacks

TBD from discussions. (not sure this is necessary)

Alternatives

Patching packages to disable telemetry is a possible alternative, but, where it is possible at all, it will likely require significantly more effort from the package maintainer than adding a meta attribute.

Prior art

The unfree license meta attribute.

The fromSource meta attribute introduced in RFC 0089.

Unresolved questions

TBD from discussions. (not sure this is necessary)

Future work

Packages that are known to collect telemetry must be updated with the new meta attribute.

anon67371346 · February 15, 2025, 11:56am

I think we need a way to differentiate between “good telemetry”, where the package maintainer can just disable it by default and the package remains installable (e.g. golang), and “bad telemetry”, where you’d need to effectively patch the software to remove it (e.g. devenv).

Also, I think there is a question of how this will work together with the module system. For example, the matrix-synapse module turns telemetry off by default. Would the approach then be to flag the package but whitelist it via the module’s setup?

Echo51 · February 15, 2025, 3:33pm

It might also make sense to differentiate between anonymous and more privacy-invasive analytics. I personally would gladly contribute to the system in the original discussion (assuming that is the only data stored), but not to Sentry (or similar), which grabs a lot of identifying and system info and uploads it to a server.

But then there needs to be some tiers or lines on what would count as too much, and it will increase the scope and burden on the packagers.

anon67371346 · February 15, 2025, 4:33pm

That means something different for everyone individually, and I don’t think we would be having this conversation if we could just agree on what’s anonymous and what’s invasive.

RossComputerGuy · February 15, 2025, 5:34pm

How does this affect build tools like Flutter, which does collect telemetry but is necessary to build many packages?

waffle8946 · February 15, 2025, 5:50pm

Good point, nonSourcePredicate is already a nightmare to manage with e.g. dotnet, see Identifying a package's closure, I’d like to avoid a repeat.

This should also consider the problems infrastructure: [RFC 0127] Nixpkgs "problem" infrastructure by piegamesde · Pull Request #127 · NixOS/rfcs · GitHub

djacu · February 15, 2025, 5:58pm

Let me try to understand what you are saying and write some more words so we can differentiate between the two cases.

I think we can partition along the line of packages where telemetry collection is easily configurable, the user has knowledge that telemetry is being sent, and the user has control and discretion to send that telemetry.

Examples of configurable and controllable packages would be where the collection of telemetry:

can be configured through simple build flags
can be configured through environment variables
is configured during installation and can be updated after installation
is prompted before collection

Examples where this is not the case would be:

patches have to be applied to the source code
it is not possible to configure

I am not familiar with how golang collects telemetry. Would you mind sharing some details about how it works? Does it match one of the descriptions I have mentioned above?

I am not so concerned with the first set of examples and had in mind only the second set of examples when drafting this pre-RFC. For example, a browser that crashes and asks whether it can send crash telemetry fits into the first category in my mind and does not warrant a flag that blocks evaluation. However, if the browser did that without my consent, without my knowledge, and I could not turn it off, then I would consider that a package that needs the collectsTelemetry attribute (or whatever we call it in the end).

Does this align with what you were thinking?

djacu · February 15, 2025, 6:00pm

I’m not familiar with Flutter’s ecosystem. Can you add some more details about what it does and any pain points an RFC like this might cause?

djacu · February 15, 2025, 6:10pm

@Echo51 I am going to have to agree with @kampka on this. There are too many personal preferences to capture this in a meaningful way that can be implemented. I think it best to find the common ground that we agree on: do I know telemetry is being sent, and can I opt out? If yes, then the package is not of concern. If no, then it gets a meta attribute to warn you during evaluation (edit: and blocks eval). If you are still okay with the “if no” situation, then a simple modification to your configuration should be all that is necessary, similar to allowUnfree.

anon67371346 · February 15, 2025, 6:10pm

Simple, actually. It collects telemetry locally and only sends it once the user has run go telemetry on at least once. Users can go and inspect what’s collected before opting in.

By the way, they had the same discussion about whether opt-in would be useful, the worry being that too few people would turn it on for the data to be useful. They were wrong. Opt-in telemetry works if you make it transparent and explain openly what you are doing.

djacu · February 15, 2025, 6:31pm

I have seen in other people’s configuration a locally defined option like allowUnfreeLists where they can define unfree packages they want to allow in multiple locations. This is then merged and passed into allowUnfreePredicate at the end because (unless it has changed) it does not have merge capability. Would something like this solve or partially alleviate some of the pain?
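
The pattern described above can be sketched as a small NixOS module. The option name allowUnfreeLists is a locally defined option, not something nixpkgs provides, and this is only one way to write it:

```nix
# Sketch of a locally defined list option that several modules can add to.
{ config, lib, ... }:
{
  options.allowUnfreeLists = lib.mkOption {
    type = lib.types.listOf lib.types.str;
    default = [ ];
    description = "Names of unfree packages to allow; definitions from all modules are concatenated.";
  };

  # The merged list is turned into the single predicate at the end.
  config.nixpkgs.config.allowUnfreePredicate =
    pkg: builtins.elem (lib.getName pkg) config.allowUnfreeLists;
}
```

Any other module can then set allowUnfreeLists = [ "some-package" ]; (a placeholder name) and the module system merges the lists before the predicate is built.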

Good call out. How were you imagining this would be implemented? It appears that they have an advisory kind, which might fit.

waffle8946 · February 15, 2025, 8:07pm

That’s unrelated.

Every single dotnet package has sourceProvenance set to [binaryBytecode binaryNativeCode] due to fetchNupkg: conservatively set sourceProvenance · NixOS/nixpkgs@be577a2 · GitHub, so I had to whitelist several hundred packages (or revert that commit - I ended up reverting that commit).

I’m concerned the same might happen here with flutter: if there’s some option to disallow building a config with telemetry, then this issue would come up again, since whitelisting a closure isn’t really feasible (or if it is, no one answered my topic about it).

What I’m getting at is, if not designed properly, we might eagerly label flutter packages as “sending telemetry” even if they don’t. AIUI flutter only sends telemetry at buildtime, but once the package is built, there’s no inherent telemetry being sent - so the telemetry is only relevant if building locally and not substituting.

RossComputerGuy · February 15, 2025, 8:13pm

All the details on its telemetry are available here: Flutter crash reporting | Flutter

anon67371346 · February 15, 2025, 8:32pm

Flutter and Dart can disable telemetry via CLI flags. I’d expect the package maintainer to make use of this by default and thus prevent the package from being flagged.

7c6f434c · February 15, 2025, 8:50pm

Can we get clear on definitions here: does devenv send anything to the server when doing any non-server-based operation, or is the issue how badly documented the scope of data used for a server-side operation is, with nothing useful done locally?

Because these two things are definitely two very different kinds of issues. Without separating them, it’s easy to write a definition that includes git push, or Nix default binary cache, under telemetry, without making it clear in what sense.

anon67371346 · February 15, 2025, 9:08pm

devenv is a tricky candidate, because you cannot differentiate between the telemetry data and the usage data for the remote code generation.

In this case, I would go with this: the author declares that the software respects the Console Do Not Track standard, which states

This is a proposal for a single, standard environment variable that plainly and unambiguously expresses LACK OF CONSENT by a user of that software […]
We just want local software, and by providing it to us you are not entitled to our usage, our crashes, or our IP addresses.

Given that, I would expect that if the generate function cannot be run without sending remote data, it should return an error stating as much.
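
For reference, the Console Do Not Track proposal uses the environment variable DO_NOT_TRACK set to 1. On NixOS, a minimal sketch for exporting it system-wide could look like this; whether a given tool actually honors the variable is up to that tool:

```nix
# NixOS module sketch: export the Console Do Not Track variable for all users.
{
  environment.variables.DO_NOT_TRACK = "1";
}
```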

waffle8946 · February 15, 2025, 9:26pm

In that case, we might need an “unknown” option. (We might need that in any case if the current version of the code hasn’t been checked.)
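
As a rough illustration of that idea, a three-state reading of the attribute could look like the sketch below. It assumes, purely for the example, that a missing attribute means "not checked yet" rather than false, which differs from the default in the pre-RFC draft:

```nix
# Hypothetical tri-state: true, false, or missing (treated as "unknown").
let
  telemetryStatus = attrs:
    let
      value = attrs.meta.collectsTelemetry or null;
    in
    if value == null then "unknown"
    else if value then "collects"
    else "none";
in
map telemetryStatus [
  { meta.collectsTelemetry = true; }
  { meta.collectsTelemetry = false; }
  { meta = { }; }
]
# evaluates to [ "collects" "none" "unknown" ]
```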

RossComputerGuy · February 16, 2025, 2:00am

I’m not sure if the wrapper would even be a good place for it since I don’t think flutter likes having that in specific arguments. The other way is to override the user configuration.

anon67371346 · February 16, 2025, 10:06am

I’m not at all familiar with flutter or dart, the flag just came up first when searching for it.
IMO, “implementation details” are the prerogative of the maintainer. They usually know a good deal about the intricacies of the software and should have leeway to make it compliant as they see fit, or flag it if they can’t / don’t want to do that.

waffle8946 · February 16, 2025, 10:13am

That’s the point, “just flagging it” makes it fundamentally unfeasible to only whitelist certain software, if that package that’s flagged propagates that field to hundreds of packages, without a way to account for them.

Again, I point you to the case of nonSourcePredicate, which was not thought through (and I assume very few must actually use it, then). It makes the option borderline useless? sourceProvenance becomes “ah that’s nice to know, but we either have to let all non-source packages in or none at all, because in-between is a mess”.