NAR unpacking vulnerability -- post mortem

Speaking from the perspective of being largely a NixOS user more than anything else, I am happy to see process improvements regarding security problems, even if it doesn’t really seem this particular incident had any major lapses from the Nix side. Having better processes for handling security issues is a win-win any way you look at it. I think it’s unfortunate that drama is basically inevitable any time anything notable happens in the Nix world, but that’s just how things are for now. People will have their motivations and justifications for acting a certain way, and I don’t think there’s a whole lot to be gained by debating it, personally. I think we all just have to live with it for now, and probably try not to dwell on it every time it happens.

That said, potential future Nix security issues are definitely a major concern. Naturally, any serious security issue like this raises questions about whether the defenses employed by Nix against potential security issues are “deep” enough. I hope that more thought can be put into not just the security disclosure processes, but also for how to prevent a serious bug like this in the future. I can only imagine there are plenty of different approaches for how one might go about that, but I don’t really have the capacity to be invested in this right now.

On the other hand, this does also make me wonder what the best practices are for NixOS hardening, beyond what is typical for Linux hardening. I don’t believe I am currently using the allowed-users option, and I’d like for there to be more documentation w.r.t. lowering the surface area of a NixOS machine. (I suppose the Nix daemon itself is probably one of the main things that will differentiate NixOS from typical Linux setups with regard to surface area, though, so maybe there isn’t really that much to be said.)
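
For readers wondering what using the allowed-users option looks like in practice, here is a minimal sketch of a NixOS configuration fragment. It assumes that only members of the `wheel` group need to talk to the Nix daemon; that group choice is purely illustrative and is not something proposed in this thread.

```nix
# configuration.nix (fragment): a sketch, not a complete hardening guide
{ ... }:
{
  # Only these users/groups may connect to the Nix daemon.
  # The NixOS default is [ "*" ], i.e. every local user.
  # "@wheel" is an assumption for illustration; pick what fits your machine.
  nix.settings.allowed-users = [ "@wheel" ];

  # Trusted users can override daemon settings (for example, add
  # substituters), so keep this list small. [ "root" ] is the NixOS default.
  nix.settings.trusted-users = [ "root" ];
}
```

This only narrows who can reach the daemon. It reduces exposure to local privilege escalation through the daemon, but it is not a substitute for running a patched Nix version.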

3 Likes

From what I can tell, a 7 day (or 9 day) disclosure window is extremely short by industry standards.

It’s also concerning to me that the reporter had to be advised not to publicly disclose the vulnerability immediately. Who could immediate disclosure possibly help?

7 Likes

When we observe a previously unknown and unpatched vulnerability in software under active exploitation (a “0day”), we believe that more urgent action—within 7 days—is appropriate. The reason for this special designation is that each day an actively exploited vulnerability remains undisclosed to the public and unpatched, more devices or accounts will be compromised.

Pretty much speaks for itself. If it was being actively exploited, the short timeline makes more sense. Local privesc seems to be more of a threat than a malicious binary cache, FWIW, and it’s also hard to prove active exploitation based on the behavior of local users (or those SSHed into your machine). So, if it wasn’t being actively exploited by malicious cache servers or local users, it didn’t need such an aggressive timeline.

2 Likes

Agreed, if a vulnerability is being actively exploited then there’s nothing to be gained by keeping it secret (if anything, keeping it a secret is worse). However, there’s been no claim nor evidence of this being actively exploited.

2 Likes

First off, I appreciate the “blameless post mortem” outlook, I think that’s good. That said, “suspicions about the attitudes of certain people with respect to making Nix look good or bad” is not blameless. The thing you’ve been writing between the lines is that you clearly have a critical view of the reporter, their motivations, and that does seem to be a bias towards the NixCpp team not being at fault. And that’s a mistake.


Also, @rnhmjoj, to give additional context:

I wonder what were these past experiences that motivate such an unreasonably short deadline. As an outsider, it does not look responsible.

I’ll let the reporter’s own words stand for themselves. This is what was initially written in the Nix development channel on Matrix:

Okay, so. As mentioned on GitHub yesterday, I have found a vulnerability in Nix 2.24 that allows any untrusted user or substituter (even without trusted signing key) to potentially escalate to root on macOS and some very quirky Linux setups. Due to the severity of this bug, and my past experiences trying to disclose vulnerabilities to Nix, I am considering disclosing this vulnerability publicly in one week (seven days). Any Nix team member may message me for the details of the vulnerability.

(regarding “past experiences”; GHSA-wf4c-57rh-9pjg (a different vulnerability) was known to the Nix team on February 9th, along with an almost-fully-tested patch. The last communication I received about this vulnerability has been around March 8th, with me poking them again on May 21st, whilst pointing out that it had been patched by Lix nearly a month prior. (Amusingly, had the Nix team applied this fix, or one like it, it would have decreased the scope of the vulnerability mentioned above here.) Based on this, as well as the fact that the new vulnerability can be exploited silently, I believe a seven-day deadline is warranted.)

To me, it seems quite clear that the NixCpp security team has persistently failed to handle security incidents in a timely manner that is respectful of the reporter’s time, and that trust, along with any willingness to wait (potentially forever) for NixCpp to fix security problems, has broken down.

NixCpp being caught unprepared is unpleasant and embarrassing, but it’s also the kind of tough medicine that I hope will actually change things. I do not think any Lix member has the intention of making NixCpp look bad, because they’re already doing a great job of that themselves.

Based on this, as well as the fact that the new vulnerability can be exploited silently, I believe a seven-day deadline is warranted.

Also, to be clear, as much as NixCpp people may call this an “irresponsible disclosure” and leave blame on the reporter, this is quite clear to me. And for once, the Nix team did respond in a timely manner. It’s on the team to make sure that it doesn’t become the norm that “to get NixCpp to fix anything, you have to disclose it first before they take you seriously” and to treat people with the respect required so that they don’t just disclose without bringing the NixCpp team in the loop in the first place.

5 Likes

You don’t have to read between the lines. I’ve quoted, twice, the two events in the timeline of which I have a critical view. I can understand the initial decision to set an unusually short initial disclosure timeline. I’m glad that another person talked the reporter down from simply publishing without a disclosure timeline, and I’m not blaming the reporter for actions they didn’t ultimately take. It is the seven-hour time period between Tom reaching out to the reporter with a fix, demonstrating that this time is different from the previous case, and the reporter publishing, that I am interested in. Because if repeated it will lead to more harm to our users going forward, and because I don’t understand how, when assuming good faith, even with the background of having a previous disclosure not go well, the reporter’s actions in that seven-hour time period make sense. Of course some hypotheses come to mind. One of them—that Matrix ate the message and the reporter never received it—is entirely unrelated to the people involved. If that’s the case, we need to talk about the role of Matrix in things going forward. If I have a bias, it’s in hoping that this or something like it is the explanation, because it’s the least dramatic outcome and has the clearest path forward. I’m being intentionally circumspect about other hypotheses because I don’t want to start any witch hunts. I don’t want to accuse anyone of anything without knowing the facts. Even after knowing the facts, I want to address technical problems with technical solutions, and social problems, if solvable, with social solutions. So let’s get some facts. I don’t think that’s a mistake, or a waste of time.

8 Likes

That’s literally the problem, and it puts the failure on the sender for not ensuring that the information they relied on was received. That’s the most fundamental problem here: even if a technical failure was involved, we should never just assume that sending a message is the same as that message being seen, especially when that information is load-bearing for the security of the users.

1 Like

Consider that three weeks ago, on a nixpkgs PR bumping NixCpp from 2.18.5 -> 2.24.4, this was written:

[The reporter] intends to report what appears to be a severe vulnerability in Nix 2.24 tomorrow or in the next couple of days that does not affect older versions of Nix or Lix. Please do not merge this until that situation is resolved (either by it being confirmed not exploitable or by a fix being released).

(for early warning: it’s a bug on macOS only affecting all installations, likely providing root privilege escalation. mitigations are to use something older or to use Lix. there’s another possible mitigation which I would prefer not to reveal before the report to not leak exploit details nope that mitigation would block the original variant but not any of the further ones that were found. just don’t use nix 2.24 until this is fixed)

After this point, any attacker knowing this could comb through the changes between 2.18.5 and 2.24.4 and make use of this exploit in the wild. Thus, the short deadline makes perfect sense: from this point on, the attack was possible, and it was just a matter of time before it would have been exploited.

Anyone seeking a reasonable guarantee of security would, if made aware of this, have had strong incentives to stay on 2.18.5. The pressure to move to 2.24.4 in this case made the situation much more urgent, if your goal is to protect the users.

Another solution would be to heed the warning:

just don’t use nix 2.24 until this is fixed.

Instead, disclosure happened, and it was in no way “irresponsible”.
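
For anyone who wants to follow the quoted advice and stay off 2.24 on a NixOS machine, a minimal sketch of pinning the Nix package might look like this. The attribute names are assumptions about the nixpkgs revision in use (they existed around the 24.05 release and may be absent elsewhere), so check that they evaluate before relying on them.

```nix
# configuration.nix (fragment): a sketch under stated assumptions
{ pkgs, ... }:
{
  # Assumption: pkgs.nixVersions.nix_2_18 exists in your nixpkgs revision.
  nix.package = pkgs.nixVersions.nix_2_18;

  # Alternative mentioned in the warning, also an assumption about
  # your nixpkgs revision:
  # nix.package = pkgs.lix;
}
```

After a rebuild, `nix --version` should report the pinned version rather than the one nixpkgs would otherwise select.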

1 Like

The issue in question.

puckipedia intends to report what appears to be a severe vulnerability in Nix 2.24 tomorrow or in the next couple of days that does not affect older versions of Nix or Lix. Please do not merge this until that situation is resolved (either by it being confirmed not exploitable or by a fix being released).

Yes, that is what happens when you start with a public discussion of a security vulnerability. You have to create a short timeline because you just revealed its existence in public.

1 Like

You may be right that the security reporting protocol needs to be more 2G-resilient. But if this failure wasn’t explained by the message going unseen, then any work we do to make the security reporting protocol more 2G-resilient would not have prevented this failure, or, if the causes of this failure are repeated, the next one.

That’s the difference, to me, between the Two Generals’ problem being a theoretical concern for this scenario or a practical one. Both can be important but I’d like to address practical concerns first.

…issue tracker image…

I’m curious if this is the actual state of the issue tracker, or just a placeholder template. The use of TODO and blabla look a lot like the sort of stuff I’ve encountered in GitHub templates in the past. While it definitely is a point where they could probably improve, it also is exactly the sort of thing that I would expect people with an axe to grind to wave around and claim disrespect when the truth of the matter is the more mundane “we have a lame issue template”.

to get NixCpp to fix anything, you have to disclose it first before they take you seriously

Linked from I think the lobsters thread, somebody pointed out that puckipedia had been kind of a jerk (Revert "Merge pull request #9902 from NixOS/require-fixed-output-fetchurl" by puckipedia, Pull Request #9911 in NixOS/nix on GitHub; I’d bet there are others if we dig around). Is anybody actually surprised that somebody who acts like that in a PR is going to have a little trouble being taken seriously?

Similarly, consider “their own words”:

The last communication I received about this vulnerability has been around March 8th, with me poking them again on May 21st, whilst pointing out that it had been patched by Lix nearly a month prior.

It sucks that communication got broken. But, again, there were things that started happening right around that time, culminating in the Save Nix Together thing, which was basically a full-page attack ad on Nix and Eelco–a key person involved in the Nix security team and presumably vulnerability evaluation and remediation.

There are absolutely things to be said here around “how can we have a more resilient security process”–and I think that’s being addressed in the post-mortem–but it seems like useful context to know that the people complaining about this have some overlap with the people that effectively DDoS’ed the folks handling security issues. I would not be surprised if Eelco et al had other things on their plate in the late April-early May timeframe cited here as when the security followup happened.

Bluntly: it’s a bit like saying “A friend of mine set fire to the restaurant and these people had the nerve to neglect to get my drink order. Terrible service, never go there.”

I do not think any Lix member has the intention of making NixCpp look bad

They certainly seemed to capitalize on it, or at least some of their supporters did–even the term NixCpp came from that public screed against the Nix founder in a transparent bid to make Nix seem like a mere flavor instead of the reference implementation!

There’s similarly a bunch of back-handed victim-blaming of the sort you committed here:

NixCpp being caught unprepared is unpleasant and embarrassing, but it’s also the kind of tough medicine that I hope will actually change things

“I’m sorry somebody had to hurt you, but it was for your own good”.

Don’t let’s pretend anything other than hostility.

It kinda looks like the author of the vulnerability, or at least a bunch of their fellow travelers in the Lix project and the fediverse threads defending them, were actively participating in a side-channel attack on the security infrastructure of the Nix project via what amounts to social engineering.

We can harden the project to make such attacks more difficult, but the sooner we all confront that basic truth the better off we’ll all be.

(I for one would be unsurprised if the next set of exploits or whatever come from those same fellow travelers in the Lix ecosystem, both because of their knowledge of weird corners of Nix and their antisocial actions in public in cases like this and previously.)

4 Likes

@crertel can we please have a lot less of the second half of that post?

We need deescalation, and we need to be able to productively discuss this incident without pushing away people with different perspectives.

6 Likes

Fair enough. At the very least, and I think most folks would agree on this, we should make sure that there aren’t any things in the backlog that’ll be a pain later and we should make sure that communication channels are clear and centralized and timelines documented.

And that’s exactly what I think the post-mortem suggests in its conclusions and action items.

1 Like

I think people here are pretty set in their positions. I think I’ll make my final point: no amount of after the fact denial will change the consequences of this neglect.

For what it’s worth, I am a person that works on critical infrastructure relying heavily on NixCpp, and I’m also someone with a non-trivial FOSS project, that has participated in handling security issues that were embargoed.

Considering this, it’s my view that there are several clear failures here, and likewise, when interacting with peers that don’t have a bias towards presuming the NixCpp team to be infallible, this is something they find concerning. No matter how defensive people are on online forums, this is something professionals and users are noticing, and no amount of arguing will detract from the reality that NixCpp delivers persistent security failures due to their pathological inability to take ownership.

The amount of conspiratorial and paranoid thinking I’m seeing in this thread is also a clear example of how these failures will continue if the course isn’t corrected soon. It seems obvious to me that if Lix developers were seeking to harm the project, they simply wouldn’t inform the NixCpp team anymore. The entire Lix project seems driven directly as a response to all of these persistent failures in the very basics of how to maintain and operate an open source project, whether those be release engineering failures, security failures, or an inability to onboard new contributors; the list just keeps on going.

I’m by no means a Lix “true believer”. I waited to even attempt using it until it had its second stable release, because I had very low trust in an alternative being able to deliver a better product than that of several veteran contributors, or in it being more than a “side project”. It was only last month that I even started taking it seriously and have been “skunkworking” testing it for various projects (for example, nix-weather). And even then, I’m extremely cautious about deploying it in production, given it’s simply a very new project, and it’s not yet clear if it will be able to continue as the dust settles and the years go on. FOSS maintainership is hard, and simply from existing for a long time, NixCpp gets a lot of points.

After the 2.24.4 incident, I’ve noticed a marked interest in moving to Lix at my dayjob, as well as a general interest from other peers, primarily for security reasons. And I personally am still against that, for reasons I’ve outlined above, and because there is a very reasonable chance that this could be “the wakeup call” to the NixCpp team.

But, I think my outlook is shifting to NixCpp being “default dead”, in the sense that I’m not sure it’s defensible anymore to use it for anything security critical, and I think at best, one more incident like this, and I will no longer be asked whether or not we should move to Lix, I will be told to move us.

There aren’t any arguments on an online forum that can shift this reality. And I think NixCpp failing in these areas would be a massive loss to the Nix ecosystem in general. Having alternatives is very important for the strength of the project; whether you’re “team lix” or “team nix” is completely irrelevant from a technical perspective. The existence of an alternative seems to be driving both projects to create much better software, and the loss of that effect would be a shame.

Heck, just recounting this to myself is making me question my own position that there is anything to be gained for staying on NixCpp. We broke past the trust thermocline last year, and we’re just stumbling in the ruins of the consequences of that, and perhaps instead of arguing with people on the internet in the hopes that some moment of clarity comes to the people in charge, like we’ve done countless times before only to be ignored… maybe I should just accept that it’s falling on deaf ears, and that NixCpp is another darling that has to be killed.

Regardless, that’s my last reply in this thread.

8 Likes

Your personal dislike of Lix devs, thinly veiled by fallacious rhetoric, has little to do with how robust Nix’s security processes are.
The latter is what is under discussion in this thread.

Please do remember, this is an open source project - anyone sufficiently motivated could read the code and identify vulns themselves; vilifying the messenger, as distasteful as you might find them, seems wildly unproductive here.

At some point, the ten or so people really involved in Nix/Lix will need to determine if they want to have a working relationship or not. All other technical things will stem from that decision. I think we’re better together, but seeing events like this transpire really makes me question that belief.

The mastodon/lobsters threads were an absolute embarrassment, all the way around. Whatever happens, it has to be better than that.

7 Likes

This is hugely taken out of context and is part of a long series of frustrating interactions with the cppnix team leading to the alienation of an entire second nix team into an entire second nix project being formed in late February to achieve objectives relating to technical direction, stability, governance, contributor experience, and community building, yielding Lix.

The cppnix team here “fixed” this “vulnerability”, breaking puck’s code in Zilch and rendering certain kinds of realization of non-fixed-output store paths from nix language impossible without a system-specific derivation. It’s a regression. The vuln stated in the pr was not real in the form suggested, it broke external code, and the patch didn’t actually fix the problem stated. It should have been reverted.

So I understand her frustration with the team here. They were repeatedly failing to correctly read her post and she got mad about that.

Tone policing puck is not the way to achieve anything here. Yes there are flaws in her communication and she accidentally mixes up conversation state sometimes. Yes she can get frustrated like anyone. However if you just assume a woman is malicious if she’s frustrated in public but never do that to a man, you’ve just invented sexism. It’s pretty popular, I’ve heard.

There’s reasons to have a bit of frustration after several years of frustrating interactions with the cppnix team. Don’t assume that because someone gets frustrated sometimes with a group they’ve had issues working with for years that this is a sign they’re malicious.

As for save nix together: there exists a completely different read of it than yours: of a well connected group of contributors making a last ditch effort to try to get the project to stop driving away both their most committed group of contributors and the mod team (who were observed being driven away by the stated internal dysfunctions), and trying to cause that internal dysfunction to get fixed after more polite attempts failed. I will leave it as an exercise to the reader to consider this possibility alongside the stream of departure PRs of that time period.

You can think of the group of friends nearby these things however you want. But there’s always multiple ways to see any actions. There’s a read to this present vulnerability disclosure of a communication failure (accurate). Things have been done in the background to try to prevent that happening again. There’s your read as malice (but then why would Lix post on the official mastodon account detailed mitigation instructions including how to use nix upgrade-nix to downgrade nix versions to not the one in nixpkgs (something otherwise entirely undocumented) and not “use lix lol”? why this disclosure timing? surely a malicious actor would do something else than that timing?).

By trying to paint a picture of the Lix team as somehow maliciously gaining something by this situation, you’re not helping make things better in the future, you’re just escalating division that will help nobody.

Everyone including the Lix team has a vested interest in a secure ecosystem, no matter which implementation they run. There’s a reason I posted the warning on the GitHub thread to ensure it doesn’t get merged: I wanted to reduce the number of users impacted by this bug.

Now: I don’t know if someone has said this yet but it’s probably useful to state publicly: since this incident occurred, there now exist more effective private channels to get communication problems like those encountered with this security issue resolved.

9 Likes

[The reporter] intends to report what appears to be a severe vulnerability in Nix 2.24 tomorrow or in the next couple of days that does not affect older versions of Nix or Lix. Please do not merge this until that situation is resolved (either by it being confirmed not exploitable or by a fix being released).

CERT recommends not doing this. This isn’t responsible.

We’re getting caught up in “responsible” disclosure, which is why that term is disfavored by many in the security community in favour of “coordinated disclosure”.

I think it’s quite silly to make hard and fast rules about what is “responsible”. Is it more responsible to let the nixpkgs update of nix land in a channel so that NixOS users actually have a 2.24 to begin with? Is it more responsible to block the pr with “do not merge without explicit consent from nix team” without saying why?

I don’t care which of those things is more responsible in the eyes of CERT, and I think such a discussion is a waste of time when it’s pretty obvious this was done in a manner to try to mitigate as much as possible the damage given prior experience; the decision to do this was made based on multiple past experiences of slow vuln fixes, and I must commend the CppNix team for turning this one around lightning fast, meaning that maybe we should do the next one differently, perhaps by having them block the pr with “there’s a couple things we need to fix, give us a week” kind of thing.

Was it uncoordinated? Definitely. But a discussion calling best-effort behaviour irresponsible is why this terminology is outdated.

6 Likes

And giving 7/9 days notice is your definition of doing that when that’s very unusual anywhere else, even with an un-cooperative vendor?

Do you also consider publicly announcing that a vulnerability exists within a specific version of Nix, before it has even been officially reported to the dev team, to be “to mitigate as much as possible the damage”?