Protect Nix codebases against Trojan Source (CVE-2021-42574)

shamrocklee · November 29, 2021, 7:53am

A cross-language source-level vulnerability known as Trojan Source (CVE-2021-42574) was made public on November 1st, 2021. It is based on non-closed Unicode explicit directional (RTL) control characters, which allow an attacker to arbitrarily reorder how sections of characters are displayed and to affect characters outside a comment or a string.

The Nix toolchain seems to be unaware of this kind of vulnerability. GitHub has added a warning to files with non-closed RTL characters in the source tree, but it would be better if the problem could also be caught by the Nix linter, the formatter and Nix CI (OfBorg).
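To make the idea concrete, here is a rough sketch of the kind of check a linter or CI step could run. It is written in Python rather than as a patch to any existing Nix tool; the function name `find_bidi_controls` and the sample string are made up for illustration, and it simply reports every explicit directional control character it sees.

```python
import unicodedata

# Explicit bidirectional formatting characters abused by Trojan Source.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text):
    """Return (line, column, character name) for every bidi control in text.

    Hypothetical helper: line and column are 1-based and counted in
    code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


# Illustrative input only; escapes are used so the control characters
# are visible in this listing.
sample = 'pname = "hello";\n/* comment \u202e\u2066src = srcSecure;\u2069\u2066 */\n'
for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A CI job could fail whenever this returns a non-empty list for a `.nix` file. That is stricter than only flagging non-closed characters, but it is simpler to reason about.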

I have opened an issue in the nix-community/nixpkgs-fmt project that includes Python scripts to generate poisoned Nix expressions as a proof of concept of the vulnerability. Here’s what it looks like:


{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    srcSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one<U+202E><U+2066>src = srcSecure;<U+2069><U+2066>*/
})

https://github.com/nix-community/nixpkgs-fmt/issues/276
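The listing above shows the control characters as `<U+XXXX>` placeholders because they are invisible when rendered. A small sketch of how a review tool could produce that kind of display is below. It is not the script from the linked issue; the name `reveal_bidi` is hypothetical.

```python
import re

# Matches the explicit directional controls U+202A..U+202E and U+2066..U+2069.
_BIDI_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")


def reveal_bidi(text):
    """Replace each bidi control with a visible <U+XXXX> placeholder."""
    return _BIDI_RE.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)


# Illustrative input only, written with escapes instead of raw characters.
poisoned = "/*Replace the source\u202e\u2066src = srcSecure;\u2069\u2066*/"
print(reveal_bidi(poisoned))
```

Once the characters are printed like this, the text is shown in its logical order, so the hidden `src = srcSecure;` can be seen sitting inside the comment.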

earvstedt · November 29, 2021, 1:51pm

Related:
https://research.swtch.com/trojan
https://github.com/ziglang/zig/issues/10074

Summary: This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.

raboof · November 29, 2021, 4:01pm

Yes

Though why not both? The linter/formatter/CI seem like sensible places to (also) validate this.

shamrocklee · November 30, 2021, 2:32pm

Editors often rely on linters/formatters to highlight the code correctly, to format the code, and to show diagnostic messages. A general editor such as VSCode doesn’t understand string/comment blocks by itself, and requires language-specific plugins to do the work. The plugins in turn rely on the linters and formatters.

I don’t know much about the review tools, but similar situations might occur IMHO.

Update: I’m unable to come up with an example in which the closing RTL characters fall into another string/comment block in the same line, nor is there one in Boucher and Anderson’s paper. But it’s still conceivable.
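For reference, a rough sketch of a per-line "non-closed" check is below. It is an assumption-laden simplification: it pairs each opener with its expected closer using a plain stack and does not implement the full nesting rules of the Unicode Bidirectional Algorithm. The name `unclosed_bidi_lines` is hypothetical. It also knows nothing about string or comment boundaries, so it would not notice the case described above, where the closing character lands in a different block on the same line. Catching that would need the tokenizer of a language-aware tool.

```python
# Openers mapped to the character that is expected to close them.
OPENERS = {
    "\u202a": "\u202c",  # LRE -> PDF
    "\u202b": "\u202c",  # RLE -> PDF
    "\u202d": "\u202c",  # LRO -> PDF
    "\u202e": "\u202c",  # RLO -> PDF
    "\u2066": "\u2069",  # LRI -> PDI
    "\u2067": "\u2069",  # RLI -> PDI
    "\u2068": "\u2069",  # FSI -> PDI
}
CLOSERS = {"\u202c", "\u2069"}


def unclosed_bidi_lines(text):
    """Return 1-based numbers of lines that end with an opener still open."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stack = []  # closers we are still waiting for on this line
        for ch in line:
            if ch in OPENERS:
                stack.append(OPENERS[ch])
            elif ch in CLOSERS and stack and stack[-1] == ch:
                stack.pop()
        if stack:
            flagged.append(line_no)
    return flagged


# Illustrative input: line 2 mimics the poisoned comment, line 3 is balanced.
sample = "ok\n/*\u202e\u2066src = srcSecure;\u2069\u2066*/\n\u2067abc\u2069\n"
print(unclosed_bidi_lines(sample))
```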