Nix regex match

Another silly question, but I want to match “blahblah {something} blahblah”.
So I’ve tried:

builtins.match "\\{.+\\}" s

I am aware that {} has a special meaning in regular expressions, but I want the literal “{” and “}” characters. This still fails with:

error: invalid regular expression ‘\{.+\}’

(Since \ is an escape character on the forum, I’ve doubled the backslashes in the question. In case it isn’t displayed correctly: that is two backslashes in the code and one in the error message.)

Any hints?


Nix expressions and the match function are a bit peculiar. You need to match the whole string, or you won’t get any results, and escaping can be tricky, so I’d recommend using character lists for special characters.

builtins.match "^.*[{](.+)[}].*$" "blahblah {something} blahblah"
[ "something" ]
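To illustrate the whole-string requirement mentioned above, here is a minimal sketch. The attribute names are made up for illustration; only builtins.match is real.

```nix
let
  s = "blahblah {something} blahblah";
in
{
  # The pattern covers only part of the string, so there is no match: null.
  partial = builtins.match "[{](.+)[}]" s;

  # ".*" on both sides lets the pattern cover the whole string, and the
  # capture group is returned as a list: [ "something" ].
  whole = builtins.match ".*[{](.+)[}].*" s;

  # A match without capture groups yields an empty list, not null.
  noGroups = builtins.match ".*[{].+[}].*" s;
}
```

Evaluating this strictly (for example with nix-instantiate --eval --strict) should show null for partial, the captured text for whole, and an empty list for noGroups.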

Thanks. [{] is a nice way to escape the character, I didn’t even think about it :)
That solved all my issues. Just out of curiosity, why wouldn’t \{ work properly?

I’m really not sure. Nix uses http://www.cplusplus.com/reference/regex/ECMAScript/ and according to its documentation this should be possible, but there must be something weird going on. Maybe you can file a bug for this?

One issue is that \{ will be parsed as { by Nix, as it is not a valid escape sequence in " quotes. This behaviour is confusing coming from other languages (see https://github.com/NixOS/nix/issues/3063) and you will need to double the backslashes in " strings, or use '' strings.

builtins.match "\\{.+\\}" s
builtins.match ''\{.+\}'' s
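The quoting rule can be checked without involving the regex engine at all. A small sketch, with attribute names chosen here purely for illustration:

```nix
{
  # "\{" is not a recognised escape sequence in a " string, so the
  # backslash is dropped and only "{" remains.
  backslashDropped = "\{" == "{";

  # Doubling the backslash in a " string yields the same two characters
  # as a single backslash in a '' string.
  sameString = "\\{" == ''\{'';

  # Backslash plus brace: two characters.
  length = builtins.stringLength "\\{";
}
```

Both comparisons should evaluate to true and the length to 2, which confirms that the two builtins.match calls above hand the same pattern to the regex engine.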

The reason why the regex escaping fails is not clear to me, but it seems to work if we do not escape the closing brace:

builtins.match ''\{.+}'' s

Now an interesting observation is that I can reproduce this only on Linux, not on macOS (Nix 2.3.6 in both cases). If you escape only the first {, everything is fine. I guess that’s because {} is used as a quantifier, and when there is no “{” the “}” is harmless. It doesn’t work the other way round though, possibly because the parser then searches for a real “}” following the escaped one to end the expression and doesn’t find one.

builtins.match "\\{.+\\}" s works for me on macOS 10.15.5. Are you testing on an older version of macOS?

Nix is using C++'s std::regex with mode std::regex::extended. macOS has historically had a buggy implementation of this for std::regex (surprising because the POSIX regcomp() family works just fine). I’m not aware of any bugs that would cause the regex \{.+\} to fail to compile, but it’s certainly plausible that older versions of macOS had a bug here that’s fixed as of macOS 10.15.5.

By “reproduce this” I meant that it doesn’t work (so, to avoid confusion: the regex is fine on macOS but NOT working on Linux). I am also testing on macOS 10.15.5; for Linux I used NixOS 20.03 as well as Debian Stretch with Nix installed.

Ideally we would fix “Regex behavior differs across platforms” (NixOS/nix issue #1537) by switching to pcre or re2.


Ah hah. Looks like both the Linux and the macOS behaviors are actually valid. From the standard for Extended Regular Expressions:

The interpretation of an ordinary character preceded by an unescaped <backslash> ( '\\' ) is undefined, except in the context of a bracket expression

And the ERE definition goes on to define { as special but not }. And upon reflection, there’s no reason to ever need to escape } because interval expressions cannot contain anything except numbers and a comma.

Given this, it seems macOS has made the (sensible IMO) decision to allow escaping of any ordinary character, and Linux has chosen to treat it as an error.
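Since the quoted rule exempts bracket expressions, one way to stay clear of the undefined case is to avoid backslash escapes for braces altogether. The sketch below assumes a hypothetical helper, literalBraces, which is not part of Nix or nixpkgs; only builtins.replaceStrings and builtins.match are real.

```nix
let
  # Hypothetical helper: wrap each literal brace in a bracket expression
  # so that no backslash escape reaches the regex engine.
  literalBraces = builtins.replaceStrings [ "{" "}" ] [ "[{]" "[}]" ];

  # Write the pattern with plain braces and convert it afterwards.
  # The result is ".*[{](.+)[}].*"; the ".*" on both sides is needed
  # because builtins.match has to match the whole string.
  pattern = literalBraces ".*{(.+)}.*";
in
builtins.match pattern "blahblah {something} blahblah"
```

This produces the same bracket-expression pattern as the accepted answer earlier in the thread, so it should evaluate to the same [ "something" ] result. The helper treats every brace as literal, so it is not suitable for patterns that also need interval expressions such as a{2,3}.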

It’s not so much an issue of Linux vs macOS as libstdc++ vs libc++. You can reproduce these problems on Linux by using libcxxStdenv. A couple of days ago I found that nix develop was broken on macOS because libc++ has a stricter (probably more correct) interpretation of some regexes (see the commit “nix develop: Fix bad regex”, NixOS/nix@c3c7aed).