I considered opening an issue, but the issue templates are all geared toward bug reports, and this is more of a design decision.
I think the chrony project makes a very good argument for why systemd-timesyncd should not be used, except where we can rely on a LAN NTP server. The risk of relying on systemd-timesyncd on your average laptop that is set to use pool.ntp.org seems nontrivial to me, given how many protocols rely on good time synchronization for security.
The risk is definitely mitigated by the fact that, by default, the NTP servers are set to what I assume are NixOS Foundation-controlled time servers (and the fact that changing the time server apparently switches to chrony supports that interpretation). Even so, I don’t really see why systemd-timesyncd should be used over something that fully implements an NTP client. The impact on system resources should be trivial for most use cases, so we might as well use something that properly syncs against a pool of servers.
The commit that introduced the systemd-timesyncd default (over ntpd at the time) mostly reasons that it saves some disk space, but I don’t find that very convincing. chrony doesn’t appear to have any dependencies that wouldn’t be on any practical NixOS system anyway, and the binaries themselves are ~500KB; this isn’t much compared to the amount of space NixOS naturally guzzles.
If the point is to reduce the attack surface by removing the NTP server, just removing the chronyd binary from the default output seems pretty trivial (and would shave off ~150KB).
Does anyone know why we default to systemd-timesyncd? Is there a good reason why this is chosen over chrony?
fwiw, this does not appear to be the case: I think the *.nixos.pool.ntp.org addresses are approximately equivalent to just using *.pool.ntp.org directly (see pool.ntp.org: The NTP Pool for vendors)
Hah, I think that means that in the case where there is a potential security risk (using not fully trustworthy servers from pool.ntp.org) we use systemd-timesyncd, which assumes full trust, but the moment you change your servers to a (likely) trusted LAN NTP server we switch to chrony, which can paper over such issues. Seems ass-backwards.
I think it’s pretty safe to say the defaults just weren’t thought through in this much detail?
I also think that SNTP (what systemd-timesyncd implements) is not a good default. It’s fine if you are connecting to a local NTP server and know what you’re doing, but things can go wrong on a desktop or mobile device.
I always wondered how this worked. Apparently it’s just a way to easily block traffic, nothing more. Say that NixOS systems start misbehaving by generating tons of NTP traffic towards servers of the pool, then they can just remove the NixOS DNS records or point them to a black hole to drop it.
Any sources for this? This makes it sound like having a wrong clock enables an attacker to manipulate your communication or gain access to your system. But really, what happens with any widely-used protocol is that a wrong clock makes the handshake fail. So isn’t it an issue of service availability rather than security? With a wrong clock, you would be unable to connect to certain websites, rather than connecting to them “insecurely”.
You’d also be able to do the inverse; connect to sites with expired certificates “securely”.
This means that a leaked key can potentially be used to MITM your connections. HTTPS (alongside most asymmetric key encryption protocols) breaks down in the face of inaccurate clocks precisely because key expiry is one of the mechanisms by which we ensure keys cannot simply be brute forced over long periods of time. It’s not just there to annoy you.
In general, nonces are often time based to ensure they aren’t reused. It’s not always an instant exploit, but it can be for some protocols, and it definitely weakens security.
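To make the expiry point concrete, here is a minimal Python sketch of a certificate validity-window check. `Certificate` and `is_within_validity` are hypothetical names, not a real TLS library API, and real validation also checks signatures, chains and revocation; the point is only that the notBefore/notAfter comparison is made against the *local* clock, so whoever can move that clock decides the outcome.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Certificate:
    # Hypothetical, minimal stand-in for an X.509 certificate's validity window.
    subject: str
    not_before: datetime
    not_after: datetime


def is_within_validity(cert: Certificate, local_now: datetime) -> bool:
    """Return True if the local clock says the certificate is currently valid.

    The check is only as trustworthy as `local_now`.
    """
    return cert.not_before <= local_now <= cert.not_after


true_now = datetime(2025, 6, 1, tzinfo=timezone.utc)
leaked = Certificate(
    subject="example.org",
    not_before=true_now - timedelta(days=400),
    not_after=true_now - timedelta(days=30),  # expired a month ago
)

# Correct clock: the expired certificate is rejected.
print(is_within_validity(leaked, true_now))
# Clock pushed 60 days into the past: the same certificate is accepted.
print(is_within_validity(leaked, true_now - timedelta(days=60)))
```

The first call rejects the certificate; the second accepts it, even though nothing about the certificate changed.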
Sure, abusing this hole is convoluted and theoretical, but we’re talking about 500KB to make protocols behave as designed.
I agree with @TLATER. NTP-based vulnerabilities and attacks are not the most common or sexy/flashy things, but they are fundamentally vulnerabilities, and ones we have the ability to literally shut the book on. I stand in support of the proposal to change the default to Chrony or another sufficient alternative.
I like Chrony, but I had not been aware of ntpd-rs. Given that ntpd-rs seems approximately equivalent to Chrony for general usage as a client of pool.ntp.org (albeit not as a server or a client of a custom pool) and is trusted by LE, I am inclined to support using ntpd-rs. How does its RAM and storage usage compare to Chrony?
On my machine, with 10 NTS servers, it uses 1.59 MB of RAM, so basically nothing. The 3 binaries inside the derivation total 13.7 MB, which is larger than chrony (560 KB), mostly because it ships its own libraries such as rustls. There are other considerations, like ntpd-rs already supporting the draft NTPv5 specification and the experimental NTS pool (not sure if chrony implements the latter).
This makes it sound like having a wrong clock enables an attacker to manipulate your communication or gain access to your system.
There’s not just TLS (and even that has problems if time is incorrect, think of revoked certificates).
Off the top of my head: if you mess with the time of a Kerberos KDC you can keep using expired tickets; you can expire DNSSEC records or prevent a cached DNS record from expiring; you can trick a logging system into rotating the logs and hide your tracks; you can force a Bitcoin node to selectively reject blocks from the network…
I haven’t used either Chrony or ntpd-rs before, so I gave them both a try. This is my assessment of the tools and their respective NixOS modules:
Chrony:
Has an entry on the official NixOS Wiki
Only extraConfig is available (no RFC0042 settings)
Has module options for common config scenarios
Somewhat counterintuitive CLI
Logs RTC errors and permission warnings with the default config (see nixpkgs#445035)
ntpd-rs:
No entry on the official NixOS Wiki
Has RFC0042-style settings
Has otherwise limited module options
CLI commands are limited
Spams journal by default
Jumps time backwards in the default configuration instead of slewing, even for small steps (e.g. one jump was just 17ms)
I don’t know much about the complexities of time synchronization, so I can’t speak to much there.
Between the two, I think I prefer Chrony. Its shortcomings can mostly be addressed through fixes to the NixOS module. But I find myself wishing systemd-timesyncd just handled time synchronization better!
I can’t find the doc page that mentions this point, but the backup software Kopia (and it’s not alone; I definitely don’t want to seem like I’m calling it out specially, it’s just something that I saw recently) relies on a well-synchronised clock for maintaining locks across multiple clients accessing the same backup destination, and also for garbage collection.
A wrong clock could potentially do some damage there.
I probably don’t need to say this, but jumping backwards recklessly is a troublesome default; I hope this won’t become the default config for NixOS.
I’ve seen a lot of software written that doesn’t handle backward jumps in time properly. I’m pretty pedantic about it and my colleagues go along with me (despite an ‘eh, servers usually have good NTP synchronisation, right?’ thought process), but I suspect software that doesn’t care is more prevalent. For big jumps you can’t really avoid it, but doing it for 17ms is overkill. This will needlessly inflict pain on users when it causes subtle bugs.
Of the software that *does* handle this, some of it will assume it’s a rare edge case and have suboptimal performance; e.g. I believe systemd’s journal rotates its logs when the time jumps backwards.
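Handling this properly mostly comes down to measuring durations with a monotonic clock instead of the wall clock. A minimal Python sketch, assuming a platform (such as Linux) where `time.monotonic()` is not stepped by NTP; `Deadline` and `FakeWallClock` are hypothetical names for illustration, with the fake clock standing in for a wall clock that gets stepped backwards by 17 ms.

```python
import time
from typing import Callable


class Deadline:
    """Track a timeout against an injectable clock returning seconds as a float.

    Defaults to time.monotonic, which NTP never steps, so a backwards
    wall-clock jump cannot make the elapsed time negative.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start = clock()
        self._timeout_s = timeout_s

    def elapsed(self) -> float:
        return self._clock() - self._start

    def expired(self) -> bool:
        return self.elapsed() >= self._timeout_s


class FakeWallClock:
    """Hypothetical wall clock that can be stepped, simulating an NTP step."""

    def __init__(self, now: float):
        self.now = now

    def __call__(self) -> float:
        return self.now


wall = FakeWallClock(1_000.000)
naive = Deadline(timeout_s=5.0, clock=wall)  # fragile: tied to the wall clock
wall.now -= 0.017  # a 17 ms backwards step
print(naive.elapsed())  # negative, roughly -0.017

safe = Deadline(timeout_s=5.0)  # robust: monotonic by default
print(safe.elapsed() >= 0.0)
```

With the wall clock, elapsed time goes negative after the step, which is exactly the kind of value that trips up code assuming time only moves forward.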
For the record, I think this is a misrepresentation of how ntpd-rs works. If the time is particularly far from what it should be, it will perform a step, which may jump backwards, but normal behavior is slewing. Both chrony and the traditional ntpd implement such features as well; I suspect @Majiir just happened to trigger it on one but not the other, or something like that.
The exact behavior is also fully configurable if you don’t like the defaults, which yeah - the sample configuration addresses the exact concerns you have as well, @oktola.
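The step-versus-slew behavior described here boils down to a threshold on the measured offset. Below is a minimal Python sketch of that policy; `plan_correction`, the 0.128 s threshold and the 500 ppm slew-rate limit are illustrative assumptions, not ntpd-rs’s or chrony’s actual option names or defaults (chrony, for instance, exposes its own knob for this through the `makestep` directive).

```python
from dataclasses import dataclass


@dataclass
class Correction:
    kind: str          # "step" or "slew"
    offset_s: float    # how far the clock must move (positive = forward)
    duration_s: float  # how long the correction takes (0 for a step)


def plan_correction(offset_s: float, step_threshold_s: float, max_slew_rate: float) -> Correction:
    """Decide between stepping and slewing for a measured clock offset.

    step_threshold_s and max_slew_rate are hypothetical parameters; real
    daemons have their own settings and defaults.
    """
    if abs(offset_s) > step_threshold_s:
        # Large error: jump at once. This is the case that can move the clock backwards.
        return Correction("step", offset_s, 0.0)
    # Small error: run the clock slightly fast or slow until the offset is gone.
    # Time keeps moving forward throughout; it just ticks at (1 +/- rate).
    return Correction("slew", offset_s, abs(offset_s) / max_slew_rate)


# Illustrative values only: 128 ms step threshold, 500 ppm maximum slew rate.
print(plan_correction(-0.017, step_threshold_s=0.128, max_slew_rate=500e-6))
print(plan_correction(-3.0, step_threshold_s=0.128, max_slew_rate=500e-6))
```

With these illustrative numbers, a 17 ms offset is slewed away over about 34 seconds while the clock keeps moving forward; only the 3 s offset is stepped.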
systemd-timesyncd also implements slewing FWIW, but it completely lacks features for allowing steps when the time increment/decrement is too large.
That said, if you’d like to dig into this, I’m sure upstream will have discussed default behaviors for this in the past as well, and have more reasoned arguments than “LOL, screw stupid programmers who rely on clocks monotonically incrementing”.
This is not meant to weigh in on a choice between the two; as much as I like rewriting everything in Rust, I don’t have enough experience with either to really make an informed decision. I just think the current analysis is a bit too surface-level.