Yeah, this is pretty much what one would hope for (and, for clarity, I have nothing against ntpd-rs! I am probably on the RIIR side of the fence in most cases anyway), but if things are as you say, one wonders how a 17ms jump occurred on Majiir’s test setup, which was said to be running the default configuration.
My only concern here is really as a user; I don’t mind which solution is used, as long as jumps are reasonably rare and reserved for situations a little more extreme than a 17ms desync. This thread just caught my eye in an e-mail summary (because I have half a reputation for advocating care around dodgy clocks in the software we write).
ntpd-rs logged several steps (all backwards) within several minutes after startup, with nothing else happening on the system. 17ms was the shortest jump, and the others were in the 30-50ms range.
If I’m interpreting the ntpd-rs manual correctly, the default step threshold is 10ms. I’m no expert, but that seems extremely sensitive. My impression was that steps are for situations like booting up with the wrong day or decade in your RTC.
In any case, on the same hardware, Chrony logged this at boot:
chronyd[861]: System's initial offset : 0.057841 seconds fast of true (slew)
so Chrony slews where ntpd-rs performs multiple steps.
Are you sure that’s it stepping, rather than just how it logs slewing? It’s not like these services take control of the kernel and force it to interpret the hardware clock progression differently.
Fair enough; in either case, these are default configuration values that could easily be changed if the service were adopted as the NixOS default. I don’t think it’s reasonable to consider this a dealbreaker.
Asking upstream about these default settings also still seems reasonable; it’s clear that slewing is implemented, and there is likely a reason behind these defaults.
Yes, of course. But I think when we’re talking about choosing a new default, we should choose something that itself has good defaults, and something that is likely to be stable and unobtrusive. Right now, as far as the upstream software is concerned, Chrony looks a lot better. The issues with Chrony are in the NixOS module, which is under our control to fix.
Maybe ntpd-rs is brilliant. But if we have to ask upstream questions about their defaults, it’s better for NixOS to make it an option for motivated users than the default for everyone.
They kinda do though. “Step” is a call to clock_settime, “slew” is a call to adjtimex, and there’s no wiggle room for the kernel in how it carries out those system calls.
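To make the distinction concrete, here is a toy Python model (not either daemon's code) of a clock that starts 50 ms fast. Stepping removes the offset in a single write, roughly what a clock_settime call does, so the reported time jumps backwards, like the backwards steps reported earlier in the thread. Slewing drains the offset at a capped rate, roughly what a frequency adjustment via adjtimex does, so time keeps moving forwards. The offset, rate, and sampling interval are made-up illustrative values, and all times are integer microseconds to avoid float noise.

```python
# Toy model only: integer microseconds, clock assumed to start fast
# (positive offset), hypothetical numbers in the usage below.

def stepped_readings(offset_us, step_at_us, duration_us, dt_us):
    """Clock readings when the whole offset is removed by one step at step_at_us."""
    readings = []
    for t in range(0, duration_us + 1, dt_us):
        # Before the step the clock is ahead by offset_us; afterwards it is exact.
        readings.append(t + (offset_us if t < step_at_us else 0))
    return readings


def slewed_readings(offset_us, rate_ppm, duration_us, dt_us):
    """Clock readings when the offset is drained at rate_ppm (microseconds per second)."""
    readings = []
    for t in range(0, duration_us + 1, dt_us):
        # After t microseconds, t * rate_ppm / 1e6 microseconds have been corrected.
        corrected = t * rate_ppm // 1_000_000
        readings.append(t + max(offset_us - corrected, 0))
    return readings


def is_monotonic(xs):
    """True if no reading is earlier than the one before it."""
    return all(b >= a for a, b in zip(xs, xs[1:]))


# 50 ms fast, sampled every 10 ms for 20 s; the step happens at t = 10 s.
step = stepped_readings(50_000, 10_000_000, 20_000_000, 10_000)
slew = slewed_readings(50_000, 200, 20_000_000, 10_000)
print(is_monotonic(step), is_monotonic(slew))  # False True
```

At 200 ppm the 50 ms offset in this model would need 250 s to drain fully, so a slew trades a long convergence time for never moving time backwards.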
FYI: if you care about NTP servers with NTS support, there is this (up-to-date) GitHub repo, jauderho/nts-servers, which also builds a chrony.conf (maybe that could be incorporated?).
Seems like a great place to start for someone motivated to make an improvement! I think chrony has some good potential for a default replacement, but a better module would help in any case.
It seems like chrony operates on directives, and I couldn’t locate any defaults. So we would need to invoke makestep explicitly, which the module currently never does. Instead, we use the deprecated initstepslew. This means that, as configured by the module, we only ever make a step at startup. Even with makestep, we only step within the first limit clock updates (limit being makestep’s second argument, after the threshold). ntpd-rs seems to just step whenever the deviation is above the step threshold.
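A rough Python sketch of the two stepping rules as described in this post. It models only the "step or slew" decision for each clock update, not either daemon's actual code. The ntpd-rs-like rule steps whenever the offset exceeds the threshold (10 ms, per the manual reading earlier in the thread); the makestep-like rule additionally requires the update to be among the first limit updates. The makestep values 1.0 and 3 and the offset sequence are illustrative assumptions, with the offsets loosely shaped after the 17 to 50 ms steps reported above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepPolicy:
    """Hypothetical model of a step-or-slew decision (not real daemon code)."""
    threshold_s: float    # step only if |offset| exceeds this
    limit: Optional[int]  # step only in the first `limit` updates; None = no limit

    def should_step(self, offset_s: float, update_index: int) -> bool:
        if abs(offset_s) <= self.threshold_s:
            return False  # small offsets are always slewed
        return self.limit is None or update_index < self.limit


def count_steps(policy: StepPolicy, offsets_s: list) -> int:
    """Number of updates (0-based index) at which the policy would step."""
    return sum(policy.should_step(o, i) for i, o in enumerate(offsets_s))


# As described for ntpd-rs: step whenever the offset is above 10 ms.
NTPD_RS_LIKE = StepPolicy(threshold_s=0.010, limit=None)
# Like chrony's "makestep 1.0 3": step above 1 s, but only in the first 3 updates.
CHRONY_MAKESTEP_LIKE = StepPolicy(threshold_s=1.0, limit=3)

offsets = [0.050, 0.040, 0.030, 0.017, 0.005]  # hypothetical startup offsets
print(count_steps(NTPD_RS_LIKE, offsets))          # 4
print(count_steps(CHRONY_MAKESTEP_LIKE, offsets))  # 0
```

With these numbers the threshold, not the limit, makes the difference; the limit only matters for a large offset that appears after startup, e.g. CHRONY_MAKESTEP_LIKE.should_step(5.0, 10) is False.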
Hi, one of the ntpd-rs maintainers here. Just to clarify a few things around our jump behavior, the default behavior we have was chosen to keep the time to convergence when starting time synchronization reasonably short.
We currently do our slews under our own control, directly manipulating the clock frequency. Because of that, and to keep margin for very poor oscillators in the default configuration, we limit our steering to 200ppm. With such a rate, slews on the order of 100ms already take a very significant amount of time (about 15ish minutes), which has consequences for how quickly we can adapt when starting up.
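The arithmetic behind this can be sketched in Python: at a constant slew rate r in ppm (microseconds of correction per second), an offset of d seconds takes at least d × 10⁶ / r seconds to remove. This is only a lower bound, assuming the rate stays pinned at its cap the whole time; a real control loop generally will not do that, which would be consistent with the longer ~15 minutes quoted here. For comparison it also uses chrony's documented default maxslewrate of 83333.333 ppm and the 0.057841 s offset from the chrony log earlier in the thread; the log itself does not show that chrony actually slewed at that cap.

```python
def min_slew_seconds(offset_s: float, max_rate_ppm: float) -> float:
    """Lower bound on the time needed to slew away offset_s seconds when the
    correction rate is capped at max_rate_ppm (microseconds per second)."""
    return abs(offset_s) * 1e6 / max_rate_ppm


print(min_slew_seconds(0.100, 200))           # 500.0 s, about 8.3 minutes
print(min_slew_seconds(0.057841, 200))        # about 289 s at a 200 ppm cap
print(min_slew_seconds(0.057841, 83333.333))  # under 1 s at chrony's default cap
```

With a 200 ppm cap, even tens of milliseconds take minutes to slew, which is the convergence-time argument for a low step threshold.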
As for limiting jumping to the first few updates, that hasn’t come up yet as a feature request with us. If that is desired, feel free to open an issue on our repository. We are planning to do another revision of the way we steer clocks to better support things like hardware timestamping anyway, so requests made now are likely a bit easier to include in that process.
Finally, with regard to being somewhat spammy in the logs, this is on our radar and we are looking at improving the situation, hopefully later this year. For now, for an operating system default, setting the log level to warn is probably a good way to reduce it somewhat, though in cases of network unavailability there may still be fairly frequent repeated log messages.