Yes, even “silly” things like using both at the same time
nixos-rebuild-ng repl
And of course things that already worked before, like nixos-rebuild-ng switch/boot/test/build/dry-build/dry-activate/list-generations.
--target-host and --build-host are implemented slightly differently from nixos-rebuild, since they don’t allocate a pseudo-TTY (in my tests that caused more issues and headaches than it is worth). Instead, to ask for the sudo password there is a --ask-sudo-password flag that asks for the remote sudo password at the start of the program and injects the password into any remote sudo command via stdin.
This means that there is no more double sudo password prompt like in nixos-rebuild, and the whole experience is more streamlined (e.g.: you don’t need to wait until the build finishes to type your password), but there is also no error checking (we don’t validate the password, so if you type it wrong it will fail later).
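To make the idea concrete, here is a minimal sketch of asking for the password once and feeding it to a remote sudo over stdin, without a pseudo-TTY. The function names are made up for illustration and this is not the actual nixos-rebuild-ng code; it only assumes the standard sudo options --stdin and --prompt.

```python
import subprocess


def sudo_argv(cmd: list[str]) -> list[str]:
    """Prefix cmd so that sudo reads the password from stdin (hypothetical helper)."""
    # --stdin: read the password from standard input instead of the terminal.
    # --prompt=: use an empty prompt so nothing extra is printed to stderr.
    return ["sudo", "--prompt=", "--stdin", *cmd]


def ssh_sudo_argv(host: str, cmd: list[str]) -> list[str]:
    """Build the local ssh command that runs cmd under sudo on host."""
    # No -t flag here: no pseudo-TTY is allocated on the remote side.
    return ["ssh", host, "--", *sudo_argv(cmd)]


def run_remote_sudo(
    host: str, cmd: list[str], password: str
) -> subprocess.CompletedProcess:
    """Run cmd on host via sudo, injecting the password through stdin."""
    # The password is not validated beforehand, so a typo only shows up
    # when this command fails.
    return subprocess.run(
        ssh_sudo_argv(host, cmd),
        input=password + "\n",
        text=True,
        check=True,
    )
```

A caller would read the password once at startup (for example with getpass.getpass()) and pass the same string to every run_remote_sudo() call. A real implementation would also need to quote the remote arguments properly, which this sketch skips.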
Another big change in this release is the implementation of proper logging. We now have logging.debug() information that should help debug issues. It is disabled by default but can be enabled with the --verbose flag (which also enables verbose output in the nix commands).
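As a rough illustration of how one flag can drive both the Python logging level and the flags passed to Nix, here is a small sketch. The logger name, the program name and the helper functions are assumptions for the example, not the real code.

```python
import argparse
import logging

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("rebuild_sketch")


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the single flag this sketch cares about."""
    parser = argparse.ArgumentParser(prog="rebuild-sketch")
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(argv)


def configure_logging(verbose: bool) -> None:
    """Debug messages are emitted only when --verbose was given."""
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)


def nix_flags(verbose: bool) -> list[str]:
    """Extra flags forwarded to the nix commands that get spawned."""
    return ["--verbose"] if verbose else []
```

With this shape, logger.debug("...") calls can stay in the code permanently and cost nothing visible unless the user asks for them.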
The only feature missing is _NIXOS_REBUILD_REEXEC, which “updates” nixos-rebuild-ng in place in case there is a newer one, so we can have its bug fixes before switching to a new configuration. The feature is implemented but disabled in this release because we first need to implement the module changes so that config.system.build.nixos-rebuild can point to nixos-rebuild-ng; that will probably be my next focus.
@k0kada Cool! Have been using it, and I feel that when it’s ready, maybe the nix wrapper library can be extracted for other uses.
One quick question: is there a way to build on-site remotely? If not I’d like to give a proposal: skip nix-copy-closure when build_host == target_host. Useful for rebuilding a beefy remote machine.
If I understood your question, this should already work, except for the fact that yes it will call nix-copy-closure to copy between build_host and target_host (that will be the same in this case). It shouldn’t be a huge issue though since the call should finish really fast (since it will conclude that no copy should be made).
Now for implementing something new: I want to avoid any feature requests for now. The reason is that otherwise this will forever be chasing new use cases instead of focusing on replacing the current code. The most important thing right now is to get testers to exercise the different use cases of the original and to fix the bugs that we find. Once the code is stable we can go back to focusing on implementing new features (that is also the objective; one of the reasons for the rewrite is to make it easier to implement new features).
I am not sure if I want to try to make this a library. Most of the logic is pretty heavily inspired by the use cases of nixos-rebuild, and I feel that if you want to do something else with it, it is better to wrap nix yourself. But I am not sure, maybe I will change my mind in the future.
Understandably! What about features that might be added to the existing implementation soon? Would it be a good time to start a PR on build-image for the python implementation or shall we wait until it stabilizes?
It would definitely help me focus on the more structural PRs that need to be done (like the changes to the module system), and if you can also give me feedback on what can be improved (e.g.: is updating the code/documentation easy?) I would be grateful.
Right now, while using --build-host alice --target-host alice works, the implementation is definitely wonky: once the build finishes, we call ssh alice -- nix-copy-closure --to alice, which means that alice needs to be able to SSH into itself.
Even if you’re using different hosts, let’s say, --build-host alice --target-host bob, this is still wonky because it means that alice needs to somehow be able to talk with bob, e.g.: if alice and bob are aliases in the current ~/.ssh/config, it means that alice also needs to have bob configured in the SSH config and credentials, contrary to what the user probably expects (i.e.: only having the config in their eval machine).
(Keep in mind that this is also the case of nixos-rebuild, since we use basically the same strategy).
Sadly, this is difficult to fix with nix-copy-closure, since it supports either --to or --from, not both. So right now I am thinking of two possible solutions:
Call nix-copy-closure twice, once as nix-copy-closure --from alice and again as nix-copy-closure --to bob. This would mean that the copy will probably take twice as long and we will need extra space on the eval machine (enough to keep the whole configuration).
Replace nix-copy-closure with nix copy, which supports using --to and --from together. It is still kind of sub-optimal because instead of having alice talk directly to bob, the connection between the two now seems to go through the eval machine (I tested this with a Chromebook and it was really slow because the Wi-Fi connection on my Chromebook is poor), but I am not sure there is much we can do (and --use-substitutes still exists for those cases). This would also mean that any usage of --to and --from together would need a newer version of Nix, and we try to be compatible with Nix 2.3 since some folks still depend on it.
Right now I am thinking of maybe checking which Nix version the person is using and using the first strategy if the user has an older version of Nix, or the second one if the user has a newer version. But if anyone can think of a better solution, I am accepting suggestions.
Note that I said adventurous users: I still recommend adding it via environment.systemPackages = [ pkgs.nixos-rebuild-ng ]; instead of system.rebuild.enableNg = true; for now, since this way you can use nixos-rebuild-ng side-by-side with nixos-rebuild. I have been using system.rebuild.enableNg = true; for a few days with success, but your mileage may vary (especially if you’re using classic Nix instead of Flakes, since I don’t have non-Flakes systems to test).
This is also the end of the big PRs in nixos-rebuild-ng. I expect that from now on most PRs will be bug fixes. There are also a few improvements that I am planning (like the usage of nix copy in place of nix-copy-closure), but I don’t expect big changes.
Another thing missing is testing how nixos-rebuild-ng behaves during installation. We have some installer tests that I haven’t ported to nixos-rebuild-ng yet, since I had some issues that I didn’t bother debugging too much (the installer tests are painful since they’re really slow to run). But this is on my to-do list.
It is also a good time for people to take a look and suggest improvements. @phaer if you want to port build-image this is definitely a good time to do now.
For users on Nix <2.18 (in the current nixpkgs this effectively only covers users of Nix 2.3), the code will fall back to using nix-copy-closure twice, once with --from build_host and again with --to target_host, meaning that the copy will be slower and the evaluation host will also need free space to store a copy of the build result. But I don’t think this should impact many users.
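The version-based fallback could look roughly like the sketch below. The function names and the ssh:// store URLs are my assumptions for illustration; only the 2.18 threshold and the two strategies come from the posts above.

```python
import re


def parse_nix_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'nix (Nix) 2.18.1'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"cannot parse Nix version from {output!r}")
    return int(match.group(1)), int(match.group(2))


def copy_commands(
    version: tuple[int, int],
    build_host: str,
    target_host: str,
    path: str,
) -> list[list[str]]:
    """Return the commands to run on the eval machine, in order."""
    if version >= (2, 18):
        # Newer Nix: a single nix copy with --from and --to together.
        return [
            [
                "nix", "copy",
                "--from", f"ssh://{build_host}",
                "--to", f"ssh://{target_host}",
                path,
            ]
        ]
    # Older Nix: two hops through the eval machine, which therefore
    # needs enough free space to hold the closure.
    return [
        ["nix-copy-closure", "--from", build_host, path],
        ["nix-copy-closure", "--to", target_host, path],
    ]
```

The version string would come from running nix --version once at startup. Either way the eval machine sits in the middle, so only it needs SSH configuration for both hosts.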
Some other small improvements:
Parallelization of nixos-rebuild-ng list-generations: this command can get surprisingly slow, especially with lots of generations. With ~10 generations, this change reduced the run time from ~170ms to ~100ms. For comparison, nixos-rebuild list-generations will take up to ~1 second(!) in the same situation, so not bad in general
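For readers curious what parallelizing this kind of per-generation work can look like, here is a generic sketch using a thread pool. I am not claiming this is how nixos-rebuild-ng does it; describe_all and the describe callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def describe_all(items: Iterable[T], describe: Callable[[T], R]) -> list[R]:
    """Run describe() on every item concurrently, keeping the input order."""
    # Gathering information about a generation is mostly waiting on file
    # reads and subprocesses, so threads can overlap that waiting.
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(describe, items))
```

Because the results keep their input order, the output table stays sorted by generation without an extra sorting step.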
Removal of tabulate as a dependency. It was bothering me that we depended on a library that is almost as complex as this whole program. With the help of ChatGPT, we now have 0 runtime dependencies again
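To give an idea of how little code a plain aligned table needs, here is a minimal stand-in for that kind of output. It is a sketch with hypothetical column names, not the replacement that was actually merged.

```python
def format_table(headers: list[str], rows: list[list[object]]) -> str:
    """Render rows as left-aligned columns separated by two spaces."""
    cells = [headers] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]
    return "\n".join(lines)
```

For example, format_table(["Generation", "Current"], [[1, "False"], [12, "True"]]) gives a three-line table with both columns aligned under their headers.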
Add support to --build-host for build-vm and build-vm-with-bootloader, something that I completely missed by mistake
For another announcement that I forgot to make, nixos-rebuild-ng now also has a proper manual (i.e.: not just a copy of the old manual from nixos-rebuild) and can automatically generate Bash/ZSH completion from the command line arguments.
The manual is generated from scdoc, kind of like Markdown but for man pages. I hope this drives up collaboration, since I find troff difficult to edit/understand, while the scdoc format is so small that I put an explanation of its syntax in a few lines of comment in the man page document itself.
Perhaps it would be good to have an option to enable verbose logging for nixos-rebuild-ng itself, but not the forked Nix commands? nix-build -v can be quite noisy: evaluating a simple NixOS system using the verbose mode prints out 2746 lines of logs, only a few of which come from nixos-rebuild-ng.
Out of curiosity - since the entire python ecosystem breaks on staging merges, is there / will there be any failsafe in place to ensure that the rebuild command doesn’t break every 2 weeks?
What breaks in the Python ecosystem? nixos-rebuild-ng has no runtime dependencies, so if the issue is Python packages breaking, this is a non-issue.
Edit: there is one build dependency on pytest, but if this becomes an issue I have a plan to convert all tests to unittest.
I just saw the python3Packages.shtab dependency in nativeBuildInputs, I’m not familiar with that package to know if it’s well-tested post staging merges or what its transitive dependencies might be.
EDIT: Now that I think about it, if nixos-rebuild-ng didn’t build due to some dependency issue, then I wouldn’t be able to deploy a broken version of the rebuilder. So I guess either way it’s not as much of an issue as I originally thought.
Forgot about shtab, but it is an optional build dependency for the shell completion. It can be disabled if you set withShellFiles = false in the package input.
Yes, I know how overrides work; I was simply asking about failsafes within the python ecosystem. I’ll also note the default is on, so that package should be kept in good shape for the average user, especially if nixos-rebuild-ng becomes the default.
My main point is that if this becomes a problem, there are solutions, because the core of nixos-rebuild-ng is designed to be dependency free. For example, if shtab becomes an issue, we can just go back to manually writing shell completions. If pytest becomes an issue, we can convert the tests to unittest, which is included in Python’s standard library.
Also, those things are improvements compared to the original implementation, not requirements. For example, the original implementation generates shell completion manually and it is missing a few recent commands. The original implementation also has no unit tests.