You may find Liminix interesting.
The memory overhead in your scenario (on-target eval) comes down to using nix to evaluate nixpkgs. nixpkgs evaluation could maybe be made more efficient, as could nix, but this is hard, and I expect the memory overhead of evaluating nixpkgs to keep growing (that’s certainly been the general trend for the entire lifetime of the project). Perhaps this could be solved if the big split ever happens, but I don’t really see it within the next 5 years (and I won’t try to predict beyond that). The current governance just doesn’t lend itself to the big, committed decisions we’d need for that to happen (or be decided against for good). For the time being, NixOS’ minimum memory target for your use case is probably about 700MB, but that’s just taking alpine’s number and adding the usual nix eval overhead. I don’t think NixOS is hugely more memory hungry than alpine, but systemd & co probably add some, so a more realistic target is probably ~1GB.
As for disk space, nixpkgs isn’t based on musl, and glibc alone is a big deal, let alone things like systemd. Alpine is designed around being small; it’s difficult to get NixOS to that size without building a custom nix-based distro. I can see a world in which things get better (currently my smallest instance is ~6GB), but I don’t think it’ll quite get to alpine sizing, especially since nixpkgs does not focus on this use case - in fact it is probably antithetical to most users’ use cases.
It’s definitely possible - even today - to build an alpine-sized Linux deployment with nix, but nobody has turned that into a popular distro yet. You can indeed probably get quite a bit smaller than that, but that’s a highly specific use case and you’ll likely have to forgo using distros altogether for that goal.
This isn’t impossible, and I’d argue not even very hard; as @aanderse says, it’s mostly time-consuming. Even if you want to go fully custom, you can still reuse some nixpkgs packages (notably the kernel build infra), and the musl-based stdenv means that a lot of work is done for you, but you’ll have to know what you’re doing with nix (you seem to) and nixpkgs (read the repo), and be able to build a Linux system from scratch. Given that you’re saying you don’t have the experience to just sit down and do that today, but that your understanding of nix seems solid enough, I’d say maybe half a year of learning and toying with this, if your spare time looks anything like mine and you invest all of it?
The difficulty after that is to create a NixOS-style deployment system on top of that baseline yourself, since NixOS’ activation script and modules rely very heavily on systemd and a bunch of other stuff which you’d probably eject from a super-minimal Linux. But it can be done, albeit at the cost of some robustness. The activation script is ultimately not that complex, and maybe it can even be reused to some extent. You’d probably also only need a much smaller part of it for your use cases (at some point it becomes more reasonable just to go with an A/B update scheme instead of having granular generations, but you would still want a module system).
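To make the A/B idea a bit more concrete, here is a rough Python sketch of the state machine I have in mind. Everything in it is made up for illustration (`ABState`, `install`, `confirm_or_rollback`, the slot names `"a"`/`"b"`); a real implementation would live in your bootloader config and update tooling, not in Python.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ABState:
    """Hypothetical bookkeeping for a two-slot (A/B) update scheme."""
    active: str = "a"      # slot the bootloader is told to boot
    pending: bool = False  # True until the new slot has booted healthily once


def other(slot: str) -> str:
    """Return the slot that is not `slot`."""
    return "b" if slot == "a" else "a"


def install(state: ABState, write_image: Callable[[str], None]) -> ABState:
    """Write the new system to the inactive slot, then point the boot at it.

    If write_image raises, no new state is returned, so the caller keeps
    booting the old, untouched slot.
    """
    target = other(state.active)
    write_image(target)
    return ABState(active=target, pending=True)


def confirm_or_rollback(state: ABState, healthy: bool) -> ABState:
    """After the first boot of a new slot: keep it if healthy, else go back."""
    if not state.pending:
        return state
    if healthy:
        return ABState(active=state.active, pending=False)
    return ABState(active=other(state.active), pending=False)
```

The point is mostly that there are only ever two system images and one bit of "did the new one work" state, as opposed to an open-ended list of generations.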
If the Liminix project is anything to go by, a full project, all polished and nice, would probably take a competent developer years. But something to toy around with is probably doable in normal hobby time scales.
As an aside, your goal of an “eventually consistent” deployment strategy comes down to writing a very clever activation script (or maybe an activation daemon since it would need to be stateful?), so if you really want to chase that, learning how the activation script works in detail is where you’d start. It’s very counter to how NixOS is supposed to work, though, so you’ll be facing some technical adversity, and keeping up with changes to NixOS’ activation script will be a continuous maintenance effort.
With my attempt at giving you the toys you’re asking for out of the way, the pragmatic answer is to make your updates more granular (a time-based heuristic just doesn’t map well to the backing git model; you should be limiting the number of commits, not the time period), and to use some of the optimizations @bme is suggesting.
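As a sketch of what I mean by limiting commits rather than time, assuming your deployments track a linear branch: each machine only ever steps a bounded number of commits past whatever it currently runs. `next_target` and `max_step` are names I made up here, and `history` is assumed to be ordered oldest to newest (e.g. what `git rev-list --reverse <branch>` would give you).

```python
from typing import Sequence


def next_target(history: Sequence[str], deployed: str, max_step: int) -> str:
    """Pick the next revision to deploy, at most `max_step` commits ahead.

    `history` is the branch history ordered oldest to newest, and `deployed`
    is the revision the machine is currently running.
    """
    if max_step < 1:
        raise ValueError("max_step must be at least 1")
    current = history.index(deployed)  # raises ValueError for unknown revisions
    target = min(current + max_step, len(history) - 1)
    return history[target]
```

A machine that has been offline for a while then catches up by calling this repeatedly, so every individual update stays small no matter how much time has passed.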