Hi Folks! Now that 23.11 went out successfully and there is space for new ideas, let’s talk about state management in NixOS.
There are several observable problems with it right now:
- No hook point for extending NixOS state transitions, in order to provide features such as
  - an extended `boot` action besides updating the system symlink and creating the bootloader entry, e.g. for writing firmware
  - an extended `switch` action, e.g. for applying a database migration alongside a version update
  - extended …
- No common language for error checking and recovery
- No common language for state transition previews
- No common language for optional or recommended transitions
- Each state-keeping module has its custom initialization/transition/migration logic
- Migration logic is often shoehorned into systemd pre-start scripts. This is convenient, but
  - it can only deal with idempotent state transitions, and
  - error handling means reading error logs and debugging units
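
To make the pre-start pattern concrete, here is a minimal sketch of what such a module fragment tends to look like. The service `myapp`, the package `pkgs.myapp`, the `myapp-migrate` command with its `--to` flag, and the marker file are all hypothetical placeholders; only the `systemd.services.<name>` options themselves are existing NixOS options.

```nix
{ pkgs, ... }:
{
  # Hypothetical service: `myapp`, `pkgs.myapp` and `myapp-migrate` are
  # placeholders for illustration, not real nixpkgs names.
  systemd.services.myapp = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      StateDirectory = "myapp"; # provides /var/lib/myapp
      ExecStart = "${pkgs.myapp}/bin/myapp";
    };
    # Runs on every start of the unit, so it has to be idempotent:
    # the marker file turns it into a no-op once the migration is applied.
    preStart = ''
      marker=/var/lib/myapp/.migrated-v2
      if [ ! -e "$marker" ]; then
        ${pkgs.myapp}/bin/myapp-migrate --to 2
        touch "$marker"
      fi
    '';
  };
}
```

If the hypothetical `myapp-migrate` step fails here, the unit simply fails to start. There is no preview of what would have happened, no structured error, and no recovery path other than reading the journal.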
I’ve written out an approach that would recognize and formalize the existing top-level boot/switch point.
Check it out! https://github.com/bendlas/nixpkgs/blob/de724683ff087f130a035d8786fdd5c727c43bc5/nixos/state/requirements.md
Let me know what you think, and if you know of other efforts that overlap with or complement this.
If you like this direction, I’d like some help with refining the requirements document, getting this into an actual RFC, defining some core language, and showing its use with examples. So basically everything.
Examples would generally be defined as NixOS VM tests:
- Define tests for a module’s state migration scenario
  - Scenario: Offer a resolution for a PostgreSQL version downgrade by `pg_dump`/`pg_restore`
  - Scenario: Recover from a failed update by restoring the last backup
  - Scenario: A migration has multiple steps with a reboot in the middle.
  - …
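
As a rough idea of the shape such a test could take, here is a sketch of a NixOS VM test that only exercises the `pg_dump`/`pg_restore` round trip. It does not perform an actual version downgrade and does not use any of the proposed state-transition language, which doesn't exist yet. The test name, the `demo` database and the table are made up for illustration, and it assumes a nixpkgs from around 23.11 that provides `pkgs.testers.runNixOSTest` and `pkgs.postgresql_15`.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.testers.runNixOSTest {
  # Illustrative name, not an existing nixpkgs test.
  name = "postgresql-dump-restore-sketch";

  nodes.machine = { pkgs, ... }: {
    services.postgresql = {
      enable = true;
      package = pkgs.postgresql_15;
    };
  };

  testScript = ''
    machine.wait_for_unit("postgresql.service")

    # Create some state worth preserving.
    machine.succeed(
        "sudo -u postgres createdb demo",
        "sudo -u postgres psql -d demo -c 'CREATE TABLE t (x int)'",
        "sudo -u postgres psql -d demo -c 'INSERT INTO t VALUES (42)'",
    )

    # Stand-in for a resolution step: dump, drop, recreate, restore.
    machine.succeed(
        "sudo -u postgres pg_dump -Fc -f /tmp/demo.dump demo",
        "sudo -u postgres dropdb demo",
        "sudo -u postgres createdb demo",
        "sudo -u postgres pg_restore -d demo /tmp/demo.dump",
    )

    # The data has to survive the round trip.
    out = machine.succeed("sudo -u postgres psql -d demo -At -c 'SELECT x FROM t'")
    assert out.strip() == "42", out
  '';
}
```

A real scenario test would presumably switch between two system configurations with different `services.postgresql.package` values and check that the offered resolution is what carries the data across.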
Some references:
- RFC: ensure-style options in NixOS modules: a current attempt at solving this globally
- postgresql_15 requires granting permissions on schema public, ensureUsers insufficient: there was a minor panic during the 23.11 release, with the new PostgreSQL version having a breaking change for NixOS-style permission grants. This is my motivating use case for writing all of this.
- Enrich PostgreSQL ensure users and databases support: an abandoned attempt at solving this locally for pg
- nixos/postgresql: enrich ensure users and databases support: an abandoned attempt at solving this locally for pg