What about state management?

Hi Folks! Now that 23.11 went out successfully and there is space for new ideas, let’s talk about state management in NixOS.

There are several observable problems with it right now:

  • No hook point for extending NixOS state transitions in order to provide features such as
    • an extended boot action beyond updating the system symlink and creating a bootloader entry, e.g. writing firmware
    • an extended switch action, e.g. applying a database migration alongside a version update
  • No common language for error checking and recovery
  • No common language for state transition previews
  • No common language for optional or recommended transitions
  • Each state-keeping module has its own custom initialization/transition/migration logic
  • Migration logic is often shoehorned into systemd pre-start scripts, but while convenient,
    • this can only deal with idempotent state transitions and
    • error handling means reading error logs and debugging units
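To make "common language" a bit more concrete, here is a minimal Python sketch. It is purely hypothetical: `Transition`, `run_transitions` and all field names are made up for illustration and do not exist in NixOS. Each state-keeping module would declare a transition with a preview, a precondition check, an apply step and a rollback, and could mark itself as optional. A driver then previews everything, checks all preconditions before touching any state, applies in order, and rolls back already-applied transitions if one fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    """Hypothetical description of one module's state transition."""
    name: str
    preview: Callable[[], str]     # human-readable description of the change
    check: Callable[[], bool]      # precondition; False means "not safe to apply"
    apply: Callable[[], None]      # performs the transition; may raise
    rollback: Callable[[], None]   # undoes a successful apply
    optional: bool = False         # recommended but not required


def run_transitions(transitions: List[Transition],
                    include_optional: bool = False) -> List[str]:
    """Preview, check, then apply transitions in order; roll back on failure."""
    selected = [t for t in transitions if include_optional or not t.optional]
    log = [f"preview {t.name}: {t.preview()}" for t in selected]
    # Check every precondition before touching any state.
    failed = [t.name for t in selected if not t.check()]
    if failed:
        log.append(f"aborted: precondition failed for {', '.join(failed)}")
        return log
    applied: List[Transition] = []
    for t in selected:
        try:
            t.apply()
            applied.append(t)
            log.append(f"applied {t.name}")
        except Exception as exc:
            log.append(f"failed {t.name}: {exc}")
            # Undo the transitions that already succeeded, newest first.
            for done in reversed(applied):
                done.rollback()
                log.append(f"rolled back {done.name}")
            break
    return log


# Usage: a schema migration succeeds, then a (simulated) firmware write fails,
# so the schema migration is rolled back.
state = {"db_schema": 1}

def migrate() -> None:
    state["db_schema"] = 2

def unmigrate() -> None:
    state["db_schema"] = 1

def flash_firmware() -> None:
    raise RuntimeError("firmware write failed")

transitions = [
    Transition("postgresql-schema", lambda: "schema 1 -> 2",
               lambda: True, migrate, unmigrate),
    Transition("firmware", lambda: "flash new firmware",
               lambda: True, flash_firmware, lambda: None),
]
log = run_transitions(transitions)
for line in log:
    print(line)
```

The point of the split into preview/check/apply/rollback is that previews and error handling become uniform across modules instead of living in per-unit pre-start scripts and journal logs.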

I’ve written up an approach that would recognize and formalize the existing top-level boot/switch point.

Check it out! https://github.com/bendlas/nixpkgs/blob/de724683ff087f130a035d8786fdd5c727c43bc5/nixos/state/requirements.md

Let me know what you think, and whether you know of other efforts that overlap with or complement this.

If you like this direction, I’d like some help with refining the requirements document, turning it into an actual RFC, defining a core language, and showing its use with examples. So basically everything :slight_smile:

Examples would generally be defined as NixOS VM tests:

  • Define tests for a module’s state migration scenario
  • Scenario: Offer a resolution for PostgreSQL version downgrade by pg_dump/pg_restore
  • Scenario: Recover from a failed update by restoring the last backup
  • Scenario: A migration has multiple steps with a reboot in the middle.
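For the last scenario, one possible shape is a migration runner that persists its progress, so that after a reboot it resumes at the next step instead of starting over. This is a hypothetical Python sketch: `run_migration`, the step names and the JSON progress file are assumptions for illustration, not anything NixOS currently provides. In a real VM test the two calls would be separated by an actual reboot of the test machine.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

# Hypothetical step format: (name, action, needs_reboot_after).
Step = Tuple[str, Callable[[], None], bool]


def run_migration(steps: List[Step], progress_file: str) -> str:
    """Run steps starting after the last recorded one.

    Returns "reboot" when a step requests a reboot before continuing,
    or "done" once all steps have completed.
    """
    completed = 0
    if os.path.exists(progress_file):
        with open(progress_file) as f:
            completed = json.load(f)["completed"]
    for index in range(completed, len(steps)):
        name, action, needs_reboot = steps[index]
        action()
        # Record progress only after the step succeeded, so an interrupted
        # step is re-run on the next boot.
        with open(progress_file, "w") as f:
            json.dump({"completed": index + 1}, f)
        if needs_reboot:
            return "reboot"
    return "done"


# Simulate two boots sharing the same progress file.
executed: List[str] = []
steps: List[Step] = [
    ("dump-old-db", lambda: executed.append("dump-old-db"), False),
    ("switch-kernel", lambda: executed.append("switch-kernel"), True),
    ("restore-new-db", lambda: executed.append("restore-new-db"), False),
]
progress = os.path.join(tempfile.mkdtemp(), "migration-progress.json")
first_boot = run_migration(steps, progress)   # stops after switch-kernel
second_boot = run_migration(steps, progress)  # resumes with restore-new-db
print(first_boot, second_boot, executed)
```

Note that a crash between a step's action and the progress write still re-runs that step, so individual steps would need to be idempotent or carry their own recovery logic; that is exactly the kind of error-handling contract a common language would have to spell out.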

Some references:


You might be interested in this issue:
https://github.com/NixOS/nixpkgs/issues/273972

And a past RFC on NixOS migrations:
https://github.com/NixOS/rfcs/pull/155

And also some pointers provided by @RaitoBezarius in this recent interview:
