Pre-RFC: Decouple services using structured typing

ibizaman · January 3, 2025, 11:13pm

Hi everyone!

Summary

Currently in nixpkgs, every module accessing a shared resource must itself implement the logic needed to set up that resource. This leads to several issues and a subpar end-user experience; the first one that springs to mind is tight coupling between modules.

I propose a solution to this issue that is fully backwards compatible and can be incrementally introduced into nixpkgs. These two properties are very important to me and IMO critical to the success of adoption of this RFC.

In this (pre) RFC, I will:

Provide a motivating example.
Identify the core issue in the example, generalize it and enumerate all related downstream issues and drawbacks.
Provide a solution and show how it solves the issues.
Conclude with defining two contracts and show their usage.

Motivation

As a motivating example, I’d like to take the Nextcloud module service. This service:

configures Nextcloud, which is the core of the module;
sets up Nginx as the reverse proxy;
sets up the PHP + PHP-FPM stack;
lets the user choose between multiple databases and sets up the chosen one;
lets the user choose between multiple caches and sets up the chosen one.

Now, all the code to do this is located in the Nextcloud module linked above. There is no abstraction over Nginx, the databases or the caches. The issues I identified with this are quite common in the software world:

This leads to a lot of duplicated code. If the Nextcloud module wants to support a new type of database, the maintainer of the Nextcloud module must do the work. And if another module wants to support it too, the maintainers of that module cannot easily reuse the work of the Nextcloud maintainer, apart from copy-pasting and adapting the code.
This also leads to tight coupling. The code written to integrate Nextcloud with the Nginx reverse proxy is hard to decouple and make generic. Letting the user choose between Nginx and another reverse proxy will require a lot of work.
There is also a lack of separation of concerns. The maintainers of a service must be experts in all implementations they let the users choose from.
This is not extensible. If the user of the module wants to use another implementation that is not supported, they are out of luck. The only way, without forking nixpkgs, is to dive into the module’s code and extend it with a lot of mkForce, if that is possible at all, and that is a sub-optimal experience.
Finally, there is no interoperability. It is not currently possible to integrate the Nextcloud module with an existing database or reverse proxy or other type of shared resource that already exists on a non-NixOS machine.

Detailed Design

The solution I propose is also well known in the software world: decouple the usage of a feature from its implementation. The goal is to make this nextcloud.nix file mostly about the core of the module and offload the rest.

To make this happen, we need a new kind of module, a module orthogonal to existing modules and services and which acts as a layer between a requester module (Nextcloud) and a provider module (Postgres, Nginx, Redis, etc.). I called this layer a contract. In practice, it is an option type coupled with an intended behavior enforced with generic NixOS tests.

The requester module adheres to a contract by providing a new option with the correct type (with helper functions, see implementation below). The contract specifies what options the requester can set - the request.

The provider module adheres to a contract by accepting the options set by the requester and by setting options that define the result, which are also specified by the contract.

For provider modules, the easiest path to incremental adoption will be to add a small layer on top of transitioning packages. That layer will just translate the options of the contract to options already defined for that module. With time, maybe we’ll get to merge that layer into the service directly.
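To make the requester/provider split concrete, here is a minimal, self-contained sketch of the idea using plain module-system options. It is not the implementation described later in this post: the `myapp` and `mybackup` modules, the option names and the service name are all made up for illustration, and it assumes `<nixpkgs>` is available on your `NIX_PATH`.

```nix
# contract-sketch.nix
# Evaluate with: nix-instantiate --eval --strict contract-sketch.nix
let
  lib = import <nixpkgs/lib>;
  inherit (lib) mkOption types;

  # The "contract": one option set for the request, one for the result.
  requestOptions = {
    user = mkOption { type = types.str; };
    sourceDirectories = mkOption { type = types.nonEmptyListOf types.str; };
  };
  resultOptions = {
    backupService = mkOption { type = types.str; };
  };

  # Hypothetical requester: states how it wants to be backed up.
  requester = {
    options.myapp.backup.request = mkOption {
      type = types.submodule { options = requestOptions; };
      default = {
        user = "myapp";
        sourceDirectories = [ "/var/lib/myapp" ];
      };
    };
  };

  # Hypothetical provider: accepts any request of that shape
  # and publishes a result of the agreed shape.
  provider = {
    options.mybackup.instances = mkOption {
      default = { };
      type = types.attrsOf (types.submodule ({ name, ... }: {
        options = {
          request = mkOption {
            type = types.submodule { options = requestOptions; };
          };
          result = mkOption {
            type = types.submodule { options = resultOptions; };
            default = { backupService = "mybackup-${name}.service"; };
          };
        };
      }));
    };
  };

  # End user: the only place where requester and provider meet.
  endUser = { config, ... }: {
    mybackup.instances.myapp.request = config.myapp.backup.request;
  };

  eval = lib.evalModules { modules = [ requester provider endUser ]; };
in
  eval.config.mybackup.instances.myapp
```

Neither module mentions the other. Swapping the provider only changes the `endUser` module, as long as the new provider accepts the same request shape.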

If we come back to the Nextcloud example and zoom in on the Reverse Proxy contract, we can identify the actors responsible for each part.

Introducing this decoupling in the form of a contract allows:

Reuse of code. Since the implementation of a contract lives outside of modules using it, using the same implementation and code elsewhere without copy-pasting is trivial.
Loose coupling. Modules that use a contract do not care how they are implemented as long as the implementation follows the behavior outlined by the contract.
Full separation of concerns. Now, each party’s concern is separated with a clear boundary. The maintainer of a module using a contract can be different from the maintainers of the implementation, allowing them to be experts in their own respective fields. But more importantly, the contracts themselves can be created and maintained by the community.
Full extensibility. The final user themselves can choose an implementation, even new custom implementations not available in nixpkgs, without changing existing code.
Incremental adoption. Contracts can help bridge a NixOS system with any non-NixOS one. For that, one can hardcode a requester or provider module to match how the non-NixOS system is configured. The responsibility of course falls on the user to make sure both systems agree on the configuration. But then, the NixOS system can be deployed without issue and talk to the non-NixOS system.
Last but not least, testability. Thanks to generic NixOS VM tests, we can ensure that each implementation of a contract, even custom ones written by the user outside of nixpkgs, provides the required options and behaves as the contract requires.

Examples and Implementation

I’d like to let you know first that this idea went through a lot of iterations on my part. There are some complicating factors, some due to how the module system works, some due to how the documentation system works.

In the end, I landed on an implementation using structural typing to distinguish between contracts. I find it quite nice to use, both for the maintainers of requester and provider modules and for the end user who needs to do the plumbing. The incidental complicated parts are hidden behind some functions.

That being said, I did not spend time trying to make the error messages look good. That’s definitely one area of improvement.

To explain the implementation, I’ll first go through an example, from the perspective of the users I describe in the previous paragraph. Afterwards, I’ll go through the code in a different order. Hopefully having both perspectives will paint a good picture of why the implementation is the way it is.

Files Backup Contract

Requester Side

To back up the files of the Nextcloud service, users currently must know that those files live in the directory given by services.nextcloud.dataDir, and they must configure the backup job to run as the correct user "nextcloud" (which is hardcoded) to be able to access those files.

The contract, as said above, is a coupling between options and an expected behavior. The options for this contract are:

user = mkOption {
  description = "Unix user doing the backups.";
  type = str;
  example = "vaultwarden";
};

sourceDirectories = mkOption {
  description = "Directories to backup.";
  type = nonEmptyListOf str;
  example = [ "/var/lib/vaultwarden" ];
};

excludePatterns = mkOption {
  description = "File patterns to exclude.";
  type = listOf str;
};

hooks = mkOption {
  description = "Hooks to run around the backup.";
  default = {};
  type = submodule {
    options = {
      beforeBackup = mkOption {
        description = "Hooks to run before backup.";
        type = listOf str;
      };

      afterBackup = mkOption {
        description = "Hooks to run after backup.";
        type = listOf str;
      };
    };
  };
};

The goal with this contract is to capture the essence of what it means to backup files in most cases. Maybe this is not enough for some peculiar cases and we’ll need a superset contract of some sort. This has been enough for me so far to backup Nextcloud, Jellyfin, Vaultwarden, LLDAP, Deluge, Grocy, Forgejo, Hledger, Audiobookshelf and Home-Assistant.

The files backup contract allows the Nextcloud module to express how to back it up in code by exposing a backup option (the name does not matter but should be indicative of the contract) which is a submodule and whose fields are given values by the Nextcloud module using default:

backup = lib.mkOption {
  description = ''
    Backup configuration.
  '';
  default = {};
  type = lib.types.submodule {
    options = contracts.backup.mkRequester {
      user = "nextcloud";
      sourceDirectories = [
        cfg.dataDir
      ];
      excludePatterns = [".rnd"];
    };
  };
};

Surprise! You didn’t know you must exclude the .rnd file from the backup? It’s actually not a file. It can’t be backed up. I discovered this the hard way by having my backup fail one day. But now, nobody needs to suffer anymore since we can write this information in code!

We can see the usage of a function contracts.backup.mkRequester, which produces a set of fields that gets plugged into the options field of a submodule. This function forces the maintainer of the requester module to give only fields relevant to the backup contract. It also sets up default and defaultText correctly. This was actually hard to get right to satisfy the documentation tooling, so the function helps quite a lot here.

The full definition of the requester part of the contract can be found here. You’ll see the contract provides a mkRequest function which is in turn plugged in here to produce the final contract and this mkRequester function. This double layer of functions was again useful to get a uniform structure for all contracts and to please the documentation tooling. Rendered documentation can be seen here.
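To illustrate how a helper like mkRequester can restrict what a requester may set, here is a deliberately simplified sketch. It is not the linked implementation: this toy `mkRequester` only produces a `request` option and ignores `result` and `defaultText`, and the `myapp` module is invented for the example. It assumes `<nixpkgs>` is on your `NIX_PATH`.

```nix
# mkrequester-sketch.nix
# Evaluate with: nix-instantiate --eval --strict mkrequester-sketch.nix
let
  lib = import <nixpkgs/lib>;
  inherit (lib) mkOption types;

  # Hypothetical, simplified helper. The argument pattern has no "...",
  # so passing a field the contract does not know is an evaluation error.
  mkRequester = { user, sourceDirectories, excludePatterns ? [ ] }: {
    request = mkOption {
      description = "Request part of the files backup contract.";
      default = { };
      type = types.submodule {
        options = {
          user = mkOption { type = types.str; default = user; };
          sourceDirectories = mkOption {
            type = types.nonEmptyListOf types.str;
            default = sourceDirectories;
          };
          excludePatterns = mkOption {
            type = types.listOf types.str;
            default = excludePatterns;
          };
        };
      };
    };
  };

  # Hypothetical requester module using the helper.
  myapp = { config, ... }: {
    options.myapp.dataDir = mkOption {
      type = types.str;
      default = "/var/lib/myapp";
    };
    options.myapp.backup = mkOption {
      default = { };
      type = types.submodule {
        options = mkRequester {
          user = "myapp";
          # Defaults may depend on other options of the module.
          sourceDirectories = [ config.myapp.dataDir ];
          excludePatterns = [ ".rnd" ];
        };
      };
    };
  };

  eval = lib.evalModules { modules = [ myapp ]; };
in
  eval.config.myapp.backup.request
```

Because the values become option defaults, the end user can still override them, for example by setting `myapp.backup.request.excludePatterns` in their own configuration.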

Provider Side

Now that we have something describing how to be backed up, we need something that backs that thing up. This is the role of a provider module. Here, I’ll use a layer I wrote on top of the original nixpkgs Restic module that exposes the options of the contract. Code is here and I copied the snippet below:

let
  repoSlugName = name: builtins.replaceStrings ["/" ":"] ["_" "_"] (removePrefix "/" name);
  fullName = name: repository: "restic-backups-${name}_${repoSlugName repository.path}";
in

# ...

instances = mkOption {
  description = "Files to backup following the [backup contract](./contracts-backup.html).";
  default = {};
  type = attrsOf (submodule ({ name, config, ... }: {
    options = contracts.backup.mkProvider {
      settings = mkOption {
        description = ''
          Settings specific to the Restic provider.
        '';

        type = submodule {
          options = commonOptions { inherit name config; prefix = "instances"; };
        };
      };

      resultCfg = {
        restoreScript = fullName name config.settings.repository;
        restoreScriptText = "${fullName "<name>" { path = "path/to/repository"; }}";

        backupService = "${fullName name config.settings.repository}.service";
        backupServiceText = "${fullName "<name>" { path = "path/to/repository"; }}.service";
      };
    };
  }));
};

So here also, we have this other mkProvider function which produces a set of options that gets plugged into a submodule. Here though, the exposed instances type is an attrsOf, and we use the name and config from the submodule to configure the defaults of the contract. Getting this right was messy to figure out, but it’s pretty powerful in the end. We can reference other parts of the config to generate the default values as long as we provide a “*Text” version which is hardcoded. The rendered documentation can be found here.

Compared to the mkRequester function above, this function actually takes in a nested attrset. The resultCfg is the part which configures the result of the provider module. For a files backup contract, the provider module must provide a systemd service that does the backup and an executable that can roll back to a previous version. It’s a pretty loose definition but I didn’t spend too much time on enforcing some precise behavior for the executable.

There’s also the settings field which is essentially a pass-through option that allows one to define any other options not defined in the contract but which are still necessary to setup the provider correctly.
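The interplay between `name`, `config`, `resultCfg` and the “*Text” variants can be sketched as follows. This is a simplified stand-in, not the linked code: the toy `mkProvider`, the `toybackup` module and its `repository` setting are assumptions made for the example, and only one result field is modelled. It assumes `<nixpkgs>` is on your `NIX_PATH`.

```nix
# mkprovider-sketch.nix
# Evaluate with: nix-instantiate --eval --strict mkprovider-sketch.nix
let
  lib = import <nixpkgs/lib>;
  inherit (lib) mkOption types;

  # Hypothetical, simplified helper returning the three fields of a provider.
  mkProvider = { settings, resultCfg }: {
    inherit settings;
    request = mkOption {
      description = "Request coming from a requester module.";
      type = types.submodule {
        options = {
          user = mkOption { type = types.str; };
          sourceDirectories = mkOption { type = types.nonEmptyListOf types.str; };
        };
      };
    };
    result = mkOption {
      description = "Result computed by the provider.";
      default = { };
      type = types.submodule {
        options.backupService = mkOption {
          type = types.str;
          # The real default depends on the configuration...
          default = resultCfg.backupService;
          # ...so the documentation gets a hardcoded text instead.
          defaultText = resultCfg.backupServiceText;
        };
      };
    };
  };

  provider = {
    options.toybackup.instances = mkOption {
      default = { };
      type = types.attrsOf (types.submodule ({ name, config, ... }: {
        options = mkProvider {
          settings = mkOption {
            description = "Settings specific to this toy provider.";
            type = types.submodule {
              options.repository = mkOption { type = types.str; };
            };
          };
          resultCfg = {
            # `name` is the attribute name, `config` the submodule's own config.
            backupService = "toybackup-${name}_${
              builtins.replaceStrings [ "/" ] [ "_" ] config.settings.repository
            }.service";
            backupServiceText = "toybackup-<name>_<repository>.service";
          };
        };
      }));
    };
  };

  endUser = {
    toybackup.instances.myapp = {
      request = { user = "myapp"; sourceDirectories = [ "/var/lib/myapp" ]; };
      settings.repository = "/srv/backups/myapp";
    };
  };

  eval = lib.evalModules { modules = [ provider endUser ]; };
in
  # Should evaluate to "toybackup-myapp__srv_backups_myapp.service"
  eval.config.toybackup.instances.myapp.result.backupService
```

Returning a set of options rather than a finished submodule is what lets the caller close over `name` and `config` when computing the defaults.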

The config part of the module, as stated above, just does the translation between this contract world and the actual options defined by the nixpkgs Restic module. It can be seen here. I won’t copy it here because it’s lengthy and doesn’t add any value related to contracts.

End User Side

Finally, we need the end user to set up a Restic backup job and plug it into the correct option in Nextcloud. The complete example snippet can be found here but in essence it looks like this:

shb.restic.instances."nextcloud" = {
  request = config.shb.nextcloud.backup.request;

  settings.repository = "/srv/backups/restic/nextcloud";
};

I use the shb prefix here to denote modules using contracts. It is an abbreviation of my project SelfHostBlocks, where I already use contracts for my server.

The request from the Nextcloud module (shb.nextcloud.backup) is plugged in the request field of the provider module. The passthrough settings option is used to give options specific to the actual provider used.

One can see how all the details are hidden from the user here. But even better, one could create a second Restic instance backing up to an S3 bucket:

shb.restic.instances."nextcloud_s3" = {
  request = config.shb.nextcloud.backup.request;

  settings.repository = "s3://...";
};

Or one can use BorgBackup, assuming a layer translating the files backup contract to the nixpkgs BorgBackup module exists, with:

shb.borgbackup.instances."nextcloud" = {
  request = config.shb.nextcloud.backup.request;

  settings.repository = "/srv/backups/borgbackup/nextcloud";
};

And backing up another service is also really obvious:

shb.restic.instances."vaultwarden" = {
  request = config.shb.vaultwarden.backup.request;
  settings.repository = "/srv/backups/restic/vaultwarden";
};
shb.restic.instances."vaultwarden_s3" = {
  request = config.shb.vaultwarden.backup.request;
  settings.repository = "s3://";
};
shb.borgbackup.instances."vaultwarden" = {
  request = config.shb.vaultwarden.backup.request;
  settings.repository = "/srv/backups/borgbackup/vaultwarden";
};

Having all modules use this files backup contract is a huge boost in usability and freedom for the end user. It also inverts the control and lets the user choose how to back something up, without needing any work from the maintainers of the Nextcloud or other modules.

Stream Backup Contract

If one backs up files, one should be able to back up databases too. I’ll go quickly over this one and only highlight the differences with the files backup contract.

I called this contract the database backup contract in my project, but I think stream backup contract is a better fit.

One could use the files backup contract to back up a database. Usually though, you can’t just back up the underlying files of the database; you need a dump. Creating that dump is expensive and takes disk space, so it’s usually better to rely on some streaming functionality.

The contract for this looks like so:

user = mkOption {
  description = ''
    Unix user doing the backups.

    This should be an admin user having access to all databases.
  '';
  type = str;
  example = "postgres";
};

backupName = mkOption {
  description = "Name of the backup in the repository.";
  type = str;
  example = "postgresql.sql";
};

backupCmd = mkOption {
  description = "Command that produces the database dump on stdout.";
  type = str;
  example = literalExpression ''
    ''${pkgs.postgresql}/bin/pg_dumpall | ''${pkgs.gzip}/bin/gzip --rsyncable
  '';
};

restoreCmd = mkOption {
  description = "Command that reads the database dump on stdin and restores the database.";
  type = str;
  example = literalExpression ''
    ''${pkgs.gzip}/bin/gunzip | ''${pkgs.postgresql}/bin/psql postgres
  '';
};

The Postgres module would be a requester here, providing a databasebackup option defined like so:

databasebackup = lib.mkOption {
  description = ''
    Backup configuration.
  '';

  default = {};
  type = lib.types.submodule {
    options = contracts.databasebackup.mkRequester {
      user = "postgres";

      backupName = "postgres.sql";

      backupCmd = ''
        ${pkgs.postgresql}/bin/pg_dumpall | ${pkgs.gzip}/bin/gzip --rsyncable
      '';

      restoreCmd = ''
        ${pkgs.gzip}/bin/gunzip | ${pkgs.postgresql}/bin/psql postgres
      '';
    };
  };
};

The Restic module - the provider module - would have an attrsOf option taking in this request. The code for that can be found here.
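To give an idea of what a provider could do with such a request, here is a minimal sketch that turns it into backup and restore command lines. It is not the linked Restic provider: the helper names `mkBackupScript` and `mkRestoreScript` and the repository path are invented, running the commands as `request.user` is left out, and the only restic features assumed are `backup --stdin --stdin-filename` and `dump`.

```nix
# stream-backup-sketch.nix
# Evaluate with: nix-instantiate --eval --strict stream-backup-sketch.nix
let
  # A request as a requester such as Postgres might publish it
  # (binaries written without store paths to keep the sketch standalone).
  request = {
    user = "postgres";
    backupName = "postgres.sql";
    backupCmd = "pg_dumpall | gzip --rsyncable";
    restoreCmd = "gunzip | psql postgres";
  };

  # Hypothetical helper: stream the dump straight into the repository,
  # so no intermediate dump file is written to disk.
  mkBackupScript = repository: req:
    "${req.backupCmd} | restic --repo ${repository} backup --stdin --stdin-filename ${req.backupName}";

  # Hypothetical helper: stream the stored dump back into the restore command.
  mkRestoreScript = repository: snapshot: req:
    "restic --repo ${repository} dump ${snapshot} ${req.backupName} | ${req.restoreCmd}";
in
{
  backup = mkBackupScript "/srv/backups/restic/postgres" request;
  restore = mkRestoreScript "/srv/backups/restic/postgres" "latest" request;
}
```

The provider never needs to know which database it is dealing with: everything database-specific lives in `backupCmd` and `restoreCmd`, which come from the requester.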

Finally, on the end user side, using this looks like so:

shb.restic.databases."postgres" = {
  request = config.shb.postgresql.databasebackup.request;
  settings = { /* ... */ };
};

Notice a pattern?

Secrets Contract

Until now, we didn’t do much with the result set by the provider module. Let’s define a contract where the result is used.

We’ll define a contract for providing secrets.

Contract

On the requester side, the contract allows defining the following options:

options = {
  mode = mkOption {
    description = "Mode of the secret file.";
    type = str;
  };

  owner = mkOption {
    description = "Linux user owning the secret file.";
    type = str;
  };

  group = mkOption {
    description = "Linux group owning the secret file.";
    type = str;
  };

  restartUnits = mkOption {
    description = "Systemd units to restart after the secret is updated.";
    type = listOf str;
  };
};

On the provider side, the contract just defines the resulting path where the secret will be located:

options = {
  path = mkOption {
    type = lib.types.path;
    description = ''
      Path to the file containing the secret generated out of band.

      This path will exist after deploying to a target host,
      it is not available through the nix store.
    '';
  };
};

Requester Side

Nextcloud needs a password for the admin user. It defines the adminPass option like so:

adminPass = lib.mkOption {
  description = "Nextcloud admin password.";
  type = lib.types.submodule {
    options = contracts.secret.mkRequester {
      mode = "0400";
      owner = "nextcloud";
      restartUnits = [ "phpfpm-nextcloud.service" ];
    };
  };
};

Then, in the config section, it uses the secret like so:

services.nextcloud.config.adminpassFile = cfg.adminPass.result.path;

Provider Side

A first possible provider is Sops.

options.shb.sops = {
  secret = mkOption {
    description = "Secret following the [secret contract](./contracts-secret.html).";
    default = {};
    type = attrsOf (submodule ({ name, options, ... }: {
      options = contracts.secret.mkProvider {
        settings = mkOption {
          description = ''
            Settings specific to the Sops provider.

            This is a passthrough option to set [sops-nix options](https://github.com/Mic92/sops-nix/blob/24d89184adf76d7ccc99e659dc5f3838efb5ee32/modules/sops/default.nix).

            Note though that the `mode`, `owner`, `group`, and `restartUnits`
            are managed by the [shb.sops.secret.<name>.request](#blocks-sops-options-shb.sops.secret._name_.request) option.
          '';

          type = attrsOf anything;
          default = {};
        };

        resultCfg = {
          path = "/run/secrets/${name}";
          pathText = "/run/secrets/<name>";
        };
      };
    }));
  };
};

The settings field is really a passthrough here as it’s an attrsOf anything. TBH I was lazy and just let the upstream sops module define the options.

The resultCfg sets the path where the secret will be located.

The config part is pretty simple:

config = {
  sops.secrets = let
    mkSecret = n: secretCfg: secretCfg.request // secretCfg.settings;
  in mapAttrs mkSecret cfg.secret;
};
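The `//` operator used in mkSecret is a shallow, right-biased attrset update. The small sketch below shows the effect on one made-up secret; the values, including the `sopsFile` path, are placeholders chosen for the example.

```nix
# secret-merge-sketch.nix
# Evaluate with: nix-instantiate --eval --strict secret-merge-sketch.nix
let
  # Placeholder data shaped like one entry of cfg.secret.
  secretCfg = {
    request = {
      mode = "0400";
      owner = "nextcloud";
      group = "nextcloud";
      restartUnits = [ "phpfpm-nextcloud.service" ];
    };
    settings = {
      sopsFile = "/etc/nixos/secrets.yaml";
    };
  };

  # Same shape as the function in the config snippet above.
  mkSecret = n: secretCfg: secretCfg.request // secretCfg.settings;
in
  # Yields one flat attrset holding the four request fields plus sopsFile.
  mkSecret "nextcloud/adminpass" secretCfg
```

Since the right-hand side wins, a `mode` or `owner` placed in `settings` would silently override the value from the request. Putting `secretCfg.request` on the right-hand side instead would make the request take precedence.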

End User Side

The plumbing must now be done in both directions. First, the request side, to define how to generate the secret:

shb.sops.secret."nextcloud/adminpass".request =
  config.shb.nextcloud.adminPass.request;

Then, the result side must be given back to the requester module:

shb.nextcloud.adminPass.result =
  config.shb.sops.secret."nextcloud/adminpass".result;

This double-sided plumbing is a bit annoying but the following doesn’t work:

shb.nextcloud.adminPass =
  config.shb.sops.secret."nextcloud/adminpass";

Hardcoded Secret

In all the NixOS tests involving secrets, I saw the usage of pkgs.writeText to generate a file, and the path to that file was given to the option needing it. This has one major flaw: it doesn’t ensure that the file is generated with the correct permissions. In fact, dealing with that is currently left as an exercise to the end user.

Thanks to contracts though, testing the permissions becomes easy. We can create a new provider that wraps pkgs.writeText and sets the permissions accordingly. The full module is here:

{ config, lib, pkgs, ... }:
let
  cfg = config.shb.hardcodedsecret;

  contracts = pkgs.callPackage ../contracts {};

  inherit (lib) mapAttrs' mkOption nameValuePair;
  inherit (lib.types) attrsOf nullOr str submodule;
  inherit (pkgs) writeText;
in
{
  options.shb.hardcodedsecret = mkOption {
    default = {};
    description = ''
      Hardcoded secrets. These should only be used in tests.
    '';
    example = lib.literalExpression ''
    {
      mySecret = {
        request = {
          owner = "me";
          mode = "0400";
          restartUnits = [ "myservice.service" ];
        };
        settings.content = "My Secret";
      };
    }
    '';
    type = attrsOf (submodule ({ name, ... }: {
      options = contracts.secret.mkProvider {
        settings = mkOption {
          description = ''
            Settings specific to the hardcoded secret module.

            Give either `content` or `source`.
          '';

          type = submodule {
            options = {
              content = mkOption {
                type = nullOr str;
                description = ''
                  Content of the secret as a string.

                  This will be stored in the nix store and should only be used for testing or maybe in dev.
                '';
                default = null;
              };

              source = mkOption {
                type = nullOr str;
                description = ''
                  Source of the content of the secret as a path in the nix store.
                '';
                default = null;
              };
            };
          };
        };

        resultCfg = {
          path = "/run/hardcodedsecrets/hardcodedsecret_${name}";
        };
      };
    }));
  };

  config = {
    system.activationScripts = mapAttrs' (n: cfg':
      let
        source = if cfg'.settings.source != null
                 then cfg'.settings.source
                 else writeText "hardcodedsecret_${n}_content" cfg'.settings.content;
      in
        nameValuePair "hardcodedsecret_${n}" ''
          mkdir -p "$(dirname "${cfg'.result.path}")"
          touch "${cfg'.result.path}"
          chmod ${cfg'.request.mode} "${cfg'.result.path}"
          chown ${cfg'.request.owner}:${cfg'.request.group} "${cfg'.result.path}"
          cp ${source} "${cfg'.result.path}"
        ''
    ) cfg;
  };
}

Now, in a NixOS test, whenever a secret must be given, we can use this provider to test that the permissions are accurate:

shb.vaultwarden = {
  databasePassword.result = config.shb.hardcodedsecret.passphrase.result;
};
shb.hardcodedsecret.passphrase = {
  request = config.shb.vaultwarden.databasePassword.request;
  settings.content = "PassPhrase";
};

Generic Tests

Speaking of tests, we haven’t yet talked about how to enforce that a provider behaves as the contract expects. This is done with generic NixOS tests. They are generic because each one is a function that defines the test script and expects to be given the provider module to test.

For the files backup contract, the generic test is located here. The interesting bit is that the values set for the request part of the contract are hardcoded. This ensures all providers are tested the same way and we can’t easily cheat.

The test script is pretty classic: it creates some files, backs them up, deletes them, restores them and then asserts they’re identical to the original files. One important aspect is that to start the backup and restore the files, we use the systemd service and restore script provided by the provider.

We must then instantiate this generic test for every provider implementing the contract. It is done for Restic here. Btw, I don’t like that I hardcode the username to be root and then something else. I’d love some randomly generated name here, but we don’t have that AFAIK.

There are also generic tests for the secrets contract and instantiation for the hardcoded secret provider. Same for the stream backup contract and the Postgres instantiation.

Tour of the Code

The contract library, containing the machinery, is in /modules/contracts/default.nix. All contracts are located in the same folder. The generic tests, as well as the documentation for each contract, are located in a subfolder with the name of the contract.

Tests for the contracts are located in test/contracts.

Drawbacks

Discovering which provider supports which contract can be hard. There should be a generated index in the documentation.

Not sure about other drawbacks, but I’m curious what the community thinks.

Alternatives

I’m not sure there are alternatives. Tweaks in the implementation or interface, sure, but I never saw an alternative after the 2 years I’ve been working on this.

Prior Art

Same here, I’ve never seen anyone talk about this or projects tackling this in the NixOS ecosystem.

In the software world in general, it definitely exists already. The design uses structural typing, which I know from Python under the name duck typing.

In Kubernetes, I heard there is a reverse proxy operator that is essentially a contract, but I’m not familiar with the ecosystem.

Not sure it’s fair to categorize this as prior art, but I gave two talks about this. One at NixCon Pasadena 2024 and the other at NixCon Berlin 2024.

Unresolved questions

There are things I’d like to do better. If anything, the plumbing the end user needs to do is a bit ugly. You can see my full config here and judge for yourself.

Also, I’m sure there are still unknown unknowns which will be discovered as I (we?) define new contracts and providers.

Future work

There is already a path for upstreaming the backup contract. I think this one is the best to start with because very few modules, if any, provide such a feature. Adding this contract thus adds a lot of value to the NixOS ecosystem and doesn’t require adding backwards incompatible changes, lowering the friction to adoption.

In parallel, adding more contracts is crucial. Those I already see useful, in no particular order, are:

Setting up a database
Reverse Proxy
LDAP
SSO
Mountpoint? (Think ZFS dataset or actual mountpoint - it’s a directory provided by something)

Parting Notes

I want to insist on what this RFC’s essence is. What I want to get across is that we need more decoupling in NixOS. The structural typing is less important; it’s what I chose to implement this and I think it’s a pretty good fit with the NixOS module system. What’s even less important is the shape of the contracts themselves. I can see multiple overlapping, subset or superset contracts existing in nixpkgs without any issues. For example, there could be a file backup contract with pre/post script hooks and one without. Both can live in nixpkgs and evolve separately or together.

Thanks for reading! I’m eager to get feedback on this.

Cheers,
ibizaman

nbp · January 6, 2025, 11:54am

Thanks a lot for looking at this problem and trying to draft a potential solution. This is a problem I failed to address in the past due to not having a good way to map / bind the results to the proper provider.

Having a contract ensures that we have a single common interpretation which can be served by various providers and used by various requesters.

However, I foresee some short-comings, such that contracts are only providing a subset of all implementations, which is asserted by the function argument of mkRequest (used with the argument of mkRequester). This is problematic as this would only offer a limited set of options and limit the usage of new features of providers, or limit the usage of contracts due to required features by some requester. I think the function argument assertion induced by the function would be a better fit when one binds a request to a provider. Doing it at the binding between a provider and a requester would help at making contract extensible as more requirements show up on the requester side. On a similar note, the module system should already provide all the support for having default values, documentation, extensibility and asserting when extra options definition or non defined options are used. The need for function arguments to enforce contracts should not be necessary.

Another aspect which is along the same lines is the fact that mkRequester is shown as defined in the options set. It would be nicer to have a submodule with no modules, and extend this submodule within the definition, by extending it with the result of mkRequester. This would help keep definitions closer to the other definitions of the modules and avoid mixing these with the interface / option declarations of the module.

malikwirin · January 6, 2025, 2:39pm

Am I right to imagine that one advantage is that users can more easily replace specific implementations with other ones they prefer?

Like replacing openssl with libressl in more places more easily for example

paperdigits · January 6, 2025, 6:05pm

Yes, exactly. Or replacing nginx with caddy or haproxy.

ibizaman · January 6, 2025, 6:35pm

I think I get it but please let me rephrase to make sure I understand your point.

First, you’re arguing against having the mkRequester and mkProvider functions because structural typing does not require those to exist. One should write requesters and providers by hand by explicitly stating all the options they want or provide. The downside of having those functions is that they are not composable, on top of being redundant with what the module system provides.
Is that a correct summary?

I agree with you, these are not required. They arose naturally from a desire for uniformity and from the tediousness of writing all the options by hand, especially in the context of changing interfaces. They are not required but I think they should still be included, as part of the official library, if anything to avoid anyone maintaining more than a couple modules the labor of writing them themselves and having in the end multiple not quite compatible implementations. That being said, they should be made composable indeed so that it is easy to create a union of two or more contracts.

As a concrete example, we didn’t create functions for generating configuration files that include secrets which must be provided out of band. The result is that maintainers of multiple modules have reinvented the wheel in slightly incompatible ways.

Your second point is about why those functions return a set of options instead of a full-fledged submodule, right?

This is because the default values given by the requester or the provider must be allowed to depend on other options’ values. This matters especially when a contract sits under an option of type attrsOf (submodule ({ name, … }: …)), where you want the default values to depend on the name argument of the function. Returning the set of options from the mk* functions was the best middle ground between convenience and practicality I could find.

Ma27 · January 6, 2025, 6:39pm

Yes, exactly. Or replacing nginx with caddy or haproxy.

I think this is a different problem: for caddy vs nginx you have to replace services, for openssl vs. libressl you have to override packages to replace openssl. That’s what we have overlays for essentially.

I’ll read through the suggestion soonish, probably after vacation.
Looking forward to your suggestions as maintainer of the motivating example, Nextcloud

numinit · January 6, 2025, 6:54pm

Great explanation, I needed this framing a bit to unpack what I was reading about

Edit: To the above comment: the sixos talk was another good take on the module system’s merge being more limiting than the package system’s override feature. Which is why overriding, say, OpenSSL, is a package specific operation but not a module specific one. In sixos they become the same thing.

ibizaman · January 6, 2025, 7:16pm

It depends on what level you’re talking about. At the package/derivation level, we already have the means necessary to replace a package with overlays.

If you use openssl to produce something, like certs, then yes exactly, that’s the goal!
Well, to be fair, this RFC is more about providing a generic and more hands-off way of doing this. For certs for example, modules usually have an option of type path which expects a cert to exist at that path. So one could argue this RFC is not necessary here because you’re already free to generate this cert however you want. But this leaves out part of the picture: for instance, you also need to give it the correct Unix user. For now, the only way of knowing that is through the documentation. This RFC is about writing this in the code.

It’s exactly as @paperdigits said!

EDIT: and as @Ma27 said too, looks like we replied at the same time. And thank you so much for maintaining the Nextcloud module! I learned so much from reading through the code. I also want to make clear I’m absolutely not picking on the Nextcloud module, on the contrary. I don’t think I wrote it that way, but just in case, I wanted to make it clear.

ibizaman · January 8, 2025, 9:44am

As a concrete example of the use of this RFC and specifically the backup contract.

This breaking change changes the location of a directory. The end user can only know about this by reading this thread. With a backup contract, this would be expressed in code without the user even needing to know about it.

Atemu · January 8, 2025, 11:55am

I’ve only had a glance at what you wrote because it’s mostly stuff that seems obviously right to me.

I’d like to see the more common software engineering terminology used here though because this is effectively interfaces/typeclasses/traits. You define the desired features in the interface, all implementations must provide implementations for those features and this is verified automatically.

I have not looked too closely at the detailed implementation yet because I think there’s still a major issue to resolve at a more abstract level: How to handle differences between mostly similar implementations?

I think this issue is also what @nbp alluded to.

While most of the functionality can (and IMHO should) be abstracted between “equivalent” implementations, the reason there even are multiple implementations of ostensibly the same thing is that they implement some details differently and, while mostly the same, still have different sets of features.

It’s these implementation-specific features I’m concerned with because they’re sometimes indispensable. It may very well be the case that my service depends on some specific feature that only a subset of all implementers of the generic interface offer and that must be a legal case in this abstraction system.

It must therefore be possible to have optional parts of the interface. These would ideally be abstracted to some degree as to not be entirely implementation-specific. Services would then need to explicitly request the features and there should be at least one implementation. Using the service with an implementation that does not implement the required feature should result in an eval error that ideally lists all alternative implementations which do implement the requested feature.
Given that interfaces are treated as types in most languages I know of, it probably makes sense to implement them as module system types too.
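A rough sketch of what such a check could look like follows. It uses a plain function rather than a module system type, and the feature names and the feature lists per provider are made-up placeholders, not statements about what nginx, caddy or haproxy actually support.

```nix
# Hypothetical sketch, assuming <nixpkgs> is on NIX_PATH.
let
  lib = import <nixpkgs/lib>;

  # Placeholder table: optional features each provider claims to implement.
  providers = {
    nginx = [ "tls" "websockets" ];
    caddy = [ "tls" "websockets" "automaticCertificates" ];
    haproxy = [ "tls" ];
  };

  # True when every required feature appears in the provider's feature list.
  supports = required: features:
    lib.subtractLists features required == [ ];

  # Return the provider name, or abort evaluation listing the alternatives.
  selectProvider = { provider, required }:
    let
      alternatives =
        lib.attrNames (lib.filterAttrs (_: supports required) providers);
    in
    if supports required providers.${provider} then
      provider
    else
      throw "provider '${provider}' lacks a required feature; alternatives: ${lib.concatStringsSep ", " alternatives}";
in
selectProvider {
  provider = "nginx";
  required = [ "tls" "automaticCertificates" ];
}
```

With this placeholder table, evaluation aborts and the error message names caddy as the only alternative that covers both requested features.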

Since a mechanism for optional features would be required anyways, you could perhaps even build the entire interface definition out of optional features because they all need to be verified anyways and this could allow some niche implementations for some use-cases that don’t implement all the features that the abstract interface for its kind of service usually assumes.

I also see this RFC to be a precursor for non-systemd service manager support which is in turn required for NixOS on non-Linux kernels.
It’s much the same issue: systemd and other service managers implement the same core feature set (define services, how to run them and dependencies between services) but have significant differences in how that’s implemented and what additional options you can declare. As I’ve written before in RFC 163, I think it wouldn’t be acceptable to dumb down our service definition interface to the lowest common denominator, especially when the alternative options are, frankly, obviously technologically inferior.

With a service manager interface with optional features, it’d be possible to incrementally introduce alternative service managers without causing unreasonable maintenance overhead for module maintainers or any downsides for systemd users.

aanderse · January 8, 2025, 12:13pm

@Atemu i wonder if you have taken this idea further than it was specifically intended to go by mentioning service manager abstractions…

i think the idea @ibizaman mentioned here is extremely useful as a loose pattern for higher level patterns that emerge when the abstraction remains simple

let’s take the concept of a reverse proxy as an example: to cover the full capabilities of a web server and abstract differences between nginx, caddy, and apache i think we might fail (or have a very challenging road ahead of us)… but if instead we simply offer a small feature subset of a simplified reverse proxy i think we have a challenge that can effectively be solved and offer a ton of value to nixos - no we do not cover every case, but let’s count our victories where we can because an abstraction that covers 80% of cases (for example) is still pretty good

but maybe i don’t represent the views of @ibizaman at all here… this could be simply my opinions… regardless, please consider them though

Atemu · January 8, 2025, 2:46pm

I really do believe that it’s exactly the same problem and that solving this problem properly for e.g. nginx, backups or databases would pretty much solve the service manager problem too.
With the envisioned architecture, there’d just simply be an interface for service managers and any service manager module could implement it.
Optional interface features would make it possible to realistically introduce a service manager besides systemd incrementally without affecting systemd users or creating maintenance nightmares for people who don’t care for alternative service managers.

To limit scope creep (and very likely nasty people showing up), I’d be absolutely fine if this wasn’t explicitly mentioned in the RFC text but “other similar cases” were to be considered

As I said though, this RFC would effectively solve the core of that problem.

Right but I feel like that falls exactly into the trap that @nbp mentioned: You get the lowest common denominator.

If you at any point wanted to make use of some functionality that only some of the implementations support in this architecture, you’d necessarily have to drop the other implementations entirely which is going to make a bunch of people mad.
The more likely thing to happen is that useful functionality of some implementations will go unused and, at that point, why even have multiple implementations? As mentioned, different feature sets are the reason we even have different implementations of basically the same thing and that’s a good thing IMHO.

waffle8946 · January 8, 2025, 2:55pm

I agree. To rephrase and make this more explicit: there is little benefit to supporting multiple impls if we can’t even use the specifics of that impl. Why would I switch to caddy if I can’t use caddy’s specialties? (I don’t know caddy, it’s just an example.) And if our impl is too brittle to support including caddy’s specialties, then we would be back to where we are today. Hence having optional vs. required abstractions would be necessary to put us in a better spot.

If this works well, then we could consider applying the same idea, in a future RFC, to alternative service managers, but I don’t think we’ve even demonstrated that this will work well enough yet.

ibizaman · January 8, 2025, 4:31pm

@Atemu I agree with you indeed about this RFC being a good candidate to decouple from systemd. This RFC is about finding a good pattern for decoupling interfaces from implementations after all.

Also agreed on the vocabulary. I’m not set on the names I chose. I’m familiar with Haskell, Go and Python, and this structural typing does indeed closely match the interfaces I know from Go. Anecdotally, I talked with a handful of people before posting the RFC and none were familiar with Go interfaces. That’s why I went with this less precise vocabulary in the first place. All that to say I’m open to suggestions.

About optional features, I fully understand and agree with your point. I didn’t think much about how to handle this though. I took inspiration from Go interfaces and tried to keep the interfaces small. The reasoning was that, to get interfaces with more functionality, you could create a union of those interfaces. Or you could create a superset. I think the discussion will come down to what’s a good pattern to handle sets of options and how to make unions out of those.
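As a rough illustration of building a bigger interface out of small ones, the sketch below takes the union of two tiny option sets by importing both into one submodule. The names `backupContract`, `secretContract`, `request`, `sourceDirectories` and `owner` are hypothetical and not part of the RFC text.

```nix
# Hypothetical sketch, assuming <nixpkgs> is on NIX_PATH.
let
  lib = import <nixpkgs/lib>;

  # Two small interfaces, each declaring a single option.
  backupContract = {
    options.sourceDirectories = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ];
    };
  };

  secretContract = {
    options.owner = lib.mkOption {
      type = lib.types.str;
      default = "root";
    };
  };

  eval = lib.evalModules {
    modules = [
      {
        # The union of both interfaces: one submodule importing both.
        options.request = lib.mkOption {
          type = lib.types.submodule {
            imports = [ backupContract secretContract ];
          };
          default = { };
        };
      }
      {
        config.request = {
          sourceDirectories = [ "/var/lib/app" ];
          owner = "app";
        };
      }
    ];
  };
in
eval.config.request
```

The module system merges the imported declarations, so the combined interface needs no glue code beyond the `imports` list. If both small interfaces declared the same option, the module system would report a duplicate declaration, which is one reason to keep them small and non-overlapping.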

I’m curious about what you mean by slightly different implementations. If we use structural typing as proposed, there must be a rule saying that every module implementing a given set of options must behave in the exact same way. This rule makes me think of parametricity in Haskell where a parametric function can’t inspect the actual type given to the function and change its implementation based on that. It’s a property I’d like to keep unless someone convinces me otherwise.

So I suppose by slightly different you mean 2 implementations having mostly the same set of options and only a couple of different ones? Their sets being disjoint. TBH I have an easier time thinking about these kinds of things with concrete examples, but if we manage to keep the number of options in interfaces to only a few, we could maybe manage to have a common interface and then different ones for each implementation?
I’ll have to play with this more.

Intuitively, I’d say making a big interface and making some parts of it optional is less ideal than having small ones from which you can construct a bigger one. I can’t express why I think that though.

@aanderse I agree to not scope creep this RFC. I’m repeating myself, but I really want to keep this RFC about finding a pattern. The actual contracts/interfaces that come with it should be good, useful, complete and diverse enough, but the RFC shouldn’t implement all possible contracts. My hope is we can find a relatively small set of contracts which, taken together, cover most use cases we’ll encounter later, which will result in a good pattern.

@waffle8946 Indeed, if you end up with only one implementation for a contract, you lose some of the benefit of having a contract in the first place. But you still keep other benefits: decoupling of functionality, the ability to test with a stub, possible code reuse, and also allowing the end user to implement the contract with something that links the NixOS world with the outside world.

As far as this RFC is concerned though, I would leave those kinds of one-off contracts out.

ibizaman · January 8, 2025, 4:40pm

Another good use case covered by this RFC: Breaking changes announcement for unstable - #71 by Scrumplex

We would be able to tell programmatically what user a secret should be created for, eliminating the kind of bugs the comment is warning about.

I begin to like this breaking changes topic. It’s helping me in the end

ibizaman · January 8, 2025, 8:44pm

Linking to Modular services by roberth · Pull Request #372170 · NixOS/nixpkgs · GitHub, which has a common goal, just so the link goes both ways.

sliedes · January 9, 2025, 12:20am

I love this proposal and the direction it’s taking! The issues you’ve identified—duplication, tight coupling, and lack of modularity—really resonate with me as some of the key challenges in the ecosystem, as a relative NixOS newbie.

I’ve been toying with a similar-but-different idea, which is certainly more controversial and not nearly as fleshed out. I don’t want to derail this thread with too many details, but I’d like to mention a few thoughts that might complement or extend the discussion:

Package Overrides as a Use Case: I think your focus on services is spot on, but package overrides are another area where I think this could shine. For example, overriding Python packages or Rust packages in nixpkgs seems to be very different from overriding other packages. I think there are probably good reasons for all this, but at the same time I feel there could be at least some kind of a common interface, with a good specification and documented behavior, that could be lurking there. Have you considered this as a potential use case? Is it even realistic? Could it be served by the same mechanisms?
Typed Interfaces for Contracts: One frustration I often encounter when trying to override packages is that adding unused fields to an attrset typically results in… absolutely no change at all. It’s easy to end up making modifications that are ignored. As a beginner, this happens to me a lot when trying to override some aspects of packages, and I’d be shocked to learn it doesn’t happen to more experienced people too. I wonder if contracts could specify the exact set of fields expected in certain cases? This might help guide users towards a well-defined interface and avoid silent failures.
Ecosystem vs. Language Problem: While I think this is at its core more of an ecosystem issue than a language one, I think the lack of typing in Nix makes it harder to naturally guide users toward thinking in terms of interfaces. A more structured approach like your contracts proposal could help address this.
Testability and Specs: Having a clear spec for what is expected of interface implementers is indeed crucial. This may just mean tests, especially ones that can validate adherence to a contract in isolation; they seem like an essential part of this. I’m really glad you’re highlighting this aspect with the NixOS VM tests. I think a lot of the current problems come from all local problems being solved with rather ad hoc solutions that are all unique snowflakes.

waffle8946 · January 9, 2025, 1:16am

This has little to do with NixOS, though.

Same as above. Because extra attrs are just envvars. But more builders should enforce using env or passthru as needed.

This is an unrelated topic and doesn’t need an RFC in any case, just send a good PR.

sliedes · January 9, 2025, 2:16am

Ok, maybe I misunderstand something, but this seems like a very strange claim, unless you make some big distinction between NixOS and nixpkgs. Clearly the 40 different mechanisms for overriding functionality in nixpkgs have a lot to do with at least nixpkgs? And the idea of standardized interfaces seems to apply quite directly. What am I missing?

waffle8946 · January 9, 2025, 2:44am

It’s not a claim, it’s just a fact. .override, .overrideAttrs, .overrideScope etc have nothing to do with the module system. Frankly nixpkgs and nixos have nothing to do with each other, other than being hosted in the same repo, and the fact that NixOS has a hard dependency on nixpkgs.

The only project I know of that tries to unify the interface between nixpkgs overrides and NixOS modules is @amjoseph’s sixos, which creates an “infusion” interface for both. You can google it if you’re curious. Those ideas do need more attention, but that’s far beyond the scope of what’s discussed here.