Pre-RFC: Decouple services using structured typing

ibizaman · January 9, 2025, 8:30pm

Indeed, those are interesting topics, but as mentioned by @waffle8946, they are out of scope for the RFC.

About “1. Package Overrides as a Use Case”: did you know that some modules allow you to override the package? Like this one.

sliedes · January 10, 2025, 12:39am

Is that because you’re not looking to design a generic interface/implementation mechanism? It just felt like another use case to inform the design if that was the goal. But perhaps you were thinking about something much more specific? To me, your interface idea looks in principle quite generic and not limited to services, and I’d be delighted if the same mechanism could be applied to different domains. But, sure, it also makes sense to focus on a specific use case if that’s what you are looking to solve.

And in any case I’m happy about this proposal, because I think more thought out software engineering methodology is what Nix needs.

waffle8946 · January 10, 2025, 2:14am

Yes, but we haven’t demonstrated this idea for services yet. Of course work can be done in nixpkgs in parallel, but that’s a whole other discussion to be had there. Adding that to this RFC means more to bikeshed about, at least from past observation.

ibizaman · January 10, 2025, 5:25am

If we can solve more issues even with this RFC, I would be delighted. But I don’t see at all how it could be applied to the problems you described. If you have any idea I would be happy to hear it!

Indeed the goal is to decouple services - modules in nixpkgs - using structural typing.

sliedes · January 10, 2025, 6:12pm

Ah, now I see what you mean! I wasn’t suggesting adding those use cases to this RFC (I agree that wouldn’t be a good idea)—just that keeping additional use cases in mind might help refine the abstractions further. By thinking about corner cases from other domains, the resulting interfaces could become even neater and more versatile. To be clear, I’m not looking to hijack this RFC or propose competing ideas.

Fair enough! I think I’ll have to flesh out applying this to the cases I mentioned a bit more.

Perhaps I can clarify where I’m coming from.

The motivation for my thoughts comes from experiences with Nix the language, nixpkgs, and NixOS. Package overriding felt like an example that I encounter often, but it may have been more misleading than helpful. (I hadn’t even thought about NixOS modules, which perhaps underscores that I’m not communicating very clearly.) In general, a lot of what feels incoherent to me seems to stem from the lack of tools to encourage structural typing or coherent interfaces. Without these, the path of least resistance often leads to ad hoc solutions.

So my thinking isn’t primarily about “making package overrides easier” but about encouraging some sound software engineering principles and providing tools to support them, primarily structural typing, contracts and the subsequent ability to test components in isolation (I’ve actually framed it more as Haskell style type classes, but I think there’s no big difference). I believe approaches like these will demonstrate their value when applied, and I hope people will over time apply them to other parts of the ecosystem, making everything more maintainable and cohesive.

This is why I’m excited by your RFC from a general perspective; I hope it will have a ripple effect beyond service decoupling.

ibizaman · January 15, 2025, 9:57am

I’m excited because all the comments so far are really positive. Thank you all for that!

If you see some issues, speak up! If not, I’ll be writing the real RFC soon with an accompanying draft PR.

Ma27 · January 21, 2025, 12:01pm

So, first of all, I do think that having a way to decouple services themselves from the stuff they need around them is useful. One reason is that you don’t always have all of it on the same machine (larger setups don’t have applications and their DB servers on the same machine, for instance).

In fact, I asked myself a similar question regarding when to provide nginx config for a module (Guidelines / Recommendations for when to configure nginx in a service module? · Issue #277723 · NixOS/nixpkgs · GitHub), though I didn’t really like my conclusion which was essentially that it only makes sense if it’s “actually necessary”, i.e. if the config is more than just a fcgi_pass/proxy_pass (as it’s in the case of Nextcloud and most PHP applications in general).

Before I give feedback, I have a few questions regarding my understanding of your proposal:

In the “File Backup Contract”: which module is the “requester side” here? The nextcloud module? I would’ve expected so (given that this request/provider thingy was suggested as a solution to supporting >1 reverse proxy, for instance). OTOH it seems as if the information is mostly used for the restic module and the Nextcloud module mainly gives information about what to back up.
So my question mainly boils down to “why is Nextcloud requesting a backup here?”.

In the “Secrets contract” you add a “Contract” section: this is what the requester-side requests, correct? And is this also what the Nextcloud module (assuming it’s the requester) can use? I think this would answer my first question.

Now, where does the “request” on the sops module come from?

Then, to a few technical questions:

In the repo you linked you define the options on the requester-side in a function. Is it somehow possible to “inject” additional options into the requester from somewhere else? E.g. it’s possible to inject additional options into the option tree (including submodules) from other modules, hence the questions.
Do you have any ideas on how to make sure that requesters and their providers are discoverable, e.g. through the manual? E.g. we generate documentation for all options, but I don’t see yet how & where these functions fit in.
Given that both requester and provider have their own little (sub)-modules already: how is the merging done? E.g. is the restartUnits = [ "phpfpm-nextcloud.service" ]; in the mkRequester-call merged with additional declarations of the same option? (Assuming same priority, i.e. I don’t use mkForce).
Given the current implementations, how well does error reporting work? I’m a little afraid that this is a potential source for even more obscure errors.

I hope these questions make sense even though I probably missed some information.

Short of technical questions, there’s another thing I’d like to bring up: this is a powerful tool that brings a lot more flexibility. Now, nixpkgs is a project with a lot of things being done ad-hoc where the line between doing the right thing and pissing someone off is sometimes pretty thin.

From an RFC I’d wish for some base rules on how maintainers (and users) should collaborate here. Obviously, this should just be a default ruleset; if provider/requester maintainers of a subsystem can agree on different rules, that’s fine. In the end I don’t want to constrain people, but make sure that the lives of maintainers of affected modules don’t get significantly harder:

How do we expose maintainer information to e.g. the option search? I think it’s neither useful if I support Nextcloud+httpd (given I hardly know anything about the latter) nor do I want to spend time on playing first-level support, i.e. forwarding bugreports for providers to the responsible people myself.
How long is it OK to wait on version updates for provider maintainers to fix their code? Another Nextcloud example: we have a bunch of conditionals in place that check for the Nextcloud version in nginx because of certain differences in what the configuration must look like. Generally, I think there should be a timeout after which it’s OK to (temporarily) mark a provider as broken (btw, how would we mark providers as broken?). This is especially relevant for security releases that require changes in modules.

I think I forgot about something on that end, but I can’t remember now.

I wrote these as questions on purpose because I don’t have good answers myself yet, but I’d like to share my thoughts and get other people to share their thoughts on that.

nyanbinary · January 26, 2025, 1:26pm

Any updates to this?

ibizaman · January 27, 2025, 12:09pm

I haven’t had time yet to write a proper response to @Ma27. I wanted to do that first and then write the RFC and the draft PR. That’s about it.

ibizaman · January 27, 2025, 10:19pm

In the “File Backup Contract”: which module is the “requester side” here? The nextcloud module? I would’ve expected so (given that this request/provider thingy was suggested as solution to supporting >1 reverse proxy for instance).

Indeed, Nextcloud is what I called the requester. You correctly noticed that I abused the vocabulary I chose arbitrarily and it doesn’t really fit here.

OTOH it seems as if the information is mostly used for the restic module and the Nextcloud module mainly gives information about what to back up.

Right. Here, Nextcloud doesn’t need any info from Restic. I suppose we could see this as a special case of a more general contract where the requester needs info from the provider and similarly in the opposite way.

So my question mainly boils down to “why is Nextcloud requesting a backup here?”

TBH I can’t find a word/concept that correctly expresses that Nextcloud gives information about what files to back up and that also fits well with other backups.

In the “Secrets contract” you add a “Contract” section: this is what the requester-side requests, correct? And is this also what the Nextcloud module (assuming it’s the requester) can use? I think this would answer my first question.

Correct. The requester (Nextcloud) here tells the provider (Sops) the various properties the secret should have (mode, owner, group, etc.). Here, it’s a bidirectional contract because the provider (Sops) will also tell the requester (Nextcloud) some info (the path where the secret will be located).

Now, where does the “request” on the sops module come from?

I’m not sure about what you’re asking here, so let me elaborate with the services.nextcloud.config.adminpassFile option. Let’s assume it uses the secret contract under the shb.nextcloud.adminPass option and the Sops file contains the corresponding secret at nextcloud/adminpass. The user would create the Sops secret like so, letting Sops know about Nextcloud’s request:

shb.sops.secret."nextcloud/adminpass".request = config.shb.nextcloud.adminPass.request;

And then the user would let Nextcloud know about Sops’ result - the path of the secret.

shb.nextcloud.adminPass.result = config.shb.sops.secret."nextcloud/adminpass".result;

Does that help clarify what you were asking about?

One other aspect that’s not obvious here is that the user complements the request when they define the sops.secret. That option is an attrsOf, so the user must give a name there which corresponds to where the secret lives in the Sops file.

So you have:

The requester sets some options on the provider. Those options are only the options declared in the contract.
The user sets remaining options on the provider. Those options are those not set in the contract but needed by the provider. A good example is with the backup contract where one can define the backup schedule.
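A minimal sketch of that split for a backup contract. Every option path here is an assumption made for illustration (`shb.restic.instances`, `shb.nextcloud.backup.request`, and `settings.schedule` are hypothetical names, not confirmed by the project); only the shape of the `request` wiring mirrors the secret example above.

```nix
{ config, ... }:
{
  # Hypothetical provider instance named after the service it backs up.
  shb.restic.instances."nextcloud" = {
    # Filled in from the requester through the contract: what to back up.
    request = config.shb.nextcloud.backup.request;

    # Set by the user: provider-specific options the contract does not cover.
    # `settings.schedule` is an assumed option name.
    settings.schedule = "daily";
  };
}
```

The point of the split is that the contract only fixes the `request` part; anything else, like the schedule, stays a plain option of the provider.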

In the repo you linked you define the options on the requester-side in a function. Is it somehow possible to “inject” additional options into the requester from somewhere else? E.g. it’s possible to inject additional options into the option tree (including submodules) from other modules, hence the questions.

Not readily with the functions. Like you mention later, the requesters and providers use their own sub-modules. Until now, I didn’t see the use for adding options not part of the original contract to this sub-module. If the requester needs two contracts, I think it makes sense to have two of those sub-modules.

It’s extremely probable I’m wrong on this point, and I’d love it if someone could show me a counter-example to work with. I can imagine we could merge the results of mkRequester calls.

Do you have any ideas on how to make sure that requesters and their providers are discoverable, e.g. through the manual? E.g. we generate documentation for all options, but I don’t see yet how & where these functions fit in.

I didn’t dive much into this yet. It’s annoying IMO to have all options repeated in the documentation for each requester/provider contract. Taking the shb.nextcloud.adminPass secret contract example, we should just see in the doc nextcloud.adminPass and type = contract.secretContract; with a link to the contract. Something like that. Currently, we see all options as you can see in the doc of my project. We can probably make this happen by using a new argument to the submodule function?

We will also want to build an index of all requester services and provider services using a particular contract. One way to make this happen is to have this new submodule argument in the style of contract = nullOr str; which specifies which contract the submodule belongs to. I don’t like this solution because it tramples on structural typing. It also makes it necessary to think about namespacing. If one contract is called backup, does that mean it’s the true and only backup contract? How would we name others? As I’m writing this, I realize we need to name a contract and I’m doing that already. So maybe this is okay.

I agree it’s important to get this right from the start.

Given the current implementations, how well does error reporting work? I’m a little afraid that this is a potential source for even more obscure errors.

I don’t think it’s worse, but it clearly doesn’t help that the contract name is not appearing anywhere in the error message.

Here also, we should get this right from the start.

nixpkgs is a project with a lot of things being done ad-hoc where the line between doing the right thing and pissing someone off is sometimes pretty thin.

Haha, that’s definitely true. My style is to lead by example. It would be foolish to try to make everything switch to this style in one PR, and that was never my intention. We should start small, very small, and build from there.

Ideally, the first PR would introduce an “easy” contract that provides value. IMO the best one currently for this is the backup contract. It adds value because there’s not much about backup in nixpkgs services and little ad-hoc pre-existing work on this. A bad contract to start with would be the reverse proxy one. There should be about 3 to 5 services, each with a different maintainer, that agree to implement this backup contract. The goal here is to start a trend as more and more maintainers realize it’s a good style. I’d of course love to see other initiatives, led by others, to create PRs for other contracts or services.

Btw, I’m not talking about the draft PR accompanying the RFC. That PR should implement a few diverse contracts and cover a lot of edge cases, to be sure we get this right. That PR will probably be closed after we agree on it and from that would stem other PRs, the one I talked about above included.

In the end I don’t want to constrain people, but make sure that the lives of maintainers of affected modules don’t get significantly harder

Very good point, we should also get this right from the start.

How do we expose maintainer information to e.g. the option search? I think it’s neither useful if I support Nextcloud+httpd (given I hardly know anything about the latter) nor do I want to spend time on playing first-level support, i.e. forwarding bugreports for providers to the responsible people myself.

About triage, I think that’s a broader issue than just related to this RFC. One first step is we could surface the meta.maintainers field on a submodule in the related options. For example, I don’t see the maintainers defined here anywhere in the options documentation. I searched in nixpkgs issues but couldn’t find anything related to surfacing that field.

Honestly, I’m not sure how to make it obvious to the user who to contact in case of an issue here. I see 3 groups of maintainers:

One for each requester service. I don’t see how only part of the Nextcloud module could be maintained, so indeed maintainers of a service will need to maintain the requester parts.
One for each provider service. Same comment.
One for each contract.

Of course, they could overlap. I would even expect it to be common for all or most of the services relying on a contract to be part of the maintainer group of that contract. After all, those using it have high stakes in the contract being useful for them.

I think though that contracts will in time lower the maintenance burden in general thanks to reusability. The key here will be relying heavily on the NixOS generic tests for each contract. Ideally, the differences between the httpd provider and nginx provider for a given contract will be ironed out and you, the maintainer of Nextcloud which uses the reverse proxy contract as a requester, shouldn’t see the difference. And this would be enforced thanks to an extensive test suite that each provider will need to pass.

The beauty here is that those tests are generic. I mean that if we discover an issue with httpd and we add a test case for it, that test case will be automatically applied to all other providers at the same time, maybe discovering some other bugs or at least avoiding some future regression. This is to me very appealing.
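A rough sketch of what “one test body, many providers” could look like, assuming each provider is available as a module file. The file paths, the test name, and the fixed secret location are all hypothetical; a real generic test would read the location from the contract’s result rather than hard-coding it. `pkgs.testers.runNixOSTest` is the standard nixpkgs entry point for NixOS VM tests.

```nix
# Hypothetical file: builds the same contract test once per provider.
{ pkgs, lib }:
let
  # Build one NixOS VM test for a given provider module.
  mkContractTest = name: providerModule:
    pkgs.testers.runNixOSTest {
      name = "secret-contract-${name}";
      nodes.machine = {
        imports = [ providerModule ];
      };
      # The same assertions run against every provider.
      testScript = ''
        machine.wait_for_unit("multi-user.target")
        # Assumed location of the secret, hard-coded to keep the sketch short.
        machine.succeed("test -r /run/secrets/example")
      '';
    };
in
# One attribute per provider; adding a check above extends all of them.
lib.mapAttrs mkContractTest {
  sops = ./providers/sops.nix;
  hardcoded = ./providers/hardcoded.nix;
}
```

Adding a new assertion to `testScript` then covers every provider in the attribute set without touching the providers themselves.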

Everyone using a contract will benefit from shared knowledge.

I know this is a bit idealistic and implementations will always differ but that’s already the case in software in general. This won’t be solved now but the reward for writing a test case for those generic tests will be substantially higher than currently. I hope that will be appealing to others.

In time, I think this will allow you, the Nextcloud maintainer, to have less work maintaining the integration with all supported reverse proxies.

Speaking of tests, I would also really like it if we embraced using more web automation frameworks like selenium or playwright. I know it’s tedious to implement and maintain, as I’m doing that right now in my project. But the cost is maybe worth the effort if one test covers multiple providers. I can imagine its usefulness for an LDAP or SSO contract, for example.

One could even imagine creating a matrix out of the current Nextcloud test suite, running it for each reverse proxy provider.

Again, I’m maybe a bit idealistic here, but I know from experience I much prefer dealing with a failing test than with an angry customer - hum - I mean with an issue created by a user as understanding the underlying issue there is always much more time consuming.

How long is it OK to wait on version updates for provider maintainers to fix their code?

Eh, that’s a tough one. I’m not sure we should be imposing hard constraints here, but I’m admittedly not a maintainer of a big nixpkgs service relied upon by a lot of people. I was hoping you would have an idea on how to answer this.

Another Nextcloud example: we have a bunch of conditionals in place that check for the Nextcloud version in nginx because of certain differences in how the configuration must look like.

The version difference is really interesting. At first glance I would say this falls on the shoulders of the maintainers of the Nginx provider, freeing you from dealing with that. Which is one of the goals of contracts in the first place.

Generally, I think there should be a timeout when it’s OK to (temporarily) mark a provider as broken (btw, how would we mark providers as broken?). This is especially relevant for security releases that require changes in modules.

I think it will be impossible to update a contract together with all its providers and requesters in one go. We should probably treat this similarly to how updates to databases are handled:

1. Create new optional option.
   1. Add generic test for it, enabled only if option is set.
   2. Deprecate some old option, if relevant.
2. Migrate providers and requesters in parallel.
3. Remove deprecated option.

In other words, any migration will be a multi step process.
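As an illustration of the first step, here is a sketch of a contract gaining a new optional option while an old one is deprecated. The option path `contracts.backup.request` and both option names are made up for this example; `config.warnings` is the regular NixOS mechanism for surfacing deprecation messages at evaluation time.

```nix
{ lib, config, ... }:
let
  inherit (lib) mkOption types;
  # Hypothetical option path for the request side of a backup contract.
  cfg = config.contracts.backup.request;
in
{
  options.contracts.backup.request = {
    # Existing option, to be removed once everyone has migrated.
    excludePatterns = mkOption {
      type = types.listOf types.str;
      default = [ ];
    };

    # New option: optional (null by default), so that requesters and
    # providers which have not migrated yet keep evaluating.
    excludeFile = mkOption {
      type = types.nullOr types.path;
      default = null;
    };
  };

  # Deprecation: warn only when the old option is still the one in use.
  config.warnings =
    lib.optional (cfg.excludePatterns != [ ] && cfg.excludeFile == null)
      "contracts.backup.request.excludePatterns is deprecated; use excludeFile instead.";
}
```

Providers and requesters can then switch to `excludeFile` independently, and the old option is dropped in a later change once nothing sets it.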

ttamttam1 · March 20, 2025, 3:47pm

I’ll admit I don’t fully understand the RFC on the implementation level, so I may be off the mark here:

I think this is best explained with microvm. Looking at all those hypervisors, it’s safe to reason this could be abstracted into a “hypervisor” contract. However, if you have a machine that requires a hypervisor with virtiofs support, only certain “virtiofs capable” “hypervisor” providers can accommodate that. “Virtiofs capable hypervisor” would be the subset you want, but we can’t manually create a subset for every possible combination of hypervisor features.

Normally I’d use unions as well, but how do we describe a particular contract that can only exist when another is present? I.e. a union between “hypervisor” and “virtiofs capable” is great, but “virtiofs capable” can only exist if “hypervisor” already exists. I.e. it sounds to me like we need a flexible system to validate that a requester is requesting a valid contract, and that a producer is producing a valid contract, which to me sounds like we need contracts to come with rules to determine how they can be combined, excluded, or selected.

The first solution I’d reach for is feature flags, but maybe the existence of certain optional keys could be flags themselves? If no additional options are needed, then they can just be keys to empty sets.

Alternatively, contracts could come with a list of validation functions, and a special set of contract-merging functions (union, intersection, negation, and derived functions such as conditional and disjoint union) could be created which do what their names suggest, but always merge the validation function lists, which in turn can generate warnings or errors.

We also need to consider disjoint unions. For example Attic can use the local filesystem xor an s3 bucket, but not both at the same time.
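For the “either local or S3, never both” case, the module system in recent nixpkgs already has a building block that might serve as a starting point: `types.attrTag`, which accepts an attribute set carrying exactly one of the declared tags. The sketch below is not Attic’s actual module; the option path `services.example.storage` and the fields inside each variant are assumptions for illustration.

```nix
{ lib, ... }:
let
  inherit (lib) mkOption types;
in
{
  # Hypothetical option: the value must have exactly one of `local` or `s3`.
  options.services.example.storage = mkOption {
    description = "Where to store data.";
    type = types.attrTag {
      local = mkOption {
        description = "Store data on the local filesystem.";
        type = types.submodule {
          options.path = mkOption { type = types.path; };
        };
      };
      s3 = mkOption {
        description = "Store data in an S3 bucket.";
        type = types.submodule {
          options.bucket = mkOption { type = types.str; };
        };
      };
    };
  };
}
```

A user would then write `services.example.storage.s3.bucket = "my-cache";`, and a definition that sets both `local` and `s3` is expected to be rejected by the type check. Whether contracts themselves can be combined this way is a separate question that this sketch does not answer.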

nyanbinary · March 21, 2025, 1:25am

I think this should be opened as a proper RFC in GitHub (maybe as a draft state?) to get more traction/eyes on it.

ibizaman · March 24, 2025, 7:01pm

Agreed! I’m working on writing the RFC and the accompanying draft PR but it’s not there yet.

On the bright side, I’m happy and proud to announce that this RFC as well as my project it’s based upon are now sponsored by the NGI Zero Core fund: NLnet; SelfHostBlocks. I don’t know the details but NixOS Foundation is a sponsor and provides help for grantees, I found that pretty cool. I applied in July 2024 for the October batch, it got a bit delayed on their side but now it’s done. This won’t magically grant me more free time but will help me prioritize finishing this for sure.

ibizaman · May 28, 2025, 7:15am

Great news everyone. @fricklerhandwerk and I as well as @kiara and @lassulus hacked on this during the Zürich 25.05 ZHF hackathon and we made great progress. I invite you to read this section of the report Zürich 25.05 ZHF hackathon report. We created a repo with what we will propose to upstream. GitHub - fricklerhandwerk/module-interfaces

Before upstreaming though, I’ll now be working on migrating SelfHostBlocks (the project this originates from) to this new pattern to see if any tweaks are needed. For example, what does the documentation look like, and how well does this integrate with the generic NixOS tests for contracts? I already made some tests during the hackathon and things look good.

TL;DR: we have worked out a way for the module type system to make the following possible and type check:

{ config, ... }:
{
  services.myservice.password.provider = config.sops.secrets."myservice-password";
}

That’s right, it’s a one-way connection from the end user’s perspective, but there is still wiring in both directions happening behind the scenes.

As for such a contract, defining it is done this way:

{ lib, ... }:
let
  inherit (lib) mkOption types;
in
{
  config.interfaces.secrets = {
    description = "generate a secret that is passed out of band to the nix store";
    input = input: {
      options.owner = mkOption {
        type = types.str;
      };
      options.group = mkOption {
        type = types.str;
        default = "root";
      };
      options.mode = mkOption {
        type = types.str;
        default = "0400";
      };
    };
    output = output: {
      options.path = mkOption {
        type = types.str;
      };
    };
  };
}

Finally, using this contract as a consumer and provider is done this way:

{ config, lib, ... }:
let
  inherit (lib) mkOption;
in
{
  options = {
    # The service consumes the secrets contract defined above...
    services.myservice.password = mkOption {
      type = config.interfaces.secrets.consumer;
    };

    # ...and sops provides it.
    sops.secrets = mkOption {
      type = config.interfaces.secrets.provider;
    };
  };
}

I’m leaving out a few details here, like how the consumer and provider actually access the inputs and outputs, but that’s all in the repo. I’m quite biased here, but I find this very slick. I’m really happy with this progress.

ttamttam1 · June 16, 2025, 11:39pm

This looks like exactly the type of thing I need for a wirenix refactor I’m chewing on. What might stop me from using this pattern now in advance of the RFC (aside from a missing license in the repo)? It can’t be worse than the current untyped method of handling this I’m using.

fricklerhandwerk · June 20, 2025, 10:59am

@ttamttam1 In my opinion the pattern is sound; for what amounts to pure functions, I think it’s rather usable already. As noted in the report, you can’t have the equivalent of side effects yet, i.e. computing a value and also manipulating config. Not sure when I will find time to figure that out, but I need this and will almost certainly sit down eventually and just do it. Contributions appreciated of course!

Also thanks for the reminder, added a license.