Idea for solving flake versioning

Hi guys

Today I thought a bit about flake versioning and depending on flakes with a specific version, or version range. I’m not very happy that the current situation seems to be to just use FlakeHub.

I think Nix could solve this quite easily by itself, even without breaking how flakes currently work.

I don’t have a very deep understanding of how Nix handles flakes, and I surely haven’t thought of every edge case. So instead of proposing a finished solution, I just want to dump my thoughts here to see if I can inspire someone.

Here’s my idea:

Versioned Flakes

In addition to normal flakes, there could be a new versioned flake type. Versioned flakes would have everything a normal flake has, plus a new file at the top of the repository called flake-versions.json. Such a file could look like this:

{
   "1.0.0":"6d5fea164f44d58d55453242bed17a867e76aa8a",
   "1.0.1":"6f7d3bb008e714168777291d49d4f42faab0e37b",
   "2.1.7":"641e0cf442a9284cedf507b42ee08012b400b652",
   // and so on
}

This is basically a list of self-references that tells Nix at which commit to find each version of the flake.
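To make the idea concrete, here is a minimal Python sketch of how a resolver might read such a file and turn an exact version into a commit-pinned flakeref. It assumes the file is strict JSON (so the "// and so on" placeholder is dropped) and that versions are plain MAJOR.MINOR.PATCH; the function names are hypothetical and not part of Nix. Versions are parsed into integer tuples because comparing them as strings would sort "10.0.0" before "9.0.0".

```python
import json

# Hypothetical flake-versions.json contents, copied from the example above.
# The "// and so on" line is dropped because strict JSON has no comments.
FLAKE_VERSIONS_JSON = """
{
  "1.0.0": "6d5fea164f44d58d55453242bed17a867e76aa8a",
  "1.0.1": "6f7d3bb008e714168777291d49d4f42faab0e37b",
  "2.1.7": "641e0cf442a9284cedf507b42ee08012b400b652"
}
"""


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "2.1.7" into (2, 1, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def load_versions(raw: str) -> dict[tuple[int, ...], str]:
    """Map each published version to the commit that contains it."""
    return {parse_version(v): rev for v, rev in json.loads(raw).items()}


def locked_ref(repo: str, versions: dict[tuple[int, ...], str], wanted: str) -> str:
    """Build a commit-pinned flakeref for an exact version (KeyError if unpublished)."""
    return f"{repo}/{versions[parse_version(wanted)]}"


versions = load_versions(FLAKE_VERSIONS_JSON)
print(locked_ref("github:VanCoding/superlib", versions, "1.0.1"))
```

The `github:owner/repo/<rev>` form is the ordinary way to pin a GitHub flake input to a commit, so the lock step needs nothing new once the commit is known.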

Referencing versioned flakes

Now let’s assume we have a versioned flake called “superlib”, and flakes A, B, C and D that depend on each other and also on superlib:

A:
  superlib
  B:
    superlib
  C:
     superlib
     D:
       superlib

By default, we would get 4 versions of superlib. The only way to get around this is to set “follows” everywhere. But that’s a bad idea, because we don’t even know which versions of a dependency a flake can work with and which won’t.

As an alternative we could allow setting the version of a dependency like this:

{
  inputs.superlib = {
    url = "github:VanCoding/superlib";
    version = "1.0.1"; # could also be ">1" or other range expressions, like NPM has
  };
}

Whenever the version is set like this, Nix would know that the flake has to be a versioned flake. If the referenced flake doesn’t have a flake-versions.json, Nix would complain.

If a version is specified, Nix would completely change how it resolves flakes. For all flakes with the same URL, it would download only one version (the latest), just as if you had set follows everywhere. Then it would read the flake-versions.json of that version and look at which versions have been requested by the whole dependency tree. From there it could try to figure out the most efficient set of versions that satisfies all dependents, and add these to the flake.lock file with their commits from flake-versions.json.

When resolving the dependency tree, Nix would then give each dependent the version it previously decided on.

So to wrap up:

If 6 flakes in the dependency tree depend on github:VanCoding/superlib, Nix would look at the latest commit there, read the flake-versions.json file and then pick the matching commits from there. It would do this instead of downloading 6 different versions of github:VanCoding/superlib.
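The "most efficient set of versions" step is essentially a set-cover problem. Below is a minimal Python sketch that uses a greedy heuristic (not guaranteed to be optimal): repeatedly pick the published version that satisfies the most dependents that are still unserved, preferring the higher version on ties. The range syntax is a simplified, hypothetical subset (space-separated `>=`, `<=`, `>`, `<`, `=` constraints on full x.y.z versions), not NPM's full grammar, and none of these names exist in Nix.

```python
import operator

# Constraint prefixes, longest first so ">=" is not mistaken for ">".
OPS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt),
       ("<", operator.lt), ("=", operator.eq)]


def parse_version(text):
    """'3.4.0' -> (3, 4, 0); tuples compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, range_expr):
    """True if `version` (a tuple) meets every space-separated constraint."""
    for constraint in range_expr.split():
        for prefix, op in OPS:
            if constraint.startswith(prefix):
                target = constraint[len(prefix):]
                break
        else:
            # No operator: a bare version such as "1.0.1" is an exact pin.
            op, target = operator.eq, constraint
        if not op(version, parse_version(target)):
            return False
    return True


def pick_versions(available, requests):
    """Greedy set cover: repeatedly choose the version serving the most
    still-unserved dependents; the higher version wins ties."""
    candidates = [parse_version(v) for v in available]
    remaining = dict(requests)  # dependent flake -> range expression
    chosen = {}
    while remaining:
        best = max(candidates, key=lambda v: (
            sum(satisfies(v, r) for r in remaining.values()), v))
        served = [dep for dep, r in remaining.items() if satisfies(best, r)]
        if not served:
            raise ValueError(f"no published version satisfies {sorted(remaining)}")
        for dep in served:
            chosen[dep] = best
            del remaining[dep]
    return chosen


# Versions a hypothetical superlib flake-versions.json might list,
# and what flakes A, B, C and D each ask for.
published = ["1.0.0", "1.0.1", "2.1.7", "3.0.0", "3.4.0", "4.1.0"]
requests = {"A": ">=2.0.0 <4.0.0", "B": ">=3.0.0",
            "C": ">=1.0.0 <2.0.0", "D": ">=3.0.0"}
print(pick_versions(published, requests))
```

Here the four requests collapse onto two versions (3.4.0 for A, B and D, 1.0.1 for C) instead of four separate copies. A real implementation would probably want an exact solver rather than this heuristic.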

But why?

As I said before, I don’t like the proprietary, centralized approach that FlakeHub currently has. It should be possible for a flake to tell the world by itself where to find each version, without any middlemen.

Additionally, this would make the use of flake versioning completely optional. For some flakes, versioning doesn’t really make sense. For others this would be very beneficial. In the way I described, these two approaches can easily coexist and be mixed.

Again, this is just my brain dump. I’d enjoy discussing this with you. My hope is that someone on the Nix developer team gets inspired by this and that we see something like this implemented in the future.


This is inherently impossible, since changing the hash there changes the commit contents and therefore the commit hash so you have to change the hash there which changes the commit contents and therefore the commit hash so you have to…

But yes, some way to signal a semver version for a flake is sorely needed. This could be done in a backwards compatible way simply by extending the inputs or outputs schema. No need to tie this to commit hashes in-source, just put a version number somewhere and define what it means and how it should be used.

There is practically no development on the flakes concept at the moment though.


A while ago I made a project that could do something like this without being too intrusive, and sidestepped having to go through the Nix review/RFC process, but I kinda got distracted by other things. It may be interesting as a sketch for a solution.

However, I would warn that at the time it seemed hard to get this feature merged into official Nix, because semantic versioning was the value proposition of FlakeHub, which had several employees who also held a lot of power in Nix. Nowadays the power dynamic may be more favorable, but I’d advise that it’s probably easier to get semantic versioning into one of the forks, or to go through an RFC.

I’d also just say I think it would be a way better design if we could just have the semantic version directly at the same place where we declare our dependencies in the flake.nix inputs.


Also, adding this to Nix would probably be quite easy: one could take most of the logic from rime, rewrite it in C++, and make it part of how flake input URLs are resolved.


Ah yes, I’m aware of this “issue”; I should have mentioned how I think this would work. Basically, a commit can only be referred to from another commit, but that shouldn’t be a problem. It does mean that even if only one version of a flake is used in a dependency tree, the flake has to be downloaded twice: once to read the flake-versions.json, and then again at the commit read from flake-versions.json, which, as you said, can never be the same commit.

But this only happens when generating/updating the flake.lock file. Afterwards, the “true” commits we want to download are in the flake.lock file, and nix can download it directly, without looking at the flake-versions.json file.

But yes, some way to signal a semver version for a flake is sorely needed. This could be done in a backwards compatible way simply by extending the inputs or outputs schema. No need to tie this to commit hashes in-source, just put a version number somewhere and define what it means and how it should be used.

The issue with this is that we then only know the current version of the flake, not which other versions are available and at which commits to find them. That’s what FlakeHub knows, for example, and also what NPM gives us: a list of available versions from which we can decide which ones to resolve to. If we only have the current version, that’s not possible. Also, multiple commits would have the same version in the flake.nix file, and we cannot know which one is the “correct” one.

I’d also just say I think it would be a way better design if we could just have the semantic version directly at the same place where we declare our dependencies in the flake.nix inputs.

Same here. We really need a list of all versions and where to find them, and not just the current version.

Nowadays, the power dynamic may be more favorable, but I’d advise that it’s probably easier to get some semantic versioning into one of the forks or making an RFC.

That would be very sad. If changes like this really would be blocked, because it is against the interests of the FlakeHub owners, that would be a death sentence to the “official” Nix IMO.


Unfortunately, this is the core of the issue. If you’re out of the loop, I’d suggest reading some of the announcement posts from DetSys over the last 3 years or so, and the backlash every single one of them has received for at least the last year, precisely because the non-DetSys parts of the community fear this outcome.

It has already kind of happened since both large parts of the community and detsys are now maintaining forks, rather than contributing upstream, and both are seeing adoption. The detsys (flakehub owners) fork in particular has recently started introducing heavily incompatible changes, and have set a stability guarantee for flakes in the current state (whatever that means given that promise has already been broken?), so it’s almost inevitable that the flake standard will diverge.

The problems are very much politics and funding, not technical. But I do hope you continue to push this effort, even if I don’t particularly like your design, I think it’ll be hard to get nix into a stable long-term position without some good community-driven design efforts on flakes, and we simply lack the people who take charge on this.


I’ve read some of the DetSys announcements and have also seen the backlash. But without taking a closer look, it gave the impression that they’ve always tried to upstream their work.

Also, if I recall correctly, the reason FlakeHub exists is that when the topic of handling versioned dependencies came up, it was suggested to solve it outside of Nix first, and that it’s probably not even a problem Nix should solve. I’d have to read that discussion again, but I assume it was DetSys that wanted to bring this into Nix and then faced backlash.

So, if that’s the case, maybe they would still be behind bringing this into Nix directly.

A while ago I made a project that could do something like this without being too intrusive, and sidestepped having to go through the Nix review/RFC process, but I kinda got distracted by other things. It may be interesting as a sketch for a solution.

I’ve now taken a look at this. But if I understand it correctly, the mentioned project only deals with “locking” to a specific version. It can’t really solve the problem of having more flake versions than necessary. Let me give an example:

If dependency A requires superlib >=2.0.0 and < 4.0.0 and dependency B requires superlib >=3.0.0, we could resolve this to just use superlib 3.x, instead of using 3.x for A and 4.x for B.
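The example above boils down to intersecting version ranges before choosing. A minimal Python sketch, assuming each requirement is a half-open range [low, high) with an optional upper bound; the function names and the published versions are hypothetical:

```python
def parse_version(text):
    """'3.9.2' -> (3, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def intersect(requirements):
    """Intersect half-open ranges given as (low, high) strings; high=None
    means no upper bound. Returns None if no version can satisfy all."""
    low = max(parse_version(lo) for lo, _ in requirements)
    highs = [parse_version(hi) for _, hi in requirements if hi is not None]
    high = min(highs) if highs else None
    if high is not None and low >= high:
        return None
    return low, high


def newest_in(published, window):
    """Highest published version inside the window, or None."""
    if window is None:
        return None
    low, high = window
    fits = [v for v in map(parse_version, published)
            if v >= low and (high is None or v < high)]
    return max(fits, default=None)


published = ["2.5.0", "3.1.0", "3.9.2", "4.0.0", "4.2.1"]  # hypothetical
req_a = ("2.0.0", "4.0.0")  # A: >=2.0.0 <4.0.0
req_b = ("3.0.0", None)     # B: >=3.0.0

print(newest_in(published, intersect([req_b])))         # (4, 2, 1)
print(newest_in(published, intersect([req_a, req_b])))  # (3, 9, 2)
```

Resolved on its own, B would pick 4.2.1; resolving against the intersection gives a single 3.x release that satisfies both A and B.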

I think that problem also exists for FlakeHub currently. It cannot really be solved from outside Nix, because Nix just picks the latest version provided by the link.

Also, semnix that was mentioned basically is what I suggest versioned flakes should provide on their own. I can’t think of a reason why this information should be separated from the flake itself.

But if upstream decided they wanted to design things differently, we would get two mutually incompatible flake systems, because detsys are already relying on the way they do it?


No, we’re committed to moving forward with upstream and reconciling those differences. Just like I’ve written before about stabilizing flakes as-is to make room for a v2. An existing thing can never cause everything to freeze forever – writing software is all about adapting to changes. Anyway, it’d be great to have Nix have native version solving of some sort in flake inputs!


Thanks @grahamc for confirming this! :slight_smile:

I’d be interested to hear what you think about my idea.

I still think there needs to be a place where Nix can find out which versions of a flake exist and where to find them. But I realized we probably don’t have to require this to be a git repo. It could also just be a URL to a JSON file.

This way, FlakeHub could easily provide an endpoint that returns said JSON, but it would still be possible to ship it with the flake itself. It would be completely up to the flake author how to provide it, and to tell users how to reference the flake as a dependency in other flakes.

Also, the flake-versions.json could be extended to also support URLs instead of commit hashes to point to versions. For example:

{
   "1.0.0":"https://flakehub.com/flakes/superlib/1.0.0.tar.gz",
   "1.0.1":"https://flakehub.com/flakes/superlib/1.0.1.tar.gz",
   "2.1.7":"https://flakehub.com/flakes/superlib/2.1.7.tar.gz",
   // and so on
}

and for referencing commits it could look like this:

{
   "1.0.0":"github:VanCoding/superlib/6d5fea164f44d58d55453242bed17a867e76aa8a",
   "1.0.1":"github:VanCoding/superlib/6f7d3bb008e714168777291d49d4f42faab0e37b",

   // and pointing to tags/branches would also be possible
   "2.1.7":"github:VanCoding/superlib/2-1-7"
}
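If entries can be bare commits, plain URLs or flakerefs, a resolver would first need to normalize them. A hedged Python sketch of that step follows; it assumes bare 40-character hex strings are commits in the flake's own repository, and that plain HTTP(S) URLs point at archives, which it rewrites to Nix's `tarball+https://` flakeref form. The function name is hypothetical.

```python
import re

# A full git commit hash: exactly 40 lowercase hex characters.
COMMIT = re.compile(r"[0-9a-f]{40}")


def entry_to_flakeref(entry: str, repo: str) -> str:
    """Normalize one flake-versions.json value into a flakeref string."""
    if COMMIT.fullmatch(entry):
        # Bare commit hash, as in the first proposal: pin the flake's own repo.
        return f"{repo}/{entry}"
    if entry.startswith(("http://", "https://")):
        # Plain archive URL: make the tarball fetcher explicit.
        return f"tarball+{entry}"
    # Anything else is assumed to already be a flakeref (github:..., git+https:..., ...).
    return entry


repo = "github:VanCoding/superlib"
for entry in ["6d5fea164f44d58d55453242bed17a867e76aa8a",
              "https://flakehub.com/flakes/superlib/1.0.0.tar.gz",
              "github:VanCoding/superlib/2-1-7"]:
    print(entry_to_flakeref(entry, repo))
```

Keeping the file's values as flakerefs (rather than inventing a separate URL scheme) means the existing fetchers can do the actual downloading.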

I guess I have a few concerns about flake versioning:

  1. On a philosophical level, I’m not convinced version resolution is a good idea. Having witnessed the terror of node_modules, I’m not exactly eager to adopt a system so easy it enables reckless usage of third party dependencies. Then again, nix is a very different beast so maybe my concerns are unfounded.
  2. I’m not sure how I feel about further hardcoding reliance on git. It prevents trying new SCM solutions, it’s effectively a DDOS for smaller git hosts, and it imposes unnecessary limitations on the versioning system that require workarounds (such as requiring two requests, which would amplify the DDOS problem). I understand flakes are currently architected around git, and I think maybe they shouldn’t be.

If we’re going to commit to versioning, maybe we could use a vector of semver versions instead? It’s really helpful when you have multiple API surfaces:

versions = {
    moduleOptions = "1.0.0";
    extensionApi = "1.3.2";
    packageAbi = "2.0.0";
};

Semver prime exists for this purpose, but it’s more of a backwards-compatible hack designed for existing versioning systems than a clean solution for new ones.
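One way to read the "vector of versions" idea is that each requirement is checked only against the API surface it names. A minimal Python sketch, assuming caret-style semver compatibility (same major, at least the requested version) and ignoring semver's special rules for 0.x; the surface names come from the example above, everything else is hypothetical:

```python
def parse_version(text):
    """'1.3.2' -> (1, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def compatible(provided, required):
    """Caret-style check: same major version and at least the required one.
    Simplification: ignores semver's special treatment of 0.x versions."""
    have, want = parse_version(provided), parse_version(required)
    return have[0] == want[0] and have >= want


def incompatible_surfaces(provided, required):
    """Names of the API surfaces whose requirement is not met. A surface the
    flake does not declare at all also counts as unmet."""
    return sorted(
        name for name, want in required.items()
        if name not in provided or not compatible(provided[name], want)
    )


# The vector from the example above, as a Python dict.
versions = {"moduleOptions": "1.0.0", "extensionApi": "1.3.2", "packageAbi": "2.0.0"}

print(incompatible_surfaces(versions, {"extensionApi": "1.2.0"}))  # []
print(incompatible_surfaces(versions, {"moduleOptions": "1.0.0",
                                       "packageAbi": "1.4.0"}))    # ['packageAbi']
```

A consumer that only uses extensionApi is unaffected by the packageAbi major bump, which is the benefit being argued for.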


I feel you @ttamttam1 :smiley: I’m doing most of my stuff in TypeScript and know NPM dependency hell very well.

Still, Nix is a very different type of thing and it’s probably not fair to assume we’ll have the same problems in nix. Most of the problems NPM has are because of the lack of a good JS standard library. In nix, nixpkgs is sufficient for most of the stuff.

If we’re going to commit to versioning, maybe we could use a vector of semver versions instead? It’s really helpful when you have multiple API surfaces:

Cool that you bring this up! I originally thought about this too, but decided not to mention it at first for simplicity. Flakes can export a lot of different things, and it might make sense to version them separately. The solution I thought about was having multiple “targets” in the flake-versions.json file, like this:

{
  "package":{
     "1.0.0":"6d5fea164f44d58d55453242bed17a867e76aa8a",
     "1.0.1":"6f7d3bb008e714168777291d49d4f42faab0e37b",
     "2.1.7":"641e0cf442a9284cedf507b42ee08012b400b652"
  },
  "nixosModule":{
     "1.0.0":"6d5fea164f44d58d55453242bed17a867e76aa8a",
     "1.1.0":"6f7d3bb008e714168777291d49d4f42faab0e37b"
  }
}

and then when depending on this flake you could specify the target with a special syntax, something like this:

{
  inputs.superlib = {
    url = "github:VanCoding/superlib";
    version = "package@1.0.1";
  };
}

It would probably make sense to allow specifying versions for multiple targets. But the problem then is that they could conflict with each other.
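The conflict problem can be illustrated with exact versions: every target@version has to map to the same commit, otherwise no single checkout satisfies the request. A minimal Python sketch over the multi-target file above; the "target@version" parsing and the function names are assumptions based on the proposed syntax, not an existing Nix feature:

```python
import json

# Hypothetical multi-target flake-versions.json, copied from the example above.
FLAKE_VERSIONS = json.loads("""
{
  "package": {
    "1.0.0": "6d5fea164f44d58d55453242bed17a867e76aa8a",
    "1.0.1": "6f7d3bb008e714168777291d49d4f42faab0e37b",
    "2.1.7": "641e0cf442a9284cedf507b42ee08012b400b652"
  },
  "nixosModule": {
    "1.0.0": "6d5fea164f44d58d55453242bed17a867e76aa8a",
    "1.1.0": "6f7d3bb008e714168777291d49d4f42faab0e37b"
  }
}
""")


def parse_spec(spec: str) -> tuple[str, str]:
    """Split the proposed "target@version" syntax, e.g. "package@1.0.1"."""
    target, sep, version = spec.partition("@")
    if not sep:
        raise ValueError(f"expected target@version, got {spec!r}")
    return target, version


def resolve_targets(versions: dict, specs: list[str]) -> str:
    """Return the single commit satisfying every target@version spec.
    Raises ValueError if the specs point at different commits."""
    commits = {versions[target][version] for target, version in map(parse_spec, specs)}
    if len(commits) != 1:
        raise ValueError(f"conflicting targets: {specs} map to {sorted(commits)}")
    return commits.pop()


print(resolve_targets(FLAKE_VERSIONS, ["package@1.0.1", "nixosModule@1.1.0"]))
```

package@1.0.1 and nixosModule@1.1.0 both point at commit 6f7d3bb008e714168777291d49d4f42faab0e37b, so they resolve; package@2.1.7 together with nixosModule@1.0.0 raises. With version ranges instead of exact versions, a resolver would have to search for a commit where every target falls inside its range.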

Compared to Semver prime, it would probably be easier to read when specifying the target of interest directly, instead of using a “global version”.

Realistically it’s common to use both the package and module if they go together. Also, there can be multiple packages and modules etc, which would explode this auxiliary file too.

Probably, yes. But just because it’s possible to version everything in the flake separately, it doesn’t mean that it should be. It’d be completely up to the author to do it when it makes sense. I think usually there would only be one version “target” in a flake.

I think the scheme described in What Is a dependency? and Minimal Version Selection Revisited might work for nix.

You can have all of:

  • Fully decentralized system without a central registry
  • Semver unification (in a resolved dependency graph, there’s only a single semver-compatible version of every flake, and everything uses that)
  • Everything is strongly pinned via hashes
  • No separate lock file needed (!)

What you can’t have is Rust-style maxver dependency resolution, only Go-style minver is possible (but it’s easy enough to simulate maxver with extra tooling).

Here’s the idea:

  • Each flake declares a name, and a semver version:
{
  name = "A";
  version = "1.2.3";
}
  • If flake A depends on flake B, flake A specifies B’s name, version, url, and checksum

    {
      inputs.B = {
        name = "B";
        version = "2.1.2";
        checksum = "xxxxx";
        url = "https://raw.githubusercontent.com/example/flake-B/08f3bdd56fb81e61df84195f8761a838972db775/flake.nix";
      };
    }
    

    Note that the URL points at a specific version of flake

    The checksum is load-bearing: it’s the identity of the flake. The flake doesn’t have to be fetched from url; if anything with the same checksum exists in any cache (local or remote), it is fetched from the cache.

  • If flake B depends on flake C, B specifies C’s hash in its inputs. But that means that A doesn’t have to specify C’s hash, because it is pinned, indirectly, by B’s hash which A does specify.

  • Given the root flake, Nix can fetch the whole tree of dependencies and validate its integrity: every URL is reachable from the root flake, and every URL is protected by a checksum

  • After fetching all dependencies, there may be several different versions of a flake in the set.

  • And this is where semver deduplication comes in. When A specifies inputs.B.version, it is treated as a version requirement. A might get anything that’s semver-compatible. So if there’s another flake somewhere that also claims name = "B", but with version = "2.2.0", A will get that instead of the 2.1.2 it directly requested.

  • In other words, inputs play a double role:

    • On the one hand, it specifies a version requirement: I want a flake named B with any version that is semver-compatible with 2.1.2
    • On the other hand, it specifies one specific version that satisfies that requirement.
  • During dependency resolution, the “universe” of versions is everything reachable from the root flake

  • And then semver unification is executed against that universe.

  • In other words, the hash-tree structure of dependency specifications gives a sort of distributed registry and distributed lock file. Each flake contributes only its direct dependencies, but all flakes together pin everything.

  • What you can’t get for free in this scheme is the latest versions of things. If there’s B = "2.5.0" somewhere, but it isn’t mentioned anywhere in the dependency graph, then this version won’t be considered. Hence, minver.

  • But it’s easy enough to imagine an outside-of-the-core tool that does upgrades heuristically. If we depend on B=2.20 and its URL is a git repo, we can fetch its HEAD and update our dependency specification with the newer version, URL, and checksum. If the URL has a version number in it, the version number is replaced with latest and the redirect is followed, etc.
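A compact Python sketch of the unification step described above, under stated assumptions: the universe is everything reachable from the root, and for each (name, major version) the highest version found in that universe wins (Go-style minimal version selection). Frozen-dataclass equality stands in for the checksum identity, and the Flake type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flake:
    """Declared name/version plus the exact flakes its inputs pin.
    Structural equality here stands in for the checksum identity."""
    name: str
    version: tuple[int, int, int]
    inputs: tuple["Flake", ...] = ()


def reachable(root: Flake) -> set:
    """Every flake mentioned anywhere in the hash-pinned tree: the 'universe'."""
    seen, stack = set(), [root]
    while stack:
        flake = stack.pop()
        if flake not in seen:
            seen.add(flake)
            stack.extend(flake.inputs)
    return seen


def unify(root: Flake) -> dict:
    """For each (name, major) keep the highest version in the universe.
    Versions that nothing in the graph mentions are never considered."""
    chosen = {}
    for flake in reachable(root):
        key = (flake.name, flake.version[0])
        if key not in chosen or flake.version > chosen[key].version:
            chosen[key] = flake
    return chosen


b212 = Flake("B", (2, 1, 2))
b220 = Flake("B", (2, 2, 0))
b300 = Flake("B", (3, 0, 0))
c = Flake("C", (1, 0, 0), inputs=(b220,))
d = Flake("D", (0, 4, 0), inputs=(b300,))
a = Flake("A", (1, 2, 3), inputs=(b212, c, d))

resolved = unify(a)
print(resolved[("B", 2)].version)  # (2, 2, 0): A's request for 2.1.2 is upgraded
print(resolved[("B", 3)].version)  # (3, 0, 0): a different major stays separate
```

A B 2.5.0 that nobody in the graph mentions never shows up in `reachable()`, which is exactly the minver limitation described above.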


Sure!

My gut reaction is the JSON document of versions → tarballs is a good idea. Especially if that comes with hashes / signatures / other verification mechanisms. Exposing this sort of data via API seems like an obviously great thing for FlakeHub to do. A+

In terms of technical data structure, probably want to have more structure, like:

{
  "meta":  { ... },
  "versions": { 
    "x.y.z": {
      "url": "...",
      "yanked": "...",
      "latest_stable": true/false,
    }
  }
}
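As an illustration of why fields like yanked and latest_stable are useful, here is a hedged Python sketch of how a client might pick a release from such a document. The yanking policy (yanked versions are never chosen fresh, but a version already named in a lock file is kept) is borrowed from how Cargo treats yanked crates and is an assumption here, as are the function name and the example data.

```python
from typing import Optional


def parse_version(text):
    """'1.0.1' -> (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_release(doc, locked: Optional[str] = None):
    """Choose a version from a {"meta": ..., "versions": {...}} document."""
    releases = doc["versions"]
    # An existing lock keeps working even if that release was yanked later.
    if locked is not None and locked in releases:
        return locked
    # Fresh resolutions never pick a yanked release ("yanked" holds a reason).
    usable = [v for v, info in releases.items() if not info.get("yanked")]
    if not usable:
        raise ValueError("every published version is yanked")
    # Prefer releases flagged latest_stable, then the highest version.
    return max(usable, key=lambda v: (bool(releases[v].get("latest_stable")),
                                      parse_version(v)))


doc = {  # hypothetical document in the shape sketched above
    "meta": {"name": "superlib"},
    "versions": {
        "1.0.1": {"url": "https://example.com/superlib-1.0.1.tar.gz", "latest_stable": True},
        "1.1.0": {"url": "https://example.com/superlib-1.1.0.tar.gz", "yanked": "corrupts state"},
        "2.0.0": {"url": "https://example.com/superlib-2.0.0.tar.gz", "latest_stable": False},
    },
}
print(pick_release(doc))                  # 1.0.1
print(pick_release(doc, locked="1.1.0"))  # 1.1.0
```

This kind of per-release metadata is easier to update in a separate document than in the flake's own repository, which is part of the argument for not colocating it.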

The part I don’t like is having it colocated with the flake itself. It makes a given project’s release process super messy, makes it harder to deal with renames, yanking releases, etc. Also, updating metadata about given releases and whatnot.

I appreciate, also, that you’re not loving a centralized package repository. However, I feel this is where a lot of value comes from for a few reasons:

  • Nix has a bit of a discovery problem, and a central record is useful for that.
  • Dependencies need to be available over the long tail, and projects like to disappear. Publishing archives to an independent location with policies about availability is important.
  • Releases should be immutable, and a central repository can make and maintain those guarantees through centrally applied policies and procedures, which is harder to achieve via individually-maintained version documents.
  • Directly fetching Nix expressions from GitHub can actually be surprisingly unstable, and I think there is very real value in pushing tarballs to a webserver. Sometimes their archive generation changes and all the hashes are invalidated, or git features like autocrlf get in the way.

Anyway, these are just some rough thoughts for you. I also appreciate not loving that FlakeHub is proprietary, which I completely understand, and won’t try to persuade you it is better this way for some reason.

Overall: it’d be great to take the semver support up a notch and not just do hyper-local per-input resolution.


I completely agree that versioning is best handled outside the git repo. To that end, I think keeping the standard structure of flakerefs is preferable to relying solely on git commits. I envision something like this:

{
inputs = {...};

metadata = {
  # Single semver for simplicity, but we could agree on any versioning schema.
  # `metadata.version` is specifically the version of the flake at this location.
  # i.e. the current commit.
  # i.e. the version for the inputs+outputs of this file.
  version = "1.0.1"; 
  # Alternatively, `releases` could be an attrset with keys being versions.
  # Using an array loses built in uniqueness but gains flexibility in what
  # versioning schema we use. 
  releases = [
    {
      # version must be unique
      version = "1.0.0";
      ref = {
        url = "github:example-user/example-flake/26ab0db90d72e28ad0ba1e22ee510510";
      };
    }
    {
      version = "0.0.2";
      ref = {
        type = "github";
        owner = "example-user";
        repo = "example-flake";
        rev = "26ab0db90d72e28ad0ba1e22ee510510"; 
      };
    }
    {
      version = "0.0.1";
      ref = {
        url = "github:example-user/example-flake/26ab0db90d72e28ad0ba1e22ee510510";
      };
    }
  ];
  # tags can be named anything
  tags = {
    release = "0.0.2";
    next = "1.0.1";
  };
};
outputs = {...}: {...};
}

A flake registry (or even a user with a static site and some CI), might also specify a version file, which itself is just another flake, but this time it doesn’t need to specify its own version, since everything is external. A flake registry might use an automation to convert from type = "github"; to type = "tarball";:

{
# no inputs necessary

metadata = {
  releases = [
    {
      version = "1.0.1";
      ref = {
        type = "tarball";
        # this registry uses git nar hashes for their urls, but maybe another uses
        # git commits, uuids, or even sequential ids.
        url = "https://flakes.example.com/flakes/N2RlMTU1NWRmMGMyNzAwMzI5ZTgxNWI5M2IzMmM1NzE.tgz";
        narHash = "sha256-N2RlMTU1NWRmMGMyNzAwMzI5ZTgxNWI5M2IzMmM1NzE=";
      };
    }
    {
      version = "1.0.0";
      ref = {
        type = "tarball";
        url = "https://flakes.example.com/flakes/MTEyMWNmY2NkNTkxM2YwYTYzZmVjNDBhNmZmZDQ0ZWE.tgz";
        narHash = "sha256-MTEyMWNmY2NkNTkxM2YwYTYzZmVjNDBhNmZmZDQ0ZWE=";
      };
    }
    {
      version = "0.0.2";
      ref = {
        type = "tarball";
        url = "https://flakes.example.com/flakes/NTNjMjM0ZTVlODQ3MmI2YWM1MWMxYWUxY2FiM2ZlMDY.tgz";
        narHash = "sha256-NTNjMjM0ZTVlODQ3MmI2YWM1MWMxYWUxY2FiM2ZlMDY=";
      };
    }
    {
      version = "0.0.1";
      ref = {
        type = "tarball";
        url = "https://flakes.example.com/flakes/NDM1NWE0NmIxOWQzNDhkYzJmNTdjMDQ2ZjhlZjYzZDQ.tgz";
        narHash = "sha256-NDM1NWE0NmIxOWQzNDhkYzJmNTdjMDQ2ZjhlZjYzZDQ=";
      };
    }
  ];
  tags = {
    release = "0.0.2";
    next = "1.0.1";
  };
};

# no outputs necessary
}

When bringing in a dependency, you could then specify some additional parameters to be managed by nix flake update:

{
inputs = {
  # completely backwards compatible
  nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

  # can now specify a version if there's no conflicting pins such as `rev`
  example-flake = {
    url = "github:example-user/example-flake";
    version = ">1.0.0"; # standard semver for simplicity of the example
  };
};
outputs = {...}: {...};
}

nix flake update would see that there’s version metadata, use it to figure out which version to use, and potentially fetch from GitHub again if metadata.version does not exist or is too recent. It would then lock the version it pulls.
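A minimal Python sketch of that update step, using the metadata from the example above translated to a Python dict. It assumes a requirement is either a tag name or a single comparison like ">1.0.0", and that when the winning version equals metadata.version the already-fetched copy (passed in as `here`) can be locked directly, avoiding a second download. The function names and `here` are hypothetical.

```python
import operator

# Comparison prefixes, longest first so ">=" is not mistaken for ">".
OPS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt),
       ("<", operator.lt), ("=", operator.eq)]


def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def matches(version, requirement, tags):
    """A requirement is a tag name (exact match) or one comparison like '>1.0.0'."""
    if requirement in tags:
        return version == tags[requirement]
    for prefix, op in OPS:
        if requirement.startswith(prefix):
            return op(parse_version(version), parse_version(requirement[len(prefix):]))
    return version == requirement


def resolve(metadata, requirement, here):
    """Pick the highest matching version and the ref to lock. The copy we
    already fetched *is* metadata["version"], so it needs no second fetch."""
    candidates = {r["version"]: r["ref"] for r in metadata.get("releases", [])}
    if "version" in metadata:
        candidates[metadata["version"]] = here
    tags = metadata.get("tags", {})
    hits = [v for v in candidates if matches(v, requirement, tags)]
    if not hits:
        raise ValueError(f"no release matches {requirement!r}")
    best = max(hits, key=parse_version)
    return best, candidates[best]


rev = "26ab0db90d72e28ad0ba1e22ee510510"
metadata = {
    "version": "1.0.1",
    "releases": [
        {"version": "1.0.0", "ref": {"url": f"github:example-user/example-flake/{rev}"}},
        {"version": "0.0.2", "ref": {"type": "github", "owner": "example-user",
                                     "repo": "example-flake", "rev": rev}},
        {"version": "0.0.1", "ref": {"url": f"github:example-user/example-flake/{rev}"}},
    ],
    "tags": {"release": "0.0.2", "next": "1.0.1"},
}
here = {"url": "github:example-user/example-flake"}  # stands in for the fetched copy

print(resolve(metadata, ">1.0.0", here)[0])   # 1.0.1
print(resolve(metadata, "release", here)[0])  # 0.0.2
```

With ">1.0.0" the current commit wins, so no second fetch is needed; asking for the release tag falls back to the ref listed for 0.0.2.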

Alternatively, maybe the user uses their flake registry:

{
inputs = {
  nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  example-flake = {
    # this is the registry's "version flake", as outlined two codeblocks up.
    url = "tarball+https://flakes.example.com/example-user/example-flake.tgz";
    version = ">1.0.0";
  };
};
outputs = {...}: {...};
}

This would pull down the version flake mentioned before, which the client would then use to download one of the tarballs it links to. This would serve almost as a client side alternative to lockable tarballs, where instead of the registry resolving versions and returning a locked tarball, the client asks for a list of versions and locks them itself.

@matklad Interesting stuff, thanks!

I like the simplicity of the minver solution. But not being able to update to a higher version of a sub-dependency without depending on it directly looks like a big drawback to me. This way you’re dependent on the flake authors to update their dependencies very frequently.

Also, I think a lockfile would still be very beneficial, even with this system. A lockfile would be the shortcut that lets you skip downloading all the individual versions to figure out which versions you really need. The lockfile would contain only the relevant, resolved versions with their hashes, so it’s no longer necessary to look at the versions specified in flake.nix and its dependencies, and the flakes can be downloaded right away. That’s a big saving in time and bandwidth for everyone who wants to use the flake at the top level.

@grahamc

In terms of technical data structure, probably want to have more structure, like:

Yup, definitely! My examples were intentionally reduced to the minimum to make it easier to read. I also think it should allow for more information in practice.

The part I don’t like is having it colocated with the flake itself. It makes a given project’s release process super messy, makes it harder to deal with renames, yanking releases, etc. Also, updating metadata about given releases and whatnot.

I agree that it’s probably not the best idea to do it this way. But I’d still like if it were possible.
What I think would be nice is to accept the same formats we accept for normal dependencies. So it could be a git repo, a tarball or anything a normal flake can be. Then it’s up to the author how to distribute the versions list.

@ttamttam1

I completely agree that versioning is best handled outside the git repo. To that end, I think keeping the standard structure of flakerefs is preferable to relying solely on git commits. I envision something like this:

Using flake refs for this would be the natural choice, I agree :slightly_smiling_face:
But I dislike having the “releases” list inside flake.nix and in the Nix format, because I can’t see a use case where this information is generated dynamically. If it’s in a separate file, it can also live in a different tarball or git repo. But I definitely also see the “flake-versions” file as just another flake that points to the effective flake versions.

You’re right that a json file would be better. Changing it to a json file also wouldn’t really change anything I just said. All the same structures and workflows could apply, with the key change being that flakerefs would allow non-git usage in central repositories, traditional artifact hosting services (which don’t have lockable tarball compat), or by people who just don’t like git for some reason.
