Brainstorming for RFC: pname and version

Yes. We should follow this as faithfully as we can.
For now, I think that stripping non-numerical identifiers at the beginning should suffice.
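
A rough sketch of what that stripping could look like, in Python purely for illustration. The function name is made up here, and the rule (drop everything before the first digit, leave digit-free strings alone) is only one assumed reading of "non-numerical identifiers at the beginning":

```python
import re


def strip_leading_non_digits(upstream_version: str) -> str:
    """Drop every character that precedes the first digit.

    Hypothetical helper: strings without any digit are returned unchanged,
    since there is nothing sensible to strip down to.
    """
    match = re.search(r"\d", upstream_version)
    if match is None:
        return upstream_version
    return upstream_version[match.start():]


if __name__ == "__main__":
    for raw in ("v1.2.3", "release-2.0", "nightly"):
        print(raw, "->", strip_leading_non_digits(raw))
```

Anything after the first digit is left alone, so suffixes such as -rc1 survive untouched.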

snapshot looks good to me.

Explain.

I do not think YYYY-MM-DD needs a huge explanation.


One thing we need to think about:

Some software uses several branches for its releases. Think of something like:

  • A LTS branch
  • A bleeding edge branch

How should we encode this?

Thinking about it, we should develop the idea of multiple branches further.

Could you add the phrase under “Terminology”?

Note: This sounds strange, but only because it tries not to be git-specific.

Boost my motivation to untangle it. :wink: But I agree with staying git-agnostic.

@abathur Hm? From 10,000 feet: what you’re more or less saying is that version should be an attribute set, correct? Can we easily overload its toString in Nix?

Yeeehaaa!

nix-repl> builtins.toString { a = "b"; __toString = x: "a"; }
"a"
nix-repl> y = { a = "b"; __toString = i: i.a; }

nix-repl> "pname-${y}"
"pname-b"

We can.

Maybe it helps to frame the nixpkgs-specific_version as an implementation detail nixpkgs needs for compatibility with an implementation detail of nix-env.

I’m proposing:

  1. Packagers record (among other details) a package’s official version (or none), and whatever *_date is required. These can be determined without understanding the nix ecosystem.
  2. One or more functions derive the nixpkgs-specific_version from those values using the kinds of precedence rules that are getting hammered out in this thread.

That’s in contrast to counting on the packager to decide whether nixpkgs-specific_version should = official_upstream_version or (release|tag|commit|fetch)_date.

In addition to making the packager read/consider unnecessary nixpkgs implementation details, it makes any version fields containing a date ambiguous (to reviewers, readers later, code in nixpkgs, external automation, etc.):

  • If it is formatted as nixpkgs expects, it could be:

    • an official upstream version number that just happens to be a well-formatted date
    • a correctly-formatted (release|tag|commit|fetch)_date
  • If it isn’t formatted as nixpkgs expects, it could be:

    • an official upstream version number (in which case it is incorrect to reformat it)
    • a misformatted (release|tag|commit|fetch)_date (in which case it should be reformatted).

This is roughly my point, though. A date is the simplest example I could think of, and I still see advantages to decomposing it:

  1. If the date components are recorded semantically, you don’t need a string parser and automation to modify hundreds of packages if we need to change the format.
  2. While there may be quantitative ~resource reasons not to use attrsets, I don’t see any qualitative advantage to making people (used to different local formats) manually strftime dates to write the string and strptime to read them. It seems like unnecessary friction.

Not sure what the answer is, but the bashup.events package I mentioned previously (https://github.com/nixos/nixpkgs/tree/master/pkgs/development/libraries/bashup-events) is a particularly weird example of this, so it might be worth using as a test case.

  • It isn’t really versioned, and it has two separate concurrently-maintained ~variants.
  • They have more-or-less identical APIs (but distinct implementations and performance characteristics).
    • the one on the master branch is written ~for bash 3.2 compat
    • the one on the bash44 branch uses bash 4.4 features
  • Changes are released simultaneously if they impact both branches.

My previous post may have clarified this, but: not necessarily, though that is one way to think about it.

(I’m using tediously long names to be explicit here):

Instead of a human-entered nixpkgs-specific_version attr that might contain official_upstream_version or release_date, I’m suggesting:

  • separate attrs for the upstream version, release date, and anything else necessary to decide how to generate nixpkgs-specific_version (and likewise up at the pname and name levels)
  • build those rules into reusable functions
  • (presumably) use those rules from derivation-generating functions such as mkDerivation to generate the final attr (whether it is still named version or not)
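
To make the shape of that concrete, here is a minimal sketch in Python rather than Nix. Every name in it (VersionFacts, derive_version, the field names) is hypothetical, and the precedence order (upstream version first, then release date, then commit date) is just an assumption standing in for whatever rules the thread settles on:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionFacts:
    """What a packager would record; none of it requires nixpkgs knowledge."""
    upstream_version: Optional[str] = None
    release_date: Optional[date] = None
    commit_date: Optional[date] = None


def derive_version(facts: VersionFacts) -> str:
    """Apply an assumed precedence rule to produce the final version string."""
    if facts.upstream_version is not None:
        return facts.upstream_version
    for candidate in (facts.release_date, facts.commit_date):
        if candidate is not None:
            # Dates are rendered as YYYY-MM-DD in exactly one place.
            return candidate.isoformat()
    raise ValueError("need an upstream version or at least one date")


if __name__ == "__main__":
    print(derive_version(VersionFacts(upstream_version="1.2.3")))
    print(derive_version(VersionFacts(commit_date=date(2021, 6, 6))))
```

The point of the split is that changing the date format, or the precedence, means touching one function instead of every package.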

I assume the organization should be driven by how many values that is (when including edge-cases) and how closely related they are.

Some of the VCS fields, for example, overlap significantly with what goes into some source fetchers. It seems ideal to single-source them, but I won’t pretend to know what should move/draw from where.

@abathur I think to get a better grasp of the idea you have in mind, I (and maybe others) would need an example attribute set without too much verbal hedging around it; it’s clear that this is a greenfield discussion.

{
  version = {
    release-date = "";
    # ... further fields elided
  } // {
    # I guess the implementation of this is rather irrelevant for now,
    # but if you'd have an idea, it would be great to know that, too :-)
    __toString = i: i.release-date;
  };
}

It definitely sounds interesting. It is clear to me that you propose lifting the responsibility to “get the string right” from the packager, and instead suggest exposing an as-dumb-as-possible “form sheet” (aka attribute set). You also argue that this makes any sort of programmatic processing easier.

Hence, it’s a win-win. And I believe with the __toString overload, the implementation is relatively cheap.

Looks interesting!

Another thing to be taken into account is how it should be mapped to Repology.

I believe labeled snapshots are still released or announced, not snapshotd (don’t fear copy-paste, fear search-replace).

Linux versioning: maybe mention other cases? Python, GHC, 
 (whenever the core for the additions typically has >2 versions, I guess)

  1. At the end of the day, version has to be represented as a string.
  2. Uniformity of this string representation is more important than uniformity of how it is generated.
  3. Agreeing on a target string format has no performance implications, unlike this attribute set idea, which will cost both CPU cycles and RAM bytes.
  4. For external tools, finding version *= * and parsing the string is often easier than parsing Nix.


So I recommend declaring the attribute set discussion out of scope and first agreeing on the target string format.

(For the record, I would find splitting a YYYY-MM-DD date string into three Nix attributes an absolutely horrible editing UX; Nix expressions are not spreadsheets, where such a split is of course a must.)

Oh, that clarifies many things. Indeed many things discussed here can be regarded as meta attributes.

OK then, back to strings!

I recognise I might end up losing the argument about splitting into an attribute set, and about what should go there; but I hope nobody objects that we need to agree on the string representation whatever we do, and preferably not need to change the main part of it too many times. So I am for deciding this before (but maybe not instead of) the more bikeshedding-prone topics.

It may not be clear exactly what I do mean, but hopefully it is clear that I’m not just here to litigate paint colors (and I agree working out the string form is still important).

I’m just trying to encourage thinking ahead about how the string form can be implemented in a way that minimizes:

  • what people need to understand/decide to get it right
  • the effort needed to iterate on the form if this isn’t one true form to rule them all forever

I’ll pull my nose back out, in any case.

Surely these are valid concerns.


But there are also:

  • effort necessary to update by hand
  • effort necessary to have an «optimistic» updater script
  • overhead during nixos-rebuild (probably negligible either way but needs checking)
  • overhead during other Nix operations


And the bikeshedding is in negotiating the list.

For example, yet another option could be defining the list of fields and providing the definitive script that takes JSON and produces the best inferrable version string, maybe with some suggestions about the pieces of data it wanted to check but did not receive.
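
Something along these lines, perhaps. This is only a sketch of the idea: the field names and their priority order are assumptions (no such list has been agreed on), and infer_version is a made-up name, not an existing tool:

```python
import json
from typing import List, Optional, Tuple

# Assumed field list, highest priority first; the real list is what
# would need negotiating.
KNOWN_FIELDS = ("upstream_version", "release_date", "commit_date")


def infer_version(raw_json: str) -> Tuple[Optional[str], List[str]]:
    """Return (best inferable version string, suggestions about missing data)."""
    data = json.loads(raw_json)
    suggestions = [
        f"no value for {field!r}" for field in KNOWN_FIELDS if not data.get(field)
    ]
    for field in KNOWN_FIELDS:
        value = data.get(field)
        if value:
            return str(value), suggestions
    return None, suggestions


if __name__ == "__main__":
    version, hints = infer_version('{"commit_date": "2021-06-06"}')
    print(version)
    for hint in hints:
        print("hint:", hint)
```

An updater script could feed it whatever it managed to scrape and surface the hints to the maintainer instead of guessing.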

Maybe we should indeed already start discussing the preferences and the approaches in a parallel thread. I don’t think either of the two questions should be a blocker for the other one.

In general though, I don’t think it’s a bad line of thought that a package manager has a dedicated data-structure for versions. Even if we drop that idea (for now), we should be aware of that bigger picture.

A quick look at Python’s semver package, for example, yields the following data structure:

>>> import semver
>>> ver = semver.VersionInfo.parse('1.2.3-pre.2+build.4')
>>> ver.major
1
>>> ver.minor
2
>>> ver.patch
3
>>> ver.prerelease
'pre.2'
>>> ver.build
'build.4'

For illustration purposes only.

Back on track: version should be a string, composed in the style +key=val, except for its first ‘attribute’.
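
As a sketch of how such a string could be composed and read back (Python for illustration only): the function names and the example key "date" are made up, and it assumes that neither keys nor values contain a + and that keys contain no =.

```python
from typing import Dict, Mapping, Tuple


def compose_version(first: str, extras: Mapping[str, str]) -> str:
    """Join the bare first component with +key=val pairs, in the given order."""
    return first + "".join(f"+{key}={value}" for key, value in extras.items())


def parse_version(version: str) -> Tuple[str, Dict[str, str]]:
    """Inverse of compose_version under the assumptions stated above."""
    first, *rest = version.split("+")
    extras = dict(item.split("=", 1) for item in rest)
    return first, extras


if __name__ == "__main__":
    text = compose_version("1.2.3", {"date": "2021-06-06"})
    print(text)
    print(parse_version(text))
```

A plain split like this is about as much parsing as an external tool would need, which is the argument for agreeing on the string form first.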

If the whole over-engineering around version-as-attr becomes useful someday, we can resurrect meta.version.

For now I think the real point of contention is about upstream teams that provide, in a single source tree, many possible branches. Suppose a piece of software releases different snapshots for different environments, with each source derived from a different branch of a single tree.

How should we encode such a thing?

As a more or less concrete example: a piece of hardware whose driver is generated from a certain source tree, but each operating system it supports has its own dedicated branch.

(Indeed, Open Sound System does it! There is source code for FreeBSD, another for Linux, and another for Solaris, but they are extracted from the same Mercurial source tree: Open Sound System download | SourceForge.net)

My first take is to make it a part of the pname.

I came across git-describe, and it made me think of this pre-RFC. This bypasses issues regarding a date. A convention can be that repos without any tags at all have an implied 0.0.0 tag on the initial commit.

$ git describe --tags
v0.4-2-g42a48e8

where the format is (the g is for “git”):

${TAG}-${numberOfRevisionsSinceTag}-g${shortHash}

and just

${TAG}

when directly referring to a tag.
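
For what it's worth, taking that output apart again is straightforward. A small sketch in Python, where parse_describe and the dictionary keys are names invented for the example. It assumes the plain tag-count-hash shape shown above (no --dirty or --long variations), and a tag that itself ends in something like -1-gabc would be misread:

```python
import re
from typing import Dict, Optional, Union

# ${TAG}-${numberOfRevisionsSinceTag}-g${shortHash}
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Dict[str, Union[str, int, Optional[str]]]:
    """Split `git describe --tags` output into its three components."""
    match = DESCRIBE_RE.match(text)
    if match is None:
        # Output is just the tag when the commit itself is tagged.
        return {"tag": text, "commits_since_tag": 0, "short_hash": None}
    return {
        "tag": match["tag"],
        "commits_since_tag": int(match["count"]),
        "short_hash": match["hash"],
    }


if __name__ == "__main__":
    print(parse_describe("v0.4-2-g42a48e8"))
    print(parse_describe("v0.4"))
```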

This convention is effectively what is proposed anyway. Dates are useful information for references that do not get bumped often, so fully following git describe is not a good match. (Also, different VCSes track the notion of «number of revisions between» in ways different enough to be even more confusing than it already is for a single VCS.)

Another thing worth noting:

pname does not need to be exactly identical to the original project’s name. Sometimes the pname can be used to convey some useful information.

A typical case is bitcoin:

Here, pname can be bitcoin or bitcoind (bitcoin-daemon?) depending on whether it is compiled with GUI support.

Another example is the various versions of the same software we maintain, such as GCC. It would be useful to encode such things in pname instead of version.

https://github.com/NixOS/nixpkgs/blob/08b00c20e0f05f32348048e0ef8deaf0366456b7/pkgs/applications/blockchains/bitcoin.nix#L34

Here, pname can be bitcoin or bitcoind (bitcoin-daemon?) depending on whether it is compiled with GUI support.

We could indeed encourage the latter option, so that the upstream pname is always included as a visible component, with an optional underscore prefix (for leading-digit pnames) and an optional dash-something suffix.

I was thinking of something like that for those “master branch” questions, etc. It reminds me of the approach employed by AUR: a tagged version like gcc-10.9.8 and non-tagged versions like gcc-git-<commit hash string>.

We can borrow the concept above to some degree. We can use pname to encode these configurations such as older versions, non-tagged releases, and the like, while version can be regarded as a “version” of these things.

E.g. if we are using the latest master branch of GCC, we could render it as pname = gcc-master; version = 2021-06-06;.
We can also try things like pname = gcc7; version = 7.5; or anything that makes sense for a pname.

I think this is questionable: the previous pname modifications cover cases where neither variant is fresher than the other, whereas here version comparisons do somewhat make sense.