Brainstorming for RFC: pname and version

blaggacao · May 31, 2021, 3:21am

I think the problem is with the word “latest”, but also “tagged” is probably better than “stable”.

Instead of “latest”, we could say “latest tagged release (if a project has more than one release series: of the relevant release series)” and then clarify it in the examples, using the postgres case.

7c6f434c · May 31, 2021, 6:42am

I now support this wording.

(I was not in doubt that the wording in the text was intended to achieve this result, but I think this rewording is much clearer)

AndersonTorres · May 31, 2021, 11:52am

Indeed, I can think of an extreme case: a very simple developer environment with git as the only tool.

It is just a codebase maintained by many developers, and from time to time they make a new release.

But the “make a release” process is merely “Create a new entry in the changelog file, adding 1 to the version number and (optionally) registering the hash identifier and the date of this commit in YYYY-MM-DD format”.

There is no tarball being generated, only the git tree being updated and a changelog to be read by package maintainers. In this scenario, the versions can be fetched by reading the changelog, not by waiting for the (non-existent) automation process to generate a cute tarball.

fricklerhandwerk · May 31, 2021, 2:01pm

Great stuff. Thanks for driving this.

However, it does not map very well with builtins.parseDrvName function

What do we do about that? It’s not evident from your proposal.

Usually, Nixpkgs maintains “unstable” releases of many softwares, sometimes along with stable ones.

You put “unstable” in quotes, yet continue using it in the proposed naming convention without clarifying its meaning. Stability is a property of the software, i.e. that it does not crash randomly, or its API, i.e. that it does not change arbitrarily subject to some rules. Having a tagged, named, or otherwise well-defined version identifier has nothing to do with that. Overloading the meaning of “unstable” is bound to produce confusion (at least initially), and I’d prefer to keep cognitive overhead low. What speaks against only adding +YYYY-MM-DD to the latest release version?

Also, what happened to +nixpkgs=YYYY-MM-DD to signify custom patches?

As an alternative I would like to add a suffix signifying the version control revision identifier, such as +git=<hash>.

AndersonTorres · May 31, 2021, 4:50pm

The new format (hopefully) solves it automatically, because it forces version to start with a digit. But OK, it needs to be clearer.
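
To make the parsing concern concrete, here is a small sketch of how builtins.parseDrvName splits a name. The package name foo and the +date= suffix are only illustrative assumptions, not the format the RFC settles on.

```nix
# Evaluate with: nix-instantiate --eval --strict example.nix
{
  # The name is split at the first dash that is not followed by a letter.
  plain = builtins.parseDrvName "foo-1.2";
  # => { name = "foo"; version = "1.2"; }

  # A word placed before the date ends up in the name part.
  prefixed = builtins.parseDrvName "foo-unstable-2021-05-31";
  # => { name = "foo-unstable"; version = "2021-05-31"; }

  # If the version starts with a digit, the whole string stays in the version part.
  # The "+date=" key is a hypothetical placeholder.
  suffixed = builtins.parseDrvName "foo-1.2+date=2021-05-31";
  # => { name = "foo"; version = "1.2+date=2021-05-31"; }
}
```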

Yes, I have noticed it while brainstorming.
Better names are needed.

The idea is to use a key=value semantics on the version attribute, as suggested by @zimbatm above.
It makes the version attribute more amenable to parsing and therefore to automation.

However we need a better name here. Maybe untagged?

Unneeded.

Patching the code in order to make it run seamlessly in Nixpkgs is an expected piece of our development. There is no need to encode it in the version.

The purpose of the version attribute is upgrading. Indeed, this is made explicit in the nix-env manual.
Custom patches made exclusively for Nixpkgs are not an “upgrade” to the code in this sense, no more than patchShebangs.

Indeed, conceptually patchShebangs, substituteInPlace et al. are all patches, as well as many other things we do with software like CMake and Meson. Patches are not limited to the diff & patch tools after all.
However, no one would suggest encoding all these custom patches in the version attribute.

Therefore, there is no need to encode these custom Nixpkgs patches in version.

Git hashes are opaque strings. They convey no useful information for human beings. Dates are way more useful.

Also, as said above, version is used for upgrades. Git hashes don’t follow the ordering rules stated by nix-env, whereas YYYY-MM-DD dates do.
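
A minimal sketch of the ordering argument, using builtins.compareVersions as a stand-in for the comparison nix-env performs. The two hash strings are made-up placeholders, not real commits.

```nix
# Evaluate with: nix-instantiate --eval --strict example.nix
{
  # ISO dates are compared component by component, so the older date sorts first.
  dates = builtins.compareVersions "2021-05-31" "2021-06-01"; # => -1

  # Hashes can be compared as well, but the result says nothing about
  # which commit is newer.
  hashes = builtins.compareVersions "9f2c1ab" "1d4e7f0";
}
```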

fricklerhandwerk · May 31, 2021, 7:34pm

Ah right. I got confused, because I followed the structure of the document, and there was no transition from „the parser separates on digit“ to „this is how we stay compatible while making it more meaningful“, but went straight into the definition.

Agreed on keeping the key=value thing consistent. Don’t like untagged because it alludes to semantics that are specific to git. What about unreleased, to continue in the same vein as we had so far? Although in general I think it would be better to have a positive, i.e. unprefixed term, such as rolling (as in „rolling release“, although that’s already a bit obfuscated) or snapshot (as @davidak suggested).

Alright, thanks for the clarification.

I thought the argument about the debianesque prefix for +nixpkgs= was originally about picking a specific unreleased version for packaging purposes, not just patching for packagability. I’d accept the argument for making it a statement of „editor‘s discretion“ and therefore prefixing it as such, and this is what I read from @rycee‘s comment - correct me if I’m wrong. Other than that it would not convey much meaning without further explanation.

Sure. I intended to clarify this as an alternative deliberately not considered, maybe I didn’t put it clearly.

Would be good to add exactly that to the reasoning in the document.

abathur · May 31, 2021, 11:37pm

When I commented before, I felt like trying to compose multiple pieces of human-derived information into a single string is putting the cart before the horse. (I’m less focused on the information in the string or how it gets laid out than on whether improving the packaging process here can make it easier to iterate towards ecosystem goals.)

I don’t think cart/horse is a constructive observation, so I tried to tease out some alternate frames/approaches. But as the discussion has played out, I’ve started to wonder if I muddled or under-sold my point.

I want to take another swing, but please ignore me if my point was clear the first time.

Previously, I focused on a process:

I probably should have given specific examples:

- There should be 1 attr for the official upstream version identifier. There should be no opinion or deviation (even if that identifier is non-numeric). If there is no such identifier, the value should be none.
- No vague attr names like date or version. The name should clarify where the date comes from and what the version belongs to.
- Packagers shouldn’t need to make any decision/judgement/opinion call that requires systemic knowledge/perspective–this is a job for composing functions.
  - They shouldn’t have to decide if something is “unstable” or not.
  - They shouldn’t be deciding if something like the release/commit date is the version for Nix’s purposes. (Even if the upstream version IS the date–this isn’t the packager’s call.)
- Every value or detail that is included in or impacts the composition of pname/name/version strings belongs in its own attr.
- Needing a comment to explain the value’s format–like a date format–is a smell it should be decomposed into an attrset.
- With as few exceptions as possible, pname/name/version strings should be programmatically composed from these values.

AndersonTorres · May 31, 2021, 11:59pm

Update; it is now way more confused!

I am trying to introduce the idea of multiple branches, as well as some useful terminology.

AndersonTorres · June 1, 2021, 12:06am

Yes. We should follow this as faithfully as we can.
For now, I think that stripping non-numerical identifiers at the beginning should suffice.
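
As a rough sketch of what such stripping could look like (stripPrefix is a hypothetical helper name, not an existing Nixpkgs function):

```nix
let
  # Drop everything before the first digit; leave the string untouched
  # if it contains no digit at all.
  stripPrefix = v:
    let m = builtins.match "[^0-9]*([0-9].*)" v;
    in if m == null then v else builtins.elemAt m 0;
in
map stripPrefix [ "v1.2.3" "release-4.0" "1.0" ]
# => [ "1.2.3" "4.0" "1.0" ]
```

An identifier with no digits at all would pass through unchanged here, so that case would still need its own rule.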

snapshot looks good to me.

Explain.

I do not think YYYY-MM-DD needs a huge explanation.

One thing we need to think about:

Some software uses several branches for its releases. Think of something like:

A LTS branch
A bleeding edge branch

How should we encode this?

Thinking about it, we should mature the idea of multiple branches.

blaggacao · June 1, 2021, 1:50am

Could you add the phrase under “Terminology”?

Note: This sounds strange, but does so, since it tries not to be git specific.

Boost my motivation to want to untangle it. But I agree with git agnosticity.

blaggacao · June 1, 2021, 1:54am

@abathur Hm? 10.000 feet flight height: What you’re ± saying is version should be an attribute set, correct? Can we easily overload its toString in Nix?

Yeeehaaa!

nix-repl> builtins.toString { a = "b"; __toString = x: "a"; }
"a"
nix-repl> y = { a = "b"; __toString = i: i.a; }

nix-repl> "pname-${y}"
"pname-b"

We can.

abathur · June 1, 2021, 2:18am

Maybe it helps to frame the nixpkgs-specific_version as an implementation detail nixpkgs needs for compatibility with an implementation detail of nix-env.

I’m proposing:

Packagers record (among other details) a package’s official version (or none), and whatever *_date is required. These can be determined without understanding the nix ecosystem.
One or more functions derive the nixpkgs-specific_version from those values using the kinds of precedence rules that are getting hammered out in this thread.

That’s in contrast to counting on the packager to decide whether nixpkgs-specific_version should = official_upstream_version or (release|tag|commit|fetch)_date.

In addition to making the packager read/consider unnecessary nixpkgs implementation details, it makes any version fields containing a date ambiguous (to reviewers, readers later, code in nixpkgs, external automation, etc.):

If it is formatted as nixpkgs expects, it could be:
- an official upstream version number that just happens to be a well-formatted date
- a correctly-formatted (release|tag|commit|fetch)_date
If it isn’t formatted as nixpkgs expects, it could be:
- an official upstream version number (in which case it is incorrect to reformat it)
- a misformatted (release|tag|commit|fetch)_date (in which case it should be reformatted).

This is roughly my point, though. A date is the simplest example I could think of, and I still see advantages to decomposing it:

If the date components are recorded semantically, you don’t need a string parser and automation to modify hundreds of packages if we need to change the format.
While there may be quantitative ~resource reasons not to use attrsets, I don’t see any qualitative advantage to making people (used to different local formats) manually strftime dates to write the string and strptime to read them. It seems like unnecessary friction.
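
To illustrate the point about recording date components semantically, here is a minimal sketch. formatDate and pad2 are hypothetical names; the idea is only that a format change would touch this one function instead of every package.

```nix
let
  # Left-pad a month or day to two digits.
  pad2 = n: if n < 10 then "0${toString n}" else toString n;

  # The single place that decides how a date is rendered.
  formatDate = { year, month, day }:
    "${toString year}-${pad2 month}-${pad2 day}";
in
formatDate { year = 2021; month = 5; day = 31; }
# => "2021-05-31"
```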

Not sure what the answer is, but the bashup.events package I mentioned previously (https://github.com/nixos/nixpkgs/tree/master/pkgs/development/libraries/bashup-events) is a particularly weird example of this, so it might be worth using as a test case.

It isn’t really versioned, and it has two separate concurrently-maintained ~variants.
They have more-or-less identical APIs (but distinct implementations and performance characteristics).
- the one on the master branch is written ~for bash 3.2 compat
- the one on the bash44 branch uses bash 4.4 features
Changes are released simultaneously if they impact both branches.

abathur · June 1, 2021, 3:31am

My previous post may have clarified this, but: not necessarily, though that is one way to think about it.

(I’m using tediously long names to be explicit here):

Instead of a human-entered nixpkgs-specific_version attr that might contain official_upstream_version or release_date–I’m suggesting:

separate attrs for the upstream version, release date, and anything else necessary to decide how to generate nixpkgs-specific_version (and likewise up at the pname and name levels)
build those rules into reusable functions
(presumably) use those rules from derivation-generating functions such as mkDerivation to generate the final attr (whether it is still named version or not)

I assume the organization should be driven by how many values that is (when including edge-cases) and how closely related they are.

Some of the VCS fields, for example, overlap significantly with what goes into some source fetchers. It seems ideal to single-source them, but I won’t pretend to know what should move/draw from where.

blaggacao · June 1, 2021, 3:40am

@abathur I think to get a better grasp of the idea you have in mind, I (and maybe others) would need an example attribute set without too much verbal hedging around it: It’s clear that this is a discussion on the green field.

{
  version = {
    release-date = "";
    ...
  } // { 
    # I guess the implementation of this is rather irrelevant for now,
    # but if you'd have an idea, it would be great to know that, too :-)
    __toString = i: i.a; 
  };
}

It definitely sounds interesting. It is clear to me that you propose lifting the responsibility to “get the string right” from the packager, and rather suggest exposing an as-dumb-as-possible “form sheet” (aka attribute set). You also argue that this makes any sort of programmatic processing easier.

Hence, it’s a win-win. And I believe with the __toString overload, the implementation is relatively cheap.
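
For what it is worth, a minimal sketch of how such a “form sheet” could render itself. Everything here is an assumption for the sake of discussion: mkVersion, the field names, the +date= key and the "0" fallback are hypothetical, not an agreed convention.

```nix
let
  # Hypothetical constructor; the packager only fills in the fields.
  mkVersion = { upstream ? null, release_date ? null }: {
    inherit upstream release_date;
    # Called whenever the set is coerced to a string, e.g. inside "${...}".
    __toString = self:
      (if self.upstream != null then self.upstream else "0")
      + (if self.release_date != null then "+date=${self.release_date}" else "");
  };

  version = mkVersion { upstream = "1.2"; release_date = "2021-05-31"; };
in
"example-${version}"
# => "example-1.2+date=2021-05-31"
```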

AndersonTorres · June 1, 2021, 3:56am

Looks interesting!

Another thing to be taken into account is how it should be mapped to Repology.

7c6f434c · June 1, 2021, 8:51am

I believe labeled snapshots are still released or announced, not snapshotd (don’t fear copy-paste, fear search-replace)

Linux versioning — maybe mention other cases? Python, GHC, … (whenever the core for the additions typically has >2 versions, I guess)

At the end of the day, version has to be represented as a string.
Uniformity of this string representation is more important than uniformity of how it is generated.
Agreeing on the target string format has no performance implications, unlike this attribute set thing that will cost both CPU cycles and RAM bytes.
For external tools, finding version *= * and parsing the string is often easier than parsing Nix…

So I recommend declaring that stuff about attribute sets out of scope and first agree on the target string format.

(For the record, I would find splitting a YYYY-MM-DD date string into three Nix attributes an absolutely horrible editing UX; Nix expressions are not spreadsheets, where such a split is of course a must.)

AndersonTorres · June 1, 2021, 1:12pm

Oh, that clarifies many things. Indeed many things discussed here can be regarded as meta attributes.

OK then, back to strings!

7c6f434c · June 1, 2021, 1:31pm

I recognise I might end up losing the arguments about splitting into an attribute set, and what should go there; but I hope nobody objects that we need to agree on the string representation whatever we do, and preferably not need to change the main part of it too many times. So I am for deciding this before (but maybe not instead of) the more bikeshedding-prone topics.

abathur · June 1, 2021, 2:22pm

It may not be clear exactly what I do mean, but hopefully it is clear that I’m not just here to litigate paint colors (and I agree working out the string form is still important).

I’m just trying to encourage thinking ahead about how the string form can be implemented in a way that minimizes:

what people need to understand/decide to get it right
the effort needed to iterate on the form if this isn’t one true form to rule them all forever

I’ll pull my nose back out, in any case.

7c6f434c · June 1, 2021, 3:25pm

Surely these are valid concerns.

… but there are also:

effort necessary to update by hand
effort necessary to have an «optimistic» updater script
overhead during nixos-rebuild (probably negligible either way but needs checking)
overhead during other Nix operations

… and the bikeshedding is in negotiating the list.

For example, yet another option could be defining the list of fields and providing the definitive script that takes JSON and produces the best inferrable version string, maybe with some suggestions about the pieces of data it wanted to check but did not receive.
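
A rough sketch of that option, written as a Nix function for brevity (a real script could be in any language). The field names, the +date= key and the "0" fallback are hypothetical placeholders, since the list of fields is exactly what has not been agreed on.

```nix
let
  inferVersion = json:
    let
      fields = builtins.fromJSON json;
      upstream = fields.upstream_version or null;
      date = fields.commit_date or null;
      # Hypothetical list of fields the function would like to see.
      wanted = [ "upstream_version" "commit_date" ];
    in {
      version =
        if upstream != null && date != null then "${upstream}+date=${date}"
        else if upstream != null then upstream
        else if date != null then "0+date=${date}"
        else null;
      # Fields it wanted to check but did not receive.
      missing = builtins.filter (f: !(builtins.hasAttr f fields)) wanted;
    };
in
inferVersion ''{ "upstream_version": "1.2" }''
# => { missing = [ "commit_date" ]; version = "1.2"; }
```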

Maybe we should indeed already start discussing the preferences and the approaches in a parallel thread. I don’t think either of the two questions should be a blocker for the other one.