Why were ATerms chosen as the format for store derivations instead of ASN.1?

toraritte · May 1, 2023, 12:11am

Based on my limited knowledge, the purpose of Abstract Syntax Notation One (ASN.1) seems similar to, or at least overlaps with, that of Annotated Terms (ATerms). ASN.1 has been around longer, is more widespread (e.g., X.509 certificates are defined using it), and has more mature tooling, whereas ATerms seem to be abandoned (or at least I couldn’t find any active projects other than Nix that use them).

I found an issue suggesting a change to the ATerm representation of store derivations, but it does not mention why ATerms are inadequate. Sure, their use in Nix is not documented (apart from Dolstra’s PhD thesis), but other than that, they have been working for decades and I’m not aware of any technical issues with them (such as hindering the addition of features, making processing slower or more expensive, etc.).

This question was born out of pure historical curiosity (and because I found out that both ATerms and ASN.1 can be useful with ontologies).

edit: Of course, I just found one possible reason, right after submitting this post, but it does not mention ASN.1:

The .drv files are written in an obscure academic language called “aterm”. It was selected by Eelco because it makes some guarantees in terms of reproducible output like ordering the keys alphabetically.

Nowadays most references to it have disappeared in the nix code. The upstream library has been merged into the nix code base and upstream has disappeared.

– comment by Zimbatm in the “Alternative language” thread

Reading

ATerms:

[BJKO00] M.G.J. van den Brand, H.A. de Jong, P. Klint, and P. Olivier. Efficient Annotated Terms. Software, Practice & Experience 30, 259–291. 2000.
Chapter 1. The ATerm Programming Guide
aterm-manual.pdf
(Probably more related stuff in this directory.)

ASN.1:

7c6f434c · May 1, 2023, 6:05am

Speaking specifically of ASN.1, does it even have repeated substructure sharing? Because ATerms are used in-memory for making sure all those glibc references are to the same copy.

(Also, I am not sure how «more mature» plays out given that you need quite a lot of maturity to do anything with ASN.1 correctly, given what we can see in the track record of various software…)

As for the choice of ATerms: Nix was first written as part of PhD thesis work, in association with the Stratego/XT project, and there was already quite a lot of tooling around and integrated. So ATerms for substructure sharing, SDF for parsing, etc.

SDF turned out to be too slow for the growing Nixpkgs, so it has been replaced. It is not clear that replacing ATerm would bring any gain, so it stays. And any replacement would also have to cover the in-memory representation with substructure sharing…

(I think there are a few forks of ATerm, and the one that disappeared was the Nix-tuned one)

zimbatm · May 4, 2023, 4:13pm

Another aspect is that both ATerm and Nix were invented at Dutch universities. It’s common for professors to talk to each other, build some sort of mindshare, and then guide their students towards which technology to use. Of course, this is just a hypothesis.

Regarding replacing ATerm, the main motivation would be to use a representation that has broader language support. It would make it easier to re-implement Nix and build tooling on top. In practice, ATerm is simple enough to parse, and there is no obvious alternative whose benefits would outweigh the cost of the switch.
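To give a feel for "simple enough to parse", here is a rough sketch of a recursive-descent parser for the subset of textual ATerm that I assume shows up in .drv files: constructor applications, lists, tuples and quoted strings. It is not the full ATerm grammar (no annotations, numbers or blobs), the escape handling is an assumption, and the sample term and store paths are made up.

```python
from collections import namedtuple

# Hypothetical result type for a constructor application such as Derive(...).
App = namedtuple("App", ["name", "args"])


def parse_aterm(text):
    """Parse a small ATerm subset; malformed input raises ValueError or IndexError."""
    pos = 0

    def parse_seq(close):
        # Parse comma-separated terms up to and including the closing bracket.
        nonlocal pos
        items = []
        if text[pos] == close:
            pos += 1
            return items
        while True:
            items.append(parse_term())
            if text[pos] == ",":
                pos += 1
            elif text[pos] == close:
                pos += 1
                return items
            else:
                raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")

    def parse_string():
        nonlocal pos
        pos += 1  # skip opening quote
        out = []
        escapes = {"n": "\n", "t": "\t", "r": "\r"}  # assumed escape set
        while text[pos] != '"':
            ch = text[pos]
            if ch == "\\":
                pos += 1
                ch = escapes.get(text[pos], text[pos])
            out.append(ch)
            pos += 1
        pos += 1  # skip closing quote
        return "".join(out)

    def parse_term():
        nonlocal pos
        ch = text[pos]
        if ch == '"':
            return parse_string()
        if ch == "[":
            pos += 1
            return parse_seq("]")
        if ch == "(":
            pos += 1
            return tuple(parse_seq(")"))
        start = pos
        while pos < len(text) and text[pos].isalnum():
            pos += 1
        if pos == start or pos >= len(text) or text[pos] != "(":
            raise ValueError(f"expected a term at offset {start}")
        name = text[start:pos]
        pos += 1  # skip "("
        return App(name, parse_seq(")"))

    result = parse_term()
    if pos != len(text):
        raise ValueError(f"trailing input at offset {pos}")
    return result


# Shortened, made-up derivation; the store paths are placeholders.
sample = (
    'Derive([("out","/nix/store/example-hello","","")],[],'
    '["/nix/store/example-builder.sh"],"x86_64-linux","/bin/sh",'
    '["-e","a\\nb"],[("name","hello")])'
)
drv = parse_aterm(sample)
print(drv.name, len(drv.args))
```

That is roughly the whole parser, which is the point: broader language support is nice, but the format itself is not much of a barrier.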

JSON has been floated a few times just because it’s so ubiquitous, but it would have to be a subset with additional key ordering, reproducible spacing and UTF-8 encoding guarantees.
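As a rough illustration of what such a subset could look like, here is a sketch using Python's standard `json` module: keys sorted, no optional whitespace, UTF-8 output. The `canonical_json` name and the sample values are made up, and this is not a complete canonicalization scheme (number formatting, for example, is left to the library; RFC 8785 covers a fuller approach).

```python
import json


def canonical_json(value) -> bytes:
    """Serialize with sorted keys, fixed separators and UTF-8 encoding."""
    text = json.dumps(
        value,
        sort_keys=True,         # deterministic key order
        separators=(",", ":"),  # no optional whitespace
        ensure_ascii=False,     # emit non-ASCII characters directly
    )
    return text.encode("utf-8")


# Same data, different insertion order (hypothetical values).
a = {"outputs": {"out": "/nix/store/example"}, "name": "hello"}
b = {"name": "hello", "outputs": {"out": "/nix/store/example"}}

print(canonical_json(a))
# b'{"name":"hello","outputs":{"out":"/nix/store/example"}}'
print(canonical_json(a) == canonical_json(b))  # True
```

Every implementation in every language would have to agree on exactly these rules for hashes of the output to match, which is where the ubiquity argument for JSON gets weaker.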