Nix-nomad: auto-generated NixOS modules to create HashiCorp Nomad job definitions

Hello! I’ve been putting this together in my spare time and think it might be ready for public scrutiny.

We’ve been using Nix at my day job for general package building and development environments for a year or two now. It’s creeping into all aspects of our tooling. However, our Nomad job files are all written in HCL, which, after immersing myself in the world of Nix… I find lacking.

I’ve been daydreaming about having a NixOS module system to write these in, so I built one in my spare time. I wrote a small Go program that generates these modules from the Nomad source code, so I’m hoping this will be somewhat trivial to maintain.

The idea is that the syntax should be closer to HCL job definitions than to the JSON API. Since the two are not directly compatible, this should make translating an HCL file over to Nix easier.

I haven’t introduced this at work, yet – but I hope to get it to a point where that would not be a mistake. This is also the first time I’ve put together, or open sourced, a library in Nix, so I’ve certainly made mistakes. I’d appreciate any feedback on how I can improve this or make it more suitable for your use case.

Let me know what you think!


This is pretty cool, I’ve been experimenting with Nomad in my homelab for a few weeks now and really like it.

I have been wondering if there is an elegant way to actually run a Nix closure with raw_exec, but it seemed like it would be a hassle to ship the closure off to some artifact server for Nomad to download. This would at least give us a job that references the Nix store, but I guess all the client nodes would need to have that closure pushed to them.

I have some ideas I haven’t explored yet that I want to try here. Ultimately, Nomad is designed around pulling the artifacts it needs with the artifact stanza. I think that having the NixOS module system in place now may enable some magic behind the scenes using string contexts, binary cache URLs and the artifact stanza.

Another idea revolves around a custom task driver for Nix.

The dream is something like:

{ pkgs }:

{
  jobs.hello = {
    type = "batch";
    datacenters = ["dc1"];

    groups.webs = {
      count = 1;

      tasks.frontend = {
        driver = "nix_exec";

        config = {
          command = "${pkgs.hello}/bin/hello";
        };
      };
    };
  };
}

I’m not sure I understand this part. It looks like you are using the Nomad JSON API specification [JSON Job Specification - HTTP API | Nomad by HashiCorp], which differs from the HCL specification (Job Specification | Nomad by HashiCorp). From my basic experience, the documentation of the JSON specification is not up to date (sometimes even wrong) and not well written, so it should not be used by end users.

However, it is possible to submit HCL jobs using the JSON data format. An HCL2 file can be either HCL or JSON (hcl2/spec.md at fb75b3253c80b3bc7ca99c4bfa2ad6743841b1af · hashicorp/hcl2 · GitHub). This means it is possible to submit JSON blobs to Nomad using the HCL specification, which has much better documentation.

I agree, it’s pretty confusing because there are (at least) 3 ways to submit jobs to Nomad:

  • HCL
  • JSON thanks to the HCL2 support
  • JSON API

(Note: I’m currently experimenting with submitting Nomad jobs with CUE, and I want to allow users to rely on the Nomad HCL documentation to write their jobs.)

apologies if I am about to explain something you already know or understand better than I do. I’m even confused to hear you can submit HCL-like-JSON, that is news to me.

when you run nomad run myjob.hcl (or I would guess nomad run myjob.hcl.json), what’s happening is that the Nomad client is parsing that file and converting into a struct that is JSON serialized into a PUT request against the /v1/jobs endpoint. the shape of the payload is defined here in the Nomad source code.

you’ll notice that there are Golang struct tags on that struct that correspond to the HCL you are writing. (I believe) this is the metadata that the HCL library uses to parse your file and convert into an API payload when you use nomad run.

(Note: I’m currently experimenting with submitting Nomad jobs with CUE, and I want to allow users to rely on the Nomad HCL documentation to write their jobs.)

the generator I wrote also uses these HCL metadata struct tags, but it’s generating the Nix modules that I’m sharing here. the Nix you write would thus directly reflect the associated HCL syntax (with one minor tweak; pluralization of lists-of-attrsets), since it is literally generated from the same code that parses that HCL. you can follow along directly with the public HCL docs, as you can see in the example on the repo where it is using group and not TaskGroups.
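To make the struct-tag idea concrete, here is a minimal sketch of reading `hcl` tags with reflection. The `Service` struct is a hypothetical stand-in written for this example (it is not copied from the Nomad source), and `hclNames` is an illustrative helper, not the actual generator.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Service is a hypothetical stand-in for a Nomad API struct. The Go field
// name is what ends up in the JSON payload; the `hcl` tag carries the name
// a user writes in a job file.
type Service struct {
	Name      string   `hcl:"name,optional"`
	PortLabel string   `hcl:"port,optional"`
	Tags      []string `hcl:"tags,optional"`
}

// hclNames maps each Go field name of struct value v to the HCL name found
// in its `hcl` tag. Fields without an `hcl` tag are skipped.
func hclNames(v any) map[string]string {
	names := map[string]string{}
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("hcl")
		if !ok {
			continue
		}
		// The tag looks like "port,optional"; the part before the comma is the name.
		name, _, _ := strings.Cut(tag, ",")
		if name != "" {
			names[f.Name] = name
		}
	}
	return names
}

func main() {
	for field, name := range hclNames(Service{}) {
		fmt.Printf("%s -> %s\n", field, name)
	}
}
```

A generator along these lines could emit one module option per tag name, which is why the resulting options would follow the HCL docs rather than the JSON field names.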

so yes, I am using the JSON API specification - but so are you, just with another abstraction layer in between. I’m hoping with the approach I am taking you get two things:

  1. an always (or easily kept) up to date and accurate schema
  2. an output that you can push directly to the API, or via the CLI with the -json flag that was just merged recently

is this the best approach? I honestly don’t know – candidly, like I said, I’m just now finding out you can submit HCL-as-JSON (I swore that was impossible, and swear I even tried doing it!).

ultimately, if you’re defining your jobs in Nix, you don’t need anything HCL2 gives you. that’s why you’re using Nix. you just need to define the job metadata. so I’m not sure what outputting an HCL-as-JSON file would get you.

I dug in a little bit and found this PR with a lot of background detail. The most relevant bit from the PR body:

#1 is an even more interesting accident of history: the jobspec2
package automatically detects if the input to Parse is JSON and switches
to a JSON parser. This behavior is undocumented, the format is
unspecified, and there is no official HashiCorp tooling to produce this
JSON from HCL. The plot thickens when you discover popular third party
tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to
produce JSON that nomad run accepts!

Since we have no telemetry around whether or not anyone passes HCL JSON
to nomad run, and people don’t file bugs around features that Just
Work, I’m choosing to leave that code path in place and acknowledged
but not suggested in documentation.

It looks like attribute names are not the same between HCL and the JSON API. For instance, to specify a service port in HCL, the attribute is job.group.service.port as documented here, while the JSON API attribute is job.taskgroups[].services[].portLabel (portLabel instead of port). So, to use your module, the user has to use the JSON API documentation and cannot use the main documentation. Moreover, all examples on the Internet use the HCL specification. So, by using the JSON API, it’s not possible to easily convert HCL examples to your NixOS modules, because some attribute names could differ.

This is why I’m currently writing JSON in an HCL2 file: to preserve attribute names so that I can rely on the main documentation, which is IMHO much better than the JSON API documentation.

I think it would also be possible to generate the HCL structure and attribute names by reading the HCL attribute from the Go struct tags (see here for instance).

That’s a good point! This would be harder with the HCL syntax because we would need to use a Nomad Go library to first convert the HCL to JSON.

BTW, I really don’t understand why this is so complicated :confused:

And one last thing, a bit out of scope: CUE has a subcommand to generate structs from Go code or protobuf. This is something that would be nice to have with Nix modules: the generalization of your project :wink:

It looks like attribute names are not the same between the HCL and JSON API.

yes, this is a difference between the two APIs that I already described, but it is not true of how you interact with this library. where are you seeing that?

to specify a service port in HCL, the attribute is job.group.service.port as documented here while the JSON API attribute is job.taskgroups[].services[].portLabel ( portLabel instead of port ).

yes, this is true if you are writing the JSON API syntax by hand.

So, to use your module, the user has to use the JSON API documentation and cannot use the main documentation.

this is simply not true. the user of this library does not need to know the JSON API. they speak HCL and it outputs JSON. the path is job.<name>.group.<name>.services.*.port, exactly what you expect, and outlined here: https://tristanpemble.github.io/nix-nomad/index.html#opt-job.name.group.name.networks._.port
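As a small illustration of "they speak HCL and it outputs JSON", the sketch below renames HCL-style keys to Go field names using the `hcl` struct tags. The `Service` struct and the `toAPI` helper are hypothetical and written for this example; they are not the library's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Service is a hypothetical stand-in for a Nomad API struct.
type Service struct {
	Name      string `hcl:"name,optional"`
	PortLabel string `hcl:"port,optional"`
}

// toAPI copies values from an HCL-keyed map into a map keyed by the Go field
// names of target, using each field's `hcl` tag to find the matching key.
// Keys with no matching tag are dropped.
func toAPI(in map[string]any, target any) map[string]any {
	out := map[string]any{}
	t := reflect.TypeOf(target)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, _, _ := strings.Cut(f.Tag.Get("hcl"), ",")
		if name == "" {
			continue
		}
		if v, ok := in[name]; ok {
			out[f.Name] = v
		}
	}
	return out
}

func main() {
	// The user writes HCL names ("port"); the payload uses Go field names.
	api := toAPI(map[string]any{"name": "web", "port": "http"}, Service{})
	b, _ := json.Marshal(api)
	fmt.Println(string(b)) // prints {"Name":"web","PortLabel":"http"}
}
```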

I think this would be also possible to generate the HCL structure and attribute names by reading the HCL attribute from Go structure tag (see here for instance).

this is exactly what I am doing.

BTW, i really don’t understand why this is so complicated :confused:

HCL is a syntactic sugar on top of a JSON structure. in Nomad, the JSON structure is directly mapped to a Go structure. there are a few things that create the discrepancies between HCL and JSON syntax:

  1. the Go field names are different than the HCL field names
  2. in HCL, blocks can map to a struct, a list-of-structs, or a struct-of-structs, depending on whether it is a labeled block, and how the parsing code is configured for that field.
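The second point can be sketched with reflection. The structs below are hypothetical and only loosely shaped like a job spec, and the rule used here (a slice whose element has a `label` field is treated as a struct-of-structs) is a simplifying assumption for this example, not necessarily how Nomad's parser or the generator decides.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Hypothetical structs covering the three block shapes.
type Update struct {
	MaxParallel int `hcl:"max_parallel,optional"`
}

type Constraint struct {
	Attribute string `hcl:"attribute,optional"`
}

type Group struct {
	Name  string `hcl:"name,label"` // labeled block: group "webs" { ... }
	Count int    `hcl:"count,optional"`
}

type Job struct {
	Update      *Update       `hcl:"update,block"`
	Constraints []*Constraint `hcl:"constraint,block"`
	Groups      []*Group      `hcl:"group,block"`
}

// hasLabel reports whether struct type t has a field tagged as an HCL label.
func hasLabel(t reflect.Type) bool {
	for i := 0; i < t.NumField(); i++ {
		if strings.HasSuffix(t.Field(i).Tag.Get("hcl"), ",label") {
			return true
		}
	}
	return false
}

// blockShape classifies the Go type of a block field.
func blockShape(t reflect.Type) string {
	if t.Kind() != reflect.Slice {
		return "struct" // a single, unrepeated block
	}
	elem := t.Elem()
	if elem.Kind() == reflect.Pointer {
		elem = elem.Elem()
	}
	if hasLabel(elem) {
		return "struct-of-structs" // repeated blocks keyed by their label
	}
	return "list-of-structs" // repeated, unlabeled blocks
}

// shapes returns HCL block name -> shape for every block field of v.
func shapes(v any) map[string]string {
	out := map[string]string{}
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, kind, _ := strings.Cut(f.Tag.Get("hcl"), ",")
		if kind == "block" {
			out[name] = blockShape(f.Type)
		}
	}
	return out
}

func main() {
	s := shapes(Job{})
	for _, name := range []string{"update", "constraint", "group"} {
		fmt.Printf("%s: %s\n", name, s[name])
	}
}
```

Under these assumptions, update comes out as a struct, constraint as a list-of-structs, and group as a struct-of-structs, which matches the three cases listed above.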

Seems there is a nix exec driver here:


Oh, ok. I didn’t realize you were exposing HCL attribute names and converting them to JSON attribute names. This is really nice.

Thanks for your explanations!


thanks – someone pointed this one out to me on reddit. I’m planning to dig in more, but on the surface there are two problems I have with this driver:

  1. I think it is flake only – we don’t use flakes, yet.
  2. since it is using flakes, it must reference git refs. we do continuous delivery via a monorepo. as far as I can tell, this ultimately means that we would have to redeploy every job on every commit, regardless of whether there were changes. that adds unnecessary load to production servers, which would be doing a full rolling restart of every single service on every commit. if we could use store paths for job commands instead of git refs, that would be better – the job would only rebuild and redeploy when the package/store path changes.

still, this is super damn cool and in a simpler setup, were we using flakes, I would definitely use it.
