NixOS images on Azure

Thanks a lot already for the answers.

I should also add that I’m working with VMs that already use the new generation of the Azure hypervisor.

There have been plenty of changes. One worth noting is that we finally have UEFI support.

I have a working “conversion” script, but the snapshots / images are not bootable :frowning:

@jonringer do you have any experience with hypervisor generation 2 yet?

No, I have not. Sorry.

Asking for some guidance: @colemickens’ script is great (and, to my knowledge, the only thing that works for Azure), so how could one contribute to it? (I have in mind extra documentation starting from scratch, and small fixes, such as jq missing from shell.nix.) Should I open a pull request against NixOS/nixpkgs or colemickens/nixpkgs?

Open a PR against the main nixpkgs, and just add me and @colemickens as reviewers.

Booting UEFI images in Azure right now means losing features and gaining none, so I’ve not been terribly concerned about it.

I’m not likely to invest much additional effort in NixOS images on Azure until (1) there is a way that we can publish publicly-accessible images without having to create storage accounts (someone else can do the publish-via-storage-account work, I won’t), and (2) there is a documented way to boot an image in Azure without running Microsoft’s walinuxagent. These are things that I and others have been asking about for years. It’s enough work dealing with Azure without having to guess at things that they can’t document. Of course I am happy to review anything, though.

@toraritte Feel free to open an issue to discuss the “I am basically locked out after a rebuild” problem. It sometimes gets me when I stand up a new image, but I was reluctant to change the default image settings because they’re inherited by all running images as well (and it represents a change in security configuration). In reality, I doubt anyone is using nixpkgs’s Azure image infra if they’re actually using NixOS in Azure. Maybe we should just go ahead and set security.sudo.wheelNeedsPassword = false for the Azure image.

I’m pretty new to Azure, and not sure what the drawbacks are yet (a quick Google search wasn’t helpful either, and I’m not sure what to look for).

I presume (1) refers to the dance with creating disks, granting/revoking access to them, etc.; I was wondering about this part myself. I can’t comment on (2) because I haven’t gotten deep enough yet; I only just learned the name walinuxagent.

After using and tinkering with azure-new, I kind of figured out that it is more of a template than a tool, but a very handy one for newcomers like me. The most important part of it is the VM creation itself, which would’ve taken me a long time to figure out (and I wouldn’t even dare touch it at the moment); the shell scripts are going to be modified by anyone based on their taste anyway. Thanks for figuring out all the steps and quirks, and documenting them.

This is not really an issue, because I am exploring NixOS/NixOps to provision production servers, and so I wouldn’t want to be gallivanting in there; if something’s broken, the idea is to re-deploy from scratch. Even with the extra steps (i.e., deploy with azure-new, manage configs with NixOps, in theory), this should be way easier than doing everything by hand as we do right now.

Thanks also for suggesting security.sudo.wheelNeedsPassword = false; I learned something again.

So, how are people using NixOS with Azure? I’m completely in the dark here (plus stuck with Azure). Up until your message, I was under the impression that the only reason there’s no Azure backend for NixOps was that the API and/or Python libraries have changed, but now it seems that there are fundamental issues with Azure that I have yet to find out about.

NixOps had support for Azure up until 1.6.1, and there are still some pinned Azure packages to enable that package. However, you then don’t get any of the recent NixOps improvements.

Thanks, I may have read that somewhere, but 1.6.1 was indeed a long time ago.

Do you think this approach would be viable at the moment?

I mean, not sure exactly what the minimum configuration is for a NixOS machine to become a deployment target other than

(I also found this issue, but the gentleperson there was using NixOps from an Ubuntu machine, which may have had something to do with the troubles.)

Sorry for the laggy replies, I’m trying to integrate Discourse into my routine…

Regarding Gen2 VMs, see “Why Choose Gen2? aka Timeline for Gen2 Disk Encryption?” (Issue #52340 in MicrosoftDocs/azure-docs on GitHub). By their own admission, Gen2 VMs support “modern booting via UEFI” (that has zero net benefit afaict) and lack support for Azure Disk Encryption. Normally I like to aim for where the ball is going to be, but I’m not going to support launching features like this, so I don’t intend to look at Gen2/UEFI support until ADE is supported.

No. Unfortunately, all of that crap in azure-new is now the bare minimum to get a Managed Disk uploaded for a single subscription. There is no way to publish a Managed Disk publicly, and I don’t want to re-write legacy code to deal with publishing a public disk blob in an Azure Storage Account (plus the code to create/ensure the account and container, and then more code to import that disk into the end user’s subscription for it to actually be usable). Frankly, I’m tired of talking and typing about it and working around it. I wish they would just pay someone to actually use their APIs and give them internal feedback about this kind of stuff.

Let me know if you think any quirks are worth incorporating. I’m happy to iterate on azure-new, review PRs, whatever. I’m glad to hear they’re of use.

For now, I am using azure-new scripts along with image definitions (that have sudo changes, my own services, etc) in my own nixcfg repo to build images and deploy instances. I update the VMs either by making it a devbox and cloning my nixcfg+nixpkgs, or I have my own “remote deploy” script that builds a machine closure, copies and activates it (similar to what nixops would do).
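
The “remote deploy” flow described above (build a machine closure, copy it, activate it) might look roughly like the following shell sketch. This is not the actual script from the nixcfg repo: the function names `run` and `remote_deploy`, the `DRY_RUN` switch, and the example host are made up for illustration, and it assumes root SSH access to the target and a plain `configuration.nix` rather than a flake.

```shell
# Hypothetical sketch of a "remote deploy": build locally, copy, activate.
# Assumes root SSH access to the target; all names here are illustrative.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The function body runs in a subshell, so its variables stay local.
remote_deploy() (
  target="$1"   # e.g. root@my-azure-vm (hypothetical host)
  config="$2"   # path to the machine's configuration.nix

  # 1. Build the system closure locally; nix-build prints its store path.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run nix-build '<nixpkgs/nixos>' -A system -I "nixos-config=$config" --no-out-link
    system="/nix/store/DRY-RUN-system"   # placeholder path for previews
  else
    system="$(nix-build '<nixpkgs/nixos>' -A system \
      -I "nixos-config=$config" --no-out-link)" || exit 1
  fi

  # 2. Copy the closure (the system and all its dependencies) to the target.
  run nix-copy-closure --to "$target" "$system" || exit 1

  # 3. Register it as the system profile and switch to it on the target.
  run ssh "$target" \
    "nix-env -p /nix/var/nix/profiles/system --set $system && $system/bin/switch-to-configuration switch"
)
```

Calling `remote_deploy root@my-azure-vm ./configuration.nix` would run the three steps; setting `DRY_RUN=1` first only prints the commands, which is a cheap way to check what would happen before touching a production VM.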

Not sure about others.

My limited understanding of NixOps Azure is:

  • it’s been dropped from NixOps for a while;
  • NixOps has seen much change/improvement since then;
  • it was based on legacy disk types;
  • it’s based on fairly old Python libs.

I suspect significant portions would need revamping or rewriting. I’ve heard rumors of someone having interest in reviving a nixops-azure, but nothing concrete.

“… but now it seems that there are fundamental issues with Azure that I have yet to find out about.”

:zipper_mouth_face: . I’ll admit, I’m being somewhat stubborn. But it is considerably easier to upload images to GCP/AWS and considerably easier to share them publicly afterward. Fewer API calls, fewer concepts to learn, less waiting on slow APIs, less time spent figuring out how to name storage accounts, and so on.

But no, there’s nothing that fundamentally prevents a good NixOps + Azure experience if someone were to write the nixops backend. (Ping me if you are interested…)

I think in theory this could/should work? I am actually looking to adopt NixOps to manage my machines (which would include my managed Azure machine) and in theory I already manage them all the same way, so it ought to work.

Trust me when I say that everyone is aware of the pain when they onboard onto a team, but customer usability is usually secondary to feature development.

Not to mention that teams usually own a very small slice of the big public-cloud pie, so they develop on top of the work of other established teams (e.g. blob storage), or they roll their own (usually overlapping) solutions for a given user scenario (e.g. images).

That’s my impression as well.

Thank you so much for the comprehensive answers! … and I’m sorry for the late reply as well.

I mean to do a deep dive into your nixcfg, and thanks for describing your workflow. Now I have concrete ideas about where to start, and this is indeed almost identical to “deploy with azure-new, manage with NixOps”; nixup.sh in particular looks very similar in functionality to what NixOps does. (I was meaning to ask you about this, so thanks for the pre-emptive answer.)

Two weeks ago, I spent almost an entire day following all the threads, and this effort seems to be actively dead. (Pardon the oxymoron.)

I guess I’m quite “lucky” then that I don’t have experience with custom VMs on GCP/AWS: if we are ever going to switch providers, it will just get easier.

Would love to, but there’s only a slim chance of ever having such an abundance of free time. (But then I already put it on the list, so thanks again.)

Just an update on this: I made a fork of the azure-new script that we are using internally, but @colemickens made another (and more modern) one, colemickens/flake-azure-demo on GitHub (dev branch). I’m not sure about its status, as I am still at the level of struggling to understand flakes.

(There is also a #nixos-azure IRC channel with logs.)

My MSDN subscription has finally reached some sort of hidden expiration date (far later than it probably should have). This subscription had been providing me with $150/month in Azure credits, which was enough to motivate me to keep some long-running services in Azure, and thus to author nixos-azure (check the dev branch; the readme is stale, but the nix code is all there), a smaller, better Rust Azure boot agent, and scripts for reliable image publishing.

But, now that that free subscription is gone, I have no reason to use or support Azure. (If this changes again, I’ll reply again here.)

@colemickens your efforts have definitely improved the situation. I was a little saddened by the ease of using pre-built images (or lack thereof) when I was working at Microsoft.

hello everybody,

just giving this a push… did anybody investigate NixOS on Azure Gen 2 VMs?

Given the lack of replies, my guess is no, but I’m planning to (i.e., have to) get back to it in a month or two at the latest, so thanks for the bump!

we actually made it work in the meantime.
I hope we find the time to open source the missing pieces. Ping me if I forget about it :slight_smile:

Here’s a minimal working example:

Gonna upstream this into nixpkgs at some point, when I have a bit of spare time.

Tangentially related: Started re-reading Azure’s Virtual Machines Documentation, and this (well-buried) part is a deal breaker as I wanted to deploy it for a job:

The Azure platform SLA applies to virtual machines running the Linux OS only when one of the endorsed distributions is used.

(SLA for Virtual Machines)

I want to run a NixOS machine on an Azure VM. What’s the current status? Is plommonsorbet’s minimal example the way to go, or has this been upstreamed into nixpkgs?