Using Nix for IoT? (in bandwidth/resource constrained context)

I’ve been playing around with Nix for a short while and - despite the complexity - I really like the overall approach. I can see how it’s a great solution in a variety of contexts. What I’m trying to work out is whether it can be a good solution in an IoT context. I’ll say a few words about my requirements/priorities, and observations so far. Would be great to get input from the community on whether I’m barking up the right tree.

Context and requirements:

  • We have a few hundred devices running in the field, used for data acquisition. The hardware is based on Raspberry Pi or similar, and they all run some flavor/version of Debian as a base OS.
  • We also have our own applications running on the devices, which handle the above-mentioned data acquisition, and other relevant tasks.
  • We occasionally need to push new application updates, though we don’t generally upgrade the underlying OS after deployment.
  • The main requirements are that:
    1. Unattended updates should be performed atomically. More generally: things should never end up in a broken state.
    2. Updating the application should (within reason) be independent of the underlying OS. This generally means bundling dependencies.

The above requirements basically rule out using apt/dpkg. Currently we are deploying the applications as Snaps, though after ~5 years of using this approach, I’m not totally satisfied with the result - and am wondering if there’s a better way for future deployments - so have been looking into the viability of using Nix for such a use case.

Before I continue, I should mention another couple of priorities:

  1. Many of the devices are in remote locations, with patchy network access. So the download size of updates should be kept to a minimum. [despite many promises, this is actually one weak point of Snaps, due both to design decisions and to persistent bugs]
  2. While storage is not extremely tight, it is also not abundant. Most devices have 4-8 GB flash memory.
  3. Maybe it goes without saying, but building anything from source on the Raspberry Pis is a no-no; all applications should be available as pre-built binaries.

I was attracted to Nix since it caters to the main requirements: atomic updates, and pinned dependencies. To be clear, I’m not thinking of going full NixOS (yet), but using Nix for application software deployment and updates.

After a bit of playing around, I have some concerns around download and storage size. I’d like to get a sense of whether these concerns are well-founded. I.e. whether Nix can be made to work well in a constrained environment - or it’s just not the right tool for this kind of job.

Specifically, I got things set up using @tonyfinn’s excellent Nix from First Principles: Flake Edition guide. To simplify/minimize things, I also removed all flake registry entries except for a pinned github:NixOS/nixpkgs/22.11. Then installing just a couple of basic packages (nix-tree, mosquitto, python310) caused /nix/store to grow to ~1GB, which seemed somewhat alarming. Some observations:

  1. The two largest directories, namely /nix/store/nkhjmzkf9hky9h34yrfy0cgyd9pbh03v-source (293MB) and /nix/store/wwk2ad9jvg8r1a8lyg0x8kmmg53n97sq-nixpkgs (146MB), both appear to be downloads of github:NixOS/nixpkgs - the former pinned at the 22.11 tag (but somehow double the size), and the latter a more recent commit.
  2. Installing mosquitto (normally a tiny piece of software) pulls in 200MB+ of dependencies. A lot of that is systemd (fair enough - I wouldn’t expect Nix to use what’s already there), but another large part is stuff like Perl, which is clearly unrelated (it’s pulled in through a dubious dependency of libwebsockets on openssl-dev).
  3. I have two versions of glibc: 2.34-210 and 2.35-163. The former is used by nix (v 2.12) and the latter by the other packages in the 22.11 release. (similar for other libraries like sqlite)

Some of the above points strike me as…suboptimal? Maybe I’m missing something basic, which would reduce the level of redundancy of what’s getting downloaded and stored? By the way, I’ve enabled/run store optimization, wiped the profile history, and run garbage collection.
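
For reference, here’s a sketch of the inspection and cleanup commands involved (the package name is just an example):

# closure of a single package, sorted by size
nix path-info -rsSh nixpkgs#mosquitto | sort -nk3
# interactive browser for what pulls in what
nix-tree ~/.nix-profile
# deduplicate identical store files via hard links
nix store optimise
# drop old profile generations, then garbage-collect
nix profile wipe-history
nix store gc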

Either way, minimizing the storage required for an initial install is not the top priority - we can ship devices with whatever firmware we want (as long as it fits in flash). It’s more important to minimize download size during subsequent upgrades. And this is something I’m less clear about…

  • Presumably if we pin everything to a given NixOS/nixpkgs release (e.g. 22.11), and we build our applications as a set of flakes that link to the exact library/package versions in that release, we’re good to go? I.e., whenever our applications get updated, they are the only thing that gets downloaded, with no external dependencies? (See the sketch after this list.)
  • How does this work in the context of security updates? I guess if any nixpkgs flakes are truly pinned to a tag/commit, then there will be no updates? Conversely, allowing for updates will require a more general re-install of most things (since basic building blocks like glibc are likely to have changed)?
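
To illustrate the first point, here’s a minimal sketch of the kind of pin I have in mind - an application flake hard-wired to one nixpkgs snapshot (./our-app.nix is a stand-in for our actual package):

{
  # a tag pin; an exact commit hash would pin even harder
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.11";

  outputs = { self, nixpkgs }: {
    packages.aarch64-linux.default =
      nixpkgs.legacyPackages.aarch64-linux.callPackage ./our-app.nix { };
  };
}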

To be honest I’m not even 100% sure what exactly I should be evaluating here. I guess I’m just concerned that an unforeseen situation could introduce the need for a large-scale download/rebuild/reinstall.

But more generally, could Nix serve a resource-constrained IoT use case well? Or would this be a world of pain?

6 Likes

You may have 2 copies of nixpkgs around because you’re using both the flake registry and channels? You can get around this by managing NIX_PATH in another way, and not using channels at all.
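
One way to do that (this assumes a reasonably recent Nix for the flake: NIX_PATH syntax):

# drop channels entirely
nix-channel --remove nixpkgs
# give the flake registry a single pinned entry
nix registry add nixpkgs github:NixOS/nixpkgs/22.11
# anything that still reads NIX_PATH resolves through the registry
export NIX_PATH=nixpkgs=flake:nixpkgs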

To be more precise, you want to pin everything to a given nixpkgs commit, but yes, if you do that, the majority of the closure should be shared between new and old versions of your target software.

Yes. If you don’t pin down to a commit, there’s likely to be a fair amount of downloading and disk space usage associated with updates, because every reverse dependency of a changed package gets a new store path and is re-downloaded as well.

Using NixOS itself might actually help you here, since you won’t have to have both nix store copies and OS copies of all the base OS stuff.

Overall, though, Nix generally assumes you have quite a bit of disk space and network bandwidth. It trades those resources to get the reproducibility, atomic changes, and rollbacks.

Nix’s most basic design decision is to eliminate well-known paths as much as possible. One of the effects of this is that you don’t have to corral every package to use the same versions of dependencies; in other words, no diamond problem. This also means you take up more space with multiple copies of those dependencies, though. nixpkgs does try to keep things mostly sharing, but it doesn’t try as hard as other distros.

You can often decrease closure size significantly by overriding packages to not build against optional dependencies you don’t plan to use, but this is a fair amount of work.
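
As a sketch of the pattern - the argument names below are made up; the real ones are whatever the package’s nixpkgs expression accepts:

final: prev: {
  # overlay: rebuild mosquitto without optional features to shrink its closure
  mosquitto = prev.mosquitto.override {
    systemdSupport = false;   # placeholder argument name
    withWebsockets = false;   # placeholder argument name
  };
}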

One of the notable space-saving measures you could try is just not doing builds on the target machine at all, so you don’t even need a copy of nixpkgs around on the target device, nor any nix eval caching or similar. At least with NixOS, it’s easy enough to do this. There are several deployment tools that have been created to remotely manage nixos systems like this.

2 Likes

Thanks @tejing for a super helpful response! Really helps me get a sense of the situation.

You may have 2 copies of nixpkgs around because you’re using both the flake registry and channels?

Yes, I wondered about that. In order to attempt a flake-only setup, I had removed the unstable channel that was set up by the initial install (with nix-channel --remove) - but it turns out there was still a GC root: the channel symlinks that remained in my nix-profile directory. Deleting those let GC remove the second copy of nixpkgs from the store :+1:

Speaking of nixpkgs taking up space…you mention

One of the notable space-saving measures you could try is just not doing builds on the target machine at all, so you don’t even need a copy of nixpkgs around on the target device

Sounds very relevant! Could you share more/point me in the right direction? I was indeed slightly put off by the idea of storing several hundred MB of stuff that will probably not be necessary after the initial install, so it would be great to avoid that.

[EDIT: I guess it’s actually fine to remove nixpkgs from the store after I’ve installed whatever I need from there. Not sure why the download wasn’t getting GC’ed before.]

Anyway…

To be more precise, you want to pin everything to a given nixpkgs commit

Right. I’m currently pinned to the 22.11 tag, which (assuming no pathological repo behavior) should remain the same commit in perpetuity. Though is it fair to say that most regular users would pin to a branch (e.g. nixos-22.11, or indeed nixos-unstable)? So this would pull in updates, with the corresponding rebuilds, etc.
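
In flake terms, I understand the distinction to be:

# immutable: resolves to the same commit forever
inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.11";
# moving: follows the release branch and picks up backports/security fixes
inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.11";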

Using NixOS itself might actually help you here, since you won’t have to have both nix store copies and OS copies of all the base OS stuff.

Very true - though probably not something that we can seriously consider at this stage, given the hardware enablement work that would be required.

Finally…

You can often decrease closure size significantly by overriding packages to not build against optional dependencies you don’t plan to use

Yes, my mosquitto example was maybe a bit obtuse - though also interesting to observe. In reality we would likely do a custom build of dependencies like that, tailored to the actual functionality we need. I assume most packages/flakes don’t offer a more direct way to turn features on or off (e.g. via arguments)?

That would be the normal pattern, yes. I didn’t know the tags existed, actually.

Many packages allow features to be enabled or disabled, and essentially all of them allow dependencies to be redirected, through the pkgs.foo.override mechanism. If you look at the argument list at the top of the callPackage’d source in nixpkgs, you’ll see what options can be altered by that mechanism.
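
For example (pkgs.foo and its enableGui argument are hypothetical):

# .override swaps the arguments the package function was called with
pkgs.foo.override {
  enableGui = false;          # drop an optional feature
  openssl = pkgs.libressl;    # redirect a dependency
}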

2 Likes

If you look at the argument list at the top of the callPackage’d source in nixpkgs, you’ll see what options can be altered by that mechanism.

You can also use overrideAttrs which allows access to anything including buildInputs, configureFlags and so on, so there really is no limit to how you can modify existing packages.
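
A sketch (the package, the flag, and the filtered input are all illustrative):

# .overrideAttrs edits the derivation attributes themselves
pkgs.foo.overrideAttrs (old: {
  configureFlags = (old.configureFlags or [ ]) ++ [ "--disable-examples" ];
  # derivations compare by output path, so filtering buildInputs works
  buildInputs = builtins.filter (d: d != pkgs.systemd) old.buildInputs;
})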

2 Likes

Hey Svet,

That’s really exciting because I’ve built Cachix Deploy for this use case.

You can run the agent managing a standalone profile so that it’s completely independent of the OS.

The deployment command features a rollbackScript that can activate things, check that the deployment went well, and exit with an error if something went wrong.

The agent will pull down only the runtime dependencies, and I can help you trim those down.

You’d be deploying from some secure environment and uploading all the closures to the binary cache, so that IoT devices only stream that data to the disk and activate.
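
Roughly, the flow looks like this (cache, agent, and flake output names are placeholders):

# from the secure build environment:
nix build .#iot-profile
cachix push my-cache ./result
cachix deploy activate deploy.json   # spec mapping agent names to store paths

# on each device, running as a service:
cachix deploy agent my-agent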

I’m happy to chat and help out if you book a slot at Calendly - Domen Kožar

6 Likes

That won’t be fixed by nix.

You would need to manually optimize this.

Not too alarming for me, if I am honest.

 ➜ du -sh /nix/store
190G    /nix/store

A fix for that got merged yesterday.

glibc probably got a bump on unstable already.

You need to run nix-channel --update afterwards.

That’s the initial tag, which is already missing many things. Don’t use it.

3 Likes

Thanks everyone, this is really helpful!

On balance I can see that there are likely to be some pain points when it comes to adopting Nix in a constrained/IoT context. Most of these are perhaps not fundamental issues - I can see how with some elbow grease the system can be tuned quite well - but it will remain tricky. For example, even if we are careful about dependency management, there will be the inevitable security updates to libraries like openssl that we’ll want to pull in; which will cascade into rebuilds/re-downloads of various other bits. In a more classical distribution the latter would usually not be necessary. And our applications are generally not so sensitive to exact dependency pinning as to make the trade-off worthwhile.

@domenkozar - thanks also for the great work with Cachix! It was already very much on my radar as a potential key building block here, though I hadn’t come across Cachix Deploy. I’d indeed love to chat a bit more about this, even if it doesn’t turn out to be the right solution here. I’ll grab some time after the holidays.

2 Likes

system.replaceRuntimeDependencies could come in handy
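
For example, for shipping an urgent library patch without the full rebuild cascade (the patch file is hypothetical):

# NixOS option: rewrites references in the system closure in place,
# instead of rebuilding every reverse dependency
system.replaceRuntimeDependencies = [
  {
    original = pkgs.openssl;
    replacement = pkgs.openssl.overrideAttrs (old: {
      patches = (old.patches or [ ]) ++ [ ./openssl-cve-fix.patch ];
    });
  }
];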

If you have clumps of devices bunched together on a LAN/Wi-Fi, then you can use a bridgehead server to download the updates once, and then distribute that cache to the clients… This can save a lot of bandwidth.
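
Something like this (names and paths are illustrative):

# on the bridgehead: fetch/build the update closure once,
# and copy it into a local binary cache directory
nix copy --to file:///srv/nix-cache .#our-app
# serve /srv/nix-cache with any static HTTP server, then on each client:
#   substituters = http://bridgehead.lan:8080
#   trusted-public-keys = <key used to sign the closure>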

If these devices are all remote, then this is a significant problem, even for more traditional methods. If you’ve got a very limited and metered 3G/4G connection, that can be a pain too. However, getting a better deal from mobile providers isn’t impossible.

You may find this talk relevant, on how to manage a remote fleet of NixOS systems…

However, you may not.

1 Like

In the end, the answer depends heavily on your application and its closure, when built with Nix. If your application is a binary that only links to a few libraries, then you should give Nix a try. If your application is a python script with a serious number of dependencies, and updating your applications includes updating the python deps, the closure size may be too big for compact updates.

Just a few random closures to compare:

nix build --no-link nixpkgs#dfrs nixpkgs#cowsay nixpkgs#gpodder
nix path-info -rsSh nixpkgs#dfrs | sort -nk3
nix path-info -rsSh nixpkgs#cowsay | sort -nk3
nix path-info -rsSh nixpkgs#gpodder | sort -nk3

That said, the size of your Snaps should be a good indication of what to expect.
Note that with Nix, even fixing a typo in a source code comment may lead to a change in nix paths and thus increase the size of the update (at least until content-addressed store paths are in production). If you’d rather plug together binary outputs, Flatpak (based on libostree for content-addressed storage) might also be an option.

1 Like

Note that even if you need to manually rebuild parts of your system, you don’t need to do so on your remote machines. You can build on your dev machines and install using nixos-rebuild with the option --target-host. I do this for my private server. Of course Cachix Deploy (or nixops, krops, …) may be even better suited, but I just wanted to point out that you can already have this with plain NixOS.
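
For example (flake attribute and hostname are placeholders):

# build locally, then push and activate over SSH - nothing builds on the device
nixos-rebuild switch --flake .#field-device \
  --target-host root@device.example.net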

3 Likes

Thanks all for the input here - in particular @domenkozar who I got to have a chat with earlier - and apologies for the slightly late response. To cover a few of the points that were raised, and recap…

@nixinator - the devices are mostly all on independent 3G/4G connections unfortunately, so there are no real economies to be had from caching closer to the edge. But definitely a good idea if the topology were different!

@wamserma - the main application is currently a mix of Python and Rust code - with the Python bits being gradually migrated to Rust. It will be a while before the Python component is fully eliminated, but once we get there I agree that the prospect of a mostly self-contained binary will be attractive. And I can see that this also solves some of the dependency management challenges that exist otherwise. @domenkozar mentioned the planned implementation of chunked upgrades via Cachix (and maybe more fundamentally in Nix?), which would make the situation even better.

For now I’m going to keep looking around for solutions. As of now, I don’t think that the benefits of shifting over to Nix for our use case will be sufficient to outweigh some of the likely pain points. But I can see there is a lot of active development on top of strong fundamentals, and I’m excited to see where the ecosystem goes!

1 Like

Chunking just landed in Attic, if you’re inclined to experiment with it. But I suggest you first complete your move away from Python before you migrate to Nix. That said, you can already start using Nix to build the Rust parts and still ship as Snaps.

1 Like

That said, you can already start using Nix to build the Rust parts and still ship as Snaps.

I think this is a very valid point (although not limited to the Rust parts). You can still gain significant benefit from Nix to ensure consistent environments (dev and prod) without going all-in on NixOS.

1 Like