Management protocol for NixOS desktop?

ygi · April 5, 2024, 11:30am

Hello,

I’m the CTO of an MDM startup (bravas.io), looking to better understand NixOS and its roadmap.

An MDM is a Mobile Device Manager, a solution that helps organizations manage a fleet of endpoints at scale.

By “manage”, I mean:

enforce encryption and collect recovery key
distribute certificates from a PKI
install & configure apps

Currently we support iOS and macOS, and are working on Windows support.

And of course, the next step is Linux support (in about a year). That will be hard because, so far, no management protocol seems to exist for Linux across distributions.

This is how I’ve learned about NixOS and its vision.

So I’m curious about the future, and about a way for us to make NixOS our first supported Linux distribution.

Is a management protocol something already in mind for NixOS? If not, is it something that we could propose?

norpol · April 5, 2024, 12:41pm

Is a management protocol something already in mind for NixOS?

What aspects would a management protocol cover in your eyes? NixOS is already a fully declarative system abstracting almost every single program configuration (including browser defaults and add-ons).

I think you would potentially hook into NixOS itself via nix files/modules/overlays, give users some way to add configuration themselves, and then remotely run some sort of nixos-rebuild switch to switch to a new configuration. The Nix configs can be sourced through HTTPS, git or nix files on the local filesystem. An MDM would likely just automate the part of distributing the configuration and applying it with nixos-rebuild.
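
As a rough illustration of "automate distributing and applying nixos-rebuild", here is a minimal Python sketch of what an agent could run on the endpoint. It assumes the fleet configuration is published as a flake; the flake URL and host name in the usage note are hypothetical, and `apply` defaults to a dry run so nothing is executed by accident.

```python
import subprocess

# Actions that nixos-rebuild accepts and that make sense for an agent.
ALLOWED_ACTIONS = {"switch", "boot", "test", "dry-activate"}


def rebuild_command(flake_ref: str, action: str = "switch") -> list:
    """Build the nixos-rebuild invocation for a given flake reference."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["nixos-rebuild", action, "--flake", flake_ref]


def apply(flake_ref: str, dry_run: bool = True) -> list:
    """Return the command; only execute it when dry_run is False."""
    cmd = rebuild_command(flake_ref)
    if not dry_run:
        # Needs root on a NixOS host; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `apply("git+https://example.org/fleet.git#laptop-01")` only returns the command line it would run; the MDM-specific work is deciding which flake reference each machine should receive.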

Linux doesn’t have a “management protocol”, but that’s partly because Linux configuration and software management have, to some extent, been much more coherent than on Windows/macOS. In the Linux world I would see “configuration management” as the equivalent of a “management protocol”. SaltStack/Puppet/Ansible/purpleidea mgmt are all tools that allow managing Linux environments remotely, and many of the configuration modules are or can be implemented in a distribution-agnostic way. I think the benefit of NixOS is that you get much tighter control plus reproducibility and rollback mechanisms. With traditional config management tools you always apply changes on top of an existing “state”, so you often end up implementing transition states where you have to revert previous states; with NixOS such occurrences are kept to a minimum.

ygi · April 5, 2024, 1:29pm

Management protocol has to be differentiated here from management payload and commands.

The payloads are the actual settings you want to enforce on the endpoint. Unlike Windows with its registry, or macOS with UserDefaults, Linux does not have a standard configuration database. It’s flat files, possibly in /etc, not necessarily in the same file format, and with no single access mechanism.

This is where the work done in NixOS is really useful: it starts to create a common configuration process for all dependencies.

The commands are what you currently have via the CLI (apply the config change, etc.).

The protocol, however, is how you integrate any number of endpoints with a centralized system.

The management protocol needs to:

allow endpoints to report status changes (disk usage, available updates, etc.)
distribute commands and payloads at scale (all managed devices enrolled for a user who is a member of the group developers will get the configuration payload to install git)
allow self-service enrollment and enrollment during initial installation
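
To make the "distribute at scale" requirement concrete, here is a small Python sketch of how a server might resolve which payloads a device gets from its user's group membership. The `Device` and `Directory` types and the payload name `install-git` are hypothetical and only mirror the example above; they do not describe any existing MDM.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    device_id: str
    user: str  # the user this device is enrolled for


@dataclass
class Directory:
    # user name -> groups the user belongs to
    groups: dict = field(default_factory=dict)
    # group name -> ordered list of payload names assigned to that group
    payloads: dict = field(default_factory=dict)

    def payloads_for(self, device: Device) -> list:
        """Collect payloads from every group of the device's user, without duplicates."""
        result = []
        for group in sorted(self.groups.get(device.user, set())):
            for payload in self.payloads.get(group, []):
                if payload not in result:
                    result.append(payload)
        return result
```

With `groups={"alice": {"developers"}}` and `payloads={"developers": ["install-git"]}`, every device enrolled for alice resolves to `["install-git"]`, while a device enrolled for a user with no groups resolves to an empty list.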

If we look at how it works on other platforms (iOS, macOS, Windows, and to some extent Android; ChromeOS is different), this protocol is quite simple: it allows some basic messaging and event handling, and leaves the whole logic up to the MDM implementation.

The server side is the MDM, and the client side is an MDM agent built into the system by the OS “vendor” for iOS, macOS, and Windows. For ChromeOS, the MDM agent is built by the MDM vendor and uses the Android native API.

This is where the value could be for NixOS: having the Nix toolset support a management server in addition to the local commands.

In a way it’s like Puppet/Chef/etc., except that when you are managing endpoints, they are always behind a firewall and NAT, so communication has to be outgoing, from the endpoint to the management server. You cannot SSH into them.
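
The "outgoing only" constraint can be sketched as an agent that checks in with the server and receives its pending commands in the response. This is a minimal Python sketch, not an existing protocol: the server URL, the `/checkin` path and the `commands` field are all assumptions made for illustration. The transport is passed in as a function so the logic can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable


def http_post_json(url: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON reply (outbound connection only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def check_in(
    device_id: str,
    post: Callable[[str, dict], dict] = http_post_json,
    server: str = "https://mdm.example.org",  # hypothetical server
) -> list:
    """Ask the server for pending commands; the endpoint always initiates."""
    reply = post(f"{server}/checkin", {"device_id": device_id})
    return reply.get("commands", [])
```

Because the endpoint opens the connection, this works behind NAT and firewalls; how often to check in, and whether to keep a long-lived connection open instead, is exactly the kind of question the protocol design would have to settle.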

norpol · April 5, 2024, 1:51pm

Thank you for elaborating further. I think the things you are listing are plain R&D efforts you have to come up with as a proprietary commercial provider. I agree that SSH is not the appropriate way. Puppet/Salt/mgmt are pull-based by design, so they are already prepared for being behind firewalls/NATs. There is a variety of tools that might already help you with the task, to various extents.

If not, is it something that we could propose?

I think it’s certainly something you can propose; the community is always excited about new ideas and influences. I believe the community’s main focus currently is exploring approaches to NixOS deployment tools, which may or may not cover what you need or want.

I believe MDM by its nature has somewhat different needs than deployment tools alone, but perhaps some of them already provide fundamentals one would want to build on. In the end it’s really a question of how you want to envision MDM altogether for NixOS users. I suppose you could have fully managed, half-managed or partially managed systems: prohibiting any local system changes, managing just some fundamentals (provisioning, enforcing, collecting data on some aspects), or allowing some local changes (like using Nix flakes for dev environments). Each of those has a different audience (Linux for regular web browser/media editing users, Linux for admins, Linux for developers), and implementing an MDM for Linux will depend heavily on which user group you are able to target. I think from an enterprise perspective, Linux is primarily used by developers and administrators, who usually demand some degree of autonomy.

Note: I’m just a NixOS community newcomer myself, so take all my comments with a grain of salt :). I’m mostly interested in MDM topics because I’ve given them some thought from a professional perspective. The Linux world remains mostly unmanaged.

RaitoBezarius · April 5, 2024, 2:28pm

The management protocol of NixOS is an API to programmatically edit the configuration files and trigger rebuild switch, alternatively: to deploy the closure and switch into it.

Whether you rsync .nix files and run nixos-rebuild switch, or use an actually working version of RaitoBezarius/nixit (code transformation tooling for Nix, on GitHub), is up to you.

Note that unprivileged end users can always “install” applications as long as they don’t require root privileges (e.g. SUID bit), so I don’t really know if you want to prevent this in your model or not, if so, you will need to modify the Nix daemon itself.

With respect to a management network protocol, I think we appreciate standards and not proprietary tooling, so maybe you could look into NETCONF and see if there’s a reasonable NETCONF standard that a NixOS system could itself adopt. But maybe you need bespoke tooling.

Personally, I do things using osquery.

ygi · April 5, 2024, 2:56pm

I do believe in open standards too. If we start to contribute to the management framework for NixOS (and before that to the needed RFC), it will clearly be with a FOSS mindset for the client side and the protocol. Each MDM vendor is then free to implement the server side the way it wants (or an open source MDM like micromdm could start to support NixOS too).

An open standard already exists for that, OMA-DM, but besides Microsoft I don’t know anyone using it. So it’s maybe not a great idea to stick to it, as it is an old protocol.

NETCONF is also push-based. It could evolve to be pull-based, but it would also need reporting capabilities (to react to a local change and report it to the server).

Most management frameworks that currently exist date from 2000-2010. What has been learned since then is pushing vendors like Apple and Microsoft to revise their protocols and adapt them to modern needs.

The fact that NixOS already has a local API to interact with the configuration and to trigger the rebuild switch is interesting. It makes it closer to the way Android behaves.

Android, unlike other platforms, does not have an mdmagent able to talk with a server, but an MDM framework that a custom-made agent needs to integrate with. The protocol with the server is up to the MDM vendor.

We could possibly imagine something equivalent for NixOS, even if I think a FOSS client would be more useful for the community.

RaitoBezarius · April 5, 2024, 2:57pm

I feel like you know what to do; feel free to ping the community if you have things to show and you would like our opinions!

bridgefordjack · April 29, 2024, 9:33am

@ygi we’re considering NixOS as the OS for developers in our company but would require a functional MDM solution before committing, please keep us up to date with your Linux support! We would be open to beta-testing to help get NixOS support off the ground.

ygi · April 29, 2024, 4:56pm

It won’t be this year (unless someone helps on the finance side).

But what we could start here is a proper RFC, to define an open protocol between NixOS and an MDM server.

tasiaiso · June 9, 2024, 5:31pm

I’d be interested in helping design an MDM standard for NixOS, feel free to contact me if you’re building something.

tasiaiso · June 28, 2024, 4:30am

Update: I just started working on one. Email me if you want to contribute!

ygi · June 28, 2024, 6:09am

Hi

Sorry for the lack of time recently. We are working on Windows support for our MDM, and so learning how bad an MDM protocol can be compared to our initial support of Apple’s MDM.

Of course I’m interested in working on it. I think starting with an RFC defining protocol goals, mindset, and integration with “regular” NixOS should be the initial step.

What do you think?

tasiaiso · June 28, 2024, 8:26am

My initial idea is to rely on Tailscale or Headscale as much as possible to abstract away node (machine) discovery and management, communication across different networks and CG-NAT, and potentially key exchange (though I think it’d be safer to do that through the NixOS config). Kind of like how Clan uses ZeroTier (although I’ve never used it so I may be wrong about that).

Then each node has an HTTP server with an API that the server can talk to. From there the server can ask for the machine’s status, tell it to update, fetch the machine’s logs, etc.
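
As a sketch of what such a per-node API could look like, here is a minimal status endpoint using only the Python standard library. It assumes the node is reachable over the overlay network; the `/status` path, the port and the reported fields are hypothetical, not part of any agreed protocol.

```python
import json
import platform
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def status_payload(path: str = "/") -> dict:
    """Collect a few basic facts about this machine."""
    usage = shutil.disk_usage(path)
    return {
        "hostname": platform.node(),
        "disk_total": usage.total,
        "disk_free": usage.free,
    }


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only one read-only endpoint is exposed in this sketch.
        if self.path != "/status":
            self.send_error(404)
            return
        body = json.dumps(status_payload()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(bind: str = "127.0.0.1", port: int = 8080) -> None:
    """Serve forever; bind to the overlay interface address in practice."""
    HTTPServer((bind, port), StatusHandler).serve_forever()
```

A real agent would also need authentication and a decision about which interface to listen on, which this sketch leaves out on purpose.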

Some areas to explore:

Is relying on Tailscale or Headscale so much something we would want from this protocol? Some people/organizations probably cannot or don’t want to adopt Tailscale. If not, how should we go about discovery and communication?
The server can build and copy the derivations directly to the target to avoid having each machine pull the same packages from cache.nixos.org. It’d drastically reduce the amount of network I/O and overall CPU time needed to update a lot of machines.
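
The "build once, copy to targets" idea can be sketched as a plan of commands the server would run for each host. This Python sketch assumes SSH access from the server to the target (over the overlay network, for instance) and that the system closure has already been built; the host name and store path in the usage note are placeholders.

```python
def push_plan(host: str, system_path: str) -> list:
    """Commands to copy a pre-built system closure to a host and activate it."""
    profile = "/nix/var/nix/profiles/system"
    return [
        # Copy the closure from the server's store to the target's store.
        ["nix-copy-closure", "--to", host, system_path],
        # Register it as the new system generation on the target.
        ["ssh", host, "nix-env", "--profile", profile, "--set", system_path],
        # Activate the new generation.
        ["ssh", host, f"{system_path}/bin/switch-to-configuration", "switch"],
    ]
```

Calling `push_plan("root@laptop-01", "/nix/store/...-nixos-system")` returns three commands to run in order, for example with `subprocess.run(cmd, check=True)`. This is roughly the same sequence that remote deployment with nixos-rebuild performs, so a real implementation may prefer to reuse that tooling.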

I do want to make an open protocol and a free software implementation, although Bravas could absolutely implement it on their own too, or use the free software implementation and sponsor it.

If this is going somewhere, we can create a Matrix room to branch off of this thread.

ygi · June 28, 2024, 8:52am

My recommendation would be to not jump to the technology first. Especially not by copying something like Headscale/Tailscale, which had different needs.

In order, we should define:

what an MDM is for in NixOS
what the expected use in organizations is (description of business use cases)
what is already here
what the functional needs of the MDM protocol are (configuring? commanding? notifying? reporting?)
which needs can be technically answered in the same way

Etc.

My goal (and Bravas’s goal) is to produce an RFC and then an implementation, allowing NixOS to have an open management protocol where the OS has its own FOSS mdmclient with standardized exchange protocols based on standards that are easy to use and implement (such as JSON REST and WebSocket, for example).

Anything that is on the client side must be FOSS, and it is up to the different MDM products to implement their server side the way they want.

tasiaiso · June 28, 2024, 9:27am

As you said earlier, Nix already provides everything we need to manage NixOS hosts. What we really need right now is the glue that holds everything together and allows sysadmins to manage a large number of machines efficiently.

An organization (or an individual with a lot of computers) would need to manage tens or hundreds of machines.

Each host would have its own hardware config (to account for partition UUIDs and maybe different hardware), but they would probably have at least 10 different configurations (regular workstations, workstations with different needs such as a video editor’s workstation or one for someone who needs CUDA, accounting’s workstations, laptops, servers, routers, firewalls, kiosks, etc.). These would be shared across hosts.

They would need to:

Keep the machines up-to-date and free from known vulns;
Remotely shut down or reboot machines;
Know the status of each machine;
Remotely control individual hosts (SSH/Rustdesk, etc);
Aggregate logs;
Ensure compliance with security standards;
Manage secrets, back keys up, etc (could be dealt with by other software).

The Nix and NixOS CLIs, which already do a lot of stuff.

Configuration through the NixOS config;
Commanding: SSH, HTTP API, etc;
Notifying/Reporting (I assume you mean transmitting data from the client to the server instead of the other way around): same as commanding but in reverse?

Not sure what you mean by that.

Feel free to add onto what I’ve just said.

ygi · June 28, 2024, 9:40am

Which is one of the most complex things. That’s why design steps are hugely critical.

tasiaiso:

An organization (or an individual with a lot of computers) would need to manage tens or hundreds of machines.

Each host would have its own hardware config (to account for partition UUIDs and maybe different hardware), but they would probably have at least 10 different configurations (regular workstations, workstations with different needs such as a video editor’s workstation or one for someone who needs CUDA, accounting’s workstations, laptops, servers, routers, firewalls, kiosks, etc.). These would be shared across hosts.

They would need to:

Keep the machines up-to-date and free from known vulns;

Remotely shut down or reboot machines;

Know the status of each machine;

Remotely control individual hosts (SSH/Rustdesk, etc);

Aggregate logs;

Manage secrets, back keys up, etc (could be dealt with by other software).

That kind of thing goes in documentation / an RFC: why we do things and what we take into account.

For example: remote control of individual hosts, when everyone is behind NAT and a firewall. Which situations are we supporting? And is it really the role of the MDM to act as a tunnel broker? Not a single MDM protocol on the market handles that feature, for example.

Will it also do the inventory collection? Looking for all installed apps and versions? Looking for compliance status, for example? And will it be able to react and notify on a change, or will it be polling only?

Maybe it does, but without writing down expectations and mapping out the existing parts that answer some of the needs, it’s easy to mess things up.

Configuration: we can configure the endpoint posture but also the user setup, such as an e-mail account. If so, are we able to support more than one user on an endpoint? And if yes, with or without the endpoint being bound to a directory service?

Commands: SSH will have a hard time working due to mobility, which means some other protocol is needed to pass a reduced set of commands like shutdown, reboot, device wipe, lock in lost mode, etc. Which commands can we support, and what is the difference on the protocol side between a command and a configuration?
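
One way to picture a "reduced set of commands" is an allowlist on the client that maps protocol-level command names to fixed local actions, so the server can never send an arbitrary shell command. This Python sketch is only an illustration: the command names are hypothetical, and device wipe and lost mode are deliberately left out because there is no standard Linux command for them.

```python
# Protocol command name -> fixed local command (never built from server input).
ALLOWED_COMMANDS = {
    "reboot": ["systemctl", "reboot"],
    "shutdown": ["systemctl", "poweroff"],
    "lock": ["loginctl", "lock-sessions"],
}


def resolve_command(name: str) -> list:
    """Translate a protocol command into a local command, or refuse it."""
    if name not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not supported by this agent: {name}")
    # Return a copy so callers cannot mutate the allowlist.
    return list(ALLOWED_COMMANDS[name])
```

Seen this way, a command is a one-shot action picked from a closed list, while a configuration is desired state that stays enforced; the protocol could carry both but would treat them differently.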

Notifying: how will an MDM server notify an endpoint of a policy change, to ensure a “sort of real time” spread of a local passcode complexity change, for example? Are we limited to a client poll based mechanism? Or can we use some kind of WebSocket for push notifications?

Reporting: are we able to react to a local change such as a status change (like version after update, or completed full disk encryption) to inform as soon as possible the server of the new situation without waiting for a polling based mechanism?
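
The reporting question can be illustrated with a small Python sketch that sends only what changed since the last report, as soon as a change is observed, instead of waiting for the next poll. The field names in the usage note (`os_version`, `disk_encrypted`) are hypothetical, and how the agent detects a local change in the first place (file watch, systemd event, timer) is left open.

```python
from typing import Callable


class ChangeReporter:
    """Send a report only when the observed state differs from the last one."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send
        self._last: dict = {}

    def observe(self, state: dict) -> dict:
        # Keep only the fields that are new or whose value changed.
        # Fields that disappear from the state are not reported in this sketch.
        delta = {
            key: value
            for key, value in state.items()
            if key not in self._last or self._last[key] != value
        }
        if delta:
            self._send(delta)
        self._last = dict(state)
        return delta
```

Observing `{"os_version": "23.11", "disk_encrypted": False}` twice sends one report; a later observation with `disk_encrypted` set to `True` sends only that field.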

Those are the kinds of details everyone needs to keep in mind, to be sure that the global vision is shared and to create the least demanding, most effective code base for a FOSS mdmclient.

fricklerhandwerk · June 28, 2024, 9:46am

You should definitely follow Clan then, they’re working on answering a lot of these questions in working code. But in terms of Nix upstream tooling and documentation, there’s just a lot of clean-up and stabilisation work to be done, to make the existing workflows easy to find and more straightforward to use, and to converge on solutions to well-known problems such as the ones you list.

ygi · June 28, 2024, 9:51am

Do they have protocol documentation somewhere? I see a lot of explanation of what they do but not how they do it. And their interaction system seems fully integrated, so I’m not sure it would be easy to extract as an independent client usable by any management server.

pytte · June 28, 2024, 10:37am

Apple MDM can handle a laptop/phone by its serial number directly out of the box, and set it up the way the company wants.
I haven’t looked into Android but I’m guessing it’s the same…

When do you onboard NixOS into the MDM?

ygi · June 28, 2024, 10:45am

All OS vendors have an OOB (out-of-box) experience solution where the serial number of the device is indeed referenced in some shared discovery server, such as Apple School or Business Manager, Windows Autopilot or Android ZTE.

But all of them also have a self-service enrollment system, which is basically an authenticated enrollment profile run on an already installed device.

And all of them started with this self-service solution. The automated deployment came later.

I recommend doing the same. One problem at a time: first enrolling and managing, then discovering enrollment OOB.
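
To give an idea of what "authenticated" self-service enrollment could mean in practice, here is a Python sketch in which the server issues a short-lived signed token to a logged-in user, and the agent on an already installed machine presents it when enrolling. The token format (`user:expiry:signature`) is purely an assumption for illustration, not a description of how Apple, Microsoft or Google do it.

```python
import hashlib
import hmac
import time
from typing import Optional


def _sign(secret: bytes, payload: str) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, user: str, ttl: int = 3600,
                now: Optional[float] = None) -> str:
    """Server side: create a token that is valid for `ttl` seconds."""
    expires = int((time.time() if now is None else now) + ttl)
    payload = f"{user}:{expires}"
    return f"{payload}:{_sign(secret, payload)}"


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Server side: return the user name if the token is valid, else None."""
    try:
        user, expires, signature = token.rsplit(":", 2)
    except ValueError:
        return None
    # Constant-time comparison to avoid leaking signature prefixes.
    if not hmac.compare_digest(signature, _sign(secret, f"{user}:{expires}")):
        return None
    current = time.time() if now is None else now
    return user if current < int(expires) else None
```

The machine that presents a valid token gets enrolled for that user; out-of-box discovery by serial number can then be added later as a second way of reaching the same enrolled state.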