I can’t help but think that there will be issues on some level here anyway, given that you’re subverting the basic principles of web security. HTTPS is all but universally required, and hence relied on; it’s possible these certificates will end up being used in ways you don’t expect beyond simple transport encryption.
Finding good documentation on X.509 has been very difficult, but it seems that there is no good way to get the performance benefits of HTTP/2 and HTTP/3 over a secure internal connection without making some serious compromises. Either you take on a totally unnecessary maintenance burden by paying for a domain name and using Let’s Encrypt, or you install certificates on your devices and take responsibility for protecting a secret. (I suspect that this isn’t an accident, since the designers of X.509 had every incentive to further contribute to the centralization of the web.) At least by using nameConstraints in this way, I avoid opening an additional security hole where someone who steals my secret key could impersonate google.com on any of my devices; judging by my research online, that is more than most people do.
If what I am describing does work, I think it is the cleanest solution to the problem that avoids taking responsibility for any secrets. That said, I’d really appreciate hearing about any vulnerabilities or issues anyone finds!
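To make the nameConstraints idea concrete, here is a minimal sketch in Go using only the standard library’s crypto/x509, which enforces name constraints while building a chain. It assumes a CA constrained to DNS names under `.vpn` (Go’s equivalent of `permitted;DNS:.vpn`) and a hypothetical host `laptop.vpn`. It is not the exact configuration discussed here. A leaf for `laptop.vpn` should verify, while a `google.com` leaf signed with the same CA key should be rejected. Go’s verifier applies the root’s own constraints. Whether a particular browser or OS verifier does the same for a locally installed root is worth checking separately.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newConstrainedCA creates a self-signed CA whose nameConstraints only permit
// DNS names under ".vpn" (the Go spelling of permitted;DNS:.vpn).
func newConstrainedCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:                big.NewInt(1),
		Subject:                     pkix.Name{CommonName: "Example internal CA"}, // hypothetical name
		NotBefore:                   time.Now().Add(-time.Hour),
		NotAfter:                    time.Now().Add(24 * time.Hour),
		IsCA:                        true,
		BasicConstraintsValid:       true,
		KeyUsage:                    x509.KeyUsageCertSign,
		PermittedDNSDomainsCritical: true,
		PermittedDNSDomains:         []string{".vpn"}, // leading dot: only names below "vpn"
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issueAndVerify signs a server leaf for dnsName with the CA key, then verifies
// it with the CA as the only trusted root, as a client on the network would.
func issueAndVerify(ca *x509.Certificate, caKey *ecdsa.PrivateKey, dnsName string) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName}, // verifiers match the SAN, not the CN
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err = leaf.Verify(x509.VerifyOptions{DNSName: dnsName, Roots: roots})
	return err
}

func main() {
	ca, caKey, err := newConstrainedCA()
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"laptop.vpn", "google.com"} {
		fmt.Printf("%s: %v\n", name, issueAndVerify(ca, caKey, name))
	}
}
```

Signing the `google.com` leaf succeeds, because the CA key will sign anything. Verification is where the name constraint is enforced, so a stolen key alone does not let an attacker spoof names outside `.vpn` to clients that enforce constraints.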
It’s your threat model, but security is one of those things where going against the flow is generally a bad idea IME. I’d really recommend not just throwing caution to the wind, especially because sops & co. really are not that much extra overhead.
I’m quite cautious and I have given serious consideration to all of the different options here, but I don’t think that SOPS would quite solve my problem. I have devices in my network which I do not entirely trust (e.g. VPS servers with public-facing services and IoT devices), and if one of these were compromised then waltmck-cakey.pem would be readable. If I were to actually treat it as a secret, I would need to store it as state on some trusted device I control and manually sign/deploy leaf certs to new devices as I bring them online. That is more maintenance burden than the benefits of using TLS justify.
That said, surely you could just do that? Do browsers these days reject certificates from a CA that doesn’t limit itself to a TLD? AIUI that limit is an extension, so it should be perfectly possible to claim arbitrary names and have them signed by your CA. As long as you trust that cert (and don’t explicitly limit your CA like you do now), browsers should have no issue with this.
I could create and trust a CA that doesn’t limit itself to a TLD. However, that is a massive security hole: anyone who steals its key could spoof google.com to any of my devices. Even if I restricted it to my VPN’s IP range, it would still be a bad idea: anyone who compromises one of my devices and obtains the root key could spoof google.com to point to the stolen device and MITM from there.
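The IP-range variant can be checked the same way. This is a minimal Go sketch under two assumptions. First, “restricting it to my VPN’s IP range” means an iPAddress name constraint. Second, 100.64.0.0/10 stands in for the VPN range and is purely hypothetical. Under RFC 5280 a permitted subtree only restricts names of its own type. A leaf carrying only a `google.com` DNS name therefore still chains to this CA, while an IP address outside the range is rejected.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// Hypothetical VPN range, used only for illustration.
var vpnRange = &net.IPNet{IP: net.IPv4(100, 64, 0, 0).To4(), Mask: net.CIDRMask(10, 32)}

// sign creates a certificate from tmpl, signed by signer, and parses it back.
func sign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// newIPConstrainedCA builds a CA whose only name constraint is the VPN IP range.
func newIPConstrainedCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "Example IP-constrained CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
		PermittedIPRanges:     []*net.IPNet{vpnRange}, // no DNS constraint at all
	}
	ca, err := sign(tmpl, tmpl, &key.PublicKey, key)
	return ca, key, err
}

// check issues a leaf for name (an IP literal or a DNS name) and verifies it.
func check(ca *x509.Certificate, caKey *ecdsa.PrivateKey, name string) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: name},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	if ip := net.ParseIP(name); ip != nil {
		tmpl.IPAddresses = []net.IP{ip}
	} else {
		tmpl.DNSNames = []string{name}
	}
	leaf, err := sign(tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	// DNSName also accepts an IP literal, which is matched against IPAddresses.
	_, err = leaf.Verify(x509.VerifyOptions{DNSName: name, Roots: roots})
	return err
}

func main() {
	ca, caKey, err := newIPConstrainedCA()
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"100.64.1.2", "8.8.8.8", "google.com"} {
		fmt.Printf("%s: %v\n", name, check(ca, caKey, name))
	}
}
```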
Drop the permitted;DNS:.vpn from your CA, replace it with specific permitted claims for each of your hostnames, and add subjectAltName = DNS:${hostname} to your certs (aside: are spaces allowed in there?). Maybe also set the CN to that hostname and drop the .vpn altogether.
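For the per-hostname variant, here is a minimal Go sketch with hypothetical hostnames `laptop` and `nas`. The CA permits exactly the listed names, and each leaf carries its hostname as a DNS subjectAltName, which is what modern verifiers match rather than the CN. dNSName constraints are subtree constraints under RFC 5280: any name formed by adding labels to the left still satisfies them. Permitting `nas` therefore also permits `anything.nas`.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newHostListCA creates a CA whose nameConstraints permit only the listed
// hostnames, with no shared ".vpn" suffix.
func newHostListCA(hosts []string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:                big.NewInt(1),
		Subject:                     pkix.Name{CommonName: "Example host-list CA"},
		NotBefore:                   time.Now().Add(-time.Hour),
		NotAfter:                    time.Now().Add(24 * time.Hour),
		IsCA:                        true,
		BasicConstraintsValid:       true,
		KeyUsage:                    x509.KeyUsageCertSign,
		PermittedDNSDomainsCritical: true,
		PermittedDNSDomains:         hosts, // one permitted entry per hostname
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	ca, err := x509.ParseCertificate(der)
	return ca, key, err
}

// verifyHost issues a leaf with subjectAltName DNS:<host> (and CN=<host>)
// and verifies it for that hostname against the CA.
func verifyHost(ca *x509.Certificate, caKey *ecdsa.PrivateKey, host string) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err = leaf.Verify(x509.VerifyOptions{DNSName: host, Roots: roots})
	return err
}

func main() {
	ca, caKey, err := newHostListCA([]string{"laptop", "nas"})
	if err != nil {
		panic(err)
	}
	for _, host := range []string{"nas", "evil.nas", "google.com"} {
		fmt.Printf("%s: %v\n", host, verifyHost(ca, caKey, host))
	}
}
```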
This is the method that I originally considered. I gave up on it because every time I add a new device to my network, I would have to manually generate a new root cert and re-sign all of my devices’ leaf certs. This would actually still be acceptable if it could be done programmatically (i.e. I write a derivation that builds a root cert from a list of hostnames), but that would require root certificate generation to be reproducible, so that the same root cert is trusted by all of my devices. I explored that pathway a little bit, but it ended up being too cursed even for me.
As an aside, if you can accept a shared TLD of some sort, you could set up mDNS and use .local. This would save you the maintenance effort of hard-coded hostnames, and IMO it seems more appropriate for this kind of mesh VPN than raw hostnames. I don’t think saving 6 characters is worth all the subversion of networking standards you’re doing.
Yeah, mDNS is great, and that is what I used before switching to NixOS. But if I already know the IP ↔ hostname mapping ahead of time, why introduce an additional point of failure?