Am I setting myself up for failure?

Hi all, my apologies that this will be a little wordy due to the backstory. At my company I’m pretty much solely in charge of about 120 machines, 99% of which run Linux, including the employee workstations. Historically we’ve been a heavy CentOS (now Alma) shop, with just a couple of other Linux distributions in use on a handful of machines. I run my own data center internally, with offsite infrastructure as my failover solution. We have triple-redundant Internet connectivity, triple-redundant HVAC and triple-redundant power, and it’s been extremely reliable for me over the last 20 years: no data loss, no security incidents and nearly zero downtime.

All that said, I’m contemplating moving the servers offsite so that some outside provider can deal with any hardware issues and upgrades, leaving me to deal solely with network design, OS maintenance and service config and support. In the process of moving, I thought that re-deploying everything under NixOS would help me to keep it all manageable.

I’ve been using NixOS for about a year now on a number of personal machines, as well as some employee laptops and I’ve been able to accomplish what I set out to do in all those scenarios, so I’ve got some confidence going with it. That said, I noticed that while I was attempting to duplicate one of our more complicated servers offsite with NixOS, that a few of the packages were unmaintained.

I’m a stickler for security, and historically I have religiously updated any services and packages we utilize in a very rapid fashion. I’d like to bring that same mentality forward with me into NixOS, but as a single systems administrator, I’m wondering if it’s feasible for me to track all the packages I’ll be using in NixOS and then rapidly and manually update them when security or bug fixes are announced upstream. One example I ran into yesterday is Apache Solr, whose package appears to be unmaintained, so maybe that’s a good test case.

TLDR: I don’t have any experience with what it would take to personally maintain my own set of package updates for NixOS and I’m wondering if it would be possible for me, as a single sys-admin, to guarantee a high level of system security when using NixOS to natively provide services to external parties over the Internet. Would I instead be stuck with utilizing containers for any external facing service, in order to make rapid updates manageable?

PS: I’m also nervous that besides Vulnix there doesn’t appear to be much support for using automated tools to double-check the security posture of NixOS installations, except for maybe generic port scanning and the like.


What I can tell you is this: Nixpkgs has update scripts for a variety of packages, and those scripts can be executed locally, so it’s possible to check packages for upstream updates. If a package you need doesn’t have a script, you can write one. It’s also possible to build NixOS from a clone of the Nixpkgs repo, which lets you build NixOS with whatever packages you want. Alternatively, you can keep a repo of overrides which you can use to add packages, or even replace packages in Nixpkgs.
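As a sketch of that overrides approach, an overlay can bump a package ahead of what Nixpkgs ships. The version, URL pattern and hash below are placeholders, not real values; substitute the actual upstream release:

```nix
# Hypothetical overlay bumping Solr past the Nixpkgs version.
# version/url/hash are placeholders — fill in the real upstream values.
final: prev: {
  solr = prev.solr.overrideAttrs (old: rec {
    version = "9.99.0"; # placeholder
    src = prev.fetchurl {
      url = "https://archive.apache.org/dist/solr/solr/${version}/solr-${version}.tgz";
      hash = prev.lib.fakeHash; # swap in the real hash after the first build attempt
    };
  });
}
```

The nice part is that the rest of your configuration keeps referring to `pkgs.solr` and picks up the override transparently.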

It’s also possible to build NixOS from a clone of the Nixpkgs repo

Probably this, as it allows you to manage what goes where, and when.

But, the technical solution looks to me to be the least of your worries.

I’m not trying to tell you how to do your job, but a lot of critical infrastructure is tied up in one person. This is extremely high risk and is probably what you want to address first.


I was in a similar position once. I decided to take the plunge and install NixOS at work. By the end of my run at that company I had replaced between 50 and 70 Debian/RHEL servers with about 40 NixOS machines. There were some stressful points, but overall it was a great experience and it saved me countless hours, allowing me to really expand my scope.

Years ago I needed to provide Solr at work. I saw the package was not maintained, so I took the opportunity to learn both Solr and more about NixOS modules, and I rewrote it. It worked well. Eventually I didn’t need to provide Solr anymore, so I stopped maintaining it. You have two choices for software that is obscure in our community: learn it really well and maintain it yourself, or keep that server running on another distro, use Docker, etc. If there are only a few pieces of software you run that aren’t receiving much support in NixOS land, then another distro or Docker is the pragmatic choice.
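For the container route, NixOS itself can manage an upstream image declaratively. A minimal sketch, assuming the Docker backend and the official upstream Solr image (tag, ports and volume path are illustrative choices, not requirements):

```nix
# Sketch: run upstream Solr as an OCI container managed by NixOS.
{
  virtualisation.oci-containers = {
    backend = "docker"; # or "podman"
    containers.solr = {
      image = "solr:9.6";                     # pin an upstream tag you trust
      ports = [ "127.0.0.1:8983:8983" ];      # expose only locally
      volumes = [ "/var/lib/solr:/var/solr" ]; # persist the index
    };
  };
}
```

This gives you a systemd-managed container, so the “another distro/Docker” escape hatch can still live inside your NixOS configuration.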

If you want to post what sort of software/workflows you manage, I’m sure many people can comment on how well supported they are in NixOS.


Thanks so much for the feedback so far, it’s greatly appreciated.

@peterhoeg Yes, the current situation definitely isn’t ideal, which is one of the reasons I am contemplating this change. Our facility is oddly located in a small farming community, sort of in the middle of nowhere, so there just isn’t much technical expertise available there for us to bring in any additional help. I’m hopeful that an infrastructure move may expand our hiring opportunities out of the immediate area.

@aanderse I don’t think we’re doing anything too outside of the box, which is probably helpful. If I were going to ramble off some primary items…

  • Java Development
  • Postgres
  • Wildfly
  • Jenkins
  • MySQL
  • Apache/PHP
  • HAProxy
  • Firefox/Thunderbird/LibreOffice
  • Squid
  • Bind
  • Dovecot/Apache Solr
  • Asterisk Phone Server

Many of the items you listed will be fine to run on NixOS :+1:

I’ll warn you about downtime, though. By managing configuration with NixOS you are going to cause service restarts every time you tweak anything. You mentioned downtime specifically so I want to make sure you understand how this works with NixOS.

If this turns out to be important to you, as it was to me when I was in a similar situation, there are some strategies you can use to avoid restarting services on every change. Let me know if it matters to you and you would like to discuss further.
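One such strategy, sketched below: the option names are real NixOS options, but the service names are just examples. `restartIfChanged = false` tells `nixos-rebuild` not to restart the unit when its definition changes (the new unit takes effect on the next manual or scheduled restart), while `reloadIfChanged = true` reloads instead of restarting for services that support it:

```nix
# Sketch: avoid restarts on every rebuild (service names are examples).
{
  # Don't restart the database on rebuild; apply changes at the next
  # planned restart instead.
  systemd.services.mysql.restartIfChanged = false;

  # Reload (not restart) the proxy when its unit changes.
  systemd.services.haproxy.reloadIfChanged = true;
}
```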


The workstations may not be able to run binaries downloaded from the Internet, since NixOS doesn’t follow the usual filesystem hierarchy that prebuilt binaries expect. This may be very annoying for the users, especially if they are developers.
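A common mitigation, if this bites, is enabling nix-ld, which provides a shim dynamic loader so many prebuilt dynamically linked binaries run anyway. A minimal sketch (the extra libraries are just examples of what foreign binaries often need):

```nix
# Sketch: let prebuilt dynamically linked binaries run on NixOS workstations.
{ pkgs, ... }:
{
  programs.nix-ld.enable = true;
  # Shared libraries commonly expected by foreign binaries (examples):
  programs.nix-ld.libraries = with pkgs; [ zlib openssl ];
}
```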


I would really be interested in zero-downtime deployments of my services and some automation around it.
I can definitely do it by manually changing the configuration twice and rebuilding twice.
For example, I have some code like this for a service:

{ config, pkgs, ... }:

let
  bluePort = 6100;
  greenPort = bluePort + 1;

  feedhub = port: {
    enable = true;
    description = "Feedhub systemd instance";

    wantedBy = [ "multi-user.target" ];

    environment = {
      ASPNETCORE_URLS = "http://localhost:${toString port}";
    };

    # env is currently in .env file right beside project files
    serviceConfig = {
      Type = "simple";
      Restart = "always";
      StateDirectory = "feedhub";
      WorkingDirectory = "/var/lib/feedhub/bin";
      ExecStart = "${pkgs.dotnet-aspnetcore_8}/bin/dotnet ./Web.dll";
      RestartSec = 0;
    };
  };
in {
  # Two instances: blue is live; green is enabled only during a deploy,
  # then disabled again once traffic has been switched over.
  systemd.services.feedhub-blue = feedhub bluePort;
  # systemd.services.feedhub-green = feedhub greenPort;

  services.caddy.virtualHosts."feedhub.cookingweb.dev".extraConfig = ''
    reverse_proxy :${toString bluePort}
  '';
}
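The second of the two manual rebuilds could then look roughly like this (a sketch reusing the `feedhub` helper and `greenPort` from the config above, replacing the corresponding lines; a final edit would disable the blue instance once traffic has moved):

```nix
# Sketch of rebuild #2 in the manual blue/green swap:
# bring up the green instance and repoint Caddy at it.
{
  systemd.services.feedhub-green = feedhub greenPort;

  services.caddy.virtualHosts."feedhub.cookingweb.dev".extraConfig = ''
    reverse_proxy :${toString greenPort}
  '';
}
```

Since `RestartSec = 0` and Caddy only reloads its config, the cutover window is essentially the proxy reload, which is what makes the two-step dance near zero-downtime.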