Some questions about NixOps

jhaberku · November 9, 2020, 2:10pm

Hi everyone.

I recently started playing with NixOps, using EC2 as a target for hosting some personal servers (to start, a Git remote).

I don’t have a lot of experience administering this kind of infrastructure, so I’m learning as I go.

I encountered a few issues with NixOps. Rather than filing bugs immediately, I wanted to make sure I was on the right track first.

Missing manual contents

I stumbled onto the NixOps User's Guide. I started with the VirtualBox example and was able to follow the machine-specific instructions for interacting with EC2.

However, I realized the version of NixOps I had installed was 1.7, so by changing that part of the URL I was able to get an updated manual.

On the other hand, when I search Google for “nixops”, the second result takes me to a different copy of the NixOps User's Guide.

That version of the manual seems to be missing a lot of instructive content. Is it intentional that it’s missing?

Warnings about the cache

(I’ll paste my EC2 configuration file in the next section.)

When I run nixops deploy, at the stage of

gitServer> copying closure...

the process seems to hang for at least 10 minutes and I get 3 to 4 warnings of this form:

gitServer> warning: unable to download 'https://cache.nixos.org/nix-cache-info': Timeout was reached (28); retrying in 307 ms

I’ve seen this consistently over a few days. I noticed some possibly related issues in the bug tracker, but (unless I’m mistaken) nothing that matches exactly.

Is this expected? If not, what can I do to investigate further?

Confusion about attaching EBS volumes and NVMe devices

This is the machine configuration file:

let
  region = "ca-central-1";
  zone = "ca-central-1a";
  accessKeyId = "jhaberku";
in

{
  resources.ec2KeyPairs.jhaberku = { inherit region accessKeyId; };

  gitServer =
    { resources, ... }:
    {
      deployment.targetEnv = "ec2";
      
      deployment.ec2 = {
        inherit accessKeyId region zone;
        instanceType = "t3.micro";
        ebsInitialRootDiskSize = 3;
        elasticIPv4 = "8.8.8.8"; # Not actually.
        keyPair = resources.ec2KeyPairs.jhaberku;
        securityGroups = ["ssh"];
      };

      fileSystems."/var/lib/gitolite" = {
        fsType = "ext4";
        ec2.disk = "vol-foobarbaz"; # Not actually.
        device = "/dev/nvme1n1";
      };
    };
}

I’m a little unclear on the device attribute. My understanding is that this is an assignment that I choose, but one that I have to choose carefully.

When I selected something like /dev/xvdj, I got strange errors (I can reproduce them if it would be helpful).

Then, I noticed the error listed devices of the form /dev/nvme1n1 and I found some documentation for newly-added NVMe device support in NixOps. The documentation contradicted my observation, because it said NVMe devices were not applicable for t3.micro instances, but perhaps that’s changed on AWS’s side now.

I chose what I believe is the next available “slot” and the deployment proceeded. The device is mounted at the expected point in the file system, and everything seems to be working.

On the other hand, I consistently get a strange warning when I deploy:

$ nixops deploy
gitServer> warning: device ‘/dev/nvme1n1’ was manually detached!
gitServer> attaching volume ‘vol-foobarbaz’ as ‘/dev/nvme1n1’... 
building all machine configurations...
gitServer> copying closure...
jhaberku.net> closures copied successfully
gitServer> updating GRUB 2 menu...
gitServer> activating the configuration...
gitServer> setting up /etc...
gitServer> reloading user units for root...
gitServer> setting up tmpfiles
gitServer> activation finished successfully
jhaberku.net> deployment finished successfully

and nixops check reports issues:

$ nixops check
Machines state:
+-----------+--------+-----+-----------+----------+----------------+------------------------------------+------------------------------------------------------------+
| Name      | Exists | Up  | Reachable | Disks OK | Load avg.      | Units                              | Notes                                                      |
+-----------+--------+-----+-----------+----------+----------------+------------------------------------+------------------------------------------------------------+
| gitServer | Yes    | Yes | Yes       | No       | 0.05 0.03 0.01 |   sys-kernel-config.mount [failed] | volume ‘vol-foobarbaz’ not attached to ‘/dev/xvda’ |
|           |        |     |           |          |                | ● tmp.mount [failed]               |                                                            |
+-----------+--------+-----+-----------+----------+----------------+------------------------------------+------------------------------------------------------------+
Non machines resources state:
+----------+--------+
| Name     | Exists |
+----------+--------+
| jhaberku | Yes    |
+----------+--------+

What am I doing wrong (if anything)?

jhaberku · November 11, 2020, 9:01pm

To answer the second of my questions: the NixOS cache could not be reached because I had disallowed outgoing HTTP and HTTPS connections from the instance.

I created a security group in the AWS console, added it to deployment.ec2.securityGroups, and now deploys are much faster and produce no warnings.
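
For anyone hitting the same timeouts, the change amounts to roughly the following sketch. The group name outbound-web is a placeholder rather than the real name, and the group itself (with its rules for outgoing HTTP and HTTPS) was created in the AWS console, not by NixOps. Only the changed attribute is shown; everything else in deployment.ec2 stays as in the configuration from the first post.

```nix
{
  # "ssh" is the group already listed in the original configuration.
  # "outbound-web" is a hypothetical name for the console-created group
  # that permits outgoing HTTP (80) and HTTPS (443) traffic.
  deployment.ec2.securityGroups = [ "ssh" "outbound-web" ];
}
```

After redeploying with the extra group attached, the instance could reach https://cache.nixos.org again during the "copying closure" step.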

Still curious about the other two issues, though!