Explaining modern server monitoring stacks for self-hosting with NixOS

Solene · September 11, 2022, 5:19pm

Solene · September 11, 2022, 5:21pm

I don’t know how to refresh the Open Graph information in Discourse to fix the typo.

Lun · September 11, 2022, 5:24pm

Add some random query string to the link?

Solene · September 11, 2022, 5:33pm

It worked! That was a good idea.

TLATER · September 13, 2022, 10:54am

TIL about collectd. It’s probably time to revisit metrics. Thanks!

Solene · September 13, 2022, 11:06am

There is also Telegraf, a very minimal agent.

And netdata, which can also work standalone, providing SO MANY METRICS without any configuration pain if you don’t want to centralize metrics.
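For the Telegraf option, a minimal NixOS sketch could look like the following. It assumes you want Telegraf to expose its metrics in Prometheus format through the `prometheus_client` output plugin (default port 9273); the chosen inputs (`cpu`, `mem`, `disk`) and the 30s interval are illustrative picks, not a recommendation from this thread.

```nix
services.telegraf = {
  enable = true;
  # extraConfig is rendered to Telegraf's TOML configuration file
  extraConfig = {
    agent.interval = "30s"; # illustrative collection interval
    inputs = {
      cpu = { };  # CPU usage, default settings
      mem = { };  # memory usage
      disk = { }; # disk usage per mount point
    };
    # expose collected metrics for a Prometheus-compatible scraper
    outputs.prometheus_client = {
      listen = ":9273";
    };
  };
};
```

Any Prometheus-compatible server (Prometheus itself, VictoriaMetrics, vmagent) can then scrape `localhost:9273`.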

NickCao · September 13, 2022, 3:55pm

I’m using plain Telegraf with Prometheus, without a dashboard. Years of operation taught me a lesson: I don’t have a spare display for staring at metrics, and a handful of alerts for the most critical events is more than enough.
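A dashboard-less Prometheus setup like this one can be sketched in NixOS as a single scrape job plus a couple of alert rules. This is an illustration under assumptions: Telegraf listens on `localhost:9273` via its `prometheus_client` output, the rule names and thresholds are hypothetical, and `disk_used_percent` is the name Telegraf's `disk` input typically gets when exported to Prometheus. Actually delivering notifications additionally requires Alertmanager (`services.prometheus.alertmanager`), which is not configured here.

```nix
services.prometheus = {
  enable = true;
  scrapeConfigs = [
    {
      job_name = "telegraf";
      static_configs = [ { targets = [ "localhost:9273" ]; } ];
    }
  ];
  # rules are plain YAML strings in Prometheus' rule file format
  rules = [
    ''
      groups:
        - name: critical
          rules:
            # hypothetical rule: the scrape target stopped answering
            - alert: TelegrafDown
              expr: up{job="telegraf"} == 0
              for: 5m
            # hypothetical rule: a filesystem is almost full
            - alert: DiskAlmostFull
              expr: disk_used_percent > 90
              for: 15m
    ''
  ];
};
```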

Solene · September 13, 2022, 5:13pm

Metrics are often useful to understand a problem. It doesn’t replace alerting.

Solene · September 15, 2022, 6:54am

Wow, netdata is a lot more powerful than when I tested it ~8 years ago. It now has persistent data storage on disk instead of keeping everything in memory.

And it recently got machine-learning-based anomaly detection. I’m curious to see whether it gives good results. My issue with netdata is that you can’t access the logs if the system is down.

```nix
services.netdata.enable = true;
services.netdata.config = {
  global = {
    "page cache size" = 32; # max 32 MB of memory
    "update every" = 30; # 30s interval
  };
  ml = {
    "enabled" = "yes"; # machine learning
  };
};
```

valyala · September 17, 2022, 5:26am

Thanks for sharing various options for server monitoring in a clear way! The second option, VictoriaMetrics + vmagent, can be simplified further by removing vmagent from the configuration and using a single VictoriaMetrics instance for metrics scraping; see these docs. This should reduce memory usage by another 13MB. Also try playing with the -memory.allowedPercent command-line option (or -memory.allowedBytes) in VictoriaMetrics if you need to reduce memory usage even more. Here is the -help description for -memory.allowedPercent:

```
-memory.allowedPercent float
     Allowed percent of system memory VictoriaMetrics caches may occupy.
     See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage.
     Too high a value may evict too much data from OS page cache which will result in higher disk IO usage
     (default 60)
```
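The vmagent-free setup can be sketched as a NixOS module that hands a scrape configuration straight to single-node VictoriaMetrics through its `-promscrape.config` flag. Assumptions: the scrape target (a node exporter on its default port 9100) and the `-memory.allowedPercent=30` value are purely illustrative and should be adjusted to your own exporters and memory budget.

```nix
{ pkgs, ... }:
let
  # Prometheus-style scrape configuration, rendered to a YAML file in the store
  scrapeConfig = (pkgs.formats.yaml { }).generate "scrape.yml" {
    scrape_configs = [
      {
        job_name = "node"; # hypothetical job name
        static_configs = [ { targets = [ "localhost:9100" ]; } ];
      }
    ];
  };
in
{
  services.victoriametrics = {
    enable = true;
    extraOptions = [
      # let VictoriaMetrics scrape targets itself, no vmagent needed
      "-promscrape.config=${scrapeConfig}"
      # illustrative cap on cache memory (default is 60 percent)
      "-memory.allowedPercent=30"
    ];
  };
}
```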

Solene · September 17, 2022, 7:43am

Wow, cool! I’m going to try it, thank you.