I don’t know how to refresh the Open Graph information in Discourse to fix the typo.
Add some random query string to the link?
It worked! That was a good idea.
TIL about collectd. It’s probably time to revisit metrics. Thanks!
There is also Telegraf as an agent. It's very minimal.
And netdata, which can work standalone and provides SO MANY METRICS without any configuration pain, if you don’t want to centralize metrics.
I’m using plain Telegraf with Prometheus, without a dashboard. Years of operation taught me a lesson: I don’t have a spare display for staring at metrics, and a handful of alerts for the most critical events is more than enough.
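For anyone wanting to try a similar dashboard-less setup, here is a rough sketch as a NixOS module. It assumes the NixOS services.telegraf and services.prometheus modules. The port, the root-disk rule and the RootDiskAlmostFull alert name are my own hypothetical choices. Actually delivering alerts (mail, chat) would also need Alertmanager, which is not shown here.

```nix
# Hypothetical NixOS module: Telegraf exposes metrics, Prometheus scrapes them
# and evaluates one alert rule. No dashboard involved.
{ ... }:
{
  services.telegraf = {
    enable = true;
    # extraConfig is rendered into Telegraf's TOML configuration.
    extraConfig = {
      inputs = {
        cpu = { totalcpu = true; percpu = false; };
        mem = { };
        disk = { ignore_fs = [ "tmpfs" "devtmpfs" ]; };
      };
      # prometheus_client serves a /metrics endpoint (9273 is its default port).
      outputs.prometheus_client = { listen = ":9273"; };
    };
  };

  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "telegraf";
        static_configs = [ { targets = [ "127.0.0.1:9273" ]; } ];
      }
    ];
    # A single example rule. YAML accepts JSON, so builtins.toJSON is enough.
    rules = [
      (builtins.toJSON {
        groups = [
          {
            name = "host";
            rules = [
              {
                alert = "RootDiskAlmostFull";
                # Telegraf's disk input field used_percent becomes disk_used_percent.
                expr = ''disk_used_percent{path="/"} > 90'';
                for = "10m";
              }
            ];
          }
        ];
      })
    ];
  };
}
```

Without Alertmanager, firing alerts only show up on Prometheus's own alerts page. That may already be enough for a handful of critical checks.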
Metrics are often useful for understanding a problem. They don’t replace alerting.
Wow, netdata is a lot more powerful than when I tested it ~8 years ago. It now has persistent data storage on disk instead of keeping everything in memory.
And it recently got machine-learning-based anomaly detection. I’m curious to see whether it gives good results. My issue with netdata is that you can’t access the logs if the system is down.
```nix
services.netdata.enable = true;
services.netdata.config = {
  global = {
    "page cache size" = 32; # max 32 MB of memory
    "update every" = 30;    # 30 s interval
  };
  ml = {
    "enabled" = "yes"; # machine learning
  };
};
```
Thanks for sharing various options for server monitoring so clearly! The second option, VictoriaMetrics + vmagent, can be simplified further by removing vmagent from the configuration and letting a single VictoriaMetrics instance scrape the metrics itself; see these docs. This should reduce memory usage by another 13MB. Also try tuning the -memory.allowedBytes command-line option in VictoriaMetrics if you need to reduce memory usage even more. Here is the -help description for the related -memory.allowedPercent option:
```
-memory.allowedPercent float
    Allowed percent of system memory VictoriaMetrics caches may occupy.
    See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage.
    Too high a value may evict too much data from OS page cache which will result in higher disk IO usage
    (default 60)
```
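Here is a minimal sketch of the single-node setup in NixOS form. It assumes the services.victoriametrics module accepts raw flags through extraOptions and that the node exporter is the only scrape target. The job name and the 30% value are also my own placeholders. -promscrape.config is the flag that lets single-node VictoriaMetrics scrape targets without vmagent. Module option names can change between NixOS releases, so check them against yours.

```nix
# Hypothetical NixOS module: single-node VictoriaMetrics scrapes node_exporter
# directly, with no separate vmagent process.
{ pkgs, ... }:
let
  # YAML is a superset of JSON, so builtins.toJSON yields a valid scrape config.
  scrapeConfig = pkgs.writeText "scrape.yml" (builtins.toJSON {
    scrape_configs = [
      {
        job_name = "node";
        static_configs = [ { targets = [ "127.0.0.1:9100" ]; } ];
      }
    ];
  });
in
{
  # node_exporter listens on port 9100 by default.
  services.prometheus.exporters.node.enable = true;

  services.victoriametrics = {
    enable = true;
    extraOptions = [
      "-promscrape.config=${scrapeConfig}"
      # Example value only: cap caches at 30% of system memory (default 60).
      # On a 1 GiB host that is roughly 300 MiB.
      "-memory.allowedPercent=30"
    ];
  };
}
```

With this in place, the scraped series should be queryable on VictoriaMetrics' default port 8428.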
Wow, cool! I’m going to try it, thank you.