RAM-limiting firefox for pathological tabbers

So, I’m a pathological tab user, and my firefox RAM usage is crippling my use of the rest of my machine, so I figured I would limit its RAM usage using a cgroup.

No matter what I try I can’t actually manage to get it to work.
Does anyone have any ideas, or any systemd maintainer friends? :stuck_out_tongue: Memory accounting regression for user units · Issue #9502 · systemd/systemd · GitHub

It’s been a few weeks since I poked at this.

The gist is to run firefox with systemd-run --user and set a cgroup memory limit on it, but this seems to be broken for --user?
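Roughly the shape of what I’ve been attempting (a sketch of the idea rather than a known-good command; the exact properties I passed varied between attempts):

$ systemd-run --user --scope -p MemoryHigh=3G -p MemoryMax=4G firefox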

I do have a working setup of cgroups v2 and systemd-run. I don’t remember
how I arrived at it; it’s a collection of various options and hacks
which isn’t straightforward to find in any single wiki or guide. I really
ought to document this in a blog post, but for now this post will do.

Systemd has a complicated relationship with cgroups. Basically, systemd
as a project added semantic meaning to cgroup hierarchies, and some
people were not happy with that. At the time, it was known on the
kernel mailing list that having each cgroup controller (blkio, cpu,
memory) on its own separate hierarchy led to problems and weird hacks.
The idea was to make a new version of cgroups with a single hierarchy
and a single process allowed to modify that hierarchy. On this new
version, that job naturally falls to systemd.

So, to make a long story short: to use systemd-run to limit a group of
processes, it is better to use only the cgroups v2 interface. On NixOS
this is as straightforward as putting

boot = {
  kernelParams = [
    "cgroup_no_v1=all"
    "systemd.unified_cgroup_hierarchy=yes"
  ];
};

in configuration.nix. What these options do is pretty clear from their
names, so no further explanation is needed.
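A quick sanity check after the rebuild and reboot: on a pure cgroups v2
setup, /sys/fs/cgroup is a cgroup2 mount, which stat reports as
cgroup2fs:

$ stat -fc %T /sys/fs/cgroup
cgroup2fs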

You can also walk the new /sys/fs/cgroup hierarchy. Mine looks like
this:

.
├── init.scope
├── system.slice
│   ├── accounts-daemon.service
│   ├── atd.service
│   ├── (..)
│   ├── upower.service
│   └── wpa_supplicant.service
└── user.slice
    ├── user-1000.slice
    │   ├── session-2.scope
    │   └── user@1000.service
    │       ├── geoclue-agent.service
    │       ├── (..)
    │       └── redshift.service
    └── user-78.slice
        └── user@78.service
            └── init.scope

On cgroups v2, the hierarchy works differently than on cgroups v1. You
can read the full docs in the Linux kernel repo (you should, it’s
interesting), but the gist of it is:

- Processes sit either at the root or at the leaves of the hierarchy.
  No process can belong to a node that has children, except for the
  root node.

- Controllers follow the hierarchy. The available controllers are
  listed in the file /sys/fs/cgroup/cgroup.controllers, and we can
  delegate a controller to lower levels by writing, for example:
      # echo +memory \
          >>/sys/fs/cgroup/system.slice/cgroup.subtree_control
  But if we do, that controller has to be added to the
  subtree_control file at every level all the way up to the root
  (see the sketch right after this list). There are some pretty good
  diagrams on the web of how this is set up; I can't do it justice
  in pure text :-) .
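As a concrete sketch of that rule: to be able to set memory limits on
something under my user session, the memory controller has to be
enabled at each level above it (the user-1000 path here is just my
machine’s, adjust as needed):

# echo +memory >>/sys/fs/cgroup/cgroup.subtree_control
# echo +memory >>/sys/fs/cgroup/user.slice/cgroup.subtree_control
# echo +memory >>/sys/fs/cgroup/user.slice/user-1000.slice/cgroup.subtree_control
# echo +memory >>/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.subtree_control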

So where does this leave us? We want to run systemd-run to limit some
memory-intensive program. In fact, I have one in mind:

module Main where

-- Intentionally leaky: foldr builds the whole nested sum
-- 1 + (2 + (3 + ...)) before anything can be reduced, so memory
-- use grows with the length of the list.
main :: IO ()
main = print $ foldr (+) 0 [1 .. 10000000000]

Any haskeller worth their salt will tell you this leaks (on my crappy
netbook, all the more so). So I compile this program and call it foldr.
Let’s limit the amount of memory it can use. As per the previous point,
I need to make sure the cgroup I will create is able to limit memory,
that is, I need to check that its subtree_control has the memory
controller enabled. For that, I first create the cgroup with a shell
inside it, like this:

systemd-run --user --scope -p MemoryHigh=1800M -p MemoryMax=2G \
-p MemorySwapMax=40M zsh

This command will print which unit (and hence which cgroup) the shell
belongs to. We could also have gotten that info from /proc/$$/cgroup.
We look that cgroup up in the hierarchy under /sys/fs/cgroup/ and read
its subtree_control file. Mine was at

cd user.slice/user-1000.slice/user@1000.service/run-r9e5d9b1bcabe4e1dac0f55a4ef1414a5.scope
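As a shortcut, on a pure v2 hierarchy /proc/$$/cgroup contains a single
0:: line with that path, so you can jump straight there (a sketch; it
assumes no v1 controllers are mounted):

$ cd /sys/fs/cgroup/$(cut -d: -f3- </proc/$$/cgroup)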

Here I read the file subtree_control and it’s empty. For the limit to
take effect it should have ‘memory’ in it. Luckily ‘memory’ is present
in the file cgroup.controllers, which means I can do

# echo +memory >>cgroup.subtree_control

and enable the limits for this group. If ‘memory’ had not been
available in cgroup.controllers, I would have had to ‘cd ..’ up a level
and try the same there until it succeeded, and then iterate back
downwards.

Now we can check that everything is in order. The files memory.high,
memory.max and memory.swap.max should be set to the values you passed
to systemd-run. The ‘foldr’ program will be a child of the current
shell, which is in this cgroup, so when run it will also belong to this
cgroup and should be limited.
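A quick way to eyeball that (the byte values here correspond to 1800M,
2G and 40M respectively, assuming no rounding got in the way):

$ grep . memory.high memory.max memory.swap.max
memory.high:1887436800
memory.max:2147483648
memory.swap.max:41943040

With that confirmed, run the program: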

$ ~/foldr

In another shell I run

$ ps -o %mem,rss,comm -p $(pgrep foldr)
%MEM   RSS COMMAND
50.6 1836320 foldr

And that remains stable, never rising past the limit I set. Depending
on how you set memory.high versus memory.max, the OOM killer may get
triggered. I killed the process with a kill command from another shell,
recovered control of the limited shell, and executed exit to tear down
the cgroup.
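In commands, the cleanup amounts to something like

$ kill $(pgrep foldr)

from the other shell, and then exit in the limited zsh, which makes the
transient scope go away.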

It’s an involved setup, but it’s fun knowing you can stay interactive
even when running badly behaved programs (hello, Matlab on my crappy
netbook). It needs some setup, but that’s nothing once done, and you
can always use your shell history to remember how it’s done.

Good luck


One of my hunches was that the hybrid mode might be doing something weird! I was only now about to try disabling it to see what happens. I’d have some difficulty collecting all the random junk I’ve pulled together here, because I haven’t been taking notes properly.

https://github.com/NixOS/nixpkgs/issues/73800

Anyway, I’m pleasantly surprised someone here knew something!
…I’m still reading the post \o/ …

Thank you so much! I’ll try this out.

Do you happen to be aware of systemd-cgls and systemd-cgtop? (Though these seem to have some idiosyncrasies in their flags…) systemctl status ... will also give some information about limits I think.
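For example (using the scope name from the walkthrough above; substitute whatever systemd-run printed for you):

$ systemd-cgls user.slice/user-1000.slice
$ systemd-cgtop user.slice/user-1000.slice
$ systemctl --user status run-r9e5d9b1bcabe4e1dac0f55a4ef1414a5.scope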

Have you tried setting MemoryAccounting=true instead of manually enabling the memory controller?
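That is, something along these lines (untested here, just the earlier command with the extra property):

$ systemd-run --user --scope -p MemoryAccounting=yes \
  -p MemoryHigh=1800M -p MemoryMax=2G -p MemorySwapMax=40M firefox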

It’s kind of frustrating how close I feel I was to the solution, but couldn’t make the final jumps. :stuck_out_tongue:
A quick test in a build-vm suggests this does indeed work! systemctl status ... shows a memory limit being applied now, and I don’t even have to jump through any hoops enabling the memory controller. It just works.
Did you have any problems?

I don’t suppose you know anything about debugging systemd/systemd-run? There seems to be a lot of dbus involved. It still bothers me that this didn’t work in hybrid mode, but I probably won’t look into it further.
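(Noting down for whenever I do get around to it: systemd tools honour SYSTEMD_LOG_LEVEL, and the user-bus traffic to the manager can be watched directly, so a starting point might be:

$ SYSTEMD_LOG_LEVEL=debug systemd-run --user --scope -p MemoryMax=2G true
$ busctl --user monitor org.freedesktop.systemd1

No idea yet whether that would have pointed at the hybrid-mode issue.)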

Edit: it seems like echo 3 > /proc/sys/vm/drop_caches dumped about 6GB of RAM, so I guess it was the ZFS caches.
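For reference, the ARC size can also be read directly instead of dropping caches, assuming ZFS on Linux exposes its usual kstat file:

$ awk '/^size/ { print $3 }' /proc/spl/kstat/zfs/arcstats

(That prints the current ARC size in bytes.)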
systemd / cgroups (?) seems to have a strange way of calculating memory usage though.
I have 16GB of RAM.
free -h reports 3GB available, 12GB is in use by the top-level cgroup according to cgtop, and 6.3GB is used by my user (5.3GB of that by firefox). No idea where the other 4.7GB of RAM went. There is nothing else major running on my system…

Control Group                                                                                   Tasks   %CPU   Memory  Input/s Output/s
/                                                                                                 766   67.8    12.7G        -        -
user.slice                                                                                        545   69.4     6.9G        -        -
user.slice/user-1000.slice                                                                        541   69.3     6.8G        -        -
user.slice/user-1000.slice/user@1000.service                                                      339   63.3     5.2G        -        -
user.slice/user-1000.slice/user@1000.service/run-rd474d6cf8a1c42b297739b80971ad9ae.scope          331   63.3     5.2G        -        -
user.slice/user-1000.slice/session-2.scope                                                        202    6.0     1.6G        -        -