I’m new to Nix and got here out of frustration with setting up a server environment on other distributions, so my judgement of the situation might not be entirely right, but it seems the cockpit Nix package doesn’t work properly.
It starts up, but some of the subprocesses/helper programs that cockpit uses don’t seem to be on the right paths, so things like installing extra cockpit modules (the derivations of said modules do work) and privilege escalation don’t work.
When I try:
find /nix/store -name '*cockpit*'
All the necessary programs seem to be there, but cockpit still looks in the old locations for some things, for example when cockpit-askpass is run.
I have tried setting the cockpit paths by overriding the systemd unit, but this doesn’t work, and I’m not sure whether the problem is the paths themselves or the setting in general. I’m also unsure how to print out the variables I set to check whether they take effect. Using lib.mkForce instead of lib.mkDefault gives the same bad result as in the screenshot, so I don’t think the hierarchy is the issue.
I could also map them to /usr/libexec/cockpit-*, but that defeats the point of NixOS, so it’s not a solution I have tried yet.
What are you trying to do when you get that message? Cockpit is pretty complex, and the services have all kinds of limitations in their envs for security purposes, so it’s entirely possible that you’re just doing something that the cockpit nix module isn’t intended to do (and not configured to give particularly nice error messages for).
IMHO cockpit generally doesn’t mesh well with NixOS, a lot of what it does is very imperative and just won’t work properly around these parts - but then, it’s been like half a decade since I last properly looked at it.
This stuff is pointless, by the way:
lib.mkDefault will make this be overridden by settings without a priority override, so this will just be ignored because the upstream module sets PATH without priorities AFAICT. Using mkForce sounds like it would have… interesting results, but probably not what you intend. If you want to add stuff, just write packages to the path attribute (without any priority overrides).
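A minimal sketch of what that looks like, assuming the unit in question is `cockpit.service` and using `pkgs.hello` purely as a placeholder for whatever package is actually missing (neither is confirmed as the fix here):

```nix
{ pkgs, ... }:
{
  # Plain definition at the default priority (100). List definitions of equal
  # priority are concatenated, so this is added to whatever the upstream
  # module already puts in `path`, and the result ends up in the unit's PATH.
  # pkgs.hello is a hypothetical stand-in, not a known requirement of cockpit.
  systemd.services.cockpit.path = [ pkgs.hello ];

  # For comparison only (left commented out, and needing `lib` in the arguments):
  #
  # lib.mkDefault is priority 1000, so it is dropped as soon as any
  # normal-priority definition exists:
  #   systemd.services.cockpit.path = lib.mkDefault [ pkgs.hello ];
  #
  # lib.mkForce is priority 50, so it discards the other definitions
  # instead of merging with them:
  #   systemd.services.cockpit.path = lib.mkForce [ pkgs.hello ];
}
```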
That said, the module already has coreutils set in the path, and stuff in libexec is explicitly never intended to be in $PATH, so it’d be very odd for the upstream build scripts not to do that correctly (and the module maintainers somehow not to notice).
I think it’s far more likely that what you’re doing runs up against systemd hardening and is just prohibited by the NixOS module than that the package is broken.
I don’t believe I’m doing anything strange, although maybe I should try running a cleaner environment to test. What I did with cockpit is how most people have configured it. For them it seems to work, but not in my situation, which led me to think it was the package. I’m also developing the config in a VM, so maybe it’s related to that, although I can’t think of a reason why. That’s my best guess.
I tried modifying the polkit permissions as well, but I might have done something wrong there. I was thinking cockpit might be working with its own users/permissions/rules, but I didn’t pursue this line of thinking.
That might explain why it didn’t have any effect regardless of hierarchy. I only used lib.mkDefault because NixOS would otherwise complain that it was already set.
Any suggestions then?
I would like to have a dashboard, and I looked for alternatives for the reason you stated, but I’m using rootless podman for my containers, which drastically limits my options. The first thing you see if you look up a podman server dashboard/manager is the home-dashboard project, but that seemed like a lot of work since you have to configure a lot of the panels/modules/etc. yourself.
I believe you, I’d just like to know what specifically triggers that message (simple login? viewing logs?..) so I can guess which specific service to look at and think about the implications on cgroup permissions to give you a better debug experience
There’s also icinga and nagios for more integrated solutions. Basically, any of the alternatives that focus less on doing administrative tasks (since those should only ever be done with nixos-rebuild, which nothing reasonably supports) and more on monitoring.
That said, clearly someone’s maintaining cockpit, and we should figure out why it’s not working for you.
If you heavily rely on containers cockpit probably also works better for you, though I’d also softly suggest thinking about using NixOS modules instead of containers, if this is an option for your infrastructure.
## journalctl
Output of `journalctl -eu cockpit`; this is just from today, after a restart:
dec 15 18:09:29 nixos systemd[1]: Starting Cockpit Web Service...
dec 15 18:09:29 nixos cockpit-certificate-ensure[11073]: /nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/libexec/.cockpit-certificate-helper->
dec 15 18:09:29 nixos cockpit-certificate-ensure[11074]: .....+...+..+...+.+.....+......+....+++++++++++++++++++++++++++++++++++++++*.+...+.+........>
dec 15 18:09:29 nixos cockpit-certificate-ensure[11074]: ..+........+....+...+........+....+...+.....+.........+....+......+...+...+........+....+..+>
dec 15 18:09:29 nixos cockpit-certificate-ensure[11074]: -----
dec 15 18:09:29 nixos systemd[1]: Started Cockpit Web Service.
dec 15 18:10:59 nixos systemd[1]: cockpit.service: Deactivated successfully.
dec 15 18:10:59 nixos systemd[1]: cockpit.service: Consumed 176ms CPU time, 3.8M memory peak, 1M read from disk, 8K written to disk.
Nothing really strange here: sscg isn’t found, but it uses openssh as a fallback, as shown by the ssh-pattern image.
## cockpit-bridge
I can see errors from cockpit by trying to run cockpit manually using cockpit-bridge in the terminal.
Traceback (most recent call last):
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/protocol.py", line 130, in consume_one_frame
length = int(data[:newline])
^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: b''
Traceback (most recent call last):
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/bin/.cockpit-bridge-wrapped", line 8, in <module>
sys.exit(main())
^^^^^^
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/bridge.py", line 315, in main
run_async(run(args), debug=args.debug)
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/_vendor/systemd_ctypes/event.py", line 135, in run_async
asyncio.run(main, debug=debug)
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/bridge.py", line 166, in run
await router.communicate()
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/router.py", line 258, in communicate
await self._communication_done
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/protocol.py", line 192, in data_received
result = self.consume_one_frame(self.buffer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/m2l2bnrdfj14aymr82ia1jn45rxvyg6f-cockpit-328/lib/python3.12/site-packages/cockpit/protocol.py", line 132, in consume_one_frame
raise CockpitProtocolError("frame size is not an integer") from exc
cockpit.protocol.CockpitProtocolError: frame size is not an integer
At a glance it might be nothing, but it might be related to the systemd configuration not finding a specific path: before these errors show up there is a warning, and systemd is also mentioned near the top of the trace:
cockpit.packages-WARNING: Could not detect libexecdir
This is related to programs executing other programs, which would explain why plugins and the privilege escalation program (cockpit-askpass) don’t seem to work. libexec also matches the error in the screenshot, since I think this is where RHEL places these helper programs. This is what I infer from cockpit-tls.
You probably know about libexec already; I’m just going through my thought process.
I don’t have time to go any deeper, so I hope this helps. Sorry I can’t do more debugging. I’m also not sure where the logs are being written, because I don’t see them in /var/*
All this happens on startup.
I’ll try to get more info when I have time.
I’ll check them out later this week.
What’s the exact difference between a module and a container, then? From my understanding right now, a module is a separated-out section of the configuration. In my mind that would be the same as running it at the OS level. I do have my container configs tied to systemd though, each separated into its own module.
Yep, that’s what I mean. NixOS modules generally set up services with cgroups anyway, so you don’t gain much from using containers, but you lose all the integration the module maintainer has done (and a lot of nix’ guarantees because most of your services are basically data now).
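As a rough illustration of that contrast, here is a sketch that runs the same kind of service both ways. Grafana is only a hypothetical example (it is not part of this thread), the container name and image tag are assumptions, and `virtualisation.oci-containers` is not necessarily equivalent to a hand-rolled rootless podman setup:

```nix
{ ... }:
{
  # Native NixOS module: the module maintainer wires up the systemd unit,
  # user, state directory and so on, and the package comes from nixpkgs.
  services.grafana.enable = true;

  # Container variant: NixOS only generates a systemd unit that runs the
  # image; what is inside the image is opaque data as far as Nix is concerned.
  virtualisation.oci-containers = {
    backend = "podman";
    containers.grafana = {
      # Hypothetical image reference, chosen only for illustration.
      image = "docker.io/grafana/grafana:latest";
      # Published on host port 3001 so it does not clash with the native
      # service above, which listens on port 3000 by default.
      ports = [ "127.0.0.1:3001:3000" ];
    };
  };
}
```

In practice you would pick one of the two; they are shown together only to make the difference visible.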
That said, this is a moot point if you have to use containers for other reasons.
That’s indeed interesting, I don’t see any explicit setup for that in the systemd services, and that binary is explicitly put in $PATH, so it should probably be expected to work.
Maybe it was introduced by the latest bump. I ran the NixOS tests and they passed. I am basically only trusting the test. Builds and passthru.tests run successfully? Merge!
I don’t have the time to cherry-pick cockpit from master into my systems.
EDIT 1: to monkey-patch more stuff into the path, the systemd service definition of each service (systemd.services.<name>) has a path option. That populates PATH properly.
When I have time I’ll try running it from the nix-unstable version. I’m currently trying this on the 24.11 release. I’ll also try the 24.11 release in a fresh VM to see if it’s something weird in my environment.
For the time being, I’m already working on moving the parts of my config that work from my VM to the server.
All I was going to use it for was monitoring and the shell, so I can try it on a relatively clean bare-metal environment as well.
BTW when doing that bug report please show the steps to reproduce. I am guiding the bumps solely by the result of the NixOS test and I am using stable in all my machines.
That way we can reproduce the bug condition in the test and the next bumps will not have this issue.
Looks like a great bug report to me, I think they hadn’t seen it yet, and were afraid that they might have to ask you for the steps since you neglected to tell me what they were several times in a row. It’s why I was suspecting an XY problem
Guess explicitly asking for “reproduction steps” in the bug report template makes it clearer what is meant, though, sorry about the miscommunication!