Hello,
I’m stuck on a problem.
How do I use the smartctl exporter? When I include it in the “node” exporter, the service crashes.
If I enable it as its own exporter, it has no access to the disks.
Do any special permissions need to be granted?
Thank you for your help!
hexa
May 12, 2022, 1:51pm
2
services.prometheus.exporters.smartctl = {
  enable = true;
};
services.udev.extraRules = ''
  SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
'';
The udev rule grants the disk group access to the raw NVMe character device (if you have any NVMe drives), which is the device the exporter autodiscovers.
committed 12:37PM - 27 Jan 22 UTC
When no devices are given the exporter tries to autodiscover available
disks. The previous DevicePolicy was however preventing the exporter
from accessing any device at all, since only explicitly mentioned ones
were allowed.
This commit adds an allow rule for several device classes that I could
find on my machines, that gets set when no devices are explicitly
configured.
There is an existing problem with nvme devices, that expose a character
device at `/dev/nvme0`, and a (namespaced) block device at
`/dev/nvme0n1`. The character device does not come with permissions that
we could give to the exporter without further impacting the hardening.
crw------- 1 root root 247, 0 27. Jan 03:10 /dev/nvme0
brw-rw---- 1 root disk 259, 0 27. Jan 03:10 /dev/nvme0n1
The autodiscovery only finds the character device, which the exporter
unfortunately does not have access to.
However a simple udev rule can be used to resolve this:
services.udev.extraRules = ''
SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
'';
Unfortunately I'm not fully aware of the security implications this
change carries and we should question upstream (systemd) why they did
not include such a rule.
The disk group has no members on any of my machines.
❯ getent group disk
disk:x:6:
Interesting. Unfortunately, the exporter is still denied access:
mai 15 20:53:38 cobblepot systemd[1]: Started prometheus-smartctl-exporter.service.
mai 15 20:53:38 cobblepot smartctl_exporter[3862540]: [Warning] S.M.A.R.T. output reading error: exit status 2
mai 15 20:53:38 cobblepot smartctl_exporter[3862540]: [Warning] The device error log contains records of errors.
mai 15 20:53:38 cobblepot smartctl_exporter[3862540]: [Error] Smartctl open device: /dev/nvme0n1 failed: Operation not permitted
mai 15 20:53:38 cobblepot smartctl_exporter[3862540]: [Error] smartctl returned bad data for device /dev/nvme0n1
mai 15 20:53:38 cobblepot smartctl_exporter[3862540]: [Info] Starting on 0.0.0.0:9633/metrics
Do I need to grant special permissions?
My disks (sda, sdb, etc.) and my NVMe disk are owned by root:disk.
I tried changing the permissions of the disks and of the smartctl exporter, but nothing helps. I still get “Operation not permitted”.
hexa
May 18, 2022, 10:25am
5
Does /dev/nvme0n1 belong to the disk group? If so, the exporter should be able to access it via SupplementaryGroups = [ "disk" ];
It should look like this:
brw-rw---- 1 root disk 259, 0 22. Mär 17:00 /dev/nvme0n1
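For reference, granting the unit that supplementary group by hand might look roughly like the sketch below. It assumes the unit name `prometheus-smartctl-exporter` seen in the log output above. The NixOS module may already set this for you, in which case the line is redundant.

```nix
{
  # Sketch: add the exporter's service to the `disk` group so it can open
  # block devices that are owned by root:disk with mode 0660.
  # Assumption: the unit is named "prometheus-smartctl-exporter".
  systemd.services."prometheus-smartctl-exporter".serviceConfig.SupplementaryGroups = [ "disk" ];
}
```

Group membership only covers file permissions. The unit's `DeviceAllow=` list is a separate restriction and can still produce "Operation not permitted".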
Thank you for your help!
Unfortunately, nothing helps. I also tried running it as user root with group disks.
The service already has SupplementaryGroups = [ "disk" ];
My disks have this ownership:
brw-rw---- 1 root disk 259, 0 18 mai 15:40 /dev/nvme0n1
Could it be a problem with accessing the smartd program?
deni
October 27, 2022, 1:37pm
7
I don’t know if it’s the same issue but this fixed it for me:
NixOS:master
← MalteT:fix/smartctl-exporter-override
opened 01:44PM - 06 Jun 22 UTC
###### Description of changes
Tweak mkOverride priority on `DeviceAllow=` for smartctl prometheus exporter service.
Fixes #176524.
Tested in my own config, both with `services.prometheus.exporters.smartctl.devices` set and without it.
###### Things done
- Built on platform(s)
- [x] x86_64-linux
- [ ] aarch64-linux
- [ ] x86_64-darwin
- [ ] aarch64-darwin
- [ ] For non-Linux: Is `sandbox = true` set in `nix.conf`? (See [Nix manual](https://nixos.org/manual/nix/stable/command-ref/conf-file.html))
- [ ] Tested, as applicable:
- [NixOS test(s)](https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests) (look inside [nixos/tests](https://github.com/NixOS/nixpkgs/blob/master/nixos/tests))
- and/or [package tests](https://nixos.org/manual/nixpkgs/unstable/#sec-package-tests)
- or, for functions and "core" functionality, tests in [lib/tests](https://github.com/NixOS/nixpkgs/blob/master/lib/tests) or [pkgs/test](https://github.com/NixOS/nixpkgs/blob/master/pkgs/test)
- made sure NixOS tests are [linked](https://nixos.org/manual/nixpkgs/unstable/#ssec-nixos-tests-linking) to the relevant packages
- [ ] Tested compilation of all packages that depend on this change using `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see [nixpkgs-review usage](https://github.com/Mic92/nixpkgs-review#usage)
- [ ] Tested basic functionality of all binary files (usually in `./result/bin/`)
- [22.11 Release Notes (or backporting 22.05 Release notes)](https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md#generating-2211-release-notes)
- [ ] (Package updates) Added a release notes entry if the change is major or breaking
- [ ] (Module updates) Added a release notes entry if the change is significant
- [ ] (Module addition) Added a release notes entry if adding a new NixOS module
- [ ] (Release notes changes) Ran `nixos/doc/manual/md-to-db.sh` to update generated release notes
- [x] Fits [CONTRIBUTING.md](https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md).
There is also another workaround in the linked issue here:
opened 08:50AM - 06 Jun 22 UTC
0.kind: bug
6.topic: nixos
### Describe the bug
`prometheus-smartctl-exporter` cannot access my devices. Here's an example log:
```
Jun 05 21:02:22 faunus-ater smartctl_exporter[195740]: [Error] Smartctl open device: /dev/sde failed: Operation not permitted
Jun 05 21:02:22 faunus-ater smartctl_exporter[195740]: [Error] smartctl returned bad data for device /dev/sde
Jun 05 21:02:22 faunus-ater smartctl_exporter[195740]: [Warning] S.M.A.R.T. output reading error: exit status 2
Jun 05 21:02:22 faunus-ater smartctl_exporter[195740]: [Warning] The device error log contains records of errors.
```
I'm almost certain, that the issue is an empty `DeviceAllow=`, as the service runs fine without it, but I'd like a second opinion before sending a PR.
### Steps To Reproduce
Steps to reproduce the behavior:
1. Configure `services.prometheus` and set `services.prometheus.exporters.smartctl.enable = true`
2. Watch `journalctl -u prometheus-smartctl-exporter`
3. [Optional] Override DeviceAllow list:
```nix
systemd.services."prometheus-smartctl-exporter".serviceConfig.DeviceAllow = lib.mkOverride 10 [
  "block-blkext rw"
  "block-sd rw"
  "char-nvme rw"
];
```
The service will work now.
### Expected behavior
`smartctl` should run without errors.
### Additional context
Adding `services.prometheus.exporters.smartctl.devices = [ ... ]` has the same issue; overriding `DeviceAllow` fixes it as well.
Possible offending code:
https://github.com/NixOS/nixpkgs/blob/d9794b04bffb468b886c553557489977ae5f4c65/nixos/modules/services/monitoring/prometheus/exporters.nix#L193
Not sure why the override is not working, though:
https://github.com/NixOS/nixpkgs/blob/c8c9a5b0218fccadadd5595a8d4960b3153665cc/nixos/modules/services/monitoring/prometheus/exporters/smartctl.nix#L53-L61
### Notify maintainers
@mweinelt
### Metadata
Please run `nix-shell -p nix-info --run "nix-info -m"` and paste the result.
```console
[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.17.2, NixOS, 22.05 (Quokka), 22.05.20220413.ff9efb0`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.8.0pre20220411_f7276bc`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
```
systemd.services."prometheus-smartctl-exporter".serviceConfig.DeviceAllow = lib.mkOverride 10 [
  "block-blkext rw"
  "block-sd rw"
  "char-nvme rw"
];
hexa
October 27, 2022, 1:51pm
8
Merged the fix just now. Apparently I forgot to follow up on the issue, sorry.
There is still a problem.
It seems to be impossible to read SMART attributes of disks attached to KVM VMs from inside the VM, so the host has to take care of that.
At the same time, libvirtd seems to reassign the ownership of the block devices attached to the guests to root:root:
❯ ls -la /dev/sd?
brw-rw---- 1 root disk 8, 0 Nov 5 20:33 /dev/sda
brw-rw---- 1 root root 8, 16 Nov 5 20:33 /dev/sdb
brw-rw---- 1 root root 8, 32 Nov 5 20:33 /dev/sdc
brw-rw---- 1 root root 8, 48 Nov 5 20:33 /dev/sdd
brw-rw---- 1 root root 8, 64 Nov 5 00:03 /dev/sde
brw-rw---- 1 root root 8, 80 Nov 5 00:03 /dev/sdf
brw-rw---- 1 root root 8, 96 Nov 5 00:03 /dev/sdg
brw-rw---- 1 root disk 8, 112 Nov 5 00:03 /dev/sdh
brw-rw---- 1 root disk 8, 128 Nov 5 00:03 /dev/sdi
brw-rw---- 1 root disk 8, 144 Nov 5 00:03 /dev/sdj
brw-rw---- 1 root root 8, 160 Nov 5 20:31 /dev/sdk
brw-rw---- 1 root disk 8, 176 Nov 5 00:03 /dev/sdl
brw-rw---- 1 root root 8, 192 Nov 5 00:03 /dev/sdm
brw-rw---- 1 root disk 8, 208 Nov 5 00:03 /dev/sdn
Disks owned by root:disk are not attached to any VM; the others are.
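One possible way to sidestep this, untested on this setup, is to turn off autodiscovery by listing only the disks that are still owned by root:disk. The sketch below uses the `services.prometheus.exporters.smartctl.devices` option mentioned in the linked issue; the device paths are copied from the listing above and will differ on other machines.

```nix
{
  services.prometheus.exporters.smartctl = {
    enable = true;
    # Only the disks shown as root:disk in the listing above.
    # Note: /dev/sdX names are not guaranteed to be stable across reboots,
    # so stable paths under /dev/disk/by-id/ may be a safer choice.
    devices = [
      "/dev/sda"
      "/dev/sdh"
      "/dev/sdi"
      "/dev/sdj"
      "/dev/sdl"
      "/dev/sdn"
    ];
  };
}
```

This leaves the disks handed to the guests unmonitored; it only keeps the exporter from failing on them.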
hexa:
services.udev.extraRules
I think it would be great to add a note about this somewhere; it took me a few hours to find this post and the workaround. Since the security implications aren’t clear, maybe just a short note or suggested workaround in the description of the “enable” option?
hexa
December 8, 2022, 3:15pm
11
Agreed. It’s an important bit to get the autodiscovery of devices going, which is pretty useful. I think we should just add the udev rule into the module for now.
A friendly reminder: I do not think the udev rule has been added to the module yet. I had to add it manually to avoid the permission-denied error.
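Until the module ships the rule itself, a local configuration could tie the rule to the exporter's enable flag, roughly as sketched below. This is an illustration of the manual workaround, not what the module does; the rule text is the one posted earlier in this thread.

```nix
{ config, lib, ... }:
{
  # Sketch: only install the NVMe udev rule when the smartctl exporter is enabled.
  services.udev.extraRules = lib.mkIf config.services.prometheus.exporters.smartctl.enable ''
    SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
  '';
}
```

Because `services.udev.extraRules` is a lines-typed option, this definition is merged with any other rules set elsewhere in the configuration.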
hexa
July 8, 2023, 12:27pm
13
The discussion was moved upstream to systemd, as the udev rules we use are managed by them and are rolled out uniformly on many Linux distributions.
opened 03:09PM - 10 Jan 23 UTC
needs-discussion 🤔
udev
### systemd version the issue has been seen with
252.1 with NixOS patches
### Used distribution
NixOS, following master branch, at c4e1db0e2571a5f134471e6aa7b7fce129c7d822
### Linux kernel version used
5.15.80
### CPU architectures issue was seen on
x86_64
### Component
udev rule files
### Expected behaviour you didn't see
All devices corresponding to an NVMe disk have the same owner, group, and POSIX filesystem permissions by default.
### Unexpected behaviour you saw
```
$ ls -l /dev/nvme0*
crw------- 1 root root 249, 0 Jan 10 00:35 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Jan 10 00:35 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Jan 10 00:35 /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 2 Jan 10 00:35 /dev/nvme0n1p2
```
Note that the top-level device has a different group and different mode. (Aside: note that it's a char device; all block access to NVMe devices happens via [namespaces](https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2013/20130812_PreConfD_Marks.pdf). This is what causes the discrepancy: the char device doesn't match the rule from `50-udev-default.rules` that assigns group `disk` to all block devices.)
### Steps to reproduce the problem
1. Have an NVMe device in a system that uses systemd-udevd with its default configuration.
2. Look at `/dev/nvme*`
### Additional program output to the terminal or log subsystem illustrating the issue
_No response_
The udev rule that assigns the NVMe char devices to the disk group is something you can decide to use for yourself. For nixpkgs, we could ideally move that access into a dedicated group that regulates access to the raw I/O interface.
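A minimal sketch of the dedicated-group idea, under stated assumptions: the group name `nvme-raw` is hypothetical, the explicit `MODE="0660"` is an addition that was not part of the rule posted in this thread, and the unit name is the one seen in the logs above. It has not been tested against the module's hardening settings.

```nix
{
  # Hypothetical group that only exists to gate access to NVMe char devices.
  users.groups.nvme-raw = { };

  # Hand /dev/nvme0, /dev/nvme1, ... to that group instead of `disk`.
  # MODE is set explicitly here as an assumption, so the group can read and write.
  services.udev.extraRules = ''
    SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="nvme-raw", MODE="0660"
  '';

  # Let the exporter join the new group in addition to whatever groups the module sets.
  systemd.services."prometheus-smartctl-exporter".serviceConfig.SupplementaryGroups = [ "nvme-raw" ];
}
```

The point of the separate group is that membership in `disk` would no longer imply access to the raw NVMe interface, and vice versa.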
NixOS:master
← Frostman:prometheus-smartctl-exporter-fix-nvme
opened 06:57PM - 08 Dec 22 UTC
As discussed a few times, it seems to be the only way to make smartctl work for NVMe devices and so enabling it when smartctl exporter is enabled.
https://github.com/NixOS/nixpkgs/commit/12c26aca1fd55ab99f831bedc865a626eee39f80
It's independent of #205123 and compatible with both versions
Not the most elegant way to handle it, would be happy to learn a better way
P.S. I've tested it multiple times in a VM and it seems to be working correctly
###### Description of changes
###### Things done
- Built on platform(s)
- [x] x86_64-linux
- [ ] aarch64-linux
- [ ] x86_64-darwin
- [ ] aarch64-darwin
- [ ] For non-Linux: Is `sandbox = true` set in `nix.conf`? (See [Nix manual](https://nixos.org/manual/nix/stable/command-ref/conf-file.html))
- [x] Tested, as applicable:
- [NixOS test(s)](https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests) (look inside [nixos/tests](https://github.com/NixOS/nixpkgs/blob/master/nixos/tests))
- and/or [package tests](https://nixos.org/manual/nixpkgs/unstable/#sec-package-tests)
- or, for functions and "core" functionality, tests in [lib/tests](https://github.com/NixOS/nixpkgs/blob/master/lib/tests) or [pkgs/test](https://github.com/NixOS/nixpkgs/blob/master/pkgs/test)
- made sure NixOS tests are [linked](https://nixos.org/manual/nixpkgs/unstable/#ssec-nixos-tests-linking) to the relevant packages
- [x] Tested compilation of all packages that depend on this change using `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see [nixpkgs-review usage](https://github.com/Mic92/nixpkgs-review#usage)
- [x] Tested basic functionality of all binary files (usually in `./result/bin/`)
- [23.05 Release Notes (or backporting 22.11 Release notes)](https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md#generating-2305-release-notes)
- [ ] (Package updates) Added a release notes entry if the change is major or breaking
- [ ] (Module updates) Added a release notes entry if the change is significant
- [ ] (Module addition) Added a release notes entry if adding a new NixOS module
- [ ] (Release notes changes) Ran `nixos/doc/manual/md-to-db.sh` to update generated release notes
- [x] Fits [CONTRIBUTING.md](https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md).