fix(docker): add netdata user to nvidia device group on non-Debian systems #21358
Pull request overview
This PR fixes GPU monitoring on non-Debian systems by ensuring the netdata user has access to NVIDIA device files. On distributions like openSUSE, NVIDIA devices are restricted to a specific group (e.g., video) instead of being world-readable as on Debian/Ubuntu.
Key Changes:
- Added `add_netdata_to_nvidia_group()` function to dynamically add the netdata user to the NVIDIA device group
- Function checks `/dev/nvidiactl` ownership and adds netdata to that group if it's not root-owned
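The behaviour described in these two points can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name and `DOCKER_USR` come from the review comments, while `NVIDIA_DEV`, the `non_root_gid_of` helper, and the `nvidia-<gid>` fallback group name are assumptions made for the example. It assumes `stat -c` (GNU coreutils or BusyBox), `getent`, `groupadd`, and `usermod` are available in the container.

```shell
#!/bin/sh
# Sketch only. NVIDIA_DEV and non_root_gid_of are hypothetical names.
DOCKER_USR="${DOCKER_USR:-netdata}"
NVIDIA_DEV="${NVIDIA_DEV:-/dev/nvidiactl}"

# Print the numeric GID owning a path.
# Prints nothing if the path is missing or owned by GID 0 (root).
non_root_gid_of() {
  [ -e "$1" ] || return 0
  gid="$(stat -c '%g' "$1")" || return 0
  [ "${gid}" -eq 0 ] || echo "${gid}"
}

add_netdata_to_nvidia_group() {
  gid="$(non_root_gid_of "${NVIDIA_DEV}")"
  [ -n "${gid}" ] || return 0   # no device, or root-owned: nothing to do

  # Resolve the GID to a group name; the host's group may not exist
  # inside the container, so create a placeholder group if needed.
  group_name="$(getent group "${gid}" | cut -d: -f1)"
  if [ -z "${group_name}" ]; then
    group_name="nvidia-${gid}"   # hypothetical naming scheme
    groupadd --gid "${gid}" "${group_name}" || return 1
  fi

  usermod --append --groups "${group_name}" "${DOCKER_USR}" ||
    echo "Failed to add ${DOCKER_USR} to group ${group_name}" >&2
}
```

Resolving the GID to a name before calling `usermod` sidesteps any question of whether a bare numeric ID is accepted, and the early return keeps the function a no-op on hosts without an NVIDIA device.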
Comments suppressed due to low confidence (1)
packaging/docker/run.sh:29
- Corrected spelling of 'apend' to 'append' in the usermod flag. This is an existing bug in the Proxmox function that should be fixed. The new nvidia function correctly uses '--append' on line 55.
if ! usermod --apend --groups "${group_guid}" "${DOCKER_USR}"; then
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packaging/docker/run.sh">
<violation number="1" location="packaging/docker/run.sh:55">
`usermod` is called with the numeric GID instead of the group name, so adding the netdata user to the NVIDIA device group fails and GPU monitoring remains broken.</violation>
</file>
No issues found across 1 file
…stems (netdata#21358) (cherry picked from commit 5920582)
Summary
Fixes #21353
On Debian/Ubuntu, NVIDIA device files are world-readable (crw-rw-rw-), so netdata can access them without special permissions.
However, on some distributions, such as openSUSE Leap 15.6, these devices are owned by the video group with restricted permissions (crw-rw----).
This causes GPU monitoring to fail because the netdata user cannot access the device.
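To see which of the two situations applies on a given host, a check along these lines may help. It is a minimal sketch, assuming `stat -c` from GNU coreutils or BusyBox; `is_world_readable` is a hypothetical helper name, not something from the Netdata code base.

```shell
#!/bin/sh
# Sketch only: succeeds when "others" have read permission on the path.
is_world_readable() {
  mode="$(stat -c '%a' "$1" 2>/dev/null)" || return 1
  other=$((mode % 10))        # last octal digit holds the "others" bits
  [ $((other & 4)) -ne 0 ]    # 4 is the read bit
}

dev=/dev/nvidiactl
if [ -e "${dev}" ]; then
  if is_world_readable "${dev}"; then
    echo "${dev} is world-readable"
  else
    # Show owner and group so the restricting group is visible
    echo "${dev} is restricted to $(stat -c '%U:%G' "${dev}")"
  fi
fi
```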
This PR adds the `netdata` user to the group that owns `/dev/nvidiactl` (if it's not root), following the same pattern used for Proxmox configuration files access.
Test Plan
Additional Information
For users: How does this change affect me?
Summary by cubic
Ensure Netdata can access NVIDIA device files on non-Debian systems by adding the netdata user to the device’s group at container startup. Fixes broken GPU monitoring on distros where /dev/nvidiactl is group-restricted (e.g., openSUSE).
Written for commit 7a7d986. Summary will update automatically on new commits.