ilyam8 commented Nov 27, 2025

Summary

Fixes #21353

On Debian/Ubuntu, NVIDIA device files are readable and writable by everyone (crw-rw-rw-), so the netdata user can access them without special permissions:

crw-rw-rw- 1 root root 195, 255 Nov 24 19:22 /dev/nvidiactl

However, on some distributions like openSUSE Leap 15.6, these devices are owned by the video group with restricted permissions (crw-rw----):

crw-rw---- 1 root video 195, 255 25. Nov 19:53 /dev/nvidiactl

This causes GPU monitoring to fail because the netdata user cannot access the device.
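
To see which of the two cases a given host falls into, something like the following could be run on the host or inside the container. This is a rough sketch rather than code from this PR: `access_class` is a hypothetical helper name, and it assumes a `stat` that supports `-c` (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Hypothetical helper (not part of run.sh): classify a device node by its
# permission bits, mirroring the two cases described above.
access_class() {
  dev="$1"
  if [ ! -e "${dev}" ]; then
    echo "missing"
    return 0
  fi
  # %A prints the mode string, e.g. crw-rw-rw- or crw-rw----
  mode="$(stat -c '%A' "${dev}")" || return 1
  case "${mode}" in
    # Characters 8 and 9 are the read/write bits for "other".
    ???????rw?*) echo "world-rw" ;;
    *) echo "restricted" ;;
  esac
}

# Show mode, owner, group name and numeric GID, then the classification.
# Prints only "missing" when no NVIDIA device is present.
if [ -e /dev/nvidiactl ]; then
  stat -c '%A %U %G %g' /dev/nvidiactl
fi
access_class /dev/nvidiactl
```

A "restricted" result with a non-root group is the situation this PR is meant to handle.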

This PR adds the netdata user to the group that owns /dev/nvidiactl (unless that group is root), following the same pattern already used for access to Proxmox configuration files.


Summary by cubic

Ensure Netdata can access NVIDIA device files on non-Debian systems by adding the netdata user to the device’s group at container startup. Fixes broken GPU monitoring on distros where /dev/nvidiactl is group-restricted (e.g., openSUSE).

  • Bug Fixes
    • Detect /dev/nvidiactl group GID and add netdata to it; skip if root-owned.
    • Create a local group with the device’s GID if it doesn’t exist.
    • Run only when entrypoint is root, the container user is not root, and /dev/nvidiactl is present; applies the same guard to Proxmox group handling.
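
The steps in the list above could be sketched roughly as follows. This is not the code from the PR: `DRY_RUN`, `run`, `add_user_to_gid`, `add_user_to_device_group` and the fallback group name `nvidia-dev` are all made up for illustration, and the sketch assumes `stat -c`, `getent`, `groupadd` and `usermod` are available in the image. `DOCKER_USR` is the variable name visible in the run.sh snippet quoted in this conversation.

```shell
#!/bin/sh
# Sketch only; every helper name here is hypothetical.

DRY_RUN="${DRY_RUN:-0}"

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Add a user to the group with the given numeric GID,
# creating a local group with that GID first if none exists.
add_user_to_gid() {
  gid="$1"
  user="$2"

  # Skip root-owned devices.
  if [ "${gid}" -eq 0 ]; then
    return 0
  fi

  if entry="$(getent group "${gid}")"; then
    # getent prints name:password:gid:members; keep the name.
    group="${entry%%:*}"
  else
    group="nvidia-dev" # hypothetical name for the newly created group
    run groupadd --gid "${gid}" "${group}" || return 1
  fi

  run usermod --append --groups "${group}" "${user}"
}

# Look up the GID that owns the device and delegate to add_user_to_gid.
add_user_to_device_group() {
  dev="$1"
  user="$2"
  # Nothing to do when the device is absent.
  [ -e "${dev}" ] || return 0
  gid="$(stat -c '%g' "${dev}")" || return 1
  add_user_to_gid "${gid}" "${user}"
}

# Guards: only act when running as root and the container user is not root.
if [ "$(id -u)" -eq 0 ] && [ "${DOCKER_USR:-netdata}" != "root" ]; then
  add_user_to_device_group /dev/nvidiactl "${DOCKER_USR:-netdata}"
fi
```

Running it with `DRY_RUN=1` prints the `groupadd`/`usermod` commands instead of executing them, which makes it possible to check the decision logic without touching the group database.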

Written for commit 7a7d986.

ilyam8 requested a review from a team as a code owner November 27, 2025 11:57
ilyam8 requested review from Copilot and removed request for a team November 27, 2025 11:57
github-actions bot added the area/packaging (Packaging and operating systems support) label Nov 27, 2025
Copilot AI left a comment

Pull request overview

This PR fixes GPU monitoring on non-Debian systems by ensuring the netdata user has access to NVIDIA device files. On distributions like openSUSE, NVIDIA devices are restricted to a specific group (e.g., video) instead of being world-readable as on Debian/Ubuntu.

Key Changes:

  • Added an add_netdata_to_nvidia_group() function that adds the netdata user to the NVIDIA device group at container startup
  • The function checks the ownership of /dev/nvidiactl and adds netdata to the owning group if that group is not root
Comments suppressed due to low confidence (1)

packaging/docker/run.sh:29

  • The usermod flag '--apend' is misspelled and should be '--append'. This is an existing bug in the Proxmox function that should be fixed. The new NVIDIA function correctly uses '--append' on line 55.
    if ! usermod --apend --groups "${group_guid}" "${DOCKER_USR}"; then

ilyam8 marked this pull request as draft November 27, 2025 12:09

cubic-dev-ai bot left a comment

1 issue found across 1 file

Prompt for AI agents (1 issue)

Check if this issue is valid; if so, understand its root cause and fix it.


<file name="packaging/docker/run.sh">

<violation number="1" location="packaging/docker/run.sh:55">
`usermod` is called with the numeric GID instead of the group name, so adding the netdata user to the NVIDIA device group fails and GPU monitoring remains broken.</violation>
</file>
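
One way to address this kind of finding is to resolve the group name from the GID before calling `usermod`. The snippet below is a minimal sketch under that assumption, not the fix that was actually committed: `group_name_for_gid` is a hypothetical helper, it assumes `getent` and `stat -c` are available, and the `usermod` command is only echoed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the group name for a numeric GID, or return
# non-zero when no group with that GID exists in the group database.
group_name_for_gid() {
  entry="$(getent group "$1")" || return 1
  # getent prints name:password:gid:members; keep the part before the first colon.
  echo "${entry%%:*}"
}

# Resolve the group that owns the device and print the command that would
# add the user to it by name.
dev="/dev/nvidiactl"
if [ -e "${dev}" ]; then
  gid="$(stat -c '%g' "${dev}")"
  if name="$(group_name_for_gid "${gid}")"; then
    echo usermod --append --groups "${name}" "${DOCKER_USR:-netdata}"
  else
    echo "no group with GID ${gid}; one would have to be created first"
  fi
fi
```

Passing a name avoids depending on how a particular `usermod` implementation treats numeric arguments to `--groups`.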

ilyam8 marked this pull request as ready for review November 27, 2025 12:45

cubic-dev-ai bot left a comment

No issues found across 1 file

ilyam8 enabled auto-merge (squash) November 27, 2025 12:53
ilyam8 merged commit 5920582 into netdata:master Nov 27, 2025
117 checks passed
ilyam8 deleted the docker-nvidia-group branch November 27, 2025 13:10
stelfrag mentioned this pull request Dec 1, 2025
stelfrag pushed a commit to stelfrag/netdata that referenced this pull request Dec 1, 2025
Ferroin pushed a commit that referenced this pull request Dec 3, 2025

Development

Successfully merging this pull request may close these issues.

[Bug]: Podman: nvidia metrics broken since 2.8.1

2 participants