Labels: area/packaging (Packaging and operating systems support), bug, collectors/go.d
Description
Bug description
I recently updated from version 2.8.0 to 2.8.1, and since then the NVIDIA metrics are no longer collected.
I'm running the latest container image with podman.
nvidia-smi itself works inside the container:
root@marvin:/usr/libexec/netdata/plugins.d# nvidia-smi
Wed Nov 26 11:21:42 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 On | 00000000:01:00.0 Off | N/A |
| 30% 44C P2 19W / 70W | 535MiB / 6144MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 29046 C /usr/lib/ffmpeg/7.0/bin/ffmpeg 260MiB |
| 0 N/A N/A 29059 C /usr/lib/ffmpeg/7.0/bin/ffmpeg 260MiB |
+-----------------------------------------------------------------------------------------+
Running ./go.d.plugin -d -m nvidia_smi produces the following logs:
root@marvin:/usr/libexec/netdata/plugins.d# ./go.d.plugin -d -m nvidia_smi
DBG godplugin/main.go:63 plugin: name=go.d, version=v2.8.1 user_config_dir=/etc/netdata stock_config_dir=/usr/lib/netdata/conf.d plugins_dir=/usr/libexec/netdata/plugins.d netdata_bin_dir=/usr/sbin component=agent
DBG godplugin/main.go:65 current user: name=root, uid=0 component=agent
INF godplugin/main.go:69 env HTTP_PROXY '', HTTPS_PROXY '' component=agent
INF godplugin/main.go:71 directories → config: [/etc/netdata /usr/lib/netdata/conf.d] | collectors: [/etc/netdata/go.d /usr/lib/netdata/conf.d/go.d] | sd: [/etc/netdata/go.d/sd /usr/lib/netdata/conf.d/go.d/sd] | varlib: component=agent
INF agent/agent.go:213 instance is started component=agent
INF agent/setup.go:23 loading config file component=agent
DBG agent/setup.go:31 looking for 'go.d.conf' in [/etc/netdata /usr/lib/netdata/conf.d] component=agent
INF agent/setup.go:38 found '/etc/netdata/go.d.conf component=agent
INF agent/setup.go:45 config successfully loaded component=agent
INF agent/agent.go:217 using config: enabled 'true', default_run 'false', max_procs '0' component=agent
INF agent/setup.go:50 loading modules component=agent
INF agent/setup.go:73 enabled/registered modules: 1/125 component=agent
INF agent/setup.go:79 building discovery config component=agent
DBG agent/setup.go:109 looking for 'nvidia_smi.conf' in [/etc/netdata/go.d /usr/lib/netdata/conf.d/go.d] component=agent
DBG agent/setup.go:125 found '/usr/lib/netdata/conf.d/go.d/nvidia_smi.conf component=agent
INF agent/setup.go:130 dummy/read/watch paths: 0/1/0 component=agent
INF discovery/manager.go:116 registered discoverers: [file discovery: [file reader] service discovery] component="discovery manager"
DBG agent/setup.go:153 looking for 'vnodes/' in [/etc/netdata /usr/lib/netdata/conf.d] component=agent
DBG vnodes/vnodes.go:99 '/usr/lib/netdata/conf.d/vnodes' is not a regular file, skipping it component=vnodes
INF agent/setup.go:164 found '/usr/lib/netdata/conf.d/vnodes' (0 vhosts) component=agent
INF discovery/manager.go:61 instance is started component="discovery manager"
INF functions/manager.go:49 instance is started component="functions manager"
INF jobmgr/manager.go:97 instance is started component="job manager"
DBG functions/ext.go:62 registering function 'config' with prefix 'go.d:collector:' component="functions manager"
DBG functions/ext.go:62 registering function 'config' with prefix 'go.d:vnode' component="functions manager"
CONFIG go.d:vnode create accepted template /collectors/go.d/Vnodes internal 'internal' 'add schema userconfig test' 0x0000 0x0000
CONFIG go.d:collector:nvidia_smi create accepted template /collectors/go.d/Jobs internal 'internal' 'add schema enable disable test userconfig' 0x0000 0x0000
INF sd/sd.go:66 instance is started component="service discovery"
INF file/discovery.go:69 instance is started component=discovery discoverer=file
INF file/read.go:48 instance is started component=discovery discoverer=file
INF file/read.go:49 instance is stopped component=discovery discoverer=file
DBG jobmgr/manager.go:144 received configs: 1/+1/-0 ('/usr/lib/netdata/conf.d/go.d/nvidia_smi.conf') component="job manager"
CONFIG go.d:collector:nvidia_smi:nvidia_smi create accepted job /collectors/go.d/Jobs stock 'discoverer=file_reader,file=/usr/lib/netdata/conf.d/go.d/nvidia_smi.conf' 'schema get enable disable update restart test userconfig' 0x0000 0x0000
DBG jobmgr/manager.go:311 creating nvidia_smi[nvidia_smi] job, config: map[__provider__:file reader __source__:discoverer=file_reader,file=/usr/lib/netdata/conf.d/go.d/nvidia_smi.conf __source_type__:stock autodetection_retry:0 module:nvidia_smi name:nvidia_smi priority:70000 update_every:10] component="job manager"
DBG nvidia_smi/exec.go:98 executing '/usr/sbin/nd-run /usr/bin/nvidia-smi -q -x -l 5' collector=nvidia_smi job=nvidia_smi
INF pipeline/pipeline.go:144 instance is started component="service discovery" pipeline=docker
INF dockersd/docker.go:100 instance is started component="service discovery" discoverer=docker
ERR dockersd/docker.go:117 Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? component="service discovery" discoverer=docker
INF dockersd/docker.go:101 instance is stopped component="service discovery" discoverer=docker
INF pipeline/accumulator.go:92 discoverer 'sd:docker' exited before ctx done component="service discovery" pipeline=docker
INF pipeline/accumulator.go:61 all discoverers exited before ctx done component="service discovery" pipeline=docker
DBG pipeline/pipeline.go:165 received 0 target groups component="service discovery" pipeline=docker
INF pipeline/pipeline.go:163 instance is stopped component="service discovery" pipeline=docker
ERR module/job.go:244 init failed: process exited before the first sample was collected collector=nvidia_smi job=nvidia_smi
HOST ''
HOST ''
CONFIG go.d:collector:nvidia_smi:nvidia_smi delete
INF pipeline/pipeline.go:144 instance is started component="service discovery" pipeline="network listeners"
INF sd/sd.go:116 pipeline config is disabled 'snmp' (/usr/lib/netdata/conf.d/go.d/sd/snmp.conf) component="service discovery"
INF netlistensd/netlisteners.go:103 instance is started component="service discovery" discoverer=net_listeners
DBG netlistensd/netlisteners.go:104 used config: interval: 2m0s, timeout: 5s, cache expiration time: 10m0s component="service discovery" discoverer=net_listeners
DBG ndexec/ndexec.go:72 executing: /usr/sbin/nd-run /usr/libexec/netdata/plugins.d/local-listeners no-udp6 no-local no-inbound no-outbound no-namespaces
^CINF agent/agent.go:170 received interrupt signal (2). Terminating... component=agent
INF netlistensd/netlisteners.go:105 instance is stopped component="service discovery" discoverer=net_listeners
INF file/discovery.go:70 instance is stopped component=discovery discoverer=file
DBG functions/ext.go:78 unregistering function 'config' with prefix 'go.d:collector:' component="functions manager"
INF functions/manager.go:50 instance is stopped component="functions manager"
DBG functions/ext.go:78 unregistering function 'config' with prefix 'go.d:vnode' component="functions manager"
INF jobmgr/manager.go:98 instance is stopped component="job manager"
INF pipeline/accumulator.go:53 all discoverers exited component="service discovery" pipeline="network listeners"
INF pipeline/pipeline.go:161 instance is stopped component="service discovery" pipeline="network listeners"
INF sd/sd.go:67 instance is stopped component="service discovery"
INF discovery/manager.go:62 instance is stopped component="discovery manager"
INF agent/agent.go:214 instance is stopped component=agent
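The two lines that matter in the output above are the command the collector launched and the init failure that followed. A small helper along these lines could pull them out of a long debug run. It is only a sketch: the line format is inferred from the log shown here, and the `summarize` name and both regexes are my own, not part of Netdata.

```python
import re

# Assumed line shape, inferred from the log above: "LEVEL path/file.go:LINE message".
# Only the levels that appear in this log (DBG, INF, ERR) are recognized.
LOG_LINE = re.compile(r"^(DBG|INF|ERR) (\S+:\d+) (.*)$")
# Matches the single-quoted form: executing '<command>'
QUOTED_CMD = re.compile(r"executing '([^']+)'")


def summarize(log_text):
    """Return (errors, commands) found in go.d.plugin debug output.

    errors   -> list of (source_location, message) for ERR lines
    commands -> list of command strings the plugin reported executing
    """
    errors, commands = [], []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # CONFIG/HOST protocol lines and anything unrecognized
        level, location, message = match.groups()
        if level == "ERR":
            errors.append((location, message))
        cmd = QUOTED_CMD.search(message)
        if cmd:
            commands.append(cmd.group(1))
    return errors, commands
```

Fed the log above, this should surface the nvidia-smi invocation from exec.go:98 next to the "init failed" error from job.go:244, along with the unrelated Docker socket error.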
Expected behavior
NVIDIA GPU metrics should be collected and visible, as they were before the update.
Steps to reproduce
- Run the latest Netdata container (here with podman)
- Make nvidia-smi available inside the container
- Run Netdata and check the go.d.plugin logs
...
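To narrow this down, it may help to run the exact command the collector logged (/usr/sbin/nd-run /usr/bin/nvidia-smi -q -x -l 5) by hand and see whether it exits right away, since the error says the process exited before the first sample was collected. A minimal sketch, assuming Python is available in the container; `probe` is a hypothetical helper, not a Netdata tool.

```python
import subprocess


def probe(cmd, wait_seconds=3.0):
    """Start cmd and report whether it is still running after wait_seconds.

    Returns (exit_code, stdout, stderr). exit_code is None if the process
    was still alive when the wait ended; in that case it is killed.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=wait_seconds)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        # Still running: for a looping command such as "nvidia-smi -l 5"
        # this is the healthy case.
        proc.kill()
        out, err = proc.communicate()
        return None, out, err
```

Calling `probe(["/usr/sbin/nd-run", "/usr/bin/nvidia-smi", "-q", "-x", "-l", "5"])` and getting an exit code other than None would be consistent with the "process exited" error, and the returned stderr may say why. Running the same call without the leading /usr/sbin/nd-run would show whether the wrapper makes a difference.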
Installation method
docker
System info
Linux marvin 6.4.0-150600.23.78-default #1 SMP PREEMPT_DYNAMIC Thu Nov 6 21:50:11 UTC 2025 (80d92ac) x86_64 x86_64 x86_64 GNU/Linux
/etc/os-release:NAME="openSUSE Leap"
/etc/os-release:VERSION="15.6"
/etc/os-release:ID="opensuse-leap"
/etc/os-release:ID_LIKE="suse opensuse"
/etc/os-release:VERSION_ID="15.6"
/etc/os-release:PRETTY_NAME="openSUSE Leap 15.6"
/etc/os-release:ANSI_COLOR="0;32"
/etc/os-release:CPE_NAME="cpe:/o:opensuse:leap:15.6"
/etc/os-release:LOGO="distributor-logo-Leap"
Netdata build info
root@marvin:/usr/libexec/netdata/plugins.d# netdata -W buildinfo
time=2025-11-26T11:24:55.315+01:00 comm=netdata source=daemon level=notice errno="2, No such file or directory" tid=4212 msg="CONFIG: cannot load user config '/etc/netdata/stream.conf'. Will try stock config."
Packaging:
Netdata Version ____________________________________________ : v2.8.1
Installation Type __________________________________________ : oci
Package Architecture _______________________________________ : x86_64
Package Distro _____________________________________________ : unknown
Configure Options __________________________________________ : cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_STANDARD=11 -DCMAKE_CXX_STANDARD=14 -DBUILD_SHARED_LIBS=OFF -DCMAKE_C_FLAGS='-O2 -funroll-loops -pipe -fexceptions -fstack-protector-strong -D_FORTIFY_SOURCE=3 -fstack-clash-protection -fcf-protection=full -ffunction-sections -fdata-sections -Wno-builtin-macro-redefined -fno-omit-frame-pointer -funwind-tables -fasynchronous-unwind-tables' -DCMAKE_CXX_FLAGS=' -O2 -funroll-loops -pipe -fexceptions -fstack-protector-strong -D_FORTIFY_SOURCE=3 -fstack-clash-protection -fcf-protection=full -ffunction-sections -fdata-sections -Wno-builtin-macro-redefined -fno-omit-frame-pointer -funwind-tables -fasynchronous-unwind-tables' -DCMAKE_COMPILE_DEFINITIONS='_GNU_SOURCE' -DCMAKE_EXE_LINKER_FLAGS='-Wl,--gc-sections -fexceptions -fstack-protector-strong -D_FORTIFY_SOURCE=3 -fstack-clash-protection -fcf-protection=full -ffunction-sections -fdata-sections -Wno-builtin-macro-redefined -rdynamic' -DCMAKE_SHARED_LINKER_FLAGS='-Wl,--gc-sections'
Default Directories:
User Configurations ________________________________________ : /etc/netdata
Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
Permanent Databases ________________________________________ : /var/lib/netdata
Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
Static Web Files ___________________________________________ : /usr/share/netdata/web
Log Files __________________________________________________ : /var/log/netdata
Lock Files _________________________________________________ : /var/lib/netdata/lock
Home _______________________________________________________ : /var/lib/netdata
Operating System:
Kernel _____________________________________________________ : Linux
Kernel Version _____________________________________________ : 6.4.0-150600.23.78-default
Operating System ___________________________________________ : openSUSE Leap
Operating System ID ________________________________________ : opensuse-leap
Operating System ID Like ___________________________________ : suse opensuse
Operating System Version ___________________________________ : 15.6
Operating System Version ID ________________________________ : 13
Detection __________________________________________________ : /host/etc/os-release
Hardware:
CPU Cores __________________________________________________ : 12
CPU Frequency ______________________________________________ : 1800000000
RAM Bytes __________________________________________________ : 33419157504
Disk Capacity ______________________________________________ : 9001789784064
CPU Architecture ___________________________________________ : x86_64
Virtualization Technology __________________________________ : none
Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
Container __________________________________________________ : podman
Container Detection ________________________________________ : systemd-detect-virt
Container Orchestrator _____________________________________ : none
Container Operating System _________________________________ : Debian GNU/Linux
Container Operating System ID ______________________________ : debian
Container Operating System ID Like _________________________ : unknown
Container Operating System Version _________________________ : 13 (trixie)
Container Operating System Version ID ______________________ : 13
Container Operating System Detection _______________________ : /etc/os-release
Features:
Built For __________________________________________________ : Linux
Netdata Cloud ______________________________________________ : YES
Health (trigger alerts and send notifications) _____________ : YES
Streaming (stream metrics to parent Netdata servers) _______ : YES
Back-filling (of higher database tiers) ____________________ : YES
Replication (fill the gaps of parent Netdata servers) ______ : YES
Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip brotli)
Contexts (index all active and archived metrics) ___________ : YES
Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
Machine Learning ___________________________________________ : YES
Memory Allocator ___________________________________________ : system
Database Engines:
dbengine (compression) _____________________________________ : YES (zstd lz4)
alloc ______________________________________________________ : YES
ram ________________________________________________________ : YES
none _______________________________________________________ : YES
Connectivity Capabilities:
ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
static (Netdata internal web server) _______________________ : YES
WebRTC (experimental) ______________________________________ : NO
Native HTTPS (TLS Support) _________________________________ : YES
TLS Host Verification ______________________________________ : YES
Libraries:
LZ4 (extremely fast lossless compression algorithm) ________ : YES
ZSTD (fast, lossless compression algorithm) ________________ : YES
zlib (lossless data-compression library) ___________________ : YES
Brotli (generic-purpose lossless compression algorithm) ____ : YES
protobuf (platform-neutral data serialization protocol) ____ : YES (system)
OpenSSL (cryptography) _____________________________________ : YES
libdatachannel (stand-alone WebRTC data channels) __________ : NO
JSON-C (lightweight JSON manipulation) _____________________ : YES
libcap (Linux capabilities system operations) ______________ : YES
libcrypto (cryptographic functions) ________________________ : YES
libyaml (library for parsing and emitting YAML) ____________ : YES
libmnl (library for working with netfilter) ________________ : YES
stacktraces (library for getting stack traces) _____________ : libbacktrace (mmap, threads, data)
Plugins:
apps (monitor processes) ___________________________________ : YES
cgroups (monitor containers and VMs) _______________________ : YES
cgroup-network (associate interfaces to CGROUPS) ___________ : YES
proc (monitor Linux systems) _______________________________ : YES
tc (monitor Linux network QoS) _____________________________ : YES
diskspace (monitor Linux mount points) _____________________ : YES
freebsd (monitor FreeBSD systems) __________________________ : NO
macos (monitor MacOS systems) ______________________________ : NO
windows (monitor Windows systems) __________________________ : NO
statsd (collect custom application metrics) ________________ : YES
timex (check system clock synchronization) _________________ : YES
idlejitter (check system latency and jitter) _______________ : YES
bash (support shell data collection jobs - charts.d) _______ : YES
debugfs (kernel debugging metrics) _________________________ : YES
cups (monitor printers and print jobs) _____________________ : NO
ebpf (monitor system calls) ________________________________ : NO
freeipmi (monitor enterprise server H/W) ___________________ : YES
network-viewer (monitor TCP/UDP IPv4/6 sockets) ____________ : YES
systemd-journal (monitor journal logs) _____________________ : YES
windows-events (monitor Windows events) ____________________ : NO
nfacct (gather netfilter accounting) _______________________ : NO
perf (collect kernel performance events) ___________________ : YES
slabinfo (monitor kernel object caching) ___________________ : YES
Xen ________________________________________________________ : NO
Xen VBD Error Tracking _____________________________________ : NO
Exporters:
AWS Kinesis ________________________________________________ : NO
GCP PubSub _________________________________________________ : NO
MongoDB ____________________________________________________ : YES
Prometheus (OpenMetrics) Exporter __________________________ : YES
Prometheus Remote Write ____________________________________ : YES
Graphite ___________________________________________________ : YES
Graphite HTTP / HTTPS ______________________________________ : YES
JSON _______________________________________________________ : YES
JSON HTTP / HTTPS __________________________________________ : YES
OpenTSDB ___________________________________________________ : YES
OpenTSDB HTTP / HTTPS ______________________________________ : YES
All Metrics API ____________________________________________ : YES
Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
Trace All Netdata Allocations (with charts) ________________ : NO
Developer Mode (more runtime checks, slower) _______________ : NO
Runtime Information:
Profile ____________________________________________________ : standalone
Stream Parent (accept data from Children) __________________ : NO
Stream Child (send data to a Parent) _______________________ : NO
Total System Memory ________________________________________ : 33419157504
Available System Memory ____________________________________ : 19062743040
Additional info
No response