Closed
Labels: area/prometheus, bug (unexpected problem or unintended behavior)
Description
Relevant telegraf.conf:
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.intel_powerstat]]
cpu_metrics = ["cpu_frequency", "cpu_temperature"]
[[inputs.net]]
[[inputs.nvidia_smi]]
System info:
Telegraf 1.18.3 (git: HEAD 6a94f65)
Fedora 34
Kernel 5.12.8
Steps to reproduce:
- Enable `[[inputs.intel_powerstat]]` with `cpu_metrics = ["cpu_temperature"]`
- Restart telegraf
- Watch `journalctl -u telegraf -f`
Expected behavior:
Hate to say it, but silent failure would be better. See additional info section.
Actual behavior:
Every time metrics are collected, intel_powerstat reports an error and fails to read cpu_temperature:
[inputs.intel_powerstat] error fetching rapl data for socket 0, err: error opening socket energy_uj file on path /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj, err: open /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj: permission denied
Additional info:
The file is readable only by root:
$ ls -l /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj
-r--------. 1 root root 4096 Jun 1 21:03 /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj
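The same check can be done programmatically. The following is a minimal Python sketch, not anything Telegraf itself does; `describe_access` is a hypothetical helper name. It returns the permission string that `ls -l` shows, plus whether the current user can read the file.

```python
import os
import stat

def describe_access(path):
    """Return (mode_string, readable) for path.

    mode_string matches the first column of `ls -l`, e.g. "-r--------".
    readable is True if the calling user may open the file for reading.
    """
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)
    # os.access tests against the real uid/gid of the process.
    readable = os.access(path, os.R_OK)
    return mode, readable

if __name__ == "__main__":
    # Path taken from the error message in this report.
    p = "/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj"
    if os.path.exists(p):
        print(describe_access(p))
```

Run as the telegraf user on an affected kernel, the second value would be expected to be `False`.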
Similar issue for prometheus here
Related to this kernel change and Intel CVE-2020-8695
For security reasons, energy_uj is now readable only by root, and is likely to remain so. Telegraf cannot read this file and it would be nice if it failed without putting an error message in the logs every ten seconds.
In the long term, as more kernels pick up this change, a workaround for reading energy_uj will be needed.
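To illustrate the quieter failure mode requested here, this is a rough Python sketch of a reader that logs a permission error once and then stays silent. It is only an illustration of the idea, not Telegraf's code (Telegraf is written in Go); `EnergyReader`, its `opener` parameter, and the logger name are all hypothetical.

```python
import logging

log = logging.getLogger("rapl_sketch")  # hypothetical logger name

class EnergyReader:
    """Hypothetical reader that reports a permission error only once."""

    def __init__(self, path, opener=open):
        self.path = path
        self._opener = opener   # injectable so the failure path can be tested
        self._warned = False

    def read(self):
        """Return the counter value as an int, or None if it is unreadable."""
        try:
            with self._opener(self.path) as f:
                return int(f.read().strip())
        except PermissionError as err:
            if not self._warned:
                log.warning("cannot read %s: %s; suppressing further errors",
                            self.path, err)
                self._warned = True
            return None
```

A caller would invoke `read()` on every collection interval and skip the energy fields whenever it returns `None`, so the log gets one line instead of one every ten seconds.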