This repository provides a JSON file for a detailed NVIDIA GPU uptime and performance monitoring dashboard in OpenObserve. Importing the dashboard gives you a real-time overview of NVIDIA GPU health, node performance, and service reliability, which helps keep LLM workloads running efficiently.
The JSON file includes panels that track key metrics such as:
- Overview of GPU metrics
- Core performance metrics
- Clock and temperature details
- Memory & Power consumption details
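
Before importing, it can be useful to check which panels the JSON file actually contains. The following is a minimal sketch rather than part of this repository: it assumes only that panel and tab names are stored under `"title"` keys somewhere in the JSON (the exact OpenObserve dashboard schema is not assumed), and the helper names `collect_titles` and `list_dashboard_titles` are hypothetical.

```python
import json
from pathlib import Path


def collect_titles(node):
    """Recursively gather every string stored under a "title" key.

    Assumption: the dashboard JSON names its panels via "title" keys.
    The walk is schema-agnostic, so it may also return tab or
    dashboard-level titles.
    """
    titles = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "title" and isinstance(value, str):
                titles.append(value)
            else:
                titles.extend(collect_titles(value))
    elif isinstance(node, list):
        for item in node:
            titles.extend(collect_titles(item))
    return titles


def list_dashboard_titles(path):
    """Load a dashboard JSON file and return the titles found in it."""
    with Path(path).open(encoding="utf-8") as handle:
        return collect_titles(json.load(handle))
```

To use it, call `list_dashboard_titles` with the path to the dashboard JSON file from this repository and print the returned list. Titles are returned in the order they appear in the file.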
