NVIDIA Nsight Systems
NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest systems-on-a-chip (SoCs).
Nsight Systems 2026.3.1 is available now.

Nsight Systems visualizes system workload metrics on a timeline and provides tools that help developers detect, understand, and solve performance issues.
Profile the System
The full picture of app optimization requires drilling deeply into hardware interactions to ensure maximum parallelism is achieved. Nsight Systems visualizes unbiased, system-wide activity data on a unified timeline, allowing application developers to investigate correlations, dependencies, activity, bottlenecks, and resource allocation to ensure hardware components are working harmoniously.
Analyze Performance
Nsight Systems offers low-overhead performance analysis that visualizes otherwise hidden layers of events and metrics used for pursuing optimizations, including CPU parallelization and core utilization, GPU streaming-multiprocessor (SM) optimization, system workload and CUDA® libraries trace, network communications, OS interactions, and more.
Scale Across Platforms
Nsight Systems is the universal tool for developing applications on NVIDIA platforms, whether on-premises or in the cloud. Scale across a wide range of NVIDIA platforms, from NVIDIA DGX™ to NVIDIA RTX™ workstations, including NVIDIA DRIVE® for automotive and NVIDIA Jetson™ for edge AI and robotics. Nsight Systems provides valuable insights for optimizing AI, high-performance computing (HPC), professional visualization, and gaming applications.
Explore Key Features
Trace CPU and GPU Workloads
Nsight Systems latches on to target applications with low overhead to expose CPU and GPU activity on a timeline, correlating events to help remedy performance blockers. For compute tasks, it supports investigating CUDA, cuBLAS, cuDNN, and NVIDIA TensorRT™. For graphics, it profiles Vulkan, OpenGL, DirectX 11, DirectX 12, DXR, and NVIDIA OptiX™ APIs.
CPU activity (top) in parallel to GPU graphics and compute activity (bottom).
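As a rough illustration of how such a trace might be started from a script, the sketch below assembles an `nsys profile` command line. The helper name `build_nsys_trace_command`, the application path `./my_app`, and the report name are hypothetical. The `--trace` and `--output` options belong to the Nsight Systems command-line interface, but the set of accepted trace values varies by version and platform, so check `nsys profile --help` on your installation before relying on it.

```python
import shlex


def build_nsys_trace_command(app_argv, apis=("cuda", "nvtx", "osrt"), report_name="report"):
    """Return an argv list for `nsys profile` that traces the given APIs.

    `apis` is an assumption: the names must match values accepted by the
    installed nsys version's --trace option.
    """
    if not app_argv:
        raise ValueError("app_argv must name the program to profile")
    return [
        "nsys", "profile",
        "--trace=" + ",".join(apis),   # comma-separated list of APIs to trace
        "--output=" + report_name,     # base name of the generated report file
        *app_argv,                     # the target application and its arguments
    ]


# Hypothetical target application; replace with your own executable.
cmd = build_nsys_trace_command(
    ["./my_app", "--iters", "100"],
    apis=("cuda", "cublas", "cudnn", "nvtx"),
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where `nsys` is on the `PATH`; the resulting report can then be opened in the Nsight Systems GUI.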
The GPU Metrics section of the Nsight Systems timeline.
Track GPU Activity
To explore the GPU further, turn on GPU Metrics Sampling, which plots low-level input/output (IO) activity such as PCIe throughput, NVIDIA NVLink®, and dynamic random-access memory (DRAM) activity. GPU Metrics Sampling also exposes SM utilization, Tensor Core activity, instruction throughput, and warp occupancy. Each workload and its CPU origin can be readily tracked to support performance tuning.
Accelerate Multi-Node Performance
Nsight Systems supports multi-node profiling to resolve performance limiters on the scale of data centers and clusters. Multi-node analysis automatically diagnoses performance limiters across many nodes simultaneously. Additionally, network metrics alongside Python backtrace sampling paint a complete picture across GPUs, CPUs, DPUs, and internode communication.
Optimize Python for AI and Deep Learning
Nsight Systems helps you write Python applications that maximize GPU utilization. Backtraces and automatic call stack sampling allow you to fine-tune performance for deep learning applications.
Furthermore, integration with JupyterLab allows you to profile Python and other supported languages directly in Jupyter, including detailed analysis with the full Nsight Systems GUI.
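For a command-line workflow, Python call stack sampling might be enabled along the lines of the sketch below. The helper name `build_python_sampling_command`, the script name `train.py`, and the report name are hypothetical. The `--python-sampling` and `--python-sampling-frequency` options are taken from the Nsight Systems CLI as documented for recent releases; their availability and accepted values depend on the version and platform, so treat them as assumptions and confirm with `nsys profile --help`.

```python
import sys


def build_python_sampling_command(script_argv, frequency_hz=None, report_name="py_report"):
    """Return an argv list that profiles a Python script with call stack sampling.

    Assumes the installed nsys supports --python-sampling; the sampling
    frequency is only passed when the caller asks for a specific value.
    """
    if not script_argv:
        raise ValueError("script_argv must name the Python script to profile")
    cmd = ["nsys", "profile", "--python-sampling=true"]
    if frequency_hz is not None:
        if frequency_hz <= 0:
            raise ValueError("frequency_hz must be positive")
        cmd.append(f"--python-sampling-frequency={int(frequency_hz)}")
    cmd += [
        "--trace=cuda,nvtx",        # correlate sampled Python stacks with CUDA and NVTX activity
        f"--output={report_name}",  # base name of the generated report file
        sys.executable,             # run the script with the current interpreter
        *script_argv,
    ]
    return cmd


# Hypothetical training script; replace with your own entry point.
print(build_python_sampling_command(["train.py", "--epochs", "1"], frequency_hz=500))
```

Using `sys.executable` keeps the profiled run in the same virtual environment as the calling script, which avoids sampling a different interpreter by accident.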
Bring Your Own Data to Nsight Systems via Plugins
Nsight Systems supports standalone executables and injected shared libraries as plugins. Alternatively, a plugin may simply specify a set of environment variables that are passed to the profiled application, overriding any inherited ones. Standalone plugins can be profiled along with the main application, or without one in a system-wide profiling session. In-process plugins are loaded into a profiled process and initialized. The NVTX events emitted by a plugin are displayed in the same timeline as the main application events. Additionally, any stdout and stderr streams are captured