
NVIDIA Nsight Systems

NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest systems-on-a-chip (SoCs).

Get Started

Nsight Systems 2026.3.1 is available now.

Nsight Systems can make high-performance games with beautiful graphics

Nsight Systems visualizes system workload metrics on a timeline and provides tools that help developers detect, understand, and solve performance issues. 

Profile the System

The full picture of app optimization requires drilling deeply into hardware interactions to ensure maximum parallelism is achieved. Nsight Systems visualizes unbiased, system-wide activity data on a unified timeline, allowing application developers to investigate correlations, dependencies, activity, bottlenecks, and resource allocation to ensure hardware components are working harmoniously. 

Analyze Performance

Nsight Systems offers low-overhead performance analysis that visualizes otherwise hidden layers of events and metrics used for pursuing optimizations, including CPU parallelization and core utilization, GPU streaming-multiprocessor (SM) optimization, system workload and CUDA® libraries trace, network communications, OS interactions, and more.

Scale Across Platforms

Nsight Systems is the universal tool for developing applications on NVIDIA platforms, whether on-premises or in the cloud. Scale across a wide range of NVIDIA platforms, from NVIDIA DGX™ to NVIDIA RTX™ workstations, including NVIDIA DRIVE® for automotive and NVIDIA Jetson™ for edge AI and robotics. Nsight Systems provides valuable insights for optimizing AI, high-performance computing (HPC), pro-visualization, and gaming applications.


Explore Key Features 

Trace CPU and GPU Workloads

Nsight Systems attaches to target applications with low overhead and exposes CPU and GPU activity on a timeline, correlating events so you can remedy performance blockers. For compute tasks, it supports investigating CUDA, cuBLAS, cuDNN, and NVIDIA TensorRT™. For graphics, it profiles Vulkan, OpenGL, DirectX 11, DirectX 12, DXR, and NVIDIA OptiX™ APIs.
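As a rough illustration of what collecting such a trace can look like from a script, the sketch below assembles an `nsys profile` command line that traces CUDA, cuBLAS, cuDNN, NVTX, and OS runtime calls. The helper `build_nsys_command`, the report name, and the `train.py` target are hypothetical names chosen for this example; the set of trace values available depends on your platform and Nsight Systems version, so check `nsys profile --help` before relying on it. The code only builds and prints the command; it does not run the profiler.

```python
import shlex


def build_nsys_command(app_argv, report_name="report",
                       traces=("cuda", "cublas", "cudnn", "nvtx", "osrt")):
    """Return an argv list that wraps app_argv in `nsys profile`.

    build_nsys_command is a hypothetical helper for this sketch,
    not part of Nsight Systems.
    """
    if not app_argv:
        raise ValueError("app_argv must name the program to profile")
    return [
        "nsys", "profile",
        f"--trace={','.join(traces)}",   # APIs to trace on the timeline
        f"--output={report_name}",       # base name of the report file
        *app_argv,                       # the application and its arguments
    ]


# "train.py" is a placeholder for your own workload.
cmd = build_nsys_command(["python", "train.py"])
print(shlex.join(cmd))
```

To actually collect a report, the resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where `nsys` is installed and on the `PATH`.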

Figure: CPU activity (top) in parallel to GPU graphics and compute activity (bottom).
Figure: The GPU Metrics section of the Nsight Systems timeline.

Track GPU Activity

To explore the GPU further, toggle on GPU Metrics Sampling to plot low-level input/output (IO) activity such as PCIe throughput, NVIDIA NVLink®, and dynamic random-access memory (DRAM) activity. GPU Metrics Sampling also exposes SM utilization, Tensor Core activity, instruction throughput, and warp occupancy. Each workload and its CPU origin can be readily tracked to support performance tuning.

Accelerate Multi-Node Performance

Nsight Systems supports multi-node profiling to resolve performance limiters on the scale of data centers and clusters. Multi-node analysis automatically diagnoses performance limiters across many nodes simultaneously. Additionally, network metrics alongside Python backtrace sampling paint a complete picture across GPUs, CPUs, DPUs, and internode communication.

Scale AI Applications to the Data Center and Cloud with NVIDIA Nsight Systems GTC Demo Video
Feature Spotlight on Python support in Nsight Developer Tools

Optimize Python for AI and Deep Learning

Nsight Systems helps you write Python applications that maximize GPU utilization. Backtraces and automatic call stack sampling allow you to fine-tune performance for deep learning applications.
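Sampling and backtraces are collected by the profiler itself. A complementary technique, not described above, is to mark phases of a Python program with NVTX ranges so they appear as named spans on the timeline. The sketch below assumes the `nvtx` Python package (`pip install nvtx`) and falls back to a no-op when it is not installed; the functions `load_batch` and `train_step` are hypothetical stand-ins for real data loading and training code.

```python
import contextlib

try:
    import nvtx  # NVIDIA's NVTX Python bindings
    annotate = nvtx.annotate
except ImportError:
    # Fallback so the script still runs without the package:
    # every range becomes a no-op context manager.
    def annotate(message=None, **kwargs):
        return contextlib.nullcontext()


def load_batch(n):
    # Hypothetical data-loading phase, wrapped in a named range.
    with annotate("load_batch", color="blue"):
        return list(range(n))


def train_step(batch):
    # Hypothetical compute phase, wrapped in a named range.
    with annotate("train_step", color="green"):
        return sum(x * x for x in batch)


total = 0
for step in range(3):
    total += train_step(load_batch(4))
print(total)
```

Run under the profiler with NVTX tracing enabled, the two ranges should show up as labeled spans; run on its own, the script behaves the same with or without the `nvtx` package.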

Furthermore, integration with JupyterLab lets you profile Python and other supported languages directly in Jupyter, including detailed analysis with the full Nsight Systems GUI.

Get the NVIDIA Nsight Tools JupyterLab Extension

Bring Your Own Data to Nsight Systems via Plugins

Nsight Systems supports standalone executables and injected shared libraries as plugins. Alternatively, a plugin may simply specify a set of environment variables that are passed to the profiled application, overriding any inherited ones. Standalone plugins can be profiled along with the main application or, in system-wide profiling, without one. In-process plugins are loaded into the profiled process and initialized there. NVTX events emitted by a plugin are displayed on the same timeline as the main application's events. Additionally, any stdout and stderr streams are captured