I am using an RTX 3090 GPU and am trying to measure the tensor-core FLOPs of a kernel that uses HMMA.16816.F32 instructions.
Some posts on this forum recommend the sm__ops_* metrics, but my ncu (version 2025.2.1.0) does not list any such metrics when I run --query-metrics. Moreover, the FLOP/s reported in my roofline analysis plot also seems off.
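As a rough cross-check for the roofline numbers, the expected tensor FLOPs can be derived from the instruction count. This is a minimal sketch, assuming each HMMA.16816 instruction is one warp-wide mma.m16n8k16 (M=16, N=8, K=16) and counting each multiply-add as 2 FLOPs. The HMMA count and kernel duration are assumed to come from the profiler; the function names and numbers in the example are hypothetical.

```python
# Assumption: one HMMA.16816 = one warp-wide m16n8k16 matrix multiply-accumulate,
# i.e. 16 * 8 * 16 multiply-adds, each counted as 2 FLOPs.
M, N, K = 16, 8, 16
FLOPS_PER_HMMA_16816 = 2 * M * N * K  # 4096


def tensor_flops(hmma_warp_instructions: int) -> int:
    """Total FLOPs for a number of warp-level HMMA.16816 instructions executed."""
    return hmma_warp_instructions * FLOPS_PER_HMMA_16816


def tensor_flops_per_second(hmma_warp_instructions: int, duration_ns: float) -> float:
    """Achieved FLOP/s given the kernel duration in nanoseconds."""
    return tensor_flops(hmma_warp_instructions) / (duration_ns / 1e9)


# Hypothetical values, for illustration only.
count = 1_000_000      # HMMA.16816 instructions executed (warp level)
duration_ns = 1e6      # 1 ms kernel
print(tensor_flops(count))  # 4096000000
print(f"{tensor_flops_per_second(count, duration_ns) / 1e12:.3f} TFLOP/s")
```

If this estimate and the roofline's FLOP/s disagree by a constant factor (for example 2x), the difference is likely in how a multiply-add is counted rather than in the kernel itself.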
The metrics you are looking for are available in newer versions of Nsight Compute. The Nsight Compute team would have to confirm the minimum version that supports them; I did not find them called out in the Nsight Compute 13.0 Release Notes. With Nsight Compute 2025.3.0, they are listed for GA102, the chip used in the RTX 3090:
```
Nsight Compute 2025.3.0>ncu --query-metrics --chips ga102 | grep sm__ops
sm__ops_path_tensor_src_bf16_dst_fp32 Counter # of math ops executed in Tensor path with source BF16 and
sm__ops_path_tensor_src_bf16_dst_fp32_sparsity_off Counter # of math ops executed in Tensor path with source BF16 and
sm__ops_path_tensor_src_bf16_dst_fp32_sparsity_on Counter # of math ops executed in Tensor path with source BF16 and
sm__ops_path_tensor_src_fp16_dst_fp16 Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_off Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp16_dst_fp16_sparsity_on Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp16_dst_fp32 Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp16_dst_fp32_sparsity_off Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp16_dst_fp32_sparsity_on Counter # of math ops executed in Tensor path with source FP16 and
sm__ops_path_tensor_src_fp64 Counter # of math ops executed in Tensor path with source FP64
sm__ops_path_tensor_src_int1 Counter # of math ops executed in Tensor path with source INT1
sm__ops_path_tensor_src_int4 Counter # of math ops executed in Tensor path with source INT4
sm__ops_path_tensor_src_int4_sparsity_off Counter # of math ops executed in Tensor path with source INT4 with sparsity
sm__ops_path_tensor_src_int4_sparsity_on Counter # of math ops executed in Tensor path with source INT4 with sparsity
sm__ops_path_tensor_src_int8 Counter # of math ops executed in Tensor path with source INT8
sm__ops_path_tensor_src_int8_sparsity_off Counter # of math ops executed in Tensor path with source INT8 with sparsity
sm__ops_path_tensor_src_int8_sparsity_on Counter # of math ops executed in Tensor path with source INT8 with sparsity
sm__ops_path_tensor_src_tf32_dst_fp32 Counter # of math ops executed in Tensor path with source TF32 and
sm__ops_path_tensor_src_tf32_dst_fp32_sparsity_off Counter # of math ops executed in Tensor path with source TF32 and
sm__ops_path_tensor_src_tf32_dst_fp32_sparsity_on Counter # of math ops executed in Tensor path with source TF32 and
```
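A minimal sketch of how one of these counters might be collected and turned into a rate, assuming the kernel's HMMA.16816.F32 operands are FP16 (so the matching counter would be sm__ops_path_tensor_src_fp16_dst_fp32; a BF16 kernel would use the bf16 variant instead). The helper names and `./my_app` are hypothetical, and the command line is only built here, not run. The truncated descriptions above do not say whether one "op" is a multiply-add or a single FLOP, so treat the conversion to FLOP/s as an assumption and sanity-check it against a count derived from the HMMA instructions (4096 FLOPs per HMMA.16816 if a multiply-add counts as 2).

```python
from typing import List, Optional


def tensor_ops_metric(src: str, dst: Optional[str] = None,
                      sparsity: Optional[str] = None) -> str:
    """Build an sm__ops_path_tensor_* counter name (hypothetical helper)."""
    name = f"sm__ops_path_tensor_src_{src}"
    if dst is not None:
        name += f"_dst_{dst}"
    if sparsity is not None:
        name += f"_sparsity_{sparsity}"  # "on" or "off"
    return name


def ncu_command(app: str, metric: str) -> List[str]:
    """ncu argv collecting the counter (summed over all instances) and the kernel duration."""
    return ["ncu", "--metrics", f"{metric}.sum,gpu__time_duration.sum", app]


def ops_per_second(ops_sum: float, duration_ns: float) -> float:
    """Convert a summed op counter and a kernel duration in ns into ops/s."""
    return ops_sum / (duration_ns / 1e9)


metric = tensor_ops_metric("fp16", "fp32")
print(metric)  # sm__ops_path_tensor_src_fp16_dst_fp32
print(" ".join(ncu_command("./my_app", metric)))
```

Requesting gpu__time_duration.sum alongside the counter keeps both values from the same profiled launch, which makes the ops/s figure directly comparable with the roofline.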