| Verifying claimed TOPS performance on Jetson Thor – CUTLASS kernel for SM110 does not run, SM80 gives very low performance (~6.9 TFLOP/s) | | 11 | 164 | December 12, 2025 |
| NvRmMemInitNvmap failed / NVMAP permission denied when launching nvcr.io/nvidia/vllm:25.11-py3 container on Jetson Orin NX + JetPack 6.2 (L4T 36.4.3) | | 1 | 23 | December 11, 2025 |
| Conditions on NVJet kernels on Jetson Thor | | 13 | 126 | December 11, 2025 |
| Running a repo using jax on Jetson Orin AGX 64GB GPU | | 1 | 34 | December 5, 2025 |
| Example code of Outer Vector Scaling for FP8 data types | | 0 | 14 | December 1, 2025 |
| Pointers align requirement for api:cublasGemmBatchedEx | | 1 | 18 | November 26, 2025 |
| Accessing kernel call stack | | 10 | 46 | December 9, 2025 |
| Nsys profile not showing any GPU data | | 0 | 36 | November 22, 2025 |
| cuSPARSELt: Strict Output Layout Constraints for Optimal Performance in Sparse-Dense GEMM | | 2 | 60 | November 21, 2025 |
| CMake Linking Issues | | 1 | 63 | November 20, 2025 |
| cuBLAS failing on Jetpack 6.2 + dGPU | | 5 | 62 | November 19, 2025 |
| Unable to work with TensorFlow in Docker on DGX Spark | | 5 | 208 | November 14, 2025 |
| Possible duplicate entries in `cuda_kern_exec_trace` report | | 12 | 106 | November 21, 2025 |
| Static CUDA Build with Opencv | | 5 | 58 | November 6, 2025 |
| Performance Benchmarking on Jetson Thor | | 7 | 291 | December 2, 2025 |
| Converting Grounding DINO TAO on x86 causes issues | | 5 | 73 | November 12, 2025 |
| Julia CUDA on DGX Spark | | 2 | 87 | October 27, 2025 |
| Switch from "sm90_xmma_gemm.._cublas"/ "void cutlass::Kernel<cutlass_80_tensorop_.." kernels with CUDA-12.1 to "nvjet_tst..." kernels with CUDA-12.8 | | 0 | 77 | October 26, 2025 |
| Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS | | 1 | 29 | October 24, 2025 |
| Windows 11, RTX 3070 error installing Python cudnn packages for tensorflow | | 0 | 49 | October 19, 2025 |
| Fusion of two GEMM operators using CUDA | | 5 | 136 | October 17, 2025 |
| Batched cublas matrix inverses with matrices stored in a derived type | | 3 | 69 | October 9, 2025 |
| Can`t properly install and run isaac sim with isaac lab even to clean VM with ubuntu 24.04 | | 2 | 171 | October 28, 2025 |
| Failed to install and run together isaaclab and isaacsim | | 2 | 68 | September 30, 2025 |
| Exception Error cublasSgetrsBatched while cublasSgetrfBatched has no issues (cuda12.8) | | 0 | 43 | September 24, 2025 |
| [reformatBuilder.cpp::writeGlob::165] Error Code 2: Internal Error (Assertion inputScalesLen >= quantizations.inputs[0].scale.count() failed. ) | | 0 | 40 | September 18, 2025 |
| Show me how to headlessly install CUDA toolkit on Windows without installing the driver | | 0 | 67 | September 15, 2025 |
| Why is cuBLAS cublasDgemm slower than my naive GEMM kernel? | | 1 | 59 | September 15, 2025 |
| What is GPU doing during the period "smsp cycles idle" = smsp__cycles_elapsed.max - smsp__cycles_active.max? | | 2 | 49 | September 2, 2025 |
| cublasSgemm crash with multi-thread,multi-context on t4,cublas12.4.2 | | 0 | 36 | September 2, 2025 |