| Llama.cpp can't work properly with Docker. Multi-modal functionality fails with a CUDA internal error | 7 | 142 | June 5, 2026 |
| GPU Utilization Bottleneck: Single-Process with 8 Streams vs. Multi-Process with 1 Stream each (DeepStream 7.1 / PyServiceMaker) | 0 | 16 | June 4, 2026 |
| Z-Image Turbo NVFP4 | 1 | 379 | May 31, 2026 |
| PyTorch Installation on Jetson Orin Nano | 1 | 56 | May 29, 2026 |
| Request for sm_121-tuned kernels in cuDNN/cuBLAS — DGX Spark training throughput gap | 4 | 222 | May 23, 2026 |
| One way to set up Jetson Orin Nano for OpenCV and YOLO with CUDA, May 2026 | 1 | 100 | May 22, 2026 |
| Pinned memory uploads not being asynchronous on RTX 5060 Ti | 7 | 101 | June 4, 2026 |
| cuBLAS severe underperformance on cublasSgemm for RTX 3060 Laptop GPU | 1 | 51 | May 14, 2026 |
| Nsight Systems showing more memory than reality | 8 | 77 | May 13, 2026 |
| RTX 4070 (AD104) GSP firmware crash (Xid 120 @ pc:0x1a92c96) under sustained CUDA workload — Windows BSOD + Linux GPU reset | 0 | 83 | May 11, 2026 |
| Issues generating 64T64R testMAC vectors via cuMAC (thread-block limit & 32-bit integer overflow) | 1 | 72 | May 11, 2026 |
| RTX Pro 6000 Blackwell Card Crash | 5 | 387 | May 8, 2026 |
| cuFFT (libcufft) crashes on H100 in Confidential Computing (CC) mode | 1 | 45 | April 29, 2026 |
| cusolverDnXsyevd status 6 + XID 31 MMU fault at n=50000, FP64 real, CUDA 13.2 | 2 | 56 | April 28, 2026 |
| Parallel cuBLAS distributions - which one is the canonical one? | 0 | 37 | April 26, 2026 |
| Machine-readable specifications for compute libraries | 0 | 19 | April 22, 2026 |
| cublasDx batched gather gemm | 3 | 49 | April 20, 2026 |
| cublasSgemmGroupedBatched requires host-side synchronization after preceding TRSM on A5000 (device-side ordering insufficient) | 1 | 30 | April 19, 2026 |
| Gemma 4 VLM VRAM/Host Memory Leak — Full Investigation Report | 1 | 448 | April 10, 2026 |
| cuBLAS batched FP32 SGEMM dispatcher picks suboptimal kernel on RTX 5090 (sm_120) | 0 | 45 | April 10, 2026 |
| CUDA 13.2 DGX Spark impact | 8 | 1460 | March 29, 2026 |
| GB10 (SM12.1) vLLM FP8 inference — any progress on native SM12.1 kernels? | 4 | 758 | March 27, 2026 |
| PyTorch CUDA Incompatibility on NVIDIA Thor (L4T 38.4, CUDA 13) | 3 | 124 | March 23, 2026 |
| Jetson AGX Thor: official PyTorch 25.08 container works for Conv2d and ResNet18, but pip-installed PyTorch 2.12.0.dev+cu128 fails with "no kernel im | 2 | 153 | March 21, 2026 |
| Verify AI performance by cutlass_profiler, but it was too slow, why? | 2 | 46 | March 4, 2026 |
| Custom FP4 CUDA Kernel - 129 TFLOPS on DGX Spark with Pre-Quantized Weight Cache | 4 | 683 | February 25, 2026 |
| OpenACC use_device / OpenMP use_device_ptr / use_device_addr in combination with cuBLAS | 5 | 66 | February 19, 2026 |
| Anyone know if CUDA 12.6.2 is coming to JetPack? | 1 | 52 | February 16, 2026 |
| PyTorch matmul vs cudaTensorCoreGemm on Jetson Orin NX | 2 | 60 | February 12, 2026 |
| Which tool can accurately obtain kernel performance, ncu or nsys? | 2 | 91 | March 30, 2026 |