| Topic | Replies | Views | Activity |
|---|---|---|---|
| Accelerated vs non-accelerated HSB latency | 0 | 19 | June 4, 2026 |
| DGX Spark Performance Degradation - GPU Power Draw Issue | 67 | 3808 | June 4, 2026 |
| Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark) | 412 | 18948 | June 2, 2026 |
| Just another ASUS GX10 NCCL all_gather_perf thread... mpirun... please read if you have an ASUS model multinode setup | 4 | 551 | May 23, 2026 |
| Slow wi-fi on orin nano devkit (RTL8822CE 802.11ac) | 11 | 124 | June 3, 2026 |
| Lossless 7.67× LoRA / 8.35× Full FT speedup for Qwen3.5 on DGX Spark (GB10, sm_121a) | 3 | 393 | May 20, 2026 |
| T5000 - Sample to use full performance | 4 | 112 | May 11, 2026 |
| Dual DGX Spark: NCCL capped at 2.80 GB/s + ib_write_bw crashes at 128KB syndrom 0x88 — matches thread 366266 with additional RoCE degradation | 2 | 179 | April 20, 2026 |
| GPU needs to be "warmed up" to achieve maximum performance | 22 | 758 | May 5, 2026 |
| NeuralForge GPU Native Knowledge Intelligence Platform Built on DGX Spark GB10 | 4 | 324 | April 17, 2026 |
| NCCL bandwidth capped at 3 GB/s, GPU PCIe topology reports Gen1 x1 on DGX Spark FE | 6 | 354 | April 14, 2026 |
| Qwen3.5 Flash Attention performance inconsistencies | 1 | 422 | April 5, 2026 |
| Latest Update (20Mar 2026) on Nvidia Spark FE caps GPU performance | 9 | 640 | April 3, 2026 |
| VK_EXT_descriptor_heap: Uniform buffer loads use global memory loads instead of constant loads | 0 | 160 | March 14, 2026 |
| How to tell core utilization when running headless? | 5 | 152 | March 28, 2026 |
| Degraded performance on L4T 32.X vs L4T 35.X | 2 | 38 | March 2, 2026 |
| OpenGL texture view performance | 0 | 79 | February 2, 2026 |
| High Latency and GPU Contention when running DeepStream (Python) + VSS on DGX Platform | 4 | 237 | January 23, 2026 |
| A new GPU-accelerated prime sieve using constant-cost structural elimination to overcome memory bandwidth limits at massive scales | 5 | 208 | January 21, 2026 |
| Knema – Frame Continuity Engine | 0 | 26 | January 16, 2026 |
| Nvidia Powerd Dynamic boost not increasing the power limit | 2 | 201 | January 13, 2026 |
| Help on llama.cpp command line arguments and compilation settings (performance testing included) | 7 | 2914 | January 9, 2026 |
| `rte_flow_async_create` takes a huge 1 million cycles on ConnectX-6 Dx DX | 2 | 93 | January 7, 2026 |
| How to correlate range profiling metrics with a certain kernel? | 3 | 105 | January 30, 2026 |
| cuDNN Bug Report: Conv3d Performance Regression with bfloat16/float16 on H100 | 2 | 255 | December 31, 2025 |
| What is APP ( Adjusted Peak Performance) of Jetson AGX Orin 64GB? | 2 | 66 | December 18, 2025 |
| DGX Spark Image and Video Generation performance? | 2 | 1299 | December 15, 2025 |
| Unable to run benchmark on Jetson Orin NX 16GB | 7 | 144 | December 4, 2025 |
| Unexpected Performance Behavior with CUDA Software Prefetcher, Warm-Up Kernel and GEMV | 10 | 191 | December 3, 2025 |
| Optimizing PTX mma ops on volta to surpass wmma | 2 | 104 | November 30, 2025 |