Source Code of Cutlass GemmKernel from Basic Gemm

lstephan · April 16, 2025, 7:24pm

Is the implementation of CUTLASS's basic gemm benchmark open source? The code seems to lead to a kernel called GemmKernel, but where can I find the actual implementation of the matrix multiply? My understanding is that this kernel's launch configuration is computed from the problem size. Is it feasible to change the launch configuration and, if so, where (i.e. when calling GemmKernel)?

QuaternionsRock · April 16, 2025, 8:26pm

cutlass::gemm::device::Gemm::GemmKernel is a type alias that refers to the nested GemmKernel type of cutlass::gemm::kernel::DefaultGemm. The definition of the kernel, including partial specializations for some architectures, can be found in the latter (include/cutlass/gemm/kernel/default_gemm.h in the CUTLASS repository).
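
To make the alias chain concrete, here is a minimal sketch of the pattern in plain C++. Everything in the sketch namespace is a hypothetical stand-in, not the real CUTLASS code: the template parameters are heavily simplified, and grid_for is an assumed helper that only illustrates how a grid shape can follow from the problem size and the threadblock tile shape. In CUTLASS itself the tile shape is a template parameter of the device-level Gemm, so changing the launch configuration would presumably mean picking different template arguments rather than passing something at the call site.

```cpp
#include <type_traits>

// Hypothetical stand-ins that mirror the layering described above.
// These are NOT the real CUTLASS types; the parameter lists are simplified.
namespace sketch {
namespace kernel {

// Plays the role of the kernel-level class that would hold the device code.
template <typename Element, int TileM, int TileN>
struct Gemm {
  static constexpr int kTileM = TileM;  // rows of C covered by one threadblock
  static constexpr int kTileN = TileN;  // columns of C covered by one threadblock
};

// Plays the role of kernel::DefaultGemm: a traits template that selects a
// concrete kernel type and exposes it as the nested alias GemmKernel.
template <typename Element, int TileM, int TileN>
struct DefaultGemm {
  using GemmKernel = Gemm<Element, TileM, TileN>;
};

}  // namespace kernel

namespace device {

// Plays the role of device::Gemm: the host-side wrapper whose GemmKernel
// member is just an alias into the traits template above.
template <typename Element, int TileM = 128, int TileN = 128>
struct Gemm {
  using GemmKernel =
      typename kernel::DefaultGemm<Element, TileM, TileN>::GemmKernel;
};

}  // namespace device

// Assumed helper types for the illustration only.
struct GridShape {
  int x;
  int y;
};

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// One threadblock per output tile: the grid follows from the problem size
// (m x n) and the tile shape baked into the kernel type.
template <typename Kernel>
constexpr GridShape grid_for(int m, int n) {
  return GridShape{ceil_div(m, Kernel::kTileM), ceil_div(n, Kernel::kTileN)};
}

}  // namespace sketch

// The device-level alias resolves to the kernel-level type.
static_assert(
    std::is_same<sketch::device::Gemm<float>::GemmKernel,
                 sketch::kernel::Gemm<float, 128, 128>>::value,
    "GemmKernel is an alias for the type chosen by DefaultGemm");
```

In this sketch, sketch::grid_for<sketch::device::Gemm<float>::GemmKernel>(1000, 512) yields a 8 x 4 grid, while instantiating sketch::device::Gemm<float, 64, 64> instead yields 16 x 8 for the same problem size. The real library computes its grid through a threadblock swizzle class and has many more template parameters, so treat this only as a picture of where the tile shape enters.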
