# cuDNN Frontend v1.21.0 Release Notes #213

Merged: Anerudhan merged 1 commit into `main` from `1.21.0-rc` on Mar 25, 2026.

@Anerudhan (Collaborator) commented on Mar 25, 2026:

cuDNN Frontend v1.21.0 is the recommended version for [cuDNN 9.20.0](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-20-0) and later releases.

## General Improvements 🚀

- Dropped dependency on the CUDA driver API for the frontend library, enabling builds without direct CUDA driver linkage.

## Open-Source Kernels

Added new open-source kernels for the following GEMM fusions:

- **[Grouped GEMM + GLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_glu):** Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts, with optional bias.
- **[Grouped GEMM + dGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_dglu):** Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts, with optional bias.
- **[Discrete Grouped GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_swiglu):** Per-expert-pointer SwiGLU grouped GEMM for MoE workloads, without weight packing.
- **[Discrete Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_dswiglu):** Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads, without weight packing. Uses a dSwiGLU/dGeGLU backward epilogue.
- **[Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_dswiglu):** dSwiGLU activation fused with grouped GEMM.
- **[Grouped GEMM + Quant](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_quant):** Grouped GEMM with output quantization for MoE FC2/dFC1 workloads.
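
The fused kernels listed above share one pattern: a separate GEMM per expert over that expert's slice of tokens, followed by an elementwise epilogue. The pure-Python sketch below illustrates only that math; it is not the cuDNN Frontend API (see the linked directories for the real entry points). It assumes tokens are already sorted by expert, that each expert's weight concatenates the gate and "up" projections along the output dimension with the gate first (real kernels may interleave them), that "dense" means one packed (E, K, 2N) buffer, and that "discrete" means one separately allocated matrix per expert. All function names (`grouped_gemm_swiglu`, `unpack_dense`, `dswiglu_epilogue`, and so on) are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (M x K) @ (K x N) product; returns [] when a has no rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def swiglu_epilogue(row: List[float]) -> List[float]:
    """Assumed layout: first half of the row is the gate, second half is 'up'."""
    n = len(row) // 2
    return [silu(g) * u for g, u in zip(row[:n], row[n:])]


def dswiglu_epilogue(row: List[float], d_out: List[float]) -> List[float]:
    """Gradient of sum(d_out * swiglu_epilogue(row)) w.r.t. row: [d_gate | d_up]."""
    n = len(row) // 2
    d_gate, d_up = [], []
    for g, u, dy in zip(row[:n], row[n:], d_out):
        s = 1.0 / (1.0 + math.exp(-g))                      # sigmoid(g)
        d_gate.append(dy * u * s * (1.0 + g * (1.0 - s)))   # silu'(g) = s * (1 + g * (1 - s))
        d_up.append(dy * g * s)                             # silu(g) = g * s
    return d_gate + d_up


def unpack_dense(buf: List[float], num_experts: int, k: int, n2: int) -> List[Matrix]:
    """'Dense' layout: view one packed row-major (E, K, 2N) buffer as per-expert matrices."""
    stride = k * n2
    return [[buf[e * stride + r * n2: e * stride + (r + 1) * n2] for r in range(k)]
            for e in range(num_experts)]


def grouped_gemm_swiglu(x: Matrix, group_sizes: Sequence[int], weights: Sequence[Matrix],
                        biases: Optional[Sequence[List[float]]] = None) -> Matrix:
    """Reference grouped GEMM + SwiGLU over expert-sorted tokens.

    x: (sum(group_sizes), K) tokens, contiguous per expert.
    weights[e]: expert e's (K x 2N) weight; biases[e]: optional length-2N bias.
    """
    if sum(group_sizes) != len(x):
        raise ValueError("group_sizes must sum to the number of token rows")
    out: Matrix = []
    start = 0
    for e, m in enumerate(group_sizes):
        for r in matmul(x[start:start + m], weights[e]):  # one GEMM per expert
            if biases is not None:
                r = [v + b for v, b in zip(r, biases[e])]
            out.append(swiglu_epilogue(r))                # fused elementwise epilogue
        start += m
    return out


# "Discrete" layout: one separately allocated matrix per expert (no packing).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
discrete = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [1.0, 0.5]]]
dense = unpack_dense([1.0, 2.0, 3.0, 4.0, 0.5, -1.0, 1.0, 0.5], 2, 2, 2)
print(grouped_gemm_swiglu(x, [1, 2], discrete) == grouped_gemm_swiglu(x, [1, 2], dense))  # prints True
```

Both calls return identical rows because, under these assumptions, the two layouts differ only in how expert weights are stored and addressed, not in the math. `dswiglu_epilogue` shows only the elementwise gradient that a dSwiGLU epilogue applies. The surrounding backward GEMM, the dGeGLU variant, and output quantization are omitted.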

Anerudhan merged commit 7b9b711 into `main` on Mar 25, 2026 (1 check passed).