CUTLASS does not build with extended MMA shape kernels in PyTorch #133695

@Skylion007

🐛 Describe the bug

As far as I can tell, we never use the CMakeLists.txt in third_party/cutlass, which means we never enable the extended MMA shape kernels in CUTLASS that improve performance on H100.

We should make sure to compile all of these shape kernels, since it is a one-time build cost and can help any CUTLASS matmuls used by other ops such as SDPA and Flash Attention.
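A minimal sketch of what enabling this from PyTorch's own CMake might look like, since CUTLASS's CMakeLists.txt is bypassed. It assumes the vendored CUTLASS gates the extended shapes behind a preprocessor define named `CUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES` (the name of the option in CUTLASS's top-level CMakeLists.txt, as far as I recall; please verify it against the checked-in version). Where the line belongs in PyTorch's build files is also an assumption.

```cmake
# Sketch only, not a tested patch.
# Assumption: CUTLASS headers check for the define
# CUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES to expose the extended
# SM90 (H100) MMA instruction shapes. Because third_party/cutlass's
# CMakeLists.txt is never processed, the define has to be forwarded
# to the CUDA compiler by hand.
if(USE_CUDA)
  string(APPEND CMAKE_CUDA_FLAGS " -DCUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES=1")
endif()
```

The define only matters for translation units that include CUTLASS headers, so the main cost should be extra compile time for those files.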

Versions

master as of 8/16/2024

cc @malfet @seemethere

    Labels

    module: build (Build system issues)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
