[MatMul] Prolog build out, adding automatic swizzle generator for a few tile sizes #1900

Merged
zasdfgbnm merged 98 commits into rebase-matmul_swizzle_gen from matmul_swizzle_gen
Mar 20, 2023


@shmsong shmsong commented Aug 10, 2022

This PR builds out matmul CTA tiling support, prioritizing Ampere tile configurations for now.

This PR also adds an inline bank-conflict check utility that reports bank conflicts on shared memory accesses.
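To make concrete what a bank-conflict report measures, here is a minimal host-side sketch. It is not the utility added in this PR. It assumes the common shared memory model of 32 banks that are each 4 bytes wide, and the names `bankConflictWays` and `stridedAccess` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed hardware model: 32 banks, each 4 bytes wide.
constexpr int kNumBanks = 32;
constexpr int kBankWidthBytes = 4;

// Hypothetical helper, not the PR's utility. Returns the largest number of
// distinct 4-byte words that a single warp-wide access maps to one bank.
// 1 means conflict-free, N means an N-way conflict, 0 means no accesses.
int bankConflictWays(const std::vector<std::uint32_t>& byte_addresses) {
  assert(byte_addresses.size() <= 32 && "expects the addresses of one warp");
  std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
  for (std::uint32_t addr : byte_addresses) {
    const std::uint32_t word = addr / kBankWidthBytes;
    // Threads that read the same word are served by a broadcast, so the set
    // counts each word only once.
    words_per_bank[word % kNumBanks].insert(word);
  }
  std::size_t ways = 0;
  for (const auto& words : words_per_bank) {
    ways = std::max(ways, words.size());
  }
  return static_cast<int>(ways);
}

// Hypothetical helper: byte addresses for 32 threads that each read one
// 4-byte word, with consecutive threads `stride_words` words apart.
std::vector<std::uint32_t> stridedAccess(std::uint32_t stride_words) {
  std::vector<std::uint32_t> addrs;
  for (std::uint32_t tid = 0; tid < 32; ++tid) {
    addrs.push_back(tid * stride_words * kBankWidthBytes);
  }
  return addrs;
}
```

Under this model a unit-stride access is conflict-free, while a stride of 32 words puts every thread on the same bank, which is the kind of access pattern such a check is meant to flag.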

Remaining TODO:

  • add asymmetric swizzle operators to support more irregular tile sizes.
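For readers unfamiliar with swizzling, the sketch below shows a generic XOR swizzle on a square tile. It is not the generator from this PR and does not cover the asymmetric case named in the TODO. The 32x32 tile of 4-byte words and all function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>

// Assumed layout: a square tile of kTile x kTile 4-byte words in shared
// memory, with kTile equal to the assumed bank count of 32.
constexpr int kTile = 32;

// Plain row-major word offset of the logical element (row, col).
int unswizzledOffset(int row, int col) {
  return row * kTile + col;
}

// Hypothetical XOR swizzle: within each row, the column index is XORed with
// the row index, which permutes the columns differently in every row.
int swizzledOffset(int row, int col) {
  assert(row >= 0 && row < kTile && col >= 0 && col < kTile);
  return row * kTile + (col ^ row);
}

// Counts the distinct banks touched when 32 threads each read one element
// of the same logical column (thread i reads row i).
int banksTouchedByColumn(int col, bool swizzle) {
  std::set<int> banks;
  for (int row = 0; row < kTile; ++row) {
    const int offset =
        swizzle ? swizzledOffset(row, col) : unswizzledOffset(row, col);
    banks.insert(offset % kTile);
  }
  return static_cast<int>(banks.size());
}
```

Without the swizzle, a column read lands entirely on one bank. With it, the 32 rows spread over 32 different banks, because `col ^ row` takes a different value for each row. This works cleanly only when the tile dimensions match in this way, which is presumably why irregular tile sizes need a different operator.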

shmsong and others added 30 commits July 11, 2022 22:15
@zasdfgbnm zasdfgbnm changed the base branch from devel to rebase-matmul_swizzle_gen March 20, 2023 17:56
@zasdfgbnm zasdfgbnm merged commit c8c8cd7 into rebase-matmul_swizzle_gen Mar 20, 2023
@zasdfgbnm zasdfgbnm deleted the matmul_swizzle_gen branch March 20, 2023 17:57
zasdfgbnm added a commit to NVIDIA/Fuser that referenced this pull request Mar 21, 2023
When working on csarofeen/pytorch#1900, I found
that the expr simplifier sometimes assigns the wrong dtype to a value.
This is because the expr simplifier is written loosely, assuming the dtypes
of all `Val`s are the same. However, because we are putting pointer
types into the expression, we need to be more careful; otherwise we will
get wrong kernel code like:
```
__half* ptr1 = threadIdx.x * 128;
__half* ptr2 = ptr1 + 256 + T1.data;
```
This PR changes all passes in the expr simplifier so that they infer the
dtype of each result from its inputs.
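
As a rough illustration of inferring a dtype from the inputs, the sketch below uses a deliberately tiny type model. The `DType` enum and `inferAddDType` are hypothetical and are not nvFuser's API; the real simplifier handles many more types and operations.

```cpp
#include <stdexcept>

// Hypothetical, simplified dtype model with only an integer index type and
// a pointer to __half.
enum class DType { Int, HalfPointer };

// Infers the dtype of `lhs + rhs` from its operands instead of assuming
// that both operands share a single dtype.
DType inferAddDType(DType lhs, DType rhs) {
  const bool lhs_is_ptr = lhs == DType::HalfPointer;
  const bool rhs_is_ptr = rhs == DType::HalfPointer;
  if (lhs_is_ptr && rhs_is_ptr) {
    // Adding two pointers is not a valid operation.
    throw std::invalid_argument("cannot add two pointers");
  }
  // pointer + integer stays a pointer; integer + integer stays an integer.
  return (lhs_is_ptr || rhs_is_ptr) ? DType::HalfPointer : DType::Int;
}
```

Applied to the snippet above, a purely integer expression such as `threadIdx.x * 128` would keep an integer dtype, and only the sum that includes `T1.data` would be typed as `__half*`.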

- [x] TODO: update all the failing `assertCUDAKernel` tests😵‍💫😵‍💫😵‍💫😵‍💫

3 participants