
@guocuimi (Collaborator) commented Jul 10, 2025

TODOs:

  • Warp-specialized kernel with 2 warpgroups: one for MMA and one for async copy
  • Load the KV cache with TMA using 2 warpgroups: 1 warpgroup for MMA and 1 warpgroup for TMA + async copy
  • Pingpong scheduling to overlap softmax with MMA: 2 warpgroups for MMA and 1 warpgroup for TMA + async copy
  • TMA store + STMTX for the epilogue
  • Overlap the epilogue with the next attention computation

@guocuimi changed the title from "feat: added fmha using tma, ws, pingpong for sm120" to "[WIP] feat: added fmha using tma, ws, pingpong for sm120" Jul 10, 2025
@guocuimi changed the title from "[WIP] feat: added fmha using tma, ws, pingpong for sm120" to "[WIP] feat: added sm120 fmha using tma, warp-specialization and pingpong-scheduling" Jul 10, 2025
@guocuimi changed the title from "[WIP] feat: added sm120 fmha using tma, warp-specialization and pingpong-scheduling" to "feat: [1/n] added sm120 fmha using collective copy async" Jul 17, 2025
@guocuimi changed the title from "feat: [1/n] added sm120 fmha using collective copy async" to "feat: [1/n] added sm120 fmha using collective async copy" Jul 17, 2025
@guocuimi merged commit 53d87c1 into main Jul 17, 2025
3 checks passed
@guocuimi deleted the attn_sm120 branch July 17, 2025 02:18