Conversation

@guocuimi (Collaborator) commented Jul 8, 2024

  • support fp8 kv cache (sm89 + sm90)
  • paged kv cache
  • variable sequence lengths
  • variable page sizes
  • support fp8 gemm (sm89 + sm90)
  • support sm90 (TMA + WGMMA)
  • support sm75
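Two items in the checklist above can be illustrated with a small host-side sketch: rounding a value to the fp8 e4m3 grid (3 mantissa bits, max normal value ±448, the format used for the fp8 KV cache on sm89/sm90), and resolving a token position through a per-sequence page table in a paged KV cache. This is a minimal illustration under those assumptions; the function names and layout are hypothetical, not this kernel's actual API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Simulate rounding a float to the nearest fp8 e4m3 value.
// e4m3 has 3 mantissa bits, so values in [2^e, 2^(e+1)) are spaced
// 2^(e-3) apart; the largest finite value is 448.
// (Hypothetical helper, not the kernel's real quantization routine.)
float fp8_e4m3_round(float x) {
  if (x == 0.0f) return 0.0f;
  float a = std::fabs(x);
  if (a > 448.0f) a = 448.0f;               // clamp to e4m3 max
  int e = static_cast<int>(std::floor(std::log2(a)));
  if (e < -6) e = -6;                        // subnormal range
  float quantum = std::ldexp(1.0f, e - 3);   // grid spacing = 2^(e-3)
  float q = std::round(a / quantum) * quantum;
  return std::copysign(q, x);
}

// Paged KV cache addressing: a per-sequence page table maps logical
// pages to physical pages of `page_size` slots each, so sequences of
// variable length share a pool of fixed-size pages.
int64_t slot_index(const std::vector<int>& page_table,
                   int token_pos, int page_size) {
  int logical_page = token_pos / page_size;
  int offset = token_pos % page_size;
  return static_cast<int64_t>(page_table[logical_page]) * page_size + offset;
}
```

For example, 3.3 rounds to 3.25 on the e4m3 grid, and with page table {7, 2, 5} and a page size of 16, token position 20 lands at offset 4 of physical page 2, i.e. slot 36.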

@guocuimi changed the title from "kernel: new kernel for attention" to "kernel: upgrade cutlass to 3.5.0 + cuda 12.4 for sm89 fp8 support" on Jul 8, 2024
@guocuimi merged commit bc72645 into main on Jul 8, 2024
@guocuimi deleted the attention branch on July 8, 2024 at 05:24