Collaborator

@gongshaotian gongshaotian commented Sep 1, 2025

Summary

Add CUDAGraph support for speculative decoding. Currently, only the N-Gram and MTP speculative decoding algorithms are supported.

  • N-Gram: The maximum capture size supported is 256
  • MTP: The maximum capture size supported is 512

paddle-bot bot commented Sep 1, 2025

Thanks for your contribution!

littledgg and others added 2 commits September 16, 2025 21:28
Enable Target Model Padding And Draft Model in cudagraph
@gongshaotian gongshaotian self-assigned this Sep 24, 2025
__shared__ float md_smem[bdy * 2];
for (int qid = blockIdx.x; qid < token_num; qid += gridDim.x) {
const uint32_t bid = batch_id_per_token[qid];
if(bid == -1){
Collaborator

Please follow the coding style guidelines.

Collaborator Author

> Please follow the coding style guidelines.

Can `bid` be changed from `uint32_t` to `int` here? Is there any risk from the smaller value range?
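
To make the question about `bid` concrete, here is a minimal host-side C++ sketch of the two options. It assumes, based only on the snippet above, that `batch_id_per_token` stores `-1` for padded tokens; the element type used by the real kernel is not shown in this thread. The names `kPaddingBatchId`, `is_padding_unsigned`, `is_padding_signed`, and `count_real_tokens` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for padded tokens, mirroring the `-1` in the snippet.
constexpr int kPaddingBatchId = -1;

// Current form: the id is held in a uint32_t. In `bid == -1` the int -1 is
// converted to unsigned (0xFFFFFFFF), so the test still matches padding,
// but only through an implicit signed-to-unsigned conversion.
inline bool is_padding_unsigned(std::uint32_t bid) {
  return bid == static_cast<std::uint32_t>(kPaddingBatchId);
}

// Proposed form: the id is held in an int, so the comparison is direct.
inline bool is_padding_signed(int bid) {
  return bid == kPaddingBatchId;
}

// Counts the tokens that are not padding (assumes int storage).
inline int count_real_tokens(const std::vector<int>& batch_id_per_token) {
  int count = 0;
  for (int bid : batch_id_per_token) {
    if (is_padding_signed(bid)) {
      continue;  // padded slot, skip it as the kernel snippet appears to do
    }
    assert(bid >= 0);  // any non-padding id is expected to be a valid index
    ++count;
  }
  return count;
}
```

Under these assumptions both forms detect padding. Switching to `int` lowers the largest representable id from 4,294,967,295 to 2,147,483,647, which only matters if a batch index could ever exceed the smaller bound.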

const int num_chunks_this_seq = div_up(seq_len_kv, chunk_size);
if (num_chunks_this_seq <= 1) {
continue;
}else if (!ENABLE_PREFILL){
Collaborator

Same as above.
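
For readers following the snippet under review, the early `continue` depends on how many chunks a sequence spans. The sketch below assumes `div_up` is a plain ceiling division for positive integers; its real definition is not shown in this thread, and `needs_multi_chunk` is a hypothetical helper added only for illustration.

```cpp
#include <cassert>

// Assumed definition: ceiling division, valid for a >= 0 and b > 0.
constexpr int div_up(int a, int b) { return (a + b - 1) / b; }

// Returns true when the KV sequence spans more than one chunk, i.e. the
// case the reviewed loop does not skip with `continue`.
inline bool needs_multi_chunk(int seq_len_kv, int chunk_size) {
  assert(chunk_size > 0);
  const int num_chunks_this_seq = div_up(seq_len_kv, chunk_size);
  return num_chunks_this_seq > 1;
}
```

With a chunk size of 4, a sequence of length 4 fits in one chunk and would be skipped, while a sequence of length 5 needs two chunks.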

Collaborator

@yuanlehome yuanlehome left a comment

LGTM

Collaborator

@Deleter-D Deleter-D left a comment

LGTM

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit aa27b03 into PaddlePaddle:develop Oct 9, 2025
23 of 30 checks passed
@gongshaotian gongshaotian deleted the mtp branch November 3, 2025 06:49
9 participants