Collaborator

@gongshaotian gongshaotian commented Jul 23, 2025

Summary

GetBlockShapeAndSplitKVBlock is a pre-processing operator that computes inputs for certain AttentionBackend types. In previous implementations, these inputs were not managed directly by the model runner. The root cause is a poorly drawn boundary between ForwardMeta and AttentionMetaData, which led to two issues:

  1. These additional model inputs are easy to overlook when implementing other features, which leads to bugs.
  2. The operator does not use buffers properly, so supporting CudaGraph requires frequent Memcpy calls, which hurts performance.
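
As a rough illustration of the second issue (not FastDeploy code): CUDA Graph replay reuses the memory addresses recorded at capture time, so graph inputs have to live in buffers that are allocated once and overwritten in place. An operator that returns a fresh tensor on every step forces an extra copy into such a buffer. In the sketch below, `StepBuffer` is a hypothetical name and a plain Python list stands in for a device tensor.

```python
from typing import List


class StepBuffer:
    """Hypothetical fixed-capacity buffer, allocated once and reused every step."""

    def __init__(self, capacity: int, fill: int = 0) -> None:
        self.data = [fill] * capacity  # allocated once; the object never changes
        self.size = 0  # number of valid entries for the current step

    def write(self, values: List[int]) -> None:
        """Overwrite the leading entries in place instead of reallocating."""
        if len(values) > len(self.data):
            raise ValueError("values exceed the preallocated capacity")
        # Equal-length slice assignment mutates the existing list object.
        self.data[: len(values)] = values
        self.size = len(values)


buf = StepBuffer(capacity=8)
address_at_capture = id(buf.data)  # stands in for a captured device pointer
buf.write([0, 0, 1])  # step 1
buf.write([2])        # step 2 reuses the same storage
assert id(buf.data) == address_at_capture
```

Entries past `size` keep stale values from earlier steps, so a consumer of such a buffer has to respect `size` (or the buffer has to be reset between steps).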

This PR addresses only the second issue. The changes cover:

  1. GetBlockShapeAndSplitKVBlock kernel
  2. Attention Backend Zoo: AppendAttention, MLA Attention, Flash Attention3, BlockMultiHead Attention
  3. ModelRunner Zoo: GPU, GCU, and Iluvatar ModelRunner, plus MTP

Performance comparison


With the ERNIE-21B model at a batch size of 64, a single step can take 2 ms less in some cases. The larger the batch size, the greater the speedup.


paddle-bot bot commented Jul 23, 2025

Thanks for your contribution!

@gongshaotian gongshaotian reopened this Jul 28, 2025
@gongshaotian gongshaotian force-pushed the refactor_get_block_op branch 2 times, most recently from d589fb7 to 7226338 Compare July 28, 2025 16:29
@gongshaotian gongshaotian marked this pull request as ready for review July 28, 2025 16:35
Comment on lines +86 to +87
encoder_block_shape_q: int = -1,
decoder_block_shape_q: int = -1,
Collaborator Author

@gongshaotian gongshaotian Jul 29, 2025

These two parameters are not shared by all backends. Would it be better to put them in init metadata?

@gongshaotian gongshaotian changed the title [Executor] Reset decoder_block_shape_q buffer [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel Jul 29, 2025
@gongshaotian gongshaotian changed the title [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel [WIP] Refactor GetBlockShapeAndSplitKVBlock Kernel Jul 29, 2025
@gongshaotian gongshaotian changed the title [WIP] Refactor GetBlockShapeAndSplitKVBlock Kernel [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel Jul 30, 2025
Comment on lines +194 to +196
encoder_block_shape_q = 64
decoder_block_shape_q = 16
Collaborator Author

@gongshaotian gongshaotian Jul 30, 2025

These two parameters are awkward to pass in here. Is there a good way to do it? @freeliuzc

})
.Outputs({
paddle::Optional("encoder_batch_ids"),
paddle::Optional("encoder_tile_ids_per_batch"),
Collaborator

If the Optional here has little impact on performance, how about removing it and returning an empty tensor instead?

Collaborator Author

@gongshaotian gongshaotian Jul 30, 2025

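> If the Optional here has little impact on performance, how about removing it and returning an empty tensor instead?

In the pure decode phase, these three encoder tensors are returned with shape 0. In the mixed or pure prefill phase, returning non-empty tensors is the expected behavior. Do you want the shape of the returned tensors to be fixed as well?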
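
To make the shape behavior described above concrete, here is a minimal sketch of one plausible way per-tile indices could be derived from encoder sequence lengths. It is an assumption about the general technique, not the kernel's actual code: `split_q_blocks` and its parameters are hypothetical names, and Python lists stand in for tensors.

```python
from typing import List, Tuple


def split_q_blocks(
    seq_lens: List[int], block_shape_q: int
) -> Tuple[List[int], List[int], int]:
    """Hypothetical helper: assign query tiles of block_shape_q rows to batches.

    Returns (batch_ids, tile_ids_per_batch, num_blocks).
    """
    if block_shape_q <= 0:
        raise ValueError("block_shape_q must be positive")
    batch_ids: List[int] = []
    tile_ids_per_batch: List[int] = []
    for batch_id, seq_len in enumerate(seq_lens):
        # Ceiling division: number of tiles needed to cover seq_len rows.
        num_tiles = (seq_len + block_shape_q - 1) // block_shape_q
        for tile_id in range(num_tiles):
            batch_ids.append(batch_id)
            tile_ids_per_batch.append(tile_id)
    return batch_ids, tile_ids_per_batch, len(batch_ids)


# Pure decode: every encoder length is 0, so the outputs are empty.
decode_only = split_q_blocks([0, 0, 0], block_shape_q=64)

# Mixed step: requests 0 and 2 are in prefill, request 1 is decoding.
mixed = split_q_blocks([100, 0, 64], block_shape_q=64)
```

Under this scheme the output length depends on the step's contents, which is why a fixed-shape return would need a preallocated upper bound plus a separate count of valid entries.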

@gongshaotian gongshaotian force-pushed the refactor_get_block_op branch from f076e56 to 4ffa462 Compare July 30, 2025 14:54
@gongshaotian gongshaotian merged commit d850660 into PaddlePaddle:develop Jul 30, 2025
9 of 12 checks passed
# MLA
metadata.max_enc_len_this_time = metadata.set_max_lengths[1]
metadata.max_dec_len_this_time = metadata.set_max_lengths[2]
forward_meta.max_enc_len_this_time = metadata.set_max_lengths[1]
Collaborator

The DeepSeek model definition uses forward_meta.max_dec_len_this_time to distinguish prefill from decode. If it is simply removed here, the model definition will break.
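
The concern can be sketched as follows. This is a hypothetical stand-in, not the DeepSeek model code: `ForwardMetaSketch` and `phases` are illustrative names, and the dispatch logic is an assumption about how such a check might look.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForwardMetaSketch:
    """Hypothetical stand-in holding only the two fields discussed above."""

    max_enc_len_this_time: int = 0
    max_dec_len_this_time: int = 0


def phases(meta: ForwardMetaSketch) -> List[str]:
    """Return which attention paths a step would run under this assumed check."""
    selected = []
    if meta.max_enc_len_this_time > 0:
        selected.append("prefill")
    if meta.max_dec_len_this_time > 0:
        selected.append("decode")
    return selected


# If only max_enc_len_this_time is populated, a decode-only step selects nothing.
stale = ForwardMetaSketch(max_enc_len_this_time=0)
assert phases(stale) == []
```

With a check of this shape, a field that is no longer written keeps its default value, so the decode path would be skipped silently rather than failing loudly.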
