
Conversation

@RichardWooSJTU
Collaborator

In the attention class, self.rank is used to initialize the metadata for key-value (KV) signals. The original implementation, however, inadvertently caused inconsistencies in KV signal handling. This pull request resolves the issue by incorporating the expert-parallel (EP) configuration so that the KV signals are initialized and aligned on the correct device.

@paddle-bot

paddle-bot bot commented Jul 8, 2025

Thanks for your contribution!

Comment on lines 109 to +116
```diff
+        if fd_config.parallel_config.expert_parallel_rank is None:
+            fd_config.parallel_config.expert_parallel_rank = 0
+        device_id = self.rank + fd_config.parallel_config.tensor_parallel_degree * \
+            fd_config.parallel_config.expert_parallel_rank
         if self.device_id is None:
-            self.device_id = self.rank
+            self.device_id = device_id
         else:
-            device_ids = self.device_id.split(",")
-            rank_index = self.rank % len(device_ids)
-            self.device_id = self.device_id[rank_index]
+            self.device_id = self.device_id.split(",")[device_id]
```
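For intuition, the new code first derives a flat device index from the tensor-parallel and expert-parallel ranks, then uses it to pick an entry from the comma-separated device list. A small worked example (the numbers and the device list are illustrative, not taken from the PR):

```python
# Illustrative values only -- not taken from the PR.
tensor_parallel_degree = 2
expert_parallel_rank = 1   # second expert-parallel group
rank = 1                   # second tensor-parallel rank within that group

device_id = rank + tensor_parallel_degree * expert_parallel_rank  # 1 + 2 * 1 = 3
# With a configured device list such as "4,5,6,7", this rank now binds to
# "4,5,6,7".split(",")[3] == "7". The old code computed rank % len(device_ids)
# and ignored the expert-parallel rank entirely.
```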
Collaborator


Could this block, together with the code at line 94, be encapsulated into a utils function? Otherwise every attention_backend has to re-implement it.

Collaborator Author


Sure, I'll encapsulate it in the next PR.
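For reference, a minimal sketch of what such a shared utils helper might look like (the function name resolve_kv_signal_device_id and its signature are assumptions; only the fd_config.parallel_config attribute names are taken from the diff above):

```python
def resolve_kv_signal_device_id(rank, device_id, fd_config):
    """Resolve the device id used for KV-signal metadata from the TP/EP config.

    Intended to be shared by attention backends so the block shown in the diff
    above is not duplicated in every backend.
    """
    parallel_cfg = fd_config.parallel_config
    if parallel_cfg.expert_parallel_rank is None:
        parallel_cfg.expert_parallel_rank = 0
    flat_index = rank + parallel_cfg.tensor_parallel_degree * \
        parallel_cfg.expert_parallel_rank
    if device_id is None:
        # No explicit device list configured: use the flat index directly.
        return flat_index
    # Comma-separated device list (e.g. "0,1,2,3"): pick the entry at the flat index.
    return device_id.split(",")[flat_index]
```

An attention backend could then replace the duplicated block with a single call, e.g. `self.device_id = resolve_kv_signal_device_id(self.rank, self.device_id, fd_config)`.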

@RichardWooSJTU RichardWooSJTU merged commit e8bbe72 into PaddlePaddle:develop Jul 8, 2025
3 of 5 checks passed
RichardWooSJTU added a commit that referenced this pull request Jul 8, 2025
EmmonsCurse pushed a commit that referenced this pull request Jul 9, 2025