
Conversation

@RichardWooSJTU
Collaborator

This PR introduces two key improvements to the ep.py DeepEP engine implementation:

  1. Unified Buffer Initialization in Mixed Mode
  • Reorganized the DeepEP engine so that a single deepep.Buffer instance is initialized and shared between the prefill and decode phases when running in mixed mode.
  • The change affects all workflows that use mixed-mode execution.
  2. Explicit Buffer Cleanup for MoE Workspaces
  • Added mandatory calls to clean_low_latency_buffer() at the end of each Mixture-of-Experts (MoE) processing stage.
  • This ensures the workspace is properly cleaned across prefill/decode transitions, preventing potential memory corruption from stale tensor references.
  • The change is critical for maintaining stability in long-running inference sessions with variable sequence lengths.
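The two changes above can be sketched roughly as follows. This is an illustrative outline only, not the actual FastDeploy or DeepEP code: the engine class, the stub Buffer, and every method except clean_low_latency_buffer() (which is a real DeepEP Buffer method) are hypothetical stand-ins.

```python
# Hypothetical sketch of the PR's two changes; the classes below are
# illustrative stand-ins, not the real FastDeploy/DeepEP implementation.

class Buffer:
    """Stand-in for deepep.Buffer (hypothetical)."""

    def __init__(self):
        # Tracks whether the low-latency workspace holds stale state.
        self.dirty = False

    def low_latency_dispatch(self, tokens):
        # Pretend to dispatch tokens to experts; marks the workspace in use.
        self.dirty = True
        return tokens

    def clean_low_latency_buffer(self):
        # Reset the low-latency workspace so stale tensor references from
        # the previous phase cannot leak into the next one.
        self.dirty = False


class MixedEPEngine:
    """Hypothetical engine sharing one Buffer across prefill and decode."""

    def __init__(self):
        # Change 1: a single Buffer instance serves both phases in mixed
        # mode, instead of one Buffer per phase.
        self.buffer = Buffer()

    def _moe_stage(self, tokens):
        out = self.buffer.low_latency_dispatch(tokens)
        # Change 2: explicit cleanup at the end of every MoE stage.
        self.buffer.clean_low_latency_buffer()
        return out

    def prefill(self, tokens):
        return self._moe_stage(tokens)

    def decode(self, tokens):
        return self._moe_stage(tokens)
```

With this shape, alternating prefill and decode calls reuse one buffer, and the workspace is clean after every stage regardless of which phase ran last.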

@paddle-bot

paddle-bot bot commented Aug 4, 2025

Thanks for your contribution!

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit f5c64a0 into PaddlePaddle:develop Aug 5, 2025
16 of 21 checks passed