@Wanglongzhi2001 Wanglongzhi2001 commented Aug 1, 2025

Fix CUDA Graph when expert parallelism (EP) is enabled.

  • Dataset: 200 prompts

  • model: ERNIE-4_5-300B-A47B-FP8-Paddle

  • Version of paddle: 3.0.1

  • max_model_len: 32768

  • Content of the PR
    When max_model_len is large (for example 32k), full_length in the _dummy_prefill_inputs function is 4k, and the number of tokens received under EP becomes very large (for example 19w, i.e. about 190,000). The DeepEP buffer may then be too small, which produces NaN values. This PR therefore reduces full_length when EP is enabled.

  • comparison result

image
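
The change described above can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `cap_dummy_full_length`, its parameters, and the cap value used in the usage lines are hypothetical and chosen only for illustration. It shows the idea of clamping the dummy prefill length when EP is enabled so that fewer tokens reach the EP dispatch buffer.

```python
def cap_dummy_full_length(full_length: int, use_ep: bool, ep_max_full_length: int) -> int:
    """Return the per-request length to use for dummy prefill inputs.

    Hypothetical helper, not the PR's implementation.
    full_length:        length computed without any EP-specific limit.
    use_ep:             whether expert parallelism is enabled.
    ep_max_full_length: assumed upper bound that keeps the number of
                        dispatched tokens within the EP buffer.
    """
    if use_ep:
        # Clamp only under EP; the non-EP path is left unchanged.
        return min(full_length, ep_max_full_length)
    return full_length


# Illustrative values only; the real cap depends on the DeepEP buffer size.
capped = cap_dummy_full_length(4096, use_ep=True, ep_max_full_length=32)
uncapped = cap_dummy_full_length(4096, use_ep=False, ep_max_full_length=32)
```

Passing the cap in as a parameter, rather than hard-coding it, keeps the sketch neutral about the actual limit, which the description above does not state.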

paddle-bot bot commented Aug 1, 2025

Thanks for your contribution!

gongshaotian previously approved these changes Aug 1, 2025

@gongshaotian gongshaotian left a comment

LGTM

@RichardWooSJTU RichardWooSJTU left a comment

LGTM

@RichardWooSJTU RichardWooSJTU merged commit 01d7586 into PaddlePaddle:develop Aug 4, 2025
16 of 21 checks passed