

@liyonghua0910 (Collaborator) commented Aug 26, 2025

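Problem description

With the Qwen-2-7b-Instruct model deployed and requests sent with top_p=0, sending the same request twice in a row produces different outputs.

Root cause

The diff comes mainly from the apply_penalty_multi_scores step. The only difference between the inputs of the two requests is sampling_metadata.pre_token_ids.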


Specifically:

  • 第一条请求的 pre_token_ids=[198, 39814, -1, -1, -1, -1, -1, -1, -1, -1, ...
  • 第二条请求的 pre_token_ids=[198, 39814, 11, 1588, 525, 2326, 5837, 323, 862, 92999, 1447, 16, 13, ...
    As shown, pre_token_ids was not reset to -1 before inference on the second request.

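Looking at custom_ops/gpu_ops/token_penalty_multi_scores.cu, the kernel does not use cur_len to mask out the invalid values past the current length. Instead, it relies on pre_ids[cur_len:] having been pre-filled with a negative value (such as -1) to compute the penalties correctly.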
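The difference between relying on the -1 sentinel and masking explicitly by cur_len can be illustrated with a plain-Python sketch. This is not the CUDA kernel: `count_repeats`, `apply_repetition_penalty` and `penalized` are hypothetical names, the sketch models only a repetition-style penalty (the real op may apply further penalty terms), and it assumes the kernel simply ignores negative ids, as the description above implies.

```python
from typing import List, Optional


def count_repeats(pre_ids: List[int], vocab_size: int,
                  cur_len: Optional[int] = None) -> List[int]:
    """Count occurrences of each token id in one pre_ids row.

    cur_len=None models the sentinel-only behaviour (assumed): every
    non-negative entry in the row counts as history. Passing cur_len
    masks the row explicitly instead.
    """
    history = pre_ids if cur_len is None else pre_ids[:cur_len]
    counts = [0] * vocab_size
    for tok in history:
        if tok < 0:  # -1 marks an unused position
            continue
        counts[tok] += 1
    return counts


def apply_repetition_penalty(logits: List[float], counts: List[int],
                             penalty: float) -> List[float]:
    """Penalize every token that already appears in the history."""
    out = []
    for logit, count in zip(logits, counts):
        if count > 0:
            # Common convention: shrink positive logits, push negative ones lower.
            logit = logit / penalty if logit > 0 else logit * penalty
        out.append(logit)
    return out


logits = [0.5, 1.0, 2.0, -1.0, 0.0, 3.0, -2.0, 1.5]
fresh = [1, 2, -1, -1, -1, -1]  # row reset to -1, cur_len = 2
stale = [1, 2, 5, 6, -1, -1]    # same history plus leftover ids 5 and 6


def penalized(row: List[int], cur_len: Optional[int] = None) -> List[float]:
    counts = count_repeats(row, len(logits), cur_len)
    return apply_repetition_penalty(logits, counts, penalty=2.0)


print(penalized(fresh))             # [0.5, 0.5, 1.0, -1.0, 0.0, 3.0, -2.0, 1.5]
print(penalized(stale))             # [0.5, 0.5, 1.0, -1.0, 0.0, 1.5, -4.0, 1.5]
print(penalized(stale, cur_len=2))  # same as the fresh row
```

With the stale row, tokens 5 and 6 are penalized even though they are not part of this request's history, so the logits (and therefore the selected token) can differ between otherwise identical requests. Masking by cur_len would remove that dependency; the fix described here instead makes sure the -1 sentinel is in place.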


The V1 Scheduler, however, has no logic to reset pre_token_ids to -1 when a request is prefilled, whereas V0 does.

Solution

Add logic that initializes pre_token_ids in the insert_tasks_v1 method.
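The effect of the missing reset can be reproduced with a small slot-buffer simulation. This is a hypothetical sketch, not FastDeploy code: `PreIdsBuffer`, `insert_task`, `append_token` and `run_request` are illustrative names, and it assumes one pre_token_ids row per slot that is reused by the next request scheduled into that slot. The token ids are taken from the pre_token_ids values quoted above; the first request's generated sequence is assumed. With `reset=False` the sketch reproduces the stale row of the second request, and with `reset=True` (modelling the initialization added to insert_tasks_v1) the row starts clean.

```python
from typing import List

PAD = -1  # sentinel the penalty kernel expects in unused positions


class PreIdsBuffer:
    """Hypothetical per-slot pre_token_ids buffer (one row per running request)."""

    def __init__(self, num_slots: int, max_len: int) -> None:
        self.rows: List[List[int]] = [[PAD] * max_len for _ in range(num_slots)]
        self.cur_len: List[int] = [0] * num_slots

    def insert_task(self, slot: int, reset: bool) -> None:
        """Prefill a new request into `slot`, reusing the previous request's row."""
        if reset:
            # Models the fix: clear the whole row back to the sentinel.
            self.rows[slot] = [PAD] * len(self.rows[slot])
        self.cur_len[slot] = 0

    def append_token(self, slot: int, token_id: int) -> None:
        """Write one generated token at position cur_len and advance it."""
        self.rows[slot][self.cur_len[slot]] = token_id
        self.cur_len[slot] += 1


def run_request(buf: PreIdsBuffer, slot: int, tokens: List[int], reset: bool) -> List[int]:
    """Insert a request, write its tokens, and return a snapshot of the row."""
    buf.insert_task(slot, reset)
    for tok in tokens:
        buf.append_token(slot, tok)
    return list(buf.rows[slot])


for reset in (False, True):
    buf = PreIdsBuffer(num_slots=1, max_len=6)
    run_request(buf, 0, [198, 39814, 11, 1588], reset=True)  # first request
    row = run_request(buf, 0, [198, 39814], reset=reset)      # second request, 2 tokens in
    print(f"reset={reset}: {row}")
# reset=False: [198, 39814, 11, 1588, -1, -1]
# reset=True: [198, 39814, -1, -1, -1, -1]
```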

@paddle-bot (bot) commented Aug 26, 2025

Thanks for your contribution!

paddle-bot added the contributor (External developers) label Aug 26, 2025
@codecov-commenter commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@e645db3).

Additional details and impacted files
@@            Coverage Diff            @@
##             develop   #3634   +/-   ##
=========================================
  Coverage           ?   0.00%           
=========================================
  Files              ?       3           
  Lines              ?       3           
  Branches           ?       0           
=========================================
  Hits               ?       0           
  Misses             ?       3           
  Partials           ?       0           
Flag Coverage Δ
diff 0.00% <ø> (?)

Flags with carried forward coverage won't be shown.


@Jiang-Jia-Jun Jiang-Jia-Jun merged commit b2afdf4 into PaddlePaddle:develop Aug 27, 2025
13 of 18 checks passed
liyonghua0910 added a commit to liyonghua0910/FastDeploy that referenced this pull request Aug 27, 2025
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
Jiang-Jia-Jun pushed a commit that referenced this pull request Aug 28, 2025
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
handsomecoderyang pushed a commit to handsomecoderyang/FastDeploy that referenced this pull request Aug 28, 2025
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
@liyonghua0910 liyonghua0910 deleted the develop_fix-qwen-topp branch September 17, 2025 02:39

Labels

contributor External developers


3 participants