
Conversation

@ddchenhao66 ddchenhao66 (Collaborator) commented Oct 22, 2025

XPU: support limiting the thinking length, using a kernel to improve performance

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


ddchenhao66 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Oct 22, 2025

Thanks for your contribution!


}

PD_BUILD_OP(limit_thinking_content_length_v1)
Collaborator

Shouldn't this be changed to PD_BUILD_STATIC_OP? Same below.

Collaborator Author

ok

max_think_lens = share_inputs["max_think_lens"]
step_idx = share_inputs["step_idx"]
limit_think_status = share_inputs["limit_think_status"]
print(f"ch66 limit_strategy:{limit_strategy}")
Collaborator

delete print

Collaborator Author

ok

else:
# Disable thinking
self.share_inputs["max_think_lens"][idx : idx + 1, :] = -1
self.share_inputs["limit_think_status"][idx : idx + 1, :] = 0
Collaborator

Can self.share_inputs["limit_think_status"][idx : idx + 1, :] = 0 be merged? It is identical in both branches.

if request.get("enable_thinking", False) and request.get("reasoning_max_tokens", None) is not None:
# Enable thinking
self.share_inputs["max_think_lens"][idx : idx + 1, :] = request.get("reasoning_max_tokens")
self.share_inputs["limit_think_status"][idx : idx + 1, :] = 0
Collaborator

Can self.share_inputs["limit_think_status"][idx : idx + 1, :] = 0 be merged? It is identical in both branches.
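The reviewer's merge suggestion could look like this minimal Python sketch; the `share_inputs` layout and key names are assumptions read off the snippets above, not taken from the actual FastDeploy code:

```python
import numpy as np

def set_think_limits(share_inputs, idx, request):
    """Hoist the duplicated limit_think_status reset out of both
    branches, as the reviewer suggests (names assumed from the
    quoted snippets, not the real code)."""
    if request.get("enable_thinking", False) and request.get("reasoning_max_tokens") is not None:
        # Enable thinking: cap the thinking length for this request.
        share_inputs["max_think_lens"][idx : idx + 1, :] = request.get("reasoning_max_tokens")
    else:
        # Disable thinking.
        share_inputs["max_think_lens"][idx : idx + 1, :] = -1
    # Identical in both branches, so it can be written once here.
    share_inputs["limit_think_status"][idx : idx + 1, :] = 0
```

Hoisting the shared assignment keeps the two branches focused on the one value that actually differs.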

// Force-replace the current token with the token that ends thinking
next_token_lm = line_break_id;
limit_think_status_lm = 2;
}
Collaborator

Add an else {} here? Maybe with some debug info?

Collaborator Author

Not needed; nothing has to happen in the else case.
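The forced-stop step the thread is discussing can be sketched per sequence in Python; the status codes and parameter names here are assumptions read off the snippet above, not the actual XPU kernel:

```python
THINKING = 0          # assumed: sequence is still inside the thinking phase
THINK_FORCED_END = 2  # assumed: thinking was forcibly terminated (status seen in the snippet)

def limit_thinking_step(next_token, step_idx, max_think_len, status, line_break_id):
    """One decoding step for one sequence: once the thinking budget is
    exhausted, force-replace the sampled token with the token that ends
    the thinking section. In every other case (the 'else' the reviewer
    asked about) nothing needs to change, so the inputs pass through."""
    if status == THINKING and max_think_len >= 0 and step_idx >= max_think_len:
        return line_break_id, THINK_FORCED_END
    return next_token, status
```

This also illustrates the author's point: the else branch is a pure pass-through, so the kernel has no work to do there.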

hong19860320
hong19860320 previously approved these changes Oct 22, 2025

WRAPPER_DUMP(ctx);
if (ctx->dev().type() == api::kCPU) {
assert(false);
Collaborator

Please fill in the CPU wrapper implementation later when you get a chance.

WRAPPER_DUMP_PARAM2(ctx,line_break_id,bs);
WRAPPER_DUMP(ctx);
if (ctx->dev().type() == api::kCPU) {
assert(false);
Collaborator

Same as above.

Contributor

@cqulilujia cqulilujia left a comment

Please also bind it in pybind.cc while you're at it.

yuanlehome
yuanlehome previously approved these changes Oct 22, 2025
Collaborator

@yuanlehome yuanlehome left a comment

LGTM

DDDivano
DDDivano previously approved these changes Oct 22, 2025
XiaoguangHu01
XiaoguangHu01 previously approved these changes Oct 22, 2025
@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

qingqing01
qingqing01 previously approved these changes Oct 22, 2025
Collaborator

@hong19860320 hong19860320 left a comment

LGTM

@EmmonsCurse EmmonsCurse merged commit 5443b2c into PaddlePaddle:develop Oct 23, 2025
28 of 39 checks passed