Conversation

Collaborator

@lizexu123 lizexu123 commented Jul 16, 2025

Feature description

Referencing the flashinfer implementation (thanks!), this PR implements min_p_from_prob. It supports passing min_p as a tensor, and provides both a GPU kernel implementation and a composition of Paddle small ops.

Usage

Serving request:

from openai import OpenAI

# Assumes a FastDeploy server exposing an OpenAI-compatible endpoint;
# adjust base_url/port to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "Where is Tiananmen Square in Beijing?"},
    ],
    temperature=0.1,
    metadata={"min_p": 0.1},
    stream=False,
)

print(response.choices[0].message.content)
print("\n")

Offline usage:

from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "Qwen/Qwen3-0.6B"

sampling_params = SamplingParams(temperature=1.0, min_p=0.1)
llm = LLM(model=model_name_or_path, tensor_parallel_size=1, reasoning_parser="qwen3")
prompt = "Where is Tiananmen Square in Beijing?"
messages = [{"role": "user", "content": prompt}]
output = llm.chat([messages], sampling_params)

print(output)
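For context, min_p sampling keeps only the tokens whose probability is at least min_p times the probability of the most likely token, then renormalizes over the survivors. A minimal NumPy sketch of the idea (illustrative only, not the GPU kernel from this PR):

```python
import numpy as np

def min_p_filter(probs, min_p):
    # min_p sampling: keep tokens with probability >= min_p * max(probs),
    # zero out the rest, then renormalize the survivors.
    threshold = min_p * probs.max(axis=-1, keepdims=True)
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum(axis=-1, keepdims=True)

probs = np.array([[0.5, 0.3, 0.15, 0.05]])
# Threshold is 0.5 * 0.5 = 0.25, so only 0.5 and 0.3 survive,
# renormalized to [0.625, 0.375, 0, 0].
print(min_p_filter(probs, 0.5))
```

Because the threshold scales with the peak of the distribution, min_p prunes aggressively when the model is confident and permissively when it is not.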


paddle-bot bot commented Jul 16, 2025

Thanks for your contribution!

sampled_token_ids=next_tokens,
logprobs_tensors=logprobs_tensors,
)
self.step+=1
Collaborator

delete it

"""
min_p_sampling
"""
if paddle.count_nonzero(min_p_arr)==0:
Collaborator

pre-commit all files

Collaborator Author

Has our pre-commit stopped working?

Collaborator

@qingqing01 qingqing01 left a comment

The unit test needs to verify correctness by comparing against the composition of small ops.
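In that spirit, a self-contained sketch of such a comparison test, where `min_p_fused` is a hypothetical stand-in for the kernel under test and the reference is composed from elementary ops (all names here are illustrative, not FastDeploy APIs):

```python
import numpy as np

def min_p_reference(probs, min_p_arr):
    # Reference built from small ops: per-row threshold is
    # min_p * max prob; zero out tokens below it and renormalize.
    top = probs.max(axis=-1, keepdims=True)
    kept = np.where(probs >= min_p_arr[:, None] * top, probs, 0.0)
    return kept / kept.sum(axis=-1, keepdims=True)

def min_p_fused(probs, min_p_arr):
    # Placeholder for the fused kernel under test (hypothetical);
    # in a real test this would call the custom op.
    return min_p_reference(probs, min_p_arr)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 32))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
min_p_arr = np.array([0.0, 0.1, 0.3, 0.5])  # per-request min_p tensor

np.testing.assert_allclose(min_p_fused(probs, min_p_arr),
                           min_p_reference(probs, min_p_arr), rtol=1e-6)
```

Note that min_p passed as a tensor lets each request in the batch use its own threshold, with 0.0 disabling filtering for that row.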

# limitations under the License.


import matplotlib.pyplot as plt
Collaborator

The unit test should not need to depend on matplotlib.

@lizexu123 lizexu123 force-pushed the min_p_1 branch 2 times, most recently from 644ac9d to 13d4cdd Compare July 18, 2025 09:56
@yuanlehome yuanlehome merged commit 67990e0 into PaddlePaddle:develop Jul 21, 2025
4 of 5 checks passed
Deleter-D pushed a commit to Deleter-D/FastDeploy that referenced this pull request Jul 22, 2025
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py