Collaborator

@liyonghua0910 liyonghua0910 commented Oct 21, 2025

Motivation

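In the processing chain, the LogitsProcessor (LP for short) sits between "model output logits" and "sampling strategy". It applies arbitrary transformations or constraints to the logits of the token about to be generated, then hands them to the sampler (top-k / top-p / temperature, etc.) to pick the final token. Its input is the unprocessed logits plus optional inference state (for example, statistics on the tokens generated so far); its output is the processed logits.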

An LP is a stateful batch processor, integrated as a submodule of the sampler. A standard LP must provide two interfaces: update_state and apply.

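  • update_state - runs at the start of execute_model and updates the state the LP maintains internally from the current inference batch; this state can hold statistics, indices, tensors, and so on;
  • apply - runs after model.forward and before sampling, and modifies the logits according to the current LP state.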

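This standardized interface lets users define arbitrary custom LPs. To do so, users need to understand how FD maintains inference state internally, write the LP state-update code in update_state, and write the logits-modification code in apply.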
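The two-phase interface described above can be illustrated with a small, framework-free sketch. Everything in it is hypothetical: the class names, the shape of the `batch` argument, and the use of plain Python lists in place of tensors are assumptions made for illustration, and it is not the base class defined in this PR.

```python
from abc import ABC, abstractmethod


class SketchLogitsProcessor(ABC):
    """Hypothetical stand-in for an LP base class (not FD's real base.py)."""

    @abstractmethod
    def update_state(self, batch):
        """Called before the forward pass with the current batch info."""

    @abstractmethod
    def apply(self, logits):
        """Called after the forward pass; returns the modified logits."""


class BanTokensProcessor(SketchLogitsProcessor):
    """Masks out per-request banned token ids (illustrative only)."""

    def __init__(self):
        self.banned = []  # one set of banned token ids per batch slot

    def update_state(self, batch):
        # `batch` is assumed to be a list of per-request argument dicts.
        self.banned = [set(req.get("banned_token_ids", [])) for req in batch]

    def apply(self, logits):
        # `logits` is assumed to be a list of rows, one row per batch slot.
        out = [list(row) for row in logits]
        for row, banned in zip(out, self.banned):
            for token_id in banned:
                row[token_id] = float("-inf")
        return out


def run_step(processors, batch, raw_logits):
    """One decode step: refresh every LP's state, then apply them in order."""
    for lp in processors:
        lp.update_state(batch)
    logits = raw_logits  # in a real engine this would come from model.forward
    for lp in processors:
        logits = lp.apply(logits)
    return logits


batch = [{"banned_token_ids": [1]}, {}]
step_logits = run_step(
    [BanTokensProcessor()], batch, [[0.5, 2.0, 1.0], [0.1, 0.2, 0.3]]
)
print(step_logits)
```

Keeping state refresh separate from the logits edit means `apply` only touches data that was prepared before the forward pass, which is the split the description attributes to update_state and apply.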

Modifications

  • config.py/args_utils.py - add the --logits-processors service startup argument and a new config entry, accessed via fd_config.structured_outputs_config.logits_processors
  • sampling_params.py/protocol.py - add the logits_processors_args request parameter
  • sample/meta_data.py - pass logits_processors to the sampler as an attribute of sampling_metadata
  • fastdeploy/model_executor/logits_processor/
    • __init__.py - provides the LP instantiation function and the namespace for the logits_processor module
    • base.py - provides the abstract LP class interface; custom LPs must inherit from this class
    • builtin.py - built-in LP implementations; currently only LogitBiasLogitsProcessor (applies a logits bias to the specified token_ids)
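As a rough illustration of what a logit-bias processor does, the sketch below adds a per-request bias to selected token ids. It is not the code in builtin.py: the class name, the per-request dict layout, and the list-based logits are assumptions chosen to keep the example self-contained.

```python
class LogitBiasSketch:
    """Illustrative logit-bias processor; not the implementation in builtin.py."""

    def __init__(self):
        self.biases = []  # per batch slot: {token_id: bias}

    def update_state(self, batch):
        # Each request is assumed to carry an optional "logit_bias" mapping.
        # JSON object keys arrive as strings, so convert them to int token ids.
        self.biases = [
            {int(tok): float(b) for tok, b in req.get("logit_bias", {}).items()}
            for req in batch
        ]

    def apply(self, logits):
        # Copy the rows so the caller's logits are left untouched.
        out = [list(row) for row in logits]
        for row, bias in zip(out, self.biases):
            for token_id, value in bias.items():
                row[token_id] += value
        return out


lp = LogitBiasSketch()
lp.update_state([{"logit_bias": {"0": 100.0}}, {}])
biased = lp.apply([[1.0, 2.0], [3.0, 4.0]])
print(biased)  # only slot 0, token 0 is shifted
```

Requests that do not pass a bias get an empty mapping, so their rows pass through unchanged.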

Usage or Command

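When deploying the service, add the --logits-processors argument to the launch command to specify which Logits Processors the service supports. The specified LPs are instantiated when the engine starts. Each LP string must be a valid FQCN (Fully Qualified Class Name) in module.path:ClassName format; for built-in LPs the module path can be omitted and only ClassName is needed: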

python -m fastdeploy.entrypoints.openai.api_server \
    --model $your_model_path \
    --port 8580 \
    --metrics-port 8581 \
    --engine-worker-queue-port 8582 \
    --cache-queue-port 8583 \
    --logits-processors LogitBiasLogitsProcessor dotted.path.to.your.module:YourCustomLogitsProcessor

Note: the --logits-processors argument determines which logits processors the service supports, and the order in which they are listed is the order in which they are executed.
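Resolving a module.path:ClassName string can be sketched with the standard library's importlib. The function names and the built-in registry below are hypothetical, and the real loader in the logits_processor package may differ; the demo uses standard-library classes as stand-ins for actual processors.

```python
import importlib

# Hypothetical registry of built-in processors, keyed by bare class name.
BUILTIN_SKETCH = {}


def resolve_processor_class(spec, builtins=BUILTIN_SKETCH):
    """Resolve 'module.path:ClassName' or a bare built-in 'ClassName'."""
    if ":" in spec:
        module_path, _, class_name = spec.partition(":")
        if not module_path or not class_name:
            raise ValueError(f"invalid processor spec: {spec!r}")
        module = importlib.import_module(module_path)
        return getattr(module, class_name)
    try:
        return builtins[spec]
    except KeyError:
        raise ValueError(f"unknown built-in processor: {spec!r}") from None


def build_processors(specs, builtins=BUILTIN_SKETCH):
    """Instantiate processors in the order given; that order is the run order."""
    return [resolve_processor_class(s, builtins)() for s in specs]


# Demo: standard-library classes stand in for a custom and a built-in LP.
chain = build_processors(["collections:OrderedDict", "Dummy"], {"Dummy": dict})
print([type(p).__name__ for p in chain])
```

Because the list is built in argument order, iterating over it later preserves the execution order that the note above describes.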

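When sending a request, add the logits_processors_args parameter to the request body to control whether a particular LP is applied to that request. The arguments passed must match the ones the LP class was defined to accept: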

curl -X POST "http://0.0.0.0:8580/v1/chat/completions" -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "鲁迅是谁"}
  ],
  "logits_processors_args": {
    "logit_bias": {"0": 100.0},
    "enable_your_custom_logits_processors": true
  }
}'
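The same request can be assembled from Python. The sketch below only builds the request object and does not send it; the helper name is hypothetical, and the URL and field names are taken from the curl example above.

```python
import json
import urllib.request


def build_chat_request(base_url, prompt, logits_processors_args):
    """Build (but do not send) a chat completion request."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "logits_processors_args": logits_processors_args,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://0.0.0.0:8580",
    "鲁迅是谁",
    {"logit_bias": {0: 100.0}},  # int keys are serialized as JSON strings
)
payload = json.loads(req.data)
print(payload["logits_processors_args"])
# To send it against a running server: urllib.request.urlopen(req)
```

JSON object keys are always strings, so token ids arrive on the server as "0" rather than 0 and have to be converted back to integers there.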

Accuracy Tests

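Taking the built-in LogitBiasLogitsProcessor as an example: first send a normal request, then a request with a bias (a very large logits bias on token_id=0, which is the "!" character), and finally another normal request to verify that the answer is correct again: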

curl -X POST "http://0.0.0.0:$1/v1/chat/completions" -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "鲁迅是谁"}
  ],
  "stream": false,
  "top_p": 0.0
}'

curl -X POST "http://0.0.0.0:$1/v1/chat/completions" -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "鲁迅是谁"}
  ],
  "stream": false,
  "top_p": 0.0,
  "logits_processors_args": {
    "logit_bias": {"0": 1000.0}
  }
}'

curl -X POST "http://0.0.0.0:$1/v1/chat/completions" -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "鲁迅是谁"}
  ],
  "top_p": 0.0,
  "stream": false
}'

Output:

{"id":"chatcmpl-0dfc6083-c0ce-4f94-beaf-9a23c72e865b","object":"chat.completion","created":1761280673,"model":"/root/paddlejob/workspace/env_run/liyonghua/models/Qwen/Qwen2-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"鲁迅,原名周樟寿,后改名为周树人,字豫山,后改豫才,“鲁迅”是他1918年发表《狂人日记》时所用的笔名,也是他影响最为广泛的笔名,浙江绍兴人。著名文学家、思想家、民主战士,五四新文化运动的重要参与者,中国现代文学的奠基人。\n\n鲁迅的作品包括杂文、短篇小说、评论、散文、翻译作品。对于“鲁迅”这个名字,人们熟知的是他的文学创作,尤其是那些深刻揭示社会现实和人性弱点的作品,如《呐喊》、《彷徨》、《故事新编》等。他的文字犀利、深刻,对封建制度、旧道德、国民性等问题进行了无情的批判,同时也对进步青年寄予了深切的期望。鲁迅不仅是中国现代文学的巨匠,也是中国文化界的杰出人物,其思想和作品对后世产生了深远的影响。","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"draft_logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":21,"total_tokens":221,"completion_tokens":200,"prompt_tokens_details":{"cached_tokens":0}}}

{"id":"chatcmpl-91f6066f-c27b-436d-a129-011ec2e7a539","object":"chat.completion","created":1761280674,"model":"/root/paddlejob/workspace/env_run/liyonghua/models/Qwen/Qwen2-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"draft_logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":21,"total_tokens":2048,"completion_tokens":2027,"prompt_tokens_details":{"cached_tokens":0}}}

{"id":"chatcmpl-99e83644-02f2-48db-91db-b60f60ac048e","object":"chat.completion","created":1761280692,"model":"/root/paddlejob/workspace/env_run/liyonghua/models/Qwen/Qwen2-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"鲁迅,原名周樟寿,后改名为周树人,字豫山,后改豫才,“鲁迅”是他1918年发表《狂人日记》时所用的笔名,也是他影响最为广泛的笔名,浙江绍兴人。著名文学家、思想家、民主战士,五四新文化运动的重要参与者,中国现代文学的奠基人。\n\n鲁迅的作品包括杂文、短篇小说、评论、散文、翻译作品。对于“鲁迅”这个名字,人们熟知的是他的文学创作,尤其是那些深刻揭示社会现实和人性弱点的作品,如《呐喊》、《彷徨》、《故事新编》等。他的文字犀利、深刻,对封建制度、旧道德、国民性等问题进行了无情的批判,同时也对进步青年寄予了深切的期望。鲁迅不仅是中国现代文学的巨匠,也是中国文化界的杰出人物,其思想和作品对后世产生了深远的影响。","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"draft_logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":21,"total_tokens":221,"completion_tokens":200,"prompt_tokens_details":{"cached_tokens":0}}}
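The property these three responses demonstrate, that the bias affects only the request that asked for it, can be expressed as a small check. This is a sketch with hypothetical helper names; it assumes the response layout shown above (choices[0].message.content) and is exercised here on shortened stand-in responses rather than the real ones.

```python
def content_of(resp):
    """Extract the assistant text from a chat completion response dict."""
    return resp["choices"][0]["message"]["content"]


def bias_is_request_scoped(before, biased, after, bias_char="!"):
    """True if only the biased request changed and it emitted only bias_char."""
    biased_text = "".join(content_of(biased).split())  # ignore whitespace
    only_bias_char = bool(biased_text) and set(biased_text) == {bias_char}
    unchanged = content_of(before) == content_of(after)
    return only_bias_char and unchanged


def fake_response(text):
    """Build a minimal stand-in response for the demo."""
    return {"choices": [{"message": {"content": text}}]}


ok = bias_is_request_scoped(
    fake_response("normal answer"),
    fake_response("!!!!!!"),
    fake_response("normal answer"),
)
leaked = bias_is_request_scoped(
    fake_response("normal answer"),
    fake_response("!!!!!!"),
    fake_response("!!!! answer"),
)
print(ok, leaked)
```

Comparing the first and third responses only makes sense when decoding is deterministic, which is why the requests above set top_p to 0.0.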

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Oct 21, 2025

Thanks for your contribution!

@liyonghua0910 liyonghua0910 marked this pull request as ready for review October 24, 2025 03:23
@Jiang-Jia-Jun Jiang-Jia-Jun requested a review from Copilot October 24, 2025 08:29
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a logits processor framework that allows users to apply custom transformations to model logits before sampling. The implementation includes a base class for processors, a built-in LogitBiasLogitsProcessor for token bias adjustment, and integration throughout the inference pipeline.

Key changes:

  • Added abstract LogitsProcessor base class and LogitBiasLogitsProcessor implementation
  • Integrated logits processors into the sampling pipeline with state management via update_state and apply methods
  • Added --logits-processors CLI parameter and logits_processors_args request parameter for runtime configuration

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

File - Description

  • fastdeploy/model_executor/logits_processor/base.py - Defines abstract LogitsProcessor interface
  • fastdeploy/model_executor/logits_processor/builtin.py - Implements LogitBiasLogitsProcessor for token bias adjustment
  • fastdeploy/model_executor/logits_processor/__init__.py - Provides factory functions for loading and instantiating processors
  • fastdeploy/config.py - Adds logits_processors field to StructuredOutputsConfig
  • fastdeploy/engine/args_utils.py - Adds --logits-processors CLI argument
  • fastdeploy/engine/engine.py - Passes logits processor configuration to workers
  • fastdeploy/engine/sampling_params.py - Adds logits_processors_args field with validation
  • fastdeploy/entrypoints/openai/protocol.py - Adds logits_processors_args to API request models
  • fastdeploy/worker/worker_process.py - Adds --logits-processors argument to worker parser
  • fastdeploy/worker/gpu_model_runner.py - Initializes processors and updates state before model forward
  • fastdeploy/model_executor/layers/sample/meta_data.py - Adds logits_processors field to SamplingMetadata
  • fastdeploy/model_executor/layers/sample/sampler.py - Applies logits processors in sampling pipeline and renames SamplerProcessor to GuidedDecoding
  • fastdeploy/engine/sched/resource_manager_v1.py - Removes metrics updates (unrelated change)
  • tests/model_executor/test_logits_processor.py - Adds comprehensive unit tests for LogitBiasLogitsProcessor

@Jiang-Jia-Jun
Collaborator

For the usage documentation, please refer to how vLLM/SGLang document this feature and add the corresponding docs to FD as well.

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit a012e36 into PaddlePaddle:develop Oct 28, 2025
24 of 28 checks passed