[Feature] support logits processors #4515
…BiasLogitsProcessor
Thanks for your contribution!
Pull Request Overview
This PR introduces a logits processor framework that allows users to apply custom transformations to model logits before sampling. The implementation includes a base class for processors, a built-in LogitBiasLogitsProcessor for token bias adjustment, and integration throughout the inference pipeline.
Key changes:
- Added abstract LogitsProcessor base class and LogitBiasLogitsProcessor implementation
- Integrated logits processors into the sampling pipeline with state management via `update_state` and `apply` methods
- Added `--logits-processors` CLI parameter and `logits_processors_args` request parameter for runtime configuration
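The `update_state` / `apply` split listed above can be pictured with a minimal sketch. Everything here is an assumption for illustration: the class names, the `batch_args` argument, and the `"logit_bias"` key are hypothetical, plain Python lists stand in for tensors, and the real base class in `base.py` may have different signatures.

```python
from abc import ABC, abstractmethod


class SketchLogitsProcessor(ABC):
    """Hypothetical stand-in for the PR's abstract base class."""

    @abstractmethod
    def update_state(self, batch_args):
        """Refresh per-request state before the forward pass."""

    @abstractmethod
    def apply(self, logits):
        """Return modified logits, one row per request in the batch."""


class SketchLogitBias(SketchLogitsProcessor):
    """Adds a per-request bias to selected token ids (illustrative only)."""

    def __init__(self):
        self.bias_per_row = []  # one {token_id: bias} dict per batch row

    def update_state(self, batch_args):
        # batch_args: one optional dict per request; "logit_bias" is an
        # assumed key name for this sketch.
        self.bias_per_row = [(args or {}).get("logit_bias", {}) for args in batch_args]

    def apply(self, logits):
        # Copy rows for clarity; a real implementation may modify in place.
        out = [list(row) for row in logits]
        for row, bias in zip(out, self.bias_per_row):
            for token_id, value in bias.items():
                row[token_id] += value
        return out


lp = SketchLogitBias()
lp.update_state([{"logit_bias": {0: 100.0}}, None])  # second request opts out
biased = lp.apply([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]])
```

Only the first row is changed, because the second request passed no arguments; keeping state per batch row is what lets one processor instance serve requests with different settings.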
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/logits_processor/base.py | Defines abstract LogitsProcessor interface |
| fastdeploy/model_executor/logits_processor/builtin.py | Implements LogitBiasLogitsProcessor for token bias adjustment |
| fastdeploy/model_executor/logits_processor/__init__.py | Provides factory functions for loading and instantiating processors |
| fastdeploy/config.py | Adds logits_processors field to StructuredOutputsConfig |
| fastdeploy/engine/args_utils.py | Adds --logits-processors CLI argument |
| fastdeploy/engine/engine.py | Passes logits processor configuration to workers |
| fastdeploy/engine/sampling_params.py | Adds logits_processors_args field with validation |
| fastdeploy/entrypoints/openai/protocol.py | Adds logits_processors_args to API request models |
| fastdeploy/worker/worker_process.py | Adds --logits-processors argument to worker parser |
| fastdeploy/worker/gpu_model_runner.py | Initializes processors and updates state before model forward |
| fastdeploy/model_executor/layers/sample/meta_data.py | Adds logits_processors field to SamplingMetadata |
| fastdeploy/model_executor/layers/sample/sampler.py | Applies logits processors in sampling pipeline and renames SamplerProcessor to GuidedDecoding |
| fastdeploy/engine/sched/resource_manager_v1.py | Removes metrics updates (unrelated change) |
| tests/model_executor/test_logits_processor.py | Adds comprehensive unit tests for LogitBiasLogitsProcessor |
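The PR description gives the processor spec format as `module.path:ClassName`, with a bare `ClassName` for built-ins. A loader for such strings might look roughly like the sketch below. The function names and the `builtin_module` parameter are hypothetical and not taken from FastDeploy's `__init__.py`; standard-library classes are used only so the example runs on its own.

```python
import importlib


def resolve_class(spec, builtin_module):
    """Resolve 'module.path:ClassName' or a bare 'ClassName' to a class object."""
    module_path, sep, class_name = spec.partition(":")
    if not sep:
        # Bare name: look it up in the module that holds the built-ins.
        module_path, class_name = builtin_module, spec
    if not module_path or not class_name:
        raise ValueError(f"invalid processor spec: {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def resolve_all(specs, builtin_module):
    # Keep the given order, since the listed order is the execution order.
    return [resolve_class(s, builtin_module) for s in specs]


# Stand-in demo: "collections" plays the role of the built-in module.
classes = resolve_all(["collections:OrderedDict", "Counter"], builtin_module="collections")
```

Resolving every spec at startup means a misspelled module or class name fails when the engine launches rather than on the first request that needs it.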
Co-authored-by: Copilot <[email protected]>
For the corresponding usage documentation, please refer to how vLLM/SGLang describe this feature and add the same to FD.
…puts into LP, do not copy share_inputs and logits
…& add docs and tests
Motivation
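LogitsProcessor (LP for short) sits between "model output logits → sampling strategy" in the processing chain. It applies arbitrary transformations or constraints to the logits of the token about to be generated, then hands them to the sampler (top-k / top-p / temperature, etc.) to pick the final token. Its input is the raw logits plus optional inference state (for example, statistics on the tokens generated so far); its output is the processed logits.
An LP is a stateful batch processor, integrated as a submodule of the sampler. A standard LP must provide the `update_state` and `apply` interfaces:
- `update_state` - runs at the beginning of execute_model and updates the state the LP maintains internally based on the current inference batch; it can store statistics, indices, tensors, and so on
- `apply` - runs after model.forward and before sampling, and modifies the logits according to the current LP state
These standardized interfaces let users define arbitrary custom LPs. Users need to understand how FD maintains its internal inference state, write the LP state-update code in `update_state`, and then write the logits-modification code in `apply`.
Modifications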
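- `config.py` / `args_utils.py` - add the `--logits-processors` service startup argument and a new config entry, referenced via `fd_config.structured_outputs_config.logits_processors`
- `sampling_params.py` / `protocol.py` - add the `logits_processors_args` request parameter
- `sample/meta_data.py` - pass logits_processors into the sampler as an attribute of sampling_metadata
- `fastdeploy/model_executor/logits_processor/__init__.py` - provides the LP instantiation functions and the namespace of the logits_processor module
- `base.py` - provides the abstract LP interface; custom LPs must inherit from this class
- `builtin.py` - built-in LP implementations; currently only LogitBiasLogitsProcessor (applies a logits bias to specified token_ids)
Usage or Command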
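When deploying the service, add the `--logits-processors` argument to the launch command to specify which logits processors the service supports. The specified LPs are instantiated when the engine starts. Each LP string must be a valid FQCN (Fully Qualified Class Name), i.e. in `module.path:ClassName` format; for a built-in LP, the module path can be omitted and only `ClassName` is needed.
Note: the `--logits-processors` argument determines which logits processors the service supports, and the order in which they are listed is the order in which they run.
When sending a request, add the `logits_processors_args` parameter to the request body to control whether a specific LP is applied to that request. The arguments passed must match those the LP class is defined to accept:
Accuracy Tests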
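To see why a very large bias on a single token dominates the output, here is a toy sketch with a three-token vocabulary. The numbers and helper names are made up for illustration; this is not output from the service and does not use FastDeploy code.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_bias(logits, bias):
    # bias maps token_id -> additive offset; the input list is not modified.
    out = list(logits)
    for token_id, value in bias.items():
        out[token_id] += value
    return out


logits = [1.0, 3.0, 2.0]  # toy logits; token 1 is the natural winner
plain = softmax(logits)
biased = softmax(add_bias(logits, {0: 100.0}))

plain_top = plain.index(max(plain))     # most likely token without bias
biased_top = biased.index(max(biased))  # most likely token with bias on token 0
```

With the bias applied, token 0 takes essentially all of the probability mass, so any sampling strategy would pick it; without the bias the distribution is unchanged, which is the behavior the final plain request checks.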
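Taking the built-in LogitBiasLogitsProcessor as an example: first send a plain request, then send a request with a bias (a very large bias on the logits of token_id=0, i.e. the `!` character), and finally send another plain request to verify that the answer is correct:
Output: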
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
`pre-commit` before commit.
`release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.