[Feature] mm and thinking model support structured output #2749
Conversation
Thanks for your contribution!
Force-pushed d07f737 to 72de4a3
Pull Request Overview
This PR adds structured output support via guided decoding (reasoning parsers) for multi-modal and thinking models, including offline inference capabilities.
- Introduce a new `--reasoning_parser` CLI argument and propagate it through configuration to model runners.
- Extend the sampling and guided decoding pipeline: updated `Sampler`, guided backend interfaces, and skip-index logic.
- Enhance `SamplingParams` with `GuidedDecodingParams` and document offline inference usage for structured outputs.
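As a hedged sketch of the third bullet (the actual FastDeploy API may differ; the field names here are assumptions mirroring the constraint types named elsewhere in this PR), a `GuidedDecodingParams`-style container typically enforces that at most one constraint type is set per request:

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class GuidedDecodingParams:
    """Illustrative stand-in for the params object this PR adds to
    SamplingParams; real field names and validation may differ."""

    json: Optional[Any] = None
    regex: Optional[str] = None
    choice: Optional[List[str]] = None
    grammar: Optional[str] = None
    structural_tag: Optional[str] = None

    def __post_init__(self):
        # Guided decoding backends accept a single constraint, so reject
        # requests that set more than one field.
        active = [name for name in ("json", "regex", "choice", "grammar", "structural_tag")
                  if getattr(self, name) is not None]
        if len(active) > 1:
            raise ValueError(f"only one guided-decoding constraint allowed, got {active}")


# A single constraint is accepted; two would raise ValueError.
params = GuidedDecodingParams(regex=r"\d{3}-\d{4}")
print(params.regex)
```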
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fastdeploy/worker/worker_process.py | Add --reasoning_parser CLI arg and integrate it into FDConfig. |
| fastdeploy/worker/vl_gpu_model_runner.py | Initialize guided backend and reasoning parser; update guided decoding flow in the GPU model runner. |
| fastdeploy/model_executor/layers/sample/sampler.py | Enhance Sampler to support reasoning parsing and skip indices when masking tokens. |
| fastdeploy/engine/sampling_params.py | Introduce GuidedDecodingParams in SamplingParams for offline structured inference. |
| docs/features/structured_outputs.md | Add offline inference examples for structured output using GuidedDecodingParams. |
Comments suppressed due to low confidence (3)
fastdeploy/worker/vl_gpu_model_runner.py:145
- The code checks for `guided_json`, `guided_regex`, `guided_grammar`, and `structural_tag` but does not handle `guided_choice` from `GuidedDecodingParams`. Add support for `guided_choice` to ensure all constraint types are honored.
elif request.guided_grammar is not None:
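A minimal sketch of the dispatch this comment is pointing at, with `guided_choice` included (attribute names follow the review comment; the real model-runner code differs):

```python
from types import SimpleNamespace


def select_constraint(request):
    """Return (kind, value) for the first guided-decoding constraint set on
    the request, covering all five types including guided_choice."""
    for kind in ("guided_json", "guided_regex", "guided_choice",
                 "guided_grammar", "structural_tag"):
        value = getattr(request, kind, None)
        if value is not None:
            return kind, value
    return None, None


# A request carrying only guided_choice is no longer silently ignored.
req = SimpleNamespace(guided_choice=["yes", "no"])
print(select_constraint(req))
```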
fastdeploy/engine/engine.py:1049
- The code references `self.cfg.reasoning_parser`, but `reasoning_parser` is not defined on the engine config object. It should likely reference `self.cfg.model_config.reasoning_parser`.
f" --reasoning_parser {self.cfg.reasoning_parser}")
fastdeploy/worker/vl_gpu_model_runner.py:152
- Using `request.get(...)` may not work if `request` is not a dict-like object. Consider using `getattr(request, 'enable_thinking', True)` to access the attribute safely.
enable_thinking=request.get("enable_thinking", True),
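The suggested fix can be illustrated with a tiny self-contained example: `getattr` works on plain attribute-style objects and still provides a default when the attribute is absent, whereas `.get(...)` assumes a dict-like interface:

```python
from types import SimpleNamespace


def read_enable_thinking(request):
    # Safe for non-dict request objects; defaults to True when the
    # attribute is not present.
    return getattr(request, "enable_thinking", True)


print(read_enable_thinking(SimpleNamespace(enable_thinking=False)))  # False
print(read_enable_thinking(SimpleNamespace()))                       # True (default)
```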
Force-pushed aac8503 to 04c2f3c
Force-pushed 2ef373a to 69fc3a2
Force-pushed 69fc3a2 to 6bd3676
Force-pushed 0429910 to 3e9bba5
Force-pushed aec275d to 278d3bd
Force-pushed 8f9fb63 to 9ba1d41
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #2749 +/- ##
==========================================
Coverage ? 39.02%
==========================================
Files ? 11
Lines ? 123
Branches ? 19
==========================================
Hits ? 48
Misses ? 69
Partials ? 6