

sunlei1024 (Collaborator) commented Oct 21, 2025

Motivation

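Optimize task-queue communication between the Engine and the Worker. During task scheduling (put_tasks and get_tasks), multimodal inputs (images) are converted between NumPy arrays and PaddlePaddle Tensors, which reduces communication cost and improves multimodal data-processing performance.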

Modifications

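  • Added a to_tensor static method: when FD_ENABLE_MAX_PREFILL is enabled, it converts the NumPy image arrays in tasks into PaddlePaddle Tensors.
  • Added a to_numpy static method: when FD_ENABLE_MAX_PREFILL is enabled, it converts the PaddlePaddle Tensors in tasks back into NumPy arrays before the results are processed.
  • Both methods catch exceptions and log them, for robustness.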
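A minimal sketch of the two static methods described above. It assumes a hypothetical Task object whose multimodal_inputs dict holds the image batch under an "images" key, and it treats FD_ENABLE_MAX_PREFILL values of "1"/"true" as enabled. The names EngineWorkerQueueSketch, Task, max_prefill_enabled and _convert_images are illustrative and do not come from FastDeploy's code. Only paddle.to_tensor and Tensor.numpy() are real Paddle APIs. When a conversion fails, the sketch logs it for that task and keeps the original payload. That is one plausible reading of "exception handling and logging for robustness".

```python
import logging
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def max_prefill_enabled() -> bool:
    # Assumption: the flag counts as "on" when set to 1 or true (case-insensitive).
    return os.getenv("FD_ENABLE_MAX_PREFILL", "0").strip().lower() in ("1", "true")


@dataclass
class Task:
    # Hypothetical stand-in for a scheduled request; only the fields used here.
    request_id: str
    multimodal_inputs: Dict[str, Any] = field(default_factory=dict)


class EngineWorkerQueueSketch:
    @staticmethod
    def _convert_images(tasks: List[Task], convert: Callable[[Any], Any], name: str) -> None:
        """Replace each task's "images" entry with convert(images), in place."""
        for task in tasks:
            try:
                images = task.multimodal_inputs.get("images")
                if images is not None:
                    task.multimodal_inputs["images"] = convert(images)
            except Exception:
                # Log and keep the original payload so one bad task does not break the batch.
                logger.exception("%s failed for task %s", name, task.request_id)

    @staticmethod
    def to_tensor(tasks: List[Task]) -> None:
        # Engine side (put_tasks): NumPy array -> Paddle Tensor.
        if not max_prefill_enabled():
            return
        import paddle  # deferred so this sketch imports without Paddle installed

        EngineWorkerQueueSketch._convert_images(tasks, paddle.to_tensor, "to_tensor")

    @staticmethod
    def to_numpy(tasks: List[Task]) -> None:
        # Worker side (get_tasks): Paddle Tensor -> NumPy array via Tensor.numpy().
        if max_prefill_enabled():
            EngineWorkerQueueSketch._convert_images(tasks, lambda t: t.numpy(), "to_numpy")
```

Passing the converter in as a function keeps the exception handling and logging in one place, so both directions share it.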

Usage or Command

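  • These methods are applied automatically when the environment variable FD_ENABLE_MAX_PREFILL is enabled.
  • No manual calls are needed; tasks are converted automatically as they pass between the Engine and the Worker.
# put_tasks: convert images to Tensors
EngineWorkerQueue.to_tensor(tasks)

# get_tasks: convert Tensors back to NumPy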
EngineWorkerQueue.to_numpy(tasks)
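The two calls above run inside the queue's put and get paths, not in user code. The toy sketch below shows that wiring under stated assumptions. ToyTaskQueue, its in-process list buffer and the encode/decode hooks are hypothetical stand-ins for the real EngineWorkerQueue transport. The sketch also assumes the flag is read on every call and that "1"/"true" means enabled.

```python
import os
from typing import Any, Callable, List


class ToyTaskQueue:
    """Hypothetical in-process stand-in for the Engine -> Worker task queue.

    put_tasks/get_tasks run the encode/decode hooks only when
    FD_ENABLE_MAX_PREFILL is on, so callers never invoke them directly.
    """

    def __init__(self, encode: Callable[[List[Any]], Any], decode: Callable[[List[Any]], Any]):
        self._buffer: List[Any] = []
        self._encode = encode  # e.g. to_tensor on the Engine side (mutates tasks in place)
        self._decode = decode  # e.g. to_numpy on the Worker side (mutates tasks in place)

    @staticmethod
    def _enabled() -> bool:
        # Assumption: read on every call, so toggling the variable takes effect immediately.
        return os.getenv("FD_ENABLE_MAX_PREFILL", "0").strip().lower() in ("1", "true")

    def put_tasks(self, tasks: List[Any]) -> None:
        if self._enabled():
            self._encode(tasks)
        self._buffer.extend(tasks)

    def get_tasks(self) -> List[Any]:
        # Drain everything queued so far.
        tasks, self._buffer = self._buffer, []
        if self._enabled():
            self._decode(tasks)
        return tasks
```

In the setting the PR describes, encode would be EngineWorkerQueue.to_tensor and decode would be EngineWorkerQueue.to_numpy. When the flag is off, tasks pass through unchanged.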

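Accuracy Tests

  • Only the data format is converted; model outputs are unaffected.
  • Verified that images in tasks are correctly converted to Tensors and can be converted back to NumPy arrays losslessly.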
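A helper like the one below can check the lossless round-trip claim. It is a sketch, not the PR's actual test, and check_lossless_roundtrip is a hypothetical name. It compares dtype and shape as well as values. A silent dtype change, such as uint8 pixels becoming float32, would leave np.array_equal reporting equal values even though the payload changed.

```python
from typing import Any, Callable

import numpy as np


def check_lossless_roundtrip(
    image: np.ndarray,
    forward: Callable[[np.ndarray], Any],
    backward: Callable[[Any], np.ndarray],
) -> bool:
    """Return True if backward(forward(image)) reproduces image exactly.

    "Exactly" here means same dtype, same shape and element-wise equal values.
    """
    restored = np.asarray(backward(forward(image)))
    return (
        restored.dtype == image.dtype
        and restored.shape == image.shape
        and np.array_equal(restored, image)
    )
```

With Paddle installed, the check from the bullet above would be check_lossless_roundtrip(img, paddle.to_tensor, lambda t: t.numpy()). The outcome depends on how paddle.to_tensor maps each NumPy dtype.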

paddle-bot bot commented Oct 21, 2025

Thanks for your contribution!

paddle-bot added the contributor (External developers) label Oct 21, 2025
EmmonsCurse (Collaborator) previously approved these changes Oct 22, 2025

LGTM

CLAassistant commented Oct 23, 2025

CLA assistant check
All committers have signed the CLA.

ming1753 (Collaborator) previously approved these changes Oct 24, 2025

LGTM

sunlei1024 force-pushed the perf/engine_work_queue branch from 8ee1bab to c30b772 on October 24, 2025, 12:34

ming1753 (Collaborator) left a comment

LGTM

Jiang-Jia-Jun merged commit dc1a9c7 into PaddlePaddle:develop Oct 25, 2025
24 of 27 checks passed
kevincheng2 pushed a commit to kevincheng2/FastDeploy that referenced this pull request Oct 27, 2025
…Paddle#4531)

* perf: Optimize task queue communication from engine to worker

* perf: get_tasks to numpy

* perf: get_tasks remove to_numpy

* fix: request & replace ENV

* remove test_e2w_perf.py

* fix code style

---------

Co-authored-by: Jiang-Jia-Jun <[email protected]>
Jiang-Jia-Jun added a commit that referenced this pull request Oct 27, 2025
* support mm prefix caching

* update code

* fix mm_hashes

* support encoder cache

* add encoder cache

* update code

* update encoder cache

* fix features bug

* fix worker bug

* support processor cache, need to optimize yet

* refactor multimodal data cache

* update code

* update code

* update v1 scheduler

* update code

* update code

* update codestyle

* support turn off processor cache and encoder cache

* update pre-commit

* fix code

* solve review

* update code

* update code

* update test case

* set processor cache in GiB

* update test case

* support mm prefix caching for qwen model

* fix code style check

* update pre-commit

* fix unit test

* fix unit test

* add ci test case

* fix rescheduled bug

* change text_after_process to prompt_tokens

* fix unit test

* fix chat template

* change model path

* [EP] fix adapter bugs (#4572)

* Update expert_service.py

* Update common_engine.py

* Update expert_service.py

* fix v1 hang bug (#4573)

* fix import image_ops error on some platforms (#4559)

* [CLI]Update parameters in bench latency cli tool and fix collect-env cli tool (#4558)

* add collect-env

* del files

* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578)

* add new branch for sot

* reorder

* fix batch bug

* [XPU]Moe uses a new operator (#4585)

* [XPU]Moe uses a new operator

* [XPU]Moe uses a new operator

* update response

* [Feature] Support Paddle-OCR (#4396)

* init

* update code

* fix code style & disable thinking

* adapt for common_engine.update_mm_requests_chunk_size

* use 3d rope

* use flash_attn_unpadded

* opt siglip

* update to be compatible with the latest codebase

* fix typo

* optim OCR performance

* fix bug

* fix bug

* fix bug

* fix bug

* normalize name

* modify xpu rope

* revert logger

* fix bug

* fix bug

* fix bug

* support default_v1

* optim performance

* fix bug

---------

Co-authored-by: root <[email protected]>
Co-authored-by: zhangyue66 <[email protected]>

* [DataProcessor] add reasoning_tokens into usage info (#4520)

* add reasoning_tokens into usage info initial commit

* add unit tests

* modify unit test

* modify and add unit tests

* fix unit test

* move stream usage to processor

* modify processor

* modify test_logprobs

* modify test_logprobs.py

* modify stream reasoning tokens accumulation

* fix unit test

* perf: Optimize task queue communication from engine to worker (#4531)

* perf: Optimize task queue communication from engine to worker

* perf: get_tasks to numpy

* perf: get_tasks remove to_numpy

* fix: request & replace ENV

* remove test_e2w_perf.py

* fix code style

---------

Co-authored-by: Jiang-Jia-Jun <[email protected]>

* Clean up ports after processing results (#4587)

* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593)

* [Others] api server exits when worker process is dead (#3271)

* [fix] fix terminal hangs when worker process is dead

* [chore] change sleep time of monitor

* [chore] remove redundant comments

* update docs

---------

Co-authored-by: ApplEOFDiscord <[email protected]>
Co-authored-by: ApplEOFDiscord <[email protected]>
Co-authored-by: ltd0924 <[email protected]>
Co-authored-by: yinwei <[email protected]>
Co-authored-by: JYChen <[email protected]>
Co-authored-by: qwes5s5 <[email protected]>
Co-authored-by: Ryan <[email protected]>
Co-authored-by: yyssys <[email protected]>
Co-authored-by: ming1753 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: zhangyue66 <[email protected]>
Co-authored-by: kxz2002 <[email protected]>
Co-authored-by: SunLei <[email protected]>
Co-authored-by: Jiang-Jia-Jun <[email protected]>
Co-authored-by: Zhang Yulong <[email protected]>
Co-authored-by: YuBaoku <[email protected]>
Co-authored-by: 李泳桦 <[email protected]>
sunlei1024 deleted the perf/engine_work_queue branch December 3, 2025 05:23

Labels

contributor External developers

5 participants