perf: Optimize task queue communication from engine to worker #4531
Conversation
Thanks for your contribution!
EmmonsCurse left a comment
LGTM
ming1753 left a comment
LGTM
Force-pushed from 8ee1bab to c30b772
ming1753 left a comment
LGTM
perf: Optimize task queue communication from engine to worker (Paddle#4531)

* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style

Co-authored-by: Jiang-Jia-Jun <[email protected]>
Motivation
Optimize the task queue communication between the Engine and the Worker. During task scheduling (put_tasks & get_tasks), multimodal inputs (images) are converted between NumPy arrays and PaddlePaddle Tensors, which lowers communication cost and improves multimodal data processing performance.
Modifications
- `to_tensor` static method: when `FD_ENABLE_MAX_PREFILL` is enabled, converts the NumPy image arrays in a task into PaddlePaddle Tensors.
- `to_numpy` static method: when `FD_ENABLE_MAX_PREFILL` is enabled, converts the PaddlePaddle Tensors in a task back into NumPy arrays before the results are processed.
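The PR gates both conversions on the `FD_ENABLE_MAX_PREFILL` environment variable. Below is a minimal sketch of what such a pair of static methods could look like; the `Request` class shape, the `multimodal_inputs` dict, the `"images"` key, and the flag-parsing convention are illustrative assumptions, not the actual FastDeploy structures.

```python
import os

import numpy as np
import paddle


class Request:
    """Illustrative task object; the real request structure may differ."""

    def __init__(self, multimodal_inputs: dict):
        self.multimodal_inputs = multimodal_inputs  # e.g. {"images": np.ndarray}

    @staticmethod
    def to_tensor(task: "Request") -> None:
        """Convert NumPy image arrays in the task to Paddle Tensors."""
        # Assumed flag convention: conversion only applies when enabled.
        if os.getenv("FD_ENABLE_MAX_PREFILL", "0") != "1":
            return
        images = task.multimodal_inputs.get("images")
        if isinstance(images, np.ndarray):
            task.multimodal_inputs["images"] = paddle.to_tensor(images)

    @staticmethod
    def to_numpy(task: "Request") -> None:
        """Convert Paddle Tensors in the task back to NumPy arrays."""
        if os.getenv("FD_ENABLE_MAX_PREFILL", "0") != "1":
            return
        images = task.multimodal_inputs.get("images")
        if isinstance(images, paddle.Tensor):
            task.multimodal_inputs["images"] = images.numpy()
```

In use, nothing changes at call sites: with `FD_ENABLE_MAX_PREFILL=1` set in the environment, the conversions are applied during task scheduling; with the flag off, both methods are no-ops.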
Usage or Command

These methods are applied automatically when `FD_ENABLE_MAX_PREFILL` is enabled.

Accuracy Tests