[Feature] mm support prefix cache #4134
Conversation
Thanks for your contribution!
Pull Request Overview
This PR introduces comprehensive multimodal (MM) support for prefix caching, encoder caching, and processor caching, along with adjustments to V1 scheduling logic.
- Multimodal prefix cache support with hash-based block identification for image/video content
- Encoder cache management to store processed multimodal features and reduce redundant computations
- Processor cache system for storing preprocessed multimodal data with ZMQ-based IPC communication
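As a rough illustration of the hash-based block identification, here is a minimal sketch; the function names and the exact chaining scheme are assumptions for exposition, not the actual fastdeploy/multimodal/hasher.py API:

```python
import hashlib
from typing import List, Union


def mm_hash(content: Union[bytes, str]) -> str:
    """Stable content hash for a single image/video item (hypothetical helper)."""
    data = content.encode("utf-8") if isinstance(content, str) else content
    return hashlib.sha256(data).hexdigest()


def block_hashes(token_ids: List[int], mm_hashes: List[str], block_size: int = 64) -> List[str]:
    """Chain each block hash with its parent so a block can be reused only
    when the whole prefix (text tokens plus multimodal content) matches."""
    hashes: List[str] = []
    parent = ""
    mm_tag = "|".join(mm_hashes)  # fold multimodal identity into the chain
    for start in range(0, len(token_ids), block_size):
        block = token_ids[start : start + block_size]
        digest = hashlib.sha256(f"{parent}:{mm_tag}:{block}".encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes
```

Under a scheme like this, two requests share a cached block only if every earlier block and every referenced image/video hash match, which is what makes prefix reuse safe for multimodal inputs.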
Reviewed Changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/v1/cache_manager/test_prefix_cache.py | Test cases for multimodal prefix caching functionality |
| tests/v1/cache_manager/test_encoder_cache.py | Test cases for encoder cache management |
| fastdeploy/worker/worker_process.py | Added max_encoder_cache argument for worker configuration |
| fastdeploy/worker/gpu_model_runner.py | Encoder cache implementation and multimodal input processing |
| fastdeploy/scheduler/local_scheduler.py | V1 scheduler logic adjustments for better request handling |
| fastdeploy/scheduler/global_scheduler.py | V1 scheduler logic adjustments for global scheduling |
| fastdeploy/multimodal/hasher.py | Multimodal content hashing utility for cache identification |
| fastdeploy/input/preprocess.py | Added processor cache enablement parameter |
| fastdeploy/input/ernie4_5_vl_processor/process.py | Processor cache integration with ZMQ communication |
| fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py | Processor cache enablement in ERNIE processor |
| fastdeploy/entrypoints/openai/protocol.py | Added mm_hashes field to API protocol |
| fastdeploy/entrypoints/openai/api_server.py | Added max_processor_cache configuration |
| fastdeploy/entrypoints/engine_client.py | Processor cache configuration in engine client |
| fastdeploy/entrypoints/chat_utils.py | Updated chat message parsing for UUID-based multimodal content |
| fastdeploy/engine/sched/resource_manager_v1.py | Resource manager integration with multimodal caches |
| fastdeploy/engine/request.py | Added ImagePosition dataclass for multimodal positioning |
| fastdeploy/engine/engine.py | Added encoder cache configuration to worker startup |
| fastdeploy/engine/common_engine.py | Simplified available blocks calculation |
| fastdeploy/engine/args_utils.py | Added encoder and processor cache configuration arguments |
| fastdeploy/config.py | Configuration updates for encoder and processor cache settings |
| fastdeploy/cache_manager/prefix_cache_manager.py | Multimodal prefix cache implementation with hash-based block matching |
| fastdeploy/cache_manager/multimodal_cache_manager.py | Base classes for multimodal cache management |
| fastdeploy/cache_manager/cache_metrics.py | Updated log file name from prefix_cache_manager.log to cache_manager.log |
| fastdeploy/cache_manager/cache_data.py | Updated log file name from prefix_cache_manager.log to cache_manager.log |
| docs/zh/usage/log.md | Updated log file name documentation |
| docs/usage/log.md | Updated log file name documentation |
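On the ZMQ-based IPC used by the processor cache (fastdeploy/input/ernie4_5_vl_processor/process.py above), here is a minimal sketch of a cache server answering get/put requests over an IPC socket; the endpoint, message schema, and unbounded dict are assumptions, not the PR's actual protocol:

```python
import zmq


def serve_processor_cache(endpoint: str = "ipc:///tmp/processor_cache.ipc") -> None:
    """Hypothetical cache server: clients send ("get"|"put", mm_hash, value)
    tuples and receive the cached preprocessed data (or None on a miss)."""
    cache: dict = {}  # unbounded for brevity; a real cache would enforce a size limit
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        op, key, value = sock.recv_pyobj()
        if op == "get":
            sock.send_pyobj(cache.get(key))
        else:  # "put"
            cache[key] = value
            sock.send_pyobj(True)
```

Keeping the cache in a single server process lets multiple API worker processes share preprocessed multimodal data without re-running the processor.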
```python
max_encoder_cache: int = -1
"""
Maximum number of tokens in the encoder cache.
"""
```
Why was the unit here changed to a token count instead of a concrete size (e.g. bytes)?
The concrete size of the encoder cache equals image_token * hidden_size * dtype size, so it is directly proportional to the token count; this representation is the same as vLLM's.
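A quick back-of-the-envelope check of that relationship, with illustrative values for hidden size and dtype (not the PR's defaults):

```python
# Encoder cache memory scales linearly with the token budget:
#   bytes = num_tokens * hidden_size * dtype_size
num_tokens = 16384   # e.g. max_encoder_cache = 16384
hidden_size = 4096   # illustrative encoder hidden size
dtype_size = 2       # bfloat16 = 2 bytes per element

cache_bytes = num_tokens * hidden_size * dtype_size
print(f"{cache_bytes / 2**20:.0f} MiB")  # 16384 * 4096 * 2 = 128 MiB
```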
```python
            break
        return can_schedule

    def _update_mm_hashes(self, request):
```
In current testing, roughly how long does hashing an image take? We should check later whether it needs optimization.
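To answer that empirically, a micro-benchmark along these lines measures the per-image hashing cost; the payload size and the choice of SHA-256 over raw bytes are assumptions:

```python
import hashlib
import os
import time

# Time SHA-256 over a synthetic ~3 MB "image" payload; the real cost depends
# on image size and on whether raw bytes or decoded pixels are hashed.
payload = os.urandom(3 * 1024 * 1024)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    hashlib.sha256(payload).hexdigest()
elapsed_ms = (time.perf_counter() - start) / runs * 1e3
print(f"~{elapsed_ms:.2f} ms per 3 MB image")
```

On typical CPUs SHA-256 sustains from a few hundred MB/s to a few GB/s, so hashing a few-MB image should land in the low-millisecond range.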
Commit messages from the PR timeline:

- Update expert_service.py; update common_engine.py; update expert_service.py
- …li tool (PaddlePaddle#4558): add collect-env; del files
- …hreshold for cudagraph mode switching (PaddlePaddle#4578): add new branch for sot; reorder; fix batch bug
- [XPU] Moe uses a new operator; update response
- Init; update code; fix code style & disable thinking; adapt for common_engine.update_mm_requests_chunk_size; use 3d rope; use flash_attn_unpadded; opt siglip; update to be compatible with the latest codebase; fix typo; optimize OCR performance; assorted bug fixes; normalize name; modify xpu rope; revert logger; support default_v1; optimize performance (Co-authored-by: root <[email protected]>; zhangyue66 <[email protected]>)
- Add reasoning_tokens into usage info: initial commit; add and fix unit tests; move stream usage to processor; modify processor; modify test_logprobs.py; modify stream reasoning-tokens accumulation
- …Paddle#4531: perf: optimize task queue communication from engine to worker; get_tasks to numpy; get_tasks remove to_numpy; fix request & replace ENV; remove test_e2w_perf.py; fix code style (Co-authored-by: Jiang-Jia-Jun <[email protected]>)
prefix_cache_manager.log renamed as cache_manager.log