@kevincheng2 kevincheng2 commented Sep 16, 2025

  • support mm prefix cache
  • support mm encoder cache
  • support mm processor cache
  • prefix_cache_manager.log renamed to cache_manager.log
  • adjust v1 scheduling logic to schedule more requests
  • add test case for mm prefix cache and encoder cache

paddle-bot bot commented Sep 16, 2025

Thanks for your contribution!

@kevincheng2 kevincheng2 added ERNIE-45-VL V1 V1 scheduler labels Sep 23, 2025
@Jiang-Jia-Jun Jiang-Jia-Jun requested a review from Copilot October 14, 2025 12:36
Copilot AI left a comment

Pull Request Overview

This PR introduces comprehensive multimodal (MM) support for prefix caching, encoder caching, and processor caching, along with adjustments to V1 scheduling logic.

  • Multimodal prefix cache support with hash-based block identification for image/video content
  • Encoder cache management to store processed multimodal features and reduce redundant computations
  • Processor cache system for storing preprocessed multimodal data with ZMQ-based IPC communication
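The hash-based block identification mentioned above can be illustrated with a small sketch (assumptions: SHA-256 over the raw content bytes plus preprocessing metadata; `mm_content_hash` is a hypothetical name, not the actual fastdeploy/multimodal/hasher.py implementation):

```python
import hashlib
import pickle

def mm_content_hash(raw: bytes, metadata: dict) -> str:
    # Hash the raw image/video bytes together with metadata so that
    # identical bytes under different preprocessing settings still
    # get distinct cache keys.
    h = hashlib.sha256()
    h.update(raw)
    h.update(pickle.dumps(sorted(metadata.items())))
    return h.hexdigest()
```

Blocks whose token span covers an image can then fold this digest into their prefix key, so two prompts that differ only in image content never share a cached block.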

Reviewed Changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/v1/cache_manager/test_prefix_cache.py | Test cases for multimodal prefix caching functionality |
| tests/v1/cache_manager/test_encoder_cache.py | Test cases for encoder cache management |
| fastdeploy/worker/worker_process.py | Added max_encoder_cache argument for worker configuration |
| fastdeploy/worker/gpu_model_runner.py | Encoder cache implementation and multimodal input processing |
| fastdeploy/scheduler/local_scheduler.py | V1 scheduler logic adjustments for better request handling |
| fastdeploy/scheduler/global_scheduler.py | V1 scheduler logic adjustments for global scheduling |
| fastdeploy/multimodal/hasher.py | Multimodal content hashing utility for cache identification |
| fastdeploy/input/preprocess.py | Added processor cache enablement parameter |
| fastdeploy/input/ernie4_5_vl_processor/process.py | Processor cache integration with ZMQ communication |
| fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py | Processor cache enablement in ERNIE processor |
| fastdeploy/entrypoints/openai/protocol.py | Added mm_hashes field to API protocol |
| fastdeploy/entrypoints/openai/api_server.py | Added max_processor_cache configuration |
| fastdeploy/entrypoints/engine_client.py | Processor cache configuration in engine client |
| fastdeploy/entrypoints/chat_utils.py | Updated chat message parsing for UUID-based multimodal content |
| fastdeploy/engine/sched/resource_manager_v1.py | Resource manager integration with multimodal caches |
| fastdeploy/engine/request.py | Added ImagePosition dataclass for multimodal positioning |
| fastdeploy/engine/engine.py | Added encoder cache configuration to worker startup |
| fastdeploy/engine/common_engine.py | Simplified available blocks calculation |
| fastdeploy/engine/args_utils.py | Added encoder and processor cache configuration arguments |
| fastdeploy/config.py | Configuration updates for encoder and processor cache settings |
| fastdeploy/cache_manager/prefix_cache_manager.py | Multimodal prefix cache implementation with hash-based block matching |
| fastdeploy/cache_manager/multimodal_cache_manager.py | Base classes for multimodal cache management |
| fastdeploy/cache_manager/cache_metrics.py | Updated log file name from prefix_cache_manager.log to cache_manager.log |
| fastdeploy/cache_manager/cache_data.py | Updated log file name from prefix_cache_manager.log to cache_manager.log |
| docs/zh/usage/log.md | Updated log file name documentation |
| docs/usage/log.md | Updated log file name documentation |
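The processor cache listed above stores preprocessed multimodal data keyed by content hash, behind a ZMQ IPC boundary. The lookup logic can be sketched transport-agnostically (the `ProcessorCache` class below is hypothetical and omits the ZMQ layer entirely; it is not the project's implementation):

```python
import hashlib
from collections import OrderedDict

class ProcessorCache:
    """LRU cache for preprocessed multimodal items, keyed by content hash."""

    def __init__(self, max_items: int):
        self.max_items = max_items
        self._store: OrderedDict[str, object] = OrderedDict()

    @staticmethod
    def key(raw: bytes) -> str:
        # Content-addressed key: identical inputs hit the same entry.
        return hashlib.sha256(raw).hexdigest()

    def get(self, key: str):
        if key in self._store:
            self._store.move_to_end(key)  # mark as recently used
            return self._store[key]
        return None

    def put(self, key: str, value) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict least recently used
```

In the real system the lookup sits on the preprocessing side of a ZMQ request/reply channel, so repeated uploads of the same image skip re-tokenization entirely.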

```python
max_encoder_cache: int = -1
"""
Maximum number of tokens in the encoder cache.
"""
```
Why was the unit here changed to a token count instead of an actual size?

@kevincheng2 kevincheng2 Oct 14, 2025

The encoder cache's actual size equals image_token * hidden_size * dtype size, so it is directly proportional to the token count; this representation is the same as vLLM's.
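The sizing argument in this reply can be spelled out as a quick calculation (illustrative numbers only; hidden_size 4096 and bf16 are assumptions, not ERNIE's actual configuration):

```python
def encoder_cache_bytes(num_tokens: int, hidden_size: int, dtype_bytes: int) -> int:
    # Each cached encoder token stores one feature vector of
    # hidden_size elements, at dtype_bytes per element.
    return num_tokens * hidden_size * dtype_bytes

# 16384 image tokens, hidden_size 4096, bf16 (2 bytes/element):
print(encoder_cache_bytes(16384, 4096, 2))  # 134217728 bytes (128 MiB)
```

Because the memory footprint is a fixed multiple of the token count, capping tokens caps bytes, which is why a token-denominated limit is sufficient.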

```python
                break
        return can_schedule

    def _update_mm_hashes(self, request):
```

Roughly how long does hashing an image take in current testing? Let's see later whether it needs optimizing.
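One way to answer this timing question is a micro-benchmark over the raw bytes (a sketch using SHA-256 from hashlib; the project's actual hashing scheme may differ):

```python
import hashlib
import time

def avg_hash_seconds(data: bytes, repeats: int = 10) -> float:
    # Average wall-clock time to hash one payload of this size.
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.sha256(data).hexdigest()
    return (time.perf_counter() - start) / repeats

# A synthetic 3 MB payload standing in for one decoded image:
payload = b"\x00" * (3 * 1024 * 1024)
print(f"{avg_hash_seconds(payload) * 1e3:.3f} ms per hash")
```

On typical server CPUs SHA-256 runs on the order of hundreds of MB/s to GB/s, so a single image hash should be well under a millisecond, but measuring on the target hardware is the only reliable answer.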

ApplEOFDiscord and others added 28 commits October 20, 2025 17:39
* Update expert_service.py

* Update common_engine.py

* Update expert_service.py
…hreshold for cudagraph mode switching (PaddlePaddle#4578)

* add new branch for sot

* reorder

* fix batch bug
* [XPU]Moe uses a new operator

* [XPU]Moe uses a new operator

* update response
* init

* update code

* fix code style & disable thinking

* adapt for common_engine.update_mm_requests_chunk_size

* use 3d rope

* use flash_attn_unpadded

* opt siglip

* update to be compatible with the latest codebase

* fix typo

* optim OCR performance

* fix bug

* fix bug

* fix bug

* fix bug

* normlize name

* modify xpu rope

* revert logger

* fix bug

* fix bug

* fix bug

* support default_v1

* optim performance

* fix bug

---------

Co-authored-by: root <[email protected]>
Co-authored-by: zhangyue66 <[email protected]>
* add reasoning_tokens into usage info initial commit

* add unit tests

* modify unit test

* modify and add unit tests

* fix unit test

* move steam usage to processor

* modify processor

* modify test_logprobs

* modify test_logprobs.py

* modify stream reasoning tokens accumulation

* fix unit test
…Paddle#4531)

* perf: Optimize task queue communication from engine to worker

* perf: get_tasks to numpy

* perf: get_tasks remove to_numpy

* fix: request & replace ENV

* remove test_e2w_perf.py

* fix code style

---------

Co-authored-by: Jiang-Jia-Jun <[email protected]>
)

* [fix] fix terminal hangs when worker process is dead

* [chore] change sleep time of monitor

* [chore] remove redundant comments
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 8aab4e3 into PaddlePaddle:develop Oct 27, 2025
22 of 27 checks passed

Labels

ERNIE-45-VL V1 V1 scheduler
