[feat] support prefix cache clearing when `/clear_load_weight` is called #4091

liyonghua0910 · 2025-09-12T10:17:06Z

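Requirement

In RL scenarios, training and inference alternate. After each inference phase, /clear_load_weight must be called to release the model weights and free space for training; after training, /update_model_weight must be called to reload the weights for inference. RL now needs prefix caching enabled to speed up inference, so the KV cache must also be cleared and reloaded through these endpoints together with the model weights. This PR implements that capability.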
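The alternation described above can be sketched as follows. This is only an illustration of the call order: the two endpoint paths come from the PR description, while `run_rl_iteration` and its three callables (`call_endpoint`, `rollout`, `train`) are hypothetical stand-ins, and the HTTP method and payload are deliberately left out because the description does not state them.

```python
from typing import Callable, List

CLEAR_ENDPOINT = "/clear_load_weight"      # named in the PR description
UPDATE_ENDPOINT = "/update_model_weight"   # named in the PR description


def run_rl_iteration(
    call_endpoint: Callable[[str], None],
    rollout: Callable[[], None],
    train: Callable[[], None],
) -> None:
    """Run one rollout/train cycle. All three callables are hypothetical."""
    rollout()                       # inference phase, can use prefix caching
    call_endpoint(CLEAR_ENDPOINT)   # release weights and, with this PR, the KV cache
    train()                         # training uses the freed memory
    call_endpoint(UPDATE_ENDPOINT)  # reload weights and restore the KV cache


if __name__ == "__main__":
    events: List[str] = []
    run_rl_iteration(
        call_endpoint=events.append,
        rollout=lambda: events.append("rollout"),
        train=lambda: events.append("train"),
    )
    print(events)
```

Passing the endpoint caller in as a function keeps the sketch independent of any particular HTTP client.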

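Main changes

Added the operator unset_data_ipc, which releases the reference that set_data_ipc holds on the KV cache tensors.
Added the prefix_tree_status_signal and kv_cache_status_signal signals, used to request clearing of the cache tree index and of the cache itself, respectively.
Changed when the KV cache is initialized. There are now two cases: (1) cache_transfer_manager creates the cache and gpu_model_runner attaches to it; (2) gpu_model_runner creates the cache and cache_transfer_manager attaches to it. Only PD-disaggregated deployment in non-profile mode falls under case 1.
Extended the use of cache_ready_signal. It was originally used by prefix_cache_manager to wait for the cache_transfer_manager process to create the cache tensors. Because the cache tensors are no longer necessarily created by cache_transfer_manager, either cache_transfer_manager or gpu_model_runner may now be the one that sets this signal to 1, or the one that waits for the other to set it to 1.
Added swap_space_ready_signal, used by prefix_cache_manager to wait for cache_transfer_manager to finish allocating the CPU cache.
Added named constants for the states of some IPC signals, to avoid the confusion that comes from using raw numbers such as 0, 1, and -1 for different states during development.
Added a mutex around the clear/update operations in engine_client so that weights cannot be cleared and loaded at the same time.
Added the environment variable FD_ENABLE_SWAP_SPACE_CLEARING, which controls whether the CPU cache is also cleared when the prefix cache is cleared. It defaults to 0, meaning the CPU cache is not cleared.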
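Three of the changes above (named status constants, the clear/update mutex, and the FD_ENABLE_SWAP_SPACE_CLEARING switch) can be illustrated with a minimal sketch. It is not the code in this PR: `CacheStatus` and its member names, the mapping of -1/0/1 to states, `swap_space_clearing_enabled`, and `WeightController` are all hypothetical, and parsing the variable as "enabled when set to 1" is an assumption. Only the environment variable name and its default of 0 come from the description.

```python
import os
import threading
from enum import IntEnum


class CacheStatus(IntEnum):
    """Hypothetical names and values; the PR text does not list the real ones."""
    CLEARING = -1
    CLEARED = 0
    NORMAL = 1


def swap_space_clearing_enabled(environ=os.environ) -> bool:
    # Defaults to "0": the CPU cache is kept when the prefix cache is cleared.
    return environ.get("FD_ENABLE_SWAP_SPACE_CLEARING", "0") == "1"


class WeightController:
    """Toy stand-in for clear/update paths that share a single mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.prefix_tree_status = CacheStatus.NORMAL
        self.kv_cache_status = CacheStatus.NORMAL
        self.cpu_cache_cleared = False

    def clear(self, environ=os.environ) -> None:
        with self._lock:  # a concurrent update() blocks until clear() finishes
            self.prefix_tree_status = CacheStatus.CLEARED
            self.kv_cache_status = CacheStatus.CLEARED
            self.cpu_cache_cleared = swap_space_clearing_enabled(environ)

    def update(self) -> None:
        with self._lock:  # a concurrent clear() blocks until update() finishes
            self.prefix_tree_status = CacheStatus.NORMAL
            self.kv_cache_status = CacheStatus.NORMAL
            self.cpu_cache_cleared = False
```

Comparing against `CacheStatus.CLEARED` rather than a bare `0` is the readability gain the named constants are meant to provide, and holding one lock in both methods is what rules out an interleaved clear and update.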

paddle-bot · 2025-09-12T10:17:11Z

Thanks for your contribution!

Jiang-Jia-Jun · 2025-09-15T09:47:13Z

fastdeploy/engine/args_utils.py

            self.enable_prefix_caching = False
-        if self.dynamic_load_weight:
-            self.enable_prefix_caching = False
+        # if self.dynamic_load_weight:


After this PR is merged, please open a follow-up PR to remove the commented-out code.

This line forces prefix caching off in RL scenarios, but on develop it has already been removed by someone else.

liyonghua0910 added 2 commits September 12, 2025 16:49

[feat] support clearing prefix cache (cherry-picked from release/2.1)

c87f9b6

[fix] fix ipc suffix, use port instead

c408eb3

liyonghua0910 and others added 4 commits September 12, 2025 19:30

[fix] fix prefix caching not enabled

10a7c1a

[fix] fix code style

e1bd249

[fix] wait for rank0 to update weight status

2969fe6

Merge branch 'release/2.2' into release/2.2+clear_prefix_cache

f73ef48

Jiang-Jia-Jun requested a review from rainyfly September 15, 2025 09:40

YuanRisheng added the skip-ci: coverage label Sep 15, 2025

Jiang-Jia-Jun approved these changes Sep 15, 2025

View reviewed changes

rainyfly approved these changes Sep 15, 2025

View reviewed changes

Merge branch 'release/2.2' into release/2.2+clear_prefix_cache

2e93c27

Jiang-Jia-Jun merged commit 7ccbcc5 into PaddlePaddle:release/2.2 Sep 16, 2025
23 of 27 checks passed
