@qwes5s5 qwes5s5 commented Sep 3, 2025

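fastdeploy:cache_config_info is a Gauge that records the cache settings (CacheConfig) of the current node. It is recorded when the engine initializes and contains the following fields:

block_size: Size of each KV cache block, in tokens.
bytes_per_block: Number of bytes occupied by each KV cache block.
bytes_per_layer_per_block: Number of bytes each block occupies in a single model layer.
cache_dtype: Data type of the key-value (KV) pairs stored in the cache.
cache_queue_port: Port number used for cache queue communication.
cache_transfer_protocol: Protocol used to transfer cache data, for example ipc (inter-process communication).
dec_token_num: Number of tokens in the decode phase.
each_token_cache_space: Amount of KV cache space occupied by each token.
enable_chunked_prefill: Whether chunked prefill is enabled.
enable_hierarchical_cache: Whether hierarchical caching is enabled.
enable_prefix_caching: Whether prefix caching is enabled.
enable_ssd_cache: Whether SSD caching is enabled.
enc_dec_block_num: Number of blocks for the encoder-decoder model.
gpu_memory_utilization: Fraction of total GPU memory that may be used.
kv_cache_ratio: Fraction of memory occupied by the KV cache.
max_block_num_per_seq: Maximum number of blocks that may be allocated to each sequence (request).
model_cfg: The model's configuration object.
num_cpu_blocks: Number of cache blocks allocated on the CPU.
num_gpu_blocks_override: Number of cache blocks allocated on the GPU, overriding the default.
pd_comm_port: Port number used for communication.
prealloc_dec_block_slot_num_threshold: Threshold for pre-allocating decode block slots.
prefill_kvcache_block_num: Number of KV cache blocks used in the prefill phase.
rdma_comm_ports: Port numbers used for RDMA (remote direct memory access) communication.
swap_space: Size of the swap space, used when memory runs short.
total_block_num: Total number of KV cache blocks.
Example: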

# HELP fastdeploy:cache_config_info Information of the engine's CacheConfig
# TYPE fastdeploy:cache_config_info gauge
fastdeploy:cache_config_info{block_size="64",bytes_per_block="589824",bytes_per_layer_per_block="32768",
cache_dtype="bfloat16",cache_queue_port="8003",cache_transfer_protocol="ipc",dec_token_num="128",
each_token_cache_space="9216",enable_chunked_prefill="False",enable_hierarchical_cache="False",
enable_prefix_caching="False",enable_ssd_cache="False",enc_dec_block_num="2",gpu_memory_utilization="0.9",
kv_cache_ratio="0.75",max_block_num_per_seq="128",model_cfg="<fastdeploy.config.ModelConfig object at 0x7f74a6b413f0>",
num_cpu_blocks="0",num_gpu_blocks_override="1000",pd_comm_port="None",prealloc_dec_block_slot_num_threshold="5",
prefill_kvcache_block_num="750",rdma_comm_ports="None",swap_space="None",total_block_num="1000"} 1.0
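
As a sanity check on the sample above, the sketch below parses a subset of the label set and verifies a few arithmetic relations between the values. The relations are only observed in this one sample; the PR does not state them as definitions, so treat them as assumptions. `parse_labels` is a hypothetical helper, not part of FastDeploy.

```python
import re

# Subset of the label set from the sample output above.
SAMPLE = (
    'fastdeploy:cache_config_info{block_size="64",bytes_per_block="589824",'
    'bytes_per_layer_per_block="32768",each_token_cache_space="9216",'
    'kv_cache_ratio="0.75",prefill_kvcache_block_num="750",'
    'total_block_num="1000"} 1.0'
)


def parse_labels(line: str) -> dict[str, str]:
    """Extract label="value" pairs from one Prometheus exposition line."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    return dict(re.findall(r'(\w+)="([^"]*)"', body))


labels = parse_labels(SAMPLE)
block_size = int(labels["block_size"])
bytes_per_block = int(labels["bytes_per_block"])
bytes_per_layer_per_block = int(labels["bytes_per_layer_per_block"])
each_token_cache_space = int(labels["each_token_cache_space"])
total_block_num = int(labels["total_block_num"])
kv_cache_ratio = float(labels["kv_cache_ratio"])
prefill_blocks = int(labels["prefill_kvcache_block_num"])

# Observed in the sample (assumption, not a documented definition):
# a block holds block_size tokens, each costing each_token_cache_space bytes.
tokens_times_space = block_size * each_token_cache_space

# Observed in the sample: the per-block size is a whole multiple of the
# per-layer size, which would correspond to the number of layers.
implied_layers = bytes_per_block // bytes_per_layer_per_block

# Observed in the sample: prefill blocks = total blocks * kv_cache_ratio.
implied_prefill_blocks = int(total_block_num * kv_cache_ratio)

print(tokens_times_space == bytes_per_block)
print(implied_layers)
print(implied_prefill_blocks == prefill_blocks)
```

Because the gauge value is always 1.0, all the information is carried in the labels, so a consumer only needs to read the label set.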

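fastdeploy:available_batch_size is a Gauge that records the batch size still available on the current node, that is, how many new requests it can still accept. It is recorded when resources are allocated and recycled.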

# HELP fastdeploy:available_batch_size Number of how many new requests the system can still accept.
# TYPE fastdeploy:available_batch_size gauge
fastdeploy:available_batch_size 128.0
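
The "recorded on allocation and recycling" behaviour can be sketched as follows. `SlotTracker` and its methods are hypothetical stand-ins written for illustration; they are not FastDeploy's resource manager, and the plain attribute stands in for a real Prometheus gauge.

```python
class SlotTracker:
    """Hypothetical stand-in for a resource manager (not FastDeploy's class)."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.in_use = 0
        # Stand-in for the fastdeploy:available_batch_size gauge value.
        self.available_batch_size = float(max_batch_size)

    def _record(self) -> None:
        # Update the gauge whenever the number of free slots changes.
        self.available_batch_size = float(self.max_batch_size - self.in_use)

    def allocate(self, n: int = 1) -> bool:
        """Reserve n slots; refuse if that would exceed the batch limit."""
        if self.in_use + n > self.max_batch_size:
            return False
        self.in_use += n
        self._record()
        return True

    def recycle(self, n: int = 1) -> None:
        """Release n slots, never dropping below zero in use."""
        self.in_use = max(0, self.in_use - n)
        self._record()


tracker = SlotTracker(max_batch_size=128)
tracker.allocate(3)
after_alloc = tracker.available_batch_size
tracker.recycle(1)
after_recycle = tracker.available_batch_size
print(after_alloc, after_recycle)
```

Updating the gauge only at these two points keeps it consistent with the slot count without needing a periodic poll.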

fastdeploy:hit_req_rate is a Gauge that records the request-level prefix cache hit rate. It is recorded when CacheMetrics updates its hit metrics.

# HELP fastdeploy:hit_req_rate Request-level prefix cache hit rate
# TYPE fastdeploy:hit_req_rate gauge
fastdeploy:hit_req_rate 0.5

fastdeploy:hit_token_rate is a Gauge that records the token-level prefix cache hit rate. It is recorded when CacheMetrics updates its hit metrics.

# HELP fastdeploy:hit_token_rate Token-level prefix cache hit rate
# TYPE fastdeploy:hit_token_rate gauge
fastdeploy:hit_token_rate 0.5

fastdeploy:cpu_hit_token_rate is a Gauge that records the token-level CPU prefix cache hit rate. It is recorded when CacheMetrics updates its hit metrics.

# HELP fastdeploy:cpu_hit_token_rate Token-level CPU prefix cache hit rate
# TYPE fastdeploy:cpu_hit_token_rate gauge
fastdeploy:cpu_hit_token_rate 0.0

fastdeploy:gpu_hit_token_rate is a Gauge that records the token-level GPU prefix cache hit rate. It is recorded when CacheMetrics updates its hit metrics.

# HELP fastdeploy:gpu_hit_token_rate Token-level GPU prefix cache hit rate
# TYPE fastdeploy:gpu_hit_token_rate gauge
fastdeploy:gpu_hit_token_rate 0.5
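
One plausible way the four hit-rate gauges relate to each other is sketched below. It assumes cumulative counters and simple ratios; the PR does not show how CacheMetrics actually computes them, so `HitCounters` and its field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class HitCounters:
    """Hypothetical running totals; not FastDeploy's CacheMetrics."""

    total_reqs: int = 0
    hit_reqs: int = 0
    total_tokens: int = 0
    gpu_hit_tokens: int = 0
    cpu_hit_tokens: int = 0

    def update(self, req_tokens: int, gpu_hit: int, cpu_hit: int) -> None:
        """Account for one request and its prefix-cache hits."""
        self.total_reqs += 1
        self.total_tokens += req_tokens
        self.gpu_hit_tokens += gpu_hit
        self.cpu_hit_tokens += cpu_hit
        if gpu_hit + cpu_hit > 0:
            self.hit_reqs += 1

    def rates(self) -> dict[str, float]:
        """Return the four gauge values, guarding against division by zero."""

        def ratio(num: int, den: int) -> float:
            return num / den if den else 0.0

        hit_tokens = self.gpu_hit_tokens + self.cpu_hit_tokens
        return {
            "hit_req_rate": ratio(self.hit_reqs, self.total_reqs),
            "hit_token_rate": ratio(hit_tokens, self.total_tokens),
            "cpu_hit_token_rate": ratio(self.cpu_hit_tokens, self.total_tokens),
            "gpu_hit_token_rate": ratio(self.gpu_hit_tokens, self.total_tokens),
        }


counters = HitCounters()
counters.update(req_tokens=100, gpu_hit=0, cpu_hit=0)    # cold request
counters.update(req_tokens=100, gpu_hit=100, cpu_hit=0)  # full GPU hit
rates = counters.rates()
print(rates)
```

With one cold request and one request served entirely from the GPU cache, this reproduces the sample values above (0.5, 0.5, 0.0, 0.5). Under this assumption the CPU and GPU token rates always sum to the overall token rate.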

CLAassistant commented Sep 3, 2025

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ qwes5s5
✅ Jiang-Jia-Jun
❌ K11OntheBoat


K11OntheBoat seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

paddle-bot bot commented Sep 3, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Sep 3, 2025
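| `fastdeploy:request_success_total` | Counter | Number of successfully processed requests ||

| `fastdeploy:cache_config_info` | Gauge | Cache configuration information of the inference engine ||
| `fastdeploy:available_batch_size` | Gauge | Number of requests the system can still accept ||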

Change this to "Number of requests that can still be inserted during the decode phase".

"available_batch_size": {
"type": Gauge,
"name": "fastdeploy:available_batch_size",
"description": "Number of how many new requests the system can still accept.",

Please update the corresponding English description as well.

| `fastdeploy:request_success_total` | Counter | Number of successfully processed requests | Count |

| `fastdeploy:cache_config_info` | Gauge | Information of the engine's CacheConfig | Count |
| `fastdeploy:available_batch_size` | Gauge | Number of how many new requests the system can still accept| Count |
@Jiang-Jia-Jun Jiang-Jia-Jun Sep 3, 2025

Same as the Chinese version: update the description, for example "Number of requests that can continue to be inserted during the decode phase".

@Jiang-Jia-Jun Jiang-Jia-Jun changed the title Add serveral observability metrics [metrics] Add serveral observability metrics Sep 3, 2025
Jiang-Jia-Jun previously approved these changes Sep 3, 2025
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 17169a1 into PaddlePaddle:develop Sep 8, 2025
qwes5s5 added a commit to qwes5s5/FastDeploy that referenced this pull request Sep 9, 2025
* Add several observability metrics

* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and related information

* adjust some metrics and md files

* trigger ci

* adjust ci file

* trigger ci

* trigger ci

---------

Co-authored-by: K11OntheBoat <[email protected]>
Co-authored-by: Jiang-Jia-Jun <[email protected]>
Jiang-Jia-Jun added a commit that referenced this pull request Sep 10, 2025
* [metrics] Add serveral observability metrics (#3868)

* Add several observability metrics

* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and related information

* adjust some metrics and md files

* trigger ci

* adjust ci file

* trigger ci

* trigger ci

---------

Co-authored-by: K11OntheBoat <[email protected]>
Co-authored-by: Jiang-Jia-Jun <[email protected]>

* version adjust

---------

Co-authored-by: K11OntheBoat <[email protected]>
Co-authored-by: Jiang-Jia-Jun <[email protected]>