[Bug Fix] fix the bug when num_key_value_heads < tensor_parallel_size in launching kv_cache_manager #3717
pcard-71500
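Launching the ERNIE4.5-0.3B model with TP4 hangs while waiting for kv_cache_manager to start.
Root cause: cache_config.model_cfg.num_key_value_heads is 2 and tensor_parallel_size is 4, so splitting the KV cache across ranks yields 0 KV heads per rank. The kv_cache is then initialized with size [gpu_block, 0, block_size, head_dim], and initialization fails.
Solution: when initializing the KV cache, after kv_num_head has been computed, take the maximum of it and 1, i.e.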
kv_num_head = max(1, kv_num_head)
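A minimal sketch of the arithmetic behind this fix, assuming the per-rank head count comes from integer division of num_key_value_heads by tensor_parallel_size. The function names and the sample values for gpu_block, block_size, and head_dim are hypothetical and are not taken from the actual code.

```python
def unclamped_kv_num_head(num_key_value_heads: int, tensor_parallel_size: int) -> int:
    # Assumed pre-fix behavior: plain integer division, which yields 0
    # whenever there are fewer KV heads than tensor-parallel ranks.
    return num_key_value_heads // tensor_parallel_size


def clamped_kv_num_head(num_key_value_heads: int, tensor_parallel_size: int) -> int:
    # The fix described above: never let a rank end up with fewer than 1 KV head.
    kv_num_head = num_key_value_heads // tensor_parallel_size
    kv_num_head = max(1, kv_num_head)
    return kv_num_head


def kv_cache_shape(gpu_block: int, kv_num_head: int, block_size: int, head_dim: int) -> list:
    # Shape layout as stated in the description: [gpu_block, kv_num_head, block_size, head_dim].
    return [gpu_block, kv_num_head, block_size, head_dim]


if __name__ == "__main__":
    # Illustrative values only: 2 KV heads on 4 ranks, as in the reported case.
    heads, tp = 2, 4
    gpu_block, block_size, head_dim = 100, 64, 128  # hypothetical sizes

    before = kv_cache_shape(gpu_block, unclamped_kv_num_head(heads, tp), block_size, head_dim)
    after = kv_cache_shape(gpu_block, clamped_kv_num_head(heads, tp), block_size, head_dim)
    print("before fix:", before)  # the second dimension is 0
    print("after fix: ", after)   # the second dimension is 1
```

When num_key_value_heads is at least tensor_parallel_size, the clamp has no effect, so configurations that already worked keep the same shape.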