
Conversation

@lizexu123
Collaborator

Fix the inference logic to use num_running_requests instead of max_num_seqs; this change brought clear gains on smaller models.
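A minimal sketch of the idea behind the change (the helper name and input layout are illustrative, not the PR's actual code): buffers are padded to max_num_seqs, but the forward pass only needs the first num_running_requests rows, so per-request tensors can be sliced down to the real batch size.

```python
import numpy as np

def slice_to_running_batch(model_inputs, num_running_requests):
    # Hypothetical helper: keep only the rows for requests that are
    # actually running, instead of the full max_num_seqs padding.
    return {k: v[:num_running_requests] for k, v in model_inputs.items()}

max_num_seqs = 8
inputs = {"seq_lens_this_time": np.zeros(max_num_seqs, dtype=np.int32)}
inputs["seq_lens_this_time"][:3] = [5, 7, 2]  # only 3 requests are running
sliced = slice_to_running_batch(inputs, 3)
print(sliced["seq_lens_this_time"].tolist())  # [5, 7, 2]
```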

@paddle-bot

paddle-bot bot commented Jul 31, 2025

Thanks for your contribution!

@lizexu123 lizexu123 closed this Jul 31, 2025
@lizexu123 lizexu123 reopened this Jul 31, 2025
@iosmers
Collaborator

iosmers commented Aug 5, 2025

LGTM

) -> Optional[ModelRunnerOutput]:
"""Run the model on the given forward batch."""
output = self.model_runner.execute_model(model_forward_batch)
if not is_dummy_run:
Collaborator

Why do we need to distinguish whether this is is_dummy_run?

Collaborator Author

The condition was inverted here; fixed. On XPU, dummy_run goes through execute_model, and num_running_requests can never be empty for a real run, but dummy_run has no way to pass it in. So the check is done here: no slicing is performed during dummy_run.
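The control flow described above can be sketched like this (names and the list-based "batch" are hypothetical stand-ins, not the PR's exact code): a dummy warm-up run cannot pass num_running_requests in, so slicing must be skipped for it.

```python
def execute_model(batch, num_running_requests=None, is_dummy_run=False):
    if not is_dummy_run:
        # Real inference: restrict work to the requests actually running.
        batch = batch[:num_running_requests]
    return len(batch)  # stand-in for the real forward pass

full_batch = list(range(8))  # padded to max_num_seqs = 8
real = execute_model(full_batch, num_running_requests=3)  # processes 3
warmup = execute_model(full_batch, is_dummy_run=True)     # processes all 8
```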

self.model_inputs["block_tables"][idx : idx + 1, :block_num] = np.arange(
idx * block_num, (idx + 1) * block_num, 1
)
self.model_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer
Collaborator

No slicing here?

Collaborator Author

The dummy run doesn't need it.

)

def insert_prefill_inputs(self, req_dicts: List[Request]):
def insert_prefill_inputs(self, req_dicts: List[Request], num_running_requests):
Collaborator

Please add a type annotation for the input.

Collaborator Author

done
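The requested annotation might look like the sketch below. The Optional[int] type and the default value are assumptions for illustration; the merged code may differ, and the Request class here is only a placeholder for FastDeploy's actual Request type.

```python
from typing import List, Optional

class Request:
    """Placeholder standing in for FastDeploy's Request type."""

def insert_prefill_inputs(req_dicts: List[Request],
                          num_running_requests: Optional[int] = None) -> None:
    # Illustrative body only: the real method fills model input buffers.
    assert num_running_requests is None or num_running_requests <= len(req_dicts)

result = insert_prefill_inputs([Request(), Request()], num_running_requests=2)
```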

Collaborator
@carryyu carryyu left a comment

LGTM

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit b01cfd6 into PaddlePaddle:develop Aug 5, 2025
10 of 14 checks passed
lizexu123 added a commit to lizexu123/FastDeploy that referenced this pull request Aug 5, 2025
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
Jiang-Jia-Jun pushed a commit that referenced this pull request Aug 6, 2025
(same commit message list as above)
iosmers added a commit that referenced this pull request Aug 8, 2025