[BugFix] support real batch_size #3109

Conversation
…tar_model_runner.py
Thanks for your contribution!

LGTM
fastdeploy/worker/xpu_worker.py
Outdated
    ) -> Optional[ModelRunnerOutput]:
        """ """
        output = self.model_runner.execute_model(model_forward_batch)
        if not is_dummy_run:
Why do we need to distinguish whether this is an is_dummy_run here?
The condition was inverted; it's fixed now. On XPU, dummy_run goes through execute_model, and num_running_requests can never be empty, but dummy_run has no way to pass it in. So the check is done here: during a dummy_run, no slicing is performed.
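The guard described above can be sketched as follows. This is a hypothetical stand-in, not the XPU worker's actual code: the function name, input-dict layout, and buffer sizes are assumptions, based only on the idea that inputs are padded to the engine's max batch size and sliced to the real batch except during warm-up.

```python
import numpy as np

def execute_model(model_inputs, num_running_requests, is_dummy_run=False):
    # Sketch: on a real step only the first num_running_requests rows
    # of the padded buffers are valid, so they are sliced off before
    # the forward pass. A dummy (warm-up) run exercises the full
    # buffer and therefore skips the slice.
    seq_lens = model_inputs["seq_lens_this_time"]
    if not is_dummy_run:
        seq_lens = seq_lens[:num_running_requests]
    return seq_lens

padded = {"seq_lens_this_time": np.array([5, 7, 2, 0, 0, 0, 0, 0])}
real_step = execute_model(padded, num_running_requests=3)   # 3 valid rows
warmup = execute_model(padded, 3, is_dummy_run=True)        # full buffer
```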
    self.model_inputs["block_tables"][idx : idx + 1, :block_num] = np.arange(
        idx * block_num, (idx + 1) * block_num, 1
    )
    self.model_inputs["seq_lens_this_time"] = self.seq_lens_this_time_buffer
Shouldn't this be sliced?
The dummy run doesn't need the slice.
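For context, the assignment in the diff above fills the whole block table during the dummy run. A self-contained sketch, with illustrative buffer sizes rather than FastDeploy's real ones:

```python
import numpy as np

max_num_seqs, block_num = 4, 3  # illustrative sizes, not real config
block_tables = np.zeros((max_num_seqs, block_num), dtype=np.int64)

# Dummy run: give every slot a contiguous range of block ids,
# covering the full padded buffer -- no real-batch slicing involved.
for idx in range(max_num_seqs):
    block_tables[idx : idx + 1, :block_num] = np.arange(
        idx * block_num, (idx + 1) * block_num, 1
    )
```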
fastdeploy/spec_decode/mtp.py
Outdated
    )

-   def insert_prefill_inputs(self, req_dicts: List[Request]):
+   def insert_prefill_inputs(self, req_dicts: List[Request], num_running_requests):
Please add a type annotation for the new input.
done
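The requested annotation might look like the sketch below. The class name, method body, and `Request` definition are stand-ins for illustration only; only the signature change matters.

```python
from typing import List

class Request:
    """Stand-in for fastdeploy's Request type."""

class MTPProposer:
    """Stand-in class; the real method lives in fastdeploy/spec_decode/mtp.py."""

    def insert_prefill_inputs(
        self, req_dicts: List[Request], num_running_requests: int
    ) -> None:
        # num_running_requests: the real batch size for this step.
        self.num_running_requests = num_running_requests
```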
carryyu left a comment:
LGTM
* support real bsz
* fix
* fix xpu_model_runner.py, gpu_model_runner.py, gcu_model_runner.py, iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time -> self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
Fix the inference logic to use num_running_requests (the real batch size) instead of max_num_seqs; this change brought clear gains on smaller models.
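The difference the fix targets can be illustrated as below. Sizes and values are made up; the point is that the forward pass iterates over the real batch, not the padded capacity:

```python
import numpy as np

max_num_seqs = 8          # static engine capacity (padded buffer size)
num_running_requests = 3  # real batch size for this step

seq_lens_buffer = np.zeros(max_num_seqs, dtype=np.int32)
seq_lens_buffer[:num_running_requests] = [5, 7, 2]  # active requests

# Before the fix: the model ran over all max_num_seqs slots, padding included.
padded_batch = seq_lens_buffer
# After the fix: slice to the real batch so padded slots cost nothing.
real_batch = seq_lens_buffer[:num_running_requests]
```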