yixu34 left a comment:
Not in scope for this PR per se, but given that we're looking at Azure support, we may want to start thinking about how to support Azure in PRs going forward.
Need to take another pass at the actual batch job script.
| """ | ||
| Path to the checkpoint to load the model from. | ||
| """ | ||
| labels: Dict[str, str] |
Should we make labels required for external users?
since this would mostly be used internally, I think it's okay to sacrifice some ergonomics; plus, external users can simply use {}
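The trade-off above (keep `labels` required, let users pass `{}`) can be sketched with a plain dataclass. This is a minimal stdlib illustration only; the real model in model-engine is a pydantic schema, and the class name here is made up:

```python
# Hypothetical sketch: a field with no default is required at construction
# time, but an empty dict is still a valid value. The real request model
# is a pydantic schema; this dataclass and class name are illustrative.
from dataclasses import dataclass
from typing import Dict


@dataclass
class BatchCompletionsRequestSketch:
    # No default value, so callers must pass something;
    # external users with no labels can simply pass {}.
    labels: Dict[str, str]


print(BatchCompletionsRequestSketch(labels={}).labels)  # {}
```

Omitting `labels` entirely raises a `TypeError` at construction, which is the "required for external users" behavior being discussed.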
```python
def infer_hardware_from_model_name(
    self, model_name: str
) -> CreateDockerImageBatchJobResourceRequests:
```
Gah I just realized this should be CreateDockerImageBatchJobResourceRequest, oh well.
possible to rename, but I don't want to put it in this PR
```python
if job_index == 0:
    wait_for_all_chunks(request)
    combine_all_chunks(request)
    if request.output_data_path.startswith("s3://"):
```
Should we have this in a finally?
I think this `finally` should be at the end of the job instead of at the end of worker 0; otherwise, if worker 0 fails early, it may not be able to remove the other output chunks.
For now I think it's fine to not remove them and leave some traces for debugging.
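For reference, the `finally` idea could look like the sketch below. Only `wait_for_all_chunks` and `combine_all_chunks` appear in the diff; `delete_all_chunks`, the dict-shaped request, and the `finalize` wrapper are assumptions for illustration, not the PR's code:

```python
# Hedged sketch: run chunk cleanup in a finally block so it executes even
# if combining fails. The helpers here are placeholders recording calls;
# delete_all_chunks is an assumed cleanup helper, not from the PR.

calls = []


def wait_for_all_chunks(request):
    calls.append("wait")  # placeholder for the real helper


def combine_all_chunks(request):
    calls.append("combine")  # placeholder for the real helper


def delete_all_chunks(request):
    calls.append("delete")  # assumed cleanup helper


def finalize(request, job_index):
    if job_index != 0:
        return
    try:
        wait_for_all_chunks(request)
        combine_all_chunks(request)
    finally:
        # Runs even if combining raises, so stale chunks don't linger.
        if request["output_data_path"].startswith("s3://"):
            delete_all_chunks(request)


finalize({"output_data_path": "s3://bucket/out"}, job_index=0)
print(calls)  # ['wait', 'combine', 'delete']
```

As the reply notes, cleanup anchored to worker 0 still misses the case where worker 0 dies early; a true end-of-job hook would have to live outside any single worker.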
ian-scale left a comment:
looks good to me! Just a few nits / questions for myself to learn more.
saiatmakuri left a comment:
some nits on unit test style but lgtm otherwise
```python
    mock_process,
):
    # Mock the necessary objects and data
    mock_popen.return_value = mock_process
```
nit: you can create a new fixture that mocks the return value of `subprocess.Popen` using the `mock_process` fixture, so you don't need to make this declaration at the start of each unit test. It'll clean up the logic here:

```python
def test(mock1, mock2):
    mock2.return_value = mock1
```

becomes

```python
def test(mock2):
    ...
```

You can do that for several of the mocks done at the start of each unit test.
I think you're proposing to use

```python
@patch("subprocess.Popen", mock_process)
```

I think there's some ordering problem with `patch`; not sure, but I can't get it to work quickly. Will skip this for now.
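The ordering confusion is likely this: stacked `@patch` decorators inject their mocks bottom-up, and a `@patch` given an explicit replacement object (as in `@patch("subprocess.Popen", mock_process)`) injects no argument at all, which shifts the remaining parameters. A minimal `unittest.mock`-only sketch of the `return_value` variant, with made-up names:

```python
# Sketch using unittest.mock only: patch subprocess.Popen so it returns a
# canned process object. When `new` is given explicitly to @patch, no mock
# is injected into the test's arguments; with return_value (as below), the
# patch object IS injected, bottom decorator first -- mixing the two styles
# in one stack is a common source of argument-order bugs.
import subprocess
from unittest import mock

mock_process = mock.Mock()
mock_process.returncode = 0


@mock.patch("subprocess.Popen", return_value=mock_process)
def check(mock_popen):  # injected by the decorator above
    proc = subprocess.Popen(["echo", "hi"])  # never spawns a real process
    assert proc is mock_process
    return proc.returncode


print(check())  # 0
```

In pytest, the same patch can be hoisted into a fixture that `yield`s inside a `mock.patch(...)` context manager, which is what the suggestion above amounts to.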
Pull Request Summary
Batch Completions create API, which currently only uses vLLM to run batch inference with Kubernetes jobs. Feature highlights:
Test Plan and Usage Guide
note: during testing with ARC Challenge, found that 1 worker and 2 workers do not always return the same results:
a single difference across 1k requests:
1 worker:
2 workers: