Conversation
seanshi-scale
left a comment
did we test whether this would result in zero-downtime deployments of models?
Yep, for a test deployment, I was able to change the model weights for an endpoint with no downtime 🙂 Although I'm not sure if there's a way to tell from our API at what point the changes take effect? 🤔
raise ObjectNotFoundException
if not self.authz_module.check_access_read_owned_entity(
    user, model_endpoint.record
) and not self.authz_module.check_endpoint_public_inference_for_user(
I think we want to make sure only the owner can run update_endpoint, i.e. I think this needs to be check_access_write_owned_entity, with no check_endpoint_public_inference_for_user fallback?
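Something like this, sketched with stand-in classes (the real authz module and record types live elsewhere in the codebase; only the method name check_access_write_owned_entity comes from the suggestion above):

```python
from dataclasses import dataclass


class ObjectNotAuthorizedException(Exception):
    pass


@dataclass
class Record:
    owner: str


class AuthzModule:
    # Minimal stand-in for the real authz module.
    def check_access_write_owned_entity(self, user: str, record: Record) -> bool:
        return user == record.owner


def authorize_update(authz: AuthzModule, user: str, record: Record) -> None:
    # Updates are writes: require ownership, and do not fall back to
    # the public-inference read check.
    if not authz.check_access_write_owned_entity(user, record):
        raise ObjectNotAuthorizedException
```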
    LLMInferenceFramework.LIGHTLLM,
    LLMInferenceFramework.TENSORRT_LLM,
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
should this be request.endpoint_type?
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
        raise ObjectHasInvalidValueException(
            f"Creating endpoint type {str(endpoint_record.endpoint_type)} is not allowed. Can only create streaming endpoints for text-generation-inference, vLLM and LightLLM."
Also, should we mention TensorRT-LLM in the error as well?
Yeah we probably want a shared constant string or helper function to use in both error sites.
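One way to share it, roughly (framework names and wording here are illustrative, not the final message):

```python
# Hypothetical shared constant + helper so both error sites stay in sync.
STREAMING_ONLY_FRAMEWORKS = (
    "text-generation-inference",
    "vLLM",
    "LightLLM",
    "TensorRT-LLM",
)


def streaming_only_error(endpoint_type: str) -> str:
    # Single source of truth for the "streaming endpoints only" message.
    return (
        f"Endpoint type {endpoint_type} is not allowed. Can only create "
        f"streaming endpoints for {', '.join(STREAMING_ONLY_FRAMEWORKS)}."
    )
```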
Actually gonna remove this check since we're not allowing changing either endpoint type or inference framework 😅
fake_llm_artifact_gateway,
fake_llm_model_endpoint_service,
create_llm_model_endpoint_request_streaming: CreateLLMModelEndpointV1Request,
update_llm_model_endpoint_request: UpdateLLMModelEndpointV1Request,
Could we test some more cases, such as changing resource requests, and cases where we expect the use case to return an error?
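Roughly the shape I have in mind, with a stand-in use case (the real use case, exception types, and request fields differ; `cpus` here is just an example of a resource request):

```python
class ObjectHasInvalidValueException(Exception):
    pass


class FakeUpdateUseCase:
    # Stand-in: accepts a resource-request change, rejects invalid values.
    def execute(self, request: dict) -> dict:
        if request.get("cpus", 1) <= 0:
            raise ObjectHasInvalidValueException("cpus must be positive")
        return {"endpoint_creation_task_id": "task-123"}


# Happy path: changing resource requests succeeds.
ok = FakeUpdateUseCase().execute({"cpus": 4})
assert ok["endpoint_creation_task_id"] == "task-123"

# Error path: invalid resources should surface an error from the use case.
try:
    FakeUpdateUseCase().execute({"cpus": 0})
    raised = False
except ObjectHasInvalidValueException:
    raised = True
assert raised
```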
) from exc

@llm_router_v1.post(
# General endpoint fields
metadata: Optional[Dict[str, Any]]
post_inference_hooks: Optional[List[str]]
endpoint_type: Optional[ModelEndpointType]
In vanilla Launch, we've typically prevented updates from changing the endpoint type. The general point is that not all of the "create" fields are necessarily relevant or allowable for updates.
Hmm ok below, you do throw a runtime error to prevent this. Makes me wonder why in Launch we even allowed endpoint_type in the request payload for updates 🤔
Ah oops, UpdateModelEndpointV1Request does not include endpoint_type 😅 Will remove endpoint_type and inference_framework from UpdateLLMModelEndpointV1Request!
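So the update request would end up with roughly this shape (dataclass stand-in for the pydantic model; the remaining field set is illustrative, the point is just that endpoint_type and inference_framework are gone):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical sketch of the pared-down update request: it deliberately
# omits endpoint_type and inference_framework, so updates cannot change them.
@dataclass
class UpdateLLMModelEndpointV1Request:
    metadata: Optional[Dict[str, Any]] = None
    post_inference_hooks: Optional[List[str]] = None
```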
class UpdateLLMModelEndpointV1Response(BaseModel):
    endpoint_creation_task_id: str
Should we call this endpoint_update_task_id?
The non-LLM specific UpdateModelEndpointV1Response has endpoint_creation_task_id... do we want to keep these consistent? 🤔
seanshi-scale
left a comment
Had some nits re: the error messages we're returning; other than that, looks good.
| """ | ||
| Updates an LLM endpoint for the current user. | ||
| """ | ||
| logger.info(f"POST /llm/model-endpoints/{model_endpoint_name} with {request} for {auth}") |
except (ObjectNotFoundException, ObjectNotAuthorizedException) as exc:
    raise HTTPException(
        status_code=404,
        detail="The specified model bundle could not be found.",
nit: I think the error should be "The specified LLM endpoint could not be found.", since having a model bundle under the hood is an implementation detail.
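i.e. something like this (HTTPException is stubbed below just to keep the sketch self-contained; in the route it's the FastAPI one):

```python
# Stub standing in for fastapi.HTTPException.
class HTTPException(Exception):
    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail


def llm_endpoint_not_found() -> HTTPException:
    # Surface "LLM endpoint" to callers rather than the internal
    # "model bundle" implementation detail.
    return HTTPException(
        status_code=404,
        detail="The specified LLM endpoint could not be found.",
    )
```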
        status_code=400,
        detail=str(exc),
    ) from exc
except ObjectNotApprovedException as exc:
is it possible to get to this line? if so the error message seems maybe off (don't think we know about model bundles for llm endpoints, since it's internal + an implementation detail)
| "inference_framework_image_tag": create_llm_model_endpoint_request_sync.inference_framework_image_tag, | ||
| "num_shards": create_llm_model_endpoint_request_sync.num_shards, | ||
| "quantize": None, | ||
| "checkpoint_path": None, |
should we set a fake non-null checkpoint path for some of these?
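e.g. something like this in the fixture (the path value below is made up, just clearly synthetic so the path-handling code gets exercised instead of the None branch):

```python
# Hypothetical fixture tweak: a fake, non-null checkpoint path.
FAKE_CHECKPOINT_PATH = "s3://fake-bucket/fake-checkpoint/"

fixture_request = {
    "quantize": None,
    "checkpoint_path": FAKE_CHECKPOINT_PATH,
}
```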
Pull Request Summary
Add API route for updating LLM endpoints
Test Plan and Usage Guide
Test deployment