Conversation
seanshi-scale
left a comment
did we test whether this would result in zero-downtime deployments of models?
Yep, for a test deployment, I was able to change the model weights for an endpoint with no downtime 🙂 Although I'm not sure if there's a way to tell from our API at what point the changes take effect? 🤔
raise ObjectNotFoundException
if not self.authz_module.check_access_read_owned_entity(
    user, model_endpoint.record
) and not self.authz_module.check_endpoint_public_inference_for_user(
I think we want to make sure only the owner can run update_endpoint, i.e. I think this needs to be check_access_write_owned_entity, with no check_endpoint_public_inference_for_user fallback?
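Something like this, sketched with stand-in classes (the real authz module and record types live elsewhere in the codebase; only the method name check_access_write_owned_entity comes from the suggestion above):

```python
from dataclasses import dataclass


class ObjectNotAuthorizedException(Exception):
    pass


@dataclass
class Record:
    owner: str


class AuthzModule:
    # Minimal stand-in for the real authz module.
    def check_access_write_owned_entity(self, user: str, record: Record) -> bool:
        return user == record.owner


def authorize_update(authz: AuthzModule, user: str, record: Record) -> None:
    # Updates are writes: require ownership, and do not fall back to
    # the public-inference read check.
    if not authz.check_access_write_owned_entity(user, record):
        raise ObjectNotAuthorizedException
```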
    LLMInferenceFramework.LIGHTLLM,
    LLMInferenceFramework.TENSORRT_LLM,
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
should this be request.endpoint_type?
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
        raise ObjectHasInvalidValueException(
            f"Creating endpoint type {str(endpoint_record.endpoint_type)} is not allowed. Can only create streaming endpoints for text-generation-inference, vLLM and LightLLM."
Also, should we mention TensorRT-LLM in the error as well?
Yeah we probably want a shared constant string or helper function to use in both error sites.
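One way to share it, roughly (framework names and wording here are illustrative, not the final message):

```python
# Hypothetical shared constant + helper so both error sites stay in sync.
STREAMING_ONLY_FRAMEWORKS = (
    "text-generation-inference",
    "vLLM",
    "LightLLM",
    "TensorRT-LLM",
)


def streaming_only_error(endpoint_type: str) -> str:
    # Single source of truth for the "streaming endpoints only" message.
    return (
        f"Endpoint type {endpoint_type} is not allowed. Can only create "
        f"streaming endpoints for {', '.join(STREAMING_ONLY_FRAMEWORKS)}."
    )
```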
Actually gonna remove this check since we're not allowing changing either endpoint type or inference framework 😅
fake_llm_artifact_gateway,
fake_llm_model_endpoint_service,
create_llm_model_endpoint_request_streaming: CreateLLMModelEndpointV1Request,
update_llm_model_endpoint_request: UpdateLLMModelEndpointV1Request,
Could we test some more cases, such as changing resource requests, and cases where we expect the use case to return an error?
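Roughly the shape I have in mind, with a stand-in use case (the real use case, exception types, and request fields differ; `cpus` here is just an example of a resource request):

```python
class ObjectHasInvalidValueException(Exception):
    pass


class FakeUpdateUseCase:
    # Stand-in: accepts a resource-request change, rejects invalid values.
    def execute(self, request: dict) -> dict:
        if request.get("cpus", 1) <= 0:
            raise ObjectHasInvalidValueException("cpus must be positive")
        return {"endpoint_creation_task_id": "task-123"}


# Happy path: changing resource requests succeeds.
ok = FakeUpdateUseCase().execute({"cpus": 4})
assert ok["endpoint_creation_task_id"] == "task-123"

# Error path: invalid resources should surface an error from the use case.
try:
    FakeUpdateUseCase().execute({"cpus": 0})
    raised = False
except ObjectHasInvalidValueException:
    raised = True
assert raised
```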
) from exc

@llm_router_v1.post(
# General endpoint fields
metadata: Optional[Dict[str, Any]]
post_inference_hooks: Optional[List[str]]
endpoint_type: Optional[ModelEndpointType]
In vanilla Launch, we've typically prevented updates from changing the endpoint type. The general point is that not all of the "create" fields are necessarily relevant or allowable for updates.
Hmm ok below, you do throw a runtime error to prevent this. Makes me wonder why in Launch we even allowed endpoint_type in the request payload for updates 🤔
Ah oops, UpdateModelEndpointV1Request does not include endpoint_type 😅 Will remove endpoint_type and inference_framework from UpdateLLMModelEndpointV1Request!
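So the update request would end up with roughly this shape (dataclass stand-in for the pydantic model; the remaining field set is illustrative, the point is just that endpoint_type and inference_framework are gone):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical sketch of the pared-down update request: it deliberately
# omits endpoint_type and inference_framework, so updates cannot change them.
@dataclass
class UpdateLLMModelEndpointV1Request:
    metadata: Optional[Dict[str, Any]] = None
    post_inference_hooks: Optional[List[str]] = None
```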
class UpdateLLMModelEndpointV1Response(BaseModel):
    endpoint_creation_task_id: str
Should we call this endpoint_update_task_id?
The non-LLM specific UpdateModelEndpointV1Response has endpoint_creation_task_id... do we want to keep these consistent? 🤔
seanshi-scale
left a comment
Had some nits re: the error messages we're returning; other than that, looks good.
| """ | ||
| Updates an LLM endpoint for the current user. | ||
| """ | ||
| logger.info(f"POST /llm/model-endpoints/{model_endpoint_name} with {request} for {auth}") |
except (ObjectNotFoundException, ObjectNotAuthorizedException) as exc:
    raise HTTPException(
        status_code=404,
        detail="The specified model bundle could not be found.",
nit: I think the error should be "The specified LLM endpoint could not be found.", since having a model bundle under the hood is an implementation detail.
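i.e. something like this (HTTPException is stubbed below just to keep the sketch self-contained; in the route it's the FastAPI one):

```python
# Stub standing in for fastapi.HTTPException.
class HTTPException(Exception):
    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail


def llm_endpoint_not_found() -> HTTPException:
    # Surface "LLM endpoint" to callers rather than the internal
    # "model bundle" implementation detail.
    return HTTPException(
        status_code=404,
        detail="The specified LLM endpoint could not be found.",
    )
```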
        status_code=400,
        detail=str(exc),
    ) from exc
except ObjectNotApprovedException as exc:
is it possible to get to this line? if so the error message seems maybe off (don't think we know about model bundles for llm endpoints, since it's internal + an implementation detail)
| "inference_framework_image_tag": create_llm_model_endpoint_request_sync.inference_framework_image_tag, | ||
| "num_shards": create_llm_model_endpoint_request_sync.num_shards, | ||
| "quantize": None, | ||
| "checkpoint_path": None, |
should we set a fake non-null checkpoint path for some of these?
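e.g. something like this in the fixture (the path value below is made up, just clearly synthetic so the path-handling code gets exercised instead of the None branch):

```python
# Hypothetical fixture tweak: a fake, non-null checkpoint path.
FAKE_CHECKPOINT_PATH = "s3://fake-bucket/fake-checkpoint/"

fixture_request = {
    "quantize": None,
    "checkpoint_path": FAKE_CHECKPOINT_PATH,
}
```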
Pull Request Summary
Add API route for updating LLM endpoints
Test Plan and Usage Guide
Test deployment