fix(vertex-ai): support batch cancel via Vertex API #23957
Sameerlite merged 11 commits into BerriAI:litellm_dev_sameer_16_march_week
Conversation
Add Vertex batch cancellation support in LiteLLM batch APIs, route proxy cancel fallback using request provider headers, and return post-cancel batch state via retrieve to keep response shape compatible. Made-with: Cursor
Incorporate follow-up changes to Vertex batch cancel handling and proxy provider resolution, including config updates used for local verification. Made-with: Cursor
Revert local test-only proxy config edits so the PR does not include unrelated configuration changes. Made-with: Cursor
- Add try/except httpx.HTTPStatusError blocks in _async_cancel_batch for both POST cancel and GET retrieve calls, with verbose_logger error logging
- Fix endpoint extraction inconsistency: compute the endpoint from the URL without the :cancel suffix so it matches the behaviour of create_batch/retrieve_batch
- Add explicit validation that api_base ends with ":cancel" before stripping it, raising a descriptive error for unsupported custom proxy URL rewriting scenarios
- Use a string-based patch() in the test instead of patch.object() for robustness against import order changes

Made-with: Cursor
- Replace misleading endpoint extraction with explicit endpoint = "cancel"
- Compute retrieve_api_base from URL components directly instead of stripping ":cancel" from the post-proxy URL, removing the hard ValueError that broke any custom Vertex AI proxy configuration
- Align cancel_batch provider priority in proxy endpoints to match the create_batch order: body field → request headers → query params → default

Made-with: Cursor
…rmation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
… forwarding, sync logging
- Fix retrieve_api_base derivation to handle custom proxies with path-based routing (not just the :cancel suffix)
- Forward timeout to POST calls in cancel_batch (sync + async)
- Add try/except error logging to the sync cancel path (parity with async)
- Add tests for timeout forwarding and the custom proxy retrieve URL

Made-with: Cursor
- Remove dead elif branch in retrieve_api_base derivation
- Replace unreachable try/except httpx.HTTPStatusError around GET calls with logging inside the status_code check (HTTPHandler.get() does not call raise_for_status())
- Add comments noting HTTPHandler.get()/AsyncHTTPHandler.get() do not accept a timeout parameter

Made-with: Cursor
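The provider fallback order mentioned in the commits above (body field → request headers → query params → default) can be sketched as follows. This is an illustrative, hypothetical helper with dict-based inputs, not litellm's actual proxy code (which lives in litellm/proxy/batches_endpoints/endpoints.py):

```python
def resolve_custom_llm_provider(body, headers, query_params, default="openai"):
    """Illustrative sketch of the cancel_batch provider fallback order:
    body field -> request headers -> query params -> configured default."""
    if body.get("custom_llm_provider"):
        return body["custom_llm_provider"]
    if headers.get("custom-llm-provider"):
        return headers["custom-llm-provider"]
    if query_params.get("provider"):
        return query_params["provider"]
    return default
```

With this order, an explicit body field always wins, and a `custom-llm-provider: vertex_ai` header is honoured whenever the body does not specify a provider.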
Greptile Summary

This PR adds Vertex AI support to batch cancellation. Key issues found:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/llms/vertex_ai/batches/handler.py | Adds cancel_batch and _async_cancel_batch to VertexAIBatchPrediction. Has a correctness bug: when a custom api_base is provided, the retrieve URL is built by simply stripping :cancel from the proxy base URL, but the batch_id is never included in the custom proxy URL, so the follow-up GET hits the proxy root instead of the specific batch resource. |
| litellm/batches/main.py | Adds vertex_ai branch to cancel_batch / acancel_batch, correctly resolving project, location, and credentials before delegating to vertex_ai_batches_instance. Looks clean. |
| litellm/proxy/batches_endpoints/endpoints.py | Proxy cancel fallback now respects custom-llm-provider request headers/query params (consistent with other endpoints). The cast(Any, ...) workaround and minor response.data refactor are pre-existing issues. Changes look correct. |
| tests/test_litellm/llms/vertex_ai/test_vertex_ai_batch_transformation.py | Adds four new tests, but test_vertex_ai_cancel_batch_forwards_timeout is entirely empty (docstring only, no assertions), and test_vertex_ai_cancel_batch_custom_proxy_retrieve_url does not assert that the batch_id is present in the GET URL, hiding the retrieve-URL bug. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant ProxyEndpoint as litellm/proxy/.../endpoints.py
    participant LiteLLMMain as litellm/batches/main.py
    participant VertexHandler as VertexAIBatchPrediction.cancel_batch
    participant VertexAPI as Vertex AI API
    Client->>ProxyEndpoint: POST /v1/batches/{id}/cancel<br/>(custom-llm-provider: vertex_ai header)
    ProxyEndpoint->>ProxyEndpoint: get_custom_llm_provider_from_request_headers()<br/>→ "vertex_ai"
    ProxyEndpoint->>LiteLLMMain: acancel_batch(custom_llm_provider="vertex_ai", ...)
    LiteLLMMain->>LiteLLMMain: resolve vertex_project, vertex_location,<br/>vertex_credentials from env/params
    LiteLLMMain->>VertexHandler: cancel_batch(_is_async=True, ...)
    VertexHandler->>VertexHandler: _ensure_access_token() → Bearer token
    VertexHandler->>VertexHandler: Build cancel URL:<br/>…/batchPredictionJobs/{id}:cancel
    VertexHandler->>VertexAPI: POST …:cancel {}
    VertexAPI-->>VertexHandler: 200 OK (job queued for cancellation)
    VertexHandler->>VertexAPI: GET …/batchPredictionJobs/{id}
    VertexAPI-->>VertexHandler: BatchPredictionJob (state=JOB_STATE_CANCELLING)
    VertexHandler->>VertexHandler: transform_vertex_ai_batch_response → LiteLLMBatch
    VertexHandler-->>LiteLLMMain: LiteLLMBatch (status="cancelling")
    LiteLLMMain-->>ProxyEndpoint: LiteLLMBatch
    ProxyEndpoint-->>Client: LiteLLMBatch JSON
```
Last reviewed commit: "Update tests/test_li..."
```python
if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
else:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
```
else branch is unreachable dead code
_check_custom_proxy for Vertex AI always returns "{api_base}:{endpoint}" (i.e., …:cancel), so api_base will always end with :cancel and the else branch is never executed. If the :cancel suffix can genuinely appear in the middle of a custom URL, the else branch handles it correctly, but in practice this path is dead.
More importantly, if _check_custom_proxy were ever to return a URL that contains no :cancel token at all (e.g., due to a future refactor), then:
```python
api_base.rsplit(":cancel", 1)[0]  # returns the full api_base unchanged
```

…causing retrieve_api_base == api_base (the cancel endpoint). The follow-up GET would then hit the :cancel endpoint instead of the retrieve endpoint, silently returning a wrong response or an error.
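The no-token failure mode can be confirmed with plain string operations (a standalone snippet, independent of the handler; the URL is illustrative):

```python
# When ":cancel" is absent, rsplit silently returns the input unchanged,
# so retrieve_api_base would equal the cancel endpoint itself.
api_base = "https://example.com/v1/batchPredictionJobs/123"
assert api_base.rsplit(":cancel", 1)[0] == api_base

# With the suffix present, both strategies strip it correctly.
cancel_url = api_base + ":cancel"
assert cancel_url.removesuffix(":cancel") == api_base
assert cancel_url.rsplit(":cancel", 1)[0] == api_base
```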
Consider adding an explicit guard or an assertion to make this invariant clear:
```python
if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
elif ":cancel" in api_base:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
else:
    # _check_custom_proxy always appends ":cancel"; reaching here is a bug
    raise ValueError(
        f"cancel_batch: expected api_base to contain ':cancel', got: {api_base!r}"
    )
```

```diff
@@ -654,7 +654,7 @@ async def list_batches(
     managed_files_obj, "list_user_batches"
```
cast(Any, ...) silences a type error rather than fixing it
Using cast(Any, managed_files_obj) removes all type-checking on the subsequent method call. If managed_files_obj gains a new return type or if list_user_batches is renamed, this cast will silently hide the discrepancy at type-check time.
A better approach would be to use a proper Protocol or TypeVar for managed_files_obj, or at minimum cast it to the specific type that exposes list_user_batches, rather than casting to Any.
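As a hedged illustration of the Protocol approach: the method name mirrors the call site, but the signature and the fake object here are hypothetical, not litellm's real managed-files type.

```python
from typing import Any, Protocol, cast, runtime_checkable

@runtime_checkable
class SupportsListUserBatches(Protocol):
    # Hypothetical signature for illustration only; the real
    # list_user_batches may take user/session arguments.
    def list_user_batches(self) -> list: ...

class FakeManagedFiles:
    def list_user_batches(self) -> list:
        return []

managed_files_obj: Any = FakeManagedFiles()
# Narrow to the Protocol instead of Any: if list_user_batches is ever
# renamed, the call below fails type-checking instead of being hidden.
files = cast(SupportsListUserBatches, managed_files_obj)
assert isinstance(files, SupportsListUserBatches)
assert files.list_user_batches() == []
```

The cast is still a cast, but it documents exactly which capability the caller relies on, so a type checker can flag drift.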
…rmation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
```python
_, api_base = self._check_custom_proxy(
    api_base=api_base,
    custom_llm_provider="vertex_ai",
    gemini_api_key=None,
    endpoint="cancel",
    stream=None,
    auth_header=None,
    url=cancel_api_base_default,
    model=None,
    vertex_project=vertex_project or project_id,
    vertex_location=vertex_location or "us-central1",
    vertex_api_version="v1",
)
```
```python
if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
else:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
```
Custom proxy retrieve URL is missing the batch_id
When a caller supplies a custom api_base (e.g. "https://my-proxy.example.com"), _check_custom_proxy constructs:
api_base → "https://my-proxy.example.com:cancel"
Stripping :cancel then yields:
retrieve_api_base → "https://my-proxy.example.com" # ← batch_id not included
In contrast, when api_base=None the default URL already has the batch_id baked in (…/batchPredictionJobs/{batch_id}:cancel), so the strip works correctly.
The follow-up GET to retrieve_api_base will therefore hit the proxy root — not the specific batch resource — causing the retrieve step to return an unexpected response or 404.
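The mismatch is reproducible with plain string operations (self-contained snippet; the URLs and project path are illustrative):

```python
batch_id = "123456"

# Default path: batch_id is baked into the URL before ":cancel" is appended,
# so stripping the suffix yields the correct per-batch retrieve URL.
default_cancel = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/p/"
    f"locations/us-central1/batchPredictionJobs/{batch_id}:cancel"
)
assert default_cancel.removesuffix(":cancel").endswith(batch_id)

# Custom proxy path: ":cancel" is appended to the bare proxy base,
# so stripping it leaves a URL with no batch resource at all.
custom_cancel = "https://my-proxy.example.com" + ":cancel"
assert batch_id not in custom_cancel.removesuffix(":cancel")
```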
The custom-proxy path should include the batch_id before the :cancel suffix is added, for example by constructing the base path first:
```python
if api_base is not None:
    # Build the per-batch base URL from the custom proxy, then append :cancel
    per_batch_api_base = f"{api_base.rstrip('/')}/{batch_id}"
    cancel_url = f"{per_batch_api_base}:cancel"
else:
    per_batch_api_base = retrieve_api_base_default
    cancel_url = cancel_api_base_default

_, api_base = self._check_custom_proxy(
    api_base=None,  # already handled above
    ...
    url=cancel_url,
    ...
)
retrieve_api_base = per_batch_api_base
```

The existing test test_vertex_ai_cancel_batch_custom_proxy_retrieve_url does not assert that the batch_id (123456) is present in the get_url, so this bug is not caught today.
```python
def cancel_batch(
    self,
    _is_async: bool,
    batch_id: str,
    api_base: Optional[str],
    vertex_credentials: Optional[VERTEX_CREDENTIALS_TYPES],
    vertex_project: Optional[str],
    vertex_location: Optional[str],
    timeout: Union[float, httpx.Timeout],
    max_retries: Optional[int],
) -> Union[LiteLLMBatch, Coroutine[Any, Any, LiteLLMBatch]]:
```
max_retries parameter is accepted but silently ignored
max_retries is declared as a parameter of cancel_batch (and propagated from litellm/batches/main.py) but is never passed to _get_httpx_client(), get_async_httpx_client(), or used anywhere else in the method body. This is inconsistent with the caller's intent and means retry logic is never applied.
If retry support is intentionally deferred, consider removing the parameter from the signature to avoid a misleading contract, or add a # TODO: comment so it is not forgotten.
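If retry support is kept, one minimal way to honor max_retries is a wrapper around the cancel POST. This is a sketch only, not litellm's client plumbing: `post_with_retries` and `do_post` are hypothetical names, and real code would catch httpx transport errors rather than bare Exception.

```python
import time

def post_with_retries(do_post, max_retries: int, backoff: float = 0.0):
    """Invoke do_post() (a zero-arg callable performing the request),
    retrying up to max_retries additional times with exponential backoff."""
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return do_post()
        except Exception as exc:  # real code: httpx.TransportError etc.
            last_exc = exc
            if attempt < max_retries:
                time.sleep(backoff * (2 ** attempt))
    raise last_exc
```

A wrapper like this makes the max_retries contract explicit at the call site instead of silently dropping the parameter.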
```python
def test_vertex_ai_cancel_batch_forwards_timeout():
    """Test that timeout is forwarded to the POST (cancel) HTTP call.

    Note: the follow-up GET (retrieve) call does not accept a timeout
    parameter in the underlying HTTP handler, so it is intentionally omitted.
    """
```
Empty test body — always passes trivially
test_vertex_ai_cancel_batch_forwards_timeout contains only a docstring with no assertions or test logic. The function body is empty (implicit return None), so it will always pass regardless of actual behaviour, providing zero regression protection.
The test should verify that the timeout argument is forwarded to sync_handler.post(...). The implementation should follow the same pattern used in test_vertex_ai_cancel_batch — mock _get_httpx_client and _ensure_access_token, call handler.cancel_batch(...) with a specific timeout value (e.g. 42.0), then assert:
```python
post_call_kwargs = mock_client.return_value.post.call_args.kwargs
assert post_call_kwargs["timeout"] == 42.0
```

Without this assertion the test provides no guarantee that callers' timeout settings are honoured.
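The capture-and-assert pattern can be demonstrated self-contained with a MagicMock; `cancel_batch_stub` here is a stand-in for the handler's POST, not the real litellm code (real tests would patch _get_httpx_client in handler.py):

```python
from unittest.mock import MagicMock

def cancel_batch_stub(client, url, timeout):
    # Stand-in for the handler's cancel POST; the real implementation
    # lives in litellm/llms/vertex_ai/batches/handler.py.
    return client.post(url, json={}, timeout=timeout)

mock_client = MagicMock()
cancel_batch_stub(mock_client, "https://example.test/:cancel", timeout=42.0)

# call_args.kwargs captures exactly what the handler forwarded.
post_call_kwargs = mock_client.post.call_args.kwargs
assert post_call_kwargs["timeout"] == 42.0
```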
Rule Used: Code Review Rule: Mock Test Integrity
Merged commit 8d843fd into BerriAI:litellm_dev_sameer_16_march_week
Summary

- Add vertex_ai support to cancel_batch and acancel_batch in batch APIs
- Cancel Vertex batch prediction jobs via the :cancel endpoint and return normalized batch state via a follow-up retrieve
- Ensure proxy cancel requests with the custom-llm-provider: vertex_ai header route correctly

Original score:

Also, in the GET request, the timeout is not propagated because the underlying GET handler does not accept a timeout parameter.