
fix(vertex-ai): support batch cancel via Vertex API #23957

Merged
Sameerlite merged 11 commits into BerriAI:litellm_dev_sameer_16_march_week from Sameerlite:litellm_vertex-cancel-batch
Mar 20, 2026
Conversation

Collaborator

@Sameerlite Sameerlite commented Mar 18, 2026

Summary

  • Add vertex_ai support to cancel_batch and acancel_batch in the batch APIs
  • Implement a Vertex batch cancel handler (:cancel) and return the normalized batch state via a follow-up retrieve
  • Make the proxy cancel fallback respect the provider from request headers, so OpenAI SDK requests with a custom-llm-provider: vertex_ai header route correctly

(screenshot) Status shows validating, as Vertex AI takes some time to change the status.
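The two-step flow (POST to the :cancel endpoint, then GET the job and normalize the state) can be sketched with plain string building; the URL layout follows the Vertex AI REST convention described in this PR, and the helper name here is a hypothetical illustration, not the PR's actual function:

```python
def vertex_batch_urls(project: str, location: str, batch_id: str) -> tuple[str, str]:
    """Build the cancel (POST) and retrieve (GET) URLs for a Vertex batch job."""
    base = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/batchPredictionJobs/{batch_id}"
    )
    # Step 1: POST to "{base}:cancel"; step 2: GET "{base}" and map the
    # returned job state onto a normalized LiteLLMBatch status.
    return f"{base}:cancel", base
```

Note that the retrieve URL is exactly the cancel URL without the `:cancel` suffix, which is the invariant the handler relies on when deriving `retrieve_api_base`.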

Original score:
(screenshot)

Also, in the GET (retrieve) request, the timeout is not propagated, because the underlying GET handler does not accept a timeout parameter.

Sameerlite and others added 10 commits March 17, 2026 11:23
Add Vertex batch cancellation support in LiteLLM batch APIs, route proxy cancel fallback using request provider headers, and return post-cancel batch state via retrieve to keep response shape compatible.

Made-with: Cursor
Incorporate follow-up changes to Vertex batch cancel handling and proxy provider resolution, including config updates used for local verification.

Made-with: Cursor
Revert local test-only proxy config edits so the PR does not include unrelated configuration changes.

Made-with: Cursor
- Add try/except httpx.HTTPStatusError blocks in _async_cancel_batch for
  both POST cancel and GET retrieve calls, with verbose_logger error logging
- Fix endpoint extraction inconsistency: compute endpoint from URL without
  :cancel suffix so it matches behaviour of create_batch/retrieve_batch
- Add explicit validation that api_base ends with ':cancel' before
  stripping it, raising a descriptive error for unsupported custom proxy
  URL rewriting scenarios
- Use string-based patch() in test instead of patch.object() for robustness
  against import order changes

Made-with: Cursor
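The string-based patch() mentioned in the commit above can be demonstrated with a standard-library target (json.dumps stands in for the real handler function; the actual dotted path in the test is different):

```python
from unittest import mock
import json

# String-based mock.patch resolves the target by dotted path at patch time,
# so the test module does not need to hold a direct reference to the object
# (as patch.object() would), making it robust to import-order changes.
with mock.patch("json.dumps", return_value="{}"):
    assert json.dumps({"a": 1}) == "{}"

# Outside the context manager the original function is restored.
assert json.dumps({"a": 1}) == '{"a": 1}'
```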
- Replace misleading endpoint extraction with explicit endpoint = "cancel"
- Compute retrieve_api_base from URL components directly instead of
  stripping ":cancel" from the post-proxy URL, removing the hard ValueError
  that broke any custom Vertex AI proxy configuration
- Align cancel_batch provider priority in proxy endpoints to match
  create_batch order: body field → request headers → query params → default

Made-with: Cursor
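The provider-resolution priority named in the commit above (body field → request headers → query params → default) amounts to a first-non-empty lookup; this sketch uses illustrative key names, which may not match the proxy code exactly:

```python
def resolve_custom_llm_provider(
    body: dict, headers: dict, query: dict, default: str = "openai"
) -> str:
    # Priority: body field -> request headers -> query params -> default.
    return (
        body.get("custom_llm_provider")
        or headers.get("custom-llm-provider")
        or query.get("custom_llm_provider")
        or default
    )
```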
…rmation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
… forwarding, sync logging

- Fix retrieve_api_base derivation to handle custom proxies with
  path-based routing (not just :cancel suffix)
- Forward timeout to POST calls in cancel_batch (sync + async)
- Add try/except error logging to sync cancel path (parity with async)
- Add tests for timeout forwarding and custom proxy retrieve URL

Made-with: Cursor
- Remove dead elif branch in retrieve_api_base derivation
- Replace unreachable try/except httpx.HTTPStatusError around GET
  calls with logging inside the status_code check (HTTPHandler.get()
  does not call raise_for_status())
- Add comments noting HTTPHandler.get()/AsyncHTTPHandler.get() do not
  accept a timeout parameter

Made-with: Cursor

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 18, 2026 5:41am

Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_vertex-cancel-batch (694cf22) with main (cec3e9e)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR adds Vertex AI support to cancel_batch / acancel_batch by implementing a two-step flow: POST to …:cancel, then GET the job to return a normalised LiteLLMBatch. It also fixes the proxy cancel fallback to honour the custom-llm-provider request header so OpenAI-SDK clients routing to Vertex AI are handled correctly.

Key issues found:

  • Custom-proxy retrieve URL is missing the batch_id (litellm/llms/vertex_ai/batches/handler.py): when a caller provides a custom api_base, _check_custom_proxy appends :cancel to the bare proxy URL, and stripping :cancel afterwards yields the proxy root — not the per-batch path. The follow-up GET therefore hits the wrong endpoint. The default (no custom proxy) path is unaffected.
  • Empty test always passes (test_vertex_ai_cancel_batch_forwards_timeout): the test function contains only a docstring with no assertions, providing zero regression protection for timeout propagation.
  • max_retries is silently ignored: the parameter is accepted in cancel_batch on the handler but never forwarded to the HTTP client, making the caller's retry configuration inoperative.
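The first issue can be reproduced with plain string manipulation (the URLs and the :cancel-appending behaviour are taken from the description above; the variable names are illustrative):

```python
proxy_base = "https://my-proxy.example.com"
batch_id = "123456"

# With a custom api_base, ":cancel" is appended to the bare proxy URL...
custom_cancel_url = f"{proxy_base}:cancel"
# ...so stripping it back off recovers only the proxy root: batch_id is lost.
custom_retrieve_url = custom_cancel_url.removesuffix(":cancel")
assert custom_retrieve_url == proxy_base
assert batch_id not in custom_retrieve_url  # follow-up GET hits the wrong endpoint

# On the default (no custom proxy) path the batch_id is already baked into the
# URL, so the same strip yields a valid per-batch retrieve URL.
default_cancel_url = f"{proxy_base}/batchPredictionJobs/{batch_id}:cancel"
assert default_cancel_url.removesuffix(":cancel").endswith(batch_id)
```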

Confidence Score: 2/5

  • Not safe to merge — the custom-proxy retrieve URL will silently hit the wrong endpoint, returning a wrong or 404 response for any user supplying a custom api_base.
  • The core cancel logic works correctly for the default (no custom proxy) path. However, the custom-proxy retrieve-URL construction is wrong (batch_id missing), the timeout forwarding test is empty, and max_retries is silently dropped. These issues lower confidence significantly.
  • litellm/llms/vertex_ai/batches/handler.py (retrieve URL bug) and tests/test_litellm/llms/vertex_ai/test_vertex_ai_batch_transformation.py (empty test, missing batch_id assertion).

Important Files Changed

Filename Overview
litellm/llms/vertex_ai/batches/handler.py Adds cancel_batch and _async_cancel_batch to VertexAIBatchPrediction. Has a correctness bug: when a custom api_base is provided, the retrieve URL is built by simply stripping :cancel from the proxy base URL, but the batch_id is never included in the custom proxy URL, so the follow-up GET hits the proxy root instead of the specific batch resource.
litellm/batches/main.py Adds vertex_ai branch to cancel_batch / acancel_batch, correctly resolving project, location, and credentials before delegating to vertex_ai_batches_instance. Looks clean.
litellm/proxy/batches_endpoints/endpoints.py Proxy cancel fallback now respects custom-llm-provider request headers/query params (consistent with other endpoints). The cast(Any, ...) workaround and minor response.data refactor are pre-existing issues. Changes look correct.
tests/test_litellm/llms/vertex_ai/test_vertex_ai_batch_transformation.py Adds four new tests, but test_vertex_ai_cancel_batch_forwards_timeout is entirely empty (docstring only, no assertions), and test_vertex_ai_cancel_batch_custom_proxy_retrieve_url does not assert that the batch_id is present in the GET URL, hiding the retrieve-URL bug.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyEndpoint as litellm/proxy/.../endpoints.py
    participant LiteLLMMain as litellm/batches/main.py
    participant VertexHandler as VertexAIBatchPrediction.cancel_batch
    participant VertexAPI as Vertex AI API

    Client->>ProxyEndpoint: POST /v1/batches/{id}/cancel<br/>(custom-llm-provider: vertex_ai header)
    ProxyEndpoint->>ProxyEndpoint: get_custom_llm_provider_from_request_headers()<br/>→ "vertex_ai"
    ProxyEndpoint->>LiteLLMMain: acancel_batch(custom_llm_provider="vertex_ai", ...)
    LiteLLMMain->>LiteLLMMain: resolve vertex_project, vertex_location,<br/>vertex_credentials from env/params
    LiteLLMMain->>VertexHandler: cancel_batch(_is_async=True, ...)
    VertexHandler->>VertexHandler: _ensure_access_token() → Bearer token
    VertexHandler->>VertexHandler: Build cancel URL:<br/>…/batchPredictionJobs/{id}:cancel
    VertexHandler->>VertexAPI: POST …:cancel  {}
    VertexAPI-->>VertexHandler: 200 OK (job queued for cancellation)
    VertexHandler->>VertexAPI: GET …/batchPredictionJobs/{id}
    VertexAPI-->>VertexHandler: BatchPredictionJob (state=JOB_STATE_CANCELLING)
    VertexHandler->>VertexHandler: transform_vertex_ai_batch_response → LiteLLMBatch
    VertexHandler-->>LiteLLMMain: LiteLLMBatch (status="cancelling")
    LiteLLMMain-->>ProxyEndpoint: LiteLLMBatch
    ProxyEndpoint-->>Client: LiteLLMBatch JSON

Last reviewed commit: "Update tests/test_li..."

Comment on lines +419 to +422
if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
else:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
Contributor


P2 else branch is unreachable dead code

_check_custom_proxy for Vertex AI always returns "{api_base}:{endpoint}" (i.e., …:cancel), so api_base will always end with :cancel and the else branch is never executed. If the :cancel suffix can genuinely appear in the middle of a custom URL, the else branch handles it correctly, but in practice this path is dead.

More importantly, if _check_custom_proxy were ever to return a URL that contains no :cancel token at all (e.g., due to a future refactor), then:

api_base.rsplit(":cancel", 1)[0]  # returns the full api_base unchanged

…causing retrieve_api_base == api_base (the cancel endpoint). The follow-up GET would then hit the :cancel endpoint instead of the retrieve endpoint, silently returning a wrong response or an error.

Consider adding an explicit guard or an assertion to make this invariant clear:

if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
elif ":cancel" in api_base:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
else:
    # _check_custom_proxy always appends ":cancel"; reaching here is a bug
    raise ValueError(
        f"cancel_batch: expected api_base to contain ':cancel', got: {api_base!r}"
    )

@@ -654,7 +654,7 @@ async def list_batches(
managed_files_obj, "list_user_batches"
Contributor


P2 cast(Any, ...) silences a type error rather than fixing it

Using cast(Any, managed_files_obj) removes all type-checking on the subsequent method call. If managed_files_obj gains a new return type or if list_user_batches is renamed, this cast will silently hide the discrepancy at type-check time.

A better approach would be to use a proper Protocol or TypeVar for managed_files_obj, or at minimum cast it to the specific type that exposes list_user_batches, rather than casting to Any.

…rmation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment on lines +404 to +422

_, api_base = self._check_custom_proxy(
    api_base=api_base,
    custom_llm_provider="vertex_ai",
    gemini_api_key=None,
    endpoint="cancel",
    stream=None,
    auth_header=None,
    url=cancel_api_base_default,
    model=None,
    vertex_project=vertex_project or project_id,
    vertex_location=vertex_location or "us-central1",
    vertex_api_version="v1",
)

if api_base.endswith(":cancel"):
    retrieve_api_base = api_base.removesuffix(":cancel")
else:
    retrieve_api_base = api_base.rsplit(":cancel", 1)[0].rstrip("/")
Contributor


P2 Custom proxy retrieve URL is missing the batch_id

When a caller supplies a custom api_base (e.g. "https://my-proxy.example.com"), _check_custom_proxy constructs:

api_base → "https://my-proxy.example.com:cancel"

Stripping :cancel then yields:

retrieve_api_base → "https://my-proxy.example.com"   # ← batch_id not included

In contrast, when api_base=None the default URL already has the batch_id baked in (…/batchPredictionJobs/{batch_id}:cancel), so the strip works correctly.

The follow-up GET to retrieve_api_base will therefore hit the proxy root — not the specific batch resource — causing the retrieve step to return an unexpected response or 404.

The custom-proxy path should include the batch_id before the :cancel suffix is added, for example by constructing the base path first:

if api_base is not None:
    # Build the per-batch base URL from the custom proxy, then append :cancel
    per_batch_api_base = f"{api_base.rstrip('/')}/{batch_id}"
    cancel_url = f"{per_batch_api_base}:cancel"
else:
    per_batch_api_base = retrieve_api_base_default
    cancel_url = cancel_api_base_default

_, api_base = self._check_custom_proxy(
    api_base=None,   # already handled above
    ...
    url=cancel_url,
    ...
)
retrieve_api_base = per_batch_api_base

The existing test test_vertex_ai_cancel_batch_custom_proxy_retrieve_url does not assert that the batch_id (123456) is present in the get_url, so this bug is not caught today.

Comment on lines +380 to +390
def cancel_batch(
    self,
    _is_async: bool,
    batch_id: str,
    api_base: Optional[str],
    vertex_credentials: Optional[VERTEX_CREDENTIALS_TYPES],
    vertex_project: Optional[str],
    vertex_location: Optional[str],
    timeout: Union[float, httpx.Timeout],
    max_retries: Optional[int],
) -> Union[LiteLLMBatch, Coroutine[Any, Any, LiteLLMBatch]]:
Contributor


P2 max_retries parameter is accepted but silently ignored

max_retries is declared as a parameter of cancel_batch (and propagated from litellm/batches/main.py) but is never passed to _get_httpx_client(), get_async_httpx_client(), or used anywhere else in the method body. This is inconsistent with the caller's intent and means retry logic is never applied.

If retry support is intentionally deferred, consider removing the parameter from the signature to avoid a misleading contract, or add a # TODO: comment so it is not forgotten.

Comment on lines +96 to +101
def test_vertex_ai_cancel_batch_forwards_timeout():
    """Test that timeout is forwarded to the POST (cancel) HTTP call.

    Note: the follow-up GET (retrieve) call does not accept a timeout
    parameter in the underlying HTTP handler, so it is intentionally omitted.
    """
Contributor


P2 Empty test body — always passes trivially

test_vertex_ai_cancel_batch_forwards_timeout contains only a docstring with no assertions or test logic. The function body is empty (implicit return None), so it will always pass regardless of actual behaviour, providing zero regression protection.

The test should verify that the timeout argument is forwarded to sync_handler.post(...). The implementation should follow the same pattern used in test_vertex_ai_cancel_batch — mock _get_httpx_client and _ensure_access_token, call handler.cancel_batch(...) with a specific timeout value (e.g. 42.0), then assert:

post_call_kwargs = mock_client.return_value.post.call_args.kwargs
assert post_call_kwargs["timeout"] == 42.0

Without this assertion the test provides no guarantee that callers' timeout settings are honoured.

Rule Used: Mock Test Integrity

@Sameerlite Sameerlite changed the base branch from main to litellm_dev_sameer_16_march_week March 20, 2026 18:02
@Sameerlite Sameerlite merged commit 8d843fd into BerriAI:litellm_dev_sameer_16_march_week Mar 20, 2026
17 of 39 checks passed
