[FEATURE]: Enable prompt caching and cache token tracking for google-vertex-anthropic #20265
Description
Feature hasn't been suggested before.
- [x] I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
The google-vertex-anthropic provider uses `AnthropicMessagesLanguageModel` from the AI SDK, which supports Anthropic-style prompt caching via `providerOptions.anthropic.cacheControl`. The `applyCaching()` gate in `transform.ts` only catches vertex-anthropic models implicitly via `model.api.id.includes("anthropic")`; there is no explicit providerID check for `google-vertex-anthropic`, which is fragile if the `api.id` format ever changes.
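For illustration, a minimal sketch of the gate condition described above. The `ModelInfo` shape and `shouldApplyCaching` name are hypothetical stand-ins for the actual `applyCaching()` logic in `transform.ts`, not the real source:

```typescript
// Hypothetical model shape; the real type in transform.ts may differ.
interface ModelInfo {
  providerID: string
  api: { id: string }
}

// Sketch of the gate: the existing implicit substring match plus the
// proposed explicit providerID check for stable matching.
function shouldApplyCaching(model: ModelInfo): boolean {
  return (
    model.api.id.includes("anthropic") || // existing, implicit and fragile
    model.providerID === "google-vertex-anthropic" // proposed explicit check
  )
}
```

With the explicit check, a future rename of the `api.id` string would no longer silently disable caching for the Vertex Anthropic provider.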
For cache token tracking, the Anthropic SDK on Vertex sets response metadata under both the canonical `"anthropic"` key and a custom `"vertex"` key (derived from the provider string `vertex.anthropic.messages`). The `cacheWriteInputTokens` extraction in `session/index.ts` only checks the `"anthropic"` key today.
Proposed changes:
- Add `model.providerID === "google-vertex-anthropic"` to the `applyCaching()` gate condition for explicit, stable matching.
- Add `input.metadata?.["vertex"]?.["cacheCreationInputTokens"]` to the cache write token extraction chain in `session/index.ts` as a defensive fallback.
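The second proposed change can be sketched as follows. This is an illustrative fallback chain, assuming a loose metadata record shape; the actual types in `session/index.ts` may differ:

```typescript
// Illustrative provider-metadata shape: top-level provider keys
// ("anthropic", "vertex", ...) mapping to loosely typed fields.
type ProviderMetadata = Record<string, Record<string, unknown> | undefined>

// Extract cache write tokens, preferring the canonical "anthropic" key
// and falling back to the custom "vertex" key set on Vertex.
function cacheWriteTokens(metadata?: ProviderMetadata): number {
  const value =
    metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
    metadata?.["vertex"]?.["cacheCreationInputTokens"] ?? // proposed fallback
    0
  return typeof value === "number" ? value : 0
}
```

The nullish-coalescing chain keeps the canonical key authoritative while making the Vertex-only metadata shape a defensive fallback rather than a hard dependency.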
Scope note: native google-vertex (Gemini) is not included because Gemini uses implicit server-side caching, not Anthropic-style per-message cache breakpoints. Gemini's implicit caching already works without any client-side changes (verified: 97.8% cache hit on a second request with an identical prefix).
Verified behavior:
- Vertex Anthropic (Claude on Vertex): cache write tokens tracked correctly (46K tokens on a cold request).
- Vertex Gemini (`gemini-2.5-flash`): implicit caching verified end-to-end via a test script. The second request with an identical prefix showed `cachedContentTokenCount: 28645` out of 29,293 input tokens, correctly normalized to `cachedInputTokens` by the AI SDK.
- Both providers work with `bun run dev` against live Vertex API endpoints.