Implement support for Priority PayGo with VertexAI #5094
Closed
anatolec wants to merge 2 commits into pydantic:main from
Adds `pt_then_priority` and `priority_only` to `GoogleServiceTier`, mirroring the existing Flex PayGo pattern. Updates the header function, docstrings, parametrized header tests, and the docs page.
adtyavrdhn
approved these changes
Apr 15, 2026
adtyavrdhn
reviewed
Apr 15, 2026
| After a Flex request, you can inspect [`ModelResponse`][pydantic_ai.messages.ModelResponse] `provider_details.get('traffic_type')` (e.g. `ON_DEMAND_FLEX` when Flex was used) if the API returns it. | ||
| Swap `'pt_then_flex'` for any [`GoogleServiceTier`][pydantic_ai.models.google.GoogleServiceTier] value — e.g. `'pt_then_priority'` for [Priority PayGo](https://cloud.google.com/vertex-ai/generative-ai/docs/priority-paygo) spillover, or `'flex_only'` / `'priority_only'` to bypass PT entirely. | ||
| After the request, inspect [`ModelResponse`][pydantic_ai.messages.ModelResponse] `provider_details.get('traffic_type')` to see which tier served it (e.g. `ON_DEMAND_FLEX`, `ON_DEMAND_PRIORITY`) when the API returns it. |
Member
Should we retain the wording "if the API returns it"?
I am not sure why it would not be returned, but I am not familiar with this API. @DouweM?
Collaborator
@anatolec Thanks for working on this! @ewjoachim Can you have a look as well please, since you originally contributed the feature? Note also that there's related work happening in #4926.
Contributor
Author
Hey @ewjoachim, would you be able to look at this, please?
This was referenced Apr 23, 2026
DouweM pushed a commit to markmcd/pydantic-ai that referenced this pull request Apr 23, 2026
Extends `GoogleVertexServiceTier` with `'pt_then_priority'` (PT with Priority PayGo spillover) and `'priority_only'` (Priority PayGo without PT), mirroring the existing Flex PayGo pair. Folds pydantic#5094 in so both PayGo tiers land together.
Follow-up to #4312 (Flex PayGo): adds Priority PayGo support to `google_service_tier`, mirroring the Flex pattern with `priority` instead of `flex`.

- `pt_then_priority` → `X-Vertex-AI-LLM-Shared-Request-Type: priority`
- `priority_only` → `X-Vertex-AI-LLM-Request-Type: shared` + `X-Vertex-AI-LLM-Shared-Request-Type: priority`

Headers per Google's Priority PayGo docs. Unit tests, docstrings and the docs page updated.
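For readers skimming the thread, the tier-to-header mapping described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `service_tier_headers` and the `ServiceTier` alias are hypothetical names, and the Flex rows are assumed from the statement that Priority mirrors the Flex pattern.

```python
from typing import Literal

# Hypothetical alias for illustration; the PR itself extends GoogleServiceTier.
ServiceTier = Literal['pt_then_flex', 'flex_only', 'pt_then_priority', 'priority_only']

REQUEST_TYPE_HEADER = 'X-Vertex-AI-LLM-Request-Type'
SHARED_REQUEST_TYPE_HEADER = 'X-Vertex-AI-LLM-Shared-Request-Type'


def service_tier_headers(tier: ServiceTier) -> dict[str, str]:
    """Return the routing headers for a tier (illustrative sketch, not the PR's code)."""
    # Assumption: Flex tiers use 'flex' exactly where Priority tiers use 'priority'.
    paygo_kind = 'priority' if 'priority' in tier else 'flex'
    headers = {SHARED_REQUEST_TYPE_HEADER: paygo_kind}
    if tier.endswith('_only'):
        # The *_only tiers also send the 'shared' request type, per the mapping above.
        headers[REQUEST_TYPE_HEADER] = 'shared'
    return headers
```

Under these assumptions, `service_tier_headers('priority_only')` yields both headers, while `service_tier_headers('pt_then_priority')` yields only the shared-request-type header.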
Local validation against live Vertex

Sweep adapted from #4312, run on `gemini-3-flash-preview` with `location='global'`.

Vertex returns `traffic_type: 'ON_DEMAND_PRIORITY'` for both new tiers (mirroring `ON_DEMAND_FLEX` for Flex), confirming that the routing headers take effect end-to-end. As in #4312, I can't prove from a single project that PT is genuinely bypassed in `priority_only` vs. `pt_then_priority`.

Checklist