feat: add top-level service_tier model setting (#5158)
Closed
Mawox wants to merge 1 commit into pydantic:main from
Conversation
Introduce a cross-provider `service_tier` field on `ModelSettings` with four values (`auto` / `default` / `flex` / `priority`) and wire it through the OpenAI, Bedrock, and Google (Gemini API) models. `auto` is the documented "omit" value, useful for explicitly overriding an inherited setting.

Google-specific changes:

- Split `GoogleServiceTier` into `GoogleGLAServiceTier` (Gemini API pricing tier: `standard`/`flex`/`priority`) and `GoogleVertexServiceTier` (Vertex routing: PT/Flex PayGo). `GoogleServiceTier` is kept as a deprecated alias.
- Add per-provider `google_gla_service_tier` and `google_vertex_service_tier` fields on `GoogleModelSettings`. `google_service_tier` remains accepted but emits a `DeprecationWarning` and forwards to the Vertex field.
- Vertex AI deliberately ignores the top-level `service_tier`: Vertex's routing/quota semantics don't map cleanly onto cross-provider pricing/latency values (e.g. `priority` has no direct equivalent, and silently mapping to `pt_only` would 429 for users without PT quota). Use `google_vertex_service_tier` explicitly on Vertex.
- Stop sending Vertex PT/Flex headers when running against the Gemini API (they were previously sent and silently ignored server-side).
- Surface the server's `x-gemini-service-tier` response header as `provider_details['service_tier']` on both non-streaming and streaming responses.

Cerebras fix: add `openai_service_tier` to `openai_unsupported_model_settings` so it is correctly filtered alongside the pre-existing `service_tier` entry.

Bump `google-genai` to `>=1.70.0` for the `ServiceTier` enum.

Co-authored-by: Mark McDonald <[email protected]>
Collaborator
(Written by AI, but I do agree with it :) ) Thanks so much @Mawox — your implementation, docs layout, and thorough tests were exactly what the design needed, and the live repro on the Vertex zero-PT spillover was the clincher. Going to consolidate on #4926 since @markmcd is actively moving it forward. Closing in favour of #4926 — much appreciated.
Summary
Introduces a unified cross-provider `service_tier` on `ModelSettings` (values: `auto` / `default` / `flex` / `priority`) and wires it through OpenAI, Bedrock, and the Google Gemini API. Supersedes / rebuilds #4926 on top of current `main`, picking up the design direction from @DouweM's review there.

- `ModelSettings.service_tier` with a `ServiceTier` type alias. `'auto'` is an explicit "omit" value (useful to override an inherited setting).
- OpenAI: `openai_service_tier` still wins when set. `default` / `flex` / `priority` are passed through; `auto` omits.
- Bedrock: `bedrock_service_tier` remains the only way to request `reserved`.
- Google GLA: `default → 'standard'`, `flex → 'flex'`, `priority → 'priority'`, `auto` → omit. The response's `x-gemini-service-tier` header is surfaced as `provider_details['service_tier']` on both non-streaming and streaming responses.
- Vertex AI deliberately ignores the top-level `service_tier`. Vertex's PT / Flex PayGo routing has different failure modes (e.g. `pt_only` → 429 without PT quota) and no direct `priority` equivalent, so there is no silent cross-map. Use `google_vertex_service_tier` explicitly on Vertex.

Google settings refactor
- Split `GoogleServiceTier` into `GoogleGLAServiceTier` (`'standard' | 'flex' | 'priority'`) and `GoogleVertexServiceTier` (existing PT / Flex values).
- Add per-provider `google_gla_service_tier` and `google_vertex_service_tier` on `GoogleModelSettings`.
- `google_service_tier` is kept as a deprecated alias for `google_vertex_service_tier`; setting it emits a `DeprecationWarning` when consulted on Vertex.
- `GoogleServiceTier` (the type alias) is kept so existing imports continue to type-check.
- Vertex is detected via `self.system == 'google-vertex'`.

Side fixes
- Cerebras: add `openai_service_tier` to `openai_unsupported_model_settings` so it is correctly filtered alongside the pre-existing `service_tier` entry. (Latent bug discovered while unifying the setting.)

Dependency
`google-genai` bumped `>=1.66.0` → `>=1.70.0` (required for the `ServiceTier` enum used in `GenerateContentConfigDict`). The `file_search_store` additions in the `test_google_model_file_search_tool*` snapshots are a cosmetic downstream effect of the bump, not a behaviour change.

Test plan
- `uv run ruff format pydantic_ai_slim tests` — clean
- `uv run ruff check pydantic_ai_slim tests` — clean
- `uv run pyright` on all touched source + test files — 0 errors
- `uv run pytest tests/models/test_google.py tests/models/test_openai.py tests/models/test_bedrock.py tests/providers/test_cerebras.py tests/test_settings.py` — 488 passed, 21 skipped (API-key gated)
- `uv run pytest tests/models tests/providers tests/test_agent.py tests/test_streaming.py` — 2271 passed, 339 skipped, 3 xfailed (all pre-existing)

Live tests against the Gemini API (`gemini-2.5-flash`):

- `service_tier='auto'` → server returns `'standard'` (identical to "nothing set")
- `service_tier='flex'` → server returns `'flex'`
- `service_tier='flex'` + `google_gla_service_tier='priority'` → server returns `'priority'` (per-provider wins)
- `google_service_tier='pt_only'` on GLA → silently ignored, no crash, server returns `'standard'`

Notes / follow-ups
`'gemini'` currently routes through `GoogleProvider(vertexai=True)`, whose `.name` is `'google-vertex'`. With the Vertex refusal, the gateway's Gemini route will ignore top-level `service_tier`. This is pre-existing behaviour of the gateway mapping and not addressed here; flagging for follow-up if desired.

Replaces the work in #4926 — thanks @markmcd for the initial implementation and design exploration; the Flex PayGo test surface in `tests/models/test_google.py` is based on tests from that PR.