
feat: add top-level service_tier model setting #5158

Closed
Mawox wants to merge 1 commit into pydantic:main from Mawox:service-tier-unified

Conversation

Mawox commented Apr 22, 2026

Summary

Introduces a unified cross-provider service_tier on ModelSettings (values: auto / default / flex / priority) and wires it through OpenAI, Bedrock, and the Google Gemini API. Supersedes / rebuilds #4926 on top of current main, picking up the design direction from @DouweM's review there.

  • New top-level field: ModelSettings.service_tier with a ServiceTier TypeAlias. 'auto' is an explicit "omit" value (useful to override an inherited setting).
  • OpenAI: pass-through. openai_service_tier still wins when set.
  • Bedrock: pass-through for default / flex / priority; auto omits. bedrock_service_tier remains the only way to request reserved.
  • Google (Gemini API / GLA): maps default → 'standard', flex → 'flex', priority → 'priority', auto → omit. Response's x-gemini-service-tier header is surfaced as provider_details['service_tier'] on both non-streaming and streaming responses.
  • Google (Vertex AI): intentionally ignores top-level service_tier. Vertex's PT / Flex PayGo routing has different failure modes (e.g. pt_only → 429 without PT quota) and no direct priority equivalent, so there is no silent cross-map. Use google_vertex_service_tier explicitly on Vertex.

Google settings refactor

  • Split GoogleServiceTier into GoogleGLAServiceTier ('standard' | 'flex' | 'priority') and GoogleVertexServiceTier (existing PT / Flex values).
  • Add per-provider fields: google_gla_service_tier and google_vertex_service_tier on GoogleModelSettings.
  • google_service_tier is kept as a deprecated alias for google_vertex_service_tier; setting it emits a DeprecationWarning when consulted on Vertex. GoogleServiceTier (the type alias) is kept so existing imports continue to type-check.
  • Previously, Vertex PT / Flex headers were sent on every Google request (including GLA, where they are silently ignored server-side). They are now only sent when self.system == 'google-vertex'.

Side fixes

  • Cerebras: added openai_service_tier to openai_unsupported_model_settings so it is correctly filtered alongside the pre-existing service_tier entry. (Latent bug discovered while unifying the setting.)
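The fix relies on the unsupported-settings filter: any setting listed for a provider is stripped before the request is built. A minimal sketch, where only the two setting names come from the PR and the filtering helper itself is hypothetical:

```python
# With the fix, both the unified and the OpenAI-specific tier settings are
# filtered for Cerebras; previously only 'service_tier' was listed, so
# 'openai_service_tier' leaked through. Illustrative, not the actual code.
CEREBRAS_UNSUPPORTED = frozenset({'service_tier', 'openai_service_tier'})

def filter_unsupported(settings: dict, unsupported: frozenset) -> dict:
    return {k: v for k, v in settings.items() if k not in unsupported}
```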

Dependency

  • google-genai bumped from >=1.66.0 to >=1.70.0 (required for the ServiceTier enum used in GenerateContentConfigDict). The file_search_store additions in the test_google_model_file_search_tool* snapshots are a cosmetic downstream effect of the bump, not a behaviour change.

Test plan

  • uv run ruff format pydantic_ai_slim tests — clean
  • uv run ruff check pydantic_ai_slim tests — clean
  • uv run pyright on all touched source + test files — 0 errors
  • uv run pytest tests/models/test_google.py tests/models/test_openai.py tests/models/test_bedrock.py tests/providers/test_cerebras.py tests/test_settings.py — 488 passed, 21 skipped (API-key gated)
  • Wider sweep: uv run pytest tests/models tests/providers tests/test_agent.py tests/test_streaming.py — 2271 passed, 339 skipped, 3 xfailed (all pre-existing)
  • Live smoke test against Gemini API (gemini-2.5-flash):
    • service_tier='auto' → server returns 'standard' (identical to "nothing set")
    • service_tier='flex' → server returns 'flex'
    • service_tier='flex' + google_gla_service_tier='priority' → server returns 'priority' (per-provider wins)
    • google_service_tier='pt_only' on GLA → silently ignored, no crash, server returns 'standard'

Notes / follow-ups

  • The gateway provider for 'gemini' currently routes through GoogleProvider(vertexai=True), whose .name is 'google-vertex'. With the Vertex refusal, the gateway's Gemini route will ignore top-level service_tier. This is pre-existing behaviour of the gateway mapping and not addressed here; flagging for follow-up if desired.

Replaces the work in #4926 — thanks @markmcd for the initial implementation and design exploration; the Flex PayGo test surface in tests/models/test_google.py is based on tests from that PR.

Introduce a cross-provider `service_tier` field on `ModelSettings` with four
values (`auto` / `default` / `flex` / `priority`) and wire it through the
OpenAI, Bedrock, and Google (Gemini API) models. `auto` is the documented
"omit" value, useful for explicitly overriding an inherited setting.

Google-specific changes:

- Split `GoogleServiceTier` into `GoogleGLAServiceTier` (Gemini API pricing
  tier: `standard`/`flex`/`priority`) and `GoogleVertexServiceTier` (Vertex
  routing: PT/Flex PayGo). `GoogleServiceTier` is kept as a deprecated alias.
- Add per-provider `google_gla_service_tier` and `google_vertex_service_tier`
  fields on `GoogleModelSettings`. `google_service_tier` remains accepted but
  emits a `DeprecationWarning` and forwards to the Vertex field.
- Vertex AI deliberately ignores the top-level `service_tier` — Vertex's
  routing/quota semantics don't map cleanly onto cross-provider
  pricing/latency values (e.g. `priority` has no direct equivalent, and
  silently mapping to `pt_only` would 429 for users without PT quota).
  Use `google_vertex_service_tier` explicitly on Vertex.
- Stop sending Vertex PT/Flex headers when running against the Gemini API
  (they were previously sent and silently ignored server-side).
- Surface the server's `x-gemini-service-tier` response header as
  `provider_details['service_tier']` on both non-streaming and streaming
  responses.

Cerebras fix: add `openai_service_tier` to
`openai_unsupported_model_settings` so it is correctly filtered alongside
the pre-existing `service_tier` entry.

Bump `google-genai` to `>=1.70.0` for the `ServiceTier` enum.

Co-authored-by: Mark McDonald <[email protected]>
github-actions Bot added labels "size: M" (Medium PR, 101-500 weighted lines) and "feature" (new feature request, or PR implementing a feature) on Apr 22, 2026.
devin-ai-integration Bot left a comment: Devin Review found 1 potential issue, with 5 additional findings available in Devin Review. Comment thread on pydantic_ai_slim/pydantic_ai/models/bedrock.py.
DouweM commented Apr 23, 2026

(Written by AI, but I do agree with it :) )

Thanks so much @Mawox — your implementation, docs layout, and thorough tests were exactly what the design needed, and the live repro on the Vertex zero-PT spillover was the clincher. Going to consolidate on #4926 since @markmcd is actively moving it forward and it covers more ground (Anthropic + all four provider doc pages). The differential improvements from this branch (auto = omit semantics, the system == 'google-vertex' header guard, the Cerebras openai_service_tier filter) are going on top of his branch, along with @anatolec's Priority PayGo values from #5094.

Closing in favour of #4926 — much appreciated.

DouweM closed this on Apr 23, 2026.