feat: add top-level service_tier model setting (#5158)
Closed
Mawox wants to merge 1 commit into pydantic:main from
Conversation
Introduce a cross-provider `service_tier` field on `ModelSettings` with four values (`auto` / `default` / `flex` / `priority`) and wire it through the OpenAI, Bedrock, and Google (Gemini API) models. `auto` is the documented "omit" value, useful for explicitly overriding an inherited setting.

Google-specific changes:

- Split `GoogleServiceTier` into `GoogleGLAServiceTier` (Gemini API pricing tier: `standard`/`flex`/`priority`) and `GoogleVertexServiceTier` (Vertex routing: PT/Flex PayGo). `GoogleServiceTier` is kept as a deprecated alias.
- Add per-provider `google_gla_service_tier` and `google_vertex_service_tier` fields on `GoogleModelSettings`. `google_service_tier` remains accepted but emits a `DeprecationWarning` and forwards to the Vertex field.
- Vertex AI deliberately ignores the top-level `service_tier`: Vertex's routing/quota semantics don't map cleanly onto cross-provider pricing/latency values (e.g. `priority` has no direct equivalent, and silently mapping to `pt_only` would 429 for users without PT quota). Use `google_vertex_service_tier` explicitly on Vertex.
- Stop sending Vertex PT/Flex headers when running against the Gemini API (they were previously sent and silently ignored server-side).
- Surface the server's `x-gemini-service-tier` response header as `provider_details['service_tier']` on both non-streaming and streaming responses.

Cerebras fix: add `openai_service_tier` to `openai_unsupported_model_settings` so it is correctly filtered alongside the pre-existing `service_tier` entry.

Bump `google-genai` to `>=1.70.0` for the `ServiceTier` enum.

Co-authored-by: Mark McDonald <[email protected]>
Collaborator
(Written by AI, but I do agree with it :) ) Thanks so much @Mawox — your implementation, docs layout, and thorough tests were exactly what the design needed, and the live repro on the Vertex zero-PT spillover was the clincher. Going to consolidate on #4926 since @markmcd is actively moving it forward. Closing in favour of #4926 — much appreciated.
Summary
Introduces a unified cross-provider `service_tier` on `ModelSettings` (values: `auto` / `default` / `flex` / `priority`) and wires it through OpenAI, Bedrock, and the Google Gemini API. Supersedes / rebuilds #4926 on top of current `main`, picking up the design direction from @DouweM's review there.

- `ModelSettings.service_tier` with a `ServiceTier` type alias. `'auto'` is an explicit "omit" value (useful to override an inherited setting).
- OpenAI: `openai_service_tier` still wins when set. `default` / `flex` / `priority` are passed through; `auto` omits.
- Bedrock: `bedrock_service_tier` remains the only way to request `reserved`.
- Google GLA: `default → 'standard'`, `flex → 'flex'`, `priority → 'priority'`, `auto` → omit. The response's `x-gemini-service-tier` header is surfaced as `provider_details['service_tier']` on both non-streaming and streaming responses.
- Vertex AI deliberately ignores the top-level `service_tier`. Vertex's PT / Flex PayGo routing has different failure modes (e.g. `pt_only` → 429 without PT quota) and no direct `priority` equivalent, so there is no silent cross-map. Use `google_vertex_service_tier` explicitly on Vertex.

Google settings refactor
- Split `GoogleServiceTier` into `GoogleGLAServiceTier` (`'standard' | 'flex' | 'priority'`) and `GoogleVertexServiceTier` (existing PT / Flex values).
- Add per-provider `google_gla_service_tier` and `google_vertex_service_tier` on `GoogleModelSettings`.
- `google_service_tier` is kept as a deprecated alias for `google_vertex_service_tier`; setting it emits a `DeprecationWarning` when consulted on Vertex.
- `GoogleServiceTier` (the type alias) is kept so existing imports continue to type-check.
- Vertex is detected via `self.system == 'google-vertex'`.

Side fixes
- Cerebras: add `openai_service_tier` to `openai_unsupported_model_settings` so it is correctly filtered alongside the pre-existing `service_tier` entry. (Latent bug discovered while unifying the setting.)

Dependency
`google-genai` bumped `>=1.66.0` → `>=1.70.0` (required for the `ServiceTier` enum used in `GenerateContentConfigDict`). The `file_search_store` additions in the `test_google_model_file_search_tool*` snapshots are a cosmetic downstream effect of the bump, not a behaviour change.

Test plan
- `uv run ruff format pydantic_ai_slim tests` — clean
- `uv run ruff check pydantic_ai_slim tests` — clean
- `uv run pyright` on all touched source + test files — 0 errors
- `uv run pytest tests/models/test_google.py tests/models/test_openai.py tests/models/test_bedrock.py tests/providers/test_cerebras.py tests/test_settings.py` — 488 passed, 21 skipped (API-key gated)
- `uv run pytest tests/models tests/providers tests/test_agent.py tests/test_streaming.py` — 2271 passed, 339 skipped, 3 xfailed (all pre-existing)

Live tests against the Gemini API (`gemini-2.5-flash`):

- `service_tier='auto'` → server returns `'standard'` (identical to "nothing set")
- `service_tier='flex'` → server returns `'flex'`
- `service_tier='flex'` + `google_gla_service_tier='priority'` → server returns `'priority'` (per-provider wins)
- `google_service_tier='pt_only'` on GLA → silently ignored, no crash, server returns `'standard'`

Notes / follow-ups
`'gemini'` currently routes through `GoogleProvider(vertexai=True)`, whose `.name` is `'google-vertex'`. With the Vertex refusal, the gateway's Gemini route will ignore top-level `service_tier`. This is pre-existing behaviour of the gateway mapping and not addressed here; flagging for follow-up if desired.

Replaces the work in #4926 — thanks @markmcd for the initial implementation and design exploration; the Flex PayGo test surface in `tests/models/test_google.py` is based on tests from that PR.