feat: cross-provider service_tier model setting; Anthropic + Gemini API + Vertex Priority PayGo support#4926

Merged
DouweM merged 29 commits into pydantic:main from markmcd:service_tier
Apr 29, 2026
Conversation

@markmcd
Contributor

@markmcd markmcd commented Apr 1, 2026

Summary

Supersedes the original Vertex-only google_service_tier design and consolidates the cross-provider service-tier work.

Adds a unified [service_tier][pydantic_ai.settings.ModelSettings.service_tier] field on ModelSettings, mapped to each provider's native service-tier concept where one exists. Provider-specific overrides remain available for values that don't fit the unified set.

This PR consolidates earlier exploration work in #5158 (closed, by @Mawox) and #5094 (closed, by @anatolec — Priority PayGo on Vertex). Their commits are preserved in this branch's history.

Cross-provider mapping

service_tier accepts 'auto' | 'default' | 'flex' | 'priority':

| value | OpenAI | Anthropic | Bedrock | Google (Gemini API) | Google (Vertex AI) |
|---|---|---|---|---|---|
| `'auto'` | `'auto'` | `'auto'` | (omitted) | (omitted) | no headers (PT then on-demand) |
| `'default'` | `'default'` | `'standard_only'` | `{'type': 'default'}` | `'standard'` | no headers (PT then on-demand) |
| `'flex'` | `'flex'` | (omitted) | `{'type': 'flex'}` | `'flex'` | header `Shared-Request-Type: flex` (PT then Flex PayGo) |
| `'priority'` | `'priority'` | (omitted) | `{'type': 'priority'}` | `'priority'` | header `Shared-Request-Type: priority` (PT then Priority PayGo) |

Per-provider settings (openai_service_tier, anthropic_service_tier, bedrock_service_tier, google_vertex_service_tier) always take precedence over the unified field, and they're the only way to reach values that aren't in the unified set: Bedrock's 'reserved', Anthropic's 'standard_only' explicit form, and Vertex's full PT-routing matrix ('pt_only', 'on_demand', 'flex_only', 'priority_only', etc.).

'auto' vs 'default' distinction: 'auto' lets the provider decide and may include premium tiers when available (matters for OpenAI's scale credits and Anthropic's priority capacity). 'default' explicitly opts out of those promotions. On Bedrock / Google they're functionally equivalent today, but encoded forward-compatibly through the omit-vs-explicit wire choice.
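The precedence and omit-vs-explicit rules above can be sketched for Bedrock as follows; `resolve_bedrock_service_tier` and the plain-dict settings are illustrative stand-ins, not the PR's actual API:

```python
from typing import Any

def resolve_bedrock_service_tier(settings: dict[str, Any]) -> dict[str, Any]:
    """Sketch: build Bedrock request params for a service tier.

    The per-provider 'bedrock_service_tier' wins over the unified
    'service_tier'; 'auto' (and an unset tier) omit the field entirely.
    """
    params: dict[str, Any] = {}
    tier = settings.get('bedrock_service_tier')
    if tier is None:
        unified = settings.get('service_tier')
        if unified is not None and unified != 'auto':
            # 'default' / 'flex' / 'priority' map 1:1 onto Bedrock's values
            tier = unified
    if tier is not None:
        params['serviceTier'] = {'type': tier}
    return params
```

So `service_tier='auto'` puts nothing on the wire, while a per-provider value such as `bedrock_service_tier='reserved'` always wins, matching the precedence rule described above.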

Vertex AI design choice

The unified 'flex' and 'priority' map to the PT-with-spillover variants (single Shared-Request-Type header, no Request-Type: shared), so Vertex customers with Provisioned Throughput keep using their reserved capacity first. To bypass PT entirely, set google_vertex_service_tier='flex_only' / 'priority_only' directly. Open question with Google for confirmation: when a PT customer exceeds quota with the single-header form, does spillover land in Flex/Priority or in standard PayGo? Empirically (Mawox's reproduction on a zero-PT project, anatolec's #5094 live test) the headers fall through safely; the PT-customer-over-quota case is the only path not yet experimentally confirmed.
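The single-header vs. PT-bypass distinction can be illustrated with a small helper. The header names and values come from the routing table quoted later in this thread; the function itself is a hypothetical sketch, not the PR's implementation:

```python
def vertex_service_tier_headers(tier: str) -> dict[str, str]:
    """Sketch of the Vertex routing headers for each tier value."""
    # Unified 'flex'/'priority' (and the pt_then_* forms): single
    # Shared-Request-Type header, so PT capacity is consumed first.
    if tier in ('flex', 'pt_then_flex'):
        return {'X-Vertex-AI-LLM-Shared-Request-Type': 'flex'}
    if tier in ('priority', 'pt_then_priority'):
        return {'X-Vertex-AI-LLM-Shared-Request-Type': 'priority'}
    # *_only forms add Request-Type: shared, bypassing PT entirely.
    if tier == 'flex_only':
        return {
            'X-Vertex-AI-LLM-Request-Type': 'shared',
            'X-Vertex-AI-LLM-Shared-Request-Type': 'flex',
        }
    if tier == 'priority_only':
        return {
            'X-Vertex-AI-LLM-Request-Type': 'shared',
            'X-Vertex-AI-LLM-Shared-Request-Type': 'priority',
        }
    if tier == 'pt_only':
        return {'X-Vertex-AI-LLM-Request-Type': 'dedicated'}
    if tier == 'on_demand':
        return {'X-Vertex-AI-LLM-Request-Type': 'shared'}
    # 'auto' / 'default' / 'pt_then_on_demand': no headers (PT then on-demand)
    return {}
```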

Other behavior changes

  • google_service_tier (the original Vertex-only field) is deprecated in favor of google_vertex_service_tier. Reading it emits a DeprecationWarning. The values are unchanged.
  • Adds 'pt_then_priority' and 'priority_only' Vertex routing values (from Implement support for Priority PayGo with VertexAI #5094).
  • Cerebras: openai_service_tier added to the unsupported-settings filter (latent bug — the per-provider field was being forwarded to an API that doesn't accept it).
  • google-genai bumped to >=1.70.0 for the SDK's new ServiceTier enum on the Gemini API.

Test plan

  • Unit + parametrize coverage for the cross-provider mapping on each of OpenAI / Anthropic / Bedrock / Google (GLA + Vertex).
  • DeprecationWarning regression test for google_service_tier.
  • Live API call sweep on Gemini API (google_service_tier = 'default' / 'standard' / 'flex' / 'priority').

@github-actions github-actions Bot added the size: M Medium PR (101-500 weighted lines) label Apr 1, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

```python
'pt_only',
'pt_then_flex',
'on_demand',
'flex_only',
```
Collaborator

@markmcd Is there any way we could make (at least some of) the same values work for both GLA and Vertex?

cc @ewjoachim

Contributor Author

I'll wait for the Vertex opinion on this one. The GLA values align with what other major providers do, so I'd prefer to remap the Vertex values back (if that's even feasible?)

Contributor

These are the relevant docs:

They control the following headers:

| Tier name | `X-Vertex-AI-LLM-Request-Type` | `X-Vertex-AI-LLM-Shared-Request-Type` | Description / behavior |
|---|---|---|---|
| `pt_then_on_demand` | (not sent) | (not sent) | Default behavior: uses PT first. Excess traffic spills over to standard PayGo. |
| `pt_only` | `dedicated` | (not sent) | Provisioned only: uses PT exclusively. Rejects traffic with a 429 error if capacity is exceeded. |
| `pt_then_flex` | (not sent) | `flex` | Hybrid Flex: uses PT first. Excess traffic spills over to the lower-cost Flex PayGo tier. |
| `on_demand` | `shared` | (not sent) | Pure on-demand: bypasses PT to use standard shared resources at regular rates. |
| `flex_only` | `shared` | `flex` | Pure Flex: bypasses PT to use the discounted Flex PayGo tier directly. |

We could definitely use standard and flex, though it's ambiguous whether they should map to the equivalent with or without PT. That said, by default, PT is used unless we send X-Vertex-AI-LLM-Request-Type: shared, so it could make sense to:

  • Replace pt_then_on_demand -> standard and pt_then_flex -> flex, but then we would need to rename at least on_demand to something like on_demand_only or pay_as_you_go_only (to match `flex_only`).
  • Or we could treat standard and flex as aliases for pt_then_on_demand and pt_then_flex but keep the non-ambiguous names around. It would mean a bit of extra documentation work to convey this in a very unambiguous manner, but this lets folks write unambiguous code.

priority doesn't map to anything on Vertex and pt doesn't map to anything on Google (as far as I can tell), so it will be hard to get anything perfect. Not sure if that's 100% helpful, but if you think I can help further, feel free :)

Contributor Author

My vote is for unambiguous options, and then provide the additional aliases.

The GLA impl aligns with, e.g. OpenAI, and while that doesn't happen transparently in this case, it's a nice portability feature. I think providing similar ergonomics for the Vertex values would be positive, as long as that's a reasonable mental model for Vertex customers (hopefully you can make that call @ewjoachim! I don't know that stack well)

If we agree on this, I'll update the PR to make it clear that standard and flex are accepted for vertex as a shim only.

Collaborator

@DouweM DouweM Apr 2, 2026

  • Since we support service tiers now for OpenAI, Bedrock, GLA, and Vertex, it'd be nice to add a new top-level service_tier ModelSetting with a narrow set of values (most likely the OpenAI ones) that we then try to map to providers (i.e. interpret in the model classes) as best we can, with clear documentation (in the docstring) of how each provider interprets them.
  • If we have a narrower set there, we could then rename the google_service_tier field to google_vertex_service_tier (and deprecate the original). Then we may either not need a separate google_gla_service_tier (if the top-level service_tier covers all the values), or we can add a new google_gla_service_tier in case granular control is needed.

That way we get the convenience of a single set of values across providers, with the ability to override per-provider values as needed.

I don't have an opinion yet on what the exact top-level service-tier values, or the mapping to Vertex, should be, but if the above approach makes sense to you I trust either/both of you to come up with something reasonable 😄

Contributor Author

OK, I've updated to take this into account, using OpenAI's values as the "default". Provider-specific values take precedence, and supported providers have been updated, including adding mappings from generic to specific where it makes sense.

Another PR has appeared that also addresses this (#5158). I haven't looked at it, but I'm not precious about keeping mine if that's better.

Collaborator

@markmcd Thanks Mark, I've been working with an agent on consolidating these related PRs so I'll tell it to look at your new changes.

The agent did have a question for you (to pass on to the Vertex team). In its own words:

@Mawox has been picking up the design direction in #5158 (top-level service_tier with per-provider fallbacks as we discussed), and @anatolec
added Priority PayGo values in #5094. I'm folding both into one PR and want to extend the cross-provider mapping to Vertex, rather than keeping the "Vertex ignored" carve-out — with
pt_then_priority now in scope, the mapping looks clean:

  • flex → X-Vertex-AI-LLM-Shared-Request-Type: flex
  • priority → X-Vertex-AI-LLM-Shared-Request-Type: priority
  • default / auto → no headers (PT-then-on-demand default)

Before I commit to that, there's one thing the public docs don't quite spell out: if a project has zero PT quota on the target model/region and we send only the single shared-request-type
header, does the request fall through safely to Flex/Priority PayGo, or does it 429?

@ewjoachim's original writeup describes it as "Uses PT first. Excess traffic spills over to Flex PayGo," and @anatolec saw traffic_type: ON_DEMAND_PRIORITY in a live test — so empirically
it looks safe. But before defaulting every cross-provider service_tier='priority' user through this on Vertex, could you check with the DeepMind / Vertex team?

Specifically:

  1. Zero-PT project + only X-Vertex-AI-LLM-Shared-Request-Type: flex → 429, or Flex PayGo?
  2. Same with priority → 429, or Priority PayGo?
  3. For a project with PT quota, is the spillover destination when these single-header requests exceed PT actually Flex/Priority (not standard on-demand)?

If 1+2 fall through safely we'll go single-header (respects PT for customers who have it, safe for everyone else). If not, we'll also send X-Vertex-AI-LLM-Request-Type: shared to guarantee
no PT dependency at the cost of bypassing PT entirely. I'd rather the former if it's actually safe.

Thanks!

@Mawox Mawox Apr 23, 2026

Thanks @DouweM — happy to fold this into #5158.

Reproduced Devin's Q1 on a separate zero-PT Vertex project (gemini-3-flash-preview, location='global' — Flex PayGo is preview-only):

```
pt_then_flex    traffic_type='ON_DEMAND_FLEX'      ← Q1: single Shared-Request-Type: flex on zero-PT
flex_only       traffic_type='ON_DEMAND_FLEX'
pt_only         429, PT quota exceeded             ← zero-PT control ✓
```

Q2 (priority) isn't on this branch — @anatolec's #5094 already shows the same pattern: pt_then_priority → ON_DEMAND_PRIORITY on zero-PT.

Q3 (PT-quota spillover destination) still needs the Vertex team.

If Q3 spills to Flex/Priority: drop the carve-out in #5158, flex/priority → single Shared-Request-Type header, keep google_vertex_service_tier as the escape hatch (needs #5094 folded in first for the priority mapping). If it spills to plain on-demand: keep the carve-out — silent downgrade from priority is worse than requiring the explicit field on Vertex.

Contributor

(Just clarifying that I don't feel I'm sufficiently knowledgeable on the subject to add anything meaningful to what has already been said)

@github-actions github-actions Bot added the feature New feature request, or PR implementing a feature (enhancement) label Apr 2, 2026

* Fix naming convention comment
* Use Flex for Vertex
* Remove incorrectly supported Groq reference from docstring

markmcd and others added 2 commits April 23, 2026 16:04
Extends `GoogleVertexServiceTier` with `'pt_then_priority'` (PT with Priority
PayGo spillover) and `'priority_only'` (Priority PayGo without PT), mirroring
the existing Flex PayGo pair. Folds pydantic#5094 in so both PayGo tiers land together.
@DouweM
Collaborator

DouweM commented Apr 23, 2026

@markmcd @Mawox Thanks for working on this!

Pushed four commits on top of 878db9e to consolidate the work across #5094 and #5158:

  1. @anatolec's Priority PayGo values (pt_then_priority, priority_only) on google_vertex_service_tier — folding in Implement support for Priority PayGo with VertexAI #5094 (kept as his commit via --author).
  2. auto = omit consistently on Bedrock and GLA: top-level service_tier='auto' was sending explicit {'type': 'default'} / 'standard', now it properly unsets — matching the ServiceTier docstring so 'auto' can act as a clean override-to-unset for
    inherited settings. Also: Cerebras openai_service_tier added alongside the pre-existing service_tier entry in openai_unsupported_model_settings; bedrock_service_tier docstring clarified to note it is the only way to request 'reserved'.
  3. Fix three tests that were failing on this branch after the google-genai 1.70 bump (file search snapshots now include the file_search_store field; streaming safety-filter mock needed sdk_http_response=None, otherwise the new x-gemini-service-tier
    lookup pulls a Mock into provider_details).
  4. Anthropic service_tier mapping test coverage — addresses the Devin Review finding about parity with the OpenAI/Google/Bedrock tests.

@markmcd Vertex top-level → priority-header mapping still off pending your Q3 answer (PT-customer-over-quota spillover destination). Once confirmed safe, it's a one-line change to map priority → X-Vertex-AI-LLM-Shared-Request-Type: priority the same way flex
already does. (see #4926 (comment))

DouweM added 3 commits April 23, 2026 23:09
…service_tier` filter

- Bedrock and Google GLA now treat top-level `service_tier='auto'` as "omit from
  the request", matching the `ServiceTier` docstring's stated semantics. Both
  providers previously sent an explicit `'default'` / `'standard'` tier, which
  was functionally equivalent but prevented `'auto'` from acting as a clean
  override-to-unset for inherited settings.
- Cerebras: add `openai_service_tier` alongside the pre-existing `service_tier`
  entry in `openai_unsupported_model_settings`, so the per-provider field is
  also filtered out rather than forwarded to an API that doesn't accept it.
- Clarify in the `bedrock_service_tier` docstring that it is the only way to
  request `'reserved'` (which needs a pre-purchased capacity reservation).
…bump

Fixes three tests that failed on `main` after the SDK bump:

- File search snapshots now include the `file_search_store` field the
  1.70 response payload adds for built-in file-search tool returns
  (`test_google_model_file_search_tool`, `_stream`).
- The streaming safety-filter test mock now pins `sdk_http_response=None`
  so the new `x-gemini-service-tier` header lookup on every chunk does
  not pull a `Mock` object into `provider_details` and break the later
  pydantic serialization in `ContentFilterError.body`
  (`test_google_stream_safety_filter`).
Covers the cross-provider `service_tier` → Anthropic request-value mapping
and the `anthropic_service_tier` per-provider override:

- `'auto'` passes through (Anthropic accepts it natively)
- `'default'` maps to `'standard_only'`
- `'flex'` / `'priority'` are silently omitted (not supported by Anthropic)
- `anthropic_service_tier` wins over the top-level `service_tier`

Addresses the Devin Review finding on pydantic#4926 about missing Anthropic
coverage parallel to the existing OpenAI/Google/Bedrock tests.
Comment on lines +593 to +597:

```python
service_tier = model_settings.get('anthropic_service_tier') or model_settings.get('service_tier')
if service_tier == 'default':
    service_tier = 'standard_only'
elif service_tier not in ('auto', 'standard_only'):
    service_tier = OMIT
```
Contributor
Minor: when service_tier is not set at all (neither anthropic_service_tier nor service_tier), the or chain evaluates to None, and then None not in ('auto', 'standard_only') is True, so service_tier gets set to OMIT. This works but is subtle — it would be cleaner to guard with an early if service_tier is None: service_tier = OMIT before the mapping logic, for readability.
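Applying that suggestion, the mapping might read like this (illustrative sketch; `OMIT` here is a stand-in for the Anthropic SDK's omit sentinel):

```python
OMIT = object()  # stand-in for the SDK's omit sentinel

def resolve_anthropic_service_tier(settings: dict) -> object:
    """Sketch: map the unified tier to an Anthropic request value."""
    tier = settings.get('anthropic_service_tier') or settings.get('service_tier')
    if tier is None:
        # Explicit early guard: neither field is set at all.
        return OMIT
    if tier == 'default':
        return 'standard_only'
    if tier in ('auto', 'standard_only'):
        return tier
    # 'flex' / 'priority' have no Anthropic equivalent and are dropped.
    return OMIT
```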


- `test_anthropic_service_tier_mapping`: restructure params so `AnthropicModelSettings`
  is constructed inside the test body. The previous parametrize decorator referenced
  it at module scope, which failed collection on the slim/lowest/pydantic-evals CI
  matrices that don't install the `anthropic` extra (NameError before `pytestmark`
  skipif could apply).
- Vertex logprobs snapshots: pick up the new `log_probability_sum: None` field the
  google-genai 1.70 response now exposes (was failing on `all-extras` matrices).
- Capability schema snapshot: pick up the new `service_tier` field on
  `ModelSettings` that this PR adds.

DouweM added 3 commits April 23, 2026 23:53
… Bedrock branch

- `_google_vertex_service_tier_headers` now takes `GoogleVertexServiceTier | ServiceTier`
  and uses `assert_never` instead of a defensive `.lower()` + `return {}` fallback.
  All callers already pass typed values; the stringly-typed shim + dead `'standard'`
  branch were a carryover from earlier iterations and left coverage at 99.81%.
- Bedrock: drop the redundant `in ('default', 'flex', 'priority')` inner check.
  `ServiceTier = Literal['auto', 'default', 'flex', 'priority']`, and the outer
  branch already excludes `'auto'`, so the guard was unreachable.
… doc/test fixes

Addresses auto-review bot findings on the prior push:

- `google_service_tier` now emits a `DeprecationWarning` when consulted
  (factored into `_get_deprecated_google_service_tier`, called from both the
  Vertex header path and the GLA service-tier path). Adds a regression test.
- Restore `OpenRouter`, `Cerebras`, and `xAI` in the `thinking` docstring
  'Supported by' list — dropped in the earlier consolidation, all three
  support it through their OpenAI-based implementations.
- Bedrock docs: reflect the actual behavior that `service_tier='auto'` omits
  the `serviceTier` field rather than sending `{'type': 'default'}`, and note
  `'reserved'` is only reachable through `bedrock_service_tier`.
- Switch the Vertex-headers parametrize test + VCR tests to
  `google_vertex_service_tier` so they don't emit the new deprecation warning.
…thropic None-early-return

- Map top-level `service_tier='priority'` to `X-Vertex-AI-LLM-Shared-Request-Type: priority`
  on Vertex AI, symmetric with how `'flex'` already maps. Both stay single-header so
  Provisioned Throughput customers still use PT first; `google_vertex_service_tier='priority_only'`
  is the explicit escape hatch for anyone who wants to skip PT. Addresses the Devin finding
  about the `priority` vs. `flex` asymmetry and the auto-review bot note on `GoogleVertexServiceTier`
  parametrization; adds coverage for both `'flex'` and `'priority'`.
- Extract `_resolve_gla_service_tier` + `_resolve_vertex_service_tier` helpers so
  `_build_content_and_config` no longer needs `# noqa: C901` and each resolution
  is independently testable.
- Anthropic: swap the `or`-chain mapping for an early `None → OMIT` return for readability.
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 23 additional findings in Devin Review.


Comment on lines +696 to +697:

```python
elif (unified_tier := model_settings.get('service_tier')) and unified_tier != 'auto':
    params['serviceTier'] = {'type': unified_tier}
```
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 24, 2026

🚩 Bedrock unified service_tier='default' maps to {'type': 'default'} — verify this is valid

At bedrock.py:696-697, the unified service_tier='default' is wrapped as {'type': 'default'}. The ServiceTierTypeDef accepts Literal['default', 'flex', 'priority', 'reserved'], so 'default' should be valid. However, the Bedrock docs page linked from the docstring should be checked to confirm that 'default' is actually a meaningful tier value (as opposed to just being the absence of a tier selection). The test at test_bedrock.py:678-718 mocks the Bedrock client and verifies the dict structure but doesn't validate against the real API.


Comment on lines +245 to +250:

```
`google_vertex_service_tier`) take precedence over this unified field.

Supported by:

* OpenAI
* Gemini
```
Contributor

I might be completely off, but it seems strange that we mention "Gemini" here (I believe for GLA?) and google_vertex_service_tier above. I wonder if this bullet list should also mention Vertex (especially since Vertex can be used for non-Gemini models).

(Also, it looks like this docstring is duplicated from the TypeAlias above, which might lead to the two getting out of sync.)

```python
    then the top-level `service_tier`. Maps `'default'` → `'standard'`; drops any value
    that isn't valid for GLA (including `'auto'`, which signals "let the server decide").
    """
    raw = _get_deprecated_google_service_tier(model_settings) or model_settings.get('service_tier')
```
Contributor

@ewjoachim ewjoachim Apr 25, 2026

I'm afraid conflating the provider-specific service-tier and the model_settings service tier is bound to create headaches: at some point, some provider is going to call their default mode "flex" or something like that.

Rather than putting either value in a variable and then handle a GoogleVertexServiceTier | ServiceTier, I think it's much much saner to:

  1. See if a provider-specific (in this case GoogleVertexServiceTier) value is defined. If so, use it.
  2. If not, map the ServiceTier to a GoogleVertexServiceTier
  3. Always handle a GoogleVertexServiceTier

(here I'm saying this for Vertex but that would be the way we handle it for every other provider)

As the zen of python says: "In the face of ambiguity, refuse the temptation to guess."

…m docstring duplication

Per @ewjoachim's review on pydantic#4926: separate the cross-provider mapping from the
Vertex-headers helper so the helper is purely about Vertex routing, and avoid
future ambiguity if another provider's tier values ever collide with the
top-level `ServiceTier` literals.

- `_resolve_vertex_service_tier` now resolves to `GoogleVertexServiceTier` directly,
  using a `_TOP_LEVEL_TO_VERTEX_SERVICE_TIER` lookup for the cross-provider fallback
  (`'flex'` → `'pt_then_flex'`, `'priority'` → `'pt_then_priority'`, etc.).
- `_google_vertex_service_tier_headers` parameter is back to a strict
  `GoogleVertexServiceTier` Literal, with no provider-cross-mapping branches.
- `ServiceTier` TypeAlias docstring slimmed to value semantics only;
  `ModelSettings.service_tier` field now points at the alias instead of repeating
  the value list, and lists "Google (Gemini API and Vertex AI)" rather than just
  "Gemini" (the unified field maps on both Google subsystems now).

…ence; Vertex unified→header detail

Addresses the "can a reader figure out how this works from the docs" gap and
addresses the auto-review bot's stacklevel feedback.

- `ServiceTier` TypeAlias docstring now carries the canonical cross-provider
  mapping table and the precedence rule (per-provider field wins). The
  `ModelSettings.service_tier` field defers to the alias for value semantics
  to keep the two from drifting.
- `docs/models/google.md`: spell out the unified→Vertex header mapping
  (`'flex'` → `Shared-Request-Type: flex`, `'priority'` → `Shared-Request-Type: priority`,
  `'auto'`/`'default'` → no headers, all PT-with-spillover) instead of "this sets
  the default routing behavior", with a note that bypassing PT requires the
  per-provider field.
- `docs/models/openai.md`/`anthropic.md`/`bedrock.md`: short precedence
  sentence + cleaner mapping description on each.
- Drop deprecation warning `stacklevel` from 3 to 2 — points at the resolver
  rather than the now-unhelpful `_build_content_and_config` caller, and is
  stable across refactors of the request-build pipeline.
@github-actions github-actions Bot added size: L Large PR (501-1500 weighted lines) and removed size: M Medium PR (101-500 weighted lines) labels Apr 28, 2026
@github-actions github-actions Bot closed this Apr 28, 2026
@github-actions
Contributor

All issues referenced by this PR are already closed. If you believe an issue should be reopened, please comment on it first.

@DouweM DouweM changed the title feat: support service_tier for gemini api feat: cross-provider service_tier model setting Apr 28, 2026

@DouweM DouweM changed the title feat: cross-provider service_tier model setting feat: cross-provider service_tier model setting; Anthropic + Gemini API + Vertex Priority PayGo support Apr 28, 2026

@DouweM DouweM reopened this Apr 28, 2026
DouweM and others added 3 commits April 28, 2026 15:39
# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
Adds `_resolve_openai_service_tier` / `_resolve_anthropic_service_tier`
helpers that check the provider-specific override first, then map the
unified `service_tier` to a strictly-typed provider value. Mirrors the
Bedrock + Vertex shape so all four providers handle the unified field
the same way.
Stops conflating the deprecated `google_service_tier` alias with the unified
`service_tier` in `_resolve_gla_service_tier`: each is mapped through the same
lookup table independently, with the alias winning when set. Tightens the
return type to `Literal['standard', 'flex', 'priority'] | None` to mirror the
other provider helpers.
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 23 additional findings in Devin Review.


Comment on lines 67 to 71:

```python
    'presence_penalty',
    'parallel_tool_calls',
    'service_tier',
    'openai_service_tier',
)
```
Contributor

🚩 Other OpenAI-compatible providers (Groq, xAI, OpenRouter) will silently pass service_tier through to the API

Only Cerebras explicitly marks service_tier and openai_service_tier as unsupported via openai_unsupported_model_settings. Other OpenAI-compatible providers (Groq, xAI, OpenRouter) don't declare these as unsupported, so the _resolve_openai_service_tier function will resolve the unified service_tier and pass it through to those APIs. Whether this is a bug depends on whether those providers support or ignore the service_tier parameter — OpenAI-compatible APIs generally ignore unknown parameters, so this is likely harmless, but it's worth verifying.

(Refers to lines 64-71)
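The filter mechanism the finding refers to can be sketched as follows (hypothetical helper; the real implementation lives in the OpenAI-compatible model class):

```python
# Settings a given OpenAI-compatible provider does not accept.
CEREBRAS_UNSUPPORTED = ('service_tier', 'openai_service_tier')

def filter_unsupported(settings: dict, unsupported: tuple[str, ...]) -> dict:
    """Drop blocklisted settings before the request payload is built."""
    return {k: v for k, v in settings.items() if k not in unsupported}
```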


Collaborator

OK to leave as is, I believe.

`GoogleServiceTier` only contains Vertex-shaped values (`pt_then_*`, `*_only`),
so on GLA every alias value falls through the `_GLA_VALUE_MAP` lookup. Replace
the dead branch with a single `_get_deprecated_google_service_tier()` call that
preserves the deprecation warning while keeping the resolver branch-coverable.
@DouweM DouweM merged commit 1b4f906 into pydantic:main Apr 29, 2026
82 of 88 checks passed
Alex-Resch pushed a commit to Alex-Resch/pydantic-ai that referenced this pull request Apr 29, 2026
… API + Vertex Priority PayGo support (pydantic#4926)

Co-authored-by: Anatole Callies <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Mark McDonnell <[email protected]>