feat: cross-provider service_tier model setting; Anthropic + Gemini API + Vertex Priority PayGo support #4926
Conversation
```python
'pt_only',
'pt_then_flex',
'on_demand',
'flex_only',
```
@markmcd Is there any way we could make (at least some of) the same values work for both GLA and Vertex?
cc @ewjoachim
I'll wait for the Vertex opinion on this one. The GLA values align with what other major providers do, so I'd prefer to remap the Vertex values back (if that's even feasible?)
These are the relevant docs:
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput/use-provisioned-throughput
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/flex-paygo
They control the following headers:

| Tier Name | `X-Vertex-AI-LLM-Request-Type` | `X-Vertex-AI-LLM-Shared-Request-Type` | Description / Behavior |
|---|---|---|---|
| `pt_then_on_demand` | (Not sent) | (Not sent) | Default behavior: uses PT first. Excess traffic spills over to standard PayGo. |
| `pt_only` | `dedicated` | (Not sent) | Provisioned only: uses PT exclusively. Rejects traffic with a 429 error if capacity is exceeded. |
| `pt_then_flex` | (Not sent) | `flex` | Hybrid Flex: uses PT first. Excess traffic spills over to the lower-cost Flex PayGo tier. |
| `on_demand` | `shared` | (Not sent) | Pure on-demand: bypasses PT to use standard shared resources at regular rates. |
| `flex_only` | `shared` | `flex` | Pure Flex: bypasses PT to use the discounted Flex PayGo tier directly. |
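The routing matrix above can be sketched as a plain lookup from tier name to the headers it sends. This is a hedged illustration of the table, not code from the PR; the constant name is made up:

```python
# Headers sent for each Vertex service tier, per the table above.
# An empty dict means "send no routing headers" (the default behavior).
# Illustrative only; the PR's actual helper may be shaped differently.
VERTEX_TIER_HEADERS: dict[str, dict[str, str]] = {
    'pt_then_on_demand': {},
    'pt_only': {'X-Vertex-AI-LLM-Request-Type': 'dedicated'},
    'pt_then_flex': {'X-Vertex-AI-LLM-Shared-Request-Type': 'flex'},
    'on_demand': {'X-Vertex-AI-LLM-Request-Type': 'shared'},
    'flex_only': {
        'X-Vertex-AI-LLM-Request-Type': 'shared',
        'X-Vertex-AI-LLM-Shared-Request-Type': 'flex',
    },
}
```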
We could definitely use `standard` and `flex`, though it's ambiguous whether they should map to the equivalents with or without PT. That said, by default PT is used unless we send `X-Vertex-AI-LLM-Request-Type: shared`, so it could make sense to:

- Replace `pt_then_on_demand` -> `standard` and `pt_then_flex` -> `flex`, but then we would need to rename at least `on_demand` to something like `on_demand_only` or `pay_as_you_go_only` (to match `flex_only`)
- Or we could treat `standard` and `flex` as aliases for `pt_then_on_demand` and `pt_then_flex` but keep the non-ambiguous names around. It would mean a bit of extra documentation work to convey this in a very unambiguous manner, but it lets folks write unambiguous code.
`priority` doesn't map to anything on Vertex and `pt` doesn't map to anything on Google (as far as I can tell), so it will be hard to get anything perfect. Not sure exactly if that's 100% helpful, but if you think I can help further, feel free :)
My vote is for unambiguous options, and then provide the additional aliases.
The GLA impl aligns with, e.g. OpenAI, and while that doesn't happen transparently in this case, it's a nice portability feature. I think providing similar ergonomics for the Vertex values would be positive, as long as that's a reasonable mental model for Vertex customers (hopefully you can make that call @ewjoachim! I don't know that stack well)
If we agree on this, I'll update the PR to make it clear that `standard` and `flex` are accepted for Vertex as a shim only.
- Since we support service tiers now for OpenAI, Bedrock, GLA, and Vertex, it'd be nice to add a new top-level `service_tier` `ModelSetting` with a narrow set of values (most likely the OpenAI ones) that we then try to map to providers (i.e. interpret in the model classes) as best we can, with clear documentation (in the docstring) of how each provider interprets them.
- If we have a narrower set there, we could then rename the `google_service_tier` field to `google_vertex_service_tier` (and deprecate the original). Then we may either not need a separate `google_gla_service_tier` (if the top-level `service_tier` covers all the values), or we can add a new `google_gla_service_tier` in case granular control is needed.

That way we get the convenience of a single set of values across providers, with the ability to override per-provider values as needed.
I don't have an opinion yet on what the exact top-level service tier values, or the mapping to Vertex, should be, but if the above approach makes sense to you I trust either/both of you to come up with something reasonable 😄
OK I've updated to take this into account using OpenAI's values as the "default". Provider-specific values take precedence, and supported providers have been updated, including adding mappings from generic to specific where it makes sense to.
Another PR has appeared that also addresses this, #5158, I haven't looked at it, but I'm not precious about keeping mine if that's better.
@markmcd Thanks Mark, I've been working with an agent on consolidating these related PRs so I'll tell it to look at your new changes.
The agent did have a question for you (to pass on to the Vertex team). In its own words:
@Mawox has been picking up the design direction in #5158 (top-level `service_tier` with per-provider fallbacks as we discussed), and @anatolec added Priority PayGo values in #5094. I'm folding both into one PR and want to extend the cross-provider mapping to Vertex, rather than keeping the "Vertex ignored" carve-out — with `pt_then_priority` now in scope, the mapping looks clean:
- flex → X-Vertex-AI-LLM-Shared-Request-Type: flex
- priority → X-Vertex-AI-LLM-Shared-Request-Type: priority
- default / auto → no headers (PT-then-on-demand default)
Before I commit to that, there's one thing the public docs don't quite spell out: if a project has zero PT quota on the target model/region and we send only the single shared-request-type header, does the request fall through safely to Flex/Priority PayGo, or does it 429? @ewjoachim's original writeup describes it as "Uses PT first. Excess traffic spills over to Flex PayGo," and @anatolec saw `traffic_type: ON_DEMAND_PRIORITY` in a live test — so empirically it looks safe. But before defaulting every cross-provider `service_tier='priority'` user through this on Vertex, could you check with the DeepMind / Vertex team? Specifically:

1. Zero-PT project + only `X-Vertex-AI-LLM-Shared-Request-Type: flex` → 429, or Flex PayGo?
2. Same with `priority` → 429, or Priority PayGo?
3. For a project with PT quota, is the spillover destination when these single-header requests exceed PT actually Flex/Priority (not standard on-demand)?

If 1+2 fall through safely we'll go single-header (respects PT for customers who have it, safe for everyone else). If not, we'll also send `X-Vertex-AI-LLM-Request-Type: shared` to guarantee no PT dependency at the cost of bypassing PT entirely. I'd rather the former if it's actually safe. Thanks!
Thanks @DouweM — happy to fold this into #5158.
Reproduced Devin's Q1 on a separate zero-PT Vertex project (`gemini-3-flash-preview`, `location='global'` — Flex PayGo is preview-only):

| Setting | Result |
|---|---|
| `pt_then_flex` | `traffic_type='ON_DEMAND_FLEX'` ← Q1: single `Shared-Request-Type: flex` on zero-PT |
| `flex_only` | `traffic_type='ON_DEMAND_FLEX'` |
| `pt_only` | 429, PT quota exceeded ← zero-PT control ✓ |
Q2 (priority) isn't on this branch — @anatolec's #5094 already shows the same pattern: pt_then_priority → ON_DEMAND_PRIORITY on zero-PT.
Q3 (PT-quota spillover destination) still needs the Vertex team.
If Q3 spills to Flex/Priority: drop the carve-out in #5158, flex/priority → single Shared-Request-Type header, keep google_vertex_service_tier as the escape hatch (needs #5094 folded in first for the priority mapping). If it spills to plain on-demand: keep the carve-out — silent downgrade from priority is worse than requiring the explicit field on Vertex.
(Just clarifying that I don't feel I'm sufficiently knowledgeable on the subject to add anything meaningful to what has already been said)
* Fix naming convention comment
* Use Flex for Vertex
* Remove incorrectly supported Groq reference from docstring
Extends `GoogleVertexServiceTier` with `'pt_then_priority'` (PT with Priority PayGo spillover) and `'priority_only'` (Priority PayGo without PT), mirroring the existing Flex PayGo pair. Folds pydantic#5094 in so both PayGo tiers land together.
@markmcd @Mawox Thanks for working on this! Pushed four commits on top of 878db9e to consolidate the work across #5094 and #5158:
@markmcd Vertex top-level → priority-header mapping is still off pending your Q3 answer (PT-customer-over-quota spillover destination). Once confirmed safe, it's a one-line change to map `priority` → `X-Vertex-AI-LLM-Shared-Request-Type: priority` the same way `flex` is mapped.
…`service_tier` filter

- Bedrock and Google GLA now treat top-level `service_tier='auto'` as "omit from the request", matching the `ServiceTier` docstring's stated semantics. Both providers previously sent an explicit `'default'` / `'standard'` tier, which was functionally equivalent but prevented `'auto'` from acting as a clean override-to-unset for inherited settings.
- Cerebras: add `openai_service_tier` alongside the pre-existing `service_tier` entry in `openai_unsupported_model_settings`, so the per-provider field is also filtered out rather than forwarded to an API that doesn't accept it.
- Clarify in the `bedrock_service_tier` docstring that it is the only way to request `'reserved'` (which needs a pre-purchased capacity reservation).
…bump

Fixes three tests that failed on `main` after the SDK bump:

- File search snapshots now include the `file_search_store` field the 1.70 response payload adds for built-in file-search tool returns (`test_google_model_file_search_tool`, `_stream`).
- The streaming safety-filter test mock now pins `sdk_http_response=None` so the new `x-gemini-service-tier` header lookup on every chunk does not pull a `Mock` object into `provider_details` and break the later pydantic serialization in `ContentFilterError.body` (`test_google_stream_safety_filter`).
Covers the cross-provider `service_tier` → Anthropic request-value mapping and the `anthropic_service_tier` per-provider override:

- `'auto'` passes through (Anthropic accepts it natively)
- `'default'` maps to `'standard_only'`
- `'flex'` / `'priority'` are silently omitted (not supported by Anthropic)
- `anthropic_service_tier` wins over the top-level `service_tier`

Addresses the Devin Review finding on pydantic#4926 about missing Anthropic coverage parallel to the existing OpenAI/Google/Bedrock tests.
```python
service_tier = model_settings.get('anthropic_service_tier') or model_settings.get('service_tier')
if service_tier == 'default':
    service_tier = 'standard_only'
elif service_tier not in ('auto', 'standard_only'):
    service_tier = OMIT
```
Minor: when service_tier is not set at all (neither anthropic_service_tier nor service_tier), the or chain evaluates to None, and then None not in ('auto', 'standard_only') is True, so service_tier gets set to OMIT. This works but is subtle — it would be cleaner to guard with an early if service_tier is None: service_tier = OMIT before the mapping logic, for readability.
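The suggested restructuring could look like the sketch below, assuming a dict-shaped `model_settings` and using a stand-in `OMIT` sentinel in place of the Anthropic SDK's; the helper name is hypothetical:

```python
# Stand-in for the Anthropic SDK's omit sentinel, for illustration only.
OMIT = object()

def resolve_anthropic_service_tier(model_settings: dict):
    # Per-provider override wins, then the unified field.
    service_tier = model_settings.get('anthropic_service_tier') or model_settings.get('service_tier')
    if service_tier is None:
        return OMIT  # explicit early guard: nothing was set, omit the field
    if service_tier == 'default':
        return 'standard_only'  # cross-provider 'default' maps to Anthropic's explicit form
    if service_tier in ('auto', 'standard_only'):
        return service_tier  # accepted natively
    return OMIT  # 'flex' / 'priority' have no Anthropic equivalent
```

The early `None` guard makes the "unset means omit" case explicit instead of relying on `None not in (...)` evaluating to `True`.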
- `test_anthropic_service_tier_mapping`: restructure params so `AnthropicModelSettings` is constructed inside the test body. The previous parametrize decorator referenced it at module scope, which failed collection on the slim/lowest/pydantic-evals CI matrices that don't install the `anthropic` extra (NameError before `pytestmark` skipif could apply).
- Vertex logprobs snapshots: pick up the new `log_probability_sum: None` field the google-genai 1.70 response now exposes (was failing on `all-extras` matrices).
- Capability schema snapshot: pick up the new `service_tier` field on `ModelSettings` that this PR adds.
… Bedrock branch
- `_google_vertex_service_tier_headers` now takes `GoogleVertexServiceTier | ServiceTier`
and uses `assert_never` instead of a defensive `.lower()` + `return {}` fallback.
All callers already pass typed values; the stringly-typed shim + dead `'standard'`
branch were a carryover from earlier iterations and left coverage at 99.81%.
- Bedrock: drop the redundant `in ('default', 'flex', 'priority')` inner check.
`ServiceTier = Literal['auto', 'default', 'flex', 'priority']`, and the outer
branch already excludes `'auto'`, so the guard was unreachable.
… doc/test fixes
Addresses auto-review bot findings on the prior push:
- `google_service_tier` now emits a `DeprecationWarning` when consulted
(factored into `_get_deprecated_google_service_tier`, called from both the
Vertex header path and the GLA service-tier path). Adds a regression test.
- Restore `OpenRouter`, `Cerebras`, and `xAI` in the `thinking` docstring
'Supported by' list — dropped in the earlier consolidation, all three
support it through their OpenAI-based implementations.
- Bedrock docs: reflect the actual behavior that `service_tier='auto'` omits
the `serviceTier` field rather than sending `{'type': 'default'}`, and note
`'reserved'` is only reachable through `bedrock_service_tier`.
- Switch the Vertex-headers parametrize test + VCR tests to
`google_vertex_service_tier` so they don't emit the new deprecation warning.
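The warn-on-read shim described in the first bullet might look roughly like this. A hedged sketch only: the real helper is `_get_deprecated_google_service_tier` inside the model class, and this stand-alone version just illustrates the idea:

```python
import warnings

def get_deprecated_google_service_tier(model_settings: dict):
    # Hypothetical shape of the deprecation shim: consulting the old
    # field emits a DeprecationWarning pointing at the caller.
    value = model_settings.get('google_service_tier')
    if value is not None:
        warnings.warn(
            '`google_service_tier` is deprecated, use `google_vertex_service_tier` instead.',
            DeprecationWarning,
            stacklevel=2,  # points at the resolver's caller, stable across refactors
        )
    return value
```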
…thropic None-early-return

- Map top-level `service_tier='priority'` to `X-Vertex-AI-LLM-Shared-Request-Type: priority` on Vertex AI, symmetric with how `'flex'` already maps. Both stay single-header so Provisioned Throughput customers still use PT first; `google_vertex_service_tier='priority_only'` is the explicit escape hatch for anyone who wants to skip PT. Addresses the Devin finding about the `priority` vs. `flex` asymmetry and the auto-review bot note on `GoogleVertexServiceTier` parametrization; adds coverage for both `'flex'` and `'priority'`.
- Extract `_resolve_gla_service_tier` + `_resolve_vertex_service_tier` helpers so `_build_content_and_config` no longer needs `# noqa: C901` and each resolution is independently testable.
- Anthropic: swap the `or`-chain mapping for an early `None → OMIT` return for readability.
```python
elif (unified_tier := model_settings.get('service_tier')) and unified_tier != 'auto':
    params['serviceTier'] = {'type': unified_tier}
```
🚩 Bedrock unified service_tier='default' maps to {'type': 'default'} — verify this is valid
At bedrock.py:696-697, the unified service_tier='default' is wrapped as {'type': 'default'}. The ServiceTierTypeDef accepts Literal['default', 'flex', 'priority', 'reserved'], so 'default' should be valid. However, the Bedrock docs page linked from the docstring should be checked to confirm that 'default' is actually a meaningful tier value (as opposed to just being the absence of a tier selection). The test at test_bedrock.py:678-718 mocks the Bedrock client and verifies the dict structure but doesn't validate against the real API.
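The quoted branch's intended semantics (per-provider override first, then the unified value, with `'auto'` and unset both omitting the field) can be sketched as follows; the helper name is illustrative, not the repo's code:

```python
def bedrock_service_tier_params(model_settings: dict) -> dict:
    # Sketch of the Bedrock request-params branch under the stated rules.
    params: dict = {}
    if tier := model_settings.get('bedrock_service_tier'):
        # Per-provider override wins outright (only way to reach 'reserved').
        params['serviceTier'] = {'type': tier}
    elif (unified := model_settings.get('service_tier')) and unified != 'auto':
        # Unified value is wrapped; 'auto' (and unset) omit the field entirely.
        params['serviceTier'] = {'type': unified}
    return params
```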
```
`google_vertex_service_tier`) take precedence over this unified field.

Supported by:

* OpenAI
* Gemini
```
I might be completely off, but it seems strange that we mention "Gemini" here (I believe for GLA?) while `google_vertex_service_tier` appears above. I wonder if this bullet list should also mention Vertex (especially since Vertex can be used for non-Gemini models).

(Also, it looks like this docstring is duplicated from the TypeAlias above, which might lead to either copy getting out of sync.)
```python
then the top-level `service_tier`. Maps `'default'` → `'standard'`; drops any value
that isn't valid for GLA (including `'auto'`, which signals "let the server decide").
"""
raw = _get_deprecated_google_service_tier(model_settings) or model_settings.get('service_tier')
```
I'm afraid conflating the provider-specific service tier and the `model_settings` service tier is bound to create headaches: at some point, some provider is going to call their default mode "flex" or something like that.

Rather than putting either value in a variable and then handling a `GoogleVertexServiceTier | ServiceTier`, I think it's much much saner to:

- See if a provider-specific (in this case `GoogleVertexServiceTier`) value is defined. If so, use it.
- If not, map the `ServiceTier` to a `GoogleVertexServiceTier`.
- Always handle a `GoogleVertexServiceTier`.

(Here I'm saying this for Vertex, but that would be the way we handle it for every other provider.)

As the Zen of Python says: "In the face of ambiguity, refuse the temptation to guess."
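The three-step resolution described here can be sketched as below. The type aliases and lookup-table name mirror the ones discussed in this thread, but the function itself is an illustration, not the PR's code:

```python
from typing import Literal, Optional

ServiceTier = Literal['auto', 'default', 'flex', 'priority']
GoogleVertexServiceTier = Literal[
    'pt_then_on_demand', 'pt_only', 'pt_then_flex',
    'on_demand', 'flex_only', 'pt_then_priority', 'priority_only',
]

# Cross-provider fallback lookup; 'auto' / 'default' are intentionally
# absent (they resolve to None, i.e. no Vertex routing headers).
_TOP_LEVEL_TO_VERTEX_SERVICE_TIER = {
    'flex': 'pt_then_flex',
    'priority': 'pt_then_priority',
}

def resolve_vertex_service_tier(model_settings: dict) -> Optional[str]:
    # 1. A provider-specific value, when set, wins outright.
    specific = model_settings.get('google_vertex_service_tier')
    if specific is not None:
        return specific
    # 2. Otherwise map the unified ServiceTier to a GoogleVertexServiceTier.
    # 3. Downstream code then only ever handles GoogleVertexServiceTier values.
    return _TOP_LEVEL_TO_VERTEX_SERVICE_TIER.get(model_settings.get('service_tier'))
```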
…m docstring duplication

Per @ewjoachim's review on pydantic#4926: separate the cross-provider mapping from the Vertex-headers helper so the helper is purely about Vertex routing, and avoid future ambiguity if another provider's tier values ever collide with the top-level `ServiceTier` literals.

- `_resolve_vertex_service_tier` now resolves to `GoogleVertexServiceTier` directly, using a `_TOP_LEVEL_TO_VERTEX_SERVICE_TIER` lookup for the cross-provider fallback (`'flex'` → `'pt_then_flex'`, `'priority'` → `'pt_then_priority'`, etc.).
- `_google_vertex_service_tier_headers` parameter is back to a strict `GoogleVertexServiceTier` Literal, with no provider-cross-mapping branches.
- `ServiceTier` TypeAlias docstring slimmed to value semantics only; `ModelSettings.service_tier` field now points at the alias instead of repeating the value list, and lists "Google (Gemini API and Vertex AI)" rather than just "Gemini" (the unified field maps on both Google subsystems now).
…ence; Vertex unified→header detail

Addresses the "can a reader figure out how this works from the docs" gap and the auto-review bot's stacklevel feedback.

- `ServiceTier` TypeAlias docstring now carries the canonical cross-provider mapping table and the precedence rule (per-provider field wins). The `ModelSettings.service_tier` field defers to the alias for value semantics to keep the two from drifting.
- `docs/models/google.md`: spell out the unified→Vertex header mapping (`'flex'` → `Shared-Request-Type: flex`, `'priority'` → `Shared-Request-Type: priority`, `'auto'`/`'default'` → no headers, all PT-with-spillover) instead of "this sets the default routing behavior", with a note that bypassing PT requires the per-provider field.
- `docs/models/openai.md`/`anthropic.md`/`bedrock.md`: short precedence sentence + cleaner mapping description on each.
- Drop deprecation warning `stacklevel` from 3 to 2 — points at the resolver rather than the now-unhelpful `_build_content_and_config` caller, and is stable across refactors of the request-build pipeline.
All issues referenced by this PR are already closed. If you believe an issue should be reopened, please comment on it first. |
service_tier model setting
service_tier model setting → service_tier model setting; Anthropic + Gemini API + Vertex Priority PayGo support
```
# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
```
Adds `_resolve_openai_service_tier` / `_resolve_anthropic_service_tier` helpers that check the provider-specific override first, then map the unified `service_tier` to a strictly-typed provider value. Mirrors the Bedrock + Vertex shape so all four providers handle the unified field the same way.
Stops conflating the deprecated `google_service_tier` alias with the unified `service_tier` in `_resolve_gla_service_tier`: each is mapped through the same lookup table independently, with the alias winning when set. Tightens the return type to `Literal['standard', 'flex', 'priority'] | None` to mirror the other provider helpers.
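A hedged sketch of that independent-lookup shape: the table contents follow the mapping discussed in this thread (`'default'` → `'standard'`, invalid values dropped), and the alias-wins behavior means an alias value with no GLA equivalent resolves to `None` rather than falling through to the unified field. Names are illustrative, not the repo's code:

```python
from typing import Optional

# Unified -> GLA value lookup; 'auto' and any Vertex-shaped alias
# values are absent, so they resolve to None (omitted from the request).
_GLA_VALUE_MAP = {'default': 'standard', 'flex': 'flex', 'priority': 'priority'}

def resolve_gla_service_tier(model_settings: dict) -> Optional[str]:
    # Deprecated alias wins when set; each field is mapped through the
    # same lookup independently rather than conflated into one variable.
    for key in ('google_service_tier', 'service_tier'):
        value = model_settings.get(key)
        if value is not None:
            return _GLA_VALUE_MAP.get(value)
    return None
```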
```python
'presence_penalty',
'parallel_tool_calls',
'service_tier',
'openai_service_tier',
)
```
🚩 Other OpenAI-compatible providers (Groq, xAI, OpenRouter) will silently pass service_tier through to the API
Only Cerebras explicitly marks service_tier and openai_service_tier as unsupported via openai_unsupported_model_settings. Other OpenAI-compatible providers (Groq, xAI, OpenRouter) don't declare these as unsupported, so the _resolve_openai_service_tier function will resolve the unified service_tier and pass it through to those APIs. Whether this is a bug depends on whether those providers support or ignore the service_tier parameter — OpenAI-compatible APIs generally ignore unknown parameters, so this is likely harmless, but it's worth verifying.
(Refers to lines 64-71)
OK to leave as is, I believe.
`GoogleServiceTier` only contains Vertex-shaped values (`pt_then_*`, `*_only`), so on GLA every alias value falls through the `_GLA_VALUE_MAP` lookup. Replace the dead branch with a single `_get_deprecated_google_service_tier()` call that preserves the deprecation warning while keeping the resolver branch-coverable.
… API + Vertex Priority PayGo support (pydantic#4926)

Co-authored-by: Anatole Callies <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Mark McDonnell <[email protected]>
Summary
Supersedes the original Vertex-only `google_service_tier` design and consolidates the cross-provider service-tier work.

Adds a unified [`service_tier`][pydantic_ai.settings.ModelSettings.service_tier] field on `ModelSettings`, mapped to each provider's native service-tier concept where one exists. Provider-specific overrides remain available for values that don't fit the unified set.

This PR consolidates earlier exploration work in #5158 (closed, by @Mawox) and #5094 (closed, by @anatolec — Priority PayGo on Vertex). Their commits are preserved in this branch's history.
Cross-provider mapping
`service_tier` accepts `'auto' | 'default' | 'flex' | 'priority'`:

| Unified value | OpenAI | Anthropic | Bedrock | Gemini API | Vertex AI |
|---|---|---|---|---|---|
| `'auto'` | `'auto'` | `'auto'` | (omitted) | (omitted) | (no headers) |
| `'default'` | `'default'` | `'standard_only'` | `{'type': 'default'}` | `'standard'` | (no headers) |
| `'flex'` | `'flex'` | (omitted) | `{'type': 'flex'}` | `'flex'` | `Shared-Request-Type: flex` (PT then Flex PayGo) |
| `'priority'` | `'priority'` | (omitted) | `{'type': 'priority'}` | `'priority'` | `Shared-Request-Type: priority` (PT then Priority PayGo) |

Per-provider settings (`openai_service_tier`, `anthropic_service_tier`, `bedrock_service_tier`, `google_vertex_service_tier`) always take precedence over the unified field, and they're the only way to reach values that aren't in the unified set: Bedrock's `'reserved'`, Anthropic's `'standard_only'` explicit form, and Vertex's full PT-routing matrix (`'pt_only'`, `'on_demand'`, `'flex_only'`, `'priority_only'`, etc.).

`'auto'` vs `'default'` distinction: `'auto'` lets the provider decide and may include premium tiers when available (matters for OpenAI's scale credits and Anthropic's priority capacity). `'default'` explicitly opts out of those promotions. On Bedrock / Google they're functionally equivalent today, but encoded forward-compatibly through the omit-vs-explicit wire choice.

Vertex AI design choice
The unified `'flex'` and `'priority'` map to the PT-with-spillover variants (single `Shared-Request-Type` header, no `Request-Type: shared`), so Vertex customers with Provisioned Throughput keep using their reserved capacity first. To bypass PT entirely, set `google_vertex_service_tier='flex_only'` / `'priority_only'` directly.

Open question with Google for confirmation: when a PT customer exceeds quota with the single-header form, does spillover land in Flex/Priority or in standard PayGo? Empirically (Mawox's reproduction on a zero-PT project, anatolec's #5094 live test) the headers fall through safely; the PT-customer-over-quota case is the only path not yet experimentally confirmed.

Other behavior changes
- `google_service_tier` (the original Vertex-only field) is deprecated in favor of `google_vertex_service_tier`. Reading it emits a `DeprecationWarning`. The values are unchanged.
- New `'pt_then_priority'` and `'priority_only'` Vertex routing values (from "Implement support for Priority PayGo with VertexAI" #5094).
- `openai_service_tier` added to the unsupported-settings filter (latent bug — the per-provider field was being forwarded to an API that doesn't accept it).
- `google-genai` bumped to `>=1.70.0` for the SDK's new `ServiceTier` enum on the Gemini API.

Test plan
- `DeprecationWarning` regression test for `google_service_tier`.
- Coverage for `google_service_tier='default'` / `'standard'` / `'flex'` / `'priority'`.