docs(vertex): add PayGo/Priority tutorial and cost tracking flow diagram #24009
Conversation
Document how to send Vertex Priority PayGo headers and explain how `trafficType` maps to service-tier pricing in LiteLLM, including an embedded flow diagram for quick understanding.

Made-with: Cursor
Greptile Summary

This PR adds a new documentation tutorial (`docs/my-website/docs/tutorials/vertex_ai_pay_go.md`) for Vertex AI PayGo/Priority, plus a cost-tracking flow diagram and a sidebar entry.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| docs/my-website/docs/tutorials/vertex_ai_pay_go.md | New tutorial for Vertex AI PayGo/Priority with SDK, proxy config, and pass-through examples; contains a misleading gemini/ prefix model in the supported models list and an incomplete cost table that omits output pricing keys. |
| docs/my-website/sidebars.js | Adds the new tutorial entry under "Spend Tracking" — placement is appropriate given the cost tracking focus of the document. |
| docs/my-website/static/img/vertex_cost_tracking_flow.svg | New 5-step SVG flow diagram illustrating the Priority PayGo cost tracking pipeline; minor issue of missing trailing newline at end of file. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["HTTP Request\nX-Vertex-AI-LLM-Shared-Request-Type: priority"] -->|Vertex AI| B["Vertex Response\nusageMetadata.trafficType = ON_DEMAND_PRIORITY"]
    B --> C["LiteLLM stores it\n_hidden_params.provider_specific_fields.traffic_type"]
    C --> D["completion_cost()\nMaps traffic_type → service_tier = 'priority'"]
    D --> E["Pricing lookup\ninput/output_cost_per_token_priority"]
    style A fill:#0c447c,stroke:#85b7eb,color:#b5d4f4
    style B fill:#085041,stroke:#5dcaa5,color:#9fe1cb
    style C fill:#3c3489,stroke:#afa9ec,color:#cecbf6
    style D fill:#633806,stroke:#ef9f27,color:#fac775
    style E fill:#712b13,stroke:#f0997b,color:#f5c4b3
```
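Steps ③–④ of the diagram boil down to a small lookup. The sketch below is an illustrative pure function, not LiteLLM's actual internal helper (the function name is hypothetical):

```python
# Hypothetical sketch: map Vertex's usageMetadata.trafficType to the
# service_tier value LiteLLM uses for its pricing lookup.
from typing import Optional

def traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """ON_DEMAND -> standard keys (no tier); PRIORITY -> 'priority'; FLEX/BATCH -> 'flex'."""
    mapping = {
        "ON_DEMAND": None,              # falls through to standard pricing keys
        "ON_DEMAND_PRIORITY": "priority",
        "FLEX": "flex",
        "BATCH": "flex",
    }
    return mapping.get(traffic_type)
```

An unknown or missing `trafficType` yields `None`, i.e. standard on-demand pricing.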
Last reviewed commit: "Fix greptile review"
> Send a priority header, get priority queueing, and pay priority token rates.
>
> :::info Which models support Priority PayGo?
> As of this writing: `gemini/gemini-2.5-pro`, `vertex_ai/gemini-3-pro-preview`, `vertex_ai/gemini-3.1-pro-preview`, `vertex_ai/gemini-3-flash-preview`, and their variants.
gemini/ provider listed in Vertex AI PayGo context
The info box lists gemini/gemini-2.5-pro alongside vertex_ai/ models as supporting Priority PayGo. However, the gemini/ prefix routes through Google AI Studio (Gemini API), not Vertex AI. The X-Vertex-AI-LLM-Shared-Request-Type: priority header is a Vertex AI-specific header and will not have the intended effect on requests routed via gemini/. All Priority PayGo examples in this tutorial use the vertex_ai/ prefix, so including gemini/gemini-2.5-pro here will mislead users into thinking they can enable Vertex AI Priority PayGo via the Gemini API provider.
Suggested change:

```diff
- As of this writing: `gemini/gemini-2.5-pro`, `vertex_ai/gemini-3-pro-preview`, `vertex_ai/gemini-3.1-pro-preview`, `vertex_ai/gemini-3-flash-preview`, and their variants.
+ As of this writing: `vertex_ai/gemini-2.5-pro`, `vertex_ai/gemini-3-pro-preview`, `vertex_ai/gemini-3.1-pro-preview`, `vertex_ai/gemini-3-flash-preview`, and their variants.
```
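The reviewer's routing point can be made concrete with a trivial prefix check (an illustrative sketch; the helper name is hypothetical, not a LiteLLM API):

```python
def uses_vertex_routing(model: str) -> bool:
    """True only for models routed through Vertex AI, where the
    X-Vertex-AI-LLM-Shared-Request-Type header takes effect.
    gemini/ models route through Google AI Studio instead."""
    return model.startswith("vertex_ai/")
```

A `gemini/`-prefixed model will fail this check, which is exactly why it does not belong in the Priority PayGo list.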
> | `usageMetadata.trafficType` | `service_tier` | Pricing keys used |
> |---|---|---|
> | `ON_DEMAND` | `None` | `input_cost_per_token` |
> | `ON_DEMAND_PRIORITY` | `"priority"` | `input_cost_per_token_priority` |
> | `FLEX` / `BATCH` | `"flex"` | `input_cost_per_token_flex` |
>
> If a tier-specific key is missing, LiteLLM falls back to standard pricing keys.
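The fallback rule quoted above can be sketched as follows (a hypothetical helper, not LiteLLM's implementation; the key names mirror `model_prices_and_context_window.json`):

```python
from typing import Optional

def pricing_rate(model_info: dict, base: str, service_tier: Optional[str]) -> float:
    """Look up a tier-specific key like input_cost_per_token_priority,
    falling back to the standard key when the tiered one is missing."""
    if service_tier:
        tiered = f"{base}_{service_tier}"
        if tiered in model_info:
            return model_info[tiered]
    return model_info[base]
```

So a model entry with no `input_cost_per_token_flex` key is simply billed at `input_cost_per_token` for flex traffic.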
Cost table only lists input pricing keys
The "Pricing keys used" column only documents the input_cost_per_token family of keys, but output tokens are also billed at tier-specific rates. Looking at model_prices_and_context_window.json, models like gemini/gemini-2.5-pro define both input_cost_per_token_priority and output_cost_per_token_priority. Showing only input keys may cause users to incorrectly assume output costs are always billed at the standard rate regardless of tier.
Consider updating the column to be more explicit:
| `usageMetadata.trafficType` | `service_tier` | Pricing keys used |
|---|---|---|
| `ON_DEMAND` | `None` | `input_cost_per_token`, `output_cost_per_token` |
| `ON_DEMAND_PRIORITY` | `"priority"` | `input_cost_per_token_priority`, `output_cost_per_token_priority` |
| `FLEX` / `BATCH` | `"flex"` | `input_cost_per_token_flex`, `output_cost_per_token_flex` |
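With both input and output key families in play, a tier-aware cost computation might look like this (an illustrative sketch, not LiteLLM's actual `completion_cost()` code):

```python
from typing import Optional

def tiered_cost(model_info: dict, prompt_tokens: int, completion_tokens: int,
                service_tier: Optional[str]) -> float:
    """Bill input AND output tokens at tier-specific rates when available,
    falling back to standard keys per-family otherwise."""
    def rate(base: str) -> float:
        if service_tier and f"{base}_{service_tier}" in model_info:
            return model_info[f"{base}_{service_tier}"]
        return model_info[base]
    return (prompt_tokens * rate("input_cost_per_token")
            + completion_tokens * rate("output_cost_per_token"))
```

This is the reviewer's point in code form: with only the input key documented, a reader could wrongly conclude `completion_tokens` are always billed at `output_cost_per_token`.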
```xml
<text x="172" y="328" text-anchor="end" dominant-baseline="central" style="fill:rgb(194, 192, 182);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, sans-serif;font-size:12px;font-weight:400;text-anchor:end;dominant-baseline:central">④</text>
<text x="172" y="418" text-anchor="end" dominant-baseline="central" style="fill:rgb(194, 192, 182);stroke:none;color:rgb(255, 255, 255);stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;opacity:1;font-family:&quot;Anthropic Sans&quot;, -apple-system, &quot;system-ui&quot;, &quot;Segoe UI&quot;, sans-serif;font-size:12px;font-weight:400;text-anchor:end;dominant-baseline:central">⑤</text>
</svg>
\ No newline at end of file
```
Missing newline at end of file
The SVG file is missing a trailing newline character. Most linters and git tooling expect files to end with a newline. Add a newline after the closing </svg> tag.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Merged 1104f92 into BerriAI:litellm_dev_sameer_16_march_week
Summary

- add a concise Vertex AI PayGo/Priority tutorial with request examples for SDK, proxy config, and pass-through
- document the two relevant Vertex headers (`X-Vertex-AI-LLM-Shared-Request-Type` and `X-Vertex-AI-LLM-Request-Type`) with expected behavior
- add and embed `vertex_cost_tracking_flow.svg` in the cost-tracking section for faster onboarding