Available Models

Input: $2.50/M | Output: $15.00/M | Context: 1M

Most capable and efficient frontier model with 1M context, native computer use, and thinking mode

GPT-5.4 Pro

Input: $30.00/M | Output: $180.00/M | Context: 1M

Premium GPT-5.4 with maximum compute for the hardest problems

GPT-5.3

Input: $1.75/M | Output: $14.00/M | Context: 128K

High intelligence with medium speed. Multimodal with vision, function calling, and structured outputs

GPT-5.2

Input: $1.75/M | Output: $14.00/M | Context: 400K

Frontier model with 400K context and adaptive reasoning

GPT-5.4 Mini

Input: $0.75/M | Output: $4.50/M | Context: 400K

Strongest mini model for coding, computer use, and subagents with GPT-5.4 capabilities

GPT-5 Mini

Input: $0.25/M | Output: $2.00/M | Context: 200K

Cost-optimized reasoning and chat

GPT-5.4 Nano

Input: $0.20/M | Output: $1.25/M | Context: 1M

Fastest and most affordable GPT-5.4 model for high-throughput tasks

GPT-5.2 Pro

Input: $21.00/M | Output: $168.00/M | Context: 400K

Uses more compute for consistently better answers

GPT-5.3 Codex

Input: $1.75/M | Output: $14.00/M | Context: 400K

Industry-leading agentic coding model. 400K context, reasoning, tool use, and complex execution

o1

Input: $15.00/M | Output: $60.00/M | Context: 200K

Advanced reasoning model for complex tasks

o1-mini

Input: $1.10/M | Output: $4.40/M | Context: 128K

Fast reasoning model optimized for STEM

o3

Input: $2.00/M | Output: $8.00/M | Context: 200K

Latest reasoning model with improved performance

o3-mini

Input: $1.10/M | Output: $4.40/M | Context: 128K

Efficient reasoning model for STEM tasks
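
The per-token prices above can be turned into a cost estimate for a single call. The sketch below assumes that "/M" means "per one million tokens" and that input and output tokens are billed separately at the listed rates; the function name `estimate_cost_usd` is hypothetical and not part of any SDK.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float,
                      output_price_per_m: float) -> float:
    """Estimate the USD cost of one call at per-million-token prices.

    Assumption: "/M" in the listing means per 1,000,000 tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# o3 as listed above: $2.00/M input, $8.00/M output.
cost = estimate_cost_usd(10_000, 2_000, 2.00, 8.00)
print(f"${cost:.4f}")  # prints "$0.0360"
```

Output tokens are priced several times higher than input tokens for every model in this list, so long responses tend to dominate the bill even when prompts are large.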

GPT-OSS 20B

openai | Testnet

Price: $0.0010/request | Context: 128K

Open-weight 20B model (Apache 2.0), similar performance to o3-mini. Available on testnet for developer testing.

GPT-OSS 120B

openai | Testnet

Price: $0.0020/request | Context: 128K

Open-weight 120B model (Apache 2.0), flagship open model. Available on testnet for developer testing.
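
Some entries are priced per request rather than per token, so which option is cheaper depends on the size of the call. As a rough sketch, the comparison below sets the $0.0010/request price of GPT-OSS 20B against o3-mini's per-token rates ($1.10/M input, $4.40/M output), again assuming "/M" means per million tokens. The helper names are hypothetical, and the comparison covers price only, not quality or latency.

```python
def per_token_cost_usd(input_tokens: int, output_tokens: int,
                       input_price_per_m: float,
                       output_price_per_m: float) -> float:
    """USD cost of one call billed per million tokens (assumed meaning of "/M")."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def flat_rate_is_cheaper(flat_price_usd: float,
                         input_tokens: int, output_tokens: int,
                         input_price_per_m: float,
                         output_price_per_m: float) -> bool:
    """True if the flat per-request price is below the per-token cost of the same call."""
    token_cost = per_token_cost_usd(input_tokens, output_tokens,
                                    input_price_per_m, output_price_per_m)
    return flat_price_usd < token_cost


# 500 input + 200 output tokens costs about $0.00143 at o3-mini rates.
print(flat_rate_is_cheaper(0.0010, 500, 200, 1.10, 4.40))  # prints True
# 100 input + 50 output tokens costs about $0.00033 at o3-mini rates.
print(flat_rate_is_cheaper(0.0010, 100, 50, 1.10, 4.40))   # prints False
```

Under these assumptions, flat per-request pricing favors long calls, while short calls are cheaper on per-token pricing.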

Claude Haiku 4.5

Input: $1.00/M | Output: $5.00/M | Context: 200K

Fastest and most efficient Claude, near-frontier intelligence

Claude Sonnet 4.6

Input: $3.00/M | Output: $15.00/M | Context: 200K

Best balance of intelligence, speed, and cost

Claude Opus 4.5

Input: $5.00/M | Output: $25.00/M | Context: 200K

Anthropic flagship with enhanced reasoning and creativity

Claude Opus 4.6

Input: $5.00/M | Output: $25.00/M | Context: 200K

Latest flagship Claude with extended 64k output, vision, and advanced reasoning

Gemini 3.1 Pro

Input: $2.00/M | Output: $12.00/M | Context: 1M

Latest Gemini with improved thinking, token efficiency, and agentic capabilities. Optimized for software engineering (requires new SDK)

Gemini 3 Pro Preview

Input: $2.00/M | Output: $12.00/M | Context: 1M

Flagship frontier model for high-precision multimodal reasoning

Gemini 3 Flash Preview

Input: $0.50/M | Output: $3.00/M | Context: 1M

Frontier-class performance with Pro-level intelligence at Flash speed and pricing. Includes thinking mode (requires new SDK)

Gemini 2.5 Pro

Input: $1.25/M | Output: $10.00/M | Context: 1M

State-of-the-art for reasoning, coding, and mathematics

Gemini 2.5 Flash

Input: $0.30/M | Output: $2.50/M | Context: 1M

Fast and efficient Gemini model with vision support

Gemini 3.1 Flash Lite

Input: $0.25/M | Output: $1.50/M | Context: 1M

Ultra-fast and lightweight Gemini 3.1 model with thinking mode for high-throughput tasks

Gemini 2.5 Flash Lite

Input: $0.10/M | Output: $0.40/M | Context: 1M

Most economical Gemini model: ultra-fast and lightweight (requires new SDK)

DeepSeek V3.2 Chat

deepseek

Input: $0.28/M | Output: $0.42/M | Context: 128K

DeepSeek V3.2 non-thinking mode, excellent for chat and coding

DeepSeek V3.2 Reasoner

deepseek

Input: $0.28/M | Output: $0.42/M | Context: 128K

DeepSeek V3.2 thinking mode for complex reasoning tasks

GLM-5

zai

Price: $0.0010/request | Context: 200K

Z.AI's flagship foundation model with 200K context. Strong reasoning and agentic capabilities

GLM-5 Turbo

zai

Price: $0.0010/request | Context: 200K

Optimized GLM-5 variant with faster inference

MiniMax M2.7

minimax

Input: $0.30/M | Output: $1.20/M | Context: 205K

MiniMax's flagship reasoning model with recursive self-improvement. Great value for complex tasks (~60 tps)

GPT-OSS 120B (Free)

Input: Free | Output: Free | Context: 128K

OpenAI's open-weight 120B model hosted free by NVIDIA. Apache 2.0 license, great for experimentation

GPT-OSS 20B (Free)

Input: Free | Output: Free | Context: 128K

OpenAI's open-weight 20B model hosted free by NVIDIA. Fast and efficient for simpler tasks

Kimi K2.5

nvidia

Input: $0.60/M | Output: $3.00/M | Context: 262K

Moonshot's flagship MoE model (1T params) hosted by NVIDIA. Vision and agentic capabilities

Nemotron Ultra 253B (Free)

NVIDIA's flagship 253B reasoning model. Strong on math, coding, and instruction following

Nemotron 3 Super 120B (Free)

NVIDIA MoE model (12B active params) with thinking mode. Fast and capable reasoning

Nemotron Super 49B (Free)

NVIDIA Nemotron 49B with thinking mode. Good balance of speed and reasoning quality

DeepSeek V3.2 (Free)

DeepSeek's latest V3.2 MoE model hosted free by NVIDIA. Same quality, zero cost

Mistral Large 3 675B (Free)

Mistral's flagship 675B model hosted free by NVIDIA. Largest Mistral model ever released

Qwen3 Coder 480B (Free)

Qwen's 480B MoE coding model (35B active) hosted by NVIDIA. Optimized for code generation

Devstral 2 123B (Free)

Mistral's 123B coding-focused model hosted free by NVIDIA. Strong code and instruction following

GLM-4.7 (Free)

Zhipu AI's GLM-4.7 with thinking mode hosted by NVIDIA. Unique Chinese AI lab model

Llama 4 Maverick (Free)

Meta's Llama 4 Maverick MoE (17B x 128 experts) hosted free by NVIDIA