Together AI - Pricing

Serverless Inference

Text & Vision Models

State-of-the-art language and multimodal models.

Price per 1M tokens

Model	Input	Output
Llama 4 Maverick	$0.27	$0.85
Llama 4 Scout	$0.18	$0.59
Llama 3.3 70B	$0.88	$0.88
Llama 3.2 3B Instruct Turbo	$0.06	$0.06
Llama 3.1 405B	$3.50	$3.50
Llama 3.1 70B	$0.88	$0.88
Llama 3.1 8B	$0.18	$0.18
Llama 3 8B Instruct Lite	$0.10	$0.10
Llama 3 70B Instruct Reference	$0.88	$0.88
LLaMA-2	$0.90	$0.90
DeepSeek-R1-0528	$3.00	$7.00
DeepSeek R1 Distilled Qwen 14B	$0.18	$0.18
DeepSeek R1 Distilled Llama 70B	$2.00	$2.00
DeepSeek-R1-0528 Throughput	$0.55	$2.19
DeepSeek-V3.1	$0.60	$1.25
DeepSeek-V3-0324	$1.25	$1.25
gpt-oss-120B	$0.15	$0.60
gpt-oss-20B	$0.05	$0.20
Qwen3-Next-80B-A3B-Instruct	$0.15	$1.50
Qwen3-Next-80B-A3B-Thinking	$0.15	$1.50
Qwen3-VL-32B-Instruct	$0.50	$1.50
Qwen3-Coder 480B A35B Instruct	$2.00	$2.00
Qwen3 235B A22B Thinking 2507 FP8	$0.65	$3.00
Qwen3 235B A22B FP8 Throughput	$0.20	$0.60
Qwen2.5 72B	$1.20	$1.20
Qwen2.5 Coder 32B Instruct	$0.80	$0.80
Qwen2.5-VL 72B Instruct	$1.95	$8.00
Qwen2.5 7B Instruct Turbo	$0.30	$0.30
Qwen QwQ-32B	$1.20	$1.20
GLM-4.7	$0.45	$2.00
Kimi K2 Instruct	$1.00	$3.00
GLM-4.6	$0.60	$2.20
GLM-4.5-Air	$0.20	$1.10
Mistral (7B) Instruct v0.2	$0.20	$0.20
Mistral Instruct	$0.20	$0.20
Mistral Small 3	$0.80	$0.80
Arcee AI AFM-4.5B	$0.10	$0.40
Arcee AI Coder-Large	$0.50	$0.80
Arcee AI Maestro	$0.90	$3.30
Arcee AI Virtuoso-Large	$0.75	$1.20
Cogito v2 preview - 109B MoE	$0.18	$0.59
Cogito v2 preview - 405B	$3.50	$3.50
Cogito v2 preview - 671B MoE	$1.25	$1.25
Cogito v2 preview - 70B	$0.88	$0.88
Refuel LLM-2	$0.60	$0.60
Refuel LLM-2 Small	$0.20	$0.20
Gemma 3n E4B Instruct	$0.02	$0.04
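
Because input and output tokens are priced separately per 1M tokens, the cost of a single request can be estimated from its token counts. The following is a minimal sketch: the prices are copied from the table above, while the function name, the dictionary, and the example token counts are illustrative assumptions and not part of any Together AI API.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# Only two rows are included here for illustration.
PRICES = {
    "Llama 4 Maverick": (0.27, 0.85),
    "gpt-oss-120B": (0.15, 0.60),
}

def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 2,000-token prompt that produces a 500-token completion.
cost = request_cost("Llama 4 Maverick", input_tokens=2_000, output_tokens=500)
print(f"${cost:.6f}")
```

The same arithmetic applies to any row in the table by adding its input and output prices to the dictionary.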

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.

Image Models

Generate stunning visuals with the latest and greatest image models.

Price per MP

Model	Price per MP	Images per $1 (1MP)	Default steps
FLUX.1 Krea [dev]	$0.025	-	28
FLUX.1 Kontext [dev]	$0.025	-	28
FLUX.1 Kontext [pro]	$0.04	-	28
FLUX.1 Kontext [max]	$0.08	-	28
FLUX1.1 [pro]	$0.04	-	-
FLUX.1 [dev]	$0.025	-	28
FLUX.1 [schnell]	$0.0027	-	4
FLUX.1 Canny [pro]	$0.05	-	-
Google Imagen 4.0 Preview	$0.04	-	-
Google Imagen 4.0 Fast	$0.02	-	-
Google Imagen 4.0 Ultra	$0.06	-	-
Gemini Flash Image 2.5 (Nano Banana)	$0.039	-	-
ByteDance Seedream 3.0	$0.018	-	-
ByteDance Seedream 4.0	$0.03	-	-
ByteDance SeedEdit	$0.03	-	-
Qwen Image Edit	$0.0032	-	-
Qwen Image	$0.0058	-	-
Juggernaut Pro Flux	$0.0049	-	-
Juggernaut Lightning Flux	$0.0017	-	-
HiDream-I1-Full	$0.009	-	-
HiDream-I1-Dev	$0.0045	-	-
HiDream-I1-Fast	$0.0032	-	-
Ideogram 3.0	$0.06	-	-
Dreamshaper	$0.0006	-	-
SD XL	$0.0019	-	-
Stable Diffusion 3	$0.0019	-	-

Prices include default steps shown above. Additional costs apply only when exceeding default steps.
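
Since image prices are quoted per megapixel, both the cost of one image and the number of 1MP images per dollar follow from simple arithmetic. The sketch below assumes that megapixels are computed as width times height divided by one million and that the image stays within the default step count; the page does not state the surcharge for extra steps, so none is modeled. The function names are hypothetical.

```python
def image_cost(price_per_mp, width, height):
    """Estimate the USD cost of one image at default steps."""
    # Assumption: 1 MP = 1,000,000 pixels.
    megapixels = width * height / 1_000_000
    return price_per_mp * megapixels

def images_per_dollar(price_per_mp):
    """Number of 1MP images that $1 buys at the listed price."""
    return 1 / price_per_mp

# FLUX.1 [dev] is listed at $0.025 per MP.
print(f"${image_cost(0.025, 1024, 1024):.4f} per 1024x1024 image")
print(f"{images_per_dollar(0.025):.0f} images per $1 at 1MP")
```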

Audio Models

Speech synthesis and processing models.

Price per 1M Characters

Model	Price
Cartesia Sonic-2	$65.00

Video Models

Use our video generation API to create high-quality videos.

Price per video

Model	Price
MiniMax 01 Director	$0.28
MiniMax Hailuo 02	$0.49
Google Veo 2.0	$2.50
Google Veo 3.0	$1.60
Google Veo 3.0 + Audio	$3.20
Google Veo 3.0 Fast	$0.80
Google Veo 3.0 Fast + Audio	$1.20
ByteDance Seedance 1.0 Lite	$0.14
ByteDance Seedance 1.0 Pro	$0.57
PixVerse v5	$0.30
Kling 2.1 Master	$0.92
Kling 2.1 Standard	$0.18
Kling 2.1 Pro	$0.32
Kling 2.0 Master	$0.92
Kling 1.6 Standard	$0.19
Kling 1.6 Pro	$0.32
Wan 2.2 I2V	$0.31
Wan 2.2 T2V	$0.66
Vidu 2.0	$0.28
Vidu Q1	$0.22
Sora 2	$0.80
Sora 2 Pro	$2.40

Transcription Models

Models for automatic speech recognition (ASR) and speech translation.

Price per audio minute

Model	Price
Whisper Large v3	$0.0015
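
Transcription is priced per audio minute, so a recording's cost scales with its duration. The sketch below assumes that partial minutes are billed proportionally; the page does not say whether durations are rounded, so treat the result as an estimate. The function name is hypothetical.

```python
WHISPER_LARGE_V3_PER_MINUTE = 0.0015  # USD, from the table above

def transcription_cost(duration_seconds, price_per_minute=WHISPER_LARGE_V3_PER_MINUTE):
    """Estimate the USD cost of transcribing one recording."""
    # Assumption: fractional minutes are billed pro rata (no rounding).
    return duration_seconds / 60 * price_per_minute

# Example: a recording of 45 minutes and 30 seconds.
print(f"${transcription_cost(45 * 60 + 30):.5f}")
```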

Embedding Models

Vector embeddings for semantic search and RAG.

Price per 1M tokens

Model	Price
BGE-Base-EN v1.5	$0.01
BGE-Large-EN v1.5	$0.02
GTE ModernBERT base	$0.08
Multilingual e5 large instruct	$0.02
M2-BERT 80M 32K Retrieval	$0.01

Rerank Models

Improve search relevance with reranking models.

Price per 1M tokens

Model	Price
Mxbai Rerank Large V2	$0.10
Salesforce LlamaRank	$0.10

Moderation Models

Filter and classify content for safety and compliance.

Price per 1M tokens

Model	Price
VirtueGuard Text Lite	$0.20
Llama Guard 4 12B	$0.20
Llama Guard 3 11B Vision Turbo	$0.18
Llama Guard 3 8B	$0.20
Llama Guard 2 8B	$0.20

Dedicated Endpoints

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:

Guaranteed performance (no sharing)
Support for custom models
Autoscaling & traffic spike handling

Hardware Type	Price/Hour
1x H200 141GB	$4.99
1x H100 80GB	$3.36
1x A100 SXM 80GB	$2.56
1x A100 SXM 40GB	$2.40
1x A100 PCIe 80GB	$2.40
1x L40S 48GB	$2.10
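
Dedicated endpoints are billed per hour of hardware, so a monthly estimate is the hourly rate multiplied by running hours. The sketch below copies the rates from the table above; the `replicas` parameter and the 720-hour month are illustrative assumptions, and the effect of autoscaling on billing is not modeled.

```python
# USD per hour, copied from the table above.
ENDPOINT_RATES = {
    "1x H200 141GB": 4.99,
    "1x H100 80GB": 3.36,
    "1x A100 SXM 80GB": 2.56,
    "1x A100 SXM 40GB": 2.40,
    "1x A100 PCIe 80GB": 2.40,
    "1x L40S 48GB": 2.10,
}

def endpoint_cost(hardware, hours, replicas=1):
    """Estimate the USD cost of running an endpoint (hypothetical helper)."""
    return ENDPOINT_RATES[hardware] * hours * replicas

# Example: one H100 instance running continuously for 30 days (720 hours).
print(f"${endpoint_cost('1x H100 80GB', hours=24 * 30):.2f}")
```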

Fine-tuning

Standard pricing

Price per 1M tokens

Model size	Supervised Fine-Tuning (LoRA)	Supervised Fine-Tuning (Full)	Direct Preference Optimization (LoRA)	Direct Preference Optimization (Full)
Up to 16B	$0.48	$0.54	$1.20	$1.35
17B-69B	$1.50	$1.65	$3.75	$4.12
70B-100B	$2.90	$3.20	$7.25	$8.00

The price is based on the total number of tokens processed: the tokens in the fine-tuning training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
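
The billing rule above can be written out as a short calculation. This is a minimal sketch of that formula only; the function and parameter names are hypothetical, and the example dataset sizes are made up for illustration.

```python
def fine_tuning_cost(train_tokens, epochs, price_per_million,
                     eval_tokens=0, evaluations=0):
    """Estimate the USD cost of a standard fine-tuning job."""
    # Training tokens are counted once per epoch.
    training_total = train_tokens * epochs
    # Evaluation tokens (optional) are counted once per evaluation run.
    evaluation_total = eval_tokens * evaluations
    return (training_total + evaluation_total) / 1_000_000 * price_per_million

# Example: LoRA supervised fine-tuning of a model up to 16B ($0.48 per 1M tokens),
# with a 10M-token training set for 3 epochs and a 1M-token validation set
# evaluated 3 times, for 33M billed tokens in total.
cost = fine_tuning_cost(10_000_000, 3, 0.48, eval_tokens=1_000_000, evaluations=3)
print(f"${cost:.2f}")
```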

Specialized pricing

Fine-tuning for the models below incurs minimum charges and is limited to LoRA fine-tuning.

Price per 1M tokens

Model	Supervised Fine-Tuning (LoRA)	Direct Preference Optimization (LoRA)	Minimum charge
DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-V3, DeepSeek-V3-0324, DeepSeek-V3.1, DeepSeek-V3.1-Base	$10.00	$25.00	$20.00
GLM-4.6, GLM-4.7	$9.00	$22.50	$27.00
gpt-oss-120B	$5.00	$12.50	$6.00
Kimi K2 Thinking, Kimi K2 Instruct-0905, Kimi K2 Instruct, Kimi K2 Base	$15.00	$37.50	$60.00
Llama 4 Maverick, Llama 4 Maverick Instruct	$8.00	$20.00	$16.00
Llama 4 Scout, Llama 4 Scout Instruct	$3.00	$7.50	$6.00
Qwen3-Coder-480B-A35B-Instruct	$9.00	$22.50	$18.00
Qwen3-235B-A22B, Qwen3-235B-A22B-Instruct-2507	$6.00	$15.00	No minimum charge
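
For models with a minimum charge, a small job may cost more than its token count alone suggests. The sketch below assumes the minimum charge acts as a floor on the total job price; the page lists the minimums but does not spell out how they are applied, so this interpretation and the function name are assumptions.

```python
def specialized_cost(billed_tokens, price_per_million, minimum_charge=0.0):
    """Estimate the USD cost of a LoRA job on a specialized-pricing model."""
    token_cost = billed_tokens / 1_000_000 * price_per_million
    # Assumption: the minimum charge is a floor on the job total.
    return max(token_cost, minimum_charge)

# gpt-oss-120B supervised fine-tuning: $5.00 per 1M tokens, $6.00 minimum.
print(f"${specialized_cost(500_000, 5.00, 6.00):.2f}")    # small job, floor applies
print(f"${specialized_cost(4_000_000, 5.00, 6.00):.2f}")  # larger job, token cost applies
```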

Code Execution

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Price per hour

Resource	Price
Per vCPU	$0.0446
Per GiB RAM	$0.0149
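
A sandbox's hourly price is the sum of its vCPU and RAM charges. The sketch below uses the two rates from the table above; the function names and the example VM shape are illustrative assumptions.

```python
PRICE_PER_VCPU_HOUR = 0.0446  # USD, from the table above
PRICE_PER_GIB_HOUR = 0.0149   # USD, from the table above

def sandbox_hourly(vcpus, ram_gib):
    """Hourly USD price of one sandbox VM with the given shape."""
    return vcpus * PRICE_PER_VCPU_HOUR + ram_gib * PRICE_PER_GIB_HOUR

def sandbox_cost(vcpus, ram_gib, hours):
    """Estimated USD cost of running that VM for a number of hours."""
    return sandbox_hourly(vcpus, ram_gib) * hours

# Example: a 4 vCPU, 8 GiB sandbox running for 10 hours.
print(f"${sandbox_hourly(4, 8):.4f} per hour")
print(f"${sandbox_cost(4, 8, 10):.3f} for 10 hours")
```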

Code Interpreter

Execute LLM-generated code securely using our API.

Price per session

Item	Price
Session (60 minutes)	$0.03
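
Code Interpreter usage is priced per 60-minute session. The sketch below assumes that every started 60-minute window is billed as one full session; the page does not describe how partial or extended sessions are counted, so this rounding rule and the function name are assumptions.

```python
import math

PRICE_PER_SESSION = 0.03  # USD per 60-minute session, from the table above

def interpreter_cost(total_minutes):
    """Estimate the USD cost of Code Interpreter usage."""
    # Assumption: each started 60-minute window counts as one session.
    sessions = math.ceil(total_minutes / 60)
    return sessions * PRICE_PER_SESSION

# Example: 150 minutes of use spans three 60-minute windows.
print(f"${interpreter_cost(150):.2f}")
```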

GPU Cloud

All Together Instant and Reserved Clusters feature:

Choice of Kubernetes or Slurm on Kubernetes
Free network ingress and egress
NVIDIA InfiniBand and NVLink networking

Instant Clusters

Ready to use, self-service GPUs.

Price per hour per GPU

Hardware	1 Week - 3 Months	1 - 6 Days	Hourly
NVIDIA HGX H100 SXM	$2.20	$2.50	$2.99
NVIDIA HGX H200	$3.15	$3.45	$3.79
NVIDIA HGX B200	$4.50	$4.90	$5.50
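
Instant Cluster rates depend on both the hardware and the length of the commitment, and they are charged per GPU per hour. The sketch below copies the rates from the table above. The tier cutoffs in `pick_tier` (24 hours and 7 days) are assumptions, since the table names the tiers without stating exact boundaries, and all function names are hypothetical.

```python
# USD per GPU-hour, copied from the table above.
INSTANT_RATES = {
    "NVIDIA HGX H100 SXM": {"week_plus": 2.20, "days": 2.50, "hourly": 2.99},
    "NVIDIA HGX H200": {"week_plus": 3.15, "days": 3.45, "hourly": 3.79},
    "NVIDIA HGX B200": {"week_plus": 4.50, "days": 4.90, "hourly": 5.50},
}

def pick_tier(hours):
    """Map a commitment length to a pricing tier (assumed cutoffs)."""
    if hours >= 7 * 24:
        return "week_plus"  # "1 Week - 3 Months" column
    if hours >= 24:
        return "days"       # "1 - 6 Days" column
    return "hourly"         # "Hourly" column

def cluster_cost(hardware, gpus, hours):
    """Estimate the USD cost of an Instant Cluster reservation."""
    rate = INSTANT_RATES[hardware][pick_tier(hours)]
    return rate * gpus * hours

# Example: 8 H100 GPUs for 3 days (72 hours) at the 1 - 6 Days rate.
print(f"${cluster_cost('NVIDIA HGX H100 SXM', gpus=8, hours=72):.2f}")
```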

Reserved Clusters

Dedicated capacity, with expert support.

Price per hour

Hardware	GPU Memory	Price
NVIDIA GB200 NVL72	384GB HBM3e	Contact us
NVIDIA B200	192GB HBM3e	Contact us
NVIDIA H200	141GB HBM3e	Starting at $2.09
NVIDIA H100	80GB HBM2e	Starting at $1.75
NVIDIA A100	80GB HBM2e	Starting at $1.30

Frontier AI Factory

Large-scale, custom-built private GPU clusters. 1K → 10K → 100K+ NVIDIA GPUs.

NVIDIA Blackwell GPUs at scale

Talk to our team of experts to get a custom quote for your AI Factory project plan.

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Item	Price	Unit
Shared Filesystem	$0.16	GiB/month
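
Storage is billed per GiB per month, so capacity quoted in TiB needs converting first. The sketch below assumes 1 TiB = 1024 GiB and whole-month billing; proration for partial months is not described on the page and is not modeled. The function name is hypothetical.

```python
PRICE_PER_GIB_MONTH = 0.16  # USD, from the table above

def storage_cost(tib, months=1):
    """Estimate the USD cost of shared filesystem capacity."""
    gib = tib * 1024  # 1 TiB = 1024 GiB
    return gib * PRICE_PER_GIB_MONTH * months

# Example: 2 TiB of shared filesystem for one month.
print(f"${storage_cost(2):.2f}")
```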
