Serverless Inference

Text & Vision Models

State-of-the-art language and multimodal models.

Price per 1M tokens

Model                               Family    Input   Output
Llama 4 Maverick                    Llama     $0.27   $0.85
Llama 4 Scout                       Llama     $0.18   $0.59
Llama 3.3 70B                       Llama     $0.88   $0.88
Llama 3.2 3B Instruct Turbo         Llama     $0.06   $0.06
Llama 3.1 405B                      Llama     $3.50   $3.50
Llama 3.1 70B                       Llama     $0.88   $0.88
Llama 3.1 8B                        Llama     $0.18   $0.18
Llama 3 8B Instruct Lite            Llama     $0.10   $0.10
Llama 3 70B Instruct Reference      Llama     $0.88   $0.88
LLaMA-2                             Llama     $0.90   $0.90
DeepSeek-R1-0528                    DeepSeek  $3.00   $7.00
DeepSeek R1 Distilled Qwen 14B      DeepSeek  $0.18   $0.18
DeepSeek R1 Distilled Llama 70B     DeepSeek  $2.00   $2.00
DeepSeek-R1-0528 Throughput         DeepSeek  $0.55   $2.19
DeepSeek-V3.1                       DeepSeek  $0.60   $1.25
DeepSeek-V3-0324                    DeepSeek  $1.25   $1.25
gpt-oss-120B                        GPT-OSS   $0.15   $0.60
gpt-oss-20B                         GPT-OSS   $0.05   $0.20
Qwen3-Next-80B-A3B-Instruct         Qwen      $0.15   $1.50
Qwen3-Next-80B-A3B-Thinking         Qwen      $0.15   $1.50
Qwen3-VL-32B-Instruct               Qwen      $0.50   $1.50
Qwen3-Coder 480B A35B Instruct      Qwen      $2.00   $2.00
Qwen3 235B A22B Thinking 2507 FP8   Qwen      $0.65   $3.00
Qwen3 235B A22B FP8 Throughput      Qwen      $0.20   $0.60
Qwen2.5 72B                         Qwen      $1.20   $1.20
Qwen2.5 Coder 32B Instruct          Qwen      $0.80   $0.80
Qwen2.5-VL 72B Instruct             Qwen      $1.95   $8.00
Qwen2.5 7B Instruct Turbo           Qwen      $0.30   $0.30
Qwen QwQ-32B                        Qwen      $1.20   $1.20
GLM-4.7                             GLM       $0.45   $2.00
Kimi K2 Instruct                    Kimi      $1.00   $3.00
GLM-4.6                             GLM       $0.60   $2.20
GLM-4.5-Air                         GLM       $0.20   $1.10
Mistral (7B) Instruct v0.2          Mistral   $0.20   $0.20
Mistral Instruct                    Mistral   $0.20   $0.20
Mistral Small 3                     Mistral   $0.80   $0.80
Arcee AI AFM-4.5B                   AFM       $0.10   $0.40
Arcee AI Coder-Large                Arcee     $0.50   $0.80
Arcee AI Maestro                    Arcee     $0.90   $3.30
Arcee AI Virtuoso-Large             Arcee     $0.75   $1.20
Cogito v2 preview - 109B MoE        Cogito    $0.18   $0.59
Cogito v2 preview - 405B            Cogito    $3.50   $3.50
Cogito v2 preview - 671B MoE        Cogito    $1.25   $1.25
Cogito v2 preview - 70B             Cogito    $0.88   $0.88
Refuel LLM-2                        -         $0.60   $0.60
Refuel LLM-2 Small                  -         $0.20   $0.20
Gemma 3n E4B Instruct               Gemma     $0.02   $0.04
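
As a rough illustration of how the per-1M-token prices above turn into a bill, the sketch below multiplies input and output token counts by their respective rates. The function name and the token counts are hypothetical, the rates are the Llama 4 Maverick prices listed above, and the idea that input and output charges simply add up is an assumption rather than something this page states.

```python
TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


def inference_cost(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> float:
    """Estimated USD cost, assuming input and output charges are additive."""
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_UNIT


# Hypothetical workload: 2M input tokens and 500K output tokens
# on Llama 4 Maverick ($0.27 input, $0.85 output per 1M tokens).
cost = inference_cost(2_000_000, 500_000, 0.27, 0.85)
print(f"${cost:.3f}")  # 2 x 0.27 + 0.5 x 0.85 = 0.965
```

Swapping in another row's input and output prices gives the estimate for that model.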

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.

Image Models

Generate stunning visuals with the latest and greatest image models.

Price per MP

Prices include default steps shown above. Additional costs apply only when exceeding default steps. See full pricing details →

Audio Models

Speech synthesis and processing models.

Price per 1M Characters

Model             Price
Cartesia Sonic-2  $65.00

Transcription Models

Models for automatic speech recognition (ASR) and speech translation.

Price per audio minute

Model             Price
Whisper Large v3  $0.0015

Embedding Models

Vector embeddings for semantic search and RAG.

Price per 1M tokens

Rerank Models

Improve search relevance with reranking models.

Price per 1M tokens

Moderation Models

Filter and classify content for safety and compliance.

Price per 1M tokens

Dedicated Endpoints

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:
  • Guaranteed performance (no sharing)

  • Support for custom models

  • Autoscaling & traffic spike handling

Hardware Type       Price/Hour
1x H200 141GB       $4.99
1x H100 80GB        $3.36
1x A100 SXM 80GB    $2.56
1x A100 SXM 40GB    $2.40
1x A100 PCIe 80GB   $2.40
1x L40S 48GB        $2.10

Fine-tuning

Standard pricing

Price per 1M tokens

Size        Supervised Fine-Tuning          Direct Preference Optimization
            LoRA      Full Fine-Tuning      LoRA      Full Fine-Tuning
Up to 16B   $0.48     $0.54                 $1.20     $1.35
17B-69B     $1.50     $1.65                 $3.75     $4.12
70-100B     $2.90     $3.20                 $7.25     $8.00

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
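
The billing rule above can be sketched as a short calculation. The function names and the example job sizes are hypothetical; the $0.48 rate is the "Up to 16B" Supervised Fine-Tuning LoRA price from the standard pricing table.

```python
def billable_tokens(train_tokens: int, epochs: int,
                    eval_tokens: int = 0, evaluations: int = 0) -> int:
    """Training tokens x epochs, plus evaluation tokens x evaluations."""
    return train_tokens * epochs + eval_tokens * evaluations


def fine_tuning_cost(tokens: int, price_per_million: float) -> float:
    """USD cost at a per-1M-token rate."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical job: a 10M-token training set run for 3 epochs, plus a
# 1M-token validation set evaluated 3 times.
tokens = billable_tokens(10_000_000, 3, 1_000_000, 3)
cost = fine_tuning_cost(tokens, 0.48)  # "Up to 16B", Supervised Fine-Tuning, LoRA
print(tokens, f"${cost:.2f}")  # 33000000 $15.84
```

Leaving the evaluation arguments at their defaults models a job with no validation dataset.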

Specialized pricing

Fine-tuning for the models below incurs minimum charges and is limited to LoRA fine-tuning.

Price per 1M tokens

Model                             Supervised Fine-Tuning (LoRA)  Direct Preference Optimization (LoRA)  Minimum charge
DeepSeek-R1                       $10.00                         $25.00                                 $20.00
DeepSeek-R1-0528
DeepSeek-V3
DeepSeek-V3-0324
DeepSeek-V3.1
DeepSeek-V3.1-Base

GLM-4.6                           $9.00                          $22.50                                 $27.00
GLM-4.7

gpt-oss-120B                      $5.00                          $12.50                                 $6.00

Kimi K2 Thinking                  $15.00                         $37.50                                 $60.00
Kimi K2 Instruct-0905
Kimi K2 Instruct
Kimi K2 Base

Llama 4 Maverick                  $8.00                          $20.00                                 $16.00
Llama 4 Maverick Instruct

Llama 4 Scout                     $3.00                          $7.50                                  $6.00
Llama 4 Scout Instruct

Qwen3-Coder-480B-A35B-Instruct    $9.00                          $22.50                                 $18.00

Qwen3-235B-A22B                   $6.00                          $15.00                                 No min price
Qwen3-235B-A22B-Instruct-2507

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
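
One plausible reading of the minimum charge is that a job is billed the larger of its token-based cost and the listed minimum. The page does not spell this out, so treat the sketch below as an assumption; the function name and job sizes are hypothetical, and the rates are the Llama 4 Scout Supervised Fine-Tuning (LoRA) figures from the specialized pricing table.

```python
from typing import Optional


def specialized_cost(tokens: int, price_per_million: float,
                     minimum_charge: Optional[float] = None) -> float:
    """Token-based cost, raised to the minimum charge when one applies (assumed)."""
    usage = tokens / 1_000_000 * price_per_million
    if minimum_charge is None:  # e.g. a row marked "No min price"
        return usage
    return max(usage, minimum_charge)


# Llama 4 Scout, Supervised Fine-Tuning (LoRA): $3.00 per 1M tokens, $6.00 minimum.
small_job = specialized_cost(1_000_000, 3.00, 6.00)  # usage $3.00, below the minimum
large_job = specialized_cost(5_000_000, 3.00, 6.00)  # usage $15.00, above the minimum
print(small_job, large_job)  # 6.0 15.0
```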

Code Execution

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Price per hour

Item          Price
Per vCPU      $0.0446
Per GiB RAM   $0.0149
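
A minimal sketch of a sandbox cost estimate, assuming the vCPU and RAM charges are additive and scale linearly with running time (the page lists only the two hourly rates). The function name and the example sandbox size are hypothetical.

```python
VCPU_PRICE_PER_HOUR = 0.0446     # per vCPU, as listed above
RAM_GIB_PRICE_PER_HOUR = 0.0149  # per GiB RAM, as listed above


def sandbox_cost(vcpus: int, ram_gib: float, hours: float) -> float:
    """Estimated USD cost, assuming vCPU and RAM charges add linearly."""
    hourly = vcpus * VCPU_PRICE_PER_HOUR + ram_gib * RAM_GIB_PRICE_PER_HOUR
    return hourly * hours


# Hypothetical sandbox: 4 vCPUs and 8 GiB RAM running for 10 hours.
cost = sandbox_cost(4, 8, 10)
print(f"${cost:.3f}")  # (4 x 0.0446 + 8 x 0.0149) x 10 = 2.976
```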

Code Interpreter

Execute LLM-generated code securely using our API.

Price per session

Item                   Price
Session (60 minutes)   $0.03

GPU Cloud

All Together Instant and Reserved Clusters feature:
  • Choice of Kubernetes or Slurm on Kubernetes

  • Free network ingress and egress

  • NVIDIA InfiniBand and NVLink networking

Instant Clusters

Ready to use, self-service GPUs.

Price per hour per GPU

Hardware

1 Week - 3 Months

1 - 6 Days

Hourly

NVIDIA HGX H100 SXM

$2.20

$2.50

$2.99

NVIDIA HGX H200

$3.79

NVIDIA HGX B200

$5.50
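
Because Instant Cluster prices are quoted per hour per GPU, a total is the product of GPU count, hours, and rate. The sketch below compares one week on a hypothetical 8-GPU NVIDIA HGX H100 SXM cluster at the "1 Week - 3 Months" rate ($2.20) and at the "Hourly" rate ($2.99); the function name and cluster size are illustrative assumptions.

```python
def cluster_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Estimated USD cost for a cluster billed per GPU per hour."""
    return gpus * hours * rate_per_gpu_hour


HOURS_PER_WEEK = 7 * 24

# Hypothetical 8-GPU H100 cluster kept for one week.
committed = cluster_cost(8, HOURS_PER_WEEK, 2.20)  # "1 Week - 3 Months" rate
hourly = cluster_cost(8, HOURS_PER_WEEK, 2.99)     # "Hourly" rate
print(f"${committed:.2f} vs ${hourly:.2f}")  # $2956.80 vs $4018.56
```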

Reserved Clusters

Dedicated capacity, with expert support.

Price per hour

Hardware              GPU Memory    Price
NVIDIA GB200 NVL72    384GB HBM3e
NVIDIA B200           192GB HBM3e
NVIDIA H200           141GB HBM3e   Starting at $2.09
NVIDIA H100           80GB HBM2e    Starting at $1.75
NVIDIA A100           80GB HBM2e    Starting at $1.30

Frontier AI Factory

Large-scale, custom-built private GPU clusters.
1K → 10K → 100K+ NVIDIA GPUs.

NVIDIA Blackwell GPUs at scale
Talk to our team of experts to get a custom quote for your AI Factory project plan.

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Item                Price   Unit
Shared Filesystem   $0.16   GiB/month