MLPerf Endpoints

MLPerf Endpoints turns complex AI benchmark data into clear, interactive visualizations that reveal performance trade-offs at a glance. Compare systems, understand what you're acquiring, and make confident infrastructure decisions.

Throughput vs. Interactivity

This chart shows system throughput (tokens/sec) on the Y axis (greater is better) vs. interactivity (tokens/sec per user) on the X axis (greater is better).
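To make the two axes concrete, here is a minimal sketch of how they could be computed from per-user token counts over a fixed measurement window. The function names, the averaging choice, and the sample numbers are illustrative assumptions; the benchmark's own metric definitions may differ.

```python
from statistics import mean
from typing import Sequence


def system_throughput(tokens_per_user: Sequence[int], window_seconds: float) -> float:
    """Aggregate tokens per second across all users (Y axis, assumed definition)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return sum(tokens_per_user) / window_seconds


def interactivity(tokens_per_user: Sequence[int], window_seconds: float) -> float:
    """Mean tokens per second delivered to one user (X axis, assumed definition)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return mean(tokens / window_seconds for tokens in tokens_per_user)


# Made-up example: 4 concurrent users, each receiving 500 tokens in 10 seconds.
example_tokens = [500, 500, 500, 500]
print(system_throughput(example_tokens, 10.0))  # 200.0 tokens/sec for the system
print(interactivity(example_tokens, 10.0))      # 50.0 tokens/sec per user
```

In this made-up case the system delivers 200 tokens/sec overall while each user sees 50 tokens/sec, which is why the two metrics are plotted against each other.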

Throughput vs. Concurrency: how system throughput (tokens/s, higher is better) scales with increasing concurrent clients.

Latency vs. Concurrency: how latency (TTFT P99, in seconds; lower is better) scales with increasing concurrent clients.

Interactivity vs. Concurrency: how per-user token delivery speed (tokens/s/user, higher is better) scales with usage.

In each panel, the X axis is the concurrency level.

Throughput vs. Concurrency

This chart shows system throughput (tokens/sec) on the Y axis (greater is better) vs. system concurrency on the X axis.

As system concurrency increases, throughput rises. At low concurrency there is typically not enough work to fully load the system, so it delivers substantially less than its peak throughput. Beyond a certain concurrency level, throughput saturates at its peak.
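One way to read a saturation point off such a curve is to find the lowest concurrency whose throughput reaches a chosen fraction of the peak. The sketch below assumes a 95% threshold and uses made-up sample points; neither comes from the benchmark itself.

```python
from typing import Sequence, Tuple


def saturation_point(
    samples: Sequence[Tuple[float, float]], fraction: float = 0.95
) -> Tuple[float, float]:
    """Return the (concurrency, throughput) pair with the lowest concurrency
    whose throughput is at least `fraction` of the peak observed throughput."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(samples)  # sort by concurrency, ascending
    peak = max(throughput for _, throughput in ordered)
    for concurrency, throughput in ordered:
        if throughput >= fraction * peak:
            return concurrency, throughput
    raise AssertionError("unreachable: the peak sample always qualifies")


# Made-up (concurrency, tokens/sec) measurements, for illustration only.
example_curve = [
    (1, 100.0), (2, 190.0), (4, 350.0), (8, 600.0),
    (16, 780.0), (32, 800.0), (64, 800.0),
]
print(saturation_point(example_curve))  # (16, 780.0)
```

With these illustrative numbers, concurrency 16 already delivers more than 95% of the 800 tokens/sec peak, so adding clients beyond that point yields little extra throughput.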