MLPerf Endpoints
MLPerf Endpoints turns complex AI benchmark data into clear, interactive visualizations that reveal performance trade-offs at a glance. Compare systems, understand what you're acquiring, and make confident infrastructure decisions.
Throughput vs. Interactivity
This chart plots system throughput (tokens/sec) on the Y axis against interactivity (tokens/sec per user) on the X axis. Higher is better on both axes.
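As a rough illustration of how the two axes relate, the sketch below derives both numbers from per-request records. The `Request` record, its field names, and the choice to define interactivity as the mean per-request token rate are assumptions made for this example; they are not taken from the MLPerf Endpoints definitions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record (field names are illustrative)."""
    output_tokens: int
    start_s: float
    end_s: float


def system_throughput(requests):
    """Total output tokens divided by the wall-clock span of the run."""
    span = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    return sum(r.output_tokens for r in requests) / span


def interactivity(requests):
    """Mean per-request token rate (an assumed definition of tokens/sec per user)."""
    rates = [r.output_tokens / (r.end_s - r.start_s) for r in requests]
    return sum(rates) / len(rates)


# Made-up run: four clients, each receiving 100 tokens over the same 10 seconds.
requests = [Request(output_tokens=100, start_s=0.0, end_s=10.0) for _ in range(4)]

print(system_throughput(requests))  # tokens/sec across the whole system
print(interactivity(requests))      # tokens/sec seen by each user
```

In this made-up run the system as a whole delivers four times the per-user rate, because four clients are served at once. Each point on the chart corresponds to one such pair of values.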
Throughput vs. Concurrency
How system throughput scales with increasing concurrent clients.
Latency vs. Concurrency
How latency scales with increasing concurrent clients.
Interactivity vs. Concurrency
How per-user token delivery speed changes with increasing concurrent clients.
Throughput vs. Concurrency
This chart plots system throughput (tokens/sec) on the Y axis (higher is better) against system concurrency on the X axis.
Throughput rises as system concurrency increases. At low concurrency the system is typically not fully loaded, so it delivers substantially less than its peak throughput. As more concurrent clients are added, throughput keeps climbing until, at a certain concurrency, it saturates at its peak.
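The saturation behavior described above can be located numerically. The following is a minimal sketch, not the method MLPerf Endpoints uses: the function name `saturation_point`, the 5% tolerance, and the sample data points are all assumptions made for illustration.

```python
def saturation_point(points, tolerance=0.05):
    """Return (concurrency, peak) for the lowest concurrency whose throughput
    is within `tolerance` of the highest throughput observed.

    `points` is a list of (concurrency, tokens_per_sec) pairs.
    """
    peak = max(throughput for _, throughput in points)
    threshold = (1 - tolerance) * peak
    for concurrency, throughput in sorted(points):
        if throughput >= threshold:
            return concurrency, peak
    raise ValueError("points must not be empty")


# Made-up measurements: throughput grows with concurrency, then flattens.
points = [
    (1, 120.0),
    (2, 235.0),
    (4, 450.0),
    (8, 820.0),
    (16, 1300.0),
    (32, 1580.0),
    (64, 1620.0),
    (128, 1625.0),
]

concurrency, peak = saturation_point(points)
print(f"Throughput saturates near concurrency {concurrency} (peak {peak} tokens/sec)")
```

With this sample data, throughput first comes within 5% of its peak at a concurrency of 32, and adding further clients yields almost no extra throughput. A tighter or looser tolerance moves the reported knee accordingly.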
