Benchmarks
Compare model performance across 15+ benchmarks with ~400 entries from Artificial Analysis. The Benchmarks tab offers browse and compare modes with head-to-head tables, scatter plots, and radar charts. The CLI provides filtering, sorting, and JSON output.

The Benchmarks tab has two modes:
- Browse mode -- model list on the left, detail panel on the right
- Compare mode -- model list on the left, comparison view on the right (H2H table, scatter plot, or radar chart)
The left panel can toggle between a Models list and a Creators sidebar with the t key.
Press the number key once to sort by that metric; press again to toggle direction:
| Key | Metric |
|---|---|
| 1 | Intelligence index |
| 2 | Release date |
| 3 | Speed (tokens/second) |
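The press-once-to-sort, press-again-to-flip behavior can be sketched in a few lines. This is an illustrative model of the described keybinding logic, not the app's actual implementation; all names are invented:

```python
# Sketch: the first press of a number key selects that metric
# (descending by default); pressing the same key again flips the
# direction. State keys here are hypothetical.
def handle_sort_key(state, metric):
    if state.get("sort_metric") == metric:
        # Same metric pressed again: toggle direction.
        state["sort_ascending"] = not state.get("sort_ascending", False)
    else:
        # New metric: select it, descending first.
        state["sort_metric"] = metric
        state["sort_ascending"] = False
    return state

state = {}
handle_sort_key(state, "intelligence")  # sort by intelligence, descending
handle_sort_key(state, "intelligence")  # same key again: now ascending
print(state)  # {'sort_metric': 'intelligence', 'sort_ascending': True}
```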
| Key | Action |
|---|---|
| 4 | Cycle source filter (All / Open / Closed) |
| 5 | Cycle region filter (US / China / Europe / ...) |
| 6 | Cycle type filter (Startup / Big Tech / Research) |
| 7 | Cycle reasoning filter (All / Reasoning / Non-reasoning) |
| Key | Action |
|---|---|
| s | Open sort picker popup with all available metrics |
| S | Toggle sort direction (ascending/descending) |
The sort picker popup lists all available benchmark metrics. Select one with Enter or dismiss with Esc.
Press t to toggle the left panel between the model list and the creators sidebar. The sidebar shows 40+ model creators with counts, filterable by region, type, and open/closed source. Select a creator to filter the model list to their models only.
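Selecting a creator narrows the model list to that creator's models. A minimal sketch of that filtering step, with invented data and field names:

```python
# Sketch: filter the model list to a single creator's models.
# The records and field names are illustrative, not the app's data model.
models = [
    {"name": "gpt-4o", "creator": "OpenAI"},
    {"name": "claude-sonnet-4", "creator": "Anthropic"},
    {"name": "o3", "creator": "OpenAI"},
]

def by_creator(models, creator):
    return [m["name"] for m in models if m["creator"] == creator]

print(by_creator(models, "OpenAI"))  # ['gpt-4o', 'o3']
```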
In browse mode, the detail panel shows full benchmark data for the selected model:
- Indexes -- Intelligence, Coding, Math, GPQA Diamond
- Scores -- individual benchmark scores
- Performance -- speed (tokens/second), latency, time to first token
- Pricing -- input and output cost per million tokens
Scores and indexes are formatted to one decimal place (`{:.1}`). Missing values display as an em-dash.
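That formatting rule can be mimicked in a couple of lines (a sketch, not the app's code):

```python
# One decimal place for present values, an em-dash for missing ones,
# mirroring the `{:.1}`-style formatting described above.
def fmt_score(value):
    return "\u2014" if value is None else f"{value:.1f}"

print(fmt_score(87.349))  # 87.3
print(fmt_score(None))    # —
```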
Select up to 8 models for comparison:
| Key | Action |
|---|---|
| Space | Toggle model selection (max 8) |
| v | Cycle comparison view (H2H table, Scatter plot, Radar chart) |
| c | Clear all selections |
| Left / Right | Switch focus between list and compare panel |
A side-by-side comparison table showing all metrics for selected models. Press d to show the detail overlay. Scroll with arrow keys when the compare panel is focused.
A two-axis scatter plot comparing selected models. Cycle the axes:
| Key | Action |
|---|---|
| x | Cycle X axis metric |
| y | Cycle Y axis metric |
A multi-axis radar chart overlaying selected models. Press a to cycle through radar presets (different metric combinations).
Each selected model gets a unique color from the comparison palette for consistent identification across all three views and the legend.
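One way to keep a model's color stable across all three views is to derive it from the model's position in the selection order. A minimal sketch with an invented palette (the real palette and lookup are not documented here):

```python
# Sketch: assign each selected model a color by selection index so the
# same model renders in the same color in H2H, scatter, and radar views.
# Palette values are placeholders, not the app's actual colors.
PALETTE = ["red", "blue", "green", "yellow", "magenta", "cyan", "orange", "purple"]

def color_for(selected, model):
    # Selection order is stable, so the color stays consistent per model.
    return PALETTE[selected.index(model) % len(PALETTE)]

selected = ["gpt-4o", "claude-sonnet-4"]
print(color_for(selected, "gpt-4o"))           # red
print(color_for(selected, "claude-sonnet-4"))  # blue
```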
Press / to search benchmark entries by name, slug, or creator.
| Key | Action |
|---|---|
| o | Open the selected model's Artificial Analysis page in browser |
The benchmarks CLI can be invoked as models benchmarks <command> or as a standalone benchmarks <command> via a symlink (see Installation#setting-up-command-aliases).
```
models benchmarks list
models benchmarks list --sort speed --limit 10
models benchmarks list --creator openai --reasoning
models benchmarks list --open --sort price-input --asc
```
Opens an inline terminal picker with a model table and detail preview. Inside the picker:
- / starts a live text filter over name, slug, and creator
- s cycles sort metrics
- S reverses the current sort
- Enter prints the selected model's benchmark details
```
models benchmarks show gpt-4o
models benchmarks show "Claude Sonnet 4"
```
Prints a formatted benchmark breakdown. If the query matches multiple variants in an interactive terminal, the picker reopens with just the matching candidates.
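The narrowing step described above can be sketched as a case-insensitive substring match; the real matcher may score or rank differently:

```python
# Sketch: find all model names matching a query so a picker can reopen
# with just those candidates. Illustrative only.
def match_candidates(query, models):
    q = query.lower()
    return [m for m in models if q in m.lower()]

models = ["Claude Sonnet 4", "Claude Sonnet 4 Thinking", "GPT-4o"]
hits = match_candidates("claude sonnet 4", models)
print(hits)  # both Claude Sonnet 4 variants; the picker would show these
```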
| Flag | Description |
|---|---|
| `--creator <name>` | Filter by creator name |
| `--open` | Show only open-source models |
| `--closed` | Show only closed-source models |
| `--reasoning` | Show only reasoning models |
| `--sort <metric>` | Sort by metric (intelligence, coding, math, speed, price-input, etc.) |
| `--asc` | Sort ascending (default is descending) |
| `--limit <n>` | Limit results |
```
models benchmarks list --creator anthropic --json
models benchmarks show gpt-4o --json
```
Benchmark data is fetched fresh from the Artificial Analysis CDN on every launch -- there is no local cache for benchmark data. The upstream dataset is updated automatically every 30 minutes via a GitHub Actions workflow.
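The `--json` output is meant for scripting and can be post-processed with standard tools. A sketch of consuming it in Python; the sample payload and its field names are assumptions about the output shape, not documented by the project:

```python
import json

# Hypothetical sample of `models benchmarks list --json` output;
# actual field names may differ.
raw = '[{"name": "m1", "speed": 120.5}, {"name": "m2", "speed": 95.0}]'

entries = json.loads(raw)
fastest = max(entries, key=lambda e: e["speed"])
print(fastest["name"])  # m1
```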