Benchmark data schema — hardware specs, model throughput, and test result format #74

@canoo

Description

Problem

There is no defined schema for benchmark results, so #75 (benchmark runner), #2 (cross-platform comparisons), and #30 (community submissions) cannot be built consistently: each component would end up inventing its own format.

Note: distinct from session logging schema (#9)

Session logging (#9) records live operational data: which tool called which model, how long it took, what it cost. That schema is user-specific and continuous.

This schema is for benchmark results: a user runs a standardized test suite once, and the result is a snapshot record that can be compared across hardware setups. Different fields, different purpose.

Proposed schema (benchmark result record)

{
  "schema_version": 1,
  "nexus_version": "0.3.5",
  "submitted_at": "2026-04-28T00:00:00Z",
  "hardware": {
    "gpu_model": "RTX 4070",
    "vram_gb": 12,
    "cpu": "AMD Ryzen 9 7900X",
    "ram_gb": 64,
    "os": "linux",
    "arch": "amd64"
  },
  "results": [
    {
      "model": "qwen2.5-coder:1.5b",
      "task": "commit-msg",
      "tokens_per_second": 143.2,
      "time_to_first_token_ms": 210,
      "total_duration_ms": 1840,
      "prompt_tokens": 312,
      "completion_tokens": 28,
      "passed_quality_check": true
    }
  ]
}
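To make the required fields concrete, here is a minimal validation sketch a consumer (for example the benchmark runner in #75 or the submission endpoint in #30) might use. This is stdlib-only Python and is an illustration, not part of the proposal; the field lists are taken directly from the record above, and the function name `validate_record` is hypothetical.

```python
import json
from typing import Any

# Field sets copied from the proposed benchmark result record above.
REQUIRED_TOP = {"schema_version", "nexus_version", "submitted_at", "hardware", "results"}
REQUIRED_HW = {"gpu_model", "vram_gb", "cpu", "ram_gb", "os", "arch"}
REQUIRED_RESULT = {
    "model", "task", "tokens_per_second", "time_to_first_token_ms",
    "total_duration_ms", "prompt_tokens", "completion_tokens",
    "passed_quality_check",
}

def validate_record(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    try:
        rec: dict[str, Any] = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]

    errors: list[str] = []
    errors += [f"missing top-level field: {k}" for k in sorted(REQUIRED_TOP - rec.keys())]

    # schema_version gates compatibility for future revisions of this format.
    if rec.get("schema_version") != 1:
        errors.append("unsupported schema_version")

    hw = rec.get("hardware", {})
    if isinstance(hw, dict):
        errors += [f"missing hardware field: {k}" for k in sorted(REQUIRED_HW - hw.keys())]
    else:
        errors.append("hardware must be an object")

    results = rec.get("results")
    if not isinstance(results, list) or not results:
        errors.append("results must be a non-empty array")
    else:
        for i, r in enumerate(results):
            if not isinstance(r, dict):
                errors.append(f"results[{i}] must be an object")
                continue
            errors += [f"results[{i}] missing field: {k}"
                       for k in sorted(REQUIRED_RESULT - r.keys())]
    return errors
```

A record shaped like the example above passes with no errors; a record missing `hardware` or with an empty `results` array is rejected with a specific message, which keeps submission failures debuggable.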

Dependency

This is the first issue to complete in the v0.3.5 milestone. #75, #2, and #30 all depend on this schema being stable before implementation.
