# cc-experiment-reports

ETL pipeline for analyzing Claude Code session performance on GraalVM Truffle runtime optimization tasks. Produces SVG charts for embedding in a master's thesis.

## Setup

```sh
uv sync
```

## Usage

```sh
uv run cc-create-report <config.yaml>
```

Example with the included test data:

```sh
uv run cc-create-report examples/config.yaml
```

## Configuration

```yaml
experiments:
  - id: plain-claude
    path: ./data/2026-02-15-a-s4-test
  - id: plugin-claude
    path: ./data/2026-02-16-b-s4-test

analyses:
  - strategy: single_experiment
    experiment: plain-claude

  - strategy: single_experiment
    experiment: plugin-claude

  - strategy: compare_experiments
    experiments: [plain-claude, plugin-claude]
```

Paths are resolved relative to the config file location.
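A minimal sketch of this resolution rule, assuming relative paths are anchored at the config file's directory and absolute paths are left untouched (the helper name is illustrative, not the package's actual API):

```python
from pathlib import Path

def resolve_experiment_path(config_file: str, experiment_path: str) -> Path:
    # Hypothetical helper: anchor a relative `path` entry at the directory
    # containing the config file; an absolute path passes through unchanged.
    base = Path(config_file).resolve().parent
    return (base / experiment_path).resolve()

resolve_experiment_path("/work/examples/config.yaml", "./data/2026-02-15-a-s4-test")
# -> /work/examples/data/2026-02-15-a-s4-test
```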

## Strategies

### Single-experiment (analyze one experiment)

| ID | Name | Chart | Description |
|----|------|-------|-------------|
| SE-1 | Geo-mean progression | Line | Overall benchmark performance across iterations |
| SE-2 | Character usage | Stacked bar | Output character breakdown (thinking, read, edit, bash, skill) |
| SE-3 | Sub-agent tokens | Stacked bar | Token distribution across main/Explore/Bash agents |
| SE-4 | Per-benchmark | Line (one per benchmark) | Individual benchmark progression |
| SE-5 | Iteration efficiency | Scatter | Performance improvement vs. cost per iteration |
| SE-6 | Session duration | Grouped bar | Session duration across iterations |

### Comparison (compare N experiments)

| ID | Name | Chart | Description |
|----|------|-------|-------------|
| CE-1 | Runtime comparison | Grouped bar | Final-iteration benchmark runtimes |
| CE-2 | Geo-mean comparison | Line | Geo-mean progression overlaid |
| CE-3 | Token & cost | Grouped bar | Total tokens, cost, messages |
| CE-4 | Tool usage | Grouped bar | Calls per tool type |
| CE-5 | Character usage | Grouped bar | Characters per category |
| CE-6 | USD efficiency | Bar | Improvement per dollar |
| CE-7 | Token efficiency | Bar | Improvement per token |
| CE-8 | Consistency | Box plot | Final geo-mean distribution across runs |
| CE-9 | Duration | Grouped bar | Session duration per iteration |

## Input data format

Each experiment is a directory containing:

- `<id>--summary.csv` — one session metrics file
- `<id>--run-<R>--iteration-<I>--<benchmark>.csv` — benchmark result files

See `REQUIREMENTS.md` for the full column schemas.
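As an illustration, the benchmark filename convention above can be parsed with a small regex (this helper is a sketch for reference, not part of the package):

```python
import re
from typing import Optional

# Matches <id>--run-<R>--iteration-<I>--<benchmark>.csv
BENCHMARK_FILE = re.compile(
    r"^(?P<id>.+)--run-(?P<run>\d+)--iteration-(?P<iteration>\d+)--(?P<benchmark>.+)\.csv$"
)

def parse_benchmark_filename(name: str) -> Optional[dict]:
    """Return the filename's components, or None if it is not a benchmark file."""
    m = BENCHMARK_FILE.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    parts["run"] = int(parts["run"])
    parts["iteration"] = int(parts["iteration"])
    return parts

parse_benchmark_filename("plain-claude--run-1--iteration-3--deltablue.csv")
# -> {'id': 'plain-claude', 'run': 1, 'iteration': 3, 'benchmark': 'deltablue'}
```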

## Adding a strategy

Create a file in `src/etl/strategies/single/` or `src/etl/strategies/comparison/`:

```python
from etl.strategy import SingleExperimentStrategy, register_single

@register_single
class MyStrategy(SingleExperimentStrategy):
    name = "my_strategy"

    def run(self, data, output_dir):
        # data.benchmarks, data.summary_aggregated, data.summary_separated
        ...
```

Strategy modules are auto-discovered on import, so no further registration boilerplate is needed beyond the decorator.
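The registry pattern behind `@register_single` can be sketched in a few lines (names and details here are illustrative, not the package's actual implementation):

```python
# Minimal sketch of a class-registry decorator, assuming the registry maps
# each strategy's `name` attribute to its class.
SINGLE_STRATEGIES: dict[str, type] = {}

def register_single(cls: type) -> type:
    """Record the strategy class under its declared name and return it unchanged."""
    SINGLE_STRATEGIES[cls.name] = cls
    return cls

@register_single
class GeoMeanProgression:
    name = "geo_mean_progression"

    def run(self, data, output_dir):
        ...

sorted(SINGLE_STRATEGIES)  # -> ['geo_mean_progression']
```

Because the decorator returns the class unchanged, importing a strategy module is enough to populate the registry, which is what makes directory-based auto-discovery possible.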

## Tests

```sh
uv run pytest
```
