Add evaluation models by Yuki-Imajuku · Pull Request #28 · SakanaAI/ALE-Bench

Yuki-Imajuku · 2026-02-20T10:25:08Z

No description provided.

Copilot

Pull request overview

This PR adds support for 6 new evaluation models to the ALE-Bench toolkit, including pricing information and configuration files for multiple variants of these models.

Changes:

Added pricing information for gemini-3.1-pro-preview, claude-sonnet-4.6, glm-5, minimax-m2.5, qwen3.5-397b-a17b, and qwen3.5-plus-02-15 models
Created 8 configuration files for different model variants with appropriate provider settings and reasoning configurations
All model names follow the established convention where config files include provider prefixes that are stripped during cost calculation
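
The prefix convention in the last item can be sketched as follows. This is a minimal illustration, not the code in `calc_cost.py`: the helper names `strip_provider_prefix` and `lookup_pricing`, the `/` separator, and the placeholder prices are all assumptions made for the example.

```python
def strip_provider_prefix(model_name: str) -> str:
    """Return the model name without a leading provider segment.

    Assumption: provider and model are separated by "/", as in "z-ai/glm-5".
    A name without a separator is returned unchanged.
    """
    return model_name.rsplit("/", 1)[-1]


# Hypothetical pricing table keyed by the bare model name.
# The numbers are placeholders, not real prices.
PRICING = {
    "glm-5": {"input": 1.0, "output": 2.0},
}


def lookup_pricing(model_name: str) -> dict:
    """Look up pricing after stripping the provider prefix."""
    key = strip_provider_prefix(model_name)
    if key not in PRICING:
        raise KeyError(f"no pricing entry for model {key!r}")
    return PRICING[key]
```

Under this sketch, a config that names the model `z-ai/glm-5` and one that names it `glm-5` resolve to the same pricing entry, so only one entry per model is needed.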

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
src/ale_bench_eval/calc_cost.py	Added pricing entries for 6 new models with tiered pricing structures where applicable
llm_configs/qwen3.5-plus.json	Config for Qwen 3.5 Plus model via OpenRouter with reasoning enabled
llm_configs/qwen3.5-397b-a17b.json	Config for Qwen 3.5 397B model via OpenRouter with custom temperature settings
llm_configs/minimax-m2.5.json	Config for Minimax M2.5 model via OpenRouter with reasoning enabled
llm_configs/glm-5.json	Config for GLM-5 model via OpenRouter through z-ai provider
llm_configs/gemini-3.1-pro-preview-low.json	Config for Gemini 3.1 Pro with low thinking level via Google provider
llm_configs/gemini-3.1-pro-preview-high.json	Config for Gemini 3.1 Pro with high thinking level via Google provider
llm_configs/claude-4.6-sonnet-max.json	Config for Claude 4.6 Sonnet with max verbosity via OpenRouter
llm_configs/claude-4.6-sonnet-high.json	Config for Claude 4.6 Sonnet with high verbosity via OpenRouter
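
The "tiered pricing structures" mentioned for `calc_cost.py` are not shown on this page. One common shape, where the per-token rate depends on the size of the prompt, might look like the sketch below. The `Tier` class, the `tiered_cost` function, the threshold, and the rates are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    # Largest prompt size (in tokens) this tier applies to; None means no limit.
    max_input_tokens: Optional[int]
    # Prices in USD per one million tokens (placeholder values in the usage below).
    input_per_mtok: float
    output_per_mtok: float


def tiered_cost(tiers: Sequence[Tier], input_tokens: int, output_tokens: int) -> float:
    """Price a request using the first tier whose bound covers the prompt size.

    Assumption: tiers are ordered from the smallest bound to the largest.
    """
    for tier in tiers:
        if tier.max_input_tokens is None or input_tokens <= tier.max_input_tokens:
            total = input_tokens * tier.input_per_mtok + output_tokens * tier.output_per_mtok
            return total / 1_000_000
    raise ValueError("no pricing tier covers this request")


# Hypothetical two-tier model: cheaper up to 200k prompt tokens, pricier beyond.
EXAMPLE_TIERS = [
    Tier(max_input_tokens=200_000, input_per_mtok=1.0, output_per_mtok=2.0),
    Tier(max_input_tokens=None, input_per_mtok=2.0, output_per_mtok=4.0),
]
```

With these placeholder rates, a request with 100,000 input and 50,000 output tokens falls in the first tier, while a 300,000-token prompt is priced entirely at the second tier's rates. Whether the real pricing table switches rates for the whole request or only for the tokens above the threshold is not stated in the PR.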

Yuki-Imajuku added 4 commits February 16, 2026 19:58

add glm-5 minimax-m2.5 qwen-3.5

0ec07fd

add claude-4.6-sonnet

f9a4d38

add gemini-3.1-pro-preview

1f98a20

apply ruff

87f91a8

Copilot AI review requested due to automatic review settings February 20, 2026 10:25

Copilot started reviewing on behalf of Yuki-Imajuku February 20, 2026 10:25

Copilot AI reviewed Feb 20, 2026

Yuki-Imajuku merged commit a15d207 into main Feb 20, 2026
14 checks passed

Yuki-Imajuku deleted the feat/new-models branch February 20, 2026 10:29

Add evaluation models #28
Yuki-Imajuku merged 4 commits into main from feat/new-models