Add evaluation models #28

Merged
Yuki-Imajuku merged 4 commits into main from feat/new-models on Feb 20, 2026

Conversation

@Yuki-Imajuku
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 10:25
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for 6 new evaluation models to the ALE-Bench toolkit, including pricing information and configuration files for multiple variants of these models.

Changes:

  • Added pricing information for gemini-3.1-pro-preview, claude-sonnet-4.6, glm-5, minimax-m2.5, qwen3.5-397b-a17b, and qwen3.5-plus-02-15 models
  • Created 8 configuration files for different model variants with appropriate provider settings and reasoning configurations
  • All model names follow the established convention where config files include provider prefixes that are stripped during cost calculation
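The prefix-stripping convention in the last bullet can be sketched as follows. This is a minimal illustration, not the code in `src/ale_bench_eval/calc_cost.py`: the `PRICING` table, the `strip_provider_prefix` and `calc_cost` helpers, the `z-ai/glm-5` config string, and the prices are all hypothetical placeholders.

```python
# Hypothetical pricing table keyed by bare model name (no provider prefix).
# Values are USD per 1M tokens and are placeholders, not the prices from this PR.
PRICING = {
    "glm-5": {"input": 1.0, "output": 2.0},
}


def strip_provider_prefix(model_name: str) -> str:
    """Return the part after the last '/', or the name unchanged if there is none."""
    return model_name.rsplit("/", 1)[-1]


def calc_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Look up the bare model name and return the cost in USD."""
    price = PRICING[strip_provider_prefix(model_name)]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed config-style name with a provider prefix.
    print(calc_cost("z-ai/glm-5", 1_000_000, 500_000))
```

Keying the table by bare name means the same pricing entry can serve several config variants of one model, whichever provider prefix the config uses.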

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/ale_bench_eval/calc_cost.py | Added pricing entries for 6 new models with tiered pricing structures where applicable |
| llm_configs/qwen3.5-plus.json | Config for Qwen 3.5 Plus model via OpenRouter with reasoning enabled |
| llm_configs/qwen3.5-397b-a17b.json | Config for Qwen 3.5 397B model via OpenRouter with custom temperature settings |
| llm_configs/minimax-m2.5.json | Config for Minimax M2.5 model via OpenRouter with reasoning enabled |
| llm_configs/glm-5.json | Config for GLM-5 model via OpenRouter through z-ai provider |
| llm_configs/gemini-3.1-pro-preview-low.json | Config for Gemini 3.1 Pro with low thinking level via Google provider |
| llm_configs/gemini-3.1-pro-preview-high.json | Config for Gemini 3.1 Pro with high thinking level via Google provider |
| llm_configs/claude-4.6-sonnet-max.json | Config for Claude 4.6 Sonnet with max verbosity via OpenRouter |
| llm_configs/claude-4.6-sonnet-high.json | Config for Claude 4.6 Sonnet with high verbosity via OpenRouter |
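
The "tiered pricing structures" noted for `calc_cost.py` are not shown in this conversation. One common shape, where the per-token rate depends on prompt size, might look like the sketch below. The `Tier` class, the `tiered_cost` function, the 200,000-token threshold, and all prices are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One pricing tier; prices are USD per 1M tokens (placeholder values)."""
    max_input_tokens: float  # inclusive upper bound on prompt size
    input_per_million: float
    output_per_million: float


# Hypothetical two-tier schedule: a cheaper rate up to 200k prompt tokens.
TIERS = [
    Tier(200_000, 2.0, 12.0),
    Tier(float("inf"), 4.0, 18.0),
]


def tiered_cost(input_tokens: int, output_tokens: int, tiers=TIERS) -> float:
    """Pick the first tier whose bound covers the prompt and price the whole request at it."""
    for tier in tiers:
        if input_tokens <= tier.max_input_tokens:
            total = (input_tokens * tier.input_per_million
                     + output_tokens * tier.output_per_million)
            return total / 1_000_000
    raise ValueError("no pricing tier covers this request")


if __name__ == "__main__":
    print(tiered_cost(100_000, 10_000))  # priced at the first tier
    print(tiered_cost(300_000, 10_000))  # priced at the second tier
```

Here the whole request is billed at a single tier chosen by prompt size. A marginal scheme that bills only the tokens above the threshold at the higher rate is a different design and would need a different loop.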

@Yuki-Imajuku merged commit a15d207 into main on Feb 20, 2026
14 checks passed
@Yuki-Imajuku deleted the feat/new-models branch on February 20, 2026 10:29