Change coding style by Yuki-Imajuku · Pull Request #27 · SakanaAI/ALE-Bench

Yuki-Imajuku · 2026-02-20T09:42:00Z

Change type checker (mypy -> ty)
Add ruff rules
Update dependencies

Copilot

Pull request overview

This PR modernizes the project’s Python tooling and style configuration (switching from mypy to ty, expanding Ruff lint rules), while also applying consistent formatting and small refactors across runtime and evaluation code.

Changes:

Replace mypy with ty for type checking and update CI/docs accordingly.
Expand Ruff lint rule set and apply formatting-driven refactors across the codebase.
Improve path handling in tests (move away from hard-coded /tmp) and tighten type/error handling in several modules.
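
As a minimal sketch of the "move away from hard-coded /tmp" change, the test below writes into pytest's built-in `tmp_path` fixture instead of a fixed directory. The helper `write_input` and the file naming are hypothetical and are not taken from the ALE-Bench test suite.

```python
from pathlib import Path


def write_input(directory: Path, seed: int) -> Path:
    """Hypothetical helper: write a one-line input file for a seed."""
    path = directory / f"{seed:04d}.txt"
    path.write_text(f"{seed}\n")
    return path


def test_write_input(tmp_path: Path) -> None:
    # pytest injects `tmp_path`, a fresh directory unique to this test,
    # so nothing is written to a shared location such as /tmp.
    path = write_input(tmp_path, seed=7)
    assert path.parent == tmp_path
    assert path.read_text() == "7\n"
```

Because each test gets its own directory, parallel runs and leftover files from earlier runs cannot interfere with one another.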

Reviewed changes

Copilot reviewed 47 out of 49 changed files in this pull request and generated 1 comment.


File	Description
tests/tool_wrappers/test_local_visualization.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
tests/tool_wrappers/test_input_generation.py	Use `tmp_path` instead of hard-coded `/tmp`; adjust volume key assertions; parametrize formatting.
tests/tool_wrappers/test_code_runner.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
tests/tool_wrappers/test_case_runner.py	Replace hard-coded `/tmp` paths with constants-based temp paths; parametrize formatting.
tests/test_utils.py	Remove `tempfile` usage in favor of `tmp_path`; tighten iteration (`zip(..., strict=True)`); adjust exception type.
tests/test_session.py	Improve fixtures (use `tmp_path` for tool dir); add coverage for `estimate_rank_and_performance`; minor parametrize formatting.
tests/test_schemas.py	Make serialized datetimes timezone-aware (UTC) and update expected ISO strings; parametrize formatting.
tests/test_result.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
tests/test_data.py	Reformat parametrizations; adjust typing for rank/performance tests; make datetimes timezone-aware (UTC).
tests/judge/test_tle.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
tests/judge/test_mle.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
tests/judge/test_ce.py	Reformat `pytest.mark.parametrize` argument list to tuple style.
src/ale_bench_eval/shared_async_loop.py	Refactor singleton storage and cleanup; improve logging style and exception messaging.
src/ale_bench_eval/selection.py	Add `CodeLanguage` TypeGuard validation and centralize extraction/validation of selected solution info.
src/ale_bench_eval/scaffolds.py	Switch to structured logging; simplify token/cost extraction; improve exception logging.
src/ale_bench_eval/safe_generation.py	Add `TYPE_CHECKING` imports; extract response validation helper; refine error parsing and messages.
src/ale_bench_eval/prompts/builder.py	Refactor content merging logic; remove unnecessary branches; tighten type errors.
src/ale_bench_eval/prompts/__init__.py	Add package docstring.
src/ale_bench_eval/logger.py	Use pathlib `.open()`; adjust JSON encode/decode helpers; expand LoggerAdapter passthrough methods.
src/ale_bench_eval/evaluate.py	Use `Session.estimate_rank_and_performance`; improve logging and minor logic simplification.
src/ale_bench_eval/calc_cost.py	Accept both `genai_prices.Usage` and `pydantic_ai.usage.RunUsage`; normalize for pricing.
src/ale_bench_eval/analyze_results.py	Use pathlib `.open()` consistently; docstring style changes.
src/ale_bench_eval/main.py	Switch to structured logging; normalize pathlib usage; improve argument passing/readability.
src/ale_bench_eval/__init__.py	Replace explicit imports with `importlib.import_module` loop; improve dependency error message.
src/ale_bench/utils.py	Add module logger; replace `print` in some paths; add overloads for `parse_statement`; tighten errors.
src/ale_bench/tool_wrappers/local_visualization.py	Add module/function docstrings; use `zip(..., strict=True)` for input/output pairing.
src/ale_bench/tool_wrappers/input_generation.py	Add module docstring; improve warnings (`stacklevel`); replace asserts with explicit errors.
src/ale_bench/tool_wrappers/code_runner.py	Alias `requests` connection error; replace asserts with explicit type checks; docstring formatting.
src/ale_bench/tool_wrappers/case_runner.py	Update tmp bind paths via `constants.TMP_DIR`; alias requests connection error; replace asserts with explicit checks.
src/ale_bench/tool_wrappers/__init__.py	Add package docstring.
src/ale_bench/start.py	Add module logger; improve warnings (`stacklevel`); tweak signature typing; minor refactors.
src/ale_bench/session.py	Add constants; switch many asserts to explicit exceptions; add `estimate_rank_and_performance`; improve logging.
src/ale_bench/schemas.py	Add module docstring; use stdlib `Annotated`; simplify serializer usage.
src/ale_bench/result.py	Add module docstring; use `collections.abc.Sequence`; simplify computed fields and loops.
src/ale_bench/error.py	Add module docstring; remove redundant `pass`.
src/ale_bench/data.py	Add module docstring; replace asserts with explicit errors; strengthen typing and interpolation logic.
src/ale_bench/constants.py	Add module docstring; introduce `TMP_DIR` and derive temp-file constants from it.
src/ale_bench/code_language.py	Add module docstring; simplify conditionals and improve error messages.
src/ale_bench/__init__.py	Remove runtime Python-version guard (now `requires-python` governs).
pyproject.toml	Replace optional dev deps with `[dependency-groups]`; remove mypy config; expand Ruff rules; add `ty` config.
mcp/pyproject.toml	Bump version; switch to `[dependency-groups]`; update lint/type tooling deps.
docs/session_object.md	Document `estimate_rank_and_performance`.
docs/mcp_server.md	Update `uv sync` instructions for dev dependency groups.
docs/evaluation.md	Update `uv sync` instructions to `--no-dev --extra eval`.
README.md	Update supported Python range and `uv sync` commands (`--no-dev` / `--extra eval`).
CONTRIBUTING.md	Update contributor workflow to `uv` + `ruff` + `ty`; adjust commands.
.github/workflows/check.yml	Switch CI to `uv sync --dev --extra eval` and `ty check`.


Copilot · 2026-02-20T09:55:24Z

src/ale_bench/session.py

@@ -413,7 +424,8 @@ def case_gen_eval(
        )


`case_gen_eval` calls `self.case_gen(seed, **gen_kwargs)`, but `case_gen` accepts only `seed` and a single `gen_kwargs` dict parameter. Expanding `gen_kwargs` will raise `TypeError` whenever any generation options are present (and is also incorrect when the dict is empty). Pass the dict as `gen_kwargs=gen_kwargs` (or as the second positional argument) instead of expanding it.
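
The failure mode described in this comment can be reproduced with a small stand-in class. This is only a sketch: the `Session` class, the method bodies, and the `None` default for `gen_kwargs` are assumptions made for illustration, not the actual code in `src/ale_bench/session.py`.

```python
from __future__ import annotations

from typing import Any


class Session:
    """Hypothetical stand-in; only the call shape matters here."""

    def case_gen(self, seed: int, gen_kwargs: dict[str, Any] | None = None) -> str:
        # Stub: the real method generates a test case; this one echoes its inputs.
        return f"seed={seed} options={gen_kwargs or {}}"

    def case_gen_eval_buggy(self, seed: int, gen_kwargs: dict[str, Any]) -> str:
        # Expanding the dict turns every key into a keyword argument
        # that case_gen does not accept, so this raises TypeError.
        return self.case_gen(seed, **gen_kwargs)

    def case_gen_eval_fixed(self, seed: int, gen_kwargs: dict[str, Any]) -> str:
        # Pass the dict through as one argument.
        return self.case_gen(seed, gen_kwargs=gen_kwargs)


session = Session()
print(session.case_gen_eval_fixed(0, {"N": 10}))  # seed=0 options={'N': 10}
try:
    session.case_gen_eval_buggy(0, {"N": 10})
except TypeError as exc:
    print(f"TypeError: {exc}")
```

In this stub the buggy variant happens to succeed for an empty dict because `gen_kwargs` has a default value; whether that holds for the real method depends on its signature.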

Yuki-Imajuku added 3 commits February 20, 2026 18:17

change coding style

5e43ea0

add test and document

37b0ecf

update contributing.md

2a9f447

Copilot AI review requested due to automatic review settings February 20, 2026 09:51

Copilot started reviewing on behalf of Yuki-Imajuku February 20, 2026 09:52

Copilot AI reviewed Feb 20, 2026


update test and bug fix

efb0de6

Yuki-Imajuku merged commit e698721 into main Feb 20, 2026
10 checks passed

Yuki-Imajuku deleted the chore/change-coding-style branch February 20, 2026 10:22

Yuki-Imajuku merged 4 commits into main from chore/change-coding-style

Yuki-Imajuku commented Feb 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Yuki-Imajuku commented Feb 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants