
Version 1.2.0#6

Merged
Yuki-Imajuku merged 8 commits into main from versions/v1.2.0
Sep 9, 2025

@Yuki-Imajuku
Collaborator

No description provided.

Yuki-Imajuku and others added 8 commits July 17, 2025 15:47
* "Claude PR Assistant workflow"

* "Claude Code Review workflow"

* update config
* add code_run function

* update test code run
* fix MCP server and update scripts

* update MCP pyproject.toml and GitHub Actions config

* revert dev dependencies and fix MCP server docstring
@Yuki-Imajuku Yuki-Imajuku requested a review from Copilot September 9, 2025 15:00
@Yuki-Imajuku
Collaborator Author

@codex Review this PR

Contributor

Copilot AI left a comment


Pull Request Overview

This pull request introduces version 1.2.0 of ALE-Bench, adding a new code_run function, refactoring Docker client management, updating dependencies, and improving various components.

  • Adds a new code_run function that compiles and executes arbitrary code without judging
  • Refactors Docker client usage to use a context manager pattern for better resource management
  • Updates project dependencies and includes several bug fixes and improvements
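The context-manager pattern named in the second bullet can be sketched as below. This is a minimal, hypothetical illustration, not the actual `docker_client` helper in `src/ale_bench/utils.py`, whose signature this PR page does not show. The names `managed_client` and `FakeClient` are invented for the example; `FakeClient` stands in for a real client so the sketch runs without Docker.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol, TypeVar


class Closeable(Protocol):
    """Anything with a close() method, e.g. a Docker SDK client."""

    def close(self) -> None: ...


C = TypeVar("C", bound=Closeable)


@contextmanager
def managed_client(factory: Callable[[], C]) -> Iterator[C]:
    """Hypothetical helper: create a client, yield it, always close it."""
    client = factory()
    try:
        yield client
    finally:
        # Runs on normal exit and when the with-body raises,
        # so the client's connections are never leaked.
        client.close()


class FakeClient:
    """Hypothetical stand-in for a real client such as docker.DockerClient."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


with managed_client(FakeClient) as client:
    assert not client.closed  # usable inside the block
assert client.closed  # closed automatically on exit
```

With the Docker SDK for Python installed, the same helper could be called as `managed_client(docker.from_env)`, since `docker.from_env()` returns a `DockerClient` that exposes `close()`. The design point is that callers no longer have to remember to release the client on every error path.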

Reviewed Changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 1 comment.

pyproject.toml: Updates version to 1.2.0 and bumps development dependencies
src/ale_bench/result.py: Adds CodeRunResult model for raw code execution results
src/ale_bench/tool_wrappers/code_runner.py: New module implementing code_run functionality
src/ale_bench/session.py: Adds code_run method to Session class with resource tracking
tests/tool_wrappers/test_code_runner.py: New comprehensive test suite for code_run functionality
tests/judge/test_code_run.py: New integration tests for code_run with Docker
src/ale_bench/utils.py: Adds docker_client context manager for proper resource management
Multiple files: Refactors Docker client usage throughout the codebase
src/ale_bench/tool_wrappers/case_runner.py: Simplifies function signatures by replacing separate parameters with prefix strings


      memory_usage=0,
  ),
- id="non_aero_exited_broken_wa",
+ id="non_zero_exited_broken_wa",

Copilot AI Sep 9, 2025


Fixed typo from 'non_aero_exited_broken_wa' to 'non_zero_exited_broken_wa' which correctly describes a non-zero exit status.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


@Yuki-Imajuku Yuki-Imajuku merged commit 1a05735 into main Sep 9, 2025
5 checks passed
@Yuki-Imajuku Yuki-Imajuku deleted the versions/v1.2.0 branch September 9, 2025 15:09
2 participants