Add code_run method #10

Merged
Yuki-Imajuku merged 2 commits into versions/v1.2.0 from feat/add-run-code on Sep 9, 2025

Conversation

@Yuki-Imajuku
Collaborator

No description provided.

@Yuki-Imajuku
Collaborator Author

@codex Review this PR

Yuki-Imajuku requested a review from Copilot on September 9, 2025, 13:02
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a new code_run method to the Session class that allows users to compile and run arbitrary code inside language-specific Docker containers without performing judging or visualization. This complements the existing evaluation functionality by providing direct code execution capabilities.

Key changes include:

  • Implementation of the core code_run functionality with proper error handling and resource tracking
  • Refactoring of existing path setup functions to support prefixed file naming for better organization
  • Addition of comprehensive test coverage for the new functionality

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/ale_bench/session.py | Adds the main code_run method with resource usage tracking and validation |
| src/ale_bench/tool_wrappers/code_runner.py | Implements core code execution logic including compilation, running, and profile parsing |
| src/ale_bench/result.py | Defines CodeRunResult class for code execution results |
| src/ale_bench/tool_wrappers/case_runner.py | Refactors path setup functions to use prefix-based naming and improves code formatting |
| tests/tool_wrappers/test_code_runner.py | Comprehensive test suite for profile parsing functionality |
| tests/judge/test_code_run.py | Integration tests for the code_run method across different languages |
| tests/test_session.py | Unit tests for session-level code_run functionality and resource usage validation |


chatgpt-codex-connector (bot) left a comment

Codex Review: Here are some suggestions.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines +76 to +108
try:
    try:
        container.wait(timeout=ale_bench.constants.COMPILE_TIMEOUT)
    except (Timeout, ConnectionError):
        if code_language != CodeLanguage.PYTHON:
            return CodeRunResult(
                stdin="",
                stdout="",
                stderr=f"Compilation timed out ({ale_bench.constants.COMPILE_TIMEOUT}s).",
                exit_status=ExitStatus.COMPILE_ERROR.value,
                execution_time=0.0,
                memory_usage=0,
            )
    except Exception:
        return CodeRunResult(
            stdin="",
            stdout="",
            stderr="Failed to compile the code due to an unexpected error.",
            exit_status=ExitStatus.COMPILE_ERROR.value,
            execution_time=0.0,
            memory_usage=0,
        )
    stderr = container.logs(stdout=False, stderr=True).decode("utf-8").strip()
    exit_code = container.attrs["State"]["ExitCode"]
finally:
    container.remove(force=True)
object_size = host_paths_compile.object_file.stat().st_size
if any(
    [
        exit_code != 0,
        code_language != CodeLanguage.PYTHON and object_size == 0,
        code_language == CodeLanguage.PYTHON and "SyntaxError" in stderr,
    ]

[P1] Handle Python compile timeouts without crashing

When container.wait() hits the 60s timeout for a Python submission, the except (Timeout, ConnectionError) branch does not assign stderr or exit_code and immediately falls through. The code below then unconditionally reads those variables to build the error response, which raises UnboundLocalError instead of returning a CodeRunResult. Any Python program that hangs during compilation will therefore crash code_run rather than reporting a compilation timeout. Initialize stderr/exit_code (or return an error) in the timeout path before using them.
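For reference, the failure mode this comment describes arises when a variable is assigned only inside a try body and a handler neither assigns it nor returns. The sketch below is a hypothetical, simplified illustration of that pattern; the names read_status, hang, and swallow_timeout are invented for the example and do not come from the PR, and the built-in TimeoutError stands in for the Timeout exception in the excerpt. It does not claim that the PR's code has this shape.

```python
def read_status(wait, swallow_timeout):
    """Hypothetical helper: 'status' is bound only when wait() succeeds."""
    try:
        wait()
        status = "finished"  # assigned only if wait() returns normally
    except TimeoutError:
        if not swallow_timeout:
            return "timed out"
        # No assignment and no return here: 'status' stays unbound.
    return status  # raises UnboundLocalError on the swallowed-timeout path


def hang():
    """Simulates a wait call that times out."""
    raise TimeoutError("simulated timeout")
```

Calling read_status(hang, swallow_timeout=True) raises UnboundLocalError, while the other two paths return normally. The deciding factor is whether the assignment sits inside the try body or after the whole try/except statement.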

SakanaAI deleted a comment from chatgpt-codex-connector (bot) on Sep 9, 2025
@Yuki-Imajuku
Collaborator Author

@codex Please read lines L98-L99 in src/ale_bench/tool_wrappers/code_runner.py carefully. Since stderr/exit_code are defined outside the try-except block, I don't think an UnboundLocalError can occur. Is this understanding correct?

@chatgpt-codex-connector

stderr and exit_code are always assigned after the inner try … except block:

try:
    try:
        container.wait(timeout=COMPILE_TIMEOUT)
    except (Timeout, ConnectionError):
        ...
    except Exception:
        ...

    stderr = container.logs(...)
    exit_code = container.attrs["State"]["ExitCode"]
finally:
    container.remove(force=True)

Because these assignments occur after the except clauses, they will execute in any path that doesn’t return early. Thus, stderr and exit_code are guaranteed to be defined before being used later, and an UnboundLocalError cannot occur.
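The control flow described here can be checked with a small self-contained sketch. Everything in it is an assumption made for illustration: FakeContainer is a hypothetical stand-in rather than the Docker SDK, the built-in TimeoutError replaces the Timeout exception from the excerpt, and plain strings replace CodeRunResult. Only the nesting of try/except/finally mirrors the snippet quoted in this thread.

```python
class FakeContainer:
    """Hypothetical stand-in for a container object (not the Docker SDK)."""

    def __init__(self, wait_error=None):
        self.wait_error = wait_error
        self.removed = False

    def wait(self, timeout):
        # Raise the configured error, if any, to simulate a failed wait.
        if self.wait_error is not None:
            raise self.wait_error

    def logs(self):
        return b"compiler output\n"

    def exit_code(self):
        return 0

    def remove(self):
        self.removed = True


def compile_step(container, is_python):
    try:
        try:
            container.wait(timeout=60)
        except TimeoutError:
            if not is_python:
                return "compile timeout"
            # Python path: no return, so execution continues below.
        except Exception:
            return "unexpected error"
        # Reached on every path that did not return above.
        stderr = container.logs().decode("utf-8").strip()
        exit_code = container.exit_code()
    finally:
        # Cleanup runs on early returns and on normal completion alike.
        container.remove()
    return stderr, exit_code
```

With a timeout on the Python path, compile_step still reaches the two assignments and returns a (stderr, exit_code) pair; the non-Python timeout and the unexpected-error paths return early, and the container is removed in every case.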

@Yuki-Imajuku
Collaborator Author

@codex Would you agree to merge this PR?

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!

Yuki-Imajuku merged commit 82d3225 into versions/v1.2.0 on Sep 9, 2025
5 checks passed
Yuki-Imajuku deleted the feat/add-run-code branch on September 9, 2025, 13:59
Yuki-Imajuku added a commit that referenced this pull request Sep 9, 2025
* update pyproject.toml

* update uv.lock

* update docker handling (#7)

* Add claude GitHub actions (#8)

* "Claude PR Assistant workflow"

* "Claude Code Review workflow"

* update config

* Revert "Add claude GitHub actions (#8)" (#9)

This reverts commit 33ec2c2.

* Add `code_run` method (#10)

* add code_run function

* update test code run

* Minor Changes (#12)

* fix MCP server and update scripts

* update MCP pyproject.toml and GitHub Actions config

* revert dev dependencies and fix MCP server docstring

* update README and add scripts