@codex Review this PR
Pull Request Overview
This PR adds a new code_run method to the Session class that allows users to compile and run arbitrary code inside language-specific Docker containers without performing judging or visualization. This complements the existing evaluation functionality by providing direct code execution capabilities.
Key changes include:
- Implementation of the core `code_run` functionality with proper error handling and resource tracking
- Refactoring of existing path setup functions to support prefixed file naming for better organization
- Addition of comprehensive test coverage for the new functionality
Reviewed Changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/ale_bench/session.py | Adds the main code_run method with resource usage tracking and validation |
| src/ale_bench/tool_wrappers/code_runner.py | Implements core code execution logic including compilation, running, and profile parsing |
| src/ale_bench/result.py | Defines CodeRunResult class for code execution results |
| src/ale_bench/tool_wrappers/case_runner.py | Refactors path setup functions to use prefix-based naming and improves code formatting |
| tests/tool_wrappers/test_code_runner.py | Comprehensive test suite for profile parsing functionality |
| tests/judge/test_code_run.py | Integration tests for the code_run method across different languages |
| tests/test_session.py | Unit tests for session-level code_run functionality and resource usage validation |
Codex Review: Here are some suggestions.
About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".
try:
    try:
        container.wait(timeout=ale_bench.constants.COMPILE_TIMEOUT)
    except (Timeout, ConnectionError):
        if code_language != CodeLanguage.PYTHON:
            return CodeRunResult(
                stdin="",
                stdout="",
                stderr=f"Compilation timed out ({ale_bench.constants.COMPILE_TIMEOUT}s).",
                exit_status=ExitStatus.COMPILE_ERROR.value,
                execution_time=0.0,
                memory_usage=0,
            )
    except Exception:
        return CodeRunResult(
            stdin="",
            stdout="",
            stderr="Failed to compile the code due to an unexpected error.",
            exit_status=ExitStatus.COMPILE_ERROR.value,
            execution_time=0.0,
            memory_usage=0,
        )
    stderr = container.logs(stdout=False, stderr=True).decode("utf-8").strip()
    exit_code = container.attrs["State"]["ExitCode"]
finally:
    container.remove(force=True)
object_size = host_paths_compile.object_file.stat().st_size
if any(
    [
        exit_code != 0,
        code_language != CodeLanguage.PYTHON and object_size == 0,
        code_language == CodeLanguage.PYTHON and "SyntaxError" in stderr,
    ]
[P1] Handle Python compile timeouts without crashing
When container.wait() hits the 60s timeout for a Python submission, the except (Timeout, ConnectionError) branch does not assign stderr or exit_code and immediately falls through. The code below then unconditionally reads those variables to build the error response, which raises UnboundLocalError instead of returning a CodeRunResult. Any Python program that hangs during compilation will therefore crash code_run rather than reporting a compilation timeout. Initialize stderr/exit_code (or return an error) in the timeout path before using them.
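The second option named in this comment (return an error in the timeout path) could look roughly like the sketch below. It is a minimal illustration, not the PR's code: `Result`, `wait_for_compile`, and the `"COMPILE_ERROR"` string are hypothetical stand-ins for `CodeRunResult` and `ExitStatus.COMPILE_ERROR.value`, the built-in `TimeoutError` stands in for the `Timeout` exception, and the 60-second value is the one quoted in the comment.

```python
from dataclasses import dataclass
from typing import Callable, Optional

COMPILE_TIMEOUT = 60  # seconds; the value quoted in the review comment


@dataclass
class Result:
    """Hypothetical, trimmed-down stand-in for CodeRunResult."""

    stderr: str
    exit_status: str


def wait_for_compile(wait: Callable[..., None]) -> Optional[Result]:
    """Return an error Result on timeout, or None when the wait completed."""
    try:
        wait(timeout=COMPILE_TIMEOUT)
    except TimeoutError:
        # Every language gets the same response here, so nothing after this
        # point depends on variables that the timeout path never set.
        return Result(
            stderr=f"Compilation timed out ({COMPILE_TIMEOUT}s).",
            exit_status="COMPILE_ERROR",  # placeholder value
        )
    return None
```

A caller would check for a non-`None` result and return it immediately, and only read logs and the exit code when the wait finished normally.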
@codex Please read lines L98-L99 in
try:
try:
container.wait(timeout=COMPILE_TIMEOUT)
except (Timeout, ConnectionError):
...
except Exception:
...
stderr = container.logs(...)
exit_code = container.attrs["State"]["ExitCode"]
finally:
    container.remove(force=True)

Because these assignments occur after the
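The control flow of the quoted structure can be exercised in isolation. The sketch below is a simplified stand-in, not the code from the PR: `FakeTimeout` and `compile_step` are hypothetical names, and the placeholder values replace the real `container.logs(...)` and `container.attrs[...]` lookups. It shows which statements run when the inner `except` handles a timeout with and without returning.

```python
class FakeTimeout(Exception):
    """Hypothetical stand-in for the Timeout raised by container.wait()."""


def compile_step(is_python: bool):
    """Mimic the nested try/except/finally layout quoted above."""
    try:
        try:
            raise FakeTimeout("simulated wait() timeout")
        except FakeTimeout:
            if not is_python:
                return ("returned early", None, None)
            # Python case: the handler ends without returning,
            # so execution continues after the inner try/except.
        # These lines are still inside the outer try block.
        stderr = "placeholder logs"  # stands in for container.logs(...)
        exit_code = 0                # stands in for container.attrs[...]
    finally:
        pass  # the real code calls container.remove(force=True) here
    return ("fell through", stderr, exit_code)


print(compile_step(is_python=True))
print(compile_step(is_python=False))
```

In this sketch the Python case reaches both assignments before `finally` runs, so the names are bound by the time the final `return` reads them.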
@codex Would you agree to merge this PR?
Codex Review: Didn't find any major issues. Delightful!
* update pyproject.toml
* update uv.lock
* update docker handling (#7)
* Add claude GitHub actions (#8)
* "Claude PR Assistant workflow"
* "Claude Code Review workflow"
* update config
* Revert "Add claude GitHub actions (#8)" (#9) This reverts commit 33ec2c2.
* Add `code_run` method (#10)
* add code_run function
* update test code run
* Minor Changes (#12)
* fix MCP server and update scripts
* update MCP pyproject.toml and GitHub Actions config
* revert dev dependencies and fix MCP server docstring
* update README and add scripts
No description provided.