[Evals] task error handling and memory cleanup #1419

miguelg719 · 2025-12-15T20:33:28Z

why

Task errors were not handled correctly, which left the runner idling. Logs also need proper cleanup to prevent memory leaks.

what changed

- Remove the nested try/finally around task execution; let the main try/catch handle cleanup
- Add cleanup to EvalLogger

test plan

Summary by cubic

Improves eval task error handling and cleanup to prevent memory leaks and dangling V3 sessions. Always closes resources and clears logs after each task run.

Bug Fixes
- Always close the V3 instance in a finally block; log close errors as warnings so they don’t mask task results.
- Clear the EvalLogger after returning logs to free memory.
- Track v3Input at the outer scope to ensure cleanup on both success and failure.
Refactors
- Simplified result logging and removed nested try/finally around task execution.
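The fixes above amount to one try/catch/finally around the whole task. Here is a minimal TypeScript sketch of that shape. `runTask`, `initV3`, `Closable` and `MiniLogger` are hypothetical stand-ins, not the actual code in `packages/evals/index.eval.ts` or `packages/evals/logger.ts`. The point is the control flow: the handle is declared outside `try`, `finally` always attempts `close()`, and a close failure is logged as a warning instead of being rethrown.

```typescript
// Hypothetical stand-ins: the real V3 and EvalLogger types may differ.
interface Closable {
  close(): Promise<void>;
}

interface TaskResult {
  _success: boolean;
  error?: string;
  logs: string[];
}

class MiniLogger {
  private logs: string[] = [];
  log(msg: string): void {
    this.logs.push(msg);
  }
  getLogs(): string[] {
    return this.logs;
  }
  clear(): void {
    this.logs = []; // drop the buffer so it can be garbage-collected
  }
}

// Single try/catch/finally: `v3` lives in the outer scope so `finally`
// can close it whether init failed, the task failed, or both succeeded.
async function runTask(
  initV3: () => Promise<Closable>,
  task: (v3: Closable, logger: MiniLogger) => Promise<void>,
): Promise<TaskResult> {
  const logger = new MiniLogger();
  let v3: Closable | undefined; // declared outside try so finally can see it
  try {
    v3 = await initV3();
    await task(v3, logger);
    return { _success: true, logs: logger.getLogs() };
  } catch (err) {
    logger.log(`task failed: ${String(err)}`);
    return { _success: false, error: String(err), logs: logger.getLogs() };
  } finally {
    try {
      await v3?.close(); // no-op when init never produced an instance
    } catch (closeErr) {
      // Report but swallow, so a close failure cannot replace the task result.
      console.warn(`close failed: ${String(closeErr)}`);
    }
    logger.clear();
  }
}
```

Because the inner catch swallows the close error, the caller gets the value returned from `try` or `catch`, even when teardown fails.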

Written for commit 34aebe9. Summary will update automatically on new commits.

changeset-bot · 2025-12-15T20:33:31Z

⚠️ No Changeset found

Latest commit: 34aebe9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


cubic-dev-ai

No issues found across 2 files

greptile-apps · 2025-12-16T00:33:43Z

Greptile Overview

Greptile Summary

This PR fixes critical error-handling issues in the eval runner that left the runner idling on failed tasks and leaked memory.

Key improvements:

- Moved the v3Input declaration to the outer scope so it is accessible in all error paths
- Removed the nested try/finally that was complicating cleanup logic
- Added a comprehensive finally block that guarantees V3 instance closure and logger cleanup
- Close errors are now logged as warnings instead of being thrown, so they cannot mask the original task results
- Added a clear() method to EvalLogger to free memory after logs are captured

The changes follow a clean error-handling pattern: logs are captured in the return statements before the finally block clears them, and cleanup always runs regardless of success or failure.
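The ordering claim rests on standard JavaScript semantics: a `return` expression is evaluated before `finally` runs, so the caller keeps whatever array was captured. That only protects the logs if `clear()` replaces the array rather than truncating it in place; the `logs = []` shown in the sequence diagram is the safe form. A small illustration, where `ReassigningLogger` and `TruncatingLogger` are hypothetical classes and not the real `EvalLogger`:

```typescript
// Hypothetical loggers that differ only in how clear() empties the buffer.
interface Logger {
  log(msg: string): void;
  getLogs(): string[];
  clear(): void;
}

class ReassigningLogger implements Logger {
  private logs: string[] = [];
  log(msg: string): void {
    this.logs.push(msg);
  }
  getLogs(): string[] {
    return this.logs;
  }
  clear(): void {
    this.logs = []; // new array; previously returned references are untouched
  }
}

class TruncatingLogger implements Logger {
  private logs: string[] = [];
  log(msg: string): void {
    this.logs.push(msg);
  }
  getLogs(): string[] {
    return this.logs;
  }
  clear(): void {
    this.logs.length = 0; // empties the same array the caller is holding
  }
}

// The return value is computed first; finally runs afterwards and cannot
// swap it out unless it returns or throws itself.
function runAndCapture(logger: Logger): string[] {
  try {
    logger.log("step 1");
    logger.log("step 2");
    return logger.getLogs();
  } finally {
    logger.clear();
  }
}

const kept = runAndCapture(new ReassigningLogger());
const lost = runAndCapture(new TruncatingLogger());
console.log(kept.length, lost.length); // prints "2 0"
```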

Confidence Score: 5/5

This PR is safe to merge with no risk
The changes implement robust error handling and resource cleanup patterns. The variable scoping is correct, cleanup is guaranteed via finally block, and error handling properly prevents cleanup errors from masking original results. No edge cases or issues found.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
packages/evals/index.eval.ts	5/5	Improved error handling by moving v3Input to outer scope and adding comprehensive cleanup in finally block
packages/evals/logger.ts	5/5	Added clear() method to free memory after logs are captured and processed

Sequence Diagram

sequenceDiagram
    participant Runner as Eval Runner
    participant Logger as EvalLogger
    participant Task as Task Module
    participant V3 as V3 Instance

    Runner->>Logger: new EvalLogger()
    Note over Runner: Declare v3Input at outer scope
    
    alt Task Execution Success
        Runner->>Task: import task module
        Runner->>V3: initV3()
        V3-->>Runner: v3Input
        Runner->>Task: taskFunction(v3Input)
        Task->>Logger: log events
        Task-->>Runner: result { _success: true }
        Runner->>Runner: console.log success
        Runner-->>Runner: return result
    else Task Execution Error
        Runner->>Task: import or execute task
        Task--xRunner: throws error
        Runner->>Logger: logger.error()
        Runner-->>Runner: return { _success: false, error, logs }
    end
    
    Note over Runner,V3: Finally block always runs
    Runner->>V3: v3Input?.v3.close()
    alt Close Success
        V3-->>Runner: closed
    else Close Error
        V3--xRunner: closeError
        Runner->>Runner: console.error (warning only)
    end
    Runner->>Logger: logger.clear()
    Logger->>Logger: logs = [], stagehand = undefined

greptile-apps

2 files reviewed, no comments


[Evals] task error handling and memory cleanup

34aebe9

miguelg719 marked this pull request as ready for review December 16, 2025 00:29

cubic-dev-ai bot reviewed Dec 16, 2025


greptile-apps bot reviewed Dec 16, 2025


tkattkat approved these changes Dec 16, 2025


miguelg719 merged commit 8cff9ac into main Dec 16, 2025
31 of 32 checks passed

3 participants

[Evals] task error handling and memory cleanup #1419

[Evals] task error handling and memory cleanup #1419

Uh oh!

Conversation

miguelg719 commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

why

what changed

test plan

Summary by cubic

Uh oh!

changeset-bot bot commented Dec 15, 2025

⚠️ No Changeset found

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Dec 16, 2025

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

miguelg719 commented Dec 15, 2025 •

edited

Loading