@miguelg719

@miguelg719 miguelg719 commented Dec 15, 2025

why

Task error handling was causing the runner to idle. Logs also need to be cleaned up after each run to prevent memory leaks.

what changed

  • Remove the nested try/finally around task execution and let the main try/catch handle cleanup
  • Add cleanup to EvalLogger

test plan


Summary by cubic

Improves eval task error handling and cleanup to prevent memory leaks and dangling V3 sessions. Always closes resources and clears logs after each task run.

  • Bug Fixes

    • Always close the V3 instance in a finally block; log close errors as warnings so they don’t mask task results.
    • Clear the EvalLogger after returning logs to free memory.
    • Track v3Input at the outer scope to ensure cleanup on both success and failure.
  • Refactors

    • Simplified result logging and removed nested try/finally around task execution.
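
The pattern described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `V3Like`, `V3Input`, `TaskResult`, `SketchLogger`, `runTask`, and the injected `initV3`, `task`, and `warn` parameters are hypothetical stand-ins, and the real types in packages/evals may differ.

```typescript
// Hypothetical shapes for illustration only; not the real packages/evals types.
interface V3Like {
  close(): Promise<void>;
}
interface V3Input {
  v3: V3Like;
}
interface TaskResult {
  _success: boolean;
  error?: string;
  logs: string[];
}

// Stand-in for EvalLogger (assumed API).
class SketchLogger {
  private logs: string[] = [];
  log(message: string): void {
    this.logs.push(message);
  }
  error(message: string): void {
    this.logs.push(`ERROR: ${message}`);
  }
  getLogs(): string[] {
    return this.logs;
  }
  clear(): void {
    this.logs = [];
  }
}

async function runTask(
  initV3: () => Promise<V3Input>,
  task: (input: V3Input, logger: SketchLogger) => Promise<boolean>,
  warn: (message: string) => void = console.warn,
): Promise<TaskResult> {
  const logger = new SketchLogger();
  // Declared outside the try so the finally block can reach it on every path.
  let v3Input: V3Input | undefined;
  try {
    v3Input = await initV3();
    const success = await task(v3Input, logger);
    return { _success: success, logs: logger.getLogs() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    logger.error(message);
    return { _success: false, error: message, logs: logger.getLogs() };
  } finally {
    try {
      // No-op when initV3 itself failed and v3Input is still undefined.
      await v3Input?.v3.close();
    } catch (closeError) {
      // Warning only, so a close failure cannot replace the task result.
      warn(`Failed to close V3 instance: ${String(closeError)}`);
    }
    logger.clear();
  }
}
```

Because the `finally` block contains no `return` or uncaught `throw`, the value produced by the `try` or `catch` branch is what the caller receives, even when closing fails.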

Written for commit 34aebe9. Summary will update automatically on new commits.

@changeset-bot

changeset-bot bot commented Dec 15, 2025

⚠️ No Changeset found

Latest commit: 34aebe9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@miguelg719 miguelg719 marked this pull request as ready for review December 16, 2025 00:29

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 2 files

@greptile-apps

greptile-apps bot commented Dec 16, 2025

Greptile Overview

Greptile Summary

This PR fixes critical error handling issues in the eval runner that were causing tasks to idle and leaking memory.

Key improvements:

  • Moved v3Input declaration to outer scope so it's accessible in all error paths
  • Removed nested try/finally that was complicating cleanup logic
  • Added comprehensive finally block that guarantees V3 instance closure and logger cleanup
  • Close errors are now logged as warnings instead of throwing, preventing them from masking original task results
  • Added clear() method to EvalLogger to free memory after logs are captured

The changes follow clean error handling patterns: logs are captured in return statements before the finally block clears them, and cleanup always happens regardless of success or failure.
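
The ordering claim above (logs captured in the return statement before the finally block clears them) can be checked with a small sketch. `SketchEvalLogger`, `getLogs`, `size`, and `captureThenClear` are hypothetical names, and the `clear()` body is only inferred from the `logs = []` step in the sequence diagram; the actual method in packages/evals/logger.ts is not shown in this conversation.

```typescript
// Hypothetical stand-in for EvalLogger; the real class is not shown here.
class SketchEvalLogger {
  private logs: string[] = [];

  log(message: string): void {
    this.logs.push(message);
  }

  getLogs(): string[] {
    return this.logs;
  }

  // Assumption: clear() reassigns the field instead of truncating in place
  // (this.logs.length = 0), so an array that was already returned keeps
  // its entries.
  clear(): void {
    this.logs = [];
  }

  size(): number {
    return this.logs.length;
  }
}

function captureThenClear(logger: SketchEvalLogger): string[] {
  try {
    // The return expression is evaluated here, before finally runs.
    return logger.getLogs();
  } finally {
    logger.clear();
  }
}

const logger = new SketchEvalLogger();
logger.log("navigated");
logger.log("extracted");
const captured = captureThenClear(logger);
console.log(captured.length, logger.size()); // prints "2 0"
```

If `clear()` truncated the array in place instead, the caller would receive an empty array, because `getLogs()` returns a reference to the logger's array, not a copy.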

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The changes implement robust error handling and resource cleanup patterns. The variable scoping is correct, cleanup is guaranteed via finally block, and error handling properly prevents cleanup errors from masking original results. No edge cases or issues found.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| packages/evals/index.eval.ts | 5/5 | Improved error handling by moving v3Input to outer scope and adding comprehensive cleanup in finally block |
| packages/evals/logger.ts | 5/5 | Added clear() method to free memory after logs are captured and processed |

Sequence Diagram

sequenceDiagram
    participant Runner as Eval Runner
    participant Logger as EvalLogger
    participant Task as Task Module
    participant V3 as V3 Instance

    Runner->>Logger: new EvalLogger()
    Note over Runner: Declare v3Input at outer scope
    
    alt Task Execution Success
        Runner->>Task: import task module
        Runner->>V3: initV3()
        V3-->>Runner: v3Input
        Runner->>Task: taskFunction(v3Input)
        Task->>Logger: log events
        Task-->>Runner: result { _success: true }
        Runner->>Runner: console.log success
        Runner-->>Runner: return result
    else Task Execution Error
        Runner->>Task: import or execute task
        Task--xRunner: throws error
        Runner->>Logger: logger.error()
        Runner-->>Runner: return { _success: false, error, logs }
    end
    
    Note over Runner,V3: Finally block always runs
    Runner->>V3: v3Input?.v3.close()
    alt Close Success
        V3-->>Runner: closed
    else Close Error
        V3--xRunner: closeError
        Runner->>Runner: console.error (warning only)
    end
    Runner->>Logger: logger.clear()
    Logger->>Logger: logs = [], stagehand = undefined


@greptile-apps greptile-apps bot left a comment


2 files reviewed, no comments


@miguelg719 miguelg719 merged commit 8cff9ac into main Dec 16, 2025
31 of 32 checks passed
