Add more agent evals to evals cli #1422
Greptile Overview
Greptile Summary
This PR adds 19 new agent evaluation tasks and fixes a critical import bug across all existing agent evals.
Confidence Score: 5/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Test as Eval Runner
participant Agent as Agent Instance
participant SC as ScreenshotCollector
participant Page as Browser Page
participant Eval as V3Evaluator
Test->>Page: goto(target URL)
Test->>SC: new ScreenshotCollector(page)
Test->>SC: start()
SC->>Page: capture screenshots (interval)
Test->>Agent: execute(instruction, maxSteps)
Agent->>Page: perform actions
Agent-->>Test: agentResult
Test->>SC: stop()
SC-->>Test: screenshots[]
Test->>Eval: ask(question, screenshots, agentReasoning)
Eval-->>Test: evaluation, reasoning
Test-->>Test: return success/failure
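The flow in the diagram can be sketched in TypeScript roughly as follows. This is a minimal sketch, not the PR's actual code: the interfaces `Page`, `Collector`, `Agent`, `Evaluator`, and `EvalTask`, the function `runAgentEval`, and the `"YES"`/`"NO"` evaluation values are all assumptions inferred from the participant and message names in the diagram. The in-memory fakes exist only so the sketch runs without a browser.

```typescript
// Hypothetical shapes inferred from the diagram; the real classes may differ.
interface Page {
  goto(url: string): Promise<void>;
  screenshot(): Promise<string>;
}
interface Collector {
  start(): void;
  stop(): Promise<string[]>;
}
interface Agent {
  execute(opts: { instruction: string; maxSteps: number }): Promise<{ message: string }>;
}
interface Evaluator {
  ask(opts: {
    question: string;
    screenshots: string[];
    agentReasoning: string;
  }): Promise<{ evaluation: "YES" | "NO"; reasoning: string }>;
}
interface EvalTask {
  url: string;
  instruction: string;
  question: string;
  maxSteps: number;
}

// Mirrors the diagram: navigate, collect screenshots while the agent runs,
// then ask the evaluator whether the task succeeded.
async function runAgentEval(
  task: EvalTask,
  page: Page,
  agent: Agent,
  makeCollector: (page: Page) => Collector,
  evaluator: Evaluator,
): Promise<{ success: boolean; reasoning: string }> {
  await page.goto(task.url);
  const collector = makeCollector(page);
  collector.start();

  let agentReasoning = "";
  let screenshots: string[] = [];
  try {
    const result = await agent.execute({
      instruction: task.instruction,
      maxSteps: task.maxSteps,
    });
    agentReasoning = result.message;
  } finally {
    // Always stop the collector, even if the agent throws.
    screenshots = await collector.stop();
  }

  const { evaluation, reasoning } = await evaluator.ask({
    question: task.question,
    screenshots,
    agentReasoning,
  });
  return { success: evaluation === "YES", reasoning };
}

// In-memory fakes (assumptions for illustration only).
const fakePage: Page = {
  async goto() {},
  async screenshot() {
    return "fake-png";
  },
};
const makeFakeCollector = (page: Page): Collector => {
  const shots: string[] = [];
  return {
    start() {},
    async stop() {
      shots.push(await page.screenshot()); // one final capture on stop
      return shots;
    },
  };
};
const fakeAgent: Agent = {
  async execute() {
    return { message: "clicked the submit button" };
  },
};
const fakeEvaluator: Evaluator = {
  async ask({ screenshots }) {
    const ok = screenshots.length > 0;
    return { evaluation: ok ? "YES" : "NO", reasoning: ok ? "evidence found" : "no screenshots" };
  },
};
```

Passing the collector as a factory keeps the runner independent of how screenshots are captured, so an interval-based collector like the one in the diagram could be swapped in without changing `runAgentEval`.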
No issues found across 44 files
44 files reviewed, no comments
why
We need more evals for the agent.
what changed
test plan
Ran the evals.
Summary by cubic
Added 18 new hard-level agent evals and fixed the agent import to use the correct agent, improving coverage and stability of browser tasks.
New Features
Bug Fixes
Written for commit b947d97. Summary will update automatically on new commits.