feat(autofix): migrate to explorer agent#104615

Merged: roaga merged 10 commits into master from autofix/migrate-to-explorer-be on Dec 12, 2025

Conversation

Contributor

@roaga roaga commented Dec 9, 2025

Runs Autofix's 3 steps, plus 2 more, through the Seer Explorer client rather than calling its own Seer agent. Supports both manual runs and automation runs.

Controlled by the seer explorer feature flag and a new one, so we can start with internal testing for now.

Does not currently support automated or manual handoff to 3rd-party agents, but I have a ticket to add that back along with other improvements (e.g. the suspect commit artifact will need improvement too).

Frontend: #104618
Part of AIML-2004


linear bot commented Dec 9, 2025

@github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) on Dec 9, 2025

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 70.61404% with 67 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sentry/seer/autofix/on_completion_hook.py | 60.90% | 43 Missing ⚠️ |
| src/sentry/seer/autofix/autofix_agent.py | 63.41% | 15 Missing ⚠️ |
| src/sentry/seer/endpoints/group_ai_autofix.py | 80.64% | 6 Missing ⚠️ |
| src/sentry/seer/autofix/issue_summary.py | 25.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #104615    +/-   ##
=========================================
  Coverage   80.51%    80.52%            
=========================================
  Files        9329      9337     +8     
  Lines      400842    401462   +620     
  Branches    25705     25705            
=========================================
+ Hits       322756    323278   +522     
- Misses      77620     77718    +98     
  Partials      466       466            

@@ -0,0 +1,11 @@
CODE_CHANGES_PROMPT = """Implement the fix for issue {short_id}: "{title}" (culprit: {culprit})
Contributor

will be less error-prone to wrap this in a function which inputs parameters and outputs a dedented formatted string
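
A minimal sketch of what this suggestion could look like. The function name `code_changes_prompt` and its parameter list are hypothetical; only the first prompt line is taken from the diff above, and the rest of the prompt is left out.

```python
from textwrap import dedent


def code_changes_prompt(short_id: str, title: str, culprit: str) -> str:
    """Build the code-changes prompt from explicit, named parameters.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not the code that was merged.
    """
    # Dedent the template before substituting values, so that multi-line
    # values cannot change the common indentation that dedent() strips.
    template = dedent(
        """\
        Implement the fix for issue {short_id}: "{title}" (culprit: {culprit})
        """
    )
    return template.format(short_id=short_id, title=title, culprit=culprit)
```

Compared with a module-level constant and a `.format()` call at each call site, a forgotten or misspelled parameter fails as a `TypeError` at the call rather than as a `KeyError` inside string formatting, and the indentation handling lives in one place.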


When you have enough information, generate the triage artifact with:
- suspect_commit: If you can identify a likely culprit commit:
- sha: The git commit SHA
Contributor

double checking that explorer will see short SHAs, not the 40 char ones?

Contributor Author

yeah, this whole model will need to be reworked soon to enable better UI, just have this here as a placeholder so I can test in prod with real data

@@ -0,0 +1,92 @@
from __future__ import annotations
Contributor

nit: is this necessary?

Comment on lines +14 to +15
- five_whys: Chain of "why" statements leading to the root cause.
- reproduction_steps: Steps that would reproduce this issue
Contributor

i like the event timeline artifact + prompting in current RCA. "repro steps" could be reasonably interpreted as how to repro in a test env, rather than how the issue/event at hand unfolded

Contributor Author

that's fair, but I was intentionally going for a minimal, test env repro. I kinda like it better. But we'll see, it's very easy to change prompts and UI once we can feel it out in prod

@@ -0,0 +1,17 @@
"""
Prompts for Explorer-based Autofix steps.
Contributor

nit: prompts in one file seems simpler. same is done for schemas. then no need for init logic and can order them more linearly


Guidelines:
1. Use your tools to fetch the issue details and examine the evidence
2. Investigate the trace, replay, logs, other issues, trends, and other telemetry when available to gain a deeper understanding of the issue
Contributor

(for later) does transitioning to explorer mean we can encourage it to look at other events in the issue to capture a slightly broader picture? feel like it's nicer for RCA to not mention transient, event-specific info. maybe explorer will search events on its own if it needs to, since i see there's already prompting + tool descs around issue vs event

Contributor Author

@roaga roaga Dec 11, 2025

yep this is definitely possible. we can prompt engineer later, but in local testing on mock issues, explorer was already using Discover to find multiple samples so it might just happen naturally

metadata=metadata,
)
else:
return client.continue_run(
Contributor

how does context accumulation work—is it that all artifacts for a run_id are included as context? feel like the impact and triage runs should only have RCA as context, not sure

Contributor Author

yeah the whole run is one continuous chat and the agent can freely update any artifact it has previously been tasked with generating whenever it deems appropriate. This is intentional because the siloing of steps in Autofix was what made the UX so rigid (imagine investigating the solution and the agent realizes the root cause was wrong: you'd have to rethink all the way up the convo instead of it just editing the root cause with the new info)

artifacts: dict[str, Artifact],
) -> None:
"""
Continue to the next step if stopping_point hasn't been reached.
Contributor

would be cool to trigger triage and impact concurrently w/ solution

Contributor Author

it would be, i just didn't want to mess with automation too much in this PR. we can def discuss what ideal automation flow looks like in a world with 5 steps and steps that can run in any order

@roaga roaga marked this pull request as ready for review December 12, 2025 13:53
@roaga roaga requested a review from a team as a code owner December 12, 2025 13:53
@roaga roaga merged commit c4d1cc6 into master Dec 12, 2025
67 checks passed
@roaga roaga deleted the autofix/migrate-to-explorer-be branch December 12, 2025 13:55
roaga added a commit that referenced this pull request Dec 12, 2025
Adds UI to support the new explorer-backed Autofix agent. Controlled by
explorer FF and a new one. Should preserve old UI as is if not behind
the flags.

Have tickets for follow-up work like background agents, better code
changes display, and better suspect commit card.

Backend: #104615
Part of AIML-2004 and AIML-1732
- impacts: List of specific impacts, each with:
- label: What is impacted (e.g., "User Authentication", "Payment Flow")
- impact_description: One line describing the impact
- evidence: Evidence or reasoning for this assessment
Contributor

Bug: Impact assessment prompt missing required rating field

The ImpactItem schema in artifact_schemas.py requires a rating field of type Literal["low", "medium", "high"], but the impact_assessment_prompt function only instructs the agent to include label, impact_description, and evidence. Since rating is not optional (no default value), artifacts generated by the agent following these prompt instructions will fail Pydantic validation when the schema is applied.
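
One way to guard against this kind of prompt/schema drift is a small consistency check, sketched below. It uses a standard-library dataclass as a stand-in for the Pydantic `ImpactItem` model, and the helper name `required_fields_missing_from_prompt` is hypothetical. The field names and the prompt lines are the ones quoted in this thread.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Literal


@dataclass
class ImpactItem:
    # Dataclass stand-in for the Pydantic model (an assumption for this sketch).
    label: str
    impact_description: str
    evidence: str
    rating: Literal["low", "medium", "high"]  # required: no default value


# Field list as quoted in the diff context above; note there is no rating line.
PROMPT_FIELDS = """\
- label: What is impacted (e.g., "User Authentication", "Payment Flow")
- impact_description: One line describing the impact
- evidence: Evidence or reasoning for this assessment
"""


def required_fields_missing_from_prompt(schema: type, prompt: str) -> list[str]:
    """Return required schema fields that the prompt never lists as '- <field>:'."""
    required = [
        f.name
        for f in fields(schema)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    return [name for name in required if f"- {name}:" not in prompt]


missing = required_fields_missing_from_prompt(ImpactItem, PROMPT_FIELDS)
```

Run as a unit test, a non-empty `missing` list would flag the gap before the agent ever produces an artifact that fails validation.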



elif "impact_assessment" in artifacts and artifacts["impact_assessment"].data:
webhook_event = "impact_assessment_completed"
webhook_payload["impact_assessment"] = artifacts["impact_assessment"].data
Contributor

Bug: Invalid webhook event types for triage and impact

The _send_step_webhook method uses event names "triage_completed" and "impact_assessment_completed", but these are not defined in the SentryAppEventType enum. The broadcast_webhooks_for_organization task validates event types against this enum and raises SentryAppSentryError for invalid types (lines 949-956 in sentry_apps.py). This will cause webhook failures when the triage or impact_assessment steps complete.
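
A rough sketch of a guard that would surface this before a webhook is sent. `EventType` here is a hypothetical one-member stand-in for `SentryAppEventType`, and its member value is an assumption; the real enum's members and values are not shown in this thread. The emitted names are the ones quoted in the diff.

```python
from enum import Enum


class EventType(str, Enum):
    # Hypothetical stand-in for SentryAppEventType; the member is illustrative.
    ROOT_CAUSE_COMPLETED = "root_cause_completed"


# Event names the hook assigns to webhook_event in the quoted diff lines.
EMITTED_EVENTS = [
    "triage_completed",
    "impact_assessment_completed",
    "root_cause_completed",
]


def is_known_event(name: str) -> bool:
    """True if name is the value of some EventType member."""
    try:
        EventType(name)
    except ValueError:
        return False
    return True


unknown_events = [name for name in EMITTED_EVENTS if not is_known_event(name)]
```

Any name left in `unknown_events` either needs a matching enum member or should not be sent.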



elif "root_cause" in artifacts and artifacts["root_cause"].data:
webhook_event = "root_cause_completed"
webhook_payload["root_cause"] = artifacts["root_cause"].data
Contributor

Bug: Webhook step detection wrong when triage/impact artifacts exist

The _send_step_webhook method determines which step completed by checking artifact presence in a fixed order (triage first, then impact_assessment, etc.). Since artifacts accumulate across all steps in a run, if triage or impact_assessment was run before other pipeline steps, subsequent step completions will incorrectly trigger the triage/impact webhook instead of the correct one. For example, if a user runs root_cause, then triage, then solution, the solution completion would check "triage" in artifacts first (which is true from the earlier step) and incorrectly send a triage webhook instead of a solution webhook.
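
The failure mode can be reproduced with a small sketch. Artifacts are simplified to plain dict values rather than objects with a `.data` attribute, both function names are hypothetical, and the check order after the first two entries is an assumption. The second function shows one possible alternative, in which the caller passes the step that just finished.

```python
from typing import Optional

# Fixed check order described above (triage first, then impact_assessment);
# the position of the remaining entries is assumed for illustration.
CHECK_ORDER = ["triage", "impact_assessment", "root_cause", "solution"]


def detect_by_presence(artifacts: dict) -> Optional[str]:
    """Pattern described in the report: first artifact present in CHECK_ORDER wins."""
    for step in CHECK_ORDER:
        if artifacts.get(step):
            return f"{step}_completed"
    return None


def detect_by_step(completed_step: str, artifacts: dict) -> Optional[str]:
    """Alternative: use the explicitly named step instead of inferring it."""
    if artifacts.get(completed_step):
        return f"{completed_step}_completed"
    return None


# The user ran root_cause, then triage, then solution; artifacts accumulate.
artifacts = {
    "root_cause": {"summary": "example"},
    "triage": {"summary": "example"},
    "solution": {"summary": "example"},
}
by_presence = detect_by_presence(artifacts)       # "triage_completed"
by_step = detect_by_step("solution", artifacts)   # "solution_completed"
```

Because artifacts accumulate across the run, presence alone cannot say which step just completed, so the step identity has to come from the caller or from some other per-step signal.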


@github-actions github-actions bot locked and limited conversation to collaborators Dec 28, 2025