feat(autofix): migrate to explorer agent #104615
Conversation
Codecov Report
❌ Patch coverage is … Additional details and impacted files:
@@            Coverage Diff            @@
##           master   #104615   +/-  ##
=========================================
  Coverage   80.51%    80.52%
=========================================
  Files        9329      9337       +8
  Lines      400842    401462     +620
  Branches    25705     25705
=========================================
+ Hits       322756    323278     +522
- Misses      77620     77718      +98
  Partials      466       466
@@ -0,0 +1,11 @@
CODE_CHANGES_PROMPT = """Implement the fix for issue {short_id}: "{title}" (culprit: {culprit})
will be less error-prone to wrap this in a function which inputs parameters and outputs a dedented formatted string
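A minimal sketch of that suggestion (the helper name and `textwrap.dedent` usage are illustrative, not from the PR):

```python
from textwrap import dedent


def code_changes_prompt(short_id: str, title: str, culprit: str) -> str:
    """Hypothetical helper: takes issue fields, returns a dedented formatted prompt."""
    return dedent(
        f"""\
        Implement the fix for issue {short_id}: "{title}" (culprit: {culprit})
        """
    )
```

This keeps the template next to its parameters and avoids stray indentation leaking into the prompt.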
When you have enough information, generate the triage artifact with:
- suspect_commit: If you can identify a likely culprit commit:
  - sha: The git commit SHA
double checking that explorer will see short SHAs, not the 40 char ones?
yeah, this whole model will need to be reworked soon to enable better UI, just have this here as a placeholder so I can test in prod with real data
@@ -0,0 +1,92 @@
from __future__ import annotations
- five_whys: Chain of "why" statements leading to the root cause.
- reproduction_steps: Steps that would reproduce this issue
i like the event timeline artifact + prompting in current RCA. "repro steps" could be reasonably interpreted as how to repro in a test env, rather than how the issue/event at hand unfolded
that's fair, but I was intentionally going for a minimal, test env repro. I kinda like it better. But we'll see, it's very easy to change prompts and UI once we can feel it out in prod
@@ -0,0 +1,17 @@
"""
Prompts for Explorer-based Autofix steps.
nit: prompts in one file seems simpler. same is done for schemas. then no need for init logic and can order them more linearly
Guidelines:
1. Use your tools to fetch the issue details and examine the evidence
2. Investigate the trace, replay, logs, other issues, trends, and other telemetry when available to gain a deeper understanding of the issue
(for later) does transitioning to explorer mean we can encourage it to look at other events in the issue to capture a slightly broader picture? feel like it's nicer for RCA to not mention transient, event-specific info. maybe explorer will search events on its own if it needs to, since i see there's already prompting + tool descs around issue vs event
yep this is definitely possible. we can prompt engineer later, but in local testing on mock issues, explorer was already using Discover to find multiple samples so it might just happen naturally
    metadata=metadata,
)
else:
    return client.continue_run(
how does context accumulation work? is it that all artifacts for a run_id are included as context? feel like the impact and triage runs should only have RCA as context, not sure
yeah, the whole run is one continuous chat, and the agent can freely update any artifact it has previously been tasked with generating when it deems it appropriate. This is intentional because the siloing of steps in Autofix was what made the UX so rigid (imagine investigating the solution and the agent realizes the root cause was wrong: you'd have to rethink all the way up the convo instead of it just editing the root cause with the new info)
    artifacts: dict[str, Artifact],
) -> None:
    """
    Continue to the next step if stopping_point hasn't been reached.
would be cool to trigger triage and impact concurrently w/ solution
it would be, i just didn't want to mess with automation too much in this PR. we can def discuss what ideal automation flow looks like in a world with 5 steps and steps that can run in any order
Adds UI to support the new explorer-backed Autofix agent. Controlled by the explorer FF and a new one; the old UI should be preserved as-is when not behind the flags.

<img width="1610" height="1686" alt="image" src="https://github.com/user-attachments/assets/c8b095a5-e2f0-4c58-af53-a9f291f55a6d" />
<img width="720" height="591" alt="image" src="https://github.com/user-attachments/assets/1a0fb45f-5825-4f4f-be02-3e1ae893a94c" />

Have tickets for follow-up work like background agents, better code changes display, and a better suspect commit card.

Backend: #104615
Part of AIML-2004 and AIML-1732
- impacts: List of specific impacts, each with:
  - label: What is impacted (e.g., "User Authentication", "Payment Flow")
  - impact_description: One line describing the impact
  - evidence: Evidence or reasoning for this assessment
Bug: Impact assessment prompt missing required `rating` field
The `ImpactItem` schema in `artifact_schemas.py` requires a `rating` field of type `Literal["low", "medium", "high"]`, but the `impact_assessment_prompt` function only instructs the agent to include `label`, `impact_description`, and `evidence`. Since `rating` is not optional (no default value), artifacts generated by the agent following these prompt instructions will fail Pydantic validation when the schema is applied.
Additional Locations (1)
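A minimal repro of the mismatch, assuming the schema shape described above (field names come from the comment; this is a sketch, not the real `ImpactItem` from `artifact_schemas.py`):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class ImpactItem(BaseModel):
    """Sketch of the schema described above, not the real class."""

    label: str
    impact_description: str
    evidence: str
    rating: Literal["low", "medium", "high"]  # required: no default, so omitting it fails


# An artifact carrying only the fields the prompt asks for fails validation:
try:
    ImpactItem(
        label="Payment Flow",
        impact_description="checkout requests error out",
        evidence="spike in 500s on /checkout",
    )
except ValidationError as e:
    missing = [err["loc"] for err in e.errors()]
    print(missing)  # [('rating',)]
```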
elif "impact_assessment" in artifacts and artifacts["impact_assessment"].data:
    webhook_event = "impact_assessment_completed"
    webhook_payload["impact_assessment"] = artifacts["impact_assessment"].data
Bug: Invalid webhook event types for triage and impact
The `_send_step_webhook` method uses event names `"triage_completed"` and `"impact_assessment_completed"`, but these are not defined in the `SentryAppEventType` enum. The `broadcast_webhooks_for_organization` task validates event types against this enum and raises `SentryAppSentryError` for invalid types (lines 949-956 in `sentry_apps.py`). This will cause webhook failures when the triage or impact_assessment steps complete.
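A sketch of the failure mode, under the stated assumption that the task only dispatches events present in the enum (the enum members below are an illustrative subset, not the real list from sentry):

```python
import enum


class SentryAppEventType(enum.Enum):
    # Illustrative subset; the real enum lives in the sentry codebase.
    ROOT_CAUSE_COMPLETED = "root_cause_completed"
    SOLUTION_COMPLETED = "solution_completed"


VALID_EVENTS = {e.value for e in SentryAppEventType}


def check_event(event: str) -> None:
    # Mirrors the validation described above: unknown events raise instead of dispatching.
    if event not in VALID_EVENTS:
        raise ValueError(f"invalid webhook event type: {event}")


check_event("root_cause_completed")  # ok
# check_event("triage_completed") would raise, matching the reported bug
```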
elif "root_cause" in artifacts and artifacts["root_cause"].data:
    webhook_event = "root_cause_completed"
    webhook_payload["root_cause"] = artifacts["root_cause"].data
Bug: Webhook step detection wrong when triage/impact artifacts exist
The `_send_step_webhook` method determines which step completed by checking artifact presence in a fixed order (triage first, then impact_assessment, etc.). Since artifacts accumulate across all steps in a run, if triage or impact_assessment ran before other pipeline steps, subsequent step completions will incorrectly trigger the triage/impact webhook instead of the correct one. For example, if a user runs root_cause, then triage, then solution, the solution completion would check `"triage" in artifacts` first (true from the earlier step) and incorrectly send a triage webhook instead of a solution webhook.
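One possible fix, sketched under the assumption that the pipeline knows which step just finished and can thread it through (function and key names are illustrative, not the real code):

```python
def pick_webhook_event(completed_step: str, artifacts: dict[str, dict]) -> tuple[str, dict]:
    """Derive the webhook from the step that just finished, not from artifact presence.

    Avoids the fixed-order `if "triage" in artifacts` scan, which misfires once
    artifacts from earlier steps have accumulated on the run.
    """
    payload: dict[str, dict] = {}
    data = artifacts.get(completed_step)
    if data:
        payload[completed_step] = data
    return f"{completed_step}_completed", payload


# A leftover triage artifact no longer shadows a later solution completion:
artifacts = {"triage": {"suspect_commit": "abc123"}, "solution": {"plan": "..."}}
event, payload = pick_webhook_event("solution", artifacts)
assert event == "solution_completed"
assert "triage" not in payload
```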
Runs Autofix's 3 steps plus 2 new ones on the Seer Explorer Client rather than calling its own Seer agent. Supports both manual runs and automation runs.
Controlled by the seer explorer feature flag and a new one, so we can start with internal testing for now.
Does not currently support automation or manual handoff to 3rd-party agents, but I have a ticket to add that back along with other improvements (e.g. the suspect commit artifact will need improvement too).
Frontend: #104618
Part of AIML-2004