fix: recognise Alertmanager, Coralogix, and Honeycomb keys in is_clearly_healthy short-circuit by kaushal-bakrania · Pull Request #672 · Tracer-Cloud/opensre

kaushal-bakrania · 2026-04-19T16:23:42Z

Summary

Same drift class as [BUG] is_clearly_healthy short-circuit never fires for pure-EKS healthy states — eks_* keys missing from _INVESTIGATED_EVIDENCE_KEYS #582 (EKS fix): _INVESTIGATED_EVIDENCE_KEYS in evidence_checker.py was missing alertmanager_alerts, alertmanager_silences, coralogix_logs, coralogix_error_logs, and honeycomb_traces — keys that the investigation merge step actually writes via the matching _map_* mappers in post_process.py.
Consequence: condition 4 of is_clearly_healthy never fires for pure-Alertmanager / pure-Coralogix / pure-Honeycomb healthy states, so every resolved low-severity alert from those stacks pays a full LLM RCA round-trip instead of taking the fast path. Cost + latency paper-cut, not a correctness bug.
Fix adds the five missing keys to the frozenset. The severity gate (condition 2) still rejects firing critical alerts when these evidence keys are populated, so the safety properties of the short-circuit are unchanged.

Fixes #670.

Test plan

make lint — clean
make typecheck — clean
pytest tests/nodes/root_cause_diagnosis/test_evidence_checker.py — 28 passed (14 new + 14 existing)
Added parameterised tests covering all five new keys plus a firing-critical safety test
Reproduction from the issue now prints missing: [] and all four keys print short-circuit -> True

…n is_clearly_healthy short-circuit Same drift class as Tracer-Cloud#582 (which fixed it for EKS): the investigation merge step writes alertmanager_alerts / alertmanager_silences / coralogix_logs / coralogix_error_logs / honeycomb_traces into the evidence dict via the corresponding _map_* mappers in post_process.py, but these keys were not added to _INVESTIGATED_EVIDENCE_KEYS. As a result, condition 4 of is_clearly_healthy never fires for pure-Alertmanager / pure-Coralogix / pure-Honeycomb healthy states and every resolved low-severity alert from those stacks pays a full LLM RCA round-trip instead of taking the fast path. Add the five missing keys to the frozenset. The severity gate (condition 2) still rejects firing critical alerts even when these evidence keys are populated, so the safety properties of the short-circuit are unchanged. Fixes Tracer-Cloud#670

greptile-apps · 2026-04-19T16:26:08Z

Greptile Summary

This PR fixes a recurring drift pattern where _INVESTIGATED_EVIDENCE_KEYS in evidence_checker.py was not updated when new integrations (Alertmanager, Coralogix, Honeycomb) were added, causing is_clearly_healthy condition 4 to never fire for those stacks and forcing a full LLM RCA round-trip on every resolved low-severity alert from them. The fix adds the five missing keys (alertmanager_alerts, alertmanager_silences, coralogix_logs, coralogix_error_logs, honeycomb_traces) and backs them with parameterized tests plus a severity-gate safety check.

Confidence Score: 5/5

Safe to merge — the change is additive, well-tested, and the safety properties of the short-circuit are unchanged.

No P0/P1 issues found. The added keys are verified against what the post_process.py mappers actually write. Tests cover all five new keys plus the critical-severity rejection path. The only finding is a P2 observation about potentially similar gaps for Vercel/GitHub integrations, which is pre-existing and out of scope for this PR.

No files require special attention.

Important Files Changed

Filename	Overview
app/nodes/root_cause_diagnosis/evidence_checker.py	Adds 5 missing mapper-output keys to _INVESTIGATED_EVIDENCE_KEYS and check_evidence_availability; keys correctly match what map_alertmanager*, _map_coralogix_logs, and _map_honeycomb_traces write in post_process.py.
tests/nodes/root_cause_diagnosis/test_evidence_checker.py	Adds parameterized tests for all 5 new keys plus a safety test confirming critical-severity alerts still reject; good coverage and follows existing test patterns.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming resolved alert] --> B{Condition 1\nstate in healthy_states?}
    B -- No --> FULL[Full LLM RCA round-trip]
    B -- Yes --> C{Condition 2\nseverity in healthy_severities?}
    C -- No --> FULL
    C -- Yes --> D{Condition 3\nno error annotations?}
    D -- No --> FULL
    D -- Yes --> E{Condition 4\nany key in _INVESTIGATED_EVIDENCE_KEYS?}
    E -- No --> FULL
    E -- Yes --> SC[Short-circuit → healthy]

    subgraph "Keys added by this PR"
        K1[alertmanager_alerts]
        K2[alertmanager_silences]
        K3[coralogix_logs]
        K4[coralogix_error_logs]
        K5[honeycomb_traces]
    end

    K1 & K2 & K3 & K4 & K5 --> E

Prompt To Fix All With AI

This is a comment left during a code review.
Path: app/nodes/root_cause_diagnosis/evidence_checker.py
Line: 16-44

Comment:
**Potential same-drift gap for Vercel and GitHub integrations**

`_map_vercel_deployment_status` writes `vercel_deployments`, `_map_vercel_deployment_logs` writes `vercel_deployment` / `vercel_events`, and the three GitHub mappers write `github_code_matches`, `github_file`, and `github_commits` — none of which appear in `_INVESTIGATED_EVIDENCE_KEYS`. If a pure-Vercel or pure-GitHub healthy investigation runs, condition 4 of `is_clearly_healthy` would still not fire, forcing the same full LLM round-trip this PR was designed to eliminate. Worth checking whether those integrations should be included here too.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (2): Last reviewed commit: "fix: recognise alertmanager/coralogix/ho..." | Re-trigger Greptile}

…e_availability The same drift class the frozenset fix addressed: _map_* mappers write these keys into the evidence dict, but the has_cloudwatch_evidence OR- chain was not updated alongside. A firing-critical alert from these stacks with a sparse payload (no tracer_web_run, no annotation body) would fall into _handle_insufficient_evidence and skip the reasoning LLM — the mirror image of the short-circuit miss this PR already fixes. Review feedback from Greptile on PR Tracer-Cloud#672.

…acer-Cloud#684) The key is already in _INVESTIGATED_EVIDENCE_KEYS and emitted by the _map_grafana_alert_rules mapper, but check_evidence_availability's has_cloudwatch_evidence OR-chain never probed it. A healthy alert whose only evidence was grafana_alert_rules fell through to _handle_insufficient_evidence instead of satisfying the evidence gate. Same drift-bug class as commit 3dbfb52 (EKS) and open PR Tracer-Cloud#672 (AM/CL/HC).

…) (#685) The key is already in _INVESTIGATED_EVIDENCE_KEYS and emitted by the _map_grafana_alert_rules mapper, but check_evidence_availability's has_cloudwatch_evidence OR-chain never probed it. A healthy alert whose only evidence was grafana_alert_rules fell through to _handle_insufficient_evidence instead of satisfying the evidence gate. Same drift-bug class as commit 3dbfb52 (EKS) and open PR #672 (AM/CL/HC).

VaibhavUpreti · 2026-04-20T19:01:41Z

@copilot resolve the merge conflicts in this pull request

Devesh36 · 2026-04-23T07:35:31Z

Closing in favour of #777 which has been merged and covers this fix as part of a broader whitelist approach. The missing keys (alertmanager, coralogix, honeycomb) are now handled via CLAIM_EVIDENCE_KEYS.

kaushal-bakrania mentioned this pull request Apr 19, 2026

bug: grafana_alert_rules key missing from check_evidence_availability OR-chain #684

Closed

kaushal-bakrania mentioned this pull request Apr 19, 2026

fix: recognise grafana_alert_rules in check_evidence_availability (#684) #685

Merged

3 tasks

mayankbharati-ops mentioned this pull request Apr 23, 2026

fix: emit healthy-short-circuit claims from investigation keys only #777

Merged

Devesh36 closed this Apr 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: recognise Alertmanager, Coralogix, and Honeycomb keys in is_clearly_healthy short-circuit#672

fix: recognise Alertmanager, Coralogix, and Honeycomb keys in is_clearly_healthy short-circuit#672
kaushal-bakrania wants to merge 2 commits intoTracer-Cloud:mainfrom
kaushal-bakrania:fix/is-clearly-healthy-recognise-alertmanager-coralogix-honeycomb

kaushal-bakrania commented Apr 19, 2026

Uh oh!

greptile-apps Bot commented Apr 19, 2026 •

edited

Loading

Flowchart

Uh oh!

VaibhavUpreti commented Apr 20, 2026

Uh oh!

Devesh36 commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kaushal-bakrania commented Apr 19, 2026

Summary

Test plan

Uh oh!

greptile-apps Bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

VaibhavUpreti commented Apr 20, 2026

Uh oh!

Devesh36 commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps Bot commented Apr 19, 2026 •

edited

Loading