fix: recognise Alertmanager, Coralogix, and Honeycomb keys in is_clearly_healthy short-circuit #672
…n is_clearly_healthy short-circuit
Same drift class as Tracer-Cloud#582 (which fixed it for EKS): the investigation merge step writes alertmanager_alerts / alertmanager_silences / coralogix_logs / coralogix_error_logs / honeycomb_traces into the evidence dict via the corresponding _map_* mappers in post_process.py, but these keys were not added to _INVESTIGATED_EVIDENCE_KEYS. As a result, condition 4 of is_clearly_healthy never fires for pure-Alertmanager / pure-Coralogix / pure-Honeycomb healthy states, and every resolved low-severity alert from those stacks pays a full LLM RCA round-trip instead of taking the fast path.
Add the five missing keys to the frozenset. The severity gate (condition 2) still rejects firing critical alerts even when these evidence keys are populated, so the safety properties of the short-circuit are unchanged.
Fixes Tracer-Cloud#670
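The behaviour described in this commit message can be illustrated with a minimal sketch. Only the five key names, the name _INVESTIGATED_EVIDENCE_KEYS, and the existence of a severity gate (condition 2) and an investigated-evidence check (condition 4) come from the PR. The function signature, the severity values, the alert-state check, and the placeholder key are assumptions made for illustration, not the project's actual code.

```python
# Hypothetical, simplified stand-in for the short-circuit in evidence_checker.py.

_INVESTIGATED_EVIDENCE_KEYS = frozenset(
    {
        "eks_pods",  # assumed placeholder for keys that were already present
        # The five keys this PR adds:
        "alertmanager_alerts",
        "alertmanager_silences",
        "coralogix_logs",
        "coralogix_error_logs",
        "honeycomb_traces",
    }
)

# Assumed severity vocabulary; the real gate may use different values.
_LOW_SEVERITIES = frozenset({"info", "low", "warning"})


def is_clearly_healthy(alert_state: str, severity: str, evidence: dict) -> bool:
    """Return True when the alert can skip the full LLM RCA round-trip."""
    # Assumed check: only resolved alerts are candidates for the fast path.
    if alert_state != "resolved":
        return False
    # Stand-in for condition 2 (severity gate): critical alerts never short-circuit.
    if severity.lower() not in _LOW_SEVERITIES:
        return False
    # Stand-in for condition 4: at least one recognised evidence key is populated.
    return any(evidence.get(key) for key in _INVESTIGATED_EVIDENCE_KEYS)


if __name__ == "__main__":
    honeycomb_only = {"honeycomb_traces": [{"trace_id": "abc"}]}
    print(is_clearly_healthy("resolved", "low", honeycomb_only))
    print(is_clearly_healthy("firing", "critical", honeycomb_only))
```

In this sketch, removing the five added keys from the frozenset makes the first call return False, which is the fast-path miss the commit describes.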
Greptile Summary
This PR fixes a recurring drift pattern where …
Confidence Score: 5/5
Safe to merge: the change is additive, well-tested, and the safety properties of the short-circuit are unchanged. No P0/P1 issues found. The added keys are verified against what the post_process.py mappers actually write. Tests cover all five new keys plus the critical-severity rejection path. The only finding is a P2 observation about potentially similar gaps for Vercel/GitHub integrations, which is pre-existing and out of scope for this PR. No files require special attention.
Important Files Changed
…e_availability
The same drift class the frozenset fix addressed: the _map_* mappers write these keys into the evidence dict, but the has_cloudwatch_evidence OR-chain was not updated alongside. A firing-critical alert from these stacks with a sparse payload (no tracer_web_run, no annotation body) would fall into _handle_insufficient_evidence and skip the reasoning LLM, which is the mirror image of the short-circuit miss this PR already fixes.
Review feedback from Greptile on PR Tracer-Cloud#672.
…acer-Cloud#684) The key is already in _INVESTIGATED_EVIDENCE_KEYS and emitted by the _map_grafana_alert_rules mapper, but check_evidence_availability's has_cloudwatch_evidence OR-chain never probed it. A healthy alert whose only evidence was grafana_alert_rules fell through to _handle_insufficient_evidence instead of satisfying the evidence gate. Same drift-bug class as commit 3dbfb52 (EKS) and open PR Tracer-Cloud#672 (AM/CL/HC).
…) (#685) The key is already in _INVESTIGATED_EVIDENCE_KEYS and emitted by the _map_grafana_alert_rules mapper, but check_evidence_availability's has_cloudwatch_evidence OR-chain never probed it. A healthy alert whose only evidence was grafana_alert_rules fell through to _handle_insufficient_evidence instead of satisfying the evidence gate. Same drift-bug class as commit 3dbfb52 (EKS) and open PR #672 (AM/CL/HC).
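The OR-chain drift described in these commits can be sketched as follows. The names grafana_alert_rules, alertmanager_alerts, coralogix_logs, honeycomb_traces, and _INVESTIGATED_EVIDENCE_KEYS appear in the thread; both function names, the cloudwatch_logs and eks_pods keys, and the function bodies are hypothetical and only contrast a hand-maintained chain with a gate derived from the shared key set.

```python
# Hypothetical contrast between a hand-maintained OR-chain and a set-driven gate.

_INVESTIGATED_EVIDENCE_KEYS = frozenset(
    {
        "cloudwatch_logs",  # assumed key name
        "eks_pods",  # assumed key name
        "grafana_alert_rules",
        "alertmanager_alerts",
        "coralogix_logs",
        "honeycomb_traces",
    }
)


def has_evidence_or_chain(evidence: dict) -> bool:
    """Hand-written chain: every new mapper key must be appended here by hand."""
    return bool(evidence.get("cloudwatch_logs") or evidence.get("eks_pods"))


def has_evidence_from_key_set(evidence: dict) -> bool:
    """Derived gate: any populated key from the shared frozenset counts."""
    return any(evidence.get(key) for key in _INVESTIGATED_EVIDENCE_KEYS)


if __name__ == "__main__":
    grafana_only = {"grafana_alert_rules": [{"rule": "latency"}]}
    # The chain misses the key it was never taught about; the derived gate does not.
    print(has_evidence_or_chain(grafana_only), has_evidence_from_key_set(grafana_only))
```

Deriving the gate from one key set means a new mapper key only has to be registered once, which is one possible way to avoid this bug class; the thread does not say the project took this exact route.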
@copilot resolve the merge conflicts in this pull request
Closing in favour of #777, which has been merged and covers this fix as part of a broader whitelist approach. The missing keys (alertmanager, coralogix, honeycomb) are now handled via CLAIM_EVIDENCE_KEYS.
Summary
_INVESTIGATED_EVIDENCE_KEYS in evidence_checker.py was missing alertmanager_alerts, alertmanager_silences, coralogix_logs, coralogix_error_logs, and honeycomb_traces: keys that the investigation merge step actually writes via the matching _map_* mappers in post_process.py.
As a result, condition 4 of is_clearly_healthy never fires for pure-Alertmanager / pure-Coralogix / pure-Honeycomb healthy states, so every resolved low-severity alert from those stacks pays a full LLM RCA round-trip instead of taking the fast path. Cost + latency paper-cut, not a correctness bug.
Fixes #670.
Test plan
make lint: clean
make typecheck: clean
pytest tests/nodes/root_cause_diagnosis/test_evidence_checker.py: 28 passed (14 new + 14 existing)
missing: [] and all four keys print short-circuit -> True
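The last item reads like the output of a manual drift check. A sketch of what such a check could look like, assuming the keys written by the mappers can be listed explicitly: the helper name find_missing_keys and the hard-coded key lists are hypothetical, and the real check may have collected the keys differently.

```python
# Hypothetical drift check: which mapper-written keys does the frozenset not recognise?

def find_missing_keys(mapper_keys, investigated_keys):
    """Return, sorted, the mapper-written keys absent from the investigated set."""
    return sorted(set(mapper_keys) - set(investigated_keys))


# Keys named in this PR as written by the _map_* mappers.
MAPPER_KEYS = [
    "alertmanager_alerts",
    "alertmanager_silences",
    "coralogix_logs",
    "coralogix_error_logs",
    "honeycomb_traces",
]

# Assumed state of the frozenset before and after the fix (other keys omitted).
KEYS_BEFORE_FIX = frozenset()
KEYS_AFTER_FIX = frozenset(MAPPER_KEYS)

if __name__ == "__main__":
    print("missing before:", find_missing_keys(MAPPER_KEYS, KEYS_BEFORE_FIX))
    print("missing after:", find_missing_keys(MAPPER_KEYS, KEYS_AFTER_FIX))
```

Run as a unit test, a check of this shape would fail whenever a new mapper key is added without being registered.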