fix: emit healthy-short-circuit claims from investigation keys only by mayankbharati-ops · Pull Request #777 · Tracer-Cloud/opensre

mayankbharati-ops · 2026-04-22T22:39:08Z

Summary

_handle_healthy_finding built the validated_claims list by iterating every evidence key and filtering by truthiness. Two problems fall out of that:

Metadata leaks into findings. grafana_logs_query, datadog_logs_query, counts (datadog_monitors_count, eks_total_pods), timings (datadog_fetch_ms), service names (grafana_logs_service), trace IDs (honeycomb_trace_id), URLs (honeycomb_query_url) — all truthy, all surfaced as e.g.

"datadog_logs_query data confirmed within normal operating bounds"
Empty-list evidence is silently dropped. An empty grafana_logs / eks_events after a completed investigation is itself the healthy signal (per is_clearly_healthy's condition 4 and docstring). Empty lists are falsy, so the claim that actually triggered the short-circuit never appears.

Fix

Introduce an explicit CLAIM_EVIDENCE_KEYS whitelist in evidence_checker.py, populated by auditing every mapper in app/nodes/investigate/processing/post_process.py. Replace the old truthy-filter with:

def _is_healthy_claim_key(key, value):
    if key in INVESTIGATED_EVIDENCE_KEYS:
        return True           # empty lists count — the healthy signal
    if key in CLAIM_EVIDENCE_KEYS:
        return bool(value)    # adjacent data keys claim when non-empty
    return False              # metadata / unknown keys never claim

CLAIM_EVIDENCE_KEYS (58 entries) is a superset of INVESTIGATED_EVIDENCE_KEYS (16 entries) listing every primary data key across the 39 mappers. Metadata keys — queries, counts, timings, names, IDs, URLs, text summaries — are authoritatively excluded by omission. Also renames _INVESTIGATED_EVIDENCE_KEYS → INVESTIGATED_EVIDENCE_KEYS (cross-module import, leading underscore was misleading).

Deliberate trade-off: a future mapper whose primary data key is not added to CLAIM_EVIDENCE_KEYS will not produce a healthy-short-circuit claim for that key until someone updates the whitelist. That's the safe default — silent inclusion is what produced the metadata-leak bug this PR fixes.

E2E validation — full synthetic suites, real Anthropic LLM

HEALTHY_SHORT_CIRCUIT=true, mocked EKS/Grafana/Datadog backends, real claude-sonnet-4-6 reasoning + claude-haiku-4-5 toolcall for the upstream graph.

`tests/synthetic/eks/000-healthy` (short-circuit path — this PR's scope)

Before (main) — 8 claims, 4 metadata noise, eks_events silently dropped:

eks_pods, eks_total_pods (count), eks_deployments, eks_total_deployments (count),
datadog_monitors, datadog_monitors_count (count), datadog_logs, datadog_logs_query (query)

After (this PR) — 5 clean claims, eks_events: [] now surfaced:

datadog_logs, datadog_monitors, eks_deployments, eks_events, eks_pods

Passed, actual_category: healthy, matched_keywords: [normal, no failure]. Diagnose at 0ms, healthy_short_circuit=true, LLM not invoked.

Full EKS suite — 14 scenarios, all mocked backends

Result	Count	Details
PASS	10	000-healthy (short-circuit), 001-005, 007-010 (all non-short-circuit LLM path)
FAIL	4	006, 011, 012, 013 — all `state: alerting` ⇒ `is_clearly_healthy=False` ⇒ short-circuit never fires, LLM categorization drift (e.g. `got 'infrastructure', expected 'healthy'`)

Full RDS suite — 15 scenarios, `--mock-grafana`

Result	Count	Details
PASS	11	000-healthy (short-circuit), 001, 003-006, 008-011, 014
FAIL	4	002, 007, 012, 013 — all `state: alerting` ⇒ short-circuit never fires, LLM categorization / keyword-match drift

Confirmed every failure is in state: alerting where _handle_healthy_finding is unreachable (grepped via json.load(alert.json)['state'] for each). Both 000-healthy scenarios — the only short-circuit exercises in the suites — PASS with clean output.

Non-healthy regression check

tests/synthetic/eks/001-oomkilled-crashloop: full LLM investigation (2× diagnose, 192s), produced OOM/SIGKILL root cause, category=resource_exhaustion. Confirms the change is confined to the healthy branch.

Unit coverage — 188 tests in `tests/nodes/root_cause_diagnosis/`

TestInvestigationKeyClaims — parametrised over every member of INVESTIGATED_EVIDENCE_KEYS; empty list produces a claim.
TestAdjacentDataKeys — parametrised twice over CLAIM_EVIDENCE_KEYS - INVESTIGATED_EVIDENCE_KEYS: truthy produces a claim, empty does not.
TestNonWhitelistedKeysFiltered — 51 exhaustive metadata shapes from the mapper audit (counts, totals, queries, timings, names, IDs, URLs, limits, windows, text summaries); none leak in isolation or alongside adjacent investigation keys.
TestWhitelistIntegrity — INVESTIGATED_EVIDENCE_KEYS ⊆ CLAIM_EVIDENCE_KEYS; no whitelist entry ends in an obvious-metadata suffix.
TestHealthyFindingShape — shape, validity score, loop-count preservation, tracker completion, deterministic claim order.
End-to-end — diagnose_root_cause short-circuits on a mixed evidence dict, asserts mock_llm_factory.assert_not_called().

Verification

CI (Python 3.13): test (ubuntu-latest) PASS, typecheck PASS, quality PASS, CodeQL PASS, Analyze (python) PASS
Local pytest tests/ (Python 3.14, no coverage): 2931 pass, 1 skipped, 0 failures (was 2767 on main; +164 new)
ruff check app/ tests/: clean
ruff format --check: clean
mypy app/: same 3 pre-existing types-PyMySQL stub warnings, nothing new

Scope

Two app/ files (evidence_checker.py, node.py), one test file. No public API changes aside from renaming the already-exported-by-behavior constant. No state-shape changes. No behavior change for the non-healthy branch. Orthogonal to open PR #672 (which extends INVESTIGATED_EVIDENCE_KEYS itself for alertmanager/coralogix/honeycomb) — this PR composes cleanly regardless of which keys that set contains.

When is_clearly_healthy() trips, _handle_healthy_finding built the validated_claims list by iterating every evidence key and filtering by truthiness. Two problems fall out of that: 1. Query-string metadata (grafana_logs_query, datadog_logs_query), counts (eks_total_pods, datadog_monitors_count), and other non-data fields are truthy and leak into the findings as e.g. "datadog_logs_query data confirmed within normal operating bounds" which is nonsensical — that field is a query, not data. 2. An empty grafana_logs/eks_events list after a completed investigation is itself the healthy signal (per is_clearly_healthy's docstring and condition 4), but an empty list is falsy and silently dropped, so the claim that actually triggered the short-circuit never appears. Iterate _INVESTIGATED_EVIDENCE_KEYS intersected with evidence.keys() (the same gating is_clearly_healthy uses) and sort for deterministic output. E2E validated on the synthetic healthy scenarios (HEALTHY_SHORT_CIRCUIT=true, mocked backends, real Anthropic LLM for the upstream graph steps): tests/synthetic/eks/000-healthy passed, findings trimmed from 8 claims (4 noise) to 5 clean claims; eks_events: [] now surfaces as a healthy claim instead of being dropped. tests/synthetic/rds_postgres/000-healthy passed, efficiency_score=1.0. Unit coverage added (24 tests): empty-list produces a claim, non- investigation keys stay out, order is deterministic, every investigation key is recognised (parametrized), and diagnose_root_cause short-circuits without invoking the LLM.

greptile-apps · 2026-04-22T22:41:13Z

Greptile Summary

This PR fixes two related bugs in the healthy short-circuit path of _handle_healthy_finding in the root-cause-diagnosis node. The old code built validated_claims by iterating all evidence keys filtered by truthiness, which caused query-string metadata keys (grafana_logs_query, datadog_logs_query, count fields) to incorrectly surface as data-backed claims, while empty-list evidence (the actual healthy signal per is_clearly_healthy condition 4) was silently dropped. The fix mirrors is_clearly_healthy's own gating by iterating sorted(_INVESTIGATED_EVIDENCE_KEYS) ∩ evidence.keys().

Key changes:

app/nodes/root_cause_diagnosis/node.py: 8-line listcomp rewrite — iterates sorted(_INVESTIGATED_EVIDENCE_KEYS) with k in evidence guard instead of iterating all evidence keys filtered by truthiness.
tests/nodes/root_cause_diagnosis/test_healthy_short_circuit.py: 24 new unit tests covering the regression (empty-list evidence → claim emitted), exclusion of non-investigation keys, deterministic ordering, full parametric coverage of every _INVESTIGATED_EVIDENCE_KEYS member, shape invariants, tracker assertions, and end-to-end LLM-bypass assertion.

Confidence Score: 5/5

Safe to merge — minimal, targeted fix with comprehensive regression tests and no behavior changes outside the healthy short-circuit path.

The change is 8 lines in one production file and is logically correct: it precisely mirrors the evidence-key gating already used by is_clearly_healthy, fixing both reported bugs (metadata leakage and dropped empty-list signals). Tests are thorough (24 cases including parametric full-key coverage and end-to-end LLM-bypass assertion). The PR description confirms all CI checks (pytest 2791 pass, ruff, mypy) are clean. No public API or state-shape changes.

No files require special attention.

Important Files Changed

Filename	Overview
app/nodes/root_cause_diagnosis/node.py	Core fix: replaces truthy-evidence iteration with sorted _INVESTIGATED_EVIDENCE_KEYS intersection guard, correctly including empty-list healthy signals and excluding query-string metadata keys.
tests/nodes/root_cause_diagnosis/test_healthy_short_circuit.py	New test file with 24 tests covering the regression, non-investigation key exclusion, deterministic ordering, full parametric key coverage, shape invariants, and LLM-bypass assertion.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[diagnose_root_cause] --> B{_short_circuit_enabled\nAND is_clearly_healthy?}
    B -- Yes --> C[_handle_healthy_finding]
    B -- No --> D{has_tracer OR\nhas_cloudwatch\nOR has_alert?}
    D -- No --> E[_handle_insufficient_evidence]
    D -- Yes --> F[LLM call → parse_root_cause]
    C --> G["for k in sorted(_INVESTIGATED_EVIDENCE_KEYS)\n    if k in evidence"]
    G --> H["validated_claims: k + ' data confirmed within normal operating bounds'"]
    H --> I["Return healthy finding\nroot_cause_category='healthy'\nvalidity_score=1.0\nno LLM call"]

_{Reviews (1): Last reviewed commit: "fix: emit healthy-short-circuit claims f..." | Re-trigger Greptile}

Follow-up to address two completeness concerns with the previous commit. 1. **Back-compat for truthy non-investigation data keys.** The previous revision iterated only ``INVESTIGATED_EVIDENCE_KEYS``, which silently dropped claims for truthy evidence entries like ``datadog_events``, ``datadog_error_logs``, ``honeycomb_traces``, ``eks_failing_pods``, and similar adjacent data keys written by the investigate mappers. Claim generation now unions two rules: - Investigation keys present in evidence always produce a claim (empty lists included — mirrors ``is_clearly_healthy`` condition 4). - Other evidence keys produce a claim when they carry non-empty data *and* do not match a metadata shape (query string, count, timing, resource name, total, by-pipeline aggregate). Metadata is detected via ``str.endswith`` on a small suffix tuple (``_query``, ``_count``, ``_ms``, ``_source``, ``_name``, ``_namespace``, ``_total``, ``_by_pipeline``), plus ``total_`` prefix and ``_total_`` infix for keys like ``eks_total_pods`` / ``lambda_total_logs``. Strict ``endswith`` avoids false positives like ``grafana_service_names`` (plural — list of service names, real data). 2. **Rename ``_INVESTIGATED_EVIDENCE_KEYS`` to ``INVESTIGATED_EVIDENCE_KEYS``.** The set is now imported from another module in this package, so the leading underscore is misleading. Public name matches its actual API surface. Internal references in ``evidence_checker.py`` and the two test modules updated in the same commit. Coverage additions (91 tests total across the two test modules, up from the previous 24): - ``TestTruthyNonInvestigationDataClaims`` parametrised over 18 adjacent data keys — confirms the union rule keeps them as claims. - ``TestMetadataKeyFiltering`` parametrised over 22 metadata shapes — confirms queries, counts, totals, timings, and resource names are excluded even when the adjacent investigation key is also present. - Forward-compat test: a new ``my_new_source_data`` key not in the investigation set is claimed automatically once populated by any future mapper, without a code change. E2E re-validated with real Anthropic LLM for upstream graph: - ``tests/synthetic/eks/000-healthy`` — short-circuit still hits at 0ms, 5 clean claims (``datadog_logs``, ``datadog_monitors``, ``eks_deployments``, ``eks_events``, ``eks_pods``). Scenario passed. - ``tests/synthetic/eks/001-oomkilled-crashloop`` — non-healthy path, full LLM investigation runs (2× diagnose, 192s total), produces the expected OOM/SIGKILL root cause with confidence scores. Scenario passed, ``category=resource_exhaustion``. Confirms the change does not leak into the non-healthy branch. Verification: - ``pytest tests/``: 2834 pass, 1 skipped, 0 failures. - ``ruff check`` and ``ruff format --check``: clean. - ``mypy app/``: unchanged pre-existing pymysql-stub warnings only.

…whitelist The previous revision used a ``str.endswith`` heuristic on a suffix tuple to separate data keys from metadata. Auditing every mapper in ``app/nodes/investigate/processing/post_process.py`` surfaced seven key shapes the heuristic did not cover: ``_service``, ``_id``, ``_url``, ``_text``, ``_limit``, ``_window``, and ``_dataset`` — each of which is metadata written alongside a real data key (e.g. ``grafana_logs_service`` next to ``grafana_logs``, ``honeycomb_query_url`` next to ``honeycomb_traces``, ``github_file_text`` next to ``github_file``). Replace the heuristic with an explicit ``CLAIM_EVIDENCE_KEYS`` whitelist in ``evidence_checker.py``, populated from a full audit of every mapper's primary data key output. The whitelist is a superset of ``INVESTIGATED_EVIDENCE_KEYS``; investigation keys qualify even when empty (an empty list after a completed investigation is the healthy signal), non-investigation whitelist entries qualify only when truthy. Everything else — metadata, unknown keys — is ignored. Trade-off: a new mapper whose primary data key is not added to the whitelist will not produce a healthy-short-circuit claim for that key until the whitelist is updated. That is the deliberate, safe default — silent inclusion risked the metadata-leak bug this PR is fixing. Test suite expanded to 188 tests in tests/nodes/root_cause_diagnosis/ covering: - Every member of INVESTIGATED_EVIDENCE_KEYS (parametrised; empty list produces a claim). - Every member of CLAIM_EVIDENCE_KEYS - INVESTIGATED_EVIDENCE_KEYS (parametrised twice: truthy produces a claim, empty does not). - 51 exhaustive metadata shapes from the mapper audit — none leak into claims in isolation or alongside an adjacent investigation key. - Guardrail assertions on the whitelist itself: investigation-subset property and no obvious-metadata-suffix offenders. - End-to-end ``diagnose_root_cause`` short-circuit test with a mixed evidence dict — LLM factory not invoked. Verification: - ``pytest``: 2931 pass, 1 skipped, 0 failures (was 2767 on main; +164). - ``ruff check`` / ``ruff format --check``: clean. - ``mypy app/``: unchanged pre-existing pymysql-stub warnings only. - EKS synthetic ``000-healthy`` e2e: 5 clean claims, short-circuit at 0ms.

rrajan94 · 2026-04-23T05:40:19Z

Verified locally:

Whitelist completeness: every key post_process.py mappers write is covered by CLAIM_EVIDENCE_KEYS (160 mapper keys audited against 58 whitelist entries).
pytest tests/nodes/root_cause_diagnosis/test_healthy_short_circuit.py: 164 passed.
make lint: clean.
make typecheck: 379 files, no issues.

Fix is well-scoped 👍

mayankbharati-ops added 2 commits April 23, 2026 07:35

rrajan94 merged commit cb178d4 into Tracer-Cloud:main Apr 23, 2026
7 checks passed

Devesh36 mentioned this pull request Apr 23, 2026

fix: recognise Alertmanager, Coralogix, and Honeycomb keys in is_clearly_healthy short-circuit #672

Closed

5 tasks

mayankbharati-ops mentioned this pull request Apr 23, 2026

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends #780

Merged

This was referenced May 2, 2026

fix(root-cause): recognize healthy observability evidence #1095

Closed

[BUG] is_clearly_healthy short-circuit ignores Alertmanager, Coralogix, and Honeycomb evidence keys #670

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: emit healthy-short-circuit claims from investigation keys only#777

fix: emit healthy-short-circuit claims from investigation keys only#777
rrajan94 merged 3 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/healthy-short-circuit-claims-from-investigation-keys

mayankbharati-ops commented Apr 22, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 22, 2026

Uh oh!

rrajan94 commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mayankbharati-ops commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

E2E validation — full synthetic suites, real Anthropic LLM

tests/synthetic/eks/000-healthy (short-circuit path — this PR's scope)

Full EKS suite — 14 scenarios, all mocked backends

Full RDS suite — 15 scenarios, --mock-grafana

Non-healthy regression check

Unit coverage — 188 tests in tests/nodes/root_cause_diagnosis/

Verification

Scope

Uh oh!

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

rrajan94 commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mayankbharati-ops commented Apr 22, 2026 •

edited

Loading

`tests/synthetic/eks/000-healthy` (short-circuit path — this PR's scope)

Full RDS suite — 15 scenarios, `--mock-grafana`

Unit coverage — 188 tests in `tests/nodes/root_cause_diagnosis/`