fix(extract): prevent health checks with state=normal from being marked as noise #1096

Merged
Devesh36 merged 1 commit into Tracer-Cloud:main from AarushSharmaa:fix/655-is-noise-health-check
Apr 30, 2026

AarushSharmaa commented Apr 30, 2026

What and why

Closes #655.

The is_noise classifier in extract_alert_details was silently dropping scheduled health checks and normal-state payloads. The prompt already said health checks are not noise, but the wording was not strong enough. The LLM would see state: normal or a summary like "Periodic health check passed. All Kubernetes signals within normal operating bounds." and override the rule anyway.

This PR adds one clarifying sentence to the is_noise=false rule to name the exact patterns that were being misclassified.

Change

One sentence added to the prompt in app/nodes/extract_alert/extract.py (line 57).

Before:

is_noise=false (default) for: any alert, error, failure, incident, warning, monitoring notification (including health checks and informational states).

After:

is_noise=false (default) for: any alert, error, failure, incident, warning, monitoring notification (including health checks and informational states). A payload with state=normal, a scheduled health check, or a summary saying "no errors found" is still a monitoring event and must not be treated as noise.

No logic, schema, or import changes.
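A prompt-only change like this can be pinned with a small regression check so the clarifying sentence is not lost in a later edit. The sketch below is illustrative only: the constant names and the `missing_phrases` helper are hypothetical and are not taken from `extract.py`, where the rule sits inline in the prompt string.

```python
# Hypothetical names for illustration; the real rule is inline in the prompt
# in app/nodes/extract_alert/extract.py.
NOISE_RULE_BEFORE = (
    "is_noise=false (default) for: any alert, error, failure, incident, warning, "
    "monitoring notification (including health checks and informational states)."
)

CLARIFYING_SENTENCE = (
    "A payload with state=normal, a scheduled health check, or a summary saying "
    '"no errors found" is still a monitoring event and must not be treated as noise.'
)

NOISE_RULE_AFTER = f"{NOISE_RULE_BEFORE} {CLARIFYING_SENTENCE}"

# The three patterns this PR names explicitly.
REQUIRED_PHRASES = ("state=normal", "scheduled health check", "no errors found")


def missing_phrases(prompt: str) -> list[str]:
    """Return the required phrases that do not appear in the prompt text."""
    return [phrase for phrase in REQUIRED_PHRASES if phrase not in prompt]
```

Calling `missing_phrases` on the rendered prompt in a unit test would flag any later edit that drops one of the named patterns. It checks wording only, not how the model behaves.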

Proof

Tested against all 29 synthetic alert fixtures in tests/synthetic/ using the Groq API (llama-3.3-70b-versatile, temp=0, same prompt template as production).

| Alert | Before | After |
| --- | --- | --- |
| eks/000-healthy (state=normal health check) | is_noise=True (wrong) | is_noise=False (correct) |
| All other 28 alerts | is_noise=False (correct) | is_noise=False (correct) |

Result: accuracy went from 28/29 to 29/29, with zero regressions.

Full run output (29 alerts, new prompt)
Running all 29 synthetic alerts through the NEW prompt...

  ok     is_noise=False  eks/000-healthy
  ok     is_noise=False  eks/001-oomkilled-crashloop
  ok     is_noise=False  eks/002-image-pull-backoff
  ok     is_noise=False  eks/003-pending-insufficient-resources
  ok     is_noise=False  eks/004-liveness-probe-killing
  ok     is_noise=False  eks/005-resource-quota-exceeded
  ok     is_noise=False  eks/006-dns-resolution-failure
  ok     is_noise=False  eks/007-node-not-ready
  ok     is_noise=False  eks/008-deployment-rollout-stuck
  ok     is_noise=False  eks/009-noisy-healthy-restart-recovered
  ok     is_noise=False  eks/010-red-herring-old-rollout
  ok     is_noise=False  eks/011-recovered-rollout
  ok     is_noise=False  eks/012-pending-recovered
  ok     is_noise=False  eks/013-spurious-alert-storm
  ok     is_noise=False  rds_postgres/000-healthy
  ok     is_noise=False  rds_postgres/001-replication-lag
  ok     is_noise=False  rds_postgres/002-connection-exhaustion
  ok     is_noise=False  rds_postgres/003-storage-full
  ok     is_noise=False  rds_postgres/004-cpu-saturation-bad-query
  ok     is_noise=False  rds_postgres/005-failover
  ok     is_noise=False  rds_postgres/006-replication-lag-cpu-redherring
  ok     is_noise=False  rds_postgres/007-connection-pressure-noisy-healthy
  ok     is_noise=False  rds_postgres/008-storage-full-missing-metric
  ok     is_noise=False  rds_postgres/009-dual-fault-connection-cpu
  ok     is_noise=False  rds_postgres/010-replication-lag-missing-metric
  ok     is_noise=False  rds_postgres/011-cpu-storage-compositional
  ok     is_noise=False  rds_postgres/012-replication-lag-misleading-events
  ok     is_noise=False  rds_postgres/013-storage-recovery-false-alert
  ok     is_noise=False  rds_postgres/014-checkpoint-storm-cpu-saturation

Result: 29/29 correct
Zero regressions.
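The run above could be produced by a harness along the following lines. This is a minimal sketch under stated assumptions, not the script used for this PR: `Fixture`, `run_fixtures`, and `stub_classifier` are hypothetical names, and the stub stands in for the real Groq call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Fixture:
    """One synthetic alert (hypothetical shape, for illustration only)."""
    name: str
    payload: dict = field(default_factory=dict)
    expected_is_noise: bool = False  # every fixture in this PR expects False


def run_fixtures(
    fixtures: list[Fixture],
    classify: Callable[[dict], bool],
) -> tuple[int, list[str]]:
    """Classify each fixture and return (number correct, report lines)."""
    correct = 0
    lines = []
    for fx in fixtures:
        got = classify(fx.payload)
        ok = got == fx.expected_is_noise
        correct += int(ok)
        status = "ok" if ok else "FAIL"
        lines.append(f"  {status:<6} is_noise={got}  {fx.name}")
    return correct, lines


def stub_classifier(payload: dict) -> bool:
    """Stand-in for the LLM call; assumes the new prompt never flags noise here."""
    return False
```

Swapping `stub_classifier` for a function that sends the production prompt template to the model and reads back `is_noise` would give a before/after comparison of the kind reported in this PR.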

Quality gates

  • ruff check app/nodes/extract_alert/extract.py: all checks passed
  • mypy app/nodes/extract_alert/extract.py: no issues found

…ed as noise

Adds one clarifying sentence to the is_noise=false rule so the LLM does
not silently drop scheduled health checks or normal-state payloads.

Tested against all 29 synthetic alert fixtures (eks + rds_postgres).
Before: 28/29 correct — eks/000-healthy misclassified as is_noise=true.
After:  29/29 correct — no regressions.

Closes Tracer-Cloud#655

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds one clarifying sentence to the is_noise classifier prompt in extract_alert_details to explicitly tell the LLM that payloads with state=normal, scheduled health checks, and "no errors found" summaries are monitoring events — not noise. The fix is a targeted prompt-only change with no logic, schema, or import modifications, and it resolves a misclassification on the eks/000-healthy synthetic fixture.

Confidence Score: 5/5

Safe to merge — single-line prompt clarification with no logic changes, validated against 29 synthetic fixtures with zero regressions.

The change is a one-sentence addition to a prompt string. All remaining findings are P2 style/quality suggestions that do not affect correctness. The PR author demonstrated 29/29 fixture pass rate and confirmed ruff/mypy green.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/nodes/extract_alert/extract.py | Single-line prompt clarification: extends the is_noise=false rule with three concrete examples (state=normal, scheduled health checks, "no errors found") to prevent LLM misclassification. No logic or schema changes. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant extract_alert_details
    participant LLM
    participant AlertDetails

    Caller->>extract_alert_details: state (raw_alert)
    extract_alert_details->>extract_alert_details: _format_raw_alert()
    note over extract_alert_details: Builds prompt with<br/>is_noise classifier rules<br/>(incl. new state=normal sentence)
    extract_alert_details->>LLM: prompt (classify + extract)
    alt LLM succeeds
        LLM-->>extract_alert_details: AlertDetails (is_noise, fields…)
        extract_alert_details-->>Caller: AlertDetails
    else LLM fails
        extract_alert_details->>extract_alert_details: _fallback_details()<br/>is_noise=False always
        extract_alert_details-->>Caller: AlertDetails (fallback)
    end
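The control flow in the diagram can be sketched as follows. This is a simplified illustration, not the code in `extract.py`: the `AlertDetails` fields, the `call_llm` parameter, and the function bodies are assumptions. Only the overall shape (try the LLM, fall back to `is_noise=False` on failure) comes from the diagram.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertDetails:
    """Hypothetical subset of the real schema, for illustration only."""
    is_noise: bool
    summary: str = ""


def fallback_details(raw_alert: dict) -> AlertDetails:
    """Fallback path: never classify as noise when the LLM is unavailable."""
    return AlertDetails(is_noise=False, summary=str(raw_alert.get("summary", "")))


def extract_alert_details(
    raw_alert: dict,
    call_llm: Callable[[dict], AlertDetails],
) -> AlertDetails:
    """Ask the LLM to classify and extract; fall back if the call raises."""
    try:
        return call_llm(raw_alert)
    except Exception:
        # Any LLM failure yields a conservative, non-noise result.
        return fallback_details(raw_alert)
```

Defaulting to `is_noise=False` on failure means an LLM outage cannot cause an alert to be dropped, which matches the intent of this PR.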

Reviews (1): Last reviewed commit: "fix(extract): prevent health checks with..."

Devesh36 merged commit d093319 into Tracer-Cloud:main on Apr 30, 2026
7 checks passed
@github-actions

🎲 Researchers are baffled. @AarushSharmaa opened a PR, got it reviewed without drama, and merged clean. This violates known laws of open source. 🔬


gitsofaryan pushed a commit to gitsofaryan/opensre that referenced this pull request May 3, 2026
…ed as noise (Tracer-Cloud#1096)

Co-authored-by: Aarush Sharma <[email protected]>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Development

Successfully merging this pull request may close these issues.

Improve the alert-extraction prompt so RCA-critical fields survive early parsing