Improve the alert-extraction prompt so RCA-critical fields survive early parsing #655

@davincios

Summary

The alert-extraction prompt in app/nodes/extract_alert/extract.py decides whether an incoming message is noise and extracts RCA-critical fields such as alert_source, kube_namespace, error_message, log_query, and Kubernetes identifiers. If it misses or misclassifies any of these fields, the downstream investigation can start from the wrong context and produce a worse RCA, no matter how strong the later prompts are.

Goal

Improve the extraction prompt so synthetic Kubernetes and RDS alerts preserve the routing and context fields needed for reliable downstream RCA.

Acceptance Criteria

  • The prompt remains conservative about is_noise and does not suppress real alerts.
  • It extracts RCA-critical routing fields more reliably for synthetic alerts in tests/synthetic/.
  • Kubernetes-specific identifiers and source hints are preserved when present.
  • The extraction change demonstrably improves or hardens downstream RCA behavior on at least one synthetic Kubernetes or RDS case.
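
As a rough illustration of how the first three criteria could be checked mechanically, here is a minimal sketch. It assumes the extraction result is available as a flat dict keyed by the field names from the summary; that shape, the helper names, and the "noise but has an error message" heuristic are all hypothetical and not taken from extract.py.

```python
from typing import Any, Mapping

# Field names come from the issue summary; treating the extraction result
# as a flat dict is an assumption, not the actual schema in extract.py.
RCA_CRITICAL_FIELDS = ("alert_source", "kube_namespace", "error_message", "log_query")


def missing_critical_fields(extracted: Mapping[str, Any]) -> list[str]:
    """Return the RCA-critical fields that are absent or empty."""
    return [
        name
        for name in RCA_CRITICAL_FIELDS
        if extracted.get(name) in (None, "", [], {})
    ]


def suppresses_real_alert(extracted: Mapping[str, Any]) -> bool:
    """Hypothetical heuristic: marked as noise although an error message was extracted."""
    return bool(extracted.get("is_noise")) and bool(extracted.get("error_message"))
```

A test over the payloads in tests/synthetic/ could assert that missing_critical_fields returns an empty list for fields the payload actually contains, and that suppresses_real_alert is false for every real alert.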

Required Proof

Use synthetic alert payloads from tests/synthetic/, including at least one of:

  • an RDS case from tests/synthetic/rds_postgres/*/alert.json
  • tests/synthetic/eks/000-healthy/alert.json

Include all of the following in the PR or issue follow-up:

  • the exact scenario(s) and alert payload(s) used
  • the baseline extracted fields or misclassification before the prompt change
  • the improved extracted fields after the prompt change
  • a downstream RCA check showing the same scenario is at least as reliable after the extraction fix, using a synthetic suite command such as python -m tests.synthetic.rds_postgres.run_suite --scenario <scenario> --mock-grafana or python -m tests.synthetic.eks.run_suite --scenario 000-healthy --mock-backends
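
One possible way to present the before/after evidence is a small field-level diff. The sketch below assumes both extraction results have been captured as plain dicts; diff_extractions and regressions are hypothetical helper names, not existing functions in this repo.

```python
from typing import Any, Mapping


def diff_extractions(
    baseline: Mapping[str, Any], improved: Mapping[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each changed field to its (baseline, improved) values."""
    changed: dict[str, tuple[Any, Any]] = {}
    for name in sorted(set(baseline) | set(improved)):
        before, after = baseline.get(name), improved.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


def regressions(diff: Mapping[str, tuple[Any, Any]]) -> list[str]:
    """Fields that had a value before the prompt change and are empty afterwards."""
    empty = (None, "")
    return [
        name
        for name, (before, after) in diff.items()
        if before not in empty and after in empty
    ]
```

Pasting the output of diff_extractions for each scenario into the PR, together with an empty regressions list, would cover the baseline and improved bullets above; the downstream RCA check still needs the suite command.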

Notes

This work should stay focused on prompt quality first. If a small follow-on parser or schema tweak is needed to make the prompt useful, keep that tightly scoped and justified by the synthetic evidence.
