Skip to content

feat(alerts): normalize incoming payloads to OpenSRE canonical format#977

Merged
w3joe merged 3 commits intoTracer-Cloud:mainfrom
7vignesh:issue/822-standardized-alert-format
May 1, 2026
Merged

feat(alerts): normalize incoming payloads to OpenSRE canonical format#977
w3joe merged 3 commits intoTracer-Cloud:mainfrom
7vignesh:issue/822-standardized-alert-format

Conversation

@7vignesh
Copy link
Copy Markdown
Contributor

Fixes #822

Describe the changes you have made in this PR -

  • Added centralized alert normalization so incoming payloads are converted to a canonical OpenSRE format before downstream processing.
  • Added consistent normalization for commonLabels and commonAnnotations.
  • Added process metadata extraction/backfill for process_name, cmdline, and pid.
  • Added canonical alert envelope fields (alert_name, pipeline_name, severity, alert_source, labels, annotations, process).
  • Integrated normalization in investigation state initialization so all ingestion paths share the same format.
  • Added focused tests for Datadog tag parsing, Grafana-style label/annotation mapping, and process metadata extraction.

Demo/Screenshot for feature changes and bug fixes -

Validation summary:

  • Targeted tests passed (state, extract, CLI, remote interaction suites).
  • Lint + format checks passed.
  • Targeted mypy checks for changed files passed.

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

  • Solved schema inconsistency across alert providers by normalizing once at state initialization.
  • Chose centralized normalization to avoid repeating mapping logic across adapters/nodes.
  • Preserved original payload while adding canonical fields for reliable downstream use.

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

- add shared alert normalizer for vendor-specific schemas

- derive consistent labels/annotations and process metadata (process_name, cmdline, pid)

- apply normalization in make_initial_state so all ingestion paths are normalized

- add state tests covering datadog tags and grafana labels/annotations

Refs Tracer-Cloud#822
Copilot AI review requested due to automatic review settings April 26, 2026 11:36
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 26, 2026

Greptile Summary

This PR introduces a centralized normalize_alert_payload function that converts vendor-specific alert payloads (Datadog tags, Grafana labels/annotations) into a stable opensre.alert.v1 canonical envelope before any downstream graph nodes run. Integration into make_initial_state is clean and scoped correctly to the dict-payload branch.

  • P1 – aliased mutable dicts: normalized[\"commonLabels\"] and canonical_alert[\"labels\"] reference the same object; mutations by downstream nodes to one silently corrupt the other. Same alias applies to annotations.
  • P2 – falsy empty-dict fallback: an explicit commonLabels: {} is indistinguishable from commonLabels being absent, causing unintended fallthrough to the labels field.
  • P2 – float PIDs dropped silently: _coerce_pid raises ValueError on \"4242.0\" and returns None, losing valid PIDs from sources that serialize integers as floats.

Confidence Score: 4/5

Safe to merge after fixing the shared-dict alias in normalize.py to prevent silent state corruption in downstream nodes.

One P1 defect (aliased mutable dicts between commonLabels and canonical_alert.labels) is a present correctness issue — any node that enriches labels would corrupt the other field. P2s are edge-case quality items. Core logic, factory integration, and tests are otherwise solid.

app/alerts/normalize.py — shared dict alias (P1) and two P2 edge-case gaps.

Important Files Changed

Filename Overview
app/alerts/normalize.py New normalization module; contains a shared mutable dict alias between commonLabels/canonical_alert.labels (P1), a falsy-check fallback that ignores explicit empty commonLabels (P2), and a silent float-PID drop (P2).
app/alerts/init.py Thin module init re-exporting normalize_alert_payload; no issues.
app/state/factory.py Integrates normalize_alert_payload into make_initial_state correctly within the dict-only branch, preserving existing rubric-stripping logic.
tests/test_state.py Two focused integration tests for Datadog tags and Grafana label/annotation normalization; no tests for the shared-alias or float-PID edge cases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Raw Alert Payload] --> B{is dict?}
    B -- No / string --> Z[Pass through unchanged]
    B -- Yes --> C[strip_scoring_points if needed]
    C --> D[normalize_alert_payload]
    D --> E[_as_mapping commonLabels / labels]
    D --> F[_parse_tags tags]
    E & F --> G[merged labels dict]
    D --> H[_as_mapping commonAnnotations / annotations]
    G & H --> I[_first_present: process_name / cmdline / pid]
    I --> J[Backfill top-level process fields]
    G & H & J --> K[Build canonical_alert envelope\nschema: opensre.alert.v1]
    K --> L[normalized dict with\ncommonLabels + commonAnnotations\n+ process fields + canonical_alert]
    L --> M[AgentStateModel.model_validate\nraw_alert = normalized]
Loading

Reviews (1): Last reviewed commit: "feat(alerts): normalize incoming payload..." | Re-trigger Greptile

Comment thread app/alerts/normalize.py
Comment on lines +104 to +105
normalized["commonLabels"] = labels
normalized["commonAnnotations"] = annotations
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Shared mutable dict aliased across two keys

normalized["commonLabels"] and canonical_alert["labels"] (line 173) are assigned the same labels dict object. Any downstream node that appends or modifies entries in canonical_alert["labels"] will silently mutate commonLabels (and vice versa). The same alias exists for annotations / commonAnnotations. Since LangGraph nodes frequently enrich labels in place, this will cause hard-to-trace state corruption.

In the canonical_alert dict construction, pass copies:

"labels": dict(labels),
"annotations": dict(annotations),

Comment thread app/alerts/normalize.py
Comment on lines +57 to +69
def _coerce_pid(value: Any) -> int | None:
if value is None:
return None
if isinstance(value, int):
return value if value >= 0 else None
text = _to_text(value)
if text is None:
return None
try:
pid = int(text)
except ValueError:
return None
return pid if pid >= 0 else None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _coerce_pid silently drops float PID values

If a JSON alert payload serializes the PID as a float (e.g., {"pid": 4242.0}), isinstance(value, int) returns False in Python 3, _to_text(4242.0) produces "4242.0", and int("4242.0") raises ValueError, causing None to be returned. The PID is silently lost. Adding a float guard before the string path covers this case.

Suggested change
def _coerce_pid(value: Any) -> int | None:
if value is None:
return None
if isinstance(value, int):
return value if value >= 0 else None
text = _to_text(value)
if text is None:
return None
try:
pid = int(text)
except ValueError:
return None
return pid if pid >= 0 else None
def _coerce_pid(value: Any) -> int | None:
if value is None:
return None
if isinstance(value, int):
return value if value >= 0 else None
if isinstance(value, float) and value.is_integer():
pid = int(value)
return pid if pid >= 0 else None
text = _to_text(value)
if text is None:
return None
try:
pid = int(text)
except ValueError:
return None
return pid if pid >= 0 else None

Comment thread app/alerts/normalize.py Outdated
Comment on lines +92 to +94
labels = _as_mapping(normalized.get("commonLabels"))
if not labels:
labels = _as_mapping(normalized.get("labels"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Empty commonLabels: {} causes fallback to labels field unexpectedly

_as_mapping({}) returns {}, which is falsy, so the if not labels branch falls through and uses the labels field instead. A payload that explicitly supplies commonLabels: {} alongside a labels dict will have its commonLabels silently replaced. Consider checking is None on the raw value rather than testing truthiness of the parsed result:

Suggested change
labels = _as_mapping(normalized.get("commonLabels"))
if not labels:
labels = _as_mapping(normalized.get("labels"))
_raw_common_labels = normalized.get("commonLabels")
labels = _as_mapping(_raw_common_labels) if _raw_common_labels is not None else _as_mapping(normalized.get("labels"))

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@muddlebee
Copy link
Copy Markdown
Collaborator

@7vignesh pls fix reviews

@7vignesh
Copy link
Copy Markdown
Contributor Author

@muddlebee ready for review!

@muddlebee
Copy link
Copy Markdown
Collaborator

lets wait for feedback from @w3joe :)

@w3joe
Copy link
Copy Markdown
Collaborator

w3joe commented May 1, 2026

looks good, thanks!

@w3joe w3joe merged commit 7391c4f into Tracer-Cloud:main May 1, 2026
7 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

🌮 @7vignesh's PR: showed up unannounced, improved everything, left zero bugs. Just like a perfect taco. 🌮


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Centralised standardised alert format for opensre

4 participants