feat(alerts): normalize incoming payloads to OpenSRE canonical format #977
- add shared alert normalizer for vendor-specific schemas
- derive consistent labels/annotations and process metadata (process_name, cmdline, pid)
- apply normalization in make_initial_state so all ingestion paths are normalized
- add state tests covering datadog tags and grafana labels/annotations

Refs Tracer-Cloud#822
Greptile Summary
This PR introduces a centralized
Confidence Score: 4/5
Safe to merge after fixing the shared-dict alias in normalize.py to prevent silent state corruption in downstream nodes.
One P1 defect (aliased mutable dicts between commonLabels and canonical_alert.labels) is a present correctness issue: any node that enriches labels would corrupt the other field. P2s are edge-case quality items. Core logic, factory integration, and tests are otherwise solid.
app/alerts/normalize.py: shared dict alias (P1) and two P2 edge-case gaps.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Raw Alert Payload] --> B{is dict?}
B -- No / string --> Z[Pass through unchanged]
B -- Yes --> C[strip_scoring_points if needed]
C --> D[normalize_alert_payload]
D --> E[_as_mapping commonLabels / labels]
D --> F[_parse_tags tags]
E & F --> G[merged labels dict]
D --> H[_as_mapping commonAnnotations / annotations]
G & H --> I[_first_present: process_name / cmdline / pid]
I --> J[Backfill top-level process fields]
G & H & J --> K[Build canonical_alert envelope\nschema: opensre.alert.v1]
K --> L[normalized dict with\ncommonLabels + commonAnnotations\n+ process fields + canonical_alert]
L --> M[AgentStateModel.model_validate\nraw_alert = normalized]
Reviews (1): Last reviewed commit: "feat(alerts): normalize incoming payload..."
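The flow in the chart above can be approximated in a few dozen lines. This is a minimal sketch, not the code in app/alerts/normalize.py: the helper names come from the flowchart, but their bodies, the "key:value" tag format, the precedence of labels over tags, and the field lookup order are all assumptions made for illustration. The strip_scoring_points and AgentStateModel steps are left out, and the pid is not coerced here.

```python
from typing import Any, Mapping


def _as_mapping(value: Any) -> dict:
    """Return a shallow dict copy of a mapping, or {} for anything else (assumed behavior)."""
    return dict(value) if isinstance(value, Mapping) else {}


def _parse_tags(tags: Any) -> dict:
    """Turn ["key:value", ...] into a dict; the tag format is an assumption."""
    parsed = {}
    if isinstance(tags, list):
        for tag in tags:
            if isinstance(tag, str) and ":" in tag:
                key, _, val = tag.partition(":")
                parsed[key.strip()] = val.strip()
    return parsed


def _pick_mapping(raw: dict, primary: str, fallback: str) -> dict:
    """Use the primary key unless it is missing or None, then the fallback."""
    value = raw.get(primary)
    return _as_mapping(value if value is not None else raw.get(fallback))


def _first_present(key: str, *sources: Mapping) -> Any:
    """Return the first non-empty value for key across the given sources."""
    for source in sources:
        value = source.get(key)
        if value not in (None, ""):
            return value
    return None


def normalize_alert_payload(raw: Any) -> Any:
    if not isinstance(raw, dict):
        return raw  # strings and other non-dict payloads pass through unchanged

    normalized = dict(raw)
    # Assumption: explicit labels win over values parsed from tags.
    labels = {**_parse_tags(raw.get("tags")),
              **_pick_mapping(raw, "commonLabels", "labels")}
    annotations = _pick_mapping(raw, "commonAnnotations", "annotations")

    process = {
        field: _first_present(field, raw, labels, annotations)
        for field in ("process_name", "cmdline", "pid")
    }
    # Backfill top-level process fields only where the payload had none.
    for field, value in process.items():
        if value is not None and normalized.get(field) is None:
            normalized[field] = value

    normalized["commonLabels"] = labels
    normalized["commonAnnotations"] = annotations
    normalized["canonical_alert"] = {
        "schema": "opensre.alert.v1",
        "labels": dict(labels),            # copies, so the two views stay independent
        "annotations": dict(annotations),
        "process": process,
    }
    return normalized
```

With this sketch, a payload carrying only tags and labels comes back with merged commonLabels, backfilled process fields, and a canonical_alert envelope, while a plain string is returned untouched.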
normalized["commonLabels"] = labels
normalized["commonAnnotations"] = annotations
Shared mutable dict aliased across two keys
normalized["commonLabels"] and canonical_alert["labels"] (line 173) are assigned the same labels dict object. Any downstream node that appends or modifies entries in canonical_alert["labels"] will silently mutate commonLabels (and vice versa). The same alias exists for annotations / commonAnnotations. Since LangGraph nodes frequently enrich labels in place, this will cause hard-to-trace state corruption.
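The aliasing described in this comment is easy to reproduce outside the PR. The snippet below is an illustrative sketch using plain dicts with made-up label values, not the project's state objects: it contrasts two keys sharing one dict object with the same structure built from a shallow copy.

```python
# Aliased: both keys point at the same dict object.
labels = {"severity": "critical"}
aliased = {"commonLabels": labels, "canonical_alert": {"labels": labels}}
aliased["canonical_alert"]["labels"]["team"] = "sre"   # "enrich" one view in place
aliased_leaked = "team" in aliased["commonLabels"]     # the other view changed too

# Copied: dict(...) makes a new top-level dict, so the views are independent.
labels = {"severity": "critical"}
copied = {"commonLabels": labels, "canonical_alert": {"labels": dict(labels)}}
copied["canonical_alert"]["labels"]["team"] = "sre"
copied_leaked = "team" in copied["commonLabels"]

print(aliased_leaked, copied_leaked)
```

Note that dict(...) is a shallow copy: it is enough when label values are strings or numbers, but nested mutable values would still be shared between the two views.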
In the canonical_alert dict construction, pass copies:
    "labels": dict(labels),
    "annotations": dict(annotations),

def _coerce_pid(value: Any) -> int | None:
    if value is None:
        return None
    if isinstance(value, int):
        return value if value >= 0 else None
    text = _to_text(value)
    if text is None:
        return None
    try:
        pid = int(text)
    except ValueError:
        return None
    return pid if pid >= 0 else None
_coerce_pid silently drops float PID values
If a JSON alert payload serializes the PID as a float (e.g., {"pid": 4242.0}), isinstance(value, int) returns False in Python 3, _to_text(4242.0) produces "4242.0", and int("4242.0") raises ValueError, causing None to be returned. The PID is silently lost. Adding a float guard before the string path covers this case.
- def _coerce_pid(value: Any) -> int | None:
-     if value is None:
-         return None
-     if isinstance(value, int):
-         return value if value >= 0 else None
-     text = _to_text(value)
-     if text is None:
-         return None
-     try:
-         pid = int(text)
-     except ValueError:
-         return None
-     return pid if pid >= 0 else None
+ def _coerce_pid(value: Any) -> int | None:
+     if value is None:
+         return None
+     if isinstance(value, int):
+         return value if value >= 0 else None
+     if isinstance(value, float) and value.is_integer():
+         pid = int(value)
+         return pid if pid >= 0 else None
+     text = _to_text(value)
+     if text is None:
+         return None
+     try:
+         pid = int(text)
+     except ValueError:
+         return None
+     return pid if pid >= 0 else None
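The failure mode behind this suggestion can be checked with the standard library alone. The sketch below does not call the PR's _to_text helper; it assumes that helper behaves like str(value) for floats, as the comment describes, and uses the example payload from the comment.

```python
import json

# A JSON number written with a decimal point is decoded as a Python float.
parsed_pid = json.loads('{"pid": 4242.0}')["pid"]

# String path: str(4242.0) is "4242.0", which int() refuses to parse.
try:
    int(str(parsed_pid))
    string_path_failed = False
except ValueError:
    string_path_failed = True

# Float guard: accept only floats that hold a whole number.
guarded = int(parsed_pid) if parsed_pid.is_integer() else None

# Fractional values, NaN and infinity are not whole numbers, so the guard skips them.
rejected = [x.is_integer() for x in (4242.5, float("nan"), float("inf"))]

print(type(parsed_pid).__name__, string_path_failed, guarded, rejected)
```

Because is_integer() returns False for NaN and infinity, the guarded int(value) call cannot raise on those inputs; they fall through to the string path and end up as None.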
labels = _as_mapping(normalized.get("commonLabels"))
if not labels:
    labels = _as_mapping(normalized.get("labels"))
Empty commonLabels: {} causes fallback to labels field unexpectedly
_as_mapping({}) returns {}, which is falsy, so the if not labels branch is taken and the labels field is used instead. A payload that explicitly supplies commonLabels: {} alongside a non-empty labels dict will have its commonLabels silently replaced. Consider checking whether the raw value is None rather than testing the truthiness of the parsed result:
- labels = _as_mapping(normalized.get("commonLabels"))
- if not labels:
-     labels = _as_mapping(normalized.get("labels"))
+ _raw_common_labels = normalized.get("commonLabels")
+ labels = _as_mapping(_raw_common_labels) if _raw_common_labels is not None else _as_mapping(normalized.get("labels"))
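The difference between the two checks shows up only when commonLabels is present but empty. The following sketch compares them side by side; _as_mapping here is a hypothetical stand-in that returns a dict copy for mappings and {} otherwise, which may differ from the real helper, and the payload values are invented.

```python
from typing import Any, Mapping


def _as_mapping(value: Any) -> dict:
    # Hypothetical stand-in for the PR's helper.
    return dict(value) if isinstance(value, Mapping) else {}


def pick_labels_truthy(payload: dict) -> dict:
    """Original logic: fall back whenever the parsed commonLabels is empty."""
    labels = _as_mapping(payload.get("commonLabels"))
    if not labels:
        labels = _as_mapping(payload.get("labels"))
    return labels


def pick_labels_none_check(payload: dict) -> dict:
    """Suggested logic: fall back only when commonLabels is missing or None."""
    raw_common = payload.get("commonLabels")
    if raw_common is not None:
        return _as_mapping(raw_common)
    return _as_mapping(payload.get("labels"))


explicit_empty = {"commonLabels": {}, "labels": {"instance": "web-1"}}
missing = {"labels": {"instance": "web-1"}}

print(pick_labels_truthy(explicit_empty))      # labels leak in
print(pick_labels_none_check(explicit_empty))  # explicit empty dict is respected
print(pick_labels_none_check(missing))         # fallback still works when the key is absent
```

Whether an explicit empty commonLabels should suppress the fallback is a product decision; the sketch only shows that the two checks disagree on that input and agree everywhere else.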
@7vignesh pls fix reviews

@muddlebee ready for review!

lets wait for feedback from @w3joe :)

looks good, thanks!

Fixes #822
Describe the changes you have made in this PR -
- […] commonLabels and commonAnnotations.
- […] process_name, cmdline, and pid.
- […] (alert_name, pipeline_name, severity, alert_source, labels, annotations, process).

Demo/Screenshot for feature changes and bug fixes -
Validation summary:
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
Checklist before requesting a review