Incident window tool integration #954

Merged
hamzzaaamalik merged 4 commits into Tracer-Cloud:main from
hamzzaaamalik:incident-window-tool-integration
Apr 28, 2026

@hamzzaaamalik
Collaborator

PR 2 of the dynamic incident-window work. Stacked on PR 1 (foundation).

What was needed:
PR 1 added state.incident_window populated by extract_alert from the
alert's own timestamps. But no tool reads it yet. This PR proves the
end-to-end pattern with one tool: GitDeployTimelineTool.

What this PR does:

  • app/nodes/investigate/models.py: InvestigateInput gains a typed
    incident_window: dict | None. from_state populates it from
    state.incident_window with a defensive isinstance check.

  • app/nodes/plan_actions/plan_actions.py: after detect_sources runs,
    the wrapping step attaches available_sources["_meta"] = {
    "incident_window": ...} when input_data.incident_window is set.
    The "_meta" key is reserved for investigation-level context shared
    across tools: today it carries only the incident window; future PRs
    may add window history. The key is OMITTED when state has no window
    so the sources dict shape stays clean for tools that don't opt in.

  • app/tools/GitDeployTimelineTool/__init__.py:

    • extract_params reads sources.get("_meta", {}).get(
      "incident_window") and threads it as the shared_incident_window
      kwarg. Defensive isinstance guards against non-dict _meta or
      non-dict incident_window so a future upstream bug cannot crash
      the tool.
    • The function gains a new shared_incident_window: dict | None
      parameter. window_minutes_before_alert default changed from
      DEFAULT_WINDOW_MINUTES to None so the tool can distinguish
      "caller didn't override" from "caller wants 120 min".
    • Resolution priority (highest -> lowest):
      1. caller-explicit since AND until
      2. caller-explicit window_minutes_before_alert
      3. shared_incident_window from agent state (parsed via
        IncidentWindow.from_dict; None on bad shape -> falls
        through to default)
      4. DEFAULT_WINDOW_MINUTES (120 min)
    • Returned window dict gains a "source" field that reports
      "shared_incident_window" or "tool_default" so the diagnose
      narrative can explain where the window came from.
  • app/nodes/plan_actions/build_prompt.py: one-line update to the
    GitHub planning hint telling the agent the tool now auto-picks up
    the shared incident window when no explicit args are passed.
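
The resolution priority listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: resolve_window and _parse_shared are hypothetical names, the "since"/"until" keys of the shared window dict are assumed, and anchoring the minutes-based window on a `now` argument is also an assumption. The "source" values follow the description above.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

DEFAULT_WINDOW_MINUTES = 120  # value stated in the PR description


def _parse_shared(window: object) -> tuple[datetime, datetime] | None:
    """Hypothetical stand-in for IncidentWindow.from_dict: None on bad shape."""
    if not isinstance(window, dict):
        return None
    try:
        # Assumed dict layout: ISO-8601 strings under "since" and "until".
        since = datetime.fromisoformat(window["since"])
        until = datetime.fromisoformat(window["until"])
        if since >= until:
            return None
    except (KeyError, TypeError, ValueError):
        return None
    return since, until


def resolve_window(
    *,
    now: datetime,
    since: datetime | None = None,
    until: datetime | None = None,
    window_minutes_before_alert: int | None = None,
    shared_incident_window: dict | None = None,
) -> dict:
    # 1. Caller-explicit since AND until win over everything else.
    if since is not None and until is not None:
        return {"since": since, "until": until, "source": "tool_default"}
    # 2. Caller-explicit minutes. None means "caller didn't override".
    if window_minutes_before_alert is not None:
        start = now - timedelta(minutes=window_minutes_before_alert)
        return {"since": start, "until": now, "source": "tool_default"}
    # 3. Shared window from agent state; a bad shape falls through.
    parsed = _parse_shared(shared_incident_window)
    if parsed is not None:
        return {"since": parsed[0], "until": parsed[1],
                "source": "shared_incident_window"}
    # 4. Tool default.
    start = now - timedelta(minutes=DEFAULT_WINDOW_MINUTES)
    return {"since": start, "until": now, "source": "tool_default"}


now = datetime(2026, 4, 25, 12, 0, tzinfo=timezone.utc)
shared = {"since": "2026-04-25T10:00:00+00:00",
          "until": "2026-04-25T11:00:00+00:00"}
print(resolve_window(now=now, shared_incident_window=shared)["source"])
```

Defaulting window_minutes_before_alert to None rather than 120 is what lets step 2 tell an explicit "120" apart from no override at all.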

What's not in this PR (deferred to follow-ups):

  • No other tool migrated yet (Datadog Context, Grafana Logs, etc.).
    Each is a small ~50-line follow-up using the same pattern.
  • No adaptive expansion / contraction logic. PR 3 adds that.

Tests (14 new):
Tool integration:
- extract_params reads _meta.incident_window and passes it through
- extract_params handles missing _meta key (returns None)
- extract_params defensive against non-dict _meta (returns None)
- extract_params defensive against non-dict incident_window (returns None)
- run uses shared window when caller passes no overrides
- caller since/until overrides shared window (caller wins)
- caller window_minutes_before_alert overrides shared window
- falls back to DEFAULT_WINDOW_MINUTES when no shared window present
- falls back to default when shared window dict is malformed
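
The four extract_params cases above amount to a guarded two-level lookup. A minimal sketch, assuming a standalone helper (read_shared_incident_window is a hypothetical name; the real logic lives inside extract_params):

```python
from __future__ import annotations

from typing import Any


def read_shared_incident_window(sources: Any) -> dict | None:
    """Return sources["_meta"]["incident_window"] only if every level is a dict."""
    if not isinstance(sources, dict):
        return None
    meta = sources.get("_meta")
    if not isinstance(meta, dict):  # missing or non-dict _meta
        return None
    window = meta.get("incident_window")
    return window if isinstance(window, dict) else None


# Placeholder window payload; the real dict layout is not shown in this PR.
print(read_shared_incident_window({"_meta": {"incident_window": {"k": "v"}}}))
print(read_shared_incident_window({"_meta": "not-a-dict"}))
```

Checking `_meta` explicitly, rather than chaining `.get("_meta", {}).get(...)` alone, is what keeps a non-dict `_meta` value from raising AttributeError.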

Wiring contract:
- InvestigateInput surfaces incident_window from state
- InvestigateInput defaults to None when state has no window
- InvestigateInput rejects non-dict incident_window (defensive)
- _meta.incident_window attached when state has one
- _meta key omitted entirely when state has no window
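
The wiring contract above can be illustrated with a small sketch. InvestigateInputSketch and attach_meta are hypothetical stand-ins for InvestigateInput and the wrapping step in plan_actions; only the isinstance guard and the attach-or-omit behaviour are taken from the description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class InvestigateInputSketch:
    """Hypothetical stand-in for InvestigateInput (only the new field)."""

    incident_window: dict | None = None

    @classmethod
    def from_state(cls, state: dict[str, Any]) -> "InvestigateInputSketch":
        raw = state.get("incident_window")
        # Defensive guard: anything that is not a dict becomes None.
        return cls(incident_window=raw if isinstance(raw, dict) else None)


def attach_meta(
    available_sources: dict[str, Any], input_data: InvestigateInputSketch
) -> dict[str, Any]:
    """Attach "_meta" only when a window exists; otherwise leave the dict alone."""
    if input_data.incident_window is not None:
        available_sources["_meta"] = {
            "incident_window": input_data.incident_window
        }
    return available_sources


with_window = InvestigateInputSketch.from_state({"incident_window": {"k": "v"}})
print(attach_meta({"github": {}}, with_window))
print(attach_meta({"github": {}}, InvestigateInputSketch.from_state({})))
```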

@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR wires the incident_window value object (introduced in PR 1) end-to-end through the investigation pipeline so GitDeployTimelineTool auto-picks the alert's own timestamp window instead of always defaulting to "last 120 minutes from now." The _meta channel in available_sources carries the window dict to opt-in tools without polluting the source-specific dicts.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/diagnostic-accuracy suggestions with no runtime impact.

The implementation is clean and well-tested (14 new tests covering all priority branches, defensive paths, and backward-compat). The only finding is a P2: the source field in the window dict conflates "caller-explicit" and "tool-default" cases, which slightly misleads the diagnose narrative but has no effect on commit retrieval or window correctness. No P0/P1 issues found.

app/tools/GitDeployTimelineTool/__init__.py — the source label on line 303 conflates caller-explicit and tool-default windows.
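
One possible way to address the P2 finding is a three-valued label. This is only a sketch of the idea, not what the PR implements: window_source is a hypothetical helper and the "caller_explicit" value is an assumed name.

```python
def window_source(*, caller_explicit: bool, used_shared: bool) -> str:
    """Pick a label that keeps caller overrides distinct from the fallback."""
    if caller_explicit:
        return "caller_explicit"  # assumed label, not present in the PR
    if used_shared:
        return "shared_incident_window"
    return "tool_default"


print(window_source(caller_explicit=True, used_shared=False))
```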

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/incident_window.py | New value object and resolver with thorough validation: frozen dataclass, UTC normalisation, clock-skew guard, and defensive parsers for Alertmanager, PagerDuty, Datadog, and CloudWatch. No issues found. |
| app/tools/GitDeployTimelineTool/__init__.py | Adds shared-window resolution path. Minor: source field in window dict reports "tool_default" for both actual fallback and explicit caller timestamps, making the diagnose narrative field inaccurate for the caller-explicit case. |
| app/nodes/investigate/models.py | Adds incident_window field to InvestigateInput with defensive isinstance guard in from_state. Clean and correct. |
| app/nodes/plan_actions/plan_actions.py | Attaches _meta.incident_window to available_sources only when the window is present; omits key entirely otherwise. Well-placed and documented. |
| app/state/agent_state.py | Adds `incident_window: dict[str, Any] |
| app/nodes/extract_alert/extract_node.py | Passes original raw_alert (not the enriched dict) to resolve_incident_window, correctly preserving timestamps from JSON-string payloads. The inline comment explains the reasoning clearly. |
| tests/tools/test_git_deploy_timeline_tool.py | 14 new tests cover extract_params defensive paths, run-level priority overrides, and backward-compat fallback. Good coverage, all assertions are meaningful. |
| tests/app/test_incident_window.py | Comprehensive fuzz, edge-case, parser-variant, and serialisation round-trip tests for the new IncidentWindow type and resolver. Well-structured and thorough. |

Sequence Diagram

sequenceDiagram
    participant EA as extract_alert node
    participant AS as AgentState
    participant II as InvestigateInput
    participant PA as plan_actions node
    participant AVS as available_sources["_meta"]
    participant EP as GitDeployTimelineTool.extract_params
    participant RUN as get_git_deploy_timeline()

    EA->>EA: resolve_incident_window(raw_alert)
    EA->>AS: state["incident_window"] = window.to_dict()

    AS->>II: InvestigateInput.from_state(state)
    note over II: isinstance guard: non-dict → None

    II->>PA: input_data.incident_window

    alt incident_window is not None
        PA->>AVS: available_sources["_meta"] = {"incident_window": ...}
    else no window
        note over AVS: _meta key omitted entirely
    end

    AVS->>EP: sources["_meta"]["incident_window"]
    EP->>RUN: shared_incident_window=...

    alt no since/until/window_minutes override
        RUN->>RUN: IncidentWindow.from_dict(shared_incident_window)
        RUN->>RUN: use shared window → source="shared_incident_window"
    else caller provided overrides
        RUN->>RUN: ignore shared window → source="tool_default"
    end

Reviews (1): Last reviewed commit: "feat(incident-window): wire GitDeployTim..."

Comment thread app/tools/GitDeployTimelineTool/__init__.py Outdated
@hamzzaaamalik hamzzaaamalik merged commit a0fb857 into Tracer-Cloud:main Apr 28, 2026
7 checks passed