Adaptive incident window: expand-on-empty-deploy-timeline by hamzzaaamalik · Pull Request #1074 · Tracer-Cloud/opensre

hamzzaaamalik · 2026-04-29T08:33:52Z

PR 3 of the dynamic incident-window work. Builds on #951 and #954 (both merged).

What it does

Inserts a new adapt_window node on the diagnose → plan_actions loop-back edge. When the loop continues, one rule runs:

Widen the incident window when the deploy timeline came back empty for a shared-window query.

Specifically: if get_git_deploy_timeline ran this iteration with window.source == "shared_incident_window" and returned 0 commits, double the lookback (clamped at 7 days) and record the old window in state.incident_window_history. Next iteration sees the wider window.

Capped at 2 expansions per investigation (e.g. 120m → 240m → 480m, then stops). Caller-explicit windows are never overridden. Terminal routing bypasses the node entirely.
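The width progression above (doubling, the 7-day clamp, at most 2 expansions) is plain arithmetic. Below is a minimal sketch of it. `MAX_EXPANSIONS` and the 7-day clamp come from this PR. `expansion_schedule` is a hypothetical helper that models only the sequence of lookbacks, assuming every iteration finds an empty deploy timeline. It is not the PR's rule code.

```python
# Hypothetical sketch of the lookback schedule; only the constants mirror the PR.
MAX_LOOKBACK_MINUTES = 7 * 24 * 60  # 7-day clamp = 10080 minutes
MAX_EXPANSIONS = 2
EXPANSION_FACTOR = 2.0


def expansion_schedule(initial_minutes: int) -> list:
    """Lookbacks seen across iterations if every deploy timeline comes back empty."""
    widths = [initial_minutes]
    for _ in range(MAX_EXPANSIONS):
        nxt = min(int(widths[-1] * EXPANSION_FACTOR), MAX_LOOKBACK_MINUTES)
        if nxt == widths[-1]:  # already at the clamp: expansion would not widen
            break
        widths.append(nxt)
    return widths


print(expansion_schedule(120))   # [120, 240, 480]
print(expansion_schedule(6000))  # [6000, 10080]
```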

Files

New:

app/nodes/adapt_window/{rules.py, node.py, __init__.py}
tests/nodes/adapt_window/{test_rules.py, test_node.py}
tests/pipeline/test_graph_adapt_window_wiring.py

Modified:

app/incident_window.py — IncidentWindow.expanded() helper
app/state/agent_state.py — incident_window_history field (drift test green)
app/investigation_constants.py — MAX_EXPANSIONS = 2
app/nodes/__init__.py, app/pipeline/graph.py — wiring

Tests

52 new tests, 1993 passing in the broad sweep, 0 regressions.

Not in this PR

Window contraction on deploy anchor (PR 4), LLM-driven adaptation, migrating other tools to the shared window, diagnose narrative changes.

Foundation for PR 3 (adaptive window). The adapt_window node will call this when the deploy timeline came back empty for a shared-window query, to widen the lookback for the next investigation iteration.

Semantics:
- until is preserved (the anchor edge does not move)
- since moves earlier so that the new lookback is factor x the current lookback
- the new lookback is clamped to MAX_LOOKBACK_MINUTES (7d)
- source and confidence are preserved
- factor must be > 1.0 (expansion only; contraction is a separate, deferred operation with different semantics)

The fact of expansion is recorded by the caller in state.incident_window_history (added in the next commit), not on the window object itself. This keeps the value object minimal and the audit trail in one place.

Tests cover: default factor=2.0, custom factor, MAX_LOOKBACK_MINUTES clamp, already-at-cap returns the same width, returns a new instance (frozen dataclass not mutated), preserves the until anchor, preserves source and confidence, factor=1.0 and factor<1.0 rejected.
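Below is a minimal sketch of the expanded() semantics listed above. It assumes a frozen dataclass with `since`/`until` datetimes. The field types (for example `confidence` as a float) and the `lookback_minutes` property are assumptions made for illustration. This is not the actual `app/incident_window.py` implementation.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

MAX_LOOKBACK_MINUTES = 7 * 24 * 60  # 7-day clamp


@dataclass(frozen=True)
class IncidentWindow:
    """Hypothetical stand-in for the PR's IncidentWindow value object."""

    since: datetime
    until: datetime
    source: str
    confidence: float  # assumed type; the real field may differ

    @property
    def lookback_minutes(self) -> float:
        return (self.until - self.since).total_seconds() / 60

    def expanded(self, factor: float = 2.0) -> "IncidentWindow":
        """Return a NEW window whose lookback is factor x the current one."""
        if factor <= 1.0:
            raise ValueError("factor must be > 1.0 (expansion only)")
        new_minutes = min(self.lookback_minutes * factor, MAX_LOOKBACK_MINUTES)
        # Only `since` moves; `until`, `source` and `confidence` are copied as-is.
        return replace(self, since=self.until - timedelta(minutes=new_minutes))


until = datetime(2026, 4, 29, 12, 0, tzinfo=timezone.utc)
w = IncidentWindow(until - timedelta(minutes=120), until, "shared_incident_window", 0.8)
wider = w.expanded()
print(wider.lookback_minutes)  # 240.0
```

`dataclasses.replace` builds a new instance, so the original frozen window is never mutated. This matches the "returns new instance" test described above.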

Append-only audit trail of windows replaced by the (incoming) adapt_window node. Each entry is the OLD window dict at the moment of replacement, plus ``replaced_at`` (ISO-8601) and ``replaced_reason`` (e.g. "expanded:empty_deploy_timeline"). The cap on entries lives in the adapt_window rule layer (MAX_EXPANSIONS) — the field itself is unbounded so future rules can record contractions, anchor refinements, etc. None until the first replacement. Diagnose narratives may cite this in a future PR to explain "we tried 120m, found no deploys, widened to 240m". Added in both AgentState (TypedDict) and AgentStateModel (Pydantic) so the existing drift test in test_agent_state_sync.py stays green. No node writes to this field yet — wiring lands in commit 4.
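Below is a sketch of how one history entry could be built, assuming windows are serialized as plain dicts. `append_history` and the `since`/`until` keys are hypothetical. The `replaced_at`/`replaced_reason` keys and the injectable clock mirror the description in this PR. The helper returns a fresh full list rather than appending in place. The field uses LangGraph's default (replace) reducer, so the node's delta has to carry the whole accumulated history.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional


def append_history(
    history: Optional[list],
    old_window: dict,
    reason: str,
    now_fn: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> list:
    """Hypothetical helper: return history plus one audit entry for old_window."""
    entry: dict[str, Any] = {
        **old_window,
        "replaced_at": now_fn().isoformat(),  # ISO-8601 timestamp
        "replaced_reason": reason,
    }
    # New list, never an in-place append: the state field is replaced wholesale.
    return [*(history or []), entry]


def fixed_clock() -> datetime:
    return datetime(2026, 4, 29, 9, 0, tzinfo=timezone.utc)


old = {"since": "2026-04-29T07:00:00+00:00", "until": "2026-04-29T09:00:00+00:00"}
hist = append_history(None, old, "expanded:empty_deploy_timeline", fixed_clock)
print(hist[0]["replaced_at"])  # 2026-04-29T09:00:00+00:00
```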

Pure decision logic (no LangChain imports, so it is testable in isolation) for the upcoming adapt_window node. The rule widens state.incident_window when the GitDeployTimelineTool came back empty for a shared-window query, so the next investigation iteration looks further back.

Guard chain (the rule no-ops unless every guard passes):
1. state.incident_window is well-formed (IncidentWindow.from_dict).
2. state.incident_window_history has fewer than MAX_EXPANSIONS (= 2) entries, bounded so a pathological loop cannot run away.
3. STALE-SIGNAL GUARD: the deploy timeline action must be in the most recent state.executed_hypotheses[-1].actions. Without this, evidence from iteration 1 would re-fire the rule at the end of iteration 2 even when the tool didn't run again (caught in plan review).
4. evidence.git_deploy_timeline_window.source == "shared_incident_window". caller_explicit, tool_default and unset (== {}) all fall through, so a caller's explicit window is NEVER overridden.
5. evidence.git_deploy_timeline_count == 0.
6. Expansion would actually widen the window, i.e. it is not already at MAX_LOOKBACK_MINUTES (where IncidentWindow.expanded() would return the same width).

When fired, the rule emits a state delta with the new window plus the OLD window appended to history with replaced_at + replaced_reason. Tests inject now_fn for deterministic ISO timestamps.

30 tests cover the happy path, every guard short-circuit, the stale-signal guard specifically, multi-iteration accumulation, the MAX_EXPANSIONS cap, and defensive shape handling (malformed evidence, malformed history, malformed executed_hypotheses, immutable state). The node entry point lands in the next commit; this commit only adds the pure logic and the constant.
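The guard chain above can be sketched as one pure function over a plain dict state. This is a simplified illustration, not the PR's `rules.py`. It represents the window as a dict with a hypothetical `lookback_minutes` key instead of an IncidentWindow. It assumes the action name is `get_git_deploy_timeline` and omits `replaced_at` for brevity. Each early `return {}` corresponds to one numbered guard.

```python
from typing import Any, Mapping

MAX_EXPANSIONS = 2
MAX_LOOKBACK_MINUTES = 7 * 24 * 60
DEPLOY_ACTION = "get_git_deploy_timeline"  # assumed action name


def adapt_incident_window_sketch(state: Mapping[str, Any]) -> dict:
    """Simplified guard chain; returns {} (no-op) unless every guard passes."""
    window = state.get("incident_window")
    if not isinstance(window, dict) or "lookback_minutes" not in window:
        return {}  # 1. window must be well-formed
    history = state.get("incident_window_history") or []
    if len(history) >= MAX_EXPANSIONS:
        return {}  # 2. expansion budget exhausted
    executed = state.get("executed_hypotheses") or []
    latest = executed[-1] if executed and isinstance(executed[-1], dict) else {}
    if DEPLOY_ACTION not in (latest.get("actions") or []):
        return {}  # 3. stale-signal guard: the tool must have run this iteration
    evidence = state.get("evidence") or {}
    tool_window = evidence.get("git_deploy_timeline_window") or {}
    if tool_window.get("source") != "shared_incident_window":
        return {}  # 4. caller_explicit / tool_default / unset are never overridden
    if evidence.get("git_deploy_timeline_count") != 0:
        return {}  # 5. only an empty deploy timeline triggers widening
    old_minutes = window["lookback_minutes"]
    new_minutes = min(old_minutes * 2, MAX_LOOKBACK_MINUTES)
    if new_minutes <= old_minutes:
        return {}  # 6. already at the 7-day clamp
    old_entry = {**window, "replaced_reason": "expanded:empty_deploy_timeline"}
    return {
        "incident_window": {**window, "lookback_minutes": new_minutes},
        "incident_window_history": [*history, old_entry],
    }
```

The function only reads `state` and builds new dicts, so callers can pass their state without worrying about mutation.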

Thin @Traceable wrapper around adapt_incident_window from rules.py. It mirrors the existing convention: a node_X function in node.py, the package __init__.py re-exports the entry point, and top-level app.nodes also re-exports it for graph-builder convenience.

The node:
- calls the pure rule with a copy of state (no mutation)
- returns {} on no-op (LangGraph's "no state change" signal)
- returns the rule's state delta when an expansion fires
- logs at INFO + debug_print when an expansion happens, so operators can audit which run widened the window and why

Tests cover: state delta passthrough, the no-op contract, a completely empty state (early pipeline), state immutability, INFO log on fire, no INFO log on no-op, and the RunnableConfig kwarg being accepted and ignored. The graph wiring lands in the next commit; nothing currently calls this node, so adding it does not affect any existing investigation.
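Below is a minimal sketch of the wrapper contract described above, written without LangGraph or tracing imports so it runs standalone. The factory `make_adapt_window_node` is hypothetical. The real node calls `adapt_incident_window` directly and is decorated for tracing. The sketch is only meant to show the copy / no-op / log-on-fire behaviour.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def make_adapt_window_node(rule: Callable[[dict], Any]) -> Callable[..., dict]:
    """Hypothetical factory wrapping a pure rule as a LangGraph-style node."""

    def node_adapt_window(state: Mapping[str, Any], config: Optional[Any] = None) -> dict:
        del config  # accepted for signature compatibility, otherwise ignored
        # Shallow copy: top-level keys the rule sets never leak back to the caller.
        delta = rule(dict(state))
        if not delta:
            return {}  # no-op: an empty update means "no state change"
        logger.info("adapt_window fired: new incident_window=%s", delta.get("incident_window"))
        return delta

    return node_adapt_window
```

Injecting the rule keeps the wrapper testable with trivial fakes. For example, a rule that returns `{}` exercises the no-op path, and one that returns a delta exercises the log-on-fire path.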

Wire node_adapt_window into the LangGraph between diagnose and the loop-back to plan_actions. Terminal routing decisions (opensre_eval, publish) bypass it entirely; adaptation only runs when the investigation is continuing.

Topology change:

    before: diagnose --[conditional "investigate"]--> plan_actions
    after:  diagnose --[conditional "investigate"]--> adapt_window
                                                           |
                                                           v
                                                      plan_actions

The string "investigate" returned by route_investigation_loop is preserved (it means "loop again", not "go to the investigate node"); only the destination node mapping changes. No routing.py change is needed.

6 graph-introspection smoke tests assert: the node is registered, the loop edge is present, the unconditional adapt_window->plan_actions edge is present, terminal paths bypass adapt_window, no direct diagnose->plan_actions edge remains (regression guard), and pre-loop edges are unchanged.
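In graph.py this change is presumably a different path map handed to LangGraph's `add_conditional_edges` for the diagnose node, plus an `add_edge("adapt_window", "plan_actions")`. That code is not shown here, so the sketch below models only the path-map swap in plain Python. It assumes the terminal route keys equal the node names. Only the one unconditional edge relevant to this change is included.

```python
# Toy, dependency-free model of the diagnose routing described above.
DIAGNOSE_PATH_MAP_BEFORE = {
    "investigate": "plan_actions",
    "opensre_eval": "opensre_eval",
    "publish": "publish",
}
DIAGNOSE_PATH_MAP_AFTER = {
    "investigate": "adapt_window",  # route key unchanged, destination changed
    "opensre_eval": "opensre_eval",
    "publish": "publish",
}
# Only the edge relevant here; the real graph has many more.
UNCONDITIONAL_EDGES = {"adapt_window": "plan_actions"}


def walk(path_map: dict, route: str) -> list:
    """Nodes reached from diagnose for a routing key, following unconditional edges."""
    node = path_map[route]
    visited = [node]
    while node in UNCONDITIONAL_EDGES:
        node = UNCONDITIONAL_EDGES[node]
        visited.append(node)
    return visited


print(walk(DIAGNOSE_PATH_MAP_BEFORE, "investigate"))  # ['plan_actions']
print(walk(DIAGNOSE_PATH_MAP_AFTER, "investigate"))   # ['adapt_window', 'plan_actions']
print(walk(DIAGNOSE_PATH_MAP_AFTER, "publish"))       # ['publish']
```

The assertions a wiring test would make follow directly from this model. "investigate" now reaches plan_actions only via adapt_window, no route maps straight to plan_actions, and the terminal routes never visit adapt_window.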

greptile-apps · 2026-04-29T08:43:40Z

Greptile Summary

This PR inserts a new adapt_window node on the diagnose → plan_actions loop-back edge. When looping continues, the node applies a pure rule that doubles the incident lookback window (capped at 7 days, max 2 expansions) when get_git_deploy_timeline ran in the most recent iteration against a shared window and returned 0 commits. Terminal paths bypass the node entirely.

The implementation is well-structured: the 8-guard rule chain is side-effect-free, the stale-signal guard (executed_hypotheses[-1].actions) correctly prevents re-firing across iterations when the deploy tool didn't run again, and the full-list replacement approach is correct for LangGraph's default reducer on incident_window_history.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/test quality nits with no impact on runtime behaviour.

Logic is sound, guard chain is exhaustive, state mutations are non-destructive, and 52 tests cover happy-path, each guard independently, multi-expansion, and defensive shapes. The two P2 findings (stale docstring, dead parametrized test cases) have zero effect on correctness or reliability.

No files require special attention.

Important Files Changed

Filename	Overview
app/nodes/adapt_window/rules.py	Pure rule function with a well-structured 8-guard chain. Stale-signal guard correctly prevents re-fire across iterations. History rebuilding returns the full list for LangGraph's replace reducer.
app/nodes/adapt_window/node.py	Thin LangGraph wrapper around the pure rule; logs INFO on expansion and returns the rule's delta directly. Decorated with @Traceable; the dict(state) shallow copy is safe since the rule is read-only.
app/nodes/adapt_window/__init__.py	Package init; contains a stale docstring saying node.py is 'added in the next commit' when it is already present in this PR.
app/pipeline/graph.py	Correctly re-routes the 'investigate' conditional edge through adapt_window before plan_actions. Terminal paths bypass the node. Wiring tests confirm the contract.
app/state/agent_state.py	Adds incident_window_history to both AgentState and AgentStateModel, keeping them in sync. Default-replace reducer is appropriate since the rule returns the full accumulated list.
app/incident_window.py	Adds expanded() method — correctly widens lookback by factor, clamps to MAX_LOOKBACK_MINUTES, preserves until/source/confidence, raises on factor ≤ 1.0.
tests/nodes/adapt_window/test_rules.py	Thorough guard coverage, stale-signal class, multi-expansion tests, and defensive shape tests. One parametrized test has dead cases for factor≠EXPANSION_FACTOR that assert nothing.
tests/nodes/adapt_window/test_node.py	Good coverage of the wrapper contract: delta pass-through, no-op path, empty-state safety, non-mutation, INFO-log assertions, config kwarg acceptance.
tests/pipeline/test_graph_adapt_window_wiring.py	Graph-introspection tests verify the conditional edge path-map and unconditional adapt_window → plan_actions edge. Includes regression guard against direct diagnose → plan_actions edge.

Sequence Diagram

sequenceDiagram
    participant D as diagnose
    participant R as route_investigation_loop
    participant AW as adapt_window
    participant PA as plan_actions
    participant E as opensre_eval / publish

    D->>R: conditional edge
    alt route == investigate
        R->>AW: loop-back path
        AW->>AW: adapt_incident_window(state)
        note over AW: guard chain: window present? history < MAX_EXPANSIONS? deploy timeline ran this iter? evidence present? source == shared_incident_window? commits_count == 0? expansion actually widens?
        alt all guards pass
            AW-->>PA: {incident_window: widened, incident_window_history: [...old]}
        else any guard fails
            AW-->>PA: {} (no state change)
        end
    else route == opensre_eval or publish
        R->>E: terminal path (bypasses adapt_window)
    end

Comments Outside Diff (1)

tests/nodes/adapt_window/test_rules.py, lines 970-982

Dead parametrized test cases assert nothing

For factor=3.0 and factor=4.0 the if factor == EXPANSION_FACTOR guard is always False, so those two parametrized cases execute zero assertions and pass trivially. They inflate the test count without providing any coverage.

Since the actual intent is to pin the constant, a single direct assertion would be cleaner:
```python
def test_expansion_factor_is_two() -> None:
    """Pin EXPANSION_FACTOR so changes are caught immediately."""
    from app.nodes.adapt_window.rules import EXPANSION_FACTOR
    assert EXPANSION_FACTOR == 2.0
```

Reviews (1): Last reviewed commit: "feat(pipeline): route investigate loop t..."


github-actions · 2026-04-29T09:40:01Z

🧑‍💻 @hamzzaaamalik has entered the contributor hall of fame. Merged. Done. Shipped. Go touch grass (then come back with another PR). 🌱

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

hamzzaaamalik added 5 commits April 29, 2026 13:24

greptile-apps Bot reviewed Apr 29, 2026


Comment thread app/nodes/adapt_window/__init__.py

docs(adapt-window): drop stale "next commit" reference in package docstring (0b7c6e6)

hamzzaaamalik merged commit 142a7b8 into Tracer-Cloud:main Apr 29, 2026
7 checks passed

hamzzaaamalik merged 6 commits into Tracer-Cloud:main from hamzzaaamalik:incident-window-adaptive-expansion
