Product Experience: OpenSRE as an Interactive AI SRE Terminal
Today, OpenSRE is a one-shot CLI: you run opensre investigate -i alert.json, it executes the pipeline (extract → plan → investigate → diagnose → publish), prints results, and exits.
The proposal is to transform this into a persistent, conversational SRE session, the same product experience Claude Code or OpenClaw brought to development, but for incident response.
Background
Claude Code has set the new standard for how terminal applications should behave:
- Interactive: it gives direct feedback and lets the user add input during execution
- This is also what product excellence looks like today
Product Requirements
- Stream everything. Characters from the RCA reports appear as they're generated. Silence longer than 200ms gets a spinner.
- Zero-exit architecture, meaning that the terminal app never closes. Every command returns to the prompt. Delete: exit codes, one-shot flags, --output files.
- Identity up top, always. A banner with version, model, and status is visible on launch. The user knows exactly what they're talking to.
- Interrupt, don't queue. User input is always accepted, even mid-operation.
- Never ask for more than a single [Y/n] when permission is needed. Provide a "trust mode" toggle. Delete: multi-step wizards, --dry-run as default.
- Show the work. The user sees what's happening (files changed, commands run) but never internal state, debug logs, or stack traces.
- The session remembers everything said. Previous answers inform future ones. No re-explaining. Delete: stateless invocations, repeated boilerplate.
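The streaming requirement (tokens appear as generated, a spinner after 200ms of silence) could be sketched roughly as below. This is a minimal illustration using only asyncio, not OpenSRE's actual implementation: stream_with_spinner, write, and show_spinner are hypothetical names, and in the proposed stack the two callbacks would presumably be backed by Rich.

```python
import asyncio
from typing import AsyncIterator, Callable

SPINNER_DELAY_S = 0.2  # the 200 ms silence threshold from the requirements
_DONE = object()       # sentinel marking the end of the token stream


async def _next_or_done(iterator: AsyncIterator[str]) -> object:
    """Return the next token, or _DONE once the stream is exhausted."""
    try:
        return await iterator.__anext__()
    except StopAsyncIteration:
        return _DONE


async def stream_with_spinner(
    tokens: AsyncIterator[str],
    write: Callable[[str], None],          # hypothetical: prints one token
    show_spinner: Callable[[bool], None],  # hypothetical: toggles a spinner
    delay: float = SPINNER_DELAY_S,
) -> None:
    """Write tokens as they arrive; show a spinner while the stream is silent."""
    iterator = tokens.__aiter__()
    while True:
        task = asyncio.ensure_future(_next_or_done(iterator))
        # asyncio.wait does not cancel the task on timeout, so the read continues.
        done, _ = await asyncio.wait({task}, timeout=delay)
        if not done:
            show_spinner(True)
            try:
                await task
            finally:
                show_spinner(False)
        item = task.result()
        if item is _DONE:
            return
        write(item)
```

The spinner is driven by gaps between tokens rather than by the operation as a whole, so a fast stream never flickers a spinner and a stalled one never sits silent.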
What this is NOT
- Not a full chat agent (we already have chat mode). This is investigation-first with conversational refinement.
- Not a dashboard. It's a terminal. Power users live here.
- Not autonomous-by-default. The human is in the loop, steering the investigation, adding context the agent can't see (like "we just deployed").
Migration Plan
Phase 1: Decouple
- Unify runners.py and graph_pipeline.py into one execution path
- Kill the argparse shim and route everything through Click
- Extract a Renderer protocol so nodes emit events and renderers are pluggable
- Decompose node_publish_findings into pure functions + side-effect handlers
- Wrap AgentState in a SessionState that persists across investigations
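One way the Renderer protocol and SessionState from Phase 1 might fit together is sketched below. Everything apart from the names Renderer, SessionState, and AgentState is an assumption made for illustration: the Event shape, the event kinds, PlainRenderer, node_extract, and the use of a plain dict as a stand-in for AgentState are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class Event:
    """Something a graph node wants shown; nodes never print directly."""
    kind: str      # hypothetical kinds: "node_started", "finding"
    payload: str


class Renderer(Protocol):
    """Any object with a matching render() method is a valid renderer."""
    def render(self, event: Event) -> None: ...


class PlainRenderer:
    """Collects lines in memory; a Rich-based renderer would expose the same method."""
    def __init__(self) -> None:
        self.lines: list[str] = []

    def render(self, event: Event) -> None:
        self.lines.append(f"[{event.kind}] {event.payload}")


@dataclass
class SessionState:
    """Outlives a single investigation. A dict stands in for AgentState here."""
    infra_context: dict[str, Any] = field(default_factory=dict)
    investigations: list[dict[str, Any]] = field(default_factory=list)

    def start_investigation(self, alert: dict[str, Any]) -> dict[str, Any]:
        # Each investigation gets a fresh agent state seeded with shared context.
        agent_state = {
            "alert": alert,
            "context": dict(self.infra_context),
            "evidence": [],
        }
        self.investigations.append(agent_state)
        return agent_state


def node_extract(state: dict[str, Any], renderer: Renderer) -> None:
    """Hypothetical node: emits events instead of printing."""
    renderer.render(Event("node_started", "extract"))
    state["evidence"].append(f"service={state['alert'].get('service', 'unknown')}")
    renderer.render(Event("finding", state["evidence"][-1]))
```

Because nodes only see the Renderer protocol, the same pipeline can feed a one-shot plain-text renderer and an interactive one without changes to node code.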
Phase 2: Rebuild
- REPL loop: opensre with no args enters a persistent session
- Command router: classify input as new alert, follow-up, slash command, or tool call
- Streaming: graph nodes yield events, tool results appear individually, LLM streams tokens
- Interrupt/redirect: Ctrl+C cancels in-flight tools but keeps evidence, user input pivots the plan
- Approval gates: auto-run read-only tools, [Y/n] for destructive actions, /trust skips all
- Context accumulation: reuse infra context across alerts, accept corrections mid-session
- Stack: prompt_toolkit + Rich + asyncio
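The command router and approval gates above might look something like the following sketch. The classification rules are assumptions chosen for illustration (a leading "/" for slash commands, a leading "!" for direct tool calls, a JSON object for a new alert); the proposal does not specify them, and a real router may well use the LLM for ambiguous input. InputKind, classify, and needs_confirmation are hypothetical names.

```python
import json
from enum import Enum


class InputKind(Enum):
    NEW_ALERT = "new_alert"
    FOLLOW_UP = "follow_up"
    SLASH_COMMAND = "slash_command"
    TOOL_CALL = "tool_call"


def classify(line: str) -> InputKind:
    """Classify one line of user input with cheap, assumed heuristics."""
    text = line.strip()
    if text.startswith("/"):
        return InputKind.SLASH_COMMAND
    if text.startswith("!"):  # assumed prefix for direct tool calls
        return InputKind.TOOL_CALL
    if text.startswith("{"):
        try:
            if isinstance(json.loads(text), dict):
                return InputKind.NEW_ALERT  # pasted alert payload
        except json.JSONDecodeError:
            pass  # not valid JSON: treat it as conversation
    return InputKind.FOLLOW_UP


def needs_confirmation(read_only: bool, trust_mode: bool) -> bool:
    """Read-only tools auto-run; destructive ones prompt unless /trust is on."""
    return not read_only and not trust_mode
```

For example, classify("/trust") yields InputKind.SLASH_COMMAND, while free text such as "we just deployed" falls through to InputKind.FOLLOW_UP and is treated as added context for the current investigation.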