
feat: interactive REPL — the first step toward a Claude-Code-style SRE terminal#591

Merged
VaibhavUpreti merged 52 commits into Tracer-Cloud:main from yashksaini-coder:feat/repl-terminal
Apr 29, 2026

Conversation

@yashksaini-coder
Collaborator

What this does

Opens an interactive, zero-exit REPL when you run opensre with no arguments in a terminal. Each input is classified as a slash command, a new incident description, or a follow-up question about the previous investigation, and the existing LangGraph pipeline streams live into the session. Walking incidents through a single persistent terminal is the product direction in #243 — this PR ships the foundation.

Showcase

  • Run opensre — a compact banner with the active provider, model, and quick hints appears.
  • Type /help, /status, /reset, /trust, /clear, /exit, /quit for session controls. Bare words like help or exit are accepted too and routed to their slash form.
  • Type an incident description in plain English and the investigation streams in place, the same way remote runs stream.
  • After an investigation completes, short questions like why did CPU spike? are routed as follow-ups and answered against the stored final state.
  • Ctrl+C during an investigation cancels the in-flight LangGraph run without dropping you out of the session. Another Ctrl+C at the empty prompt exits cleanly.

The opensre investigate -i alert.json one-shot flow now streams the same way the REPL does — no more waiting for the whole run to finish before anything prints.
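The streaming behavior can be sketched as follows — a minimal, hypothetical stand-in for the pipeline (the real `astream_investigation` and renderer live in the branch; every name here is illustrative):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class StreamEvent:
    node: str
    payload: dict

async def astream_investigation(alert: dict):
    # Stand-in for the LangGraph pipeline: one event per node, yielded as
    # soon as the node produces it.
    for node in ("triage", "gather_evidence", "report"):
        await asyncio.sleep(0)  # simulate async work between nodes
        yield StreamEvent(node=node, payload={"alert": alert["name"]})

async def run_streaming(alert: dict) -> list[str]:
    # Render each event the moment it arrives instead of buffering the
    # whole run — the property this PR brings to the one-shot flow.
    rendered = []
    async for event in astream_investigation(alert):
        rendered.append(f"[{event.node}] processing {event.payload['alert']}")
    return rendered

lines = asyncio.run(run_streaming({"name": "cpu_spike"}))
```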

What's in the branch

Each commit is a single logical step:

  1. Switch the investigate command to the streaming path.
  2. Promote prompt_toolkit to a direct dependency.
  3. Add the app/cli/repl/ package: banner, session state, router, slash commands, follow-up handler, async loop.
  4. Enter the REPL when opensre is invoked on a TTY with no subcommand.
  5. Add a session-aware streaming investigation runner with Ctrl+C cancellation wiring.
  6. Guarantee the progress spinner stops even when the stream raises.
  7. Unit tests for the session, the router, and slash dispatch.
  8. A short 'Interactive mode' section in the README.
  9. Silence a LangGraph warning caused by the PEP-604 annotation format.
  10. Skip the "streaming from deployed agent" banner for local runs.
  11. Delete a leftover unused helper.
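The cancellation wiring in step 5 can be sketched roughly like this: a background thread owns its own event loop, and the foreground cancels the in-flight task via `call_soon_threadsafe`. All names here are illustrative, not the branch's actual code:

```python
import asyncio
import queue
import threading

async def slow_stream():
    # Stand-in for a long investigation: many events with pauses between.
    for i in range(100):
        yield i
        await asyncio.sleep(0.01)

def run_with_cancellation(stream_factory, cancel_after_first_event=False):
    events: queue.Queue = queue.Queue()
    loop_ref: dict[str, asyncio.AbstractEventLoop] = {}
    task_ref: dict = {}

    def _run_async():
        loop = asyncio.new_event_loop()
        loop_ref["loop"] = loop

        async def _pump():
            task_ref["task"] = asyncio.current_task()
            try:
                async for ev in stream_factory():
                    events.put(ev)
            except asyncio.CancelledError:
                events.put("cancelled")
            finally:
                events.put(None)  # sentinel: no more events

        loop.run_until_complete(_pump())
        loop.close()

    thread = threading.Thread(target=_run_async, daemon=True)
    thread.start()

    collected = []
    while True:
        ev = events.get()
        if ev is None:
            break
        collected.append(ev)
        if cancel_after_first_event and ev != "cancelled":
            # Ctrl+C path: cancel the in-flight task from the foreground
            # thread without tearing down the session.
            loop_ref["loop"].call_soon_threadsafe(task_ref["task"].cancel)
            cancel_after_first_event = False
    thread.join(timeout=5)
    return collected

out = run_with_cancellation(slow_stream, cancel_after_first_event=True)
```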

Out of scope for this PR (known follow-ups)

This is a draft showcase, not the complete Claude Code experience. The following are tracked as next steps:

  • Mid-operation input (the signature 'interrupt, don't queue' behavior) — requires moving the investigation into an executor and a concurrent prompt_toolkit input reader.
  • A persistent bottom status toolbar showing model, cost, elapsed time during operations.
  • Multi-turn conversation memory via the LangGraph chat mode (follow-ups today use a one-shot LLM call against the last state).
  • 200ms-silence threshold before the spinner starts (today it starts immediately).
  • Trust mode actually gating destructive tools once any exist.
  • Tab completion for slash commands and persistent REPL history on disk.

Verification

  • make lint — clean
  • make typecheck — 340 source files, no issues
  • make test-cov — all REPL unit tests pass; no regressions in the broader CLI suite
  • Manual smoke: banner renders, /help table shows properly, help (no slash) routes the same, Ctrl+C during an investigation cancels cleanly and preserves the session, /exit quits.

Closes #243 (draft — expects follow-ups for the deferred items listed above).

@greptile-apps
Contributor

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR ships a Claude-Code-style interactive REPL for opensre, wiring the existing LangGraph pipeline into a persistent zero-exit terminal with slash commands, follow-up question routing, Ctrl+C cancellation, and session state accumulation. The majority of issues raised in earlier rounds — router misclassification of follow-ups, Rich markup injection, the daemon-thread warning, the duplicate questionary dep, and the --no-interactive landing-page regression — have been addressed cleanly. The two remaining P2 items (failed follow-ups recorded as ok=True in history; config accepted but unused in _repl_main) are minor and do not block the primary user path.

Confidence Score: 4/5

Safe to merge with the understanding that the remaining open thread (context accumulation after /investigate) is tracked as a follow-up.

All P0/P1 findings from previous rounds are resolved. Two P2 findings remain (history ok-flag accuracy and unused config param), neither of which affects the primary investigation flow.

app/cli/repl/commands.py (_cmd_investigate_file context accumulation, tracked in prior thread) and app/cli/repl/loop.py (follow-up ok-flag)

Important Files Changed

| Filename | Overview |
| --- | --- |
| `app/cli/repl/loop.py` | Core REPL async event loop; session record for follow-up always reports ok=True on failure, and the config parameter is accepted but never applied. |
| `app/cli/repl/router.py` | Input classifier; short-question check now correctly precedes alert-signal check, fixing the previously reported misrouting of follow-up questions that contain metric keywords. |
| `app/cli/repl/commands.py` | Slash command dispatch; all Rich markup is properly escaped; `_cmd_investigate_file` still omits `_accumulate_context` after a successful run (tracked in prior thread). |
| `app/cli/investigate.py` | `run_investigation_for_session` added with Ctrl+C cancellation via `loop.call_soon_threadsafe`; thread-leak warning now logged when join times out. |
| `app/cli/repl/follow_up.py` | LLM follow-up handler; Rich markup is correctly escaped via `escape(text)`; evidence serialization failures are surfaced to the LLM rather than silently dropped. |
| `app/cli/main.py` | TTY guard + `ReplConfig.load` wired correctly; `--no-interactive` now reaches `render_landing()` as intended. |
| `app/cli/repl/session.py` | Clean dataclass for per-REPL session state with history, last_state, context accumulation, trust mode, and token usage. |
| `app/cli/repl/config.py` | Three-tier config resolution (file → env → CLI flag) with silent fallback on malformed config file. |
| `pyproject.toml` | `prompt_toolkit` added as direct dependency; duplicate `questionary` entry removed; dependency list is clean. |
| `app/remote/renderer.py` | `StreamRenderer` extended with `local=True` flag to skip the remote banner for REPL/local runs; spinner stop-on-exception guaranteed via `finally`. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[opensre — no subcommand] -->|TTY detected| B[ReplConfig.load]
    B -->|enabled=True| C[run_repl]
    B -->|enabled=False| D[render_landing]
    C --> E[_repl_main async loop]
    E --> F[PromptSession.prompt_async]
    F --> G[classify_input]
    G -->|slash| H[dispatch_slash]
    G -->|new_alert| I[_run_new_alert]
    G -->|follow_up| J[answer_follow_up]
    I --> K[run_investigation_for_session]
    K --> L[background asyncio thread astream_investigation]
    L -->|events| M[StreamRenderer.render_stream]
    M --> N[session.last_state / _accumulate_context]
    H -->|/investigate file| O[_cmd_investigate_file]
    O --> P[session.last_state set, _accumulate_context missing]
    J --> Q[get_llm_for_reasoning client.invoke]
    Q --> R[console.print escaped answer]
    F -->|Ctrl+C| S[KeyboardInterrupt → _cancel_pump]
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: app/cli/repl/loop.py
Line: 82-83

Comment:
**Failed follow-ups recorded as `ok=True` in session history**

`answer_follow_up` catches all its own exceptions internally and never raises — so `session.record("follow_up", text)` always runs with the default `ok=True`, even when the LLM client is unavailable or the invocation fails. The `/history` table will show a green ✓ for interactions that actually displayed a red error to the user.

This requires `answer_follow_up` to return `bool` (currently `None`). Alternatively, use a try/finally pattern or have the function accept a mutable `ok` container.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: app/cli/repl/loop.py
Line: 87

Comment:
**`config` parameter is accepted but silently unused**

`_repl_main` declares `config: ReplConfig | None = None` and suppresses the `ARG001` lint warning, but `config` is never read inside the function. Layout, and any future per-session config flags, will be silently ignored even when passed in from `run_repl`. At minimum a `# TODO: wire config.layout` comment would make the gap explicit, rather than letting callers believe the parameter is honoured.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (11): Last reviewed commit: "Merge branch 'Tracer-Cloud:main' into fe..."

Comment thread app/cli/repl/router.py
Comment on lines +82 to +112
```python
def classify_input(text: str, session: ReplSession) -> InputKind:
    """Classify a single line of REPL input.

    Rules (in order):
    1. Anything starting with ``/`` is a slash command.
    2. If there is no previous investigation, treat as a new alert.
    3. If the input has alert-shaped signals, treat as a new alert.
    4. If the input is a short question, treat as a follow-up.
    5. Otherwise default to a new alert (safer — produces a fresh run rather
       than a free-floating chat message).
    """
    stripped = text.strip()
    if stripped.startswith("/"):
        return "slash"

    # A bare word that matches a known slash command is almost always a typo
    # for the slash command itself — route it there instead of triggering a
    # full investigation.
    if stripped.lower() in _BARE_COMMAND_ALIASES:
        return "slash"

    if session.last_state is None:
        return "new_alert"

    if _mentions_alert_signal(stripped):
        return "new_alert"

    if _is_short_question(stripped):
        return "follow_up"

    return "new_alert"
```

P1 Alert-keyword check blocks follow-up questions that mention metric names

Rule 3 (_mentions_alert_signal) fires before rule 4 (_is_short_question), so any follow-up that includes a metric or incident keyword — "why did CPU spike?", "what caused the memory error?", "how did connection drop?" — is classified as new_alert and triggers a fresh investigation rather than a grounded answer. The PR description explicitly lists "why did CPU spike?" as the canonical follow-up example, but the code misroutes it. The existing tests avoid this by only testing inputs with no alert keywords ("why?", "what caused it?").

A simple fix is to check the short-question shape first when prior state exists:

```python
if session.last_state is None:
    return "new_alert"

if _is_short_question(stripped):
    return "follow_up"

if _mentions_alert_signal(stripped):
    return "new_alert"

return "new_alert"
```

Comment thread pyproject.toml Outdated
Comment thread app/cli/investigate.py
Comment on lines +239 to +241
```python
thread = threading.Thread(target=_run_async, daemon=True)
thread.start()
```


P2 Silent daemon thread leak on 5-second join timeout

If the background asyncio loop doesn't complete within 5 seconds after cancellation, thread.join(timeout=5) returns silently and the daemon thread keeps running (with its LLM call in-flight). There's no log warning, so this is invisible in production. Consider logging a warning so it's observable:

```python
thread.join(timeout=5)
if thread.is_alive():
    import logging
    logging.getLogger(__name__).warning(
        "investigation thread did not terminate within 5s after cancellation"
    )
```

@yashksaini-coder yashksaini-coder marked this pull request as ready for review April 15, 2026 12:03
Copilot AI review requested due to automatic review settings April 15, 2026 12:03
Contributor

Copilot AI left a comment


Pull request overview

Adds an interactive “zero-exit” REPL mode to opensre (when run with no args on a TTY) and moves local investigations onto the same live-streaming rendering path used for remote runs, aligning the CLI UX with the direction in #243.

Changes:

  • Introduces app/cli/repl/ (banner, session state, router, slash commands, follow-up handling, async loop) plus unit tests.
  • Switches the one-shot opensre investigate Click command to a streaming execution path.
  • Improves streaming renderer robustness (always stops spinner) and tweaks banners/typing to avoid warnings.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `app/cli/__main__.py` | Enters REPL when invoked with no subcommand on a TTY. |
| `app/cli/commands/general.py` | Reworks investigate Click command to use streaming runner. |
| `app/cli/investigate.py` | Adds session-oriented streaming runner with Ctrl+C cancellation wiring. |
| `app/cli/repl/__init__.py` | Exposes `run_repl`. |
| `app/cli/repl/banner.py` | Adds REPL identity banner (provider/model + hints). |
| `app/cli/repl/commands.py` | Implements slash command registry + dispatch. |
| `app/cli/repl/follow_up.py` | Implements follow-up answering grounded on last investigation. |
| `app/cli/repl/loop.py` | Implements the async REPL loop and routing to actions. |
| `app/cli/repl/router.py` | Implements input classification (slash vs new alert vs follow-up). |
| `app/cli/repl/session.py` | Adds persistent per-session state container. |
| `app/nodes/resolve_integrations/node.py` | Adjusts annotation style to avoid LangGraph warning. |
| `app/remote/renderer.py` | Adds `local=` flag and guarantees spinner cleanup on exceptions. |
| `pyproject.toml` | Adds `prompt_toolkit` as a direct dependency. |
| `README.md` | Documents the new interactive mode. |
| `tests/cli/repl/test_commands.py` | Unit tests for slash dispatch. |
| `tests/cli/repl/test_router.py` | Unit tests for input classification. |
| `tests/cli/repl/test_session.py` | Unit tests for session state behavior. |


Comment on lines +148 to +155
```python
payload = load_payload(
    input_path=input_path,
    input_json=input_json,
    interactive=interactive,
)
result = run_investigation_cli_streaming(raw_alert=payload)
write_json(result, output)
except SystemExit:
```

Copilot AI Apr 15, 2026


run_investigation_cli_streaming() prints streaming UI to stdout via StreamRenderer, and then write_json(result, output) prints JSON to stdout when --output is not provided. This mixes human terminal output with JSON and breaks consumers that expect stdout to be valid JSON (especially under --json). Consider gating: when JSON output is desired, run the non-streaming run_investigation_cli() (or emit streaming to stderr) so stdout remains clean JSON, and only stream in interactive mode when not producing JSON.

Comment thread app/cli/repl/router.py Outdated
Comment thread app/cli/repl/follow_up.py Outdated
Comment thread app/cli/repl/follow_up.py Outdated
@Devesh36
Collaborator

Hey @yashksaini-coder let me know if you need any help with this one 🚀

@yashksaini-coder
Collaborator Author

yashksaini-coder commented Apr 23, 2026

Progress update

State of the branch

CI is fully green (quality, typecheck, test, CodeQL, Analyze all passing). The PR is mergeable against upstream/main.

Credits for the P1 lift

Most of the P1 work — the big jump from 8 to 26 slash commands plus the two-axis config — came from @rrajan94 in three commits:

  • d311bd25 — two-axis REPL config (CLI flag + env var + ~/.opensre/config.yml resolution)
  • 18d8b513 — 18 new slash commands (integrations, mcp, model, health, doctor, version, history, last, save, context, cost, verbose, compact, template, investigate, stop, cancel, and friends)
  • 113a988b — test coverage for the new config + slashes

Thanks for picking those up. The branch is much richer as a result.

What we resolved this session

Four CI/review gaps closed:

| SHA | What | Why |
| --- | --- | --- |
| cf7465f1 | chore: apply ruff format to unblock CI | quality check was failing on 7 files — now clean on 888 |
| 5d112c25 | test: lock in `--no-interactive` landing-page fallthrough | Greptile P1 on `__main__.py:87` — the `if config.enabled:` guard was added by a later merge; added regression coverage so it can't silently regress |
| 35a4e3fc | fix: `/investigate` slash command now accumulates infra context | Greptile P1 on `commands.py:378` — promoted `_accumulate_context` from loop.py's private function to `ReplSession.accumulate_from_state()` so both `loop._run_new_alert` AND `commands._cmd_investigate_file` call it. Closes #243 requirement 7 ("session remembers everything") for slash-initiated investigations |
| c214c74c | Merge upstream/main | Fork's origin/main was 44 commits behind Tracer-Cloud/main — that's why GitHub was reporting CONFLICTING. Resolved conflicts in `resolve_integrations/node.py` (noqa code update) and 3 daily-update docs |

How the #243 product requirements stand today

| # | Requirement | State | Notes |
| --- | --- | --- | --- |
| 1 | Stream everything; >200ms silence shows spinner | PARTIAL | Node-level spinners work; token-level streaming currently surfaces as 60-char spinner subtext. Wider streaming = Phase B work |
| 3 | Zero-exit architecture | PARTIAL | REPL loop is zero-exit; one-shot `opensre investigate` still exists (intentional — additive ship, not replacement) |
| 4 | Identity banner up top | DONE | banner.py renders on launch + /clear |
| 5 | Interrupt, don't queue | PARTIAL | Ctrl+C cancels in-flight investigation cleanly; mid-op typing requires the Phase B pinned Application |
| 6 | /trust toggle, no wizards | PARTIAL | /trust toggles session.trust_mode; no approval gate reads it yet (no destructive tools today) |
| 7 | Show the work, not internal state | DONE | StreamRenderer shows tool calls/node progress; exceptions escape-printed, no tracebacks |
| 8 | Session remembers everything | PARTIAL | last_state + accumulated_context persist; multi-turn Q&A threading is a follow-up |

How the discussion #614 decisions are reflected

| Decision | Status | Notes |
| --- | --- | --- |
| Classic retires in ~2 weeks; ship both layouts in place | PARTIAL | `layout=pinned` is accepted by config + CLI + env, but only classic is wired today. Phase B will land the actual pinned renderer |
| MCP server: completely separate PR | YES | Nothing MCP-server-exposing has landed in this branch |
| Memory format: .md | YES | No memory feature landed yet — confirmed format for when it does |
| `interactive.agentic` default: ON | Deferred to Phase C | Field isn't in config.py today. Per the "narrow scope, one PR per tool" direction in the same comment, I'll add the axis as part of the first tool PR (`query_datadog`) with the default flipped ON — not as a dead skeleton now |
| Narrow tool scope, one PR per tool | YES | No tool-framework code added to this PR; stays clean for follow-up PRs |

Open review threads hygiene

Six threads are still "unresolved" in the UI, but five of them are fixed in later commits — they just need Greptile's next re-scan or a UI-level click. Specifically:

  • Greptile P1 on router.py (alert-keyword check) — fixed in 13d78d8 (earlier round)
  • Greptile P2 on daemon thread leak — fixed in 0b86c254 (earlier round)
  • Greptile P1 on --no-interactive — fixed + regression test in 5d112c25 (this round)
  • Greptile P1 on /investigate — fixed in 35a4e3fc (this round)
  • Copilot on streaming UI corrupting JSON stdout — fixed in 117f998b (earlier round)
  • @muddlebee's question about the LLM prompt in follow_up.py:87 — answered in-thread by @davincios; no code change needed

I've replied on each with the resolving SHA. Happy to click Resolve myself if access allows.

What's next — phased plan

Immediate (this PR) — nothing blocking merge. Classic mode works end-to-end: streaming investigations, grounded follow-ups, 26 slash commands, two-axis config, clean exit paths.

Follow-up A — Phase B (pinned layout, ~1,080 LoC)
Stacked PR on top of this one, lands before the ~2-week classic retirement window:

  • app/cli/repl/app.py — prompt_toolkit Application(HSplit(header, output, status, input))
  • New BufferRenderer writing to a Buffer instead of stdout
  • Keybindings: Enter / Ctrl+C cancel-or-exit / Ctrl+D / Ctrl+L / ↑↓ history
  • Wires interactive.layout=pinned (config axis already exists)
  • Headless tests via create_pipe_input + DummyOutput
  • Manual compat smoke on ghostty, iTerm, Windows Terminal

Follow-up B — Phase C (seven parallel per-tool PRs)
Per @davincios's "narrow scope, one PR per tool, iterate fast":

  • PR 1: query_datadog + adds interactive.agentic config axis (default ON), ReAct scaffold
  • PR 2–7: query_prometheus, get_pod_logs, describe_pipeline, get_recent_alerts, list_integrations, show_last_state

Each tool PR is ~150–400 LoC, none blocks any other.

Deferred to their own issues (per team decision)

  • opensre as MCP server
  • ~/.opensre/memory.md persistent context threading

Thanks

Thanks to @davincios for the product direction calls in #614 — default ON for agentic, narrow PR scope per tool, markdown for memory, MCP separate. Those made the next steps concrete.

Thanks to @rrajan94 for landing the P1 command-surface lift. Happy to hand off Phase B or any of the Phase C tool PRs to whoever's interested.

Thanks to @muddlebee for the spot-check on the follow-up LLM prompt — keeps us honest against the #243 spec.

Ready for review whenever the maintainers have a moment.

@muddlebee
Collaborator

@yashksaini-coder nice work. can we have a demo pls? 👀

Comment thread app/cli/repl/session.py
```python
# Keys from a completed AgentState that carry reusable infra context into
# the next investigation. Kept as a class-level tuple so any caller that
# wants to know "what counts as accumulated context" has a single source.
_ACCUMULATED_KEYS: tuple[str, ...] = (
```

_ACCUMULATED_KEYS has a type annotation on a dataclass without ClassVar, so the dataclass machinery treats it as a regular field with a default — it appears in init as an overridable kwarg and will be flagged by mypy. Change to: _ACCUMULATED_KEYS: ClassVar[tuple[str, ...]] = (...) and add ClassVar to the typing import.
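The fix, shown on a minimal stand-in dataclass (field names here are illustrative):

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

@dataclass
class ReplSessionSketch:
    history: list = field(default_factory=list)
    # With ClassVar, the dataclass machinery skips this annotation: it is
    # no longer an __init__ kwarg or an instance field.
    _ACCUMULATED_KEYS: ClassVar[tuple[str, ...]] = ("services", "hosts")

instance_fields = [f.name for f in fields(ReplSessionSketch)]
```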

Comment thread app/cli/repl/loop.py
```python
return True


async def _repl_main(initial_input: str | None = None, config: ReplConfig | None = None) -> int:  # noqa: ARG001
```

config is silently ignored (# noqa: ARG001) — layout is never applied to the Console. This is fine today since only 'classic' exists, but once 'pinned' ships, the wire-up will need to happen here. Add a TODO comment or route to a layout-specific console factory so this doesn't get forgotten.

Comment thread app/cli/investigate.py
```python
)

event_queue: queue.Queue[StreamEvent | BaseException | None] = queue.Queue()
loop_ref: dict[str, asyncio.AbstractEventLoop] = {}
```

Race window: if KeyboardInterrupt fires between thread.start() and loop_ref['loop'] = loop (line 218), _cancel_pump() reads loop_ref.get('loop') as None and silently no-ops — the investigation thread leaks. Consider using a threading.Event (set after loop_ref is written) that _cancel_pump waits on briefly before calling call_soon_threadsafe.
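The suggested handshake, sketched with hypothetical names: publish the loop, set a `threading.Event`, and have the cancel path wait briefly on it instead of silently no-opping:

```python
import asyncio
import threading

loop_ref: dict[str, asyncio.AbstractEventLoop] = {}
loop_ready = threading.Event()

def _run_async() -> None:
    loop = asyncio.new_event_loop()
    loop_ref["loop"] = loop
    loop_ready.set()  # publish the loop before doing any work
    loop.run_forever()
    loop.close()

def _cancel_pump() -> bool:
    # Wait briefly for the loop to be published; this closes the race
    # window between thread.start() and loop_ref being written.
    if not loop_ready.wait(timeout=1.0):
        return False  # loop never came up; nothing to cancel
    loop_ref["loop"].call_soon_threadsafe(loop_ref["loop"].stop)
    return True

thread = threading.Thread(target=_run_async, daemon=True)
thread.start()
cancelled = _cancel_pump()
thread.join(timeout=5)
```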

Comment thread app/cli/investigate.py
```python
try:
    final_state = renderer.render_stream(_events())
except KeyboardInterrupt:
    _cancel_pump()
```

_cancel_pump() is called twice on the Ctrl+C path: once in _events() (line 262) and again here. The second call is a no-op because the task is already cancelled, but the double-call is misleading. Remove the call from _events() and let this outer except be the single cancellation point — it's the only place that knows render_stream has exited.

Comment thread app/cli/repl/router.py


```python
# Short, question-shaped strings that obviously target the previous investigation.
_FOLLOW_UP_CUES = (
```

_FOLLOW_UP_CUES misses common question starters: "when", "where", "which", "who", "did". Without them, inputs like "when did this start?" or "where is the bottleneck?" (<90 chars, end with ?) still route to follow_up because of the endswith('?') check in _is_short_question, but "when did the spike start" (no ?) routes to new_alert and fires a fresh investigation instead. Either extend the cue list or rely solely on the '?' suffix for follow-up detection.

yashksaini-coder and others added 6 commits April 29, 2026 17:45
`StreamRenderer.render_stream` called `_print_report()` outside the
try/finally that wraps event iteration. Any non-`KeyboardInterrupt`
exception (LLM quota, network error) stopped the spinner correctly but
silently dropped accumulated `_final_state` before propagating — the
user, who had been watching the report stream live, saw only the raw
exception.

Move `_print_report()` into the finally block alongside
`_finish_active_node()`. Both have the same invariant: always run, even
when the stream raises. Add a regression test that streams a partial
investigation, raises mid-iteration, and asserts the report flushes and
accumulated state is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The agent module docstring promised the explicit subcommand always
starts the REPL on user intent, but `run_repl`'s non-TTY guard caused
piped/CI invocations (e.g. `echo "..." | opensre agent`) to silently
return 0 with no output — a confusing no-op.

Add an explicit TTY check in `agent_command` that raises a clear
`OpenSREError` with a remediation suggestion. Tighten the docstring so
it no longer overpromises. Update the existing `test_agent_subcommand_*`
tests to monkeypatch `isatty=True` (their semantic is "user runs the
command from a real terminal") and add a regression test that confirms
non-TTY invocation surfaces a clear error instead of silently
succeeding.

Leave `run_repl`'s non-TTY guard in place — it's still correct for the
bare-`opensre` flow, where silently falling through to the landing page
is the desired behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@muddlebee
Collaborator

@yashksaini-coder please add an e2e video demo for showcase also.

@VaibhavUpreti VaibhavUpreti merged commit 8da7c57 into Tracer-Cloud:main Apr 29, 2026
7 checks passed
@VaibhavUpreti
Member

merging this for velocity, let's fix the issues in a new patch

@github-actions
Contributor

LGTM → Merged. @yashksaini-coder, your work is in. Every commit counts — thank you for this one.


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.


Development

Successfully merging this pull request may close these issues.

Turning our CLI into a Claude-Code-like terminal application

6 participants