Implement LIVE=1 smoke test for Claude CLI runtime #34
Description
Summary
End-to-end smoke test that validates the full pipeline with a real Claude CLI: message → ChannelBridge → ClaudeCliRuntime → claude -p --output-format stream-json → NDJSON streaming → AgentDeliveryResult. Gated behind LIVE=1 environment variable so CI skips it by default.
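The pipeline above depends on how ClaudeCliRuntime turns a message into CLI arguments. Below is a minimal sketch of that argument building. It assumes the first turn runs -p with stream-json output and a follow-up appends --resume with the captured session ID. buildClaudeArgs and ClaudeArgsOptions are hypothetical names, not the runtime's actual API.

```typescript
// Hypothetical sketch of the argv a Claude CLI runtime might build.
// buildClaudeArgs / ClaudeArgsOptions are illustrative, not the project's API.
interface ClaudeArgsOptions {
  prompt: string;
  // Session ID captured from a previous turn; omitted on the first turn.
  resumeSessionId?: string;
}

function buildClaudeArgs(opts: ClaudeArgsOptions): string[] {
  // -p (--print) runs non-interactively; stream-json emits one JSON event per line.
  const args = ["-p", opts.prompt, "--output-format", "stream-json"];
  // Depending on the installed CLI version, stream-json output may also need
  // --verbose; check `claude --help` locally rather than relying on this sketch.
  if (opts.resumeSessionId) {
    // --resume continues an existing session by its ID.
    args.push("--resume", opts.resumeSessionId);
  }
  return args;
}
```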
Context
All middleware components are implemented and unit-tested (367 tests across 16 test files). This is the first test that exercises a real CLI subprocess rather than mocked streams. It validates that the argument building, NDJSON parsing, session resumption, and event extraction work end-to-end with actual Claude CLI output.
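Below is a minimal sketch of the NDJSON parsing and event extraction that this test exercises. The event shapes are assumptions modeled loosely on Claude CLI stream-json output: a session_id on events, text blocks under message.content on "assistant" events, and a final "result" event. parseStreamJson is a hypothetical name, not ClaudeCliRuntime's actual parser.

```typescript
// Assumed (unverified) event shapes, modeled loosely on Claude CLI stream-json output.
interface StreamEvent {
  type: string;
  session_id?: string;
  result?: string;
  is_error?: boolean;
  message?: { content?: Array<{ type: string; text?: string }> };
}

interface ParsedRun {
  sessionId: string | undefined;
  text: string;
  isError: boolean;
}

// Parse an NDJSON buffer (one JSON object per line) into a summary of the run.
function parseStreamJson(ndjson: string): ParsedRun {
  let sessionId: string | undefined;
  let assistantText = "";
  let resultText: string | undefined;
  let isError = false;

  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    let event: StreamEvent | undefined;
    try {
      event = JSON.parse(trimmed) as StreamEvent;
    } catch {
      event = undefined; // non-JSON noise is skipped rather than failing the run
    }
    if (!event || typeof event !== "object") continue;
    if (event.session_id) sessionId = event.session_id;
    if (event.type === "assistant") {
      for (const block of event.message?.content ?? []) {
        if (block.type === "text" && block.text) assistantText += block.text;
      }
    } else if (event.type === "result") {
      resultText = event.result;
      isError = event.is_error === true;
    }
  }
  // Prefer the final result text; fall back to concatenated assistant text.
  return { sessionId, text: resultText ?? assistantText, isError };
}
```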
CLIs are pre-authenticated on the developer machine, so no API key environment variable checks are needed; the CLI handles its own auth.
Acceptance Criteria
Given the middleware is implemented and ChannelBridge works (PR #33)
When a LIVE=1 smoke test is executed with a pre-authenticated claude CLI
Then a test message is sent through ChannelBridge → ClaudeCliRuntime → claude --print --output-format stream-json
And a coherent text response is received via NDJSON streaming
And the session ID is captured and a follow-up message resumes the session
And the test setup (beforeAll) unsets CLAUDECODE from process.env to prevent nesting rejection when tests run from a Claude Code terminal (ref: https://github.com/anthropics/claude-agent-sdk-python/issues/573)
And the test is skippable in CI (gated behind the LIVE=1 environment variable)
Architecture
Test Structure
src/middleware/__smoke__/claude-live.test.ts
Uses vitest with a describe block gated by process.env.LIVE === "1". When LIVE is unset (or set to any value other than "1"), describe.skipIf skips every test in the file.
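The LIVE gate comes down to a strict string comparison. The sketch below puts it in a hypothetical isLiveEnabled helper, which is not part of the original design. Only LIVE=1 enables the suite; values such as "true" or "0" leave it skipped. The commented usage line shows how the helper could feed vitest's describe.skipIf.

```typescript
// Hypothetical helper mirroring the gate: only the exact string "1" enables
// live tests; unset, "0", or "true" all leave the suite skipped.
function isLiveEnabled(env: Record<string, string | undefined>): boolean {
  return env.LIVE === "1";
}

// Usage in src/middleware/__smoke__/claude-live.test.ts (vitest):
//   describe.skipIf(!isLiveEnabled(process.env))("Claude CLI live smoke", () => { ... });
```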
Test Flow
Setup (beforeAll):
- Delete CLAUDECODE from process.env to prevent nesting rejection
- Create a ChannelBridge with provider: "claude" and an in-memory SessionMap
- No MCP server needed for this smoke test (pure CLI → response)
- No API key env var checks; the CLI is pre-authenticated
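Setup needs a SessionMap that lives only for the duration of the test file. Below is a minimal in-memory sketch. It assumes a get/set API keyed by channel and user identifiers. InMemorySessionMap and its method names are hypothetical, and the repo's real SessionMap interface may differ.

```typescript
// Hypothetical in-memory session store keyed by channel + user.
// The real SessionMap interface in this repo may differ; method names are assumed.
class InMemorySessionMap {
  private readonly sessions = new Map<string, string>();

  // JSON-encode the pair so IDs containing separators cannot collide.
  private key(channelId: string, userId: string): string {
    return JSON.stringify([channelId, userId]);
  }

  get(channelId: string, userId: string): string | undefined {
    return this.sessions.get(this.key(channelId, userId));
  }

  set(channelId: string, userId: string, sessionId: string): void {
    this.sessions.set(this.key(channelId, userId), sessionId);
  }

  clear(): void {
    this.sessions.clear();
  }
}
```

Because the follow-up turn in Test 2 reuses the same channel/user identifiers, it hits the same entry and receives the session ID stored after Test 1.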
Test 1: Single-turn response
- Send a simple message (e.g., "What is 2+2? Reply with just the number.") via ChannelBridge.handle()
- Assert: result.payloads.length > 0
- Assert: result.run.text contains a coherent response
- Assert: result.run.sessionId is a non-empty string
- Assert: result.run.aborted === false
- Assert: result.run.durationMs > 0
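The Test 1 assertions can be collected into one check over the result. The sketch below assumes a reduced result shape with only the fields named above: payloads, run.text, run.sessionId, run.aborted, and run.durationMs. DeliveryResultLike and checkSingleTurn are hypothetical stand-ins for the real AgentDeliveryResult. Here "coherent" is approximated as non-empty text.

```typescript
// Hypothetical subset of AgentDeliveryResult: only the fields Test 1 asserts on.
interface RunInfo {
  text: string;
  sessionId?: string;
  aborted: boolean;
  durationMs: number;
}

interface DeliveryResultLike {
  payloads: unknown[];
  run: RunInfo;
}

// Return every failed check; an empty array means Test 1 passes.
function checkSingleTurn(result: DeliveryResultLike): string[] {
  const failures: string[] = [];
  if (result.payloads.length === 0) failures.push("payloads is empty");
  if (result.run.text.trim() === "") failures.push("run.text is empty");
  if (!result.run.sessionId) failures.push("run.sessionId is missing or empty");
  if (result.run.aborted) failures.push("run.aborted is true");
  if (!(result.run.durationMs > 0)) failures.push("run.durationMs is not positive");
  return failures;
}
```

Inside vitest, expect(checkSingleTurn(result)).toEqual([]) would report every failed check in one diff instead of stopping at the first.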
Test 2: Session resumption
- Send a follow-up message using the same channel/user identifiers
- The SessionMap should return the session ID from Test 1
- Assert: a response is received (session resumed, not rejected)
- Assert: result.run.sessionId matches the first test's session ID
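Test 2 can be sketched the same way, as a comparison between the first turn's session ID and the follow-up result. FollowUpRun and checkResumed are hypothetical. The sketch also assumes that a rejected resume shows up as an aborted run or as empty text. That assumption should be confirmed against ClaudeCliRuntime's real error handling.

```typescript
// Hypothetical shape of the follow-up run, mirroring the first turn's fields.
interface FollowUpRun {
  text: string;
  sessionId?: string;
  aborted: boolean;
}

// Return every failed check; an empty array means the session was resumed.
function checkResumed(firstSessionId: string, followUp: FollowUpRun): string[] {
  const failures: string[] = [];
  // Assumption: a rejected resume surfaces as an aborted run or empty text.
  if (followUp.aborted) failures.push("follow-up run was aborted");
  if (followUp.text.trim() === "") failures.push("follow-up returned no text");
  // Core acceptance criterion: the follow-up reports the same session ID.
  if (followUp.sessionId !== firstSessionId) {
    failures.push(
      `sessionId changed: expected ${firstSessionId}, got ${followUp.sessionId ?? "undefined"}`,
    );
  }
  return failures;
}
```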
Teardown (afterAll):
- Restore the original CLAUDECODE value if it was set
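Setup and teardown together amount to "unset now, restore exactly later". Node converts any value assigned to process.env to a string, so assigning undefined would store the string "undefined". For that reason the restore step deletes the key when it was originally absent. unsetEnvVar is a hypothetical helper name.

```typescript
// Hypothetical helper: delete an env var and return a function that restores it,
// distinguishing "was unset" from "was set to some value".
function unsetEnvVar(env: Record<string, string | undefined>, name: string): () => void {
  const hadValue = Object.prototype.hasOwnProperty.call(env, name);
  const previous = env[name];
  delete env[name];
  return () => {
    if (hadValue) {
      env[name] = previous;
    } else {
      // Delete instead of assigning undefined, which process.env would stringify.
      delete env[name];
    }
  };
}

// Usage in the smoke test file (vitest):
//   let restoreClaudeCode: () => void;
//   beforeAll(() => { restoreClaudeCode = unsetEnvVar(process.env, "CLAUDECODE"); });
//   afterAll(() => restoreClaudeCode());
```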
Key Design Decisions
- No MCP server: This test validates the CLI runtime pipeline, not MCP tools. ChannelBridge passes empty mcpServers when no gateway URL/token is configured (or the test uses a minimal config that skips MCP setup).
- CLAUDECODE unset: When running from a Claude Code terminal, CLAUDECODE=1 is inherited, and claude -p refuses to run inside another Claude Code session. The test must unset this env var.
- No API key checks: CLIs are pre-authenticated on the developer machine. The test assumes the CLI is installed and authenticated; if not, the test fails with a clear CLI error.
- Timeout: Set a generous timeout (60s per test), since real CLI invocations are slow.
- Deterministic prompt: Use a simple factual question to get a short, predictable response.
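The deterministic prompt still needs a tolerant content check, because the model may reply "4", "4.", or "The answer is 4". The sketch below accepts the expected token only when it is not embedded in a longer word or number, so "42" and "14" are rejected. mentionsAnswer and LIVE_TEST_TIMEOUT_MS are hypothetical names, and the exact-token rule is an assumption about how strict the check should be.

```typescript
// Hypothetical lenient matcher: `expected` must appear as a standalone token.
function mentionsAnswer(text: string, expected: string): boolean {
  // Escape regex metacharacters so `expected` is matched literally.
  const escaped = expected.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lookarounds reject matches glued to word characters (e.g. "42", "14").
  const pattern = new RegExp(`(?<!\\w)${escaped}(?!\\w)`);
  return pattern.test(text);
}

// 60s per test, per the design decisions; vitest accepts a per-test timeout in
// milliseconds as the third argument, e.g. it("single turn", fn, LIVE_TEST_TIMEOUT_MS).
const LIVE_TEST_TIMEOUT_MS = 60_000;
```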
Related
- Part of a 4-CLI smoke test suite: Implement LIVE=1 smoke test for Claude CLI runtime #34 (Claude), Implement LIVE=1 smoke test for Gemini CLI runtime #36 (Gemini), Implement LIVE=1 smoke test for Codex CLI runtime #37 (Codex), Implement LIVE=1 smoke test for OpenCode CLI runtime #38 (OpenCode)
Dependencies
- ChannelBridge orchestrator (PR feat(middleware): implement ChannelBridge orchestrator #33) ✅
Estimate
~120 LoC test file