Implement LIVE=1 smoke test for Claude CLI runtime #34

@alexey-pelykh

Summary

End-to-end smoke test that validates the full pipeline with a real Claude CLI: message → ChannelBridge → ClaudeCliRuntime → claude -p --output-format stream-json → NDJSON streaming → AgentDeliveryResult. The test is gated behind the LIVE=1 environment variable, so CI skips it by default.

Context

All middleware components are implemented and unit-tested (367 tests across 16 test files). This is the first test that exercises a real CLI subprocess rather than mocked streams. It validates that the argument building, NDJSON parsing, session resumption, and event extraction work end-to-end with actual Claude CLI output.

CLIs are pre-authenticated on the developer machine — no API key env var checks needed. The CLI handles its own auth.
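
For reference, the NDJSON parsing step mentioned above can be sketched as a small line buffer. This is a minimal illustration, not the middleware's actual parser: the class name NdjsonBuffer is hypothetical, and event shapes are left as unknown because this issue does not specify them.

```typescript
// Hypothetical helper: buffers partial stdout chunks and returns one parsed
// JSON value per complete line. Event shapes are intentionally left untyped.
class NdjsonBuffer {
  private pending = "";

  // Feed a raw chunk; returns the values completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended in "\n").
    this.pending = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

A real subprocess can split a JSON object across two stdout chunks, which mocked streams rarely do, so the buffer keeps the trailing partial line until its newline arrives.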

Acceptance Criteria

Given the middleware is implemented and ChannelBridge works (PR #33)
When a LIVE=1 smoke test is executed with a pre-authenticated claude CLI
Then a test message is sent through ChannelBridge → ClaudeCliRuntime →
  claude --print --output-format stream-json
And a coherent text response is received via NDJSON streaming
And the session ID is captured and a follow-up message resumes the session
And the test setup (beforeAll) unsets CLAUDECODE from process.env to prevent
  nesting rejection when tests run from a Claude Code terminal
  (ref: https://github.com/anthropics/claude-agent-sdk-python/issues/573)
And the test is skippable in CI (gated behind LIVE=1 environment variable)
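
The command line implied by these criteria could be assembled roughly as below. This is a sketch under assumptions: buildClaudeArgs is a hypothetical name, the real ClaudeCliRuntime argument builder may differ, and the use of --resume for session continuation should be checked against claude --help for the installed version. Some CLI versions may also require --verbose alongside stream-json in print mode.

```typescript
// Hypothetical argument builder; not the actual ClaudeCliRuntime code.
function buildClaudeArgs(prompt: string, sessionId?: string): string[] {
  // --print is the long form of -p used elsewhere in this issue.
  const args = ["--print", "--output-format", "stream-json"];
  // Assumption: --resume <id> continues a previously captured session.
  if (sessionId) {
    args.push("--resume", sessionId);
  }
  // The prompt goes last as a positional argument.
  args.push(prompt);
  return args;
}
```

The first turn calls it without a session ID; the follow-up passes the ID captured from the first result.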

Architecture

Test Structure

src/middleware/__smoke__/claude-live.test.ts

Uses vitest with a describe block gated by process.env.LIVE === "1". When LIVE is unset, all tests in the file are skipped via describe.skipIf.
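
A minimal sketch of the gate, assuming a hypothetical isLive helper. The vitest call is shown only as a comment so the snippet stays dependency-free.

```typescript
// Hypothetical helper: true only when LIVE is exactly "1".
function isLive(env: Record<string, string | undefined>): boolean {
  return env.LIVE === "1";
}

// In claude-live.test.ts this would feed vitest's describe.skipIf, e.g.:
//   describe.skipIf(!isLive(process.env))("Claude CLI live smoke test", () => {
//     /* beforeAll, tests, afterAll */
//   });
```

Comparing against the exact string "1" means values such as LIVE=true or LIVE=0 leave the suite skipped.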

Test Flow

  1. Setup (beforeAll):

    • Delete CLAUDECODE from process.env to prevent nesting rejection
    • Create a ChannelBridge with provider: "claude" and an in-memory SessionMap
    • No MCP server needed for this smoke test (pure CLI → response)
    • No API key env var checks — CLI is pre-authenticated
  2. Test 1: Single-turn response:

    • Send a simple message (e.g., "What is 2+2? Reply with just the number.") via ChannelBridge.handle()
    • Assert: result.payloads.length > 0
    • Assert: result.run.text contains the expected answer ("4" for the example prompt)
    • Assert: result.run.sessionId is a non-empty string
    • Assert: result.run.aborted === false
    • Assert: result.run.durationMs > 0
  3. Test 2: Session resumption:

    • Send a follow-up message using the same channel/user identifiers
    • The SessionMap should return the session ID from Test 1
    • Assert: response is received (session resumed, not rejected)
    • Assert: result.run.sessionId matches the first test's session ID
  4. Teardown (afterAll):

    • Restore original CLAUDECODE value if it was set
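
The assertions in the flow above could be collected as follows. This is a sketch: the SmokeResult interface is inferred from the field names listed in this issue and is not the real AgentDeliveryResult type, and both function names are hypothetical.

```typescript
// Hypothetical shape, inferred from the fields named in the test flow.
interface SmokeResult {
  payloads: unknown[];
  run: { text: string; sessionId: string; aborted: boolean; durationMs: number };
}

// Returns the failed checks for Test 1; an empty array means all passed.
function checkSingleTurn(result: SmokeResult): string[] {
  const failures: string[] = [];
  if (result.payloads.length === 0) failures.push("payloads is empty");
  if (!result.run.text.includes("4")) failures.push('text does not contain "4"');
  if (result.run.sessionId.length === 0) failures.push("sessionId is empty");
  if (result.run.aborted !== false) failures.push("run was aborted");
  if (!(result.run.durationMs > 0)) failures.push("durationMs is not positive");
  return failures;
}

// Returns the failed checks for Test 2, comparing against the first result.
function checkResumed(first: SmokeResult, second: SmokeResult): string[] {
  const failures: string[] = [];
  if (second.run.text.length === 0) failures.push("follow-up returned no text");
  if (second.run.sessionId !== first.run.sessionId) {
    failures.push("sessionId changed between turns");
  }
  return failures;
}
```

In the vitest file, each test would assert that the returned array equals [], which surfaces every failed check in one run instead of stopping at the first.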

Key Design Decisions

  • No MCP server: This test validates the CLI runtime pipeline, not MCP tools. ChannelBridge passes empty mcpServers when no gateway URL/token is configured (or the test uses a minimal config that skips MCP setup).
  • CLAUDECODE unset: When running from a Claude Code terminal, CLAUDECODE=1 is inherited. claude -p refuses to run inside another Claude Code session. The test must unset this env var.
  • No API key checks: CLIs are pre-authenticated on the developer machine. The test assumes the CLI is installed and authenticated; if not, the test fails with a clear CLI error.
  • Timeout: Set a generous timeout (60s per test) — real CLI invocations are slow.
  • Deterministic prompt: Use a simple factual question to get a short, predictable response.
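
The CLAUDECODE handling could be sketched as a save-and-restore helper. The name unsetClaudeCode is hypothetical; it takes the environment as a parameter so it can be exercised without touching the real process.env.

```typescript
type Env = Record<string, string | undefined>;

// Removes CLAUDECODE from env and returns a function that restores
// the original state (re-setting the value only if one existed).
function unsetClaudeCode(env: Env): () => void {
  const original = env.CLAUDECODE;
  delete env.CLAUDECODE;
  return () => {
    if (original !== undefined) {
      env.CLAUDECODE = original;
    } else {
      delete env.CLAUDECODE;
    }
  };
}

// Intended use in the test file (shown as a comment):
//   let restore: () => void;
//   beforeAll(() => { restore = unsetClaudeCode(process.env); });
//   afterAll(() => { restore(); });
```

Returning the restore function keeps setup and teardown paired, so afterAll does not need to track whether the variable was originally set.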

Related

Dependencies

Estimate

~120 LoC test file
