```bash
pip install peaky-peek-server && peaky-peek --open
```
Local-first, open-source agent debugger. Capture decisions, replay from checkpoints, visualize reasoning trees — all on your machine, no data sent anywhere.
Traditional observability tools weren't built for agent-native debugging:
| Tool | Focus | Problem |
|---|---|---|
| LangSmith | LLM tracing | SaaS-first, your data leaves your machine |
| OpenTelemetry | Infra metrics | Blind to reasoning chains and decision trees |
| Sentry | Error tracking | No insight into why agents chose specific actions |
| Peaky Peek | Agent-native debugging | Local-first, open source, privacy by default |
Peaky Peek captures the causal chain behind every action so you can debug agents like distributed systems: trace failures, replay from checkpoints, and search across reasoning paths.
```bash
pip install peaky-peek-server
peaky-peek --open  # launches API + UI at http://localhost:8000
```

```python
from agent_debugger_sdk import trace

@trace
async def my_agent(prompt: str) -> str:
    # Your agent logic here — traces are captured automatically
    return await llm_call(prompt)
```

```python
from agent_debugger_sdk import trace_session

async with trace_session("weather_agent") as ctx:
    await ctx.record_decision(
        reasoning="User asked for weather",
        confidence=0.9,
        chosen_action="call_weather_api",
        evidence=[{"source": "user_input", "content": "What's the weather?"}],
    )
    await ctx.record_tool_call("weather_api", {"city": "Seattle"})
    await ctx.record_tool_result("weather_api", result={"temp": 52, "forecast": "rain"})
```

```bash
# Set env var, then run your agent normally
PEAKY_PEEK_AUTO_PATCH=true python my_agent.py
```

Works with PydanticAI, LangChain, OpenAI SDK, CrewAI, AutoGen, LlamaIndex, and Anthropic — no imports or decorators needed.
```python
from pydantic_ai import Agent

from agent_debugger_sdk import init
from agent_debugger_sdk.adapters import PydanticAIAdapter

init()
agent = Agent("openai:gpt-4o")
adapter = PydanticAIAdapter(agent, agent_name="support_agent")
```

```python
from agent_debugger_sdk import init
from agent_debugger_sdk.adapters import LangChainTracingHandler

init()
handler = LangChainTracingHandler(session_id="my-session")
# Pass handler to your LangChain agent's callbacks
```

No code needed — just set the environment variable:

```bash
PEAKY_PEEK_AUTO_PATCH=true python my_openai_agent.py
```

Or use the simplified decorator:
```python
import openai

from agent_debugger_sdk import trace

@trace(name="openai_agent", framework="openai")
async def my_agent(prompt: str) -> str:
    client = openai.AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```

```python
import agent_debugger_sdk.auto_patch  # activates on import when PEAKY_PEEK_AUTO_PATCH is set

# Now run your agent normally — all LLM calls are traced automatically
```

Navigate agent reasoning as an interactive tree. Click nodes to inspect events, zoom to explore complex flows, and trace the causal chain from policy to tool call to safety check.
Time-travel through agent execution with checkpoint-aware playback. Play, pause, step, and seek to any point in the trace. Checkpoints are ranked by restore value so you jump to the most useful state.
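The idea of ranking checkpoints by restore value can be sketched as a small scoring function. This is a conceptual illustration only: `Checkpoint`, its fields, and the scoring weights are hypothetical names invented here, not the actual SDK schema.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Illustrative checkpoint record (hypothetical fields, not the SDK's schema)."""
    step: int             # position in the trace
    events_covered: int   # how much state restoring here lets you skip recomputing
    is_pre_failure: bool  # checkpoints just before a failure are usually most useful

def restore_value(cp: Checkpoint, total_steps: int) -> float:
    """Toy scoring: prefer late, information-rich, pre-failure checkpoints."""
    recency = cp.step / total_steps
    bonus = 10.0 if cp.is_pre_failure else 0.0
    return cp.events_covered * (1.0 + recency) + bonus

def rank_checkpoints(checkpoints: list[Checkpoint], total_steps: int) -> list[Checkpoint]:
    """Highest restore value first, so the UI can suggest the best jump target."""
    return sorted(checkpoints, key=lambda cp: restore_value(cp, total_steps), reverse=True)

cps = [
    Checkpoint(step=2, events_covered=5, is_pre_failure=False),
    Checkpoint(step=8, events_covered=6, is_pre_failure=True),
    Checkpoint(step=5, events_covered=9, is_pre_failure=False),
]
best = rank_checkpoints(cps, total_steps=10)[0]
```

A real ranking would weigh more signals (divergence from the happy path, cost of re-execution), but the shape is the same: score each checkpoint, sort, surface the top one.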
Find specific events across all sessions. Search by keyword, filter by event type, and jump directly to results.
Adaptive analysis groups similar failures. Inspect planner/critic debates, speaker topology, and prompt policy parameters across multi-agent systems.
Compare two agent runs side-by-side. See diffs in turn count, speaker topology, policies, stance shifts, and grounded decisions.
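At its core, the side-by-side comparison is a field-by-field diff of two run summaries. A toy sketch, with hypothetical summary fields standing in for the real comparison payload:

```python
def diff_runs(run_a: dict, run_b: dict) -> dict:
    """Map each differing field to a (run_a, run_b) pair."""
    keys = run_a.keys() | run_b.keys()
    return {k: (run_a.get(k), run_b.get(k)) for k in keys if run_a.get(k) != run_b.get(k)}

run_a = {"turns": 6, "topology": "round_robin", "policy": "strict", "grounded_decisions": 4}
run_b = {"turns": 9, "topology": "round_robin", "policy": "lenient", "grounded_decisions": 4}
changed = diff_runs(run_a, run_b)
```

Fields that match (here, `topology` and `grounded_decisions`) drop out, so only the meaningful deltas reach the UI.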
- Local-first by default — no external telemetry, no data leaves your machine
- Zero-config auto-patching — no credentials or API keys needed for local debugging
- Optional redaction pipeline — prompts, payloads, PII regex
- API key authentication — bcrypt hashing
- GDPR/HIPAA friendly — SQLite storage, no cloud dependency
```bash
pip install peaky-peek-server
peaky-peek --open
```

```bash
docker build -t peaky-peek .
docker run -p 8000:8000 -v ./traces:/app/traces peaky-peek
```

```bash
git clone https://github.com/acailic/agent_debugger
cd agent_debugger
pip install -e ".[dev]"
pip install fastapi "uvicorn[standard]" "sqlalchemy[asyncio]" aiosqlite alembic aiofiles bcrypt
python3 -m pytest -q
cd frontend && npm install && npm run build
```

```mermaid
flowchart TB
    classDef layer fill:#0f172a,stroke:#334155,color:#e2e8f0,stroke-width:2px
    classDef ext fill:none,stroke:#94a3b8,stroke-dasharray:6 3,color:#94a3b8
    AGENT("🤖 Your Agent Code"):::ext
    subgraph RUNTIME[" "]
        direction TB
        SDK["<b>🔌 SDK Layer</b><br/><small>Instrument & capture</small><br/><sub>@trace · TraceContext · Auto-Patch · Adapters</sub>"]:::layer
        INTEL["<b>🧠 Intelligence</b><br/><small>Detect, remember, alert</small><br/><sub>Event Buffer · Pattern Detector · Failure Memory · Replay Engine</sub>"]:::layer
    end
    subgraph SERVER[" "]
        direction TB
        API["<b>🌐 API Server</b><br/><small>FastAPI + SSE</small><br/><sub>11 routers: sessions · traces · replay · search · analytics · compare</sub>"]:::layer
        STORE["<b>💾 Storage</b><br/><small>SQLite WAL · async</small><br/><sub>Events · Checkpoints · Analytics · Embeddings</sub>"]:::layer
    end
    UI["<b>🖥️ Frontend</b><br/><small>React · TypeScript · Vite</small><br/><sub>8 panels: decision tree · timeline · tools · replay · search · analytics · compare · live</sub>"]:::layer
    AGENT ==>|"decorate"| SDK
    SDK ==>|"emit"| INTEL
    INTEL -->|"persist"| STORE
    SDK -.->|"ingest"| API
    API <-->|"query"| STORE
    API ==>|"SSE stream"| UI
    INTEL -.->|"replay"| API
```
```mermaid
flowchart LR
    classDef sdk fill:#4f46e5,stroke:#3730a3,color:#fff,stroke-width:2px
    classDef intel fill:#dc2626,stroke:#b91c1c,color:#fff,stroke-width:2px
    classDef api fill:#059669,stroke:#047857,color:#fff,stroke-width:2px
    classDef store fill:#b45309,stroke:#92400e,color:#fff,stroke-width:2px
    classDef ui fill:#7c3aed,stroke:#6d28d9,color:#fff,stroke-width:2px
    subgraph SDK[" 🔌 SDK "]
        direction TB
        DEC["@trace decorator"]:::sdk
        CTX["TraceContext"]:::sdk
        AP["Auto-Patch"]:::sdk
        AD["Framework Adapters"]:::sdk
    end
    subgraph INT[" 🧠 Intelligence "]
        direction TB
        BUF["Event Buffer"]:::intel
        PAT["Pattern Detector"]:::intel
        FMEM["Failure Memory"]:::intel
        ALERT["Alert Engine"]:::intel
        RPLAY["Replay Engine"]:::intel
    end
    subgraph APIL[" 🌐 API "]
        direction TB
        R1["Sessions · Traces"]:::api
        R2["Replay · Search"]:::api
        R3["Analytics · Compare"]:::api
        SSE["SSE Stream"]:::api
    end
    subgraph STO[" 💾 Storage "]
        direction TB
        DB[("SQLite WAL")]:::store
        S1["Events · Checkpoints"]:::store
        S2["Analytics Aggregations"]:::store
    end
    subgraph UIF[" 🖥️ Frontend "]
        direction TB
        DT["Decision Tree"]:::ui
        TL["Trace Timeline"]:::ui
        TI["Tool Inspector"]:::ui
        RP["Session Replay"]:::ui
        SE["Cross-session Search"]:::ui
        AN["Analytics Dashboard"]:::ui
    end
    DEC & CTX --> BUF
    AP & AD --> BUF
    BUF --> PAT & FMEM & ALERT
    BUF --> S1
    S1 --> DB
    DB --> S2
    R1 & R2 & R3 <--> S1
    RPLAY --> R2
    SSE --> DT & TL & RP
```
See ARCHITECTURE.md for full module breakdown.
- Core debugger — local path end-to-end, stable
- SDK — `@trace`, `trace_session()`, auto-patch for 7 frameworks
- API — 11 routers: sessions, traces, replay, search, analytics, cost, comparison
- Frontend — 8 specialized panels (decision tree, replay, checkpoints, search)
- Tests — 365+ passing, CI on Python 3.10/3.11/3.12
Peaky Peek is informed by research on agent debugging, causal tracing, failure analysis, and adaptive replay. See paper notes for design takeaways from each.
- AgentTrace: Causal Graph Tracing for Root Cause Analysis
- XAI for Coding Agent Failures
- FailureMem: Failure-Aware Autonomous Software Repair
- MSSR: Memory-Aware Adaptive Replay
- Learning When to Act or Refuse
- Policy-Parameterized Prompts
- CXReasonAgent: Evidence-Grounded Diagnostic Reasoning
- NeuroSkill: Proactive Real-Time Agentic System
- REST: Receding Horizon Explorative Steiner Tree
- Towards a Neural Debugger for Python
Contributions are welcome! See CONTRIBUTING.md for guidelines.
MIT





