feat(debug): structure debug dumps as OpenTelemetry-compatible traces #1797
Merged
Add a DumpFormat::Trace variant that emits OTLP-compatible JSON instead of numbered dump files. TracingCollector captures session/iteration/LLM/tool/memory spans with full hierarchy, redacts secrets in all text attributes (C-01), uses an owned SpanGuard for async-safe begin/end calls (C-02), turns DebugDumper's legacy file writes into no-ops while the Trace format is active (C-03), and flushes partial traces on Drop for error/panic/cancellation paths (C-04). An optional mpsc channel forwards completed spans to the otel feature's OTLP exporter (C-05). Concurrent iterations are tracked via HashMap<usize, IterationEntry> (I-03). OTLP JSON encoding follows the Protobuf JSON mapping, with int64 timestamps encoded as strings (I-04). Integration points: the --dump-format CLI flag, the /dump-format slash command, TUI command dispatch, an --init wizard step, a [debug.traces] config section with otlp_endpoint/service_name/redact fields, and an updated config default.toml.
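C-02 and C-04 both depend on span guards that own their state instead of borrowing the collector. The sketch below is a simplified, hypothetical model (`Collector`, `SpanRecord`, `SpanGuard`, `begin` and `end` are illustrative names, not the PR's actual API). The guard holds an `Arc` clone of the span sink, so it is `'static + Send` and can live across `.await` points. Its `Drop` impl records the span as `"incomplete"` if `end()` was never reached, a guard-level analogue of the partial-trace flush described for C-04.

```rust
use std::sync::{Arc, Mutex};
use std::time::{SystemTime, UNIX_EPOCH};

/// A finished span as stored by the hypothetical collector.
#[derive(Debug, Clone)]
pub struct SpanRecord {
    pub name: String,
    pub start_unix_nano: u128,
    pub end_unix_nano: u128,
    /// "ok" when end() ran, "incomplete" when the guard was dropped early.
    pub status: &'static str,
}

/// Simplified stand-in for TracingCollector: a shared, thread-safe span sink.
#[derive(Clone, Default)]
pub struct Collector {
    spans: Arc<Mutex<Vec<SpanRecord>>>,
}

fn now_unix_nano() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0)
}

impl Collector {
    /// Opens a span; the guard owns an Arc clone rather than borrowing `self`.
    pub fn begin(&self, name: &str) -> SpanGuard {
        SpanGuard {
            sink: Arc::clone(&self.spans),
            name: name.to_owned(),
            start_unix_nano: now_unix_nano(),
            finished: false,
        }
    }

    /// Snapshot of the spans recorded so far.
    pub fn spans(&self) -> Vec<SpanRecord> {
        self.spans.lock().map(|s| s.clone()).unwrap_or_default()
    }
}

/// Owned guard: 'static + Send, so it can be held across `.await` points.
pub struct SpanGuard {
    sink: Arc<Mutex<Vec<SpanRecord>>>,
    name: String,
    start_unix_nano: u128,
    finished: bool,
}

impl SpanGuard {
    /// Normal completion path; Drop then sees `finished == true` and does nothing.
    pub fn end(mut self) {
        self.finish("ok");
    }

    fn finish(&mut self, status: &'static str) {
        if self.finished {
            return;
        }
        self.finished = true;
        let record = SpanRecord {
            name: std::mem::take(&mut self.name),
            start_unix_nano: self.start_unix_nano,
            end_unix_nano: now_unix_nano(),
            status,
        };
        // Skip a poisoned lock instead of panicking inside Drop.
        if let Ok(mut spans) = self.sink.lock() {
            spans.push(record);
        }
    }
}

impl Drop for SpanGuard {
    /// Runs on early return, `?` errors, panic unwinding, or a cancelled future.
    fn drop(&mut self) {
        self.finish("incomplete");
    }
}

fn main() {
    let collector = Collector::default();
    collector.begin("llm_request").end();
    {
        let _tool = collector.begin("tool_call");
        // `_tool` goes out of scope here without end() being called.
    }
    for span in collector.spans() {
        let elapsed = span.end_unix_nano.saturating_sub(span.start_unix_nano);
        println!("{} -> {} ({} ns)", span.name, span.status, elapsed);
    }
}
```

Because the guard never borrows the collector, begin/end calls can straddle `.await` points in different tasks without lifetime gymnastics.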
CR-01: wire begin/end_llm_request and begin/end_tool_call at the actual call sites. Store current_iteration_span_id in DebugState so both the legacy and native execution paths can attach child spans without parameter threading. Introduce an execute_tool_with_trace helper in legacy.rs to stay within the 100-line function limit.
CR-02: replace std::fs::write with a write_trace_file helper that uses OpenOptions + mode(0o600) on Unix (SEC-01).
CR-03: add a max_spans field (default 10000) and a push_span() helper that drops the oldest span when the cap is reached (SEC-02).
CR-04: handle_dump_format_command now creates a fresh TracingCollector when switching TO trace format, and flushes/drops it when switching AWAY. Store dump_dir/trace_service_name/trace_redact in DebugState via the with_trace_config builder, and wire with_trace_config in runner.rs.
IMP-01: TracingCollector::new already writes to its own output_dir, a timestamped subdirectory created by DebugDumper::new; no change needed.
IMP-04: apply maybe_redact() to error_kind in end_tool_call.
CR-05 / test gaps: add tool_call_span_emitted, tool_call_error_span_emitted and session_to_iteration_parent_span_id tests to trace.rs; add dump_format_from_str_valid and dump_format_from_str_invalid_returns_error tests to mod.rs.
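A minimal sketch of the CR-02 helper, assuming it takes a path and a byte slice (the PR names `write_trace_file` but not its signature). On Unix, `std::os::unix::fs::OpenOptionsExt::mode` sets the permission bits applied when the file is created, so the trace never exists with the wider default mode that `std::fs::write` would give it.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical signature for the CR-02 helper: create or truncate `path`
/// and write `contents`, with owner-only permissions on Unix.
fn write_trace_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut options = OpenOptions::new();
    options.write(true).create(true).truncate(true);
    #[cfg(unix)]
    {
        use std::os::unix::fs::OpenOptionsExt;
        // rw------- for newly created files (still masked by the process umask).
        options.mode(0o600);
    }
    let mut file = options.open(path)?;
    file.write_all(contents)?;
    file.flush()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("trace_example.json");
    // Remove any previous copy so the mode is applied on creation.
    let _ = std::fs::remove_file(&path);
    write_trace_file(&path, br#"{"resourceSpans":[]}"#)?;

    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        let mode = std::fs::metadata(&path)?.permissions().mode() & 0o777;
        println!("{} written with mode {:o}", path.display(), mode);
    }
    Ok(())
}
```

Note that `mode()` only takes effect when `open()` creates the file; an already-existing file keeps its permissions, so a stricter variant could also call `std::fs::set_permissions` after opening.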
call_chat_with_tools exceeded the 100-line limit (101 lines) after the OTel trace instrumentation was added. Extract the debug dump response writing block into a dedicated helper method to bring it under the limit.
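The max_spans cap from CR-03 behaves like a bounded FIFO: once `max_spans` spans are buffered, each new span evicts the oldest. This sketch assumes a `VecDeque`-backed buffer (the PR does not say which container `push_span` uses); `SpanBuffer` and its `dropped` counter are hypothetical additions for illustration, and `T` stands in for the real span type.

```rust
use std::collections::VecDeque;

/// Default cap taken from the PR description (max_spans = 10000).
const DEFAULT_MAX_SPANS: usize = 10_000;

/// Hypothetical bounded span store.
struct SpanBuffer<T> {
    spans: VecDeque<T>,
    max_spans: usize,
    /// Illustrative count of evicted spans (not mentioned in the PR).
    dropped: u64,
}

impl<T> SpanBuffer<T> {
    fn new(max_spans: usize) -> Self {
        Self {
            spans: VecDeque::new(),
            max_spans,
            dropped: 0,
        }
    }

    /// Appends a span, evicting the oldest one once the cap is reached.
    fn push_span(&mut self, span: T) {
        if self.max_spans == 0 {
            // A zero cap keeps nothing.
            self.dropped += 1;
            return;
        }
        if self.spans.len() >= self.max_spans {
            // VecDeque::pop_front is O(1), unlike Vec::remove(0).
            self.spans.pop_front();
            self.dropped += 1;
        }
        self.spans.push_back(span);
    }
}

fn main() {
    let mut buffer = SpanBuffer::new(DEFAULT_MAX_SPANS);
    for i in 0..DEFAULT_MAX_SPANS + 5 {
        buffer.push_span(i);
    }
    println!(
        "kept {} spans, oldest = {:?}, dropped = {}",
        buffer.spans.len(),
        buffer.spans.front(),
        buffer.dropped
    );
}
```

Evicting the oldest span keeps the most recent activity, which is usually what matters when a long session is being debugged, while memory stays bounded at `max_spans` entries.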
Summary
Implements #1343 — extends debug dumps to emit structured OpenTelemetry-compatible OTLP JSON traces.
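OTLP JSON nests spans under resourceSpans → scopeSpans → spans. The std-only sketch below (`Span`, `json_str`, `encode_span` and `encode_trace` are hypothetical helpers, not the PR's code) shows the two encoding rules behind I-04. First, 64-bit timestamps are emitted as decimal strings, as the Protobuf JSON mapping requires for 64-bit integers. Second, OTLP/JSON writes `traceId`/`spanId` as hex strings rather than base64. A real implementation would more likely use a serializer such as serde_json; hand-written formatting just keeps the sketch dependency-free.

```rust
/// Minimal span model for the sketch (hypothetical).
struct Span<'a> {
    trace_id: u128,
    span_id: u64,
    parent_span_id: Option<u64>,
    name: &'a str,
    start_unix_nano: u64,
    end_unix_nano: u64,
}

/// Escapes a string as a JSON string literal, including the quotes.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Encodes one span: hex ids, string timestamps, kind 1 = SPAN_KIND_INTERNAL.
fn encode_span(s: &Span) -> String {
    // Root spans simply omit parentSpanId.
    let parent = match s.parent_span_id {
        Some(p) => format!("\"parentSpanId\":\"{p:016x}\","),
        None => String::new(),
    };
    format!(
        "{{\"traceId\":\"{:032x}\",\"spanId\":\"{:016x}\",{}\"name\":{},\"kind\":1,\
         \"startTimeUnixNano\":\"{}\",\"endTimeUnixNano\":\"{}\"}}",
        s.trace_id,
        s.span_id,
        parent,
        json_str(s.name),
        s.start_unix_nano,
        s.end_unix_nano
    )
}

/// Wraps the spans in a single resourceSpans/scopeSpans envelope.
fn encode_trace(service_name: &str, spans: &[Span]) -> String {
    let body: Vec<String> = spans.iter().map(encode_span).collect();
    format!(
        "{{\"resourceSpans\":[{{\"resource\":{{\"attributes\":[{{\"key\":\"service.name\",\
         \"value\":{{\"stringValue\":{}}}}}]}},\"scopeSpans\":[{{\"spans\":[{}]}}]}}]}}",
        json_str(service_name),
        body.join(",")
    )
}

fn main() {
    let session = Span {
        trace_id: 0x1234,
        span_id: 0xa1,
        parent_span_id: None,
        name: "session",
        start_unix_nano: 1_700_000_000_000_000_000,
        end_unix_nano: 1_700_000_005_000_000_000,
    };
    let iteration = Span {
        trace_id: 0x1234,
        span_id: 0xb2,
        parent_span_id: Some(0xa1),
        name: "iteration 0",
        start_unix_nano: 1_700_000_001_000_000_000,
        end_unix_nano: 1_700_000_004_000_000_000,
    };
    println!("{}", encode_trace("zeph", &[session, iteration]));
}
```

Encoding timestamps as strings matters because many JSON parsers store numbers as IEEE doubles, which cannot represent every nanosecond-precision Unix timestamp exactly.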
- `DumpFormat::Trace`: when `--dump-format trace` is set, legacy numbered files are NOT written; when the `otel` feature is enabled, spans are forwarded via an mpsc channel to the OTLP subscriber
- `[debug.traces]` config section: `otlp_endpoint`, `service_name` (default: `"zeph"`), `redact` (default: `true`), `max_spans` (default: 10000)
- Trace files are written with `0o600` permissions on Unix; all text attributes pass through the existing `Redactor`; the `max_spans` cap prevents memory exhaustion in long sessions; `error_kind` in tool spans is redacted
- `/dump-format <json|raw|trace>` TUI/CLI command creates/destroys `TracingCollector` on the fly
- `--dump-format` CLI flag, `--init` wizard step, `--migrate-config` auto-migration for existing `[debug]` configs
- Traces are written to `{dump_dir}/{session_id}/trace.json` (session-isolated, no concurrent overwrites)

Test plan
- `cargo nextest run --workspace --features full --lib --bins`: 5684 tests pass
- `cargo clippy --workspace --features full -- -D warnings`: clean
- `cargo +nightly fmt --check`: clean
- `--dump-format trace` produces valid OTLP JSON with a session → iteration → tool/llm span hierarchy
- `--dump-format raw` still produces legacy numbered files (backward compat)
- Trace files are written with `0o600` permissions on Unix
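One way to automate the span-hierarchy check in the test plan is to index spans by id and confirm that each span's parent has the expected role. This is a hypothetical sketch: `Role`, `SpanInfo` and `check_hierarchy` are illustrative names, and memory spans are left out because the PR does not say where they attach.

```rust
use std::collections::HashMap;

/// Hypothetical span roles from the test plan: session -> iteration -> llm/tool.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Session,
    Iteration,
    Llm,
    Tool,
}

struct SpanInfo {
    span_id: u64,
    parent_span_id: Option<u64>,
    role: Role,
}

/// Parent role each role must hang under (None = root span).
fn expected_parent(role: Role) -> Option<Role> {
    match role {
        Role::Session => None,
        Role::Iteration => Some(Role::Session),
        Role::Llm | Role::Tool => Some(Role::Iteration),
    }
}

/// Returns Err describing the first span whose parent is missing or has the wrong role.
fn check_hierarchy(spans: &[SpanInfo]) -> Result<(), String> {
    let roles: HashMap<u64, Role> = spans.iter().map(|s| (s.span_id, s.role)).collect();
    for span in spans {
        let parent_role = match span.parent_span_id {
            None => None,
            Some(pid) => Some(
                *roles
                    .get(&pid)
                    .ok_or_else(|| format!("span {:x}: parent {:x} not found", span.span_id, pid))?,
            ),
        };
        let expected = expected_parent(span.role);
        if parent_role != expected {
            return Err(format!(
                "span {:x} ({:?}): parent role {:?}, expected {:?}",
                span.span_id, span.role, parent_role, expected
            ));
        }
    }
    Ok(())
}

fn main() {
    let spans = [
        SpanInfo { span_id: 1, parent_span_id: None, role: Role::Session },
        SpanInfo { span_id: 2, parent_span_id: Some(1), role: Role::Iteration },
        SpanInfo { span_id: 3, parent_span_id: Some(2), role: Role::Llm },
        SpanInfo { span_id: 4, parent_span_id: Some(2), role: Role::Tool },
    ];
    println!("{:?}", check_hierarchy(&spans));
}
```

Building the id index first means the check does not depend on span order, which matters when concurrent iterations finish out of order.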