Claude/review voice call async x3 ox7 by dgarson · Pull Request #105 · dgarson/clawdbot

dgarson · 2026-02-23T18:00:55Z

Summary

Describe the problem and fix in 2–5 bullets:

Problem:
Why it matters:
What changed:
What did NOT change (scope boundary):

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)

New permissions/capabilities? (Yes/No)
Secrets/tokens handling changed? (Yes/No)
New/changed network calls? (Yes/No)
Command/tool execution surface changed? (Yes/No)
Data access scope changed? (Yes/No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS:
Runtime/container:
Model/provider:
Integration/channel (if any):
Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
Edge cases checked:
What you did not verify:

Compatibility / Migration

Backward compatible? (Yes/No)
Config/env changes? (Yes/No)
Migration needed? (Yes/No)
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly:
Files/config to restore:
Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk:
- Mitigation:

…malization, prompts - Fix maxPerCall config bug: broker was reading maxConcurrency for both limits; add dedicated maxPerCall field to VoiceCallSubagentConfigSchema - Replace O(n) pruning on every state transition with lazy threshold-based pruning (amortized via PRUNE_THRESHOLD=16) - Add BrokerMetrics type and metrics getter for observability: enqueued, completed, failed, expired, canceled, fallbacksSpoken, repairAttempts, repairSuccesses, totalExecutionMs - Improve JSON normalization pipeline: strip markdown fences, trailing commas, single-line comments; case-insensitive key matching; expanded alias sets; export extractFirstJsonObject, parseBoolean, parseNumber - Lighten repairPayload: thinkLevel=off, 4s timeout, truncate input to 1000 chars, pass context through to avoid redundant loadCoreAgentDeps - Clean up orphaned session store entries and temp files after job completion (finally block with fs.unlinkSync) - Move foreground envelope prompt to end of system prompt with structured RESPONSE FORMAT heading; wrap conversation history in XML tags - Add specialist-specific prompts (SPECIALIST_PROMPTS map) for research, scheduler, and policy roles - Expand test coverage: 55 tests across broker and normalization suites covering pruning, metrics, deadline clamping, JSON extraction edge cases, alias resolution, type coercion, and specialist prompts https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ

…onflict - Wrap speakFallbackIfActive in try/catch inside the runJob catch block to prevent unhandled promise rejections when onSummaryReady is broken - Check remaining time before applying Math.max(1000ms) clamp so jobs that have already expired hit the expiry branch instead of launching a 1s LLM call past their deadline - Replace "use tools when helpful" with delegation guidance in the foreground voice prompt to avoid contradicting the JSON-only envelope instruction — tools belong in the specialist background lane https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ

Replace naive regex-based stripTrailingCommas and stripJsonComments with character-walking implementations that track whether we're inside a JSON string literal, so commas and // sequences inside string values are never corrupted. Move session store cleanup into the finally block of runJob so entries are removed on both success and failure paths. Add reapOrphanedSessions() method that sweeps stale voice-subagent:* keys older than a configurable threshold, called at webhook server startup to catch entries orphaned by crashes. https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ

fix: async agent delegation — config bug, pruning, observability, nor…

…unication-for-tts

Detailed review of codex/implement-async-sub-agent-communication-for-tts. Covers two must-fix bugs (canceled metric inaccuracy, unconditional envelope prompt injection), correctness concerns (race conditions, dead follow-up code), and voice-call pattern improvements (backpressure, deduplication, fallback double-speak). https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

Bugs fixed: - cancelByCall now returns the actual count of jobs canceled (was void); cancelCallJobs and runJob accumulate the real count instead of always adding 1, preventing metric inflation and double-counting - envelopePrompt JSON format instruction is now gated on subagents?.enabled; disabled deployments no longer receive a changed system prompt, and rawText is returned directly without envelope parsing when subagents are off - summaryDelivered flag prevents catch block from delivering fallback speech when onSummaryReady already succeeded, eliminating double-speak - FALLBACK_SPOKEN_SUMMARY result now sets needs_followup: false (was true), which would have triggered spurious follow-up logic once wired - callForEvent guard: cancelCallJobs is now skipped when callForEvent is null (call was never known to the manager) rather than falling back to event.callId, preventing phantom cancels Pattern improvements: - enqueue() enforces per-call queue depth before admitting a new job, providing backpressure when a call already has maxPerCall active jobs - repairPayload is skipped for short/bare outputs (< 20 chars or no JSON-like characters), avoiding a wasted LLM call on "done" / "ok" - onSummaryReady speaks the follow-up question when the specialist signals needs_followup; speak() errors are swallowed inside the callback so they do not propagate into the broker's catch/fallback path - Delegations from the foreground agent are deduplicated by specialist+goal before enqueuing to prevent redundant subagent jobs - Add JSDoc comment documenting the concurrent saveSessionStore race Tests: - cancelByCall mock signatures updated to return number - "cancelCallJobs increments canceled metric" test replaced with accurate count-based assertions and a zero-count case - New backpressure test: enqueue drops when per-call depth >= maxPerCall https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

The `execute` callback in the `registerTool` call had two `noImplicitAny` errors (toolCallId and params) because the function literal's parameter types weren't being inferred from context. Fix: annotate `_toolCallId: string` and `params: Record<string, unknown>`. Using `Record<string, unknown>` rather than `Static<typeof VoiceCallToolSchema>` is correct here because all property accesses in the execute body are already guarded by `typeof` runtime checks, and the discriminated union's last member lacks `action` which would cause cascade type errors with the narrowed type. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

Lock file was out of sync with apps/web-next/package.json after new frontend dependencies were added (react, vite, tailwind, etc.). https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

… Slack ID normalization Reverts two regressions introduced in commit e95e175 ("Luis/UI redesign feb 21"): 1. workspace.ts: Remove `|| !file.missing` from filterBootstrapFilesForSession — this clause defeated the MINIMAL_BOOTSTRAP_ALLOWLIST by allowing any non-missing extra bootstrap file to pass through for subagent/cron sessions. Only AGENTS.md and TOOLS.md should be included, matching the original strict allowlist filter. 2. targets.ts: Fix normalizeSlackId regex — change `{8,}` minimum-length guard to `[A-Z0-9]*[0-9][A-Z0-9]*` so short Slack IDs like C999, C1, C1A2B3 are uppercased correctly (Slack IDs need at least one digit, not 8+ chars). 3. targets.ts: Fix # handler pattern — change `/^[A-Z0-9]+$/i` to `/^[A-Z0-9_-]+$/i` so #channel-names with hyphens (e.g. #eng-frontend, #does-not-exist) are parsed as name lookups rather than rejected with an ID-requirement error. Fixes 4 test failures in src/slack/targets.test.ts and src/slack/send.blocks.test.ts, and 1 failure in src/hooks/bundled/bootstrap-extra-files/handler.test.ts. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

The resolveSlackMedia describe block's beforeEach mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls resolvePinnedHostnameWithPolicy directly (not via resolvePinnedHostname). This left the SSRF guard unmocked, causing real DNS resolution to fail in the test environment and all download-dependent tests to silently return null. Fix: spy on resolvePinnedHostnameWithPolicy, matching the pattern already used correctly in the resolveSlackAttachmentContent describe block added in the same commit (54eaca8). Fixes 6 test failures in src/slack/monitor/media.test.ts. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

The harness mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls resolvePinnedHostnameWithPolicy directly. This left the SSRF guard unmocked, causing real DNS lookups for example.com which fail with EAI_AGAIN in the test environment — all 7 tests in cf-markdown and response-limit test files silently errored out. Fix: spy on resolvePinnedHostnameWithPolicy and inject the lookupMock while preserving the rest of the params (policy, etc.) so the existing SSRF logic still runs cleanly with a controlled DNS resolver. Same root cause as the resolveSlackMedia SSRF mock bug fixed in the previous commit: resolvePinnedHostname delegates to resolvePinnedHostnameWithPolicy, so mocking only the wrapper never intercepts the real call site. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

…st tests Both tests mocked ssrf.resolvePinnedHostname (the wrapper) instead of ssrf.resolvePinnedHostnameWithPolicy (the function fetch-guard.ts calls directly), leaving real DNS resolution in place and causing EAI_AGAIN failures for example.com and api.telegram.org in the test environment. src/web/media.test.ts: Use the delegate-to-real pattern so the SSRF policy checks (private-network blocking, cloud-metadata blocking) still execute correctly, with only DNS resolution mocked. src/telegram/bot.create-telegram-bot.test.ts: Add resolvePinnedHostnameWithPolicy spy scoped to the "buffers channel_post media groups" test, which mocks globalThis.fetch for photo downloads but was missing the DNS mock needed for the SSRF guard pre-flight. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg

dgarson and others added 14 commits February 21, 2026 12:43

test(discord): align loadConfig mock signature with source

6631c87

Merge pull request #34 from dgarson/claude/async-agent-delegation-5SWIg

b6a4660

fix: async agent delegation — config bug, pruning, observability, nor…

Merge branch 'dgarson/fork' into codex/implement-async-sub-agent-comm…

99c2709

…unication-for-tts

dgarson closed this Feb 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Claude/review voice call async x3 ox7#105

Claude/review voice call async x3 ox7#105
dgarson wants to merge 14 commits intodgarson/forkfrom
claude/review-voice-call-async-X3Ox7

dgarson commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dgarson commented Feb 23, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants