Skip to content

Claude/review voice call async x3 ox7#105

Closed
dgarson wants to merge 14 commits intodgarson/forkfrom
claude/review-voice-call-async-X3Ox7
Closed

Claude/review voice call async x3 ox7#105
dgarson wants to merge 14 commits intodgarson/forkfrom
claude/review-voice-call-async-X3Ox7

Conversation

@dgarson
Copy link
Copy Markdown
Owner

@dgarson dgarson commented Feb 23, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
  • Files/config to restore:
  • Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

dgarson and others added 14 commits February 21, 2026 12:43
…malization, prompts

- Fix maxPerCall config bug: broker was reading maxConcurrency for both
  limits; add dedicated maxPerCall field to VoiceCallSubagentConfigSchema
- Replace O(n) pruning on every state transition with lazy threshold-based
  pruning (amortized via PRUNE_THRESHOLD=16)
- Add BrokerMetrics type and metrics getter for observability: enqueued,
  completed, failed, expired, canceled, fallbacksSpoken, repairAttempts,
  repairSuccesses, totalExecutionMs
- Improve JSON normalization pipeline: strip markdown fences, trailing
  commas, single-line comments; case-insensitive key matching; expanded
  alias sets; export extractFirstJsonObject, parseBoolean, parseNumber
- Lighten repairPayload: thinkLevel=off, 4s timeout, truncate input to
  1000 chars, pass context through to avoid redundant loadCoreAgentDeps
- Clean up orphaned session store entries and temp files after job
  completion (finally block with fs.unlinkSync)
- Move foreground envelope prompt to end of system prompt with structured
  RESPONSE FORMAT heading; wrap conversation history in XML tags
- Add specialist-specific prompts (SPECIALIST_PROMPTS map) for research,
  scheduler, and policy roles
- Expand test coverage: 55 tests across broker and normalization suites
  covering pruning, metrics, deadline clamping, JSON extraction edge
  cases, alias resolution, type coercion, and specialist prompts

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
…onflict

- Wrap speakFallbackIfActive in try/catch inside the runJob catch block
  to prevent unhandled promise rejections when onSummaryReady is broken
- Check remaining time before applying Math.max(1000ms) clamp so jobs
  that have already expired hit the expiry branch instead of launching
  a 1s LLM call past their deadline
- Replace "use tools when helpful" with delegation guidance in the
  foreground voice prompt to avoid contradicting the JSON-only envelope
  instruction — tools belong in the specialist background lane

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
Replace naive regex-based stripTrailingCommas and stripJsonComments with
character-walking implementations that track whether we're inside a JSON
string literal, so commas and // sequences inside string values are never
corrupted.

Move session store cleanup into the finally block of runJob so entries are
removed on both success and failure paths.  Add reapOrphanedSessions()
method that sweeps stale voice-subagent:* keys older than a configurable
threshold, called at webhook server startup to catch entries orphaned by
crashes.

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
fix: async agent delegation — config bug, pruning, observability, nor…
Detailed review of codex/implement-async-sub-agent-communication-for-tts.
Covers two must-fix bugs (canceled metric inaccuracy, unconditional envelope
prompt injection), correctness concerns (race conditions, dead follow-up code),
and voice-call pattern improvements (backpressure, deduplication, fallback
double-speak).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Bugs fixed:
- cancelByCall now returns the actual count of jobs canceled (was void);
  cancelCallJobs and runJob accumulate the real count instead of always
  adding 1, preventing metric inflation and double-counting
- envelopePrompt JSON format instruction is now gated on
  subagents?.enabled; disabled deployments no longer receive a changed
  system prompt, and rawText is returned directly without envelope
  parsing when subagents are off
- summaryDelivered flag prevents catch block from delivering fallback
  speech when onSummaryReady already succeeded, eliminating double-speak
- FALLBACK_SPOKEN_SUMMARY result now sets needs_followup: false (was
  true), which would have triggered spurious follow-up logic once wired
- callForEvent guard: cancelCallJobs is now skipped when callForEvent
  is null (call was never known to the manager) rather than falling back
  to event.callId, preventing phantom cancels

Pattern improvements:
- enqueue() enforces per-call queue depth before admitting a new job,
  providing backpressure when a call already has maxPerCall active jobs
- repairPayload is skipped for short/bare outputs (< 20 chars or no
  JSON-like characters), avoiding a wasted LLM call on "done" / "ok"
- onSummaryReady speaks the follow-up question when the specialist
  signals needs_followup; speak() errors are swallowed inside the
  callback so they do not propagate into the broker's catch/fallback path
- Delegations from the foreground agent are deduplicated by
  specialist+goal before enqueuing to prevent redundant subagent jobs
- Add JSDoc comment documenting the concurrent saveSessionStore race

Tests:
- cancelByCall mock signatures updated to return number
- "cancelCallJobs increments canceled metric" test replaced with
  accurate count-based assertions and a zero-count case
- New backpressure test: enqueue drops when per-call depth >= maxPerCall

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The `execute` callback in the `registerTool` call had two `noImplicitAny`
errors (toolCallId and params) because the function literal's parameter
types weren't being inferred from context.

Fix: annotate `_toolCallId: string` and `params: Record<string, unknown>`.
Using `Record<string, unknown>` rather than `Static<typeof VoiceCallToolSchema>`
is correct here because all property accesses in the execute body are already
guarded by `typeof` runtime checks, and the discriminated union's last member
lacks `action` which would cause cascade type errors with the narrowed type.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Lock file was out of sync with apps/web-next/package.json after new
frontend dependencies were added (react, vite, tailwind, etc.).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
… Slack ID normalization

Reverts two regressions introduced in commit e95e175 ("Luis/UI redesign feb 21"):

1. workspace.ts: Remove `|| !file.missing` from filterBootstrapFilesForSession —
   this clause defeated the MINIMAL_BOOTSTRAP_ALLOWLIST by allowing any non-missing
   extra bootstrap file to pass through for subagent/cron sessions. Only AGENTS.md
   and TOOLS.md should be included, matching the original strict allowlist filter.

2. targets.ts: Fix normalizeSlackId regex — change `{8,}` minimum-length guard to
   `[A-Z0-9]*[0-9][A-Z0-9]*` so short Slack IDs like C999, C1, C1A2B3 are
   uppercased correctly (Slack IDs need at least one digit, not 8+ chars).

3. targets.ts: Fix # handler pattern — change `/^[A-Z0-9]+$/i` to `/^[A-Z0-9_-]+$/i`
   so #channel-names with hyphens (e.g. #eng-frontend, #does-not-exist) are parsed
   as name lookups rather than rejected with an ID-requirement error.

Fixes 4 test failures in src/slack/targets.test.ts and src/slack/send.blocks.test.ts,
and 1 failure in src/hooks/bundled/bootstrap-extra-files/handler.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The resolveSlackMedia describe block's beforeEach mocked
ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly (not via resolvePinnedHostname).
This left the SSRF guard unmocked, causing real DNS resolution to fail
in the test environment and all download-dependent tests to silently
return null.

Fix: spy on resolvePinnedHostnameWithPolicy, matching the pattern already
used correctly in the resolveSlackAttachmentContent describe block added
in the same commit (54eaca8).

Fixes 6 test failures in src/slack/monitor/media.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The harness mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly. This left the SSRF guard unmocked,
causing real DNS lookups for example.com which fail with EAI_AGAIN in the
test environment — all 7 tests in cf-markdown and response-limit test files
silently errored out.

Fix: spy on resolvePinnedHostnameWithPolicy and inject the lookupMock while
preserving the rest of the params (policy, etc.) so the existing SSRF logic
still runs cleanly with a controlled DNS resolver.

Same root cause as the resolveSlackMedia SSRF mock bug fixed in the previous
commit: resolvePinnedHostname delegates to resolvePinnedHostnameWithPolicy,
so mocking only the wrapper never intercepts the real call site.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
…st tests

Both tests mocked ssrf.resolvePinnedHostname (the wrapper) instead of
ssrf.resolvePinnedHostnameWithPolicy (the function fetch-guard.ts calls
directly), leaving real DNS resolution in place and causing EAI_AGAIN
failures for example.com and api.telegram.org in the test environment.

src/web/media.test.ts: Use the delegate-to-real pattern so the SSRF policy
checks (private-network blocking, cloud-metadata blocking) still execute
correctly, with only DNS resolution mocked.

src/telegram/bot.create-telegram-bot.test.ts: Add resolvePinnedHostnameWithPolicy
spy scoped to the "buffers channel_post media groups" test, which mocks
globalThis.fetch for photo downloads but was missing the DNS mock needed
for the SSRF guard pre-flight.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
@dgarson dgarson closed this Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants