feat(bluebubbles): inbound audio enricher (#68719) #75450
omarshahine wants to merge 1 commit into openclaw:main
Codex review: found issues before merge.
What this changes: Adds a BlueBubbles inbound voice-note enricher that fetches an
Maintainer follow-up before merge: This implementation has a mechanical generated-metadata blocker, but the PR also has explicit maintainer asks and an unresolved upstream endpoint/default-on contract that should be decided by a maintainer before automation attempts a replacement or fixup branch.
Security review: Security review cleared: No concrete security or supply-chain regression was found in the diff; the new request uses the existing BlueBubbles client SSRF/auth path, encodes the message GUID, adds no dependencies or workflows, and avoids logging transcript text.
Review findings:
Review details
Best possible solution: Keep the extension-owned approach, but regenerate the bundled channel config metadata and any required config baselines, then confirm the BlueBubbles endpoint against a real supported server or adjust docs/default behavior to the actual shipped upstream contract before maintainer approval.
Do we have a high-confidence way to reproduce the issue? Yes for the current-main gap and the generated-metadata drift:
Is this the best way to solve the issue? Unclear in its current form: keeping the behavior in the BlueBubbles plugin and using the existing
Full review comments:
Overall correctness: patch is incorrect
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 42d73fd955af.
Closing this PR: verified that the upstream BB endpoint this PR depends on does not exist (see issue comment). BB Server tops out at v1.9.9 with no
The clawsweeper review's P2/2 was correct (0.76 confidence on the upstream-proof concern was, if anything, too cautious).
Salvaging the UTI-aware audio attachment detection, a real gap (BlueBubbles webhooks sometimes deliver voice notes with
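For readers who want a concrete picture of the UTI-aware detection being salvaged, here is a minimal sketch built only from the rule stated in the PR summary (MIME `audio/*` plus four Apple UTIs). The `AttachmentLike` shape, its field names `mimeType` and `uti`, and the function name `isAudioAttachmentSketch` are assumptions for illustration; this is not the PR's `isBlueBubblesAudioAttachment` implementation.

```typescript
// Hypothetical attachment shape: the real BlueBubbles payload type may use
// different field names. Only the matching rule comes from the PR summary.
interface AttachmentLike {
  mimeType?: string | null;
  uti?: string | null;
}

// Apple UTIs that the PR summary lists as audio markers.
const AUDIO_UTIS = new Set<string>([
  "public.audio",
  "public.mpeg-4-audio",
  "com.apple.m4a-audio",
  "com.apple.coreaudio-format",
]);

// Treat an attachment as audio when its MIME type is audio/* or,
// failing that, when its UTI is one of the known audio UTIs.
function isAudioAttachmentSketch(att: AttachmentLike): boolean {
  const mime = (att.mimeType ?? "").trim().toLowerCase();
  if (mime.startsWith("audio/")) {
    return true;
  }
  const uti = (att.uti ?? "").trim().toLowerCase();
  return AUDIO_UTIS.has(uti);
}
```

Checking the UTI as a fallback means an attachment with a missing or generic MIME type can still be recognized as a voice note, which appears to be the gap the comment describes.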
Summary
Adds a pre-dispatch transcript enricher for inbound BlueBubbles voice notes (#68719). When BB Server v1.14.0+ delivers an iMessage voice note, OpenClaw now fetches Apple's free on-device dictation transcript via `GET /api/v1/message/audio-transcript/:guid` and substitutes it for the `<media:audio>` placeholder before agent dispatch, so agents see what the user said instead of apologizing about an empty body.
- `BlueBubblesClient.getAudioTranscript()` method. Returns `null` on 404 (older BB), transport error, or empty body; best-effort, never throws.
- UTI-aware audio attachment detection (`isBlueBubblesAudioAttachment`): MIME `audio/*` plus Apple UTIs `public.audio`, `public.mpeg-4-audio`, `com.apple.m4a-audio`, `com.apple.coreaudio-format`.
- `inbound-audio-enricher.ts` module with structured outcome (applied | no-audio | no-transcript | disabled | skipped) so the call site can log a precise verbose breadcrumb.
- `<media:attachment>` placeholder so the image cue isn't masked.
- `monitor-processing.ts` between attachment download and envelope formatting; preserves the original placeholder for cache/dedupe paths so only the agent-facing body changes.
- (`createBlueBubblesClient`) so same-account inbound bursts don't repeat SSRF policy / auth construction per message.
- `channels.bluebubbles.inboundAudioEnricher: { enabled?: boolean (default true), perType?: { audio?: boolean } }`. Strict zod schema, types, and docs added.
Hook contract
No new public Plugin SDK seam. The internal `message:transcribed` hook already fires automatically when `ctx.Transcript` is populated (src/auto-reply/reply/message-preprocess-hooks.ts:26), and plugins already see transcript text via the existing `inbound:claim` event metadata. This PR just populates `ctx.Transcript` from the BB endpoint pre-dispatch: no new SDK surface, no baseline regen needed.
This sidesteps the `clawsweeper` review's flagged maintainer-approval ask on a new public hook contract.
Maintainer asks
- `true`. On older BB Servers (< 1.14.0) the endpoint returns 404 and we silently no-op with a verbose breadcrumb; on >= 1.14.0 the transcript replaces the placeholder. No per-message version probe; accepted cost is one extra HTTP round-trip per audio message on legacy BB. Acceptable, or do you want a per-account "transcripts unavailable" memoization?
- `<media:image>` cue on mixed sends. Alternative: always enrich first audio + append transcript to mixed placeholder. Speak up if you'd prefer the latter.
- `chars=<n>` + sanitized message id.
Live verification
Deployed via fast-patch on a maintainer host running BB Server `1.9.9` (well below the 1.14.0 floor). With `logging.level=debug`:
Direct probe of `GET /api/v1/message/audio-transcript/<guid>` against BB 1.9.9 returned `HTTP 404 / Not Found` as expected. Patched code path executed, gracefully fell back to existing media-understanding pipeline, no behavioral regression. Success path can't be validated end-to-end on this host without a BB upgrade; covered by unit tests + the structured outcome contract.
Test plan
- `pnpm test extensions/bluebubbles`: 585/585 pass (11 new tests)
- `pnpm tsgo:extensions` + `pnpm tsgo:extensions:test`: clean
- `pnpm check:changed`: green (typecheck, oxlint 0/0, conflict markers, attributions, runtime sidecar guard, import cycles, dup scan)
- `pnpm exec oxfmt --check`: clean
Closes
#68719
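To make the Summary's "best-effort, never throws" fetch and its structured outcome more concrete, the following is a rough sketch under stated assumptions. Only the five outcome names come from the PR description. The input shape, the order of the checks, the reading of `skipped` as "mixed send, leave the placeholder alone", and the names `bestEffortTranscript` and `decideOutcome` are hypothetical. The transcript fetcher is injected, so the sketch does not depend on the upstream endpoint, which the closing comment reports as absent from BB Server 1.9.9.

```typescript
// Outcome names are taken from the PR summary; the payload fields are assumed.
type EnrichOutcome =
  | { kind: "applied"; body: string; chars: number }
  | { kind: "no-audio" | "no-transcript" | "disabled" | "skipped" };

// Hypothetical view of an inbound message as seen by the enricher.
interface EnrichInput {
  enabled: boolean;    // assumed to mirror the inboundAudioEnricher.enabled flag
  audioCount: number;  // attachments detected as audio
  otherCount: number;  // non-audio attachments (e.g. images)
}

// Never-throwing wrapper: any failure or blank result collapses to null.
async function bestEffortTranscript(
  fetcher: () => Promise<string | null>,
): Promise<string | null> {
  try {
    const text = (await fetcher())?.trim() ?? "";
    return text.length > 0 ? text : null;
  } catch {
    return null; // transport error, 404 mapped to a throw, and so on
  }
}

// Pure decision step: maps the input and an already-fetched transcript
// (null when unavailable) to one structured outcome.
function decideOutcome(input: EnrichInput, transcript: string | null): EnrichOutcome {
  if (!input.enabled) return { kind: "disabled" };
  if (input.audioCount === 0) return { kind: "no-audio" };
  // Assumption: mixed sends are skipped so non-audio cues are not masked.
  if (input.otherCount > 0) return { kind: "skipped" };
  if (transcript === null) return { kind: "no-transcript" };
  return { kind: "applied", body: transcript, chars: transcript.length };
}
```

Keeping the decision step pure and separate from the network call is one way to let the call site log a breadcrumb such as the outcome kind and `chars` without ever handling an exception from the fetch.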
🤖 Generated with Claude Code