feat(bluebubbles): inbound audio enricher (#68719) #75450

Closed
omarshahine wants to merge 1 commit into openclaw:main from omarshahine:feat/bb-inbound-audio-enricher

@omarshahine
Contributor

Summary

Adds a pre-dispatch transcript enricher for inbound BlueBubbles voice notes (#68719). When BB Server v1.14.0+ delivers an iMessage voice note, OpenClaw now fetches Apple's free on-device dictation transcript via GET /api/v1/message/audio-transcript/:guid and substitutes it for the <media:audio> placeholder before agent dispatch — so agents see what the user said instead of apologizing about an empty body.

  • New BlueBubblesClient.getAudioTranscript() method. Returns null on 404 (older BB), transport error, or empty body — best-effort, never throws.
  • UTI-aware audio detection (isBlueBubblesAudioAttachment): MIME audio/* plus Apple UTIs public.audio, public.mpeg-4-audio, com.apple.m4a-audio, com.apple.coreaudio-format.
  • New inbound-audio-enricher.ts module with structured outcome (applied | no-audio | no-transcript | disabled | skipped) so the call site can log a precise verbose breadcrumb.
  • Gated on all-audio attachments only — mixed audio+image keeps the existing <media:attachment> placeholder so the image cue isn't masked.
  • Wired into monitor-processing.ts between attachment download and envelope formatting; preserves the original placeholder for cache/dedupe paths so only the agent-facing body changes.
  • Cached BB client (createBlueBubblesClient) so same-account inbound bursts don't repeat SSRF policy / auth construction per message.
  • Config: channels.bluebubbles.inboundAudioEnricher: { enabled?: boolean (default true), perType?: { audio?: boolean } }. Strict zod schema, types, and docs added.
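
The gating rules in the list above can be sketched as a pure decision function. This is a minimal illustration, not the PR's code: the type names, the `decideEnrichment` helper, and the choice to report mixed attachments as `skipped` are assumptions, and the transcript is passed in rather than fetched so the logic stays easy to test.

```typescript
// Hypothetical shapes; the PR's actual types are not shown in this thread.
type EnrichOutcome =
  | { kind: "applied"; transcript: string }
  | { kind: "no-audio" }
  | { kind: "no-transcript" }
  | { kind: "disabled" }
  | { kind: "skipped"; reason: string };

interface Attachment {
  mimeType?: string;
}

interface EnricherConfig {
  enabled?: boolean; // treated as true when omitted
  perType?: { audio?: boolean };
}

// Simplified MIME-only check; UTI handling is left out of this sketch.
const isAudio = (a: Attachment): boolean =>
  (a.mimeType ?? "").toLowerCase().startsWith("audio/");

function decideEnrichment(
  cfg: EnricherConfig | undefined,
  attachments: Attachment[],
  transcript: string | null,
): EnrichOutcome {
  // Opt-out via either the top-level flag or the per-type flag.
  if (cfg?.enabled === false || cfg?.perType?.audio === false) {
    return { kind: "disabled" };
  }
  if (!attachments.some(isAudio)) return { kind: "no-audio" };
  // Assumption: mixed audio+image sends map to "skipped" so the existing
  // attachment placeholder is kept.
  if (!attachments.every(isAudio)) {
    return { kind: "skipped", reason: "mixed-attachments" };
  }
  const text = transcript?.trim();
  if (!text) return { kind: "no-transcript" };
  return { kind: "applied", transcript: text };
}
```

A call site could switch on `kind` to pick the verbose breadcrumb and only replace the agent-facing body when the outcome is `applied`.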

Hook contract

No new public Plugin SDK seam. The internal message:transcribed hook already fires automatically when ctx.Transcript is populated (src/auto-reply/reply/message-preprocess-hooks.ts:26), and plugins already see transcript text via the existing inbound:claim event metadata. This PR just populates ctx.Transcript from the BB endpoint pre-dispatch — no new SDK surface, no baseline regen needed.

This sidesteps the clawsweeper review's flagged maintainer-approval ask on a new public hook contract.

Maintainer asks

  1. Default-on behavior. The flag defaults to true. On older BB Servers (< 1.14.0) the endpoint returns 404 and we silently no-op with a verbose breadcrumb; on >= 1.14.0 the transcript replaces the placeholder. No per-message version probe — accepted cost is one extra HTTP round-trip per audio message on legacy BB. Acceptable, or do you want a per-account "transcripts unavailable" memoization?
  2. Mixed-attachment behavior. Chose conservative gating (only enrich when all attachments are audio) to preserve the <media:image> cue on mixed sends. Alternative: always enrich first audio + append transcript to mixed placeholder. Speak up if you'd prefer the latter.
  3. PII. Transcript text is never logged. Verbose breadcrumb records only chars=<n> + sanitized message id.
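
For ask 1, the per-account "transcripts unavailable" memoization could look roughly like the sketch below. The class name, the TTL, and the idea of expiring the entry so a later BB upgrade is noticed are all assumptions for illustration; nothing like this is in the PR.

```typescript
// Hypothetical helper: remembers accounts whose server answered 404 so later
// voice notes skip the extra HTTP round-trip until the entry expires.
class TranscriptAvailabilityCache {
  private readonly unavailableUntil = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 10 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Record that this account's server has no transcript endpoint.
  markUnavailable(accountId: string, now: number = Date.now()): void {
    this.unavailableUntil.set(accountId, now + this.ttlMs);
  }

  // True when the endpoint should be tried (never marked, or mark expired).
  shouldProbe(accountId: string, now: number = Date.now()): boolean {
    const until = this.unavailableUntil.get(accountId);
    if (until === undefined) return true;
    if (now >= until) {
      this.unavailableUntil.delete(accountId);
      return true;
    }
    return false;
  }
}
```

The trade-off is one stale window per account after a server upgrade versus one wasted request per audio message on legacy servers.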

Live verification

Deployed via fast-patch on a maintainer host running BB Server 1.9.9 (well below the 1.14.0 floor). With logging.level=debug:

2026-04-30T22:47:16 [bluebubbles] inbound audio enricher: no transcript msgId=CF5472DA-... attachments=1

Direct probe of GET /api/v1/message/audio-transcript/<guid> against BB 1.9.9 returned HTTP 404 / Not Found as expected. Patched code path executed, gracefully fell back to existing media-understanding pipeline, no behavioral regression. Success path can't be validated end-to-end on this host without a BB upgrade — covered by unit tests + the structured outcome contract.
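
The graceful fall-through described above amounts to a best-effort fetch that maps every failure to `null`. The sketch below is an assumption-heavy illustration: the function names, the injected `fetchFn`, and the plain-text response body are hypothetical, authentication is omitted, and the route itself is only what the PR claims (the thread later concludes upstream does not ship it).

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
) => Promise<{ status: number; text(): Promise<string> }>;

// Pure mapping from an HTTP result to "transcript or nothing".
function interpretTranscriptResponse(status: number, body: string): string | null {
  if (status < 200 || status >= 300) return null; // includes 404 on older servers
  const text = body.trim();
  return text.length > 0 ? text : null; // empty body counts as no transcript
}

// Best-effort wrapper: never throws, returns null on any transport error.
async function getAudioTranscriptBestEffort(
  fetchFn: FetchLike,
  baseUrl: string,
  guid: string,
): Promise<string | null> {
  // Encode the GUID so it cannot alter the request path.
  const url = `${baseUrl}/api/v1/message/audio-transcript/${encodeURIComponent(guid)}`;
  try {
    const res = await fetchFn(url);
    return interpretTranscriptResponse(res.status, await res.text());
  } catch {
    return null;
  }
}
```

With this shape, the 404 observed on BB 1.9.9 and a network failure both surface to the caller as the same "no transcript" result.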

Test plan

  • pnpm test extensions/bluebubbles — 585/585 pass (11 new tests)
  • pnpm tsgo:extensions + pnpm tsgo:extensions:test — clean
  • pnpm check:changed — green (typecheck, oxlint 0/0, conflict markers, attributions, runtime sidecar guard, import cycles, dup scan)
  • pnpm exec oxfmt --check — clean
  • Live: BB 1.9.9 host, fast-patched gateway, voice note → 404 graceful fall-through verified via verbose log breadcrumb
  • Live: BB >= 1.14.0 host with on-device-dictated voice note (success path) — needs maintainer or contributor with a recent BB Server

Closes

#68719

🤖 Generated with Claude Code

@openclaw-barnacle (Bot) added the labels docs (Improvements or additions to documentation), channel: bluebubbles (Channel integration: bluebubbles), size: L, and maintainer (Maintainer-authored PR) on May 1, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: found issues before merge.

What this changes:

Adds a BlueBubbles inbound voice-note enricher that fetches an audio-transcript endpoint before dispatch, rewrites agent-facing message text, adds UTI-aware audio detection, configuration/docs, and unit coverage.

Maintainer follow-up before merge:

This implementation has a mechanical generated-metadata blocker, but the PR also has explicit maintainer asks and an unresolved upstream endpoint/default-on contract that should be decided by a maintainer before automation attempts a replacement or fixup branch.

Security review:

Security review cleared: No concrete security or supply-chain regression was found in the diff; the new request uses the existing BlueBubbles client SSRF/auth path, encodes the message GUID, adds no dependencies or workflows, and avoids logging transcript text.

Review findings:

  • [P2] Regenerate bundled channel config metadata — extensions/bluebubbles/src/config-schema.ts:111
  • [P2] Prove the BlueBubbles transcript endpoint before shipping — docs/channels/bluebubbles.md:550

Review details

Best possible solution:

Keep the extension-owned approach, but regenerate the bundled channel config metadata and any required config baselines, then confirm the BlueBubbles endpoint against a real supported server or adjust docs/default behavior to the actual shipped upstream contract before maintainer approval.

Do we have a high-confidence way to reproduce the issue?

Yes for the current-main gap and the generated-metadata drift: rg shows the feature absent on main, and the generated BlueBubbles schema lacks the newly documented key while validation consumes that generated schema. The successful transcript-fetch path remains unclear because it is only unit-mocked and not live-verified in the PR body.

Is this the best way to solve the issue?

Unclear in its current form: keeping the behavior in the BlueBubbles plugin and using the existing ctx.Transcript hook is the right boundary, but the generated config surface and upstream endpoint proof need to be fixed before this is a maintainable default-on feature.

Full review comments:

  • [P2] Regenerate bundled channel config metadata — extensions/bluebubbles/src/config-schema.ts:111
    Adding inboundAudioEnricher only to the runtime zod schema leaves the generated BlueBubbles channel schema stale. Config validation seeds bundled channel schemas from src/config/bundled-channel-config-metadata.generated.ts and that generated schema still has additionalProperties: false, so channels.bluebubbles.inboundAudioEnricher can be rejected or omitted from schema/UI surfaces. Please run the channel config metadata generator and commit the generated output.
    Confidence: 0.91
  • [P2] Prove the BlueBubbles transcript endpoint before shipping — docs/channels/bluebubbles.md:550
    The docs promise BlueBubbles Server v1.14.0+ support for /api/v1/message/audio-transcript/:guid, but the public BlueBubbles Server releases I inspected currently top out at v1.9.9 and the public message router source does not expose this route. Since the PR only live-verifies the 404 fallback, this may ship a default-on feature that silently no-ops for current upstream users. Please link or test against the upstream implementation/release, or scope the docs/default to the patched deployment that actually provides the endpoint.
    Confidence: 0.76
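
To illustrate the first finding, the sketch below shows how a stale object schema with `additionalProperties: false` flags a key it does not list. The `StrictObjectSchema` shape, the `unknownKeys` helper, and the `serverUrl` property are hypothetical stand-ins, not OpenClaw's validator or the real generated metadata.

```typescript
// Hypothetical, heavily simplified view of a generated channel schema.
interface StrictObjectSchema {
  properties: Record<string, unknown>;
  additionalProperties: boolean;
}

// Returns the keys a strict schema would reject as unknown.
function unknownKeys(
  schema: StrictObjectSchema,
  value: Record<string, unknown>,
): string[] {
  if (schema.additionalProperties) return [];
  return Object.keys(value).filter(
    (key) => !Object.prototype.hasOwnProperty.call(schema.properties, key),
  );
}

// Stale metadata: generated before inboundAudioEnricher was added.
const staleSchema: StrictObjectSchema = {
  properties: { enabled: {}, serverUrl: {} },
  additionalProperties: false,
};

// Regenerated metadata: the new key is listed.
const regeneratedSchema: StrictObjectSchema = {
  properties: { ...staleSchema.properties, inboundAudioEnricher: {} },
  additionalProperties: false,
};

const userConfig = { enabled: true, inboundAudioEnricher: { enabled: false } };
```

Here `unknownKeys(staleSchema, userConfig)` reports `inboundAudioEnricher`, while the regenerated schema reports nothing, which is why the generated output has to be committed alongside the zod change.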

Overall correctness: patch is incorrect
Overall confidence: 0.88

Acceptance criteria:

  • pnpm config:channels:check
  • pnpm config:schema:check
  • pnpm config:docs:check
  • pnpm test extensions/bluebubbles
  • pnpm tsgo:extensions

What I checked:

  • PR diff inspected: The diff adds channels.bluebubbles.inboundAudioEnricher, the new audio transcript client/enricher path, monitor-processing wiring, docs, changelog, and tests. (extensions/bluebubbles/src/config-schema.ts:111, 4b0d9db0caee)
  • Generated channel metadata is stale: The generated BlueBubbles schema still ends its properties without inboundAudioEnricher and keeps additionalProperties: false, so the documented opt-out field is absent from generated metadata. (src/config/bundled-channel-config-metadata.generated.ts:641, ffcc0d1fe171)
  • Validation uses generated channel schemas: Config validation seeds bundled channel schemas from GENERATED_BUNDLED_CHANNEL_CONFIG_METADATA and validates channels.<id> with that JSON schema, so stale generated metadata can reject the new BlueBubbles config key. (src/config/validation.ts:928, ffcc0d1fe171)
  • Runtime config schema also relies on metadata: Runtime config schema loading avoids importing bundled channel config-schema modules on schema requests and builds channel schemas from collected manifest/metadata entries. (src/config/runtime-schema.ts:18, ffcc0d1fe171)
  • Current main lacks this feature: Current main has no inboundAudioEnricher, audio-transcript, getAudioTranscript, or BlueBubbles-specific audio attachment helper in OpenClaw code/docs. (ffcc0d1fe171)
  • Existing transcript hook path supports the PR direction: The existing pre-agent hook emits message:transcribed when ctx.Transcript is populated, so the PR's no-new-SDK-seam approach matches current hook plumbing. (src/auto-reply/reply/message-preprocess-hooks.ts:25, ffcc0d1fe171)

Likely related people:

  • @steipete: Recent current-main commits touched the BlueBubbles monitor/client/config surface and generated config metadata/validation-adjacent paths, including the latest broad BlueBubbles refactor visible in blame. (role: recent maintainer and generated-config adjacent owner; confidence: high; commits: 4987482e4c54, 1c300cec5d80; files: extensions/bluebubbles/src/monitor-processing.ts, extensions/bluebubbles/src/client.ts, extensions/bluebubbles/src/config-schema.ts)
  • @coletebou: Introduced the recent BlueBubbles reply-context API fallback across the same account config and monitor-processing area this PR extends. (role: recent BlueBubbles feature owner; confidence: medium; commits: 76930da7ebc7; files: extensions/bluebubbles/src/monitor-processing.ts, extensions/bluebubbles/src/config-schema.ts)
  • @omarshahine: Appears in prior merged BlueBubbles reply-context work as a co-author/reviewer, beyond authoring this PR, so they are a plausible domain follow-up person. (role: adjacent current-main contributor/reviewer; confidence: medium; commits: 76930da7ebc7; files: extensions/bluebubbles/src/monitor-processing.ts, extensions/bluebubbles/src/config-schema.ts)

Remaining risk / open question:

  • The PR's success path depends on an upstream audio-transcript endpoint that was not found in the public BlueBubbles Server source/release data inspected, and the PR body says a successful live run against BB >= 1.14.0 (the claimed minimum version) is still missing.
  • The new config key is documented but absent from generated bundled-channel metadata, which can make the opt-out invalid or invisible in config/schema surfaces until regenerated.
  • One public check run for the PR head is failing (check-additional), so merge readiness still needs CI follow-up even after code fixes.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 42d73fd955af.

@omarshahine marked this pull request as ready for review on May 1, 2026, 05:54
@omarshahine
Contributor Author

Closing this PR — verified that the upstream BB endpoint this PR depends on does not exist (see issue comment). BB Server tops out at v1.9.9 with no audio-transcript route in master or any release.

The clawsweeper review's second P2 finding was correct (its 0.76 confidence on the upstream-proof concern was, if anything, too cautious).

I'm salvaging the UTI-aware audio attachment detection as a small standalone PR. It addresses a real gap: BlueBubbles webhooks sometimes deliver voice notes with public.audio / com.apple.m4a-audio UTIs but no audio/* MIME, so the placeholder builder misclassifies them.
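
A rough sketch of that UTI-aware check follows. The UTI list is the one named in the PR summary; the `AttachmentLike` field names and the `isAudioAttachment` helper are hypothetical and may not match the real `isBlueBubblesAudioAttachment` signature.

```typescript
// Apple UTIs listed in the PR summary as audio markers.
const AUDIO_UTIS = new Set<string>([
  "public.audio",
  "public.mpeg-4-audio",
  "com.apple.m4a-audio",
  "com.apple.coreaudio-format",
]);

// Hypothetical attachment shape; real webhook field names may differ.
interface AttachmentLike {
  mimeType?: string | null;
  uti?: string | null;
}

function isAudioAttachment(attachment: AttachmentLike): boolean {
  // Prefer the MIME type when the webhook provides one.
  const mime = (attachment.mimeType ?? "").trim().toLowerCase();
  if (mime.startsWith("audio/")) return true;
  // Fall back to the UTI for voice notes delivered without an audio/* MIME.
  const uti = (attachment.uti ?? "").trim().toLowerCase();
  return AUDIO_UTIS.has(uti);
}
```

A voice note arriving as `{ mimeType: null, uti: "com.apple.m4a-audio" }` is classified as audio here, whereas a MIME-only check would miss it.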
