Commit 4b0d9db

feat(bluebubbles): inbound audio enricher (#68719)
1 parent 5d1ba08 commit 4b0d9db

11 files changed

Lines changed: 753 additions & 9 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Docs: https://docs.openclaw.ai

 ### Changes

+- BlueBubbles: pre-dispatch transcribe inbound voice notes via the upstream `audio-transcript` endpoint and substitute the transcript for the `<media:audio>` placeholder so agents see what the user said instead of apologizing about an empty body; gated by the new default-on `channels.bluebubbles.inboundAudioEnricher.enabled` flag, detects audio by both MIME and Apple UTIs, falls back silently on older BB Servers, and never logs transcript text. Fixes #68719. Thanks @markthebest12.
 - Messages/docs: clarify that `BodyForAgent` is the primary inbound model text while `Body` is the legacy envelope fallback, and add Signal coverage so channel hardening patches target the real prompt path. Refs #66198. Thanks @defonota3box.
 - Control UI/Usage: add UTC quarter-hour token buckets for the Usage Mosaic and reuse them for hour filtering, keeping the legacy session-span fallback for older summaries. (#74337) Thanks @konanok.

docs/channels/bluebubbles.md

Lines changed: 21 additions & 0 deletions
@@ -546,6 +546,25 @@ Control whether responses are sent as a single message or streamed in blocks:
 - Media cap via `channels.bluebubbles.mediaMaxMb` for inbound and outbound media (default: 8 MB).
 - Outbound text is chunked to `channels.bluebubbles.textChunkLimit` (default: 4000 chars).

+### Inbound voice notes (audio transcripts)
+
+When BlueBubbles Server v1.14.0+ delivers an iMessage voice note, OpenClaw fetches the transcript Apple already produced on-device and substitutes it for the `<media:audio>` placeholder before the agent sees the message.
+
+- No third-party speech-to-text provider is required and no extra cost is incurred — the text comes from Apple's built-in dictation that ran on the sending device.
+- Audio attachments are detected by both MIME (`audio/*`) and Apple UTIs (`public.audio`, `public.mpeg-4-audio`, `com.apple.m4a-audio`, `com.apple.coreaudio-format`).
+- The transcript text is treated as PII: it never appears in verbose logs, only the character length is recorded.
+- Older BB Servers reply 404 to the `audio-transcript` endpoint; the call is best-effort and silently falls back to the placeholder body when unavailable.
+- If the user typed text alongside the voice note, the original text is preserved.
+
+Disable per account:
+
+```yaml
+channels:
+  bluebubbles:
+    inboundAudioEnricher:
+      enabled: false
+```
+
 ## Configuration reference

 Full configuration: [Configuration](/gateway/configuration)
@@ -577,6 +596,8 @@ Full configuration: [Configuration](/gateway/configuration)
 </Accordion>
 <Accordion title="Media and history">
 - `channels.bluebubbles.mediaMaxMb`: Inbound/outbound media cap in MB (default: 8).
+- `channels.bluebubbles.inboundAudioEnricher.enabled`: Pre-dispatch transcription of inbound voice notes via the upstream BlueBubbles `audio-transcript` endpoint (default: `true`). See [Inbound voice notes](#inbound-voice-notes-audio-transcripts).
+- `channels.bluebubbles.inboundAudioEnricher.perType.audio`: Per-type opt-out for audio transcription (default: enabled when the parent flag is on).
 - `channels.bluebubbles.mediaLocalRoots`: Explicit allowlist of absolute local directories permitted for outbound local media paths. Local path sends are denied by default unless this is configured. Per-account override: `channels.bluebubbles.accounts.<accountId>.mediaLocalRoots`.
 - `channels.bluebubbles.coalesceSameSenderDms`: Merge consecutive same-sender DM webhooks into one agent turn so Apple's text+URL split-send arrives as a single message (default: `false`). See [Coalescing split-send DMs](#coalescing-split-send-dms-command--url-in-one-composition) for scenarios, window tuning, and trade-offs. Widens the default inbound debounce window from 500 ms to 2500 ms when enabled without an explicit `messages.inbound.byChannel.bluebubbles`.
 - `channels.bluebubbles.historyLimit`: Max group messages for context (0 disables).
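
The detection rule in the new docs section (MIME `audio/*` or one of the four listed Apple UTIs) can be sketched as below. This is a minimal illustration, not the code from this commit: `AttachmentLike`, `isAudioAttachment`, and the `mimeType`/`uti` field names are assumptions made for the example, as is the case-insensitive comparison. Only the MIME prefix and the UTI strings come from the docs.

```typescript
// UTI strings copied from the docs section above.
const AUDIO_UTIS: ReadonlySet<string> = new Set([
  "public.audio",
  "public.mpeg-4-audio",
  "com.apple.m4a-audio",
  "com.apple.coreaudio-format",
]);

// Hypothetical attachment shape; the real field names may differ.
interface AttachmentLike {
  mimeType?: string | null;
  uti?: string | null;
}

// Returns true when either the MIME type or the UTI marks the attachment as audio.
// Normalizing case and whitespace is an assumption of this sketch.
function isAudioAttachment(att: AttachmentLike): boolean {
  const mime = att.mimeType?.trim().toLowerCase() ?? "";
  if (mime.startsWith("audio/")) {
    return true;
  }
  const uti = att.uti?.trim().toLowerCase() ?? "";
  return AUDIO_UTIS.has(uti);
}
```

Checking both fields matters when an attachment carries only a UTI (or a generic MIME type), which is presumably why the docs list both.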

extensions/bluebubbles/src/client.test.ts

Lines changed: 77 additions & 0 deletions
@@ -497,6 +497,83 @@ describe("client.getMessageAttachments", () => {
   });
 });

+// --- Audio transcript ------------------------------------------------------
+
+describe("client.getAudioTranscript (#68719)", () => {
+  it("returns the transcript string from a `data: <string>` envelope", async () => {
+    mockFetch.mockResolvedValue(
+      new Response(JSON.stringify({ data: " Pick up milk on the way home " }), {
+        status: 200,
+        headers: { "content-type": "application/json" },
+      }),
+    );
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    const transcript = await client.getAudioTranscript({ messageGuid: "msg-audio-1" });
+    expect(transcript).toBe("Pick up milk on the way home");
+    expect(String(mockFetch.mock.calls[0]?.[0])).toContain(
+      "/api/v1/message/audio-transcript/msg-audio-1",
+    );
+  });
+
+  it("returns the transcript from a `data: { transcript }` envelope", async () => {
+    mockFetch.mockResolvedValue(
+      new Response(JSON.stringify({ data: { transcript: "hello world" } }), {
+        status: 200,
+        headers: { "content-type": "application/json" },
+      }),
+    );
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    expect(await client.getAudioTranscript({ messageGuid: "g" })).toBe("hello world");
+  });
+
+  it("returns null on 404 (older BB Server, endpoint unavailable)", async () => {
+    mockFetch.mockResolvedValue(new Response("not found", { status: 404 }));
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    expect(await client.getAudioTranscript({ messageGuid: "g" })).toBeNull();
+  });
+
+  it("returns null on transport error rather than throwing", async () => {
+    mockFetch.mockRejectedValue(new Error("ECONNREFUSED"));
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    expect(await client.getAudioTranscript({ messageGuid: "g" })).toBeNull();
+  });
+
+  it("returns null when the response body is empty/whitespace", async () => {
+    mockFetch.mockResolvedValue(
+      new Response(JSON.stringify({ data: " " }), {
+        status: 200,
+        headers: { "content-type": "application/json" },
+      }),
+    );
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    expect(await client.getAudioTranscript({ messageGuid: "g" })).toBeNull();
+  });
+
+  it("returns null when the message guid is empty", async () => {
+    const client = createBlueBubblesClient({
+      serverUrl: "http://localhost:1234",
+      password: "s3cret",
+    });
+    expect(await client.getAudioTranscript({ messageGuid: " " })).toBeNull();
+    expect(mockFetch).not.toHaveBeenCalled();
+  });
+});
+
 // --- Cache + invalidation --------------------------------------------------

 describe("client cache", () => {

extensions/bluebubbles/src/client.ts

Lines changed: 52 additions & 0 deletions
@@ -356,6 +356,58 @@ export class BlueBubblesClient {

   // --- Attachments (fixes #34749) -----------------------------------------

+  /**
+   * GET /api/v1/message/audio-transcript/{guid}. Returns Apple's on-device
+   * voice-note transcript when BB Server is recent enough (>= 1.14.0) and the
+   * referenced message is an audio message. Older BB versions reply 404, in
+   * which case we treat the call as unavailable rather than an error so the
+   * inbound enricher can fall back to the placeholder body. (#68719)
+   *
+   * Returns `null` when the endpoint is unavailable, the message is not audio,
+   * or the response cannot be parsed. Never throws on transport/HTTP errors;
+   * the caller decides whether the absence of a transcript is fatal.
+   */
+  async getAudioTranscript(params: {
+    messageGuid: string;
+    timeoutMs?: number;
+  }): Promise<string | null> {
+    const guid = params.messageGuid.trim();
+    if (!guid) {
+      return null;
+    }
+    let response: Response;
+    let raw: unknown;
+    try {
+      const result = await this.requestJson({
+        method: "GET",
+        path: `/api/v1/message/audio-transcript/${encodeURIComponent(guid)}`,
+        timeoutMs: params.timeoutMs,
+      });
+      response = result.response;
+      raw = result.data;
+    } catch {
+      return null;
+    }
+    if (!response.ok || raw === null || typeof raw !== "object") {
+      return null;
+    }
+    const inner = (raw as { data?: unknown }).data;
+    if (typeof inner === "string") {
+      const trimmed = inner.trim();
+      return trimmed.length > 0 ? trimmed : null;
+    }
+    if (inner && typeof inner === "object") {
+      const { transcript, text } = inner as { transcript?: unknown; text?: unknown };
+      const candidate =
+        typeof transcript === "string" ? transcript : typeof text === "string" ? text : null;
+      if (candidate !== null) {
+        const trimmed = candidate.trim();
+        return trimmed.length > 0 ? trimmed : null;
+      }
+    }
+    return null;
+  }
+
   /**
    * GET /api/v1/message/{guid} to read attachment metadata. BlueBubbles may
    * fire `new-message` before attachment indexing completes, so this re-reads
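
Because `getAudioTranscript` returns `null` instead of throwing, a caller can treat the transcript as strictly optional. The sketch below shows one way an inbound enricher might use it. The enricher itself is not part of the hunks shown here, so `TranscriptSource`, `enrichInboundBody`, the `log` callback, and the 5000 ms timeout are all hypothetical. Only the `getAudioTranscript` signature and the `<media:audio>` placeholder are taken from this commit.

```typescript
// Structural type mirroring the getAudioTranscript signature added above.
interface TranscriptSource {
  getAudioTranscript(params: { messageGuid: string; timeoutMs?: number }): Promise<string | null>;
}

// Placeholder string named in the changelog and docs.
const AUDIO_PLACEHOLDER = "<media:audio>";

// Hypothetical helper: replace the placeholder with the transcript, or return
// the body unchanged when no transcript is available.
async function enrichInboundBody(
  source: TranscriptSource,
  message: { guid: string; body: string },
  log: (line: string) => void = () => {},
): Promise<string> {
  if (!message.body.includes(AUDIO_PLACEHOLDER)) {
    return message.body;
  }
  const transcript = await source.getAudioTranscript({
    messageGuid: message.guid,
    timeoutMs: 5000, // assumed value; the commit does not state a timeout
  });
  if (transcript === null) {
    // Silent fallback: older servers, non-audio messages, and transport errors.
    return message.body;
  }
  // Record only the length, since the transcript text is treated as PII.
  log(`audio transcript received: ${transcript.length} chars`);
  // A replacer function keeps "$" sequences in the transcript literal.
  return message.body.replace(AUDIO_PLACEHOLDER, () => transcript);
}
```

Replacing only the placeholder leaves any text the user typed alongside the voice note in place, which matches the behavior the docs describe.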

extensions/bluebubbles/src/config-schema.ts

Lines changed: 20 additions & 0 deletions
@@ -46,6 +46,25 @@ const bluebubblesNetworkSchema = z
   .strict()
   .optional();

+const bluebubblesInboundAudioEnricherSchema = z
+  .object({
+    /**
+     * When true (default) BlueBubbles voice notes are transcribed via the
+     * upstream BB Server's `audio-transcript` endpoint and the message body
+     * is rewritten before agent dispatch. (#68719)
+     */
+    enabled: z.boolean().optional(),
+    /** Per-attachment-type opt-outs. Today only `audio` is honored. */
+    perType: z
+      .object({
+        audio: z.boolean().optional(),
+      })
+      .strict()
+      .optional(),
+  })
+  .strict()
+  .optional();
+
 const bluebubblesCatchupSchema = z
   .object({
     /** Replay messages delivered while the gateway was unreachable. Defaults to on. */
@@ -92,6 +111,7 @@ const bluebubblesAccountSchema = z
     sendReadReceipts: z.boolean().optional(),
     network: bluebubblesNetworkSchema,
     catchup: bluebubblesCatchupSchema,
+    inboundAudioEnricher: bluebubblesInboundAudioEnricherSchema,
     blockStreaming: z.boolean().optional(),
     groups: z.object({}).catchall(bluebubblesGroupConfigSchema).optional(),
     coalesceSameSenderDms: z.boolean().optional(),
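
Both schema fields are optional, so the defaults have to be applied when the config is read. A minimal sketch of that resolution follows, assuming the semantics stated in the docs (`enabled` defaults to `true`; `perType.audio` defaults to on whenever the parent flag is on). `InboundAudioEnricherConfig` and `isAudioEnrichmentEnabled` are hypothetical names, and letting the parent flag win over an explicit `perType.audio: true` is an assumption of this sketch rather than something the commit confirms.

```typescript
// Plain-TypeScript mirror of the schema shape above (no zod dependency).
interface InboundAudioEnricherConfig {
  enabled?: boolean;
  perType?: { audio?: boolean };
}

// Hypothetical resolver: apply the documented defaults to an optional config.
function isAudioEnrichmentEnabled(cfg?: InboundAudioEnricherConfig): boolean {
  // Parent flag is default-on.
  const parentEnabled = cfg?.enabled ?? true;
  if (!parentEnabled) {
    // Assumption: a disabled parent flag overrides any per-type setting.
    return false;
  }
  // Per-type flag is an opt-out, so it is on unless explicitly set to false.
  return cfg?.perType?.audio ?? true;
}
```

With this reading, an account that omits `inboundAudioEnricher` entirely gets transcription, and either `enabled: false` or `perType.audio: false` turns it off.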
