
Commit ffcc0d1

fix: delay meet twilio intro speech
1 parent e8810c0 commit ffcc0d1

15 files changed

Lines changed: 354 additions & 21 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Docs: https://docs.openclaw.ai
 
 ### Fixes
 
+- Google Meet/Voice Call: defer Twilio dial-in intro speech until after Meet DTMF entry and route delayed speech through the active realtime Voice Call bridge. Thanks @donkeykong91 and @PfanP.
 - Google Meet/Voice Call: make Twilio setup preflight honor explicit `--transport twilio` and fail local/private Voice Call webhook URLs before joins. Thanks @donkeykong91 and @PfanP.
 - Voice Call/Twilio: retry transient 21220 live-call TwiML updates and catch answered-path initial-greeting failures, so a fast answered callback no longer crashes the Gateway or drops the Twilio greeting/listen transition. (#74606) Thanks @Sivan22.
 - Voice Call/Twilio: register accepted media streams immediately but wait for realtime transcription readiness before speaking the initial greeting, so reconnect grace handling stays live while OpenAI STT startup is no longer starved by TTS. Fixes #75197. (#75257) Thanks @donkeykong91 and @PfanP.

docs/plugins/google-meet.md

Lines changed: 3 additions & 0 deletions
@@ -1411,6 +1411,9 @@ participant:
   the PIN.
 - Increase the leading pauses in `--dtmf-sequence` if Meet answers slowly, for
   example `wwww123456#`.
+- If the participant joins but you miss the first spoken line, increase
+  `plugins.entries.google-meet.config.voiceCall.postDtmfSpeechDelayMs` so the
+  intro is spoken after Meet finishes admitting the phone participant.
 
 If webhooks do not arrive, debug the Voice Call plugin first: the provider must
 reach `plugins.entries.voice-call.config.publicUrl` or the configured tunnel.
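
As a rough illustration of the troubleshooting bullet added in this hunk, the override could look like the sketch below. Only the key path `plugins.entries.google-meet.config.voiceCall.postDtmfSpeechDelayMs` and the defaults (2500 ms and 5000 ms) come from this commit. The type name, the constant name, the 8000 ms value, and the use of a TypeScript literal rather than the project's real config file format are assumptions made for illustration.

```typescript
// Hypothetical typed view of the two delay settings; not the plugin's real type.
type MeetVoiceCallDelays = {
  dtmfDelayMs?: number;
  postDtmfSpeechDelayMs?: number;
};

const voiceCallDelays: MeetVoiceCallDelays = {
  // Default per DEFAULT_GOOGLE_MEET_CONFIG in this commit.
  dtmfDelayMs: 2_500,
  // Raised from the 5_000 default; 8_000 is an arbitrary example value.
  postDtmfSpeechDelayMs: 8_000,
};

// Nesting follows the dotted key path quoted in the docs hunk.
const configOverride = {
  plugins: {
    entries: {
      "google-meet": {
        config: { voiceCall: voiceCallDelays },
      },
    },
  },
};
```

A larger value trades a longer silence after joining for a lower chance that the intro is spoken before Meet admits the phone participant.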

docs/plugins/voice-call.md

Lines changed: 5 additions & 0 deletions
@@ -766,6 +766,11 @@ If Voice Call is green but the Meet participant never joins, check the Meet
 dial-in number, PIN, and `--dtmf-sequence`. The phone call can be healthy while
 the meeting rejects or ignores an incorrect DTMF sequence.
 
+Google Meet starts Voice Call silently, sends DTMF, then asks Voice Call to
+speak the intro after `voiceCall.postDtmfSpeechDelayMs`. Increase that delay in
+the Google Meet plugin config if the first line is spoken before Meet admits the
+phone participant.
+
 ### Realtime call has no speech
 
 Confirm only one audio mode is enabled. `realtime.enabled` and

extensions/google-meet/index.create.test.ts

Lines changed: 7 additions & 1 deletion
@@ -12,8 +12,13 @@ import {
 import { CREATE_MEET_FROM_BROWSER_SCRIPT } from "./src/transports/chrome-create.js";
 
 const voiceCallMocks = vi.hoisted(() => ({
-  joinMeetViaVoiceCallGateway: vi.fn(async () => ({ callId: "call-1", dtmfSent: true })),
+  joinMeetViaVoiceCallGateway: vi.fn(async () => ({
+    callId: "call-1",
+    dtmfSent: true,
+    introSent: true,
+  })),
   endMeetVoiceCallGatewayCall: vi.fn(async () => {}),
+  speakMeetViaVoiceCallGateway: vi.fn(async () => {}),
 }));
 
 const fetchGuardMocks = vi.hoisted(() => ({
@@ -38,6 +43,7 @@ vi.mock("openclaw/plugin-sdk/ssrf-runtime", () => ({
 vi.mock("./src/voice-call-gateway.js", () => ({
   joinMeetViaVoiceCallGateway: voiceCallMocks.joinMeetViaVoiceCallGateway,
   endMeetVoiceCallGatewayCall: voiceCallMocks.endMeetVoiceCallGatewayCall,
+  speakMeetViaVoiceCallGateway: voiceCallMocks.speakMeetViaVoiceCallGateway,
 }));
 
 function setup(

extensions/google-meet/index.test.ts

Lines changed: 41 additions & 2 deletions
@@ -35,8 +35,13 @@ import { buildMeetDtmfSequence, normalizeDialInNumber } from "./src/transports/t
 import type { GoogleMeetSession } from "./src/transports/types.js";
 
 const voiceCallMocks = vi.hoisted(() => ({
-  joinMeetViaVoiceCallGateway: vi.fn(async () => ({ callId: "call-1", dtmfSent: true })),
+  joinMeetViaVoiceCallGateway: vi.fn(async () => ({
+    callId: "call-1",
+    dtmfSent: true,
+    introSent: true,
+  })),
   endMeetVoiceCallGatewayCall: vi.fn(async () => {}),
+  speakMeetViaVoiceCallGateway: vi.fn(async () => {}),
 }));
 
 const fetchGuardMocks = vi.hoisted(() => ({
@@ -61,6 +66,7 @@ vi.mock("openclaw/plugin-sdk/ssrf-runtime", () => ({
 vi.mock("./src/voice-call-gateway.js", () => ({
   joinMeetViaVoiceCallGateway: voiceCallMocks.joinMeetViaVoiceCallGateway,
   endMeetVoiceCallGatewayCall: voiceCallMocks.endMeetVoiceCallGatewayCall,
+  speakMeetViaVoiceCallGateway: voiceCallMocks.speakMeetViaVoiceCallGateway,
 }));
 
 function setup(
@@ -348,7 +354,12 @@ describe("google-meet plugin", () => {
           "BlackHole 2ch",
         ],
       },
-      voiceCall: { enabled: true, requestTimeoutMs: 30000, dtmfDelayMs: 2500 },
+      voiceCall: {
+        enabled: true,
+        requestTimeoutMs: 30000,
+        dtmfDelayMs: 2500,
+        postDtmfSpeechDelayMs: 5000,
+      },
       realtime: {
         provider: "openai",
         introMessage: "Say exactly: I'm here and listening.",
@@ -955,12 +966,14 @@ describe("google-meet plugin", () => {
         dtmfSequence: "123456#",
         voiceCallId: "call-1",
         dtmfSent: true,
+        introSent: true,
       },
     });
     expect(voiceCallMocks.joinMeetViaVoiceCallGateway).toHaveBeenCalledWith({
       config: expect.objectContaining({ defaultTransport: "twilio" }),
       dialInNumber: "+15551234567",
       dtmfSequence: "123456#",
+      message: "Say exactly: I'm here and listening.",
     });
   });
 
@@ -984,6 +997,32 @@ describe("google-meet plugin", () => {
     });
   });
 
+  it("delegates Twilio session speech through voice-call", async () => {
+    const { tools } = setup({ defaultTransport: "twilio" });
+    const tool = tools[0] as {
+      execute: (id: string, params: unknown) => Promise<{ details: { session: { id: string } } }>;
+    };
+    const joined = await tool.execute("id", {
+      action: "join",
+      url: "https://meet.google.com/abc-defg-hij",
+      dialInNumber: "+15551234567",
+      pin: "123456",
+    });
+
+    const spoken = await tool.execute("id", {
+      action: "speak",
+      sessionId: joined.details.session.id,
+      message: "Say exactly: hello after joining.",
+    });
+
+    expect(spoken.details).toMatchObject({ spoken: true });
+    expect(voiceCallMocks.speakMeetViaVoiceCallGateway).toHaveBeenCalledWith({
+      config: expect.objectContaining({ defaultTransport: "twilio" }),
+      callId: "call-1",
+      message: "Say exactly: hello after joining.",
+    });
+  });
+
   it("reports setup status through the tool", async () => {
     const originalPlatform = process.platform;
     Object.defineProperty(process, "platform", { value: "darwin" });

extensions/google-meet/index.ts

Lines changed: 4 additions & 0 deletions
@@ -119,6 +119,10 @@ const googleMeetConfigSchema = {
       advanced: true,
     },
     "voiceCall.dtmfDelayMs": { label: "DTMF Delay (ms)", advanced: true },
+    "voiceCall.postDtmfSpeechDelayMs": {
+      label: "Post-DTMF Speech Delay (ms)",
+      advanced: true,
+    },
     "voiceCall.introMessage": { label: "Voice Call Intro Message", advanced: true },
     "realtime.provider": {
       label: "Realtime Provider",

extensions/google-meet/src/config.ts

Lines changed: 6 additions & 0 deletions
@@ -52,6 +52,7 @@ export type GoogleMeetConfig = {
     token?: string;
     requestTimeoutMs: number;
     dtmfDelayMs: number;
+    postDtmfSpeechDelayMs: number;
     introMessage?: string;
   };
   realtime: {
@@ -181,6 +182,7 @@ export const DEFAULT_GOOGLE_MEET_CONFIG: GoogleMeetConfig = {
     enabled: true,
     requestTimeoutMs: 30_000,
     dtmfDelayMs: 2_500,
+    postDtmfSpeechDelayMs: 5_000,
   },
   realtime: {
     provider: "openai",
@@ -432,6 +434,10 @@ export function resolveGoogleMeetConfigWithEnv(
         voiceCall.dtmfDelayMs,
         DEFAULT_GOOGLE_MEET_CONFIG.voiceCall.dtmfDelayMs,
       ),
+      postDtmfSpeechDelayMs: resolveNumber(
+        voiceCall.postDtmfSpeechDelayMs,
+        DEFAULT_GOOGLE_MEET_CONFIG.voiceCall.postDtmfSpeechDelayMs,
+      ),
       introMessage: normalizeOptionalString(voiceCall.introMessage),
     },
     realtime: {

extensions/google-meet/src/runtime.ts

Lines changed: 33 additions & 4 deletions
@@ -21,7 +21,11 @@ import type {
   GoogleMeetJoinResult,
   GoogleMeetSession,
 } from "./transports/types.js";
-import { endMeetVoiceCallGatewayCall, joinMeetViaVoiceCallGateway } from "./voice-call-gateway.js";
+import {
+  endMeetVoiceCallGatewayCall,
+  joinMeetViaVoiceCallGateway,
+  speakMeetViaVoiceCallGateway,
+} from "./voice-call-gateway.js";
 
 function nowIso(): string {
   return new Date().toISOString();
@@ -301,6 +305,7 @@ export class GoogleMeetRuntime {
       return { session: reusable, spoken };
     }
     const createdAt = nowIso();
+    let delegatedTwilioSpoken = false;
 
     const session: GoogleMeetSession = {
       id: `meet_${randomUUID()}`,
@@ -398,14 +403,22 @@ export class GoogleMeetRuntime {
             config: this.params.config,
             dialInNumber,
             dtmfSequence,
+            message:
+              mode === "realtime"
+                ? (request.message ??
+                  this.params.config.voiceCall.introMessage ??
+                  this.params.config.realtime.introMessage)
+                : undefined,
           })
         : undefined;
+      delegatedTwilioSpoken = Boolean(voiceCallResult?.introSent);
       session.twilio = {
         dialInNumber,
         pinProvided: Boolean(request.pin ?? this.params.config.twilio.defaultPin),
         dtmfSequence,
         voiceCallId: voiceCallResult?.callId,
         dtmfSent: voiceCallResult?.dtmfSent,
+        introSent: voiceCallResult?.introSent,
       };
       if (voiceCallResult?.callId) {
         this.#sessionStops.set(session.id, async () => {
@@ -428,9 +441,11 @@ export class GoogleMeetRuntime {
 
     this.#sessions.set(session.id, session);
     const spoken =
-      mode === "realtime" && speechInstructions
-        ? (await this.speak(session.id, speechInstructions)).spoken
-        : false;
+      transport === "twilio"
+        ? delegatedTwilioSpoken
+        : mode === "realtime" && speechInstructions
+          ? (await this.speak(session.id, speechInstructions)).spoken
+          : false;
     return { session, spoken };
   }
 
@@ -459,6 +474,20 @@ export class GoogleMeetRuntime {
     if (!session) {
       return { found: false, spoken: false };
     }
+    if (session.transport === "twilio" && session.twilio?.voiceCallId) {
+      await speakMeetViaVoiceCallGateway({
+        config: this.params.config,
+        callId: session.twilio.voiceCallId,
+        message:
+          instructions ||
+          this.params.config.voiceCall.introMessage ||
+          this.params.config.realtime.introMessage ||
+          "",
+      });
+      session.twilio.introSent = true;
+      session.updatedAt = nowIso();
+      return { found: true, spoken: true, session };
+    }
     await this.#refreshBrowserHealthForChromeSession(session);
     const speak = this.#sessionSpeakers.get(sessionId);
     if (!speak || session.state !== "active") {
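
The two message fallbacks in this file differ in a small way that is easy to miss: the join path chains with `??`, while the speak path chains with `||` and ends in an empty string. The sketch below isolates that difference. The helper names `resolveJoinIntro` and `resolveSpeakMessage` and the `IntroConfig` type are hypothetical and exist only for illustration; the expressions inside them mirror the hunks above.

```typescript
// Hypothetical minimal config shape; only the two intro fields matter here.
type IntroConfig = {
  voiceCall: { introMessage?: string };
  realtime: { introMessage?: string };
};

// Join path: `??` skips only null/undefined, so an explicit empty string
// from the request is kept. Non-realtime modes pass no message at all.
function resolveJoinIntro(
  mode: string,
  requestMessage: string | undefined,
  config: IntroConfig,
): string | undefined {
  if (mode !== "realtime") {
    return undefined;
  }
  return requestMessage ?? config.voiceCall.introMessage ?? config.realtime.introMessage;
}

// Speak path: `||` also skips empty strings, and the chain ends in "".
function resolveSpeakMessage(instructions: string | undefined, config: IntroConfig): string {
  return instructions || config.voiceCall.introMessage || config.realtime.introMessage || "";
}
```

With a configured realtime intro, an empty request message stays empty on the join path but falls back to the configured intro on the speak path.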

extensions/google-meet/src/transports/types.ts

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ export type GoogleMeetSession = {
     dtmfSequence?: string;
     voiceCallId?: string;
     dtmfSent?: boolean;
+    introSent?: boolean;
   };
   notes: string[];
 };

extensions/google-meet/src/voice-call-gateway.test.ts

Lines changed: 28 additions & 4 deletions
@@ -27,25 +27,49 @@ describe("Google Meet voice-call gateway", () => {
     gatewayMocks.startGatewayClientWhenEventLoopReady.mockClear();
   });
 
-  it("starts Twilio Meet calls in conversation mode with the realtime intro by default", async () => {
+  it("starts Twilio Meet calls silently, sends DTMF, then speaks the realtime intro", async () => {
     const config = resolveGoogleMeetConfig({
-      voiceCall: { gatewayUrl: "ws://127.0.0.1:18789" },
+      voiceCall: {
+        gatewayUrl: "ws://127.0.0.1:18789",
+        dtmfDelayMs: 1,
+        postDtmfSpeechDelayMs: 1,
+      },
       realtime: { introMessage: "Say exactly: I'm here and listening." },
     });
 
     await joinMeetViaVoiceCallGateway({
       config,
       dialInNumber: "+15551234567",
+      dtmfSequence: "123456#",
+      message: "Say exactly: I'm here and listening.",
     });
 
-    expect(gatewayMocks.request).toHaveBeenCalledWith(
+    expect(gatewayMocks.request).toHaveBeenNthCalledWith(
+      1,
       "voicecall.start",
       {
         to: "+15551234567",
-        message: "Say exactly: I'm here and listening.",
         mode: "conversation",
       },
       { timeoutMs: 30_000 },
     );
+    expect(gatewayMocks.request).toHaveBeenNthCalledWith(
+      2,
+      "voicecall.dtmf",
+      {
+        callId: "call-1",
+        digits: "123456#",
+      },
+      { timeoutMs: 30_000 },
+    );
+    expect(gatewayMocks.request).toHaveBeenNthCalledWith(
+      3,
+      "voicecall.speak",
+      {
+        callId: "call-1",
+        message: "Say exactly: I'm here and listening.",
+      },
+      { timeoutMs: 30_000 },
+    );
   });
 });
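
The diff for `voice-call-gateway.ts` itself is not shown on this page, so the following is only a minimal sketch of a join flow that would satisfy the call order asserted in the test above: `voicecall.start` without a message, then `voicecall.dtmf`, then `voicecall.speak`. The names `joinMeetSketch`, `GatewayRequest`, `JoinSketchParams`, and `sleep` are hypothetical. The placement of `dtmfDelayMs` before the DTMF request, the `{ callId }` response shape, and the skip conditions are assumptions, not the committed implementation.

```typescript
// Hypothetical request signature, modeled on the mocked gateway calls in the test.
type GatewayRequest = (
  method: string,
  params: Record<string, unknown>,
  options: { timeoutMs: number },
) => Promise<unknown>;

type JoinSketchParams = {
  request: GatewayRequest;
  dialInNumber: string;
  dtmfSequence?: string;
  message?: string;
  requestTimeoutMs: number;
  dtmfDelayMs: number;
  postDtmfSpeechDelayMs: number;
};

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function joinMeetSketch(p: JoinSketchParams) {
  const options = { timeoutMs: p.requestTimeoutMs };

  // 1. Start the call silently: the start request carries no intro message.
  const started = (await p.request(
    "voicecall.start",
    { to: p.dialInNumber, mode: "conversation" },
    options,
  )) as { callId?: string }; // assumed response shape
  const callId = started.callId;
  if (!callId) {
    throw new Error("voicecall.start returned no callId");
  }

  // 2. Wait, then send the Meet PIN as DTMF (assumed ordering of dtmfDelayMs).
  let dtmfSent = false;
  if (p.dtmfSequence) {
    await sleep(p.dtmfDelayMs);
    await p.request("voicecall.dtmf", { callId, digits: p.dtmfSequence }, options);
    dtmfSent = true;
  }

  // 3. Wait for Meet to admit the phone participant, then speak the intro.
  let introSent = false;
  if (p.message) {
    await sleep(p.postDtmfSpeechDelayMs);
    await p.request("voicecall.speak", { callId, message: p.message }, options);
    introSent = true;
  }

  return { callId, dtmfSent, introSent };
}
```

The returned `{ callId, dtmfSent, introSent }` shape matches what the mocked `joinMeetViaVoiceCallGateway` returns in the plugin tests in this commit.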
