feat(tts): add Azure Speech TTS provider #51321
leonchui wants to merge 6 commits into openclaw:main
Conversation
Greptile Summary
This PR adds an Azure Speech TTS provider with SSML synthesis support and 400+ neural voice options. Key issues found:
Confidence Score: 1/5
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 30-32
Comment:
**`baseUrl` silently ignored in `listAzureVoices`**
`base` is computed on line 30 (`normalizeAzureBaseUrl(params.baseUrl)`) but is never used anywhere. The voice-list URL is then built unconditionally as `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`, completely discarding the caller-supplied `baseUrl`. Anyone who configures a custom `baseUrl` (e.g., for private endpoints or sovereign clouds) will find it silently ignored for voice listing, while it _is_ respected in `synthesize`. The fix is to derive the hostname from `baseUrl` when it is provided, falling back to the region-based URL:
```suggestion
const region = params.region || "eastus";
const url = params.baseUrl
? `${normalizeAzureBaseUrl(params.baseUrl)}/cognitiveservices/voices/list`
: `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 100-101
Comment:
**`region` ignored in `synthesize` — endpoint always defaults to `eastus`**
`region` is resolved on line 100 but is never referenced again. The endpoint is built entirely from `baseUrl` (line 113: `${baseUrl}/cognitiveservices/v1`), and `normalizeAzureBaseUrl` hard-codes `"https://eastus.tts.speech.microsoft.com"` as its default. This means that if a user sets `region: "westeurope"` (or any other region) but does not also set a matching `baseUrl`, all synthesis requests silently go to `eastus`, causing authentication errors or incorrect routing.
The region should be used to construct the base URL when no explicit `baseUrl` is provided:
```suggestion
const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
const baseUrl = req.config?.azure?.baseUrl
? normalizeAzureBaseUrl(req.config.azure.baseUrl)
: `https://${region}.tts.speech.microsoft.com`;
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 45-55
Comment:
**Deprecated-voice filter never fires — `Status` is dropped by `.map()`**
The `.map()` on lines 47–53 projects each `AzureVoiceListEntry` into a new object that does not include the `Status` field. The subsequent `.filter()` on line 54 then checks `voice.Status !== "Deprecated"`, but `voice.Status` is always `undefined` on the mapped object, so the condition is always `true` and deprecated voices are never removed. TypeScript will also flag `voice.Status` as an unknown property on the mapped type.
The filter needs to run on the original entries before (or during) the map:
```suggestion
return Array.isArray(voices)
? voices
.filter((voice) => voice.Status !== "Deprecated")
.map((voice) => ({
id: voice.ShortName?.trim() ?? "",
name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
category: voice.VoiceType?.trim() || undefined,
locale: voice.Locale?.trim() || undefined,
gender: voice.Gender?.trim() || undefined,
}))
.filter((voice) => voice.id.length > 0)
: [];
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 88-111
Comment:
**`config.azure` does not exist on `ResolvedTtsConfig` — all config values are always `undefined`**
`SpeechSynthesisRequest.config` and `SpeechProviderConfiguredContext.config` are both typed as `ResolvedTtsConfig` (see `src/tts/provider-types.ts` and `src/tts/tts.ts`). `ResolvedTtsConfig` in `src/tts/tts.ts` defines named sub-configs for `elevenlabs`, `openai`, and `edge`, but has **no `azure` field**. As a result, every access to `req.config?.azure`, `req.config.azure`, or `config.azure` in `isConfigured`, `listVoices`, and `synthesize` will always resolve to `undefined` at runtime, and TypeScript should report a compile-time error on those accesses.
The practical effect is that API key, region, voice, language, and output-format settings written in the JSON config file are completely ignored; only the `AZURE_SPEECH_API_KEY` / `AZURE_SPEECH_REGION` environment variables are reachable. The `voice` field has no env-var fallback at all, so synthesis will unconditionally throw `"Azure voice not configured"` for any user who relies on config-file settings.
To fix this properly:
1. Add an `azure` sub-object to `ResolvedTtsConfig` in `src/tts/tts.ts` (mirroring the pattern for `elevenlabs` / `openai` / `edge`).
2. Populate it in `resolveTtsConfig()` by reading `raw.azure.*` and resolving secrets with `normalizeResolvedSecretInputString`.
3. Update `SpeechSynthesisRequest` or the shared config type so the provider can access resolved values.
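Under the assumptions in the steps above, a minimal sketch of the missing `azure` sub-config and its resolver could look like the following. All names here are illustrative: `resolveSecret` stands in for the project's `normalizeResolvedSecretInputString`, and the field list mirrors the options described in this review, not the actual code.

```typescript
// Hypothetical shape for the azure entry on ResolvedTtsConfig (assumption).
type ResolvedAzureTtsConfig = {
  apiKey?: string;
  region?: string;
  voice?: string;
  lang?: string;
  outputFormat?: string;
};

type RawAzureTtsConfig = Partial<ResolvedAzureTtsConfig>;

// Stand-in for normalizeResolvedSecretInputString: trim and drop empty values.
function resolveSecret(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Sketch of the resolveTtsConfig() step that would populate the azure section.
function resolveAzureConfig(
  raw: RawAzureTtsConfig | undefined,
): ResolvedAzureTtsConfig | undefined {
  if (!raw) return undefined;
  return {
    apiKey: resolveSecret(raw.apiKey),
    region: raw.region?.trim() || undefined,
    voice: raw.voice?.trim() || undefined,
    lang: raw.lang?.trim() || undefined,
    outputFormat: raw.outputFormat?.trim() || undefined,
  };
}
```

With this in place, `req.config?.azure?.apiKey` and friends would carry the config-file values instead of always resolving to `undefined`.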
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 102-107
Comment:
**`overrides.azure` does not exist on `TtsDirectiveOverrides` — override fields are always `undefined`**
`SpeechSynthesisRequest.overrides` is typed as `TtsDirectiveOverrides` (defined in `src/tts/tts.ts`). That type includes `openai`, `elevenlabs`, and `microsoft` override bags, but has **no `azure` field**. Every access to `req.overrides?.azure?.voice`, `req.overrides?.azure?.lang`, and `req.overrides?.azure?.outputFormat` will always evaluate to `undefined` at runtime, and TypeScript should flag these as errors.
In practice this means that per-call directive overrides for voice, language, and output format can never be applied to the Azure provider. An `azure` bag should be added to `TtsDirectiveOverrides` and wired through `parseTtsDirectives` in the same way the existing providers' overrides are handled.
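Once an `azure` bag exists on `TtsDirectiveOverrides`, the precedence the provider code already expresses (override, then config, then default) could be factored as below. This is a sketch under the review's description; the type shapes and the helper name are assumptions, not the project's actual API.

```typescript
// Assumed shapes for the azure override/config bags (illustrative only).
type AzureOverrides = { voice?: string; lang?: string; outputFormat?: string };
type AzureConfig = { voice?: string; lang?: string; outputFormat?: string };

const DEFAULT_AZURE_OUTPUT_FORMAT = "audio-24khz-48kbitrate-mono-mp3";

// Per-call directive overrides win, then config-file values, then defaults.
function resolveAzureSynthesisParams(
  overrides: AzureOverrides | undefined,
  config: AzureConfig | undefined,
) {
  return {
    voice: overrides?.voice ?? config?.voice,
    lang: overrides?.lang ?? config?.lang,
    outputFormat:
      overrides?.outputFormat ?? config?.outputFormat ?? DEFAULT_AZURE_OUTPUT_FORMAT,
  };
}
```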
How can I resolve this? If you propose a fix, please make it concise.
Last reviewed commit: "feat(tts): add Azure..."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33b95fed9a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/tts/providers/azure.ts (outdated)
```
  ),
  synthesize: async (req) => {
    const apiKey =
      req.config.azure?.apiKey || process.env.AZURE_SPEECH_API_KEY;
```
Carry Azure settings through resolved TTS config
When users configure Azure through messages.tts.azure as introduced in this commit, this provider reads req.config.azure, but resolveTtsConfig in src/tts/tts.ts:274-343 never populates an azure section on ResolvedTtsConfig. In practice that means config-only Azure setups cannot work: isConfigured stays false, apiKey/voice/region are dropped, and synthesis only succeeds if the same values are also present in environment variables.
src/tts/providers/azure.ts (outdated)
```
const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
```
Build the synth endpoint from the selected Azure region
If a deployment sets AZURE_SPEECH_REGION (or later wires messages.tts.azure.region) to anything other than eastus and leaves baseUrl unset, synthesis still posts to East US. region is computed here, but normalizeAzureBaseUrl(undefined) on the next line hard-codes https://eastus.tts.speech.microsoft.com, so voice listing can hit one region while synthesis hits another and fails with region/resource mismatches.
```
const response = await fetch(endpoint, {
  method: "POST",
  headers: {
```
Honor configured timeouts on Azure TTS requests
This fetch never uses an AbortController, so neither the global messages.tts.timeoutMs nor the new messages.tts.azure.timeoutMs can stop a slow Azure call. In textToSpeech the provider loop awaits each synthesize sequentially (src/tts/tts.ts:701-729), so a hung Azure request can stall the whole reply path indefinitely instead of timing out and falling back to the next provider.
src/tts/providers/azure.ts (outdated)
```
const base = normalizeAzureBaseUrl(params.baseUrl);
const region = params.region || "eastus";
const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```
baseUrl silently ignored in listAzureVoices
src/tts/providers/azure.ts (outdated)
```
const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
```
region ignored in synthesize — endpoint always defaults to eastus
```
return Array.isArray(voices)
  ? voices
      .map((voice) => ({
        id: voice.ShortName?.trim() ?? "",
        name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
        category: voice.VoiceType?.trim() || undefined,
        locale: voice.Locale?.trim() || undefined,
        gender: voice.Gender?.trim() || undefined,
      }))
      .filter((voice) => voice.id.length > 0 && voice.Status !== "Deprecated")
  : [];
```
Deprecated-voice filter never fires — Status is dropped by .map()
```
isConfigured: ({ config }) =>
  Boolean(
    config.azure?.apiKey ||
    process.env.AZURE_SPEECH_API_KEY,
  ),
synthesize: async (req) => {
  const apiKey =
    req.config.azure?.apiKey || process.env.AZURE_SPEECH_API_KEY;
  if (!apiKey) {
    throw new Error("Azure Speech API key missing");
  }

  const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
  const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
  const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
  const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
  const outputFormat =
    req.overrides?.azure?.outputFormat ??
    req.config?.azure?.outputFormat ??
    DEFAULT_AZURE_OUTPUT_FORMAT;

  if (!voice) {
    throw new Error("Azure voice not configured");
  }
```
config.azure does not exist on ResolvedTtsConfig — all config values are always undefined
src/tts/providers/azure.ts (outdated)
```
const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
const outputFormat =
  req.overrides?.azure?.outputFormat ??
  req.config?.azure?.outputFormat ??
  DEFAULT_AZURE_OUTPUT_FORMAT;
```
overrides.azure does not exist on TtsDirectiveOverrides — override fields are always undefined
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e07d5c326
src/tts/providers/azure.ts (outdated)
```
const base = normalizeAzureBaseUrl(params.baseUrl);
const region = params.region || "eastus";
const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```
Honor azure.baseUrl when listing voices
If a deployment sets messages.tts.azure.baseUrl (or passes listSpeechVoices(..., baseUrl)) for a custom/private Azure Speech endpoint, this still calls https://${region}.tts.speech.microsoft.com/... because the computed base value is never used. That makes Azure voice discovery fail anywhere the new config must target a non-default host, even though this commit adds baseUrl support and src/tts/providers/azure.test.ts:110-123 expects it to work.
src/tts/providers/azure.ts (outdated)
```
const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
const outputFormat =
  req.overrides?.azure?.outputFormat ??
  req.config?.azure?.outputFormat ??
```
Wire Azure directive overrides into the parser
These Azure override reads are currently unreachable. parseTtsDirectives only populates openai, elevenlabs, and microsoft fields (src/tts/tts-core.ts:154-319), and TtsDirectiveOverrides has no azure section (src/tts/tts.ts:159-180). As a result, prompts like [[tts:provider=azure voice=...]] cannot change the Azure voice/lang/output format, so the new provider silently ignores the model-override path it appears to support here.
src/tts/providers/azure.ts (outdated)
```
outputFormat,
fileExtension: outputFormat.includes("mp3") ? ".mp3" : ".wav",
```
Infer Azure file extensions from the selected output format
Any non-MP3 Azure output format is surfaced as .wav here. textToSpeech writes the returned buffer using this extension (src/tts/tts.ts:662-663), so Azure Opus/WebM/PCM responses get saved with the wrong filename and downstream MIME/voice-note detection becomes incorrect. The bug shows up as soon as someone uses the newly added messages.tts.azure.outputFormat setting with anything other than an MP3 format.
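One way to address this, matching the mapping a later commit in this thread describes (mp3, wav, ogg, webm, default .mp3), is a small helper that inspects the Azure `X-Microsoft-OutputFormat` identifier. The helper name follows the commit message; treat the exact mapping as a sketch.

```typescript
// Map an Azure X-Microsoft-OutputFormat identifier to a file extension,
// instead of assuming every non-MP3 format is WAV.
function getFileExtension(outputFormat: string): string {
  const fmt = outputFormat.toLowerCase();
  if (fmt.includes("mp3")) return ".mp3";
  if (fmt.includes("webm")) return ".webm";   // e.g. webm-24khz-16bit-24kbps-mono-opus
  if (fmt.includes("ogg")) return ".ogg";     // e.g. ogg-24khz-16bit-mono-opus
  if (fmt.includes("riff") || fmt.includes("wav")) return ".wav"; // RIFF/PCM formats
  return ".mp3"; // conservative default for unrecognized formats
}
```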
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea9ffd2659
```
apiKey: normalizeResolvedSecretInputString({
  value: raw.azure?.apiKey,
  path: "messages.tts.azure.apiKey",
```
Auto-select Azure when it is the only configured API provider
Adding Azure config here does not make the new provider reachable from the existing default-provider path. When messages.tts.provider is unset and prefs are empty, getTtsProvider() still only auto-detects OpenAI and ElevenLabs (src/tts/tts.ts:490-506) because resolveTtsApiKey() returns nothing for Azure (src/tts/tts.ts:563-574). In the common setup where a user adds messages.tts.azure.* but does not also set messages.tts.provider, OpenClaw will silently keep using Microsoft instead of the newly configured Azure provider.
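A sketch of the auto-detection gap described here: a provider-selection helper that also recognizes Azure keys (from config or `AZURE_SPEECH_API_KEY`) so Azure-only setups can be auto-selected. The shapes, priority order, and helper name below are assumptions modeled on the review, not the actual `getTtsProvider()` / `resolveTtsApiKey()` code.

```typescript
// Assumed per-provider key bag (illustrative).
type ProviderKeys = { openai?: string; elevenlabs?: string; azure?: string };

// Pick a default provider when messages.tts.provider is unset.
function autoSelectProvider(
  keys: ProviderKeys,
  env: Record<string, string | undefined>,
): string | undefined {
  if (keys.openai) return "openai";
  if (keys.elevenlabs) return "elevenlabs";
  // The missing branch: recognize Azure from config or environment.
  if (keys.azure || env.AZURE_SPEECH_API_KEY) return "azure";
  return undefined;
}
```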
```
isConfigured: ({ config }) =>
  Boolean(
    (config as any)?.azure?.apiKey ||
    process.env.AZURE_SPEECH_API_KEY,
  ),
```
Treat Azure as configured only after a voice is set
This readiness check is too loose for Azure, because unlike the other providers there is no default voice to fall back to. resolveReadySpeechProvider() trusts isConfigured() before synthesizing (src/tts/tts.ts:620-640), but synthesize() then hard-fails with Azure voice not configured at src/tts/providers/azure.ts:117-119. As a result, a deployment with only an API key configured will show Azure as ready and may select it as the primary/fallback provider even though every TTS request will fail at runtime until messages.tts.azure.voice is added.
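A tightened readiness check per this comment could require both a key and a voice before reporting Azure as configured. This is a sketch: the config shape is an assumption, and `env` defaults to an empty record here for determinism (the real code would read `process.env`).

```typescript
// Azure has no default voice, so readiness needs apiKey AND voice.
function isAzureConfigured(
  config: { apiKey?: string; voice?: string } | undefined,
  env: Record<string, string | undefined> = {},
): boolean {
  const apiKey = config?.apiKey || env.AZURE_SPEECH_API_KEY;
  return Boolean(apiKey && config?.voice);
}
```

With this check, a key-only deployment is skipped by provider selection instead of failing every request with "Azure voice not configured".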
src/tts/providers/azure.ts (outdated)
```
    "X-Microsoft-OutputFormat": outputFormat,
  },
  body: ssml,
  signal: AbortSignal.timeout((req.config as any)?.azure?.timeoutMs ?? 30000),
```
Fall back to the global TTS timeout for Azure requests
Fresh evidence relative to the earlier timeout report: the new AbortSignal.timeout(...) now honors messages.tts.azure.timeoutMs, but it still bypasses the existing global messages.tts.timeoutMs. Every other hosted provider inherits req.config.timeoutMs, while this path hard-codes azure.timeoutMs ?? 30000, so deployments that only set the shared timeout will wait the full 30 seconds here. Because synthesizeSpeech() tries providers sequentially (src/tts/tts.ts:728-758), one slow Azure request can delay fallback much longer than the rest of the TTS stack.
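The fallback chain this comment asks for can be factored as a small helper: provider-specific timeout first, then the shared `messages.tts.timeoutMs`, then a hard default, with the result passed to `AbortSignal.timeout(...)`. The helper name is illustrative; the precedence order is taken from the comment.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;

// Prefer azure.timeoutMs, then the global messages.tts.timeoutMs, then the default,
// rather than hard-coding `azure.timeoutMs ?? 30000`.
function resolveAzureTimeoutMs(
  azureTimeoutMs?: number,
  globalTimeoutMs?: number,
): number {
  return azureTimeoutMs ?? globalTimeoutMs ?? DEFAULT_TIMEOUT_MS;
}

// Usage (illustrative):
//   fetch(endpoint, { signal: AbortSignal.timeout(resolveAzureTimeoutMs(azureMs, globalMs)) })
```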
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67a5b9b336
```
const outputFormat =
  azureOverride?.outputFormat ??
  (req.config as any)?.azure?.outputFormat ??
  DEFAULT_AZURE_OUTPUT_FORMAT;
```
Honor voice-note targets when choosing Azure output
Checked src/tts/tts.ts:729-751: OpenClaw passes target="voice-note" for Telegram/Feishu/WhatsApp, and the other hosted providers switch to Opus on that target. Azure ignores req.target here and always uses the configured/default MP3 format, so Feishu auto-TTS will be uploaded as a generic file instead of audio because extensions/feishu/src/media.ts:490-497 only treats .opus/.ogg as msgType: "audio". This means the new provider cannot deliver the expected voice-bubble/audio behavior on at least Feishu even when TTS is otherwise configured correctly.
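One possible shape for the fix, assuming the behavior the comment describes for other providers: when the target is a voice note and no format was explicitly configured, switch to an Opus-in-Ogg Azure format. The format identifiers are real Azure `X-Microsoft-OutputFormat` values; the selection logic and names are a sketch.

```typescript
const DEFAULT_AZURE_OUTPUT_FORMAT = "audio-24khz-48kbitrate-mono-mp3";
// Opus-in-Ogg so voice-note paths (e.g. Feishu's .opus/.ogg audio detection) work.
const AZURE_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus";

function resolveAzureOutputFormat(target?: string, configured?: string): string {
  if (configured) return configured; // explicit config/override always wins
  return target === "voice-note" ? AZURE_VOICE_NOTE_FORMAT : DEFAULT_AZURE_OUTPUT_FORMAT;
}
```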
```
id: "azure",
label: "Azure Speech",
aliases: ["azure-tts"],
```
Canonicalize the new `azure-tts` alias
The new alias is only registered in getSpeechProvider(). normalizeSpeechProviderId() still canonicalizes only edge, so a config or directive that sets provider: azure-tts leaves getTtsProvider() returning the alias string. From there resolveTtsProviderOrder() will try both azure-tts and azure, so one failing Azure request is retried a second time before any real fallback, and helpers like resolveTtsApiKey() do not recognize the alias at all. Since this commit advertises azure-tts as a provider alias, it should normalize to the canonical azure id like edge does for Microsoft.
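A minimal version of the requested canonicalization, sketched as a standalone function (the real `normalizeSpeechProviderId()` and its handling of `edge` are not shown here, so only the `azure-tts` mapping is illustrated):

```typescript
// Map the advertised alias onto the canonical provider id so fallback ordering
// and API-key resolution see a single "azure" entry.
function normalizeSpeechProviderId(id: string): string {
  const key = id.trim().toLowerCase();
  return key === "azure-tts" ? "azure" : key;
}
```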
- Add Azure TTS provider with SSML synthesis
- Support for 400+ neural voices including Cantonese (zh-HK)
- Config options: apiKey, region, voice, lang, outputFormat
- Environment variables: AZURE_SPEECH_API_KEY, AZURE_SPEECH_REGION
- Provider ID: 'azure' with alias 'azure-tts'
- Built-in voices: zh-HK-HiuMaanNeural, zh-HK-HiuGaaiNeural
- Test voice list mapping from Azure API response
- Test filtering of deprecated voices
- Test error handling for API failures
- Test custom baseUrl support
Fixed critical bugs identified by bot reviews:
1. baseUrl now used in listAzureVoices (was computed but unused)
2. region now used in synthesize endpoint construction
3. Deprecated-voice filter runs BEFORE map (Status field available)
4. Added azure to ResolvedTtsConfig type
5. Added azure to TtsDirectiveOverrides for directive support
6. Added DEFAULT_AZURE_OUTPUT_FORMAT constant
7. Added AbortController timeout for synthesize requests
8. Used type assertion for config.azure access (req.config as any)

All changes follow the suggested fixes from greptile-apps and chatgpt-codex-connector reviews.
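Fixes 1 and 2 can be sketched as URL construction that respects both `baseUrl` and `region`. The helper names mirror the review suggestions (`normalizeAzureBaseUrl`, the `/cognitiveservices` paths), but the exact shapes here are assumptions, not the PR's code:

```typescript
// Hypothetical sketch: strip trailing slashes from a caller-supplied base URL.
function normalizeAzureBaseUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Voice listing: prefer baseUrl (private endpoints, sovereign clouds),
// otherwise fall back to the region-based public endpoint.
function voiceListUrl(region?: string, baseUrl?: string): string {
  const r = region || "eastus";
  return baseUrl
    ? `${normalizeAzureBaseUrl(baseUrl)}/cognitiveservices/voices/list`
    : `https://${r}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
}

// Synthesis: same precedence, so region: "westeurope" no longer silently
// routes to eastus when no baseUrl is set.
function synthesizeEndpoint(region?: string, baseUrl?: string): string {
  const r = region || "eastus";
  const base = baseUrl
    ? normalizeAzureBaseUrl(baseUrl)
    : `https://${r}.tts.speech.microsoft.com`;
  return `${base}/cognitiveservices/v1`;
}
```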
Fixed all P2 issues identified by code review:
1. Added azure_voice directive support in parseTtsDirectives
   - Added 'azure_voice' and 'azurevoice' directive cases
   - Azure voice validation: accepts non-empty ShortName format
2. Fixed fileExtension mapping for non-MP3 Azure formats
   - Added getFileExtension() helper function
   - Maps: mp3 -> .mp3, wav -> .wav, ogg -> .ogg, webm -> .webm
   - Defaults to .mp3 for unknown formats
3. Added azure to TTS_PROVIDERS array
   - Now included in auto-selection priority list
4. Fixed Azure readiness check (isConfigured)
   - Now properly checks for both apiKey AND voice/lang
   - Uses ResolvedTtsConfig properly
5. Fixed timeout handling
   - Added DEFAULT_TIMEOUT_MS constant (30000)
   - Uses timeout from config, directive, or default
   - Properly passed to AbortSignal.timeout()
6. Added azure to resolveTtsApiKey function
   - Returns config.azure.apiKey || AZURE_SPEECH_API_KEY
7. Added azure to TtsDirectiveOverrides type
   - Supports voice, lang, outputFormat overrides
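The timeout wiring in item 5 can be sketched as follows. `DEFAULT_TIMEOUT_MS` and `AbortSignal.timeout()` are named in the list above; the `resolveTimeoutMs` helper and the request shape are assumptions for illustration:

```typescript
// Constant named in the commit message; precedence (directive override,
// then config, then default) is as described above.
const DEFAULT_TIMEOUT_MS = 30_000;

function resolveTimeoutMs(override?: number, configured?: number): number {
  return override ?? configured ?? DEFAULT_TIMEOUT_MS;
}

// Hypothetical fetch-based synthesis call; headers follow Azure's documented
// REST API, but this is a sketch rather than the PR's synthesize().
async function postSsml(
  url: string,
  ssml: string,
  apiKey: string,
  timeoutMs: number,
): Promise<Response> {
  return fetch(url, {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": apiKey,
      "Content-Type": "application/ssml+xml",
    },
    body: ssml,
    // Aborts the request if Azure does not respond within timeoutMs.
    signal: AbortSignal.timeout(timeoutMs),
  });
}
```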
Force-pushed from 67a5b9b to 5d199eb (Compare)
The azure type block was placed outside the interface, causing an 'export modifier cannot appear on a type member' error. azure is now properly inside the TtsDirectiveOverrides interface.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 455981919b
```typescript
function getFileExtension(outputFormat: string): string {
  if (outputFormat.includes("mp3")) return ".mp3";
  if (outputFormat.includes("wav")) return ".wav";
  if (outputFormat.includes("ogg")) return ".ogg";
  if (outputFormat.includes("webm")) return ".webm";
  return ".mp3"; // default to mp3
}
```
Handle Azure Opus/PCM formats before defaulting to .mp3
This helper still mislabels several supported Azure output formats. Azure's REST API documents `audio-...-opus`, `raw-*`, and `riff-*` values for `X-Microsoft-OutputFormat` (for example: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech), but anything that doesn't contain mp3, wav, ogg, or webm falls through to `.mp3` here. `textToSpeech()` then writes the returned bytes under that extension (src/tts/tts.ts:698), so choosing one of those formats causes downstream media/channel code to classify the file incorrectly.
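An extended sketch covering those formats before the MP3 fallback. The `riff-*` → `.wav` and `raw-*` → `.pcm` extension choices are assumptions (riff formats are WAV-wrapped; raw formats are headerless PCM/ALaw/MuLaw), not something the PR specifies:

```typescript
// Hypothetical extension of the helper above; behavior for mp3/wav/ogg/webm
// matches the original, with Opus and raw/riff handled first.
function getFileExtension(outputFormat: string): string {
  const fmt = outputFormat.toLowerCase();
  if (fmt.includes("opus")) {
    // Azure ships Opus in three containers: ogg-*, webm-*, and bare audio-*.
    if (fmt.startsWith("ogg")) return ".ogg";
    if (fmt.startsWith("webm")) return ".webm";
    return ".opus";
  }
  if (fmt.includes("mp3")) return ".mp3";
  if (fmt.startsWith("riff")) return ".wav"; // riff-* are WAV-wrapped PCM
  if (fmt.startsWith("raw")) return ".pcm"; // headerless PCM/ALaw/MuLaw (assumed extension)
  if (fmt.includes("wav")) return ".wav";
  if (fmt.includes("ogg")) return ".ogg";
  if (fmt.includes("webm")) return ".webm";
  return ".mp3"; // last-resort default, as in the original
}
```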
```typescript
if (!voice) {
  throw new Error(
    "Azure voice not configured. Set voice in config or use [[tts:voice=zh-HK-HiuMaanNeural]] directive",
  );
}
```
Point missing-voice errors at a directive Azure actually parses
When `messages.tts.azure.voice` is unset, this error tells users to retry with `[[tts:voice=...]]`, but `parseTtsDirectives()` still routes the generic `voice=` key into `overrides.openai` (src/tts/tts-core.ts:167-177). Azure only reads `req.overrides.azure.voice`, so following the suggested fix leaves the provider in the same failing state until the voice is set in config or the caller discovers the separate `azure_voice` directive.
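One possible resolution is provider-aware routing of the generic `voice=` key — a sketch only, since the real `parseTtsDirectives()` signature and overrides shape are not shown here and this helper is hypothetical:

```typescript
// Assumed overrides shape, simplified to the two providers discussed here.
type TtsOverrides = {
  openai?: { voice?: string };
  azure?: { voice?: string };
};

// Route the generic voice= directive to whichever provider is active, so
// [[tts:voice=...]] reaches Azure instead of silently targeting openai.
function applyVoiceDirective(
  overrides: TtsOverrides,
  provider: "openai" | "azure",
  voice: string,
): TtsOverrides {
  if (provider === "azure") {
    return { ...overrides, azure: { ...overrides.azure, voice } };
  }
  return { ...overrides, openai: { ...overrides.openai, voice } };
}
```

The alternative, cheaper fix is to change only the error message so it points users at the `azure_voice` directive Azure actually parses.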
Summary
Add Azure Speech TTS provider to OpenClaw with SSML synthesis support.
Problem
What Changed
- `src/tts/providers/azure.ts` - Azure TTS provider implementation
- `src/tts/provider-registry.ts` - register the Azure provider
- `src/config/types.tts.ts` - Azure config types
- `src/config/zod-schema.core.ts` - Azure validation schema

Features
Config Example
```json
{
  "messages": {
    "tts": {
      "provider": "azure",
      "azure": {
        "apiKey": "your-key",
        "region": "eastus",
        "voice": "zh-HK-HiuMaanNeural",
        "lang": "zh-HK"
      }
    }
  }
}
```

Human Verification