
feat(tts): add Azure Speech TTS provider #51321

Open
leonchui wants to merge 6 commits into openclaw:main from leonchui:feature/azure-tts

Conversation

@leonchui

Summary

Add Azure Speech TTS provider to OpenClaw with SSML synthesis support.

Problem

  • OpenClaw currently supports Edge TTS, ElevenLabs, and OpenAI TTS
  • Azure Speech has 400+ neural voices including Cantonese (zh-HK) which users have requested
  • Many users already have Azure accounts and API keys

What Changed

  • Added src/tts/providers/azure.ts - Azure TTS provider implementation
  • Updated src/tts/provider-registry.ts to register Azure provider
  • Updated src/config/types.tts.ts with Azure config types
  • Updated src/config/zod-schema.core.ts with Azure validation schema

Features

  • SSML-based synthesis for natural speech
  • 400+ neural voices including Cantonese (zh-HK-HiuMaanNeural)
  • Config options: apiKey, region, voice, lang, outputFormat
  • Environment variables: AZURE_SPEECH_API_KEY, AZURE_SPEECH_REGION
  • Provider ID: 'azure' with alias 'azure-tts'
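For context, a provider id plus alias like 'azure' / 'azure-tts' is typically registered by pointing both keys at the same instance. A minimal map-based sketch — `registerProvider` and the `TtsProvider` shape here are hypothetical, not OpenClaw's actual provider-registry.ts API:

```typescript
// Minimal map-based provider registry sketch; names are illustrative,
// not OpenClaw's actual provider-registry.ts API.
interface TtsProvider {
  id: string;
  synthesize: (text: string) => Promise<Uint8Array>;
}

const registry = new Map<string, TtsProvider>();

function registerProvider(provider: TtsProvider, aliases: string[] = []): void {
  registry.set(provider.id, provider);
  for (const alias of aliases) {
    registry.set(alias, provider); // alias resolves to the same instance
  }
}

const azureProvider: TtsProvider = {
  id: "azure",
  synthesize: async () => new Uint8Array(),
};

// Register under the canonical id plus the documented alias.
registerProvider(azureProvider, ["azure-tts"]);

console.log(registry.get("azure-tts") === registry.get("azure")); // true
```

Looking up either key returns the same provider, so both 'azure' and 'azure-tts' work in config.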

Config Example

```json
{
  "messages": {
    "tts": {
      "provider": "azure",
      "azure": {
        "apiKey": "your-key",
        "region": "eastus",
        "voice": "zh-HK-HiuMaanNeural",
        "lang": "zh-HK"
      }
    }
  }
}
```

Human Verification

  • Tested Azure TTS synthesis with Cantonese voice (zh-HK-HiuMaanNeural)
  • Verified API authentication works with East US region
  • Voice output plays correctly in Telegram

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR adds an Azure Speech TTS provider with SSML synthesis support and 400+ neural voice options. The config/schema wiring (types.tts.ts and zod-schema.core.ts) and the registry registration (provider-registry.ts) are correct, but the core provider implementation (src/tts/providers/azure.ts) contains several critical logic bugs that collectively prevent the provider from working with config-file settings and cause incorrect API routing.

Key issues found:

  • baseUrl silently ignored in listAzureVoices — the base variable is computed from normalizeAzureBaseUrl but never used; voice listing always hits https://{region}.tts.speech.microsoft.com regardless of a custom baseUrl.
  • region silently ignored in synthesize — the region variable is computed but never used in building the endpoint; synthesis always routes to the hard-coded eastus default when no explicit baseUrl is set.
  • Deprecated-voice filter is broken — .filter((v) => v.Status !== "Deprecated") runs after .map(), which drops the Status field; deprecated voices are never excluded.
  • config.azure is not part of ResolvedTtsConfig — the synthesize/isConfigured callbacks receive a ResolvedTtsConfig that has no azure sub-object, so all config-file values (API key, region, voice, etc.) are always undefined; only env vars are reachable. This means voice has no fallback and synthesis unconditionally throws for users relying on the config file.
  • overrides.azure is not part of TtsDirectiveOverrides — per-call directive overrides for voice, language, and output format can never be applied to the Azure provider.

Confidence Score: 1/5

  • Not safe to merge — the provider will silently ignore config-file settings and always route synthesis to the wrong region for most users.
  • Multiple critical logic bugs in the core provider file mean the feature does not work as described. Config-file API keys and voice settings are unreachable at runtime (missing azure field on ResolvedTtsConfig), region routing is broken (computed but unused variable), the deprecated-voice filter never fires, and directive overrides are also unreachable. These are not edge cases — they affect every user who follows the documented config example.
  • src/tts/providers/azure.ts requires significant fixes. src/tts/tts.ts also needs to be updated to add azure to ResolvedTtsConfig, resolveTtsConfig(), and TtsDirectiveOverrides.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 30-32

Comment:
**`baseUrl` silently ignored in `listAzureVoices`**

`base` is computed on line 30 (`normalizeAzureBaseUrl(params.baseUrl)`) but is never used anywhere. The voice-list URL is then built unconditionally as `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`, completely discarding the caller-supplied `baseUrl`. Anyone who configures a custom `baseUrl` (e.g., for private endpoints or sovereign clouds) will find it silently ignored for voice listing, while it _is_ respected in `synthesize`. The fix is to derive the hostname from `baseUrl` when it is provided, falling back to the region-based URL:

```suggestion
  const region = params.region || "eastus";
  const url = params.baseUrl
    ? `${normalizeAzureBaseUrl(params.baseUrl)}/cognitiveservices/voices/list`
    : `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 100-101

Comment:
**`region` ignored in `synthesize` — endpoint always defaults to `eastus`**

`region` is resolved on line 100 but is never referenced again. The endpoint is built entirely from `baseUrl` (line 113: `${baseUrl}/cognitiveservices/v1`), and `normalizeAzureBaseUrl` hard-codes `"https://eastus.tts.speech.microsoft.com"` as its default. This means that if a user sets `region: "westeurope"` (or any other region) but does not also set a matching `baseUrl`, all synthesis requests silently go to `eastus`, causing authentication errors or incorrect routing.

The region should be used to construct the base URL when no explicit `baseUrl` is provided:

```suggestion
      const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
      const baseUrl = req.config?.azure?.baseUrl
        ? normalizeAzureBaseUrl(req.config.azure.baseUrl)
        : `https://${region}.tts.speech.microsoft.com`;
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 45-55

Comment:
**Deprecated-voice filter never fires — `Status` is dropped by `.map()`**

The `.map()` on lines 47–53 projects each `AzureVoiceListEntry` into a new object that does not include the `Status` field. The subsequent `.filter()` on line 54 then checks `voice.Status !== "Deprecated"`, but `voice.Status` is always `undefined` on the mapped object, so the condition is always `true` and deprecated voices are never removed. TypeScript will also flag `voice.Status` as an unknown property on the mapped type.

The filter needs to run on the original entries before (or during) the map:

```suggestion
  return Array.isArray(voices)
    ? voices
        .filter((voice) => voice.Status !== "Deprecated")
        .map((voice) => ({
          id: voice.ShortName?.trim() ?? "",
          name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
          category: voice.VoiceType?.trim() || undefined,
          locale: voice.Locale?.trim() || undefined,
          gender: voice.Gender?.trim() || undefined,
        }))
        .filter((voice) => voice.id.length > 0)
    : [];
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 88-111

Comment:
**`config.azure` does not exist on `ResolvedTtsConfig` — all config values are always `undefined`**

`SpeechSynthesisRequest.config` and `SpeechProviderConfiguredContext.config` are both typed as `ResolvedTtsConfig` (see `src/tts/provider-types.ts` and `src/tts/tts.ts`). `ResolvedTtsConfig` in `src/tts/tts.ts` defines named sub-configs for `elevenlabs`, `openai`, and `edge`, but has **no `azure` field**. As a result, every access to `req.config?.azure`, `req.config.azure`, or `config.azure` in `isConfigured`, `listVoices`, and `synthesize` will always resolve to `undefined` at runtime, and TypeScript should report a compile-time error on those accesses.

The practical effect is that API key, region, voice, language, and output-format settings written in the JSON config file are completely ignored; only the `AZURE_SPEECH_API_KEY` / `AZURE_SPEECH_REGION` environment variables are reachable. The `voice` field has no env-var fallback at all, so synthesis will unconditionally throw `"Azure voice not configured"` for any user who relies on config-file settings.

To fix this properly:
1. Add an `azure` sub-object to `ResolvedTtsConfig` in `src/tts/tts.ts` (mirroring the pattern for `elevenlabs` / `openai` / `edge`).
2. Populate it in `resolveTtsConfig()` by reading `raw.azure.*` and resolving secrets with `normalizeResolvedSecretInputString`.
3. Update `SpeechSynthesisRequest` or the shared config type so the provider can access resolved values.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/tts/providers/azure.ts
Line: 102-107

Comment:
**`overrides.azure` does not exist on `TtsDirectiveOverrides` — override fields are always `undefined`**

`SpeechSynthesisRequest.overrides` is typed as `TtsDirectiveOverrides` (defined in `src/tts/tts.ts`). That type includes `openai`, `elevenlabs`, and `microsoft` override bags, but has **no `azure` field**. Every access to `req.overrides?.azure?.voice`, `req.overrides?.azure?.lang`, and `req.overrides?.azure?.outputFormat` will always evaluate to `undefined` at runtime, and TypeScript should flag these as errors.

In practice this means that per-call directive overrides for voice, language, and output format can never be applied to the Azure provider. An `azure` bag should be added to `TtsDirectiveOverrides` and wired through `parseTtsDirectives` in the same way the existing providers' overrides are handled.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "feat(tts): add Azure..."
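The two type-level gaps called out above (no azure field on ResolvedTtsConfig, no azure bag on TtsDirectiveOverrides) could be closed along these lines. This is a sketch only: the field names mirror the PR's config example, and it skips the secret resolution via normalizeResolvedSecretInputString that the real resolveTtsConfig() would need.

```typescript
// Sketch of the missing `azure` sub-objects; shapes are assumptions
// modeled on the PR's config example, not the actual src/tts/tts.ts types.
interface ResolvedAzureTtsConfig {
  apiKey?: string;
  region?: string;
  voice?: string;
  lang?: string;
  outputFormat?: string;
  baseUrl?: string;
}

interface ResolvedTtsConfig {
  // ...existing elevenlabs / openai / edge sub-configs...
  azure?: ResolvedAzureTtsConfig;
}

interface TtsDirectiveOverrides {
  // ...existing openai / elevenlabs / microsoft bags...
  azure?: Pick<ResolvedAzureTtsConfig, "voice" | "lang" | "outputFormat">;
}

// What resolveTtsConfig() would add: merge raw config with env-var
// fallbacks so config-file values are actually reachable at runtime.
function resolveAzureConfig(
  raw: { azure?: ResolvedAzureTtsConfig },
  env: Record<string, string | undefined> = {},
): ResolvedAzureTtsConfig {
  return {
    apiKey: raw.azure?.apiKey ?? env.AZURE_SPEECH_API_KEY,
    region: raw.azure?.region ?? env.AZURE_SPEECH_REGION ?? "eastus",
    voice: raw.azure?.voice,
    lang: raw.azure?.lang,
    outputFormat: raw.azure?.outputFormat,
    baseUrl: raw.azure?.baseUrl,
  };
}
```

With something like this in place, req.config?.azure?.voice in the provider resolves to the configured voice instead of always being undefined.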


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33b95fed9a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

```ts
  ),
  synthesize: async (req) => {
    const apiKey =
      req.config.azure?.apiKey || process.env.AZURE_SPEECH_API_KEY;
```

P1 Carry Azure settings through resolved TTS config

When users configure Azure through messages.tts.azure as introduced in this commit, this provider reads req.config.azure, but resolveTtsConfig in src/tts/tts.ts:274-343 never populates an azure section on ResolvedTtsConfig. In practice that means config-only Azure setups cannot work: isConfigured stays false, apiKey/voice/region are dropped, and synthesis only succeeds if the same values are also present in environment variables.

Useful? React with 👍 / 👎.

Comment on lines +100 to +101
```ts
const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
```

P1 Build the synth endpoint from the selected Azure region

If a deployment sets AZURE_SPEECH_REGION (or later wires messages.tts.azure.region) to anything other than eastus and leaves baseUrl unset, synthesis still posts to East US. region is computed here, but normalizeAzureBaseUrl(undefined) on the next line hard-codes https://eastus.tts.speech.microsoft.com, so voice listing can hit one region while synthesis hits another and fails with region/resource mismatches.

Useful? React with 👍 / 👎.

Comment on lines +116 to +118
```ts
const response = await fetch(endpoint, {
  method: "POST",
  headers: {
```

P2 Honor configured timeouts on Azure TTS requests

This fetch never uses an AbortController, so neither the global messages.tts.timeoutMs nor the new messages.tts.azure.timeoutMs can stop a slow Azure call. In textToSpeech the provider loop awaits each synthesize sequentially (src/tts/tts.ts:701-729), so a hung Azure request can stall the whole reply path indefinitely instead of timing out and falling back to the next provider.

Useful? React with 👍 / 👎.
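The pattern this comment asks for can be sketched generically: race the request against a timer, and (for fetch specifically) also pass an AbortSignal so the underlying request is actually cancelled. The helper name is illustrative, not existing OpenClaw code.

```typescript
// Generic timeout wrapper: reject if `promise` has not settled within
// `timeoutMs`. For fetch, additionally pass an AbortSignal so the HTTP
// request itself is cancelled, e.g.
//   fetch(url, { signal: AbortSignal.timeout(timeoutMs) })
// Names are illustrative, not OpenClaw's tts.ts code.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`TTS request timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Wrapping each provider's synthesize call this way lets a hung Azure request fail fast so the provider loop can fall back to the next provider.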

Comment on lines +30 to +32
```ts
const base = normalizeAzureBaseUrl(params.baseUrl);
const region = params.region || "eastus";
const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```
Contributor


P2 baseUrl silently ignored in listAzureVoices

base is computed on line 30 (normalizeAzureBaseUrl(params.baseUrl)) but is never used anywhere. The voice-list URL is then built unconditionally as https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list, completely discarding the caller-supplied baseUrl. Anyone who configures a custom baseUrl (e.g., for private endpoints or sovereign clouds) will find it silently ignored for voice listing, while it is respected in synthesize. The fix is to derive the hostname from baseUrl when it is provided, falling back to the region-based URL:

Suggested change

```diff
-const base = normalizeAzureBaseUrl(params.baseUrl);
-const region = params.region || "eastus";
-const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
+const region = params.region || "eastus";
+const url = params.baseUrl
+  ? `${normalizeAzureBaseUrl(params.baseUrl)}/cognitiveservices/voices/list`
+  : `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```

Comment on lines +100 to +101
```ts
const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
```
Contributor


P2 region ignored in synthesize — endpoint always defaults to eastus

region is resolved on line 100 but is never referenced again. The endpoint is built entirely from baseUrl (line 113: ${baseUrl}/cognitiveservices/v1), and normalizeAzureBaseUrl hard-codes "https://eastus.tts.speech.microsoft.com" as its default. This means that if a user sets region: "westeurope" (or any other region) but does not also set a matching baseUrl, all synthesis requests silently go to eastus, causing authentication errors or incorrect routing.

The region should be used to construct the base URL when no explicit baseUrl is provided:

Suggested change

```diff
-const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
-const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
+const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
+const baseUrl = req.config?.azure?.baseUrl
+  ? normalizeAzureBaseUrl(req.config.azure.baseUrl)
+  : `https://${region}.tts.speech.microsoft.com`;
```

Comment on lines +45 to +55
```ts
return Array.isArray(voices)
  ? voices
      .map((voice) => ({
        id: voice.ShortName?.trim() ?? "",
        name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
        category: voice.VoiceType?.trim() || undefined,
        locale: voice.Locale?.trim() || undefined,
        gender: voice.Gender?.trim() || undefined,
      }))
      .filter((voice) => voice.id.length > 0 && voice.Status !== "Deprecated")
  : [];
```
Contributor


P2 Deprecated-voice filter never fires — Status is dropped by .map()

The .map() on lines 47–53 projects each AzureVoiceListEntry into a new object that does not include the Status field. The subsequent .filter() on line 54 then checks voice.Status !== "Deprecated", but voice.Status is always undefined on the mapped object, so the condition is always true and deprecated voices are never removed. TypeScript will also flag voice.Status as an unknown property on the mapped type.

The filter needs to run on the original entries before (or during) the map:

Suggested change

```diff
-return Array.isArray(voices)
-  ? voices
-      .map((voice) => ({
-        id: voice.ShortName?.trim() ?? "",
-        name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
-        category: voice.VoiceType?.trim() || undefined,
-        locale: voice.Locale?.trim() || undefined,
-        gender: voice.Gender?.trim() || undefined,
-      }))
-      .filter((voice) => voice.id.length > 0 && voice.Status !== "Deprecated")
-  : [];
+return Array.isArray(voices)
+  ? voices
+      .filter((voice) => voice.Status !== "Deprecated")
+      .map((voice) => ({
+        id: voice.ShortName?.trim() ?? "",
+        name: voice.DisplayName?.trim() || voice.ShortName?.trim() || undefined,
+        category: voice.VoiceType?.trim() || undefined,
+        locale: voice.Locale?.trim() || undefined,
+        gender: voice.Gender?.trim() || undefined,
+      }))
+      .filter((voice) => voice.id.length > 0)
+  : [];
```

Comment on lines +88 to +111
```ts
isConfigured: ({ config }) =>
  Boolean(
    config.azure?.apiKey ||
      process.env.AZURE_SPEECH_API_KEY,
  ),
synthesize: async (req) => {
  const apiKey =
    req.config.azure?.apiKey || process.env.AZURE_SPEECH_API_KEY;
  if (!apiKey) {
    throw new Error("Azure Speech API key missing");
  }

  const region = req.config?.azure?.region || process.env.AZURE_SPEECH_REGION || "eastus";
  const baseUrl = normalizeAzureBaseUrl(req.config?.azure?.baseUrl);
  const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
  const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
  const outputFormat =
    req.overrides?.azure?.outputFormat ??
    req.config?.azure?.outputFormat ??
    DEFAULT_AZURE_OUTPUT_FORMAT;

  if (!voice) {
    throw new Error("Azure voice not configured");
  }
```
Contributor


P2 config.azure does not exist on ResolvedTtsConfig — all config values are always undefined

SpeechSynthesisRequest.config and SpeechProviderConfiguredContext.config are both typed as ResolvedTtsConfig (see src/tts/provider-types.ts and src/tts/tts.ts). ResolvedTtsConfig in src/tts/tts.ts defines named sub-configs for elevenlabs, openai, and edge, but has no azure field. As a result, every access to req.config?.azure, req.config.azure, or config.azure in isConfigured, listVoices, and synthesize will always resolve to undefined at runtime, and TypeScript should report a compile-time error on those accesses.

The practical effect is that API key, region, voice, language, and output-format settings written in the JSON config file are completely ignored; only the AZURE_SPEECH_API_KEY / AZURE_SPEECH_REGION environment variables are reachable. The voice field has no env-var fallback at all, so synthesis will unconditionally throw "Azure voice not configured" for any user who relies on config-file settings.

To fix this properly:

  1. Add an azure sub-object to ResolvedTtsConfig in src/tts/tts.ts (mirroring the pattern for elevenlabs / openai / edge).
  2. Populate it in resolveTtsConfig() by reading raw.azure.* and resolving secrets with normalizeResolvedSecretInputString.
  3. Update SpeechSynthesisRequest or the shared config type so the provider can access resolved values.

Comment on lines +102 to +107
```ts
const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
const outputFormat =
  req.overrides?.azure?.outputFormat ??
  req.config?.azure?.outputFormat ??
  DEFAULT_AZURE_OUTPUT_FORMAT;
```
Contributor


P2 overrides.azure does not exist on TtsDirectiveOverrides — override fields are always undefined

SpeechSynthesisRequest.overrides is typed as TtsDirectiveOverrides (defined in src/tts/tts.ts). That type includes openai, elevenlabs, and microsoft override bags, but has no azure field. Every access to req.overrides?.azure?.voice, req.overrides?.azure?.lang, and req.overrides?.azure?.outputFormat will always evaluate to undefined at runtime, and TypeScript should flag these as errors.

In practice this means that per-call directive overrides for voice, language, and output format can never be applied to the Azure provider. An azure bag should be added to TtsDirectiveOverrides and wired through parseTtsDirectives in the same way the existing providers' overrides are handled.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e07d5c326


Comment on lines +30 to +32
```ts
const base = normalizeAzureBaseUrl(params.baseUrl);
const region = params.region || "eastus";
const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
```

P2 Honor azure.baseUrl when listing voices

If a deployment sets messages.tts.azure.baseUrl (or passes listSpeechVoices(..., baseUrl)) for a custom/private Azure Speech endpoint, this still calls https://${region}.tts.speech.microsoft.com/... because the computed base value is never used. That makes Azure voice discovery fail anywhere the new config must target a non-default host, even though this commit adds baseUrl support and src/tts/providers/azure.test.ts:110-123 expects it to work.

Useful? React with 👍 / 👎.

Comment on lines +102 to +106
```ts
const voice = req.overrides?.azure?.voice ?? req.config?.azure?.voice;
const lang = req.overrides?.azure?.lang ?? req.config?.azure?.lang;
const outputFormat =
  req.overrides?.azure?.outputFormat ??
  req.config?.azure?.outputFormat ??
```

P2 Wire Azure directive overrides into the parser

These Azure override reads are currently unreachable. parseTtsDirectives only populates openai, elevenlabs, and microsoft fields (src/tts/tts-core.ts:154-319), and TtsDirectiveOverrides has no azure section (src/tts/tts.ts:159-180). As a result, prompts like [[tts:provider=azure voice=...]] cannot change the Azure voice/lang/output format, so the new provider silently ignores the model-override path it appears to support here.

Useful? React with 👍 / 👎.

Comment on lines +133 to +134
```ts
outputFormat,
fileExtension: outputFormat.includes("mp3") ? ".mp3" : ".wav",
```

P2 Infer Azure file extensions from the selected output format

Any non-MP3 Azure output format is surfaced as .wav here. textToSpeech writes the returned buffer using this extension (src/tts/tts.ts:662-663), so Azure Opus/WebM/PCM responses get saved with the wrong filename and downstream MIME/voice-note detection becomes incorrect. The bug shows up as soon as someone uses the newly added messages.tts.azure.outputFormat setting with anything other than an MP3 format.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea9ffd2659

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +343 to +345
```typescript
apiKey: normalizeResolvedSecretInputString({
  value: raw.azure?.apiKey,
  path: "messages.tts.azure.apiKey",
```

P2: Auto-select Azure when it is the only configured API provider

Adding Azure config here does not make the new provider reachable from the existing default-provider path. When messages.tts.provider is unset and prefs are empty, getTtsProvider() still only auto-detects OpenAI and ElevenLabs (src/tts/tts.ts:490-506) because resolveTtsApiKey() returns nothing for Azure (src/tts/tts.ts:563-574). In the common setup where a user adds messages.tts.azure.* but does not also set messages.tts.provider, OpenClaw will silently keep using Microsoft instead of the newly configured Azure provider.
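A sketch of the kind of `azure` branch the comment asks for in a `resolveTtsApiKey`-style helper, so auto-detection can see the provider. The config shape and helper name here are assumptions, not the real `src/tts/tts.ts` code.

```typescript
// Hedged sketch: resolving a provider API key from config first, then the
// provider's environment variable, including the new azure branch.
type TtsConfig = {
  openai?: { apiKey?: string };
  elevenlabs?: { apiKey?: string };
  azure?: { apiKey?: string };
};

function resolveApiKey(
  provider: "openai" | "elevenlabs" | "azure",
  config: TtsConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  switch (provider) {
    case "openai":
      return config.openai?.apiKey ?? env.OPENAI_API_KEY;
    case "elevenlabs":
      return config.elevenlabs?.apiKey ?? env.ELEVENLABS_API_KEY;
    case "azure":
      // Mirrors the env var this PR documents: AZURE_SPEECH_API_KEY.
      return config.azure?.apiKey ?? env.AZURE_SPEECH_API_KEY;
  }
}
```

With a branch like this, a default-provider scan that iterates providers and picks the first one with a resolvable key would pick up Azure when it is the only configured API provider.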


Comment on lines +92 to +96
```typescript
isConfigured: ({ config }) =>
  Boolean(
    (config as any)?.azure?.apiKey ||
    process.env.AZURE_SPEECH_API_KEY,
  ),
```

P2: Treat Azure as configured only after a voice is set

This readiness check is too loose for Azure, because unlike the other providers there is no default voice to fall back to. resolveReadySpeechProvider() trusts isConfigured() before synthesizing (src/tts/tts.ts:620-640), but synthesize() then hard-fails with Azure voice not configured at src/tts/providers/azure.ts:117-119. As a result, a deployment with only an API key configured will show Azure as ready and may select it as the primary/fallback provider even though every TTS request will fail at runtime until messages.tts.azure.voice is added.
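The stricter check the comment suggests could look like the sketch below: require both a resolvable key and a configured voice before reporting ready. Field names mirror this review thread, not necessarily the real `ResolvedTtsConfig`.

```typescript
// Sketch under the review's assumption: Azure has no default voice, so an
// API key alone is not enough for the provider to synthesize successfully.
function azureIsConfigured(
  config: { azure?: { apiKey?: string; voice?: string } },
  env: Record<string, string | undefined>,
): boolean {
  const hasKey = Boolean(config.azure?.apiKey ?? env.AZURE_SPEECH_API_KEY);
  const hasVoice = Boolean(config.azure?.voice);
  return hasKey && hasVoice;
}
```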


"X-Microsoft-OutputFormat": outputFormat,
},
body: ssml,
signal: AbortSignal.timeout((req.config as any)?.azure?.timeoutMs ?? 30000),

P2: Fall back to the global TTS timeout for Azure requests

Fresh evidence relative to the earlier timeout report: the new AbortSignal.timeout(...) now honors messages.tts.azure.timeoutMs, but it still bypasses the existing global messages.tts.timeoutMs. Every other hosted provider inherits req.config.timeoutMs, while this path hard-codes azure.timeoutMs ?? 30000, so deployments that only set the shared timeout will wait the full 30 seconds here. Because synthesizeSpeech() tries providers sequentially (src/tts/tts.ts:728-758), one slow Azure request can delay fallback much longer than the rest of the TTS stack.
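The suggested precedence is simple to express: provider-specific timeout first, then the shared `messages.tts.timeoutMs`, then the 30-second default. Field names follow this comment; the real config shape may differ.

```typescript
// Sketch of the three-level timeout fallback the review asks for.
const DEFAULT_TIMEOUT_MS = 30_000;

function resolveAzureTimeout(config: {
  timeoutMs?: number;
  azure?: { timeoutMs?: number };
}): number {
  return config.azure?.timeoutMs ?? config.timeoutMs ?? DEFAULT_TIMEOUT_MS;
}
```

The resulting value would then feed `AbortSignal.timeout(resolveAzureTimeout(req.config))`, so a deployment that only sets the shared timeout no longer waits the full 30 seconds before falling back.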



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67a5b9b336


Comment on lines +124 to +127
```typescript
const outputFormat =
  azureOverride?.outputFormat ??
  (req.config as any)?.azure?.outputFormat ??
  DEFAULT_AZURE_OUTPUT_FORMAT;
```

P2: Honor voice-note targets when choosing Azure output

Checked src/tts/tts.ts:729-751: OpenClaw passes target="voice-note" for Telegram/Feishu/WhatsApp, and the other hosted providers switch to Opus on that target. Azure ignores req.target here and always uses the configured/default MP3 format, so Feishu auto-TTS will be uploaded as a generic file instead of audio because extensions/feishu/src/media.ts:490-497 only treats .opus/.ogg as msgType: "audio". This means the new provider cannot deliver the expected voice-bubble/audio behavior on at least Feishu even when TTS is otherwise configured correctly.
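A minimal sketch of the target-aware selection the comment describes: switch to an Opus container when the target is a voice note, otherwise use the configured or default format. The format strings are assumed Azure `X-Microsoft-OutputFormat` values, and the real request shape in OpenClaw likely differs.

```typescript
// Illustrative only: voice-note targets get an Opus-in-Ogg format so channel
// code that keys on .opus/.ogg (e.g. Feishu msgType: "audio") works.
function pickAzureFormat(
  target: "file" | "voice-note",
  configured?: string,
): string {
  if (target === "voice-note") return "ogg-48khz-16bit-mono-opus";
  return configured ?? "audio-24khz-96kbitrate-mono-mp3";
}
```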


Comment on lines +84 to +86
id: "azure",
label: "Azure Speech",
aliases: ["azure-tts"],

P2: Canonicalize the new azure-tts alias

The new alias is only registered in getSpeechProvider(). normalizeSpeechProviderId() still canonicalizes only edge, so a config or directive that sets provider: azure-tts leaves getTtsProvider() returning the alias string. From there resolveTtsProviderOrder() will try both azure-tts and azure, so one failing Azure request is retried a second time before any real fallback, and helpers like resolveTtsApiKey() do not recognize the alias at all. Since this commit advertises azure-tts as a provider alias, it should normalize to the canonical azure id like edge does for Microsoft.
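Alias canonicalization can be a small lookup applied before any provider ordering or key resolution, as sketched below. The alias table mirrors this review (`edge` for Microsoft is the existing case); the real `normalizeSpeechProviderId` signature is assumed.

```typescript
// Sketch: map every advertised alias to its canonical provider id once,
// so downstream helpers only ever see canonical ids.
const PROVIDER_ALIASES: Record<string, string> = {
  "azure-tts": "azure",
  edge: "microsoft",
};

function normalizeProviderId(id: string): string {
  const key = id.trim().toLowerCase();
  return PROVIDER_ALIASES[key] ?? key;
}
```

With this in place, `provider: azure-tts` resolves to `azure` before `resolveTtsProviderOrder()` runs, so a failing Azure request is not retried under both names.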


Yobo added 4 commits March 21, 2026 09:45
- Add Azure TTS provider with SSML synthesis
- Support for 400+ neural voices including Cantonese (zh-HK)
- Config options: apiKey, region, voice, lang, outputFormat
- Environment variables: AZURE_SPEECH_API_KEY, AZURE_SPEECH_REGION
- Provider ID: 'azure' with alias 'azure-tts'
- Built-in voices: zh-HK-HiuMaanNeural, zh-HK-HiuGaaiNeural
- Test voice list mapping from Azure API response
- Test filtering of deprecated voices
- Test error handling for API failures
- Test custom baseUrl support
Fixed critical bugs identified by bot reviews:

1. baseUrl now used in listAzureVoices (was computed but unused)
2. region now used in synthesize endpoint construction
3. Deprecated-voice filter runs BEFORE map (Status field available)
4. Added azure to ResolvedTtsConfig type
5. Added azure to TtsDirectiveOverrides for directive support
6. Added DEFAULT_AZURE_OUTPUT_FORMAT constant
7. Added AbortController timeout for synthesize requests
8. Used type assertion for config.azure access (req.config as any)

All changes follow the suggested fixes from greptile-apps and chatgpt-codex-connector reviews.
Fixed all P2 issues identified by code review:

1. Added azure_voice directive support in parseTtsDirectives
   - Added 'azure_voice' and 'azurevoice' directive cases
   - Azure voice validation: accepts non-empty ShortName format

2. Fixed fileExtension mapping for non-MP3 Azure formats
   - Added getFileExtension() helper function
   - Maps: mp3->.mp3, wav->.wav, ogg->.ogg, webm->.webm
   - Default to .mp3 for unknown formats

3. Added azure to TTS_PROVIDERS array
   - Now included in auto-selection priority list

4. Fixed Azure readiness check (isConfigured)
   - Now properly checks for both apiKey AND voice/lang
   - Uses ResolvedTtsConfig properly

5. Fixed timeout handling
   - Added DEFAULT_TIMEOUT_MS constant (30000)
   - Uses timeout from config, directive, or default
   - Properly passed to AbortSignal.timeout()

6. Added azure to resolveTtsApiKey function
   - Returns config.azure.apiKey || AZURE_SPEECH_API_KEY

7. Added azure to TtsDirectiveOverrides type
   - Supports voice, lang, outputFormat overrides
@leonchui leonchui force-pushed the feature/azure-tts branch from 67a5b9b to 5d199eb on March 21, 2026 16:45
Yobo and others added 2 commits March 21, 2026 09:49
The azure type block was placed outside the interface, causing
'export modifier cannot appear on a type member' error.
Now azure is properly inside TtsDirectiveOverrides interface.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 455981919b


Comment on lines +26 to +31
```typescript
function getFileExtension(outputFormat: string): string {
  if (outputFormat.includes("mp3")) return ".mp3";
  if (outputFormat.includes("wav")) return ".wav";
  if (outputFormat.includes("ogg")) return ".ogg";
  if (outputFormat.includes("webm")) return ".webm";
  return ".mp3"; // default to mp3
```

P2: Handle Azure Opus/PCM formats before defaulting to .mp3

This helper still mislabels several supported Azure output formats. Azure's REST API documents audio-...-opus, raw-*, and riff-* values for X-Microsoft-OutputFormat (for example: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech), but anything that doesn't contain mp3, wav, ogg, or webm falls through to .mp3 here. textToSpeech() then writes the returned bytes under that extension (src/tts/tts.ts:698), so choosing one of those formats causes downstream media/channel code to classify the file incorrectly.
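A hedged sketch of a fuller mapper: check container prefixes (`webm-`, `ogg-`, `riff-`, `raw-`) before the generic substring checks, so Opus and PCM formats stop falling through to `.mp3`. The format-name patterns are taken from the Azure documentation the comment links to; the exact mapping (especially `.pcm` for raw output) is an assumption, not OpenClaw's decision.

```typescript
// Assumed Azure format-name patterns: audio-*-mp3, ogg-*-opus, webm-*-opus,
// riff-*-pcm (RIFF container = WAV), raw-* (headerless PCM/mulaw/alaw).
function azureFileExtension(outputFormat: string): string {
  const f = outputFormat.toLowerCase();
  if (f.startsWith("webm")) return ".webm";
  if (f.startsWith("ogg")) return ".ogg";
  if (f.startsWith("riff")) return ".wav"; // RIFF container is a WAV file
  if (f.startsWith("raw")) return ".pcm";  // headerless samples, assumed name
  if (f.includes("opus")) return ".opus";
  if (f.includes("mp3")) return ".mp3";
  return ".mp3"; // last-resort default
}
```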


Comment on lines +127 to +129
```typescript
if (!voice) {
  throw new Error(
    "Azure voice not configured. Set voice in config or use [[tts:voice=zh-HK-HiuMaanNeural]] directive",
```

P2: Point missing-voice errors at a directive Azure actually parses

When messages.tts.azure.voice is unset, this error tells users to retry with [[tts:voice=...]], but parseTtsDirectives() still routes the generic voice= key into overrides.openai (src/tts/tts-core.ts:167-177). Azure only reads req.overrides.azure.voice, so following the suggested fix leaves the provider in the same failing state until the voice is set in config or the caller discovers the separate azure_voice directive.

