
feat(tools): add speech-to-text voice input#1838

Merged
piorpua merged 3 commits into iOfficeAI:main from cdxiaodong:feat/stt-voice-input
Mar 29, 2026

@cdxiaodong
Member

Summary

  • add speech-to-text voice input with desktop and WebUI support
  • support configurable OpenAI Whisper and Deepgram Nova-2 providers in tool settings
  • include main-process transcription service, renderer voice input UI, logging, and regression tests

Changes

  • add shared speech-to-text config/types plus Electron IPC and WebUI /api/stt bridge
  • add renderer microphone button, recording/transcription flow, and transcript insertion for sendbox and guid
  • add provider settings, i18n strings, diagnostics, and test coverage for STT flows

Related Issue

Closes #331

Test Plan

  • Verify the microphone button only appears after enabling speech-to-text in settings
  • Verify OpenAI Whisper transcribes microphone input into the prompt box on desktop/WebUI
  • Verify Deepgram Nova-2 transcribes microphone input into the prompt box on desktop/WebUI
  • Run node .\node_modules\typescript\bin\tsc --noEmit
  • Run bun run package

@cdxiaodong cdxiaodong force-pushed the feat/stt-voice-input branch from 8047cef to 39c16e4 on March 29, 2026 05:37
@piorpua piorpua added the bot:reviewing Review in progress (mutex) label Mar 29, 2026
@piorpua
Contributor

piorpua commented Mar 29, 2026

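Code Review: feat(tools): add speech-to-text voice input (#1838)

Change Overview

This PR adds speech-to-text (STT) to the desktop app and the WebUI, with support for two providers: OpenAI Whisper and Deepgram Nova-2. The changes cover the main-process IPC bridge service, the WebUI /api/stt route, the renderer microphone button component (on the sendbox and guid pages), and full i18n strings and unit tests.


Approach Evaluation

Conclusion: ✅ The approach is sound

The design is cleanly layered: the main-process SpeechToTextService is exposed through the IPC bridge, the renderer wraps the recording and transcription logic in the useSpeechInput hook, and the WebUI reuses the same service through the /api/stt route. This matches the project's existing architectural boundaries (process/renderer/IPC). The desktop and WebUI branches are handled in one place, transcribeAudioBlob, without introducing extra coupling.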
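The single branching point described above could look roughly like the following. This is a minimal sketch, not the PR's code: the names `SttResult`, `SttTransports`, `selectTransport`, and `routeTranscription` are hypothetical, and both transports are injected so the branch can be exercised without Electron or a running web server.

```typescript
// Hypothetical result shape; the PR's real SpeechToTextResult may carry more fields.
interface SttResult {
  text: string;
}

type SttTransport = (audio: Uint8Array, mimeType: string) => Promise<SttResult>;

// Assumed transports: one wraps the Electron IPC bridge, the other POSTs to /api/stt.
interface SttTransports {
  viaIpc: SttTransport;
  viaHttp: SttTransport;
}

// The only place that knows whether we run on desktop or in the WebUI.
export function selectTransport(isDesktop: boolean, transports: SttTransports): SttTransport {
  return isDesktop ? transports.viaIpc : transports.viaHttp;
}

// Callers never branch themselves; they hand over the audio and await the text.
export async function routeTranscription(
  audio: Uint8Array,
  mimeType: string,
  isDesktop: boolean,
  transports: SttTransports,
): Promise<SttResult> {
  if (audio.byteLength === 0) {
    throw new Error('No audio captured');
  }
  return selectTransport(isDesktop, transports)(audio, mimeType);
}
```

Keeping the decision in one small function means the rest of the renderer code stays identical on both platforms, which is the low-coupling property the review credits.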


Issues

🔵 LOW: SpeechToTextService (main process) should use standalone functions instead of a static-only class

File: src/process/bridge/services/SpeechToTextService.ts, line 143

Problem code:

export class SpeechToTextService {
  static async transcribe(request: SpeechToTextRequest): Promise<SpeechToTextResult> { ... }
  private static async transcribeWithOpenAI(...) { ... }
  private static async transcribeWithDeepgram(...) { ... }
}

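Explanation: Oxlint reports no-extraneous-class: the class contains only static methods and should be replaced with standalone exported functions. Other bridge service files in the project also favor the standalone-function style.

Suggested fix: replace the class with module-level functions: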

export async function transcribeSpeechToText(request: SpeechToTextRequest): Promise<SpeechToTextResult> { ... }
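To illustrate the standalone-function style with per-provider helpers, here is a hedged sketch that only builds the HTTP request description for each provider. The names `ProviderRequest`, `buildOpenAIRequest`, `buildDeepgramRequest`, and `buildTranscriptionRequest` are hypothetical; the endpoints and header formats follow the providers' public HTTP APIs, and the PR's actual request code is not shown in this review.

```typescript
type SpeechToTextProvider = 'openai' | 'deepgram';

interface ProviderRequest {
  url: string;
  headers: Record<string, string>;
  // 'multipart': audio is sent as a form field; 'raw': audio bytes are the request body.
  bodyKind: 'multipart' | 'raw';
}

// OpenAI Whisper: multipart form upload with fields file=<audio> and model=whisper-1.
function buildOpenAIRequest(apiKey: string): ProviderRequest {
  return {
    url: 'https://api.openai.com/v1/audio/transcriptions',
    headers: { Authorization: `Bearer ${apiKey}` },
    bodyKind: 'multipart',
  };
}

// Deepgram Nova-2: raw audio body, model selected through the query string.
function buildDeepgramRequest(apiKey: string, mimeType: string): ProviderRequest {
  return {
    url: 'https://api.deepgram.com/v1/listen?model=nova-2',
    headers: { Authorization: `Token ${apiKey}`, 'Content-Type': mimeType },
    bodyKind: 'raw',
  };
}

// Module-level entry point: no class wrapper, private helpers stay unexported.
export function buildTranscriptionRequest(
  provider: SpeechToTextProvider,
  apiKey: string,
  mimeType: string,
): ProviderRequest {
  return provider === 'openai'
    ? buildOpenAIRequest(apiKey)
    : buildDeepgramRequest(apiKey, mimeType);
}
```

Unexported helpers give the same encapsulation as `private static` methods, which is why the lint rule treats the class wrapper as extraneous.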

🔵 LOW: normalizeAudioBuffer uses Array#sort(), which mutates the intermediate array (unicorn lint warning)

File: src/process/bridge/services/SpeechToTextService.ts, line 62

Problem code:

const orderedKeys = Object.keys(audioBuffer)
  .filter((key) => /^\d+$/.test(key))
  .sort((a, b) => Number(a) - Number(b));

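Explanation: Oxlint reports no-array-sort: although .filter() has already created a new array, .sort() still mutates that array in place. The project favors immutable patterns, so .toSorted() should be used instead.

Suggested fix: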

const orderedKeys = Object.keys(audioBuffer)
  .filter((key) => /^\d+$/.test(key))
  .toSorted((a, b) => Number(a) - Number(b));
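For context, a function built around this snippet might look like the sketch below. Only the key-ordering chain comes from the review; the input shape (`SerializedBuffer`, a byte array that arrived as a plain object keyed by index) and the conversion to `Uint8Array` are assumptions about what normalizeAudioBuffer does. `Array.prototype.toSorted` requires an ES2023 runtime (for example Node 20 or later).

```typescript
// Assumed input: a typed array that was serialized to { "0": 82, "1": 73, ... }.
type SerializedBuffer = Record<string, number>;

export function normalizeAudioBuffer(audioBuffer: SerializedBuffer): Uint8Array {
  const orderedKeys = Object.keys(audioBuffer)
    .filter((key) => /^\d+$/.test(key)) // drop non-index keys such as "length"
    .toSorted((a, b) => Number(a) - Number(b)); // numeric order, returns a new array
  return Uint8Array.from(orderedKeys.map((key) => audioBuffer[key]));
}

// toSorted leaves its receiver untouched, unlike sort.
const keys = ['10', '2', '1'];
const sortedKeys = keys.toSorted((a, b) => Number(a) - Number(b));
console.log(keys.join(','), '|', sortedKeys.join(','));
```

The numeric comparator matters here: a default string sort would place "10" before "2".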

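🔵 LOW: the event-name constant SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT is defined in two places

Files: src/renderer/components/chat/SpeechInputButton.tsx, line 24; src/renderer/components/settings/SettingsModal/contents/ToolsModalContent.tsx, line 40

Explanation: the same event-name string is defined as a private const in each of the two files, so a future rename could easily miss one of them. Consider extracting it to a shared location (such as src/renderer/services/SpeechToTextService.ts or a dedicated constants file) and exporting it for reuse.

Suggested fix: export it at the end of src/renderer/services/SpeechToTextService.ts (or from a separate speechEvents.ts):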

export const SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT = 'aionui:speech-to-text-config-changed';

Both consumers can then import it.
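A shared module along these lines would let both files depend on one definition. This is a sketch under assumptions: the helper names `notifySpeechConfigChanged` and `onSpeechConfigChanged` are hypothetical, and the event target is passed in as a parameter (in the renderer it would presumably be `window`), which the review does not specify.

```typescript
// Single source of truth for the event name (string taken from the review).
export const SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT = 'aionui:speech-to-text-config-changed';

// Settings side: announce that the speech-to-text config changed.
export function notifySpeechConfigChanged(target: EventTarget): void {
  target.dispatchEvent(new Event(SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT));
}

// Button side: subscribe to changes; the returned function unsubscribes.
export function onSpeechConfigChanged(target: EventTarget, handler: () => void): () => void {
  target.addEventListener(SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT, handler);
  return () => target.removeEventListener(SPEECH_TO_TEXT_CONFIG_CHANGED_EVENT, handler);
}
```

Wrapping dispatch and subscription in helpers goes one step beyond the review's suggestion: consumers then never spell the event name at all, so a rename touches a single line.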


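🔵 LOW: patch coverage is 42.15%, below the 50% baseline

Explanation: Codecov reports 42.15% patch coverage for this change, with 258 lines lacking test coverage, concentrated in:

  • src/renderer/services/SpeechToTextService.ts (12.76%)
  • src/renderer/hooks/system/useSpeechInput.ts (34.92%)
  • src/process/webserver/routes/apiRoutes.ts (0%, the new /api/stt route)

The core recording flow (the state transitions in startRecording/stopRecording and the transcribeFile file-upload path) and the WebUI route have no test coverage yet; adding tests in a follow-up is recommended.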
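One way to make the recording state transitions cheap to test is to express them as a pure function that a hook can call. The sketch below is purely illustrative: the state and action names are assumptions, and nothing in the review says useSpeechInput is structured this way.

```typescript
// Hypothetical states and actions for a record-then-transcribe flow.
type RecordingState = 'idle' | 'recording' | 'transcribing';
type RecordingAction = 'start' | 'stop' | 'done' | 'fail';

export function nextRecordingState(state: RecordingState, action: RecordingAction): RecordingState {
  if (state === 'idle' && action === 'start') return 'recording';
  if (state === 'recording' && action === 'stop') return 'transcribing';
  if (state === 'recording' && action === 'fail') return 'idle';
  if (state === 'transcribing' && (action === 'done' || action === 'fail')) return 'idle';
  return state; // actions that are invalid in the current state are ignored
}

// Example: a full happy-path cycle returns to idle.
const happyPath: RecordingAction[] = ['start', 'stop', 'done'];
const finalState = happyPath.reduce<RecordingState>(nextRecordingState, 'idle');
console.log(finalState);
```

Because the function has no dependency on MediaRecorder or React, each transition can be covered with a one-line assertion.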


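Summary

| # | Severity | File | Issue |
|---|----------|------|-------|
| 1 | 🔵 LOW | src/process/bridge/services/SpeechToTextService.ts:143 | Static-only class; should be standalone functions |
| 2 | 🔵 LOW | src/process/bridge/services/SpeechToTextService.ts:62 | .sort() mutates the intermediate array; use .toSorted() |
| 3 | 🔵 LOW | SpeechInputButton.tsx:24 / ToolsModalContent.tsx:40 | Event-name constant defined twice |
| 4 | 🔵 LOW | Multiple files | Patch coverage is 42.15%, below the 50% baseline |

Conclusion

Approved for merge: only LOW-severity issues were found and none are blocking. Overall code quality is good and the PR can be merged.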


This report was generated by the local pr-review skill with full project context and no truncation limits.

CONCLUSION: APPROVED
IS_CRITICAL_PATH: false
PR_NUMBER: 1838

@piorpua
Contributor

piorpua commented Mar 29, 2026

✅ Automatic review complete with no blocking issues; triggering auto-merge.

@piorpua piorpua merged commit fb7a373 into iOfficeAI:main Mar 29, 2026
14 checks passed
@piorpua piorpua added bot:done Auto-merged by bot and removed bot:reviewing Review in progress (mutex) labels Mar 29, 2026

Labels

bot:done Auto-merged by bot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Voice input for Prompting

2 participants