fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488)#1489
fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488)#14891 commit merged intomasterfrom
Conversation
…ref (#1488) Composer footer rendered two near-identical mic icons whose tooltips both said "Voice input" — push-to-talk dictation and hands-free voice mode were visually indistinguishable. Researched how ChatGPT/Claude/Gemini solve the same problem and adopt the industry convention. Changes: - btnVoiceMode now uses Lucide audio-lines (6 vertical bars), the universal voice-conversation glyph. Also registered in LI_PATHS. - Distinct localized tooltips: voice_dictate ("Dictate") and voice_mode_toggle ("Voice mode"), with active-state flips (voice_dictate_active "Stop dictation", voice_mode_toggle_active "Exit voice mode"). Legacy voice_toggle key removed (it resolved to "Voice input" in every locale and caused the duplicate-tooltip bug). - Voice mode is opt-in via Settings -> Preferences -> "Hands-free voice mode button" (default off). Dictation mic stays visible by default, unchanged. localStorage-backed; panels.js onchange calls window._applyVoiceModePref() so the button appears/disappears immediately without reload. - 17 regression tests pin: distinct titles, audio-lines glyph, all 4 new keys in all 9 locales, removal of stale voice_toggle, English labels match convention, pref gating (no unconditional display='' left in boot.js), Settings checkbox + i18n, panels.js wiring, active-state tooltip flips. Browser-verified on port 8789: default state shows 1 mic; enabling the pref makes the audio-waveform button appear live; tooltips read "Dictate" and "Voice mode" distinctly. Closes #1488
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, behavioural verification matches PR description)
Self-built fix for #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input" (push-to-talk dictation vs. turn-based hands-free voice mode). PR adopts the ChatGPT/Gemini convention: distinct icon + distinct localized tooltips + opt-in pref for the niche surface.
What this ships
- static/index.html:392-410:
btnMicgetsdata-i18n-title="voice_dictate"+ static fallback "Dictate";btnVoiceModeswaps to Lucideaudio-linesglyph (6 vertical bars) withdata-i18n-title="voice_mode_toggle"+ static "Voice mode". - static/index.html:788-794: new
#settingsVoiceModeEnabledcheckbox in Preferences pane. - static/icons.js:67:
audio-linesregistered inLI_PATHSforli('audio-lines')reuse. - static/i18n.js: legacy
voice_toggle: 'Voice input'removed across all 9 locales; 4 new composer keys + 2 new settings keys added; en/ja/ru fully translated, es/de/zh/zh-Hant/pt/ko ship with English fallback marked// TODO: translate(matches the codebase's existing convention). - static/boot.js:236-241:
_setRecording(on)flipsbtn.titlebetweent('voice_dictate_active')("Stop dictation") andt('voice_dictate')("Dictate"). - static/boot.js:432-450:
_voiceModePrefEnabled()+_applyVoiceModePref(), exposed onwindowfor live re-apply from the settings pane. - static/boot.js:662,679:
_activate()/_deactivate()now usevoice_mode_toggle_active/voice_mode_toggle(wasvoice_mode_active/voice_toggle— the latter was the bug). - static/panels.js:3086-3097: wires the checkbox: load from localStorage, persist on change, call
window._applyVoiceModePref()for live re-apply.
Traced against upstream hermes-agent
UI-only change. No agent-side surface; webui composer footer doesn't round-trip through hermes_cli/. Verified voice_toggle doesn't appear anywhere in the agent tarball — purely a webui i18n key. ✅
End-to-end trace
Default state (pref off):
- Page loads →
boot.js:445_applyVoiceModePref()readslocalStorage['hermes-voice-mode-button'](defaultsnull→false→display='none'). - Result: only
btnMicvisible in composer. Tooltip = "Dictate" (static + i18n keys agree). btnVoiceModeis in DOM butdisplay:none— no visual confusion.
User opens Settings → Preferences and clicks the new checkbox:
- panels.js:3093
voiceModeCb.onchangefires. localStorage.setItem('hermes-voice-mode-button','true').window._applyVoiceModePref()runs →modeBtn.style.display = ''.- Audio-waveform button appears immediately, no reload. Tooltip = "Voice mode".
Active-state tooltip flips:
- Dictation: boot.js:241 —
_setRecording(true)→ "Stop dictation";_setRecording(false)→ "Dictate". - Voice mode: boot.js:662
_activate()→ "Exit voice mode"; boot.js:679_deactivate()→ "Voice mode".
Security audit
- ✅ No XSS surface — all new strings are static i18n keys; no user-controlled HTML interpolation.
- ✅
localStoragereads only compared against literal'true'(no parsing of attacker-controlled JSON, noJSON.parse). - ✅
_voiceModePrefEnabledwraps localStorage access in try/catch — survives privacy-mode browsers wherelocalStoragethrows. - ✅ No new endpoints, no auth changes, no
config.yamlwrites. - ✅ SVG paths are static and embedded in HTML — no external
<image>refs, no<script>, nodata:URIs.
Edge-case trace
| Scenario | Expected | Actual |
|---|---|---|
| Default install (no localStorage entry) | only btnMic visible | ✅ pref reads null → false → display='none' |
| User enables pref → audio-waveform appears | live, no reload | ✅ _applyVoiceModePref() invoked from onchange |
| User disables pref while in voice-mode session | button disappears (still active until exit) | |
| Locale switch mid-dictation | tooltip flips to active-state | applyTranslations() re-applies data-i18n-title which is the idle key — minor inconsistency in active state, see Minor observations |
| Privacy-mode browser (localStorage throws) | default to off | ✅ try/catch in _voiceModePrefEnabled returns false |
Safari without audio-lines Lucide path |
no broken icon | ✅ SVG paths are inline in HTML, not loaded from LI_PATHS |
| Browser without SpeechRecognition | btnMic stays hidden | ✅ legacy display:none from boot.js mic-detection unchanged |
| Browser without TTS+STT | btnVoiceMode stays hidden | ✅ legacy `if(!modeBtn |
| 6 untranslated locales | English fallback shows | ✅ // TODO: translate matches existing convention |
Tests
tests/test_issue1488_composer_voice_buttons.py— 17/17 pass:TestComposerVoiceButtonHTML(4): distinct i18n keys per button, distinct static title fallbacks, audio-lines glyph (≥5 vertical-bar paths)TestComposerVoiceButtonI18n(4): legacyvoice_toggleremoved everywhere, all 4 new keys in all 9 locales, English convention matches ChatGPT/GeminiTestVoiceModePreferenceGate(5): localStorage-backed, hidden by default, settings checkbox + i18n keys present, panels.js wiring correctTestActiveStateTooltips(3):_setRecording/_activate/_deactivatereference correct keysTestAudioLinesIconRegistered(1):audio-linesinLI_PATHS
- Full suite: 3813 passed, 55 skipped, 3 xpassed, 0 failed in 18.23s on the PR branch (PR description claims 3866; counting drift consistent with prior batches).
- CI: 3.11 / 3.12 / 3.13 all green.
Other audit — things that are correct already
- ✅ No remaining
voice_togglereferences in any production code (verified via grep acrossstatic/andapi/). - ✅ Active-state tooltip flips at the right call sites (
_setRecordingfor dictation,_activate/_deactivatefor voice mode) — confirmed at boot.js:241, boot.js:662, boot.js:679. - ✅ The fix at boot.js:662 also corrects a pre-existing bug —
_activate()was settingmodeBtn.title=t('voice_mode_active'), butvoice_mode_activeis the toast label ("Voice mode on"), not a button tooltip. Now correctly usesvoice_mode_toggle_active("Exit voice mode"). - ✅ Cross-tool: webui-only; no agent or CLI surface touched.
- ✅ The
LI_PATHSregistration ofaudio-linesis purely for future reuse — the actual button uses inline SVG, so even ifLI_PATHSlookup fails for any reason, the visible button is unaffected.
Minor observations (non-blocking)
-
Locale switch mid-active-state. When the user changes language while dictation or voice mode is active, the i18n
applyTranslations()reappliesdata-i18n-titlefrom the static markup, which is the idle key (voice_dictate/voice_mode_toggle). The active-state title set by_setRecording(true)/_activate()would be overwritten back to the idle string until the next state transition. Cosmetic only — fixable later by storing the active-state key indata-i18n-title-activeand havingapplyTranslations()consultdataset.activeStateif applicable. Not a release blocker. -
Pref disabled while in voice mode session. If the user disables the pref while
_voiceModeActive=true,modeBtnhides viadisplay:nonebut the recognizer keeps running until the user can click the (now invisible) button. Probably worth either calling_deactivate()when the pref toggles off, or noting in the description that disabling the pref during an active session leaves it running. Edge case — a power user feature being disabled mid-flight. -
PR description test count off by ~53 (3866 claimed vs 3813 local). Consistent drift pattern from prior batches; not blocking.
-
PR description mentions both ChatGPT and Gemini as adopting the audio-lines convention; verified visually that this matches ChatGPT's voice-mode glyph. Good UX research.
Recommendation
✅ Approved. Clean fix for #1488 with a sensible default (off), a recoverable opt-in via Settings, and full i18n coverage where translations exist. The bug — duplicate "Voice input" tooltip making two distinct features indistinguishable — is closed at the source: the legacy voice_toggle key is gone, two new key pairs (idle + active) are in, and the audio-lines glyph gives users transferring from ChatGPT/Gemini the right visual cue. Pre-existing _activate() bug (using toast key as tooltip) also fixed in passing.
Parked at approval — ready for the release agent's merge/tag pipeline.
… + opt-in pref) (nesquena#1488)
Summary
Closes #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input." Reported by @AvidFuturist on Discord. Different features, indistinguishable UI. Adopts the industry convention (ChatGPT/Gemini) and gates voice mode behind a Preferences toggle so the default footer stays uncluttered.
Issue with full research and proposal: #1488
Three changes, in order
1. Distinct icon for voice mode (industry convention)
The
btnVoiceModeSVG now uses Lucide'saudio-linesglyph — six vertical bars of varying height, the universal "two-way voice conversation" icon. This matches what ChatGPT and Gemini use for the same feature, so users transferring from those products have correct intuition without reading the tooltip.The dictation mic (
btnMic) stays unchanged. Same convention now: mic = dictation, audio-waveform = voice mode.audio-linesis also registered instatic/icons.jsLI_PATHSfor any future reuse viali('audio-lines').2. Distinct, descriptive, localized tooltips
The legacy
voice_togglei18n key resolved to'Voice input'in every locale — that's why both buttons had the same tooltip. Removed and replaced with four new keys covering both buttons and both states:voice_dictatevoice_dictate_activevoice_mode_togglevoice_mode_toggle_activeActive-state variants flip on/off as the user engages each feature (
_setRecording(on)for dictation,_activate()/_deactivate()for voice mode), so the tooltip is honest about what the button will do next.All 9 locales updated. ja and ru got real translations; the other 6 (es, de, zh, zh-Hant, pt, ko) keep English fallback with
// TODO: translatecomments matching the codebase's existing pattern.3. Voice mode is opt-in via Settings → Preferences
@AvidFuturist's suggestion, and the right call. Most users only need plain dictation — surfacing the niche turn-based-conversation feature next to it created exactly the visual confusion this issue captures. New checkbox in Settings → Preferences:
localStorage['hermes-voice-mode-button'](no server round-trip, matches the existing TTS prefs pattern)panels.js's onchange handler callswindow._applyVoiceModePref()(exposed byboot.js), so the audio-waveform button appears/disappears in the composer footer immediately — no reload needed.The dictation mic stays visible by default, unchanged. Behavior parity with master for the broad-majority case (the user only sees plain dictation).
Behavioral verification
Browser-verified end-to-end on isolated port 8789:
btnMicvisible. Tooltip = "Dictate".btnVoiceModehidden.localStorage['hermes-voice-mode-button']='true'.btnVoiceModeappears immediately.Tests
17 new regression tests in
tests/test_issue1488_composer_voice_buttons.pycovering:data-i18n-titleattrs, audio-lines glyph (≥5 vertical-bar paths), no leftover mic-with-sparklesrectonbtnVoiceModevoice_toggleremoved everywhere, English label/dictate strings match convention_applyVoiceModePrefexposed,_voiceModePrefEnableddefined, no unconditionalmodeBtn.style.display='';left inboot.jspanels.jswires localStorage + live re-apply_setRecording,_activate,_deactivatereference the correct keysaudio-linesinLI_PATHSFull suite: 3866 passed + 17 new = 3883 collected. No regressions.
Files
static/index.htmlbtnVoiceModeSVG to audio-lines, update bothdata-i18n-titleattrs, add#settingsVoiceModeEnabledcheckbox in Preferences panestatic/icons.jsaudio-linesinLI_PATHSstatic/i18n.jsvoice_toggle; add 4 composer keys + 2 settings keys × 9 locales (with translations for en/ja/ru, TODO fallback for the other 6)static/boot.jsbtnVoiceModevisibility behind pref via_applyVoiceModePref(exposed on window); active-state tooltip flips for both buttonsstatic/panels.js#settingsVoiceModeEnabledcheckbox: load + persist + live re-applytests/test_issue1488_composer_voice_buttons.pyCHANGELOG.mdTotal: +398 / -22 across 7 files.
Out of scope
btnVoiceModeID (no user-visible value, would invalidate any linked docs)// TODO: translate, matching the codebase's existing convention. Translators will add real strings in follow-up PRs.Reviewer notes
voice_mode_activeandvoice_mode_offkeys (used byshowToast()after a successful state change) are kept as-is. They're toast labels, not button tooltips, so they aren't part of this fix.>=5to be tolerant of future minor stylistic edits) — replacing the icon with anything mic-shaped will failtest_voice_mode_uses_audio_lines_glyphand surface the regression at PR-time.Closes #1488