fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488) #1489

Merged
1 commit merged into master from fix/issue-1488-voice-buttons
May 2, 2026

@nesquena-hermes
Collaborator

Summary

Closes #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input." Reported by @AvidFuturist on Discord. Different features, indistinguishable UI. Adopts the industry convention (ChatGPT/Gemini) and gates voice mode behind a Preferences toggle so the default footer stays uncluttered.

Issue with full research and proposal: #1488

Three changes, in order

1. Distinct icon for voice mode (industry convention)

The btnVoiceMode SVG now uses Lucide's audio-lines glyph — six vertical bars of varying height, the universal "two-way voice conversation" icon. This matches what ChatGPT and Gemini use for the same feature, so users transferring from those products have correct intuition without reading the tooltip.

The dictation mic (btnMic) stays unchanged. Same convention now: mic = dictation, audio-waveform = voice mode.

audio-lines is also registered in static/icons.js LI_PATHS for any future reuse via li('audio-lines').

2. Distinct, descriptive, localized tooltips

The legacy voice_toggle i18n key resolved to 'Voice input' in every locale — that's why both buttons had the same tooltip. Removed and replaced with four new keys covering both buttons and both states:

| Key | English |
| --- | --- |
| voice_dictate | Dictate |
| voice_dictate_active | Stop dictation |
| voice_mode_toggle | Voice mode |
| voice_mode_toggle_active | Exit voice mode |

Active-state variants flip on/off as the user engages each feature (_setRecording(on) for dictation, _activate() / _deactivate() for voice mode), so the tooltip is honest about what the button will do next.
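The idle/active flip described above can be sketched as follows. This is a minimal illustration rather than the code in boot.js: the `t()` lookup, the `setTooltip` helper, and the plain-object buttons are hypothetical stand-ins so the snippet runs outside a browser, and only the English strings from the key table are included.

```javascript
// Hypothetical stand-in for the app's i18n lookup (English strings only).
const STRINGS = {
  voice_dictate: 'Dictate',
  voice_dictate_active: 'Stop dictation',
  voice_mode_toggle: 'Voice mode',
  voice_mode_toggle_active: 'Exit voice mode',
};
function t(key) {
  return Object.prototype.hasOwnProperty.call(STRINGS, key) ? STRINGS[key] : key;
}

// Plain objects stand in for the two DOM buttons so the sketch runs in Node.
const micBtn = { id: 'btnMic', title: t('voice_dictate') };
const modeBtn = { id: 'btnVoiceMode', title: t('voice_mode_toggle') };

// Pick the idle or active key so the tooltip names what the next click does.
function setTooltip(btn, idleKey, activeKey, active) {
  btn.title = t(active ? activeKey : idleKey);
}

// Assumed shapes of the three call sites named in the PR description.
function setRecording(on) {
  setTooltip(micBtn, 'voice_dictate', 'voice_dictate_active', on);
}
function activateVoiceMode() {
  setTooltip(modeBtn, 'voice_mode_toggle', 'voice_mode_toggle_active', true);
}
function deactivateVoiceMode() {
  setTooltip(modeBtn, 'voice_mode_toggle', 'voice_mode_toggle_active', false);
}

setRecording(true);
console.log(micBtn.title); // "Stop dictation"
```

Keeping the idle and active keys side by side at each call site makes it hard for the two buttons to drift back onto a shared string.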

All 9 locales updated. ja and ru got real translations; the other 6 (es, de, zh, zh-Hant, pt, ko) keep English fallback with // TODO: translate comments matching the codebase's existing pattern.

3. Voice mode is opt-in via Settings → Preferences

@AvidFuturist's suggestion, and the right call. Most users only need plain dictation — surfacing the niche turn-based-conversation feature next to it created exactly the visual confusion this issue captures. New checkbox in Settings → Preferences:

  • Label: "Hands-free voice mode button"
  • Description: "Show the voice-mode button (audio waveform) next to the dictation mic. Lets you speak naturally — Hermes auto-sends after a pause and reads replies aloud. Requires a browser that supports both speech recognition and TTS."
  • Default: off
  • Storage: localStorage['hermes-voice-mode-button'] (no server round-trip, matches the existing TTS prefs pattern)

panels.js's onchange handler calls window._applyVoiceModePref() (exposed by boot.js), so the audio-waveform button appears/disappears in the composer footer immediately — no reload needed.
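A rough sketch of that persist-then-re-apply flow, under stated assumptions: `makeStorage`, `createVoiceModePref`, and `onCheckboxChange` are hypothetical names, the storage object is an in-memory stand-in for `localStorage`, and writing `'false'` on uncheck is a guess (the real handler may remove the key instead).

```javascript
const PREF_KEY = 'hermes-voice-mode-button';

// Minimal in-memory stand-in for window.localStorage (string values only).
function makeStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => { data.set(k, String(v)); },
  };
}

// Build the pref helpers around injected storage and a button-like object.
function createVoiceModePref(storage, modeBtn) {
  function enabled() {
    // Only the literal string 'true' turns the button on.
    return storage.getItem(PREF_KEY) === 'true';
  }
  function apply() {
    // '' restores the stylesheet default; 'none' hides the button.
    modeBtn.style.display = enabled() ? '' : 'none';
  }
  // What a checkbox onchange handler would do: persist, then re-apply.
  function onCheckboxChange(checked) {
    storage.setItem(PREF_KEY, checked ? 'true' : 'false'); // assumption: 'false' on uncheck
    apply();
  }
  return { enabled, apply, onCheckboxChange };
}

const storage = makeStorage();
const modeBtn = { style: { display: '' } };
const pref = createVoiceModePref(storage, modeBtn);

pref.apply();                // page load with no stored value
console.log(JSON.stringify(modeBtn.style.display)); // "none"
pref.onCheckboxChange(true); // user ticks the checkbox
console.log(JSON.stringify(modeBtn.style.display)); // ""
```

Because `apply()` reads the stored value each time it runs, the same function serves both the initial page load and the live toggle from the settings pane.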

The dictation mic stays visible by default, unchanged. Behavior parity with master for the broad-majority case (the user only sees plain dictation).

Behavioral verification

Browser-verified end-to-end on isolated port 8789:

| Step | Observed |
| --- | --- |
| Default state (pref off) | Only btnMic visible. Tooltip = "Dictate". btnVoiceMode hidden. |
| Open Settings → Preferences | Checkbox renders with correct label + description. |
| Click checkbox | localStorage['hermes-voice-mode-button'] = 'true'. btnVoiceMode appears immediately. |
| Hover the new button | Tooltip = "Voice mode". |
| Two icons side-by-side | Visually distinct (mic shape vs. 6-bar waveform). |
| Vision-AI side check | Confirmed the two icons read as different controls, not duplicates. |

Tests

17 new regression tests in tests/test_issue1488_composer_voice_buttons.py covering:

  • HTML: distinct static titles, distinct data-i18n-title attrs, audio-lines glyph (≥5 vertical-bar paths), no leftover mic-with-sparkles rect on btnVoiceMode
  • i18n: all 4 new keys in all 9 locales, legacy voice_toggle removed everywhere, English label/dictate strings match convention
  • Pref gate: _applyVoiceModePref exposed, _voiceModePrefEnabled defined, no unconditional modeBtn.style.display=''; left in boot.js
  • Settings UI: checkbox + label/desc i18n keys present, panels.js wires localStorage + live re-apply
  • Active-state tooltips: _setRecording, _activate, _deactivate reference the correct keys
  • Icon registry: audio-lines in LI_PATHS
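To illustrate the kind of static check the HTML bullet describes (at least five vertical-bar paths and no leftover rect), here is a small sketch. It is not the repository's Python test: `countVerticalBars` and `looksLikeAudioLines` are hypothetical helpers, and the sample markup is illustrative path data in the style of a six-bar glyph, not copied from static/index.html.

```javascript
// Sample markup for illustration only; the real button lives in static/index.html.
const sampleButton = `
<button id="btnVoiceMode" title="Voice mode">
  <svg viewBox="0 0 24 24">
    <path d="M2 10v3"/><path d="M6 6v11"/><path d="M10 3v18"/>
    <path d="M14 8v7"/><path d="M18 5v13"/><path d="M22 10v3"/>
  </svg>
</button>`;

// Count <path> elements whose data is one move followed by one vertical segment.
function countVerticalBars(html) {
  const pathData = [...html.matchAll(/<path\b[^>]*\bd="([^"]*)"/g)].map((m) => m[1]);
  const verticalBar = /^M\s*-?[\d.]+[\s,]+-?[\d.]+\s*v\s*-?[\d.]+$/i;
  return pathData.filter((d) => verticalBar.test(d.trim())).length;
}

// Accept the glyph when it has enough bars and no <rect> (the old mic body).
function looksLikeAudioLines(html, minBars = 5) {
  return countVerticalBars(html) >= minBars && !/<rect\b/.test(html);
}

console.log(countVerticalBars(sampleButton)); // 6
```

Using a threshold below the actual bar count leaves room for small stylistic edits while still rejecting a mic-shaped icon.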

Full suite: 3866 passed + 17 new = 3883 collected. No regressions.

tests/test_issue1488_composer_voice_buttons.py ........... [ 64%]
tests/test_issue1488_composer_voice_buttons.py ......       [100%]
17 passed in 2.27s

Files

| File | Change |
| --- | --- |
| static/index.html | Swap btnVoiceMode SVG to audio-lines, update both data-i18n-title attrs, add #settingsVoiceModeEnabled checkbox in Preferences pane |
| static/icons.js | Register audio-lines in LI_PATHS |
| static/i18n.js | Remove voice_toggle; add 4 composer keys + 2 settings keys × 9 locales (with translations for en/ja/ru, TODO fallback for the other 6) |
| static/boot.js | Gate btnVoiceMode visibility behind pref via _applyVoiceModePref (exposed on window); active-state tooltip flips for both buttons |
| static/panels.js | Wire #settingsVoiceModeEnabled checkbox: load + persist + live re-apply |
| tests/test_issue1488_composer_voice_buttons.py | New, 17 tests, 270 LOC |
| CHANGELOG.md | Unreleased entry |

Total: +398 / -22 across 7 files.

Out of scope

  • Renaming the internal btnVoiceMode ID (no user-visible value, would invalidate any linked docs)
  • Mobile composer tweaks (existing responsive rules still apply correctly)
  • Translations for es/de/zh/zh-Hant/pt/ko — the 4 new composer keys + 2 settings keys ship with English fallback marked // TODO: translate, matching the codebase's existing convention. Translators will add real strings in follow-up PRs.

Reviewer notes

  • The voice_mode_active and voice_mode_off keys (used by showToast() after a successful state change) are kept as-is. They're toast labels, not button tooltips, so they aren't part of this fix.
  • The 6 vertical-bar pattern in the SVG is what's checked by the test (>=5 to be tolerant of future minor stylistic edits) — replacing the icon with anything mic-shaped will fail test_voice_mode_uses_audio_lines_glyph and surface the regression at PR-time.

Closes #1488

…ref (#1488)

Composer footer rendered two near-identical mic icons whose tooltips both
said "Voice input" — push-to-talk dictation and hands-free voice mode were
visually indistinguishable. Researched how ChatGPT/Claude/Gemini solve the
same problem and adopted the industry convention.

Changes:
- btnVoiceMode now uses Lucide audio-lines (6 vertical bars), the
  universal voice-conversation glyph. Also registered in LI_PATHS.
- Distinct localized tooltips: voice_dictate ("Dictate") and
  voice_mode_toggle ("Voice mode"), with active-state flips
  (voice_dictate_active "Stop dictation", voice_mode_toggle_active
  "Exit voice mode"). Legacy voice_toggle key removed (it resolved to
  "Voice input" in every locale and caused the duplicate-tooltip bug).
- Voice mode is opt-in via Settings -> Preferences ->
  "Hands-free voice mode button" (default off). Dictation mic stays
  visible by default, unchanged. localStorage-backed; panels.js onchange
  calls window._applyVoiceModePref() so the button appears/disappears
  immediately without reload.
- 17 regression tests pin: distinct titles, audio-lines glyph, all 4
  new keys in all 9 locales, removal of stale voice_toggle, English
  labels match convention, pref gating (no unconditional display=''
  left in boot.js), Settings checkbox + i18n, panels.js wiring,
  active-state tooltip flips.

Browser-verified on port 8789: default state shows 1 mic; enabling
the pref makes the audio-waveform button appear live; tooltips read
"Dictate" and "Voice mode" distinctly.

Closes #1488
Owner

@nesquena nesquena left a comment

Review — end-to-end ✅ (clean approve, behavioural verification matches PR description)

Self-built fix for #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input" (push-to-talk dictation vs. turn-based hands-free voice mode). PR adopts the ChatGPT/Gemini convention: distinct icon + distinct localized tooltips + opt-in pref for the niche surface.

What this ships

  • static/index.html:392-410: btnMic gets data-i18n-title="voice_dictate" + static fallback "Dictate"; btnVoiceMode swaps to Lucide audio-lines glyph (6 vertical bars) with data-i18n-title="voice_mode_toggle" + static "Voice mode".
  • static/index.html:788-794: new #settingsVoiceModeEnabled checkbox in Preferences pane.
  • static/icons.js:67: audio-lines registered in LI_PATHS for li('audio-lines') reuse.
  • static/i18n.js: legacy voice_toggle: 'Voice input' removed across all 9 locales; 4 new composer keys + 2 new settings keys added; en/ja/ru fully translated, es/de/zh/zh-Hant/pt/ko ship with English fallback marked // TODO: translate (matches the codebase's existing convention).
  • static/boot.js:236-241: _setRecording(on) flips btn.title between t('voice_dictate_active') ("Stop dictation") and t('voice_dictate') ("Dictate").
  • static/boot.js:432-450: _voiceModePrefEnabled() + _applyVoiceModePref(), exposed on window for live re-apply from the settings pane.
  • static/boot.js:662,679: _activate() / _deactivate() now use voice_mode_toggle_active / voice_mode_toggle (was voice_mode_active / voice_toggle — the latter was the bug).
  • static/panels.js:3086-3097: wires the checkbox: load from localStorage, persist on change, call window._applyVoiceModePref() for live re-apply.

Traced against upstream hermes-agent

UI-only change. No agent-side surface; webui composer footer doesn't round-trip through hermes_cli/. Verified voice_toggle doesn't appear anywhere in the agent tarball — purely a webui i18n key. ✅

End-to-end trace

Default state (pref off):

  1. Page loads → boot.js:445 _applyVoiceModePref() reads localStorage['hermes-voice-mode-button'] (defaults: null → false → display='none').
  2. Result: only btnMic visible in composer. Tooltip = "Dictate" (static + i18n keys agree).
  3. btnVoiceMode is in DOM but display:none — no visual confusion.

User opens Settings → Preferences and clicks the new checkbox:

  1. panels.js:3093 voiceModeCb.onchange fires.
  2. localStorage.setItem('hermes-voice-mode-button','true').
  3. window._applyVoiceModePref() runs → modeBtn.style.display = ''.
  4. Audio-waveform button appears immediately, no reload. Tooltip = "Voice mode".

Active-state tooltip flips:

  • Dictation: boot.js:241 _setRecording(true) → "Stop dictation"; _setRecording(false) → "Dictate".
  • Voice mode: boot.js:662 _activate() → "Exit voice mode"; boot.js:679 _deactivate() → "Voice mode".

Security audit

  • ✅ No XSS surface — all new strings are static i18n keys; no user-controlled HTML interpolation.
  • localStorage reads only compared against literal 'true' (no parsing of attacker-controlled JSON, no JSON.parse).
  • _voiceModePrefEnabled wraps localStorage access in try/catch — survives privacy-mode browsers where localStorage throws.
  • ✅ No new endpoints, no auth changes, no config.yaml writes.
  • ✅ SVG paths are static and embedded in HTML — no external <image> refs, no <script>, no data: URIs.
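The two localStorage points in the audit (literal comparison and try/catch) can be sketched together. This is an assumed shape, not the body of `_voiceModePrefEnabled`: the function takes a storage object as a parameter so it can be exercised without a browser, and `throwingStorage` and `fixedStorage` are hypothetical test doubles.

```javascript
const PREF_KEY = 'hermes-voice-mode-button';

// Returns true only for the exact string 'true'; any failure reads as "off".
function voiceModePrefEnabled(storage) {
  try {
    // No JSON.parse: the stored value is never interpreted, only compared.
    return storage.getItem(PREF_KEY) === 'true';
  } catch (err) {
    // Storage access can throw when the browser blocks it; fall back to off.
    return false;
  }
}

// Test doubles: one that always throws, one that returns a fixed value.
const throwingStorage = {
  getItem() { throw new Error('SecurityError: storage blocked'); },
};
const fixedStorage = (value) => ({ getItem: () => value });

console.log(voiceModePrefEnabled(throwingStorage));      // false
console.log(voiceModePrefEnabled(fixedStorage('true'))); // true
```

Comparing against a literal means a tampered or malformed value can only ever resolve to the default, hidden state.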

Edge-case trace

| Scenario | Expected | Actual |
| --- | --- | --- |
| Default install (no localStorage entry) | only btnMic visible | ✅ pref reads null → false → display='none' |
| User enables pref | audio-waveform appears live, no reload | _applyVoiceModePref() invoked from onchange |
| User disables pref while in voice-mode session | button disappears (still active until exit) | ⚠️ button hides via display:none; the active recognizer keeps running until user clicks where the button used to be; see Minor observations |
| Locale switch mid-dictation | tooltip flips to active-state | ⚠️ applyTranslations() re-applies data-i18n-title which is the idle key; minor inconsistency in active state, see Minor observations |
| Privacy-mode browser (localStorage throws) | default to off | ✅ try/catch in _voiceModePrefEnabled returns false |
| Safari without audio-lines Lucide path | no broken icon | ✅ SVG paths are inline in HTML, not loaded from LI_PATHS |
| Browser without SpeechRecognition | btnMic stays hidden | ✅ legacy display:none from boot.js mic-detection unchanged |
| Browser without TTS+STT | btnVoiceMode stays hidden | ✅ legacy `if(!modeBtn |
| 6 untranslated locales | English fallback shows | // TODO: translate matches existing convention |
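For the "user disables pref while in voice-mode session" row, one possible follow-up is to end the session before hiding the button. The sketch below is a suggestion under assumptions, not behaviour this PR ships: `createVoiceMode` and `applyPref` are hypothetical, and the session is reduced to a boolean flag.

```javascript
// Hypothetical voice-mode controller; a boolean stands in for the recognizer.
function createVoiceMode(modeBtn) {
  let active = false;
  return {
    isActive: () => active,
    activate() { active = true; },
    deactivate() { active = false; },
    // Apply the pref; if the button is about to be hidden mid-session,
    // stop the session first so nothing keeps running without a visible control.
    applyPref(enabled) {
      if (!enabled && active) this.deactivate();
      modeBtn.style.display = enabled ? '' : 'none';
    },
  };
}

const modeBtn = { style: { display: '' } };
const voiceMode = createVoiceMode(modeBtn);

voiceMode.applyPref(true);
voiceMode.activate();
voiceMode.applyPref(false); // pref switched off while a session is running
console.log(voiceMode.isActive()); // false
```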

Tests

  • tests/test_issue1488_composer_voice_buttons.py — 17/17 pass:
    • TestComposerVoiceButtonHTML (4): distinct i18n keys per button, distinct static title fallbacks, audio-lines glyph (≥5 vertical-bar paths)
    • TestComposerVoiceButtonI18n (4): legacy voice_toggle removed everywhere, all 4 new keys in all 9 locales, English convention matches ChatGPT/Gemini
    • TestVoiceModePreferenceGate (5): localStorage-backed, hidden by default, settings checkbox + i18n keys present, panels.js wiring correct
    • TestActiveStateTooltips (3): _setRecording/_activate/_deactivate reference correct keys
    • TestAudioLinesIconRegistered (1): audio-lines in LI_PATHS
  • Full suite: 3813 passed, 55 skipped, 3 xpassed, 0 failed in 18.23s on the PR branch (PR description claims 3866; counting drift consistent with prior batches).
  • CI: 3.11 / 3.12 / 3.13 all green.

Other audit — things that are correct already

  • ✅ No remaining voice_toggle references in any production code (verified via grep across static/ and api/).
  • ✅ Active-state tooltip flips at the right call sites (_setRecording for dictation, _activate/_deactivate for voice mode) — confirmed at boot.js:241, boot.js:662, boot.js:679.
  • ✅ The fix at boot.js:662 also corrects a pre-existing bug — _activate() was setting modeBtn.title=t('voice_mode_active'), but voice_mode_active is the toast label ("Voice mode on"), not a button tooltip. Now correctly uses voice_mode_toggle_active ("Exit voice mode").
  • ✅ Cross-tool: webui-only; no agent or CLI surface touched.
  • ✅ The LI_PATHS registration of audio-lines is purely for future reuse — the actual button uses inline SVG, so even if LI_PATHS lookup fails for any reason, the visible button is unaffected.

Minor observations (non-blocking)

  • Locale switch mid-active-state. When the user changes language while dictation or voice mode is active, the i18n applyTranslations() reapplies data-i18n-title from the static markup, which is the idle key (voice_dictate / voice_mode_toggle). The active-state title set by _setRecording(true) / _activate() would be overwritten back to the idle string until the next state transition. Cosmetic only — fixable later by storing the active-state key in data-i18n-title-active and having applyTranslations() consult dataset.activeState if applicable. Not a release blocker.

  • Pref disabled while in voice mode session. If the user disables the pref while _voiceModeActive=true, modeBtn hides via display:none but the recognizer keeps running, and the button that would stop it is no longer visible. Probably worth either calling _deactivate() when the pref toggles off, or noting in the description that disabling the pref during an active session leaves it running. This is an edge case: a power-user feature being disabled mid-flight.

  • PR description test count off by ~53 (3866 claimed vs 3813 local). Consistent drift pattern from prior batches; not blocking.

  • PR description mentions both ChatGPT and Gemini as adopting the audio-lines convention; verified visually that this matches ChatGPT's voice-mode glyph. Good UX research.
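The locale-switch observation suggests keeping the active-state key on the element so a re-translation can pick it up. A minimal sketch of that idea follows; everything here is hypothetical, including the `dataset` field names (mirroring data-i18n-title, data-i18n-title-active, and data-active-state), the simplified `applyTranslations`, and the placeholder `xx` locale, which is not a real translation.

```javascript
const LOCALES = {
  en: { voice_dictate: 'Dictate', voice_dictate_active: 'Stop dictation' },
  // Placeholder second locale so the switch is visible; not real translations.
  xx: { voice_dictate: '[xx] Dictate', voice_dictate_active: '[xx] Stop dictation' },
};
let currentLocale = 'en';
const t = (key) => LOCALES[currentLocale][key] ?? key;

// Plain object standing in for btnMic and its data-* attributes.
const micBtn = {
  title: '',
  dataset: {
    i18nTitle: 'voice_dictate',
    i18nTitleActive: 'voice_dictate_active',
    activeState: 'false',
  },
};

// Re-translate titles, preferring the active key while the feature is engaged.
function applyTranslations(elements) {
  for (const el of elements) {
    const useActive = el.dataset.activeState === 'true' && el.dataset.i18nTitleActive;
    el.title = t(useActive ? el.dataset.i18nTitleActive : el.dataset.i18nTitle);
  }
}

function setRecording(on) {
  micBtn.dataset.activeState = String(on);
  applyTranslations([micBtn]);
}

function setLocale(locale) {
  currentLocale = locale;
  applyTranslations([micBtn]);
}

setRecording(true);
setLocale('xx'); // language changed mid-dictation
console.log(micBtn.title); // "[xx] Stop dictation"
```

With the state stored on the element, the translation pass no longer needs to know which feature owns the button.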

Recommendation

Approved. Clean fix for #1488 with a sensible default (off), a recoverable opt-in via Settings, and full i18n coverage where translations exist. The bug — duplicate "Voice input" tooltip making two distinct features indistinguishable — is closed at the source: the legacy voice_toggle key is gone, two new key pairs (idle + active) are in, and the audio-lines glyph gives users transferring from ChatGPT/Gemini the right visual cue. Pre-existing _activate() bug (using toast key as tooltip) also fixed in passing.

Parked at approval — ready for the release agent's merge/tag pipeline.

Development

Successfully merging this pull request may close these issues.

bug(composer): duplicate-looking mic icons — dictate and voice-mode buttons share tooltip 'Voice input'