fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488) by nesquena-hermes · Pull Request #1489 · nesquena/hermes-webui

nesquena-hermes · 2026-05-02T22:17:15Z

Summary

Closes #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input." Reported by @AvidFuturist on Discord. Different features, indistinguishable UI. Adopts the industry convention (ChatGPT/Gemini) and gates voice mode behind a Preferences toggle so the default footer stays uncluttered.

Issue with full research and proposal: #1488

Three changes, in order

1. Distinct icon for voice mode (industry convention)

The btnVoiceMode SVG now uses Lucide's audio-lines glyph — six vertical bars of varying height, the universal "two-way voice conversation" icon. This matches what ChatGPT and Gemini use for the same feature, so users transferring from those products have correct intuition without reading the tooltip.

The dictation mic (btnMic) stays unchanged. Same convention now: mic = dictation, audio-waveform = voice mode.

audio-lines is also registered in static/icons.js LI_PATHS for any future reuse via li('audio-lines').

2. Distinct, descriptive, localized tooltips

The legacy voice_toggle i18n key resolved to 'Voice input' in every locale — that's why both buttons had the same tooltip. Removed and replaced with four new keys covering both buttons and both states:

Key	English
`voice_dictate`	Dictate
`voice_dictate_active`	Stop dictation
`voice_mode_toggle`	Voice mode
`voice_mode_toggle_active`	Exit voice mode

Active-state variants flip on/off as the user engages each feature (_setRecording(on) for dictation, _activate() / _deactivate() for voice mode), so the tooltip is honest about what the button will do next.

All 9 locales updated. ja and ru got real translations; the other 6 (es, de, zh, zh-Hant, pt, ko) keep English fallback with // TODO: translate comments matching the codebase's existing pattern.

3. Voice mode is opt-in via Settings → Preferences

@AvidFuturist's suggestion, and the right call. Most users only need plain dictation — surfacing the niche turn-based-conversation feature next to it created exactly the visual confusion this issue captures. New checkbox in Settings → Preferences:

Label: "Hands-free voice mode button"
Description: "Show the voice-mode button (audio waveform) next to the dictation mic. Lets you speak naturally — Hermes auto-sends after a pause and reads replies aloud. Requires a browser that supports both speech recognition and TTS."
Default: off
Storage: localStorage['hermes-voice-mode-button'] (no server round-trip, matches the existing TTS prefs pattern)

panels.js's onchange handler calls window._applyVoiceModePref() (exposed by boot.js), so the audio-waveform button appears/disappears in the composer footer immediately — no reload needed.

The dictation mic stays visible by default, unchanged. Behavior parity with master for the broad-majority case (the user only sees plain dictation).

Behavioral verification

Browser-verified end-to-end on isolated port 8789:

Step	Observed
Default state (pref off)	Only `btnMic` visible. Tooltip = "Dictate". `btnVoiceMode` hidden.
Open Settings → Preferences	Checkbox renders with correct label + description.
Click checkbox	`localStorage['hermes-voice-mode-button']` = `'true'`. `btnVoiceMode` appears immediately.
Hover the new button	Tooltip = "Voice mode".
Two icons side-by-side	Visually distinct (mic shape vs. 6-bar waveform).
Vision-AI side check	Confirmed the two icons read as different controls, not duplicates.

Tests

17 new regression tests in tests/test_issue1488_composer_voice_buttons.py covering:

HTML: distinct static titles, distinct data-i18n-title attrs, audio-lines glyph (≥5 vertical-bar paths), no leftover mic-with-sparkles rect on btnVoiceMode
i18n: all 4 new keys in all 9 locales, legacy voice_toggle removed everywhere, English label/dictate strings match convention
Pref gate: _applyVoiceModePref exposed, _voiceModePrefEnabled defined, no unconditional modeBtn.style.display=''; left in boot.js
Settings UI: checkbox + label/desc i18n keys present, panels.js wires localStorage + live re-apply
Active-state tooltips: _setRecording, _activate, _deactivate reference the correct keys
Icon registry: audio-lines in LI_PATHS

Full suite: 3866 passed + 17 new = 3883 collected. No regressions.

tests/test_issue1488_composer_voice_buttons.py ........... [ 64%]
tests/test_issue1488_composer_voice_buttons.py ......       [100%]
17 passed in 2.27s

Files

File	Change
`static/index.html`	Swap `btnVoiceMode` SVG to audio-lines, update both `data-i18n-title` attrs, add `#settingsVoiceModeEnabled` checkbox in Preferences pane
`static/icons.js`	Register `audio-lines` in `LI_PATHS`
`static/i18n.js`	Remove `voice_toggle`; add 4 composer keys + 2 settings keys × 9 locales (with translations for en/ja/ru, TODO fallback for the other 6)
`static/boot.js`	Gate `btnVoiceMode` visibility behind pref via `_applyVoiceModePref` (exposed on window); active-state tooltip flips for both buttons
`static/panels.js`	Wire `#settingsVoiceModeEnabled` checkbox: load + persist + live re-apply
`tests/test_issue1488_composer_voice_buttons.py`	New, 17 tests, 270 LOC
`CHANGELOG.md`	Unreleased entry

Total: +398 / -22 across 7 files.

Out of scope

Renaming the internal btnVoiceMode ID (no user-visible value, would invalidate any linked docs)
Mobile composer tweaks (existing responsive rules still apply correctly)
Translations for es/de/zh/zh-Hant/pt/ko — the 4 new composer keys + 2 settings keys ship with English fallback marked // TODO: translate, matching the codebase's existing convention. Translators will add real strings in follow-up PRs.

Reviewer notes

The voice_mode_active and voice_mode_off keys (used by showToast() after a successful state change) are kept as-is. They're toast labels, not button tooltips, so they aren't part of this fix.
The 6 vertical-bar pattern in the SVG is what's checked by the test (>=5 to be tolerant of future minor stylistic edits) — replacing the icon with anything mic-shaped will fail test_voice_mode_uses_audio_lines_glyph and surface the regression at PR-time.

Closes #1488

…ref (#1488) Composer footer rendered two near-identical mic icons whose tooltips both said "Voice input" — push-to-talk dictation and hands-free voice mode were visually indistinguishable. Researched how ChatGPT/Claude/Gemini solve the same problem and adopt the industry convention. Changes: - btnVoiceMode now uses Lucide audio-lines (6 vertical bars), the universal voice-conversation glyph. Also registered in LI_PATHS. - Distinct localized tooltips: voice_dictate ("Dictate") and voice_mode_toggle ("Voice mode"), with active-state flips (voice_dictate_active "Stop dictation", voice_mode_toggle_active "Exit voice mode"). Legacy voice_toggle key removed (it resolved to "Voice input" in every locale and caused the duplicate-tooltip bug). - Voice mode is opt-in via Settings -> Preferences -> "Hands-free voice mode button" (default off). Dictation mic stays visible by default, unchanged. localStorage-backed; panels.js onchange calls window._applyVoiceModePref() so the button appears/disappears immediately without reload. - 17 regression tests pin: distinct titles, audio-lines glyph, all 4 new keys in all 9 locales, removal of stale voice_toggle, English labels match convention, pref gating (no unconditional display='' left in boot.js), Settings checkbox + i18n, panels.js wiring, active-state tooltip flips. Browser-verified on port 8789: default state shows 1 mic; enabling the pref makes the audio-waveform button appear live; tooltips read "Dictate" and "Voice mode" distinctly. Closes #1488

nesquena

Review — end-to-end ✅ (clean approve, behavioural verification matches PR description)

Self-built fix for #1488 — composer footer rendered two near-identical mic icons whose tooltips both said "Voice input" (push-to-talk dictation vs. turn-based hands-free voice mode). PR adopts the ChatGPT/Gemini convention: distinct icon + distinct localized tooltips + opt-in pref for the niche surface.

What this ships

static/index.html:392-410: btnMic gets data-i18n-title="voice_dictate" + static fallback "Dictate"; btnVoiceMode swaps to Lucide audio-lines glyph (6 vertical bars) with data-i18n-title="voice_mode_toggle" + static "Voice mode".
static/index.html:788-794: new #settingsVoiceModeEnabled checkbox in Preferences pane.
static/icons.js:67: audio-lines registered in LI_PATHS for li('audio-lines') reuse.
static/i18n.js: legacy voice_toggle: 'Voice input' removed across all 9 locales; 4 new composer keys + 2 new settings keys added; en/ja/ru fully translated, es/de/zh/zh-Hant/pt/ko ship with English fallback marked // TODO: translate (matches the codebase's existing convention).
static/boot.js:236-241: _setRecording(on) flips btn.title between t('voice_dictate_active') ("Stop dictation") and t('voice_dictate') ("Dictate").
static/boot.js:432-450: _voiceModePrefEnabled() + _applyVoiceModePref(), exposed on window for live re-apply from the settings pane.
static/boot.js:662,679: _activate() / _deactivate() now use voice_mode_toggle_active / voice_mode_toggle (was voice_mode_active / voice_toggle — the latter was the bug).
static/panels.js:3086-3097: wires the checkbox: load from localStorage, persist on change, call window._applyVoiceModePref() for live re-apply.

Traced against upstream hermes-agent

UI-only change. No agent-side surface; webui composer footer doesn't round-trip through hermes_cli/. Verified voice_toggle doesn't appear anywhere in the agent tarball — purely a webui i18n key. ✅

End-to-end trace

Default state (pref off):

Page loads → boot.js:445 _applyVoiceModePref() reads localStorage['hermes-voice-mode-button'] (defaults null → false → display='none').
Result: only btnMic visible in composer. Tooltip = "Dictate" (static + i18n keys agree).
btnVoiceMode is in DOM but display:none — no visual confusion.

User opens Settings → Preferences and clicks the new checkbox:

panels.js:3093 voiceModeCb.onchange fires.
localStorage.setItem('hermes-voice-mode-button','true').
window._applyVoiceModePref() runs → modeBtn.style.display = ''.
Audio-waveform button appears immediately, no reload. Tooltip = "Voice mode".

Active-state tooltip flips:

Dictation: boot.js:241 — _setRecording(true) → "Stop dictation"; _setRecording(false) → "Dictate".
Voice mode: boot.js:662 _activate() → "Exit voice mode"; boot.js:679 _deactivate() → "Voice mode".

Security audit

✅ No XSS surface — all new strings are static i18n keys; no user-controlled HTML interpolation.
✅ localStorage reads only compared against literal 'true' (no parsing of attacker-controlled JSON, no JSON.parse).
✅ _voiceModePrefEnabled wraps localStorage access in try/catch — survives privacy-mode browsers where localStorage throws.
✅ No new endpoints, no auth changes, no config.yaml writes.
✅ SVG paths are static and embedded in HTML — no external <image> refs, no <script>, no data: URIs.

Edge-case trace

Scenario	Expected	Actual
Default install (no localStorage entry)	only btnMic visible	✅ pref reads `null` → `false` → `display='none'`
User enables pref → audio-waveform appears	live, no reload	✅ `_applyVoiceModePref()` invoked from onchange
User disables pref while in voice-mode session	button disappears (still active until exit)	⚠️ button hides via display:none; the active recognizer keeps running until user clicks where the button used to be — see Minor observations
Locale switch mid-dictation	tooltip flips to active-state	⚠️ `applyTranslations()` re-applies `data-i18n-title` which is the idle key — minor inconsistency in active state, see Minor observations
Privacy-mode browser (localStorage throws)	default to off	✅ try/catch in `_voiceModePrefEnabled` returns false
Safari without `audio-lines` Lucide path	no broken icon	✅ SVG paths are inline in HTML, not loaded from `LI_PATHS`
Browser without SpeechRecognition	btnMic stays hidden	✅ legacy `display:none` from `boot.js` mic-detection unchanged
Browser without TTS+STT	btnVoiceMode stays hidden	✅ legacy `if(!modeBtn
6 untranslated locales	English fallback shows	✅ `// TODO: translate` matches existing convention

Tests

tests/test_issue1488_composer_voice_buttons.py — 17/17 pass:
- TestComposerVoiceButtonHTML (4): distinct i18n keys per button, distinct static title fallbacks, audio-lines glyph (≥5 vertical-bar paths)
- TestComposerVoiceButtonI18n (4): legacy voice_toggle removed everywhere, all 4 new keys in all 9 locales, English convention matches ChatGPT/Gemini
- TestVoiceModePreferenceGate (5): localStorage-backed, hidden by default, settings checkbox + i18n keys present, panels.js wiring correct
- TestActiveStateTooltips (3): _setRecording/_activate/_deactivate reference correct keys
- TestAudioLinesIconRegistered (1): audio-lines in LI_PATHS
Full suite: 3813 passed, 55 skipped, 3 xpassed, 0 failed in 18.23s on the PR branch (PR description claims 3866; counting drift consistent with prior batches).
CI: 3.11 / 3.12 / 3.13 all green.

Other audit — things that are correct already

✅ No remaining voice_toggle references in any production code (verified via grep across static/ and api/).
✅ Active-state tooltip flips at the right call sites (_setRecording for dictation, _activate/_deactivate for voice mode) — confirmed at boot.js:241, boot.js:662, boot.js:679.
✅ The fix at boot.js:662 also corrects a pre-existing bug — _activate() was setting modeBtn.title=t('voice_mode_active'), but voice_mode_active is the toast label ("Voice mode on"), not a button tooltip. Now correctly uses voice_mode_toggle_active ("Exit voice mode").
✅ Cross-tool: webui-only; no agent or CLI surface touched.
✅ The LI_PATHS registration of audio-lines is purely for future reuse — the actual button uses inline SVG, so even if LI_PATHS lookup fails for any reason, the visible button is unaffected.

Minor observations (non-blocking)

Locale switch mid-active-state. When the user changes language while dictation or voice mode is active, the i18n applyTranslations() reapplies data-i18n-title from the static markup, which is the idle key (voice_dictate / voice_mode_toggle). The active-state title set by _setRecording(true) / _activate() would be overwritten back to the idle string until the next state transition. Cosmetic only — fixable later by storing the active-state key in data-i18n-title-active and having applyTranslations() consult dataset.activeState if applicable. Not a release blocker.
Pref disabled while in voice mode session. If the user disables the pref while _voiceModeActive=true, modeBtn hides via display:none but the recognizer keeps running until the user can click the (now invisible) button. Probably worth either calling _deactivate() when the pref toggles off, or noting in the description that disabling the pref during an active session leaves it running. Edge case — a power user feature being disabled mid-flight.
PR description test count off by ~53 (3866 claimed vs 3813 local). Consistent drift pattern from prior batches; not blocking.
PR description mentions both ChatGPT and Gemini as adopting the audio-lines convention; verified visually that this matches ChatGPT's voice-mode glyph. Good UX research.

Recommendation

✅ Approved. Clean fix for #1488 with a sensible default (off), a recoverable opt-in via Settings, and full i18n coverage where translations exist. The bug — duplicate "Voice input" tooltip making two distinct features indistinguishable — is closed at the source: the legacy voice_toggle key is gone, two new key pairs (idle + active) are in, and the audio-lines glyph gives users transferring from ChatGPT/Gemini the right visual cue. Pre-existing _activate() bug (using toast key as tooltip) also fixed in passing.

Parked at approval — ready for the release agent's merge/tag pipeline.

… + opt-in pref) (nesquena#1488)

nesquena approved these changes May 2, 2026

View reviewed changes

This was referenced May 2, 2026

v0.50.271 — Composer voice buttons UX (#1488) #1490

Merged

Composer voice buttons: pref-toggle-off-mid-session + locale-switch active-state polish (v0.50.271 follow-ups) #1491

Closed

nesquena-hermes closed this pull request by merging all changes into master in 7fddc33 May 2, 2026

nesquena-hermes deleted the fix/issue-1488-voice-buttons branch May 2, 2026 22:38

praxstack pushed a commit to praxstack/nesquena-hermes-webui that referenced this pull request May 2, 2026

Stage 271: PR nesquena#1489 — composer voice buttons (icon + tooltips…

6b68f14

… + opt-in pref) (nesquena#1488)

nesquena mentioned this pull request May 3, 2026

fix: add sidebar cancel for running sessions #1493

Merged

franksong2702 mentioned this pull request May 3, 2026

fix: voice-mode pref toggle-off now stops the recognizer (#1491) #1518

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488)#1489

fix(composer): distinct voice-mode icon, descriptive labels, opt-in pref (#1488)#1489
1 commit merged intomasterfrom
fix/issue-1488-voice-buttons

nesquena-hermes commented May 2, 2026

Uh oh!

nesquena left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nesquena-hermes commented May 2, 2026

Summary

Three changes, in order

1. Distinct icon for voice mode (industry convention)

2. Distinct, descriptive, localized tooltips

3. Voice mode is opt-in via Settings → Preferences

Behavioral verification

Tests

Files

Out of scope

Reviewer notes

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (clean approve, behavioural verification matches PR description)

What this ships

Traced against upstream hermes-agent

End-to-end trace

Security audit

Edge-case trace

Tests

Other audit — things that are correct already

Minor observations (non-blocking)

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants