Merged
Removed three filler words from the transcription filter that are actual words in European languages:

- 'um' - Portuguese/Spanish indefinite article meaning 'a/an' (masculine)
  Example: Portuguese 'Isto é um teste' (This is a test)
  Removing this was breaking Portuguese transcriptions
- 'ha' - Spanish/Italian/Norwegian/Swedish auxiliary verb meaning 'has/have'
  Example: Spanish 'Él ha comido' (He has eaten)
  This is a very common verb form in Romance and Scandinavian languages
- 'eh' - Italian interjection and Canadian English discourse marker
  While less critical, this can appear in legitimate Italian speech

The remaining filler words are primarily English vocalized hesitations that don't conflict with common words in other European languages.

Updated tests to use 'uhm' instead of 'um' where needed.

Fixes #941
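To make the failure mode concrete, here is a minimal sketch of a whole-word filler filter. The function name `strip_fillers` and the word lists are hypothetical and are not taken from the project's text.rs; the point is only to show how a language-blind list that contains 'um' or 'ha' drops real words from Portuguese and Spanish sentences.

```rust
/// Removes every whitespace-separated token that matches a filler word,
/// ignoring case and any punctuation around the token.
/// Hypothetical helper for illustration, not the project's implementation.
fn strip_fillers(text: &str, fillers: &[&str]) -> String {
    text.split_whitespace()
        .filter(|token| {
            // Normalize "Uhm," to "uhm" before comparing against the list.
            let bare = token
                .trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase();
            // Keep the token only if it is not a filler word.
            !fillers.iter().any(|f| *f == bare.as_str())
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With 'um' in the list, `strip_fillers("Isto é um teste", &["um", "uh"])` loses the article, while a list built around 'uhm' leaves the sentence intact and still removes the English hesitation.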
thukabjj added a commit to thukabjj/Handy that referenced this pull request on Mar 7, 2026
Upstream changes:
- Language-aware filler word removal (cjpais#971)
- Portable mode NSIS installer (cjpais#807)
- Italian translation fix (cjpais#973)
- Bun setup for Windows ARM64 (cjpais#965)
- tauri-plugin-dialog upgraded to 2.6
- Linux install docs and AppImage troubleshooting (cjpais#951)

Conflict resolution:
- package.json: take upstream dialog 2.6
- Cargo.toml: take upstream dialog 2.6, keep fork deps (symphonia, keyring)
- text.rs: take upstream's language-aware filter_transcription_output, keep fork test
- settings/mod.rs: keep fork fields + add upstream custom_filler_words
- bun.lock: take upstream version
fxbenard pushed a commit to fxbenard/Parler that referenced this pull request on Mar 11, 2026
* Remove filler words that conflict with European languages

  Removed three filler words from the transcription filter that are actual words in European languages:

  - 'um' - Portuguese/Spanish indefinite article meaning 'a/an' (masculine)
    Example: Portuguese 'Isto é um teste' (This is a test)
    Removing this was breaking Portuguese transcriptions
  - 'ha' - Spanish/Italian/Norwegian/Swedish auxiliary verb meaning 'has/have'
    Example: Spanish 'Él ha comido' (He has eaten)
    This is a very common verb form in Romance and Scandinavian languages
  - 'eh' - Italian interjection and Canadian English discourse marker
    While less critical, this can appear in legitimate Italian speech

  The remaining filler words are primarily English vocalized hesitations that don't conflict with common words in other European languages.

  Updated tests to use 'uhm' instead of 'um' where needed.

  Fixes cjpais#941

* customize list

---------

Co-authored-by: Alexander Yastrebov <[email protected]>
where "ha" = "has")
custom_filler_words setting for users who want to override the defaults (set to [] to disable filtering entirely)

Addresses the concerns raised in #943. Rather than just removing filler words that conflict with European languages (which regresses English), this takes the language-aware approach discussed in that thread.
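The override behaviour can be sketched as a small resolution function. This is an illustration under assumptions: the name `effective_fillers` is hypothetical, and modelling the setting as an optional list (absent means "use the language defaults", an empty list means "filter nothing", a non-empty list means "use exactly these words") is one plausible reading of the description, not the confirmed shape of the project's settings struct.

```rust
/// Resolves which filler words the filter should use.
/// `custom` stands in for the `custom_filler_words` setting (assumed shape):
///   None              -> fall back to the language defaults
///   Some(empty slice) -> filtering disabled, nothing is removed
///   Some(words)       -> use exactly the user's words
fn effective_fillers(custom: Option<&[String]>, defaults: &[&str]) -> Vec<String> {
    match custom {
        // No override configured: copy the defaults for the current language.
        None => defaults.iter().map(|w| (*w).to_owned()).collect(),
        // Override configured: an empty list naturally yields "remove nothing".
        Some(words) => words.to_vec(),
    }
}
```

If the setting were stored as an `Option<Vec<String>>`, it could be passed in with `setting.as_deref()`. Treating the empty list as an ordinary override keeps "disable filtering" from needing a separate flag.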
Known limitation: custom_filler_words is a power-user setting with no UI. It can only be changed by editing settings_store.json directly. Changes to the custom_filler_words field require an app restart to take effect.

The setting has three states:
The filler word list is selected based on the app language setting (the UI language), not the transcription language. For unknown or unsupported languages, a conservative fallback list is used that avoids removing ambiguous words like "um", "eh", and "ha".
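The selection rule can be sketched as a lookup keyed on the app language. The function name `fillers_for_language` and both word lists are hypothetical; the PR's actual lists are not shown here. The only properties taken from the description are that English may treat "um", "eh", and "ha" as fillers, and that the fallback for unknown languages leaves them out.

```rust
/// Illustrative English list; includes the ambiguous words (assumed contents).
const ENGLISH_FILLERS: &[&str] = &["uh", "uhm", "hmm", "um", "eh", "ha"];

/// Conservative fallback: omits "um", "eh", and "ha", which are real words
/// in several European languages (assumed contents).
const FALLBACK_FILLERS: &[&str] = &["uh", "uhm", "hmm"];

/// Picks a filler list from the app (UI) language code, e.g. "en" or "pt-BR".
/// Hypothetical helper, not the project's implementation.
fn fillers_for_language(app_language: &str) -> &'static [&'static str] {
    // Compare only the primary subtag so "en-US" and "en" behave the same.
    let primary = app_language
        .split(|c: char| c == '-' || c == '_')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase();
    match primary.as_str() {
        "en" => ENGLISH_FILLERS,
        // Unknown or unsupported languages get the conservative list.
        _ => FALLBACK_FILLERS,
    }
}
```

Because the key is the UI language rather than the transcription language, a user with an English UI who dictates in Portuguese would still get the English list in this sketch.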