Error when transcribing to Portuguese #940
Hello everyone, I'm new here. I've started experimenting with Handy and I'm liking it very much. I use it in both English and Portuguese (my main language), and I've noticed a rather annoying issue in Portuguese: it doesn't pick up the word "um" (which translates to "a" in English). Here's an example: use Google Translate to translate the phrase "this is a test" into Portuguese. You'll get "Isto é um teste" as the result. Play that audio from your phone, for example, while Handy transcribes on your computer, and the result will be "Isto é teste". I'm not sure whether this is a bug, which is why I'm asking here in case someone can point me in the right direction. I've tested this with both the Parakeet and Whisper models. Thanks for your hard work. BR,
Replies: 5 comments 2 replies
Hi, I think it might be due to the automatic filler-word removal (#589).
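To illustrate this hypothesis, here is a minimal Rust sketch (Handy's backend lives under `src-tauri`) of a case-insensitive, language-agnostic filler filter that uses the `FILLER_WORDS` list quoted later in this thread. The function `remove_filler_words` and its simple punctuation trimming are assumptions for illustration, not Handy's actual implementation. Because the list doesn't know which language is being transcribed, the Portuguese article "um" matches the English filler "um" and gets dropped.

```rust
// Hypothetical sketch: NOT Handy's real code. Only the word list is taken
// from src-tauri/src/audio_toolkit/text.rs as quoted in this thread.

/// Filler list as it appeared before the fix (note the English filler "um").
const FILLER_WORDS: &[&str] = &[
    "uh", "um", "uhm", "umm", "uhh", "uhhh", "ah", "eh", "hmm", "hm", "mmm", "mm", "mh", "ha",
    "ehh",
];

/// Drops every whitespace-separated word whose lowercase form, ignoring
/// surrounding ASCII punctuation, is in FILLER_WORDS. In this simplified
/// sketch, punctuation attached to a dropped word is lost with it.
fn remove_filler_words(text: &str) -> String {
    text.split_whitespace()
        .filter(|word| {
            let core = word.trim_matches(|c: char| c.is_ascii_punctuation());
            !FILLER_WORDS.contains(&core.to_lowercase().as_str())
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // The Portuguese article "um" is indistinguishable from the English filler.
    println!("{}", remove_filler_words("Isto é um teste.")); // prints "Isto é teste."
}
```

This reproduces the reported "Isto é teste" output, which is consistent with the filler list (rather than the Parakeet or Whisper model) being the cause.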
I can confirm it. Testing with Google Translate after removing "um" from the filler-word list:

```diff
diff --git a/src-tauri/src/audio_toolkit/text.rs b/src-tauri/src/audio_toolkit/text.rs
index 0f780bc..9adddb1 100644
--- a/src-tauri/src/audio_toolkit/text.rs
+++ b/src-tauri/src/audio_toolkit/text.rs
@@ -196,7 +196,7 @@ fn extract_punctuation(word: &str) -> (&str, &str) {
 /// Filler words to remove from transcriptions
 const FILLER_WORDS: &[&str] = &[
-    "uh", "um", "uhm", "umm", "uhh", "uhhh", "ah", "eh", "hmm", "hm", "mmm", "mm", "mh", "ha",
+    "uh", "uhm", "umm", "uhh", "uhhh", "ah", "eh", "hmm", "hm", "mmm", "mm", "mh", "ha",
     "ehh",
 ];
```

I get:

Isto é um teste.
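The diff above removes "um" from the filler list for every language, so a bare English "um" hesitation would presumably no longer be stripped either. One possible alternative, sketched here under assumptions, is a language-aware list: strip "um" only when the transcription language is English. The names `COMMON_FILLERS`, `ENGLISH_ONLY_FILLERS`, `is_filler`, and `remove_filler_words`, as well as the `"en"`/`"pt"` language codes, are hypothetical. This is not Handy's API and not necessarily what #941 does.

```rust
// Hypothetical sketch of language-aware filler removal; not Handy's real code.

/// Fillers assumed safe to strip in any language (the patched list from the diff).
const COMMON_FILLERS: &[&str] = &[
    "uh", "uhm", "umm", "uhh", "uhhh", "ah", "eh", "hmm", "hm", "mmm", "mm", "mh", "ha", "ehh",
];

/// Fillers that are real words elsewhere (Portuguese "um" = English "a").
const ENGLISH_ONLY_FILLERS: &[&str] = &["um"];

/// Returns true if `word` should be treated as a filler for language `lang`
/// (assumed to be an ISO 639-1 code such as "en" or "pt").
fn is_filler(word: &str, lang: &str) -> bool {
    let w = word.to_lowercase();
    COMMON_FILLERS.contains(&w.as_str())
        || (lang == "en" && ENGLISH_ONLY_FILLERS.contains(&w.as_str()))
}

/// Removes filler words, ignoring surrounding ASCII punctuation when matching.
fn remove_filler_words(text: &str, lang: &str) -> String {
    text.split_whitespace()
        .filter(|word| {
            let core = word.trim_matches(|c: char| c.is_ascii_punctuation());
            !is_filler(core, lang)
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Portuguese keeps the article; English still drops the hesitation.
    println!("{}", remove_filler_words("Isto é um teste.", "pt")); // prints "Isto é um teste."
    println!("{}", remove_filler_words("This is, um, a test.", "en")); // prints "This is, a test."
}
```

In this sketch, an unknown or auto-detected language (anything other than `"en"`) keeps "um". That trades a few missed English fillers for never deleting a real Portuguese word.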
@pchalasani FYI
Done: #941
Thank you for reporting, and nice trick using Google Translate to debug.