feat: add GigaAM v3 for Russian speech recognition #913
Can you add GigaAM to transcribe-rs first? Then I will pull that in.
Added GigaAM as a proper engine in transcribe-rs: cjpais/transcribe-rs#45. Once that's merged and published, I'll update this PR to use the crate feature instead of the standalone module.
Thank you so much @pantafive, amazing to see how much support we have for this in just one day! I've released Only comment I have is about the description section: just making sure it's very clear it's for Russian. Also, if you have an opinion on testing, it would be helpful to hear it. Is it the best model for Russian speech you've tested?
Thanks for the release and the CDN upload! Updated the PR, now uses Regarding the description: could you clarify which description you'd like updated? The model description in the app already says "Russian speech recognition", but I'm happy to adjust it wherever you think it needs to be clearer. As for testing: GigaAM v3 is the best Russian speech model I've tested. It outperforms Whisper-large-v3 on Russian benchmarks (9.2% vs. 25.1% average WER) and handles punctuation natively.
Add GigaAM v3 e2e_ctc as a new transcription engine using transcribe-rs 0.2.7 gigaam feature. Russian speech recognition with punctuation, Latin characters and digit support.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Thank you! Mainly I was thinking of something a bit stronger for the description, like "Best model for Russian speakers" or similar.
I think "best" might be risky in a description: it's subjective, and things move fast in this space, so it could become misleading quickly. "Russian speech recognition. Fast and accurate." states what it does without overpromising. But it's your project, so I'm happy to go with whatever you think works best!
"Best model for Russian speakers. Great for bilingual Russian/English use, especially developers mixing both languages."
I'm good with whatever and will defer to you, since I don't speak Russian haha. Things sure do move fast.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
This isn't a streaming model, or am I wrong?
* feat: add GigaAM v3 model for Russian speech recognition

  Add GigaAM v3 e2e_ctc as a new transcription engine using transcribe-rs 0.2.7 gigaam feature. Russian speech recognition with punctuation, Latin characters and digit support.

  Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: cargo fmt formatting

  Co-Authored-By: Claude Opus 4.6 <[email protected]>

* Keep the file name of the model download the same as the file on the blob website.

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
Co-authored-by: CJ Pais <[email protected]>

(cherry picked from commit ff86122)

# Conflicts:
#   src-tauri/Cargo.lock
#   src-tauri/Cargo.toml
#   src/bindings.ts
#   src/i18n/locales/ar/translation.json
#   src/i18n/locales/cs/translation.json
#   src/i18n/locales/de/translation.json
#   src/i18n/locales/es/translation.json
#   src/i18n/locales/fr/translation.json
#   src/i18n/locales/it/translation.json
#   src/i18n/locales/ja/translation.json
#   src/i18n/locales/ko/translation.json
#   src/i18n/locales/pl/translation.json
#   src/i18n/locales/pt/translation.json
#   src/i18n/locales/ru/translation.json
#   src/i18n/locales/tr/translation.json
#   src/i18n/locales/uk/translation.json
#   src/i18n/locales/vi/translation.json
#   src/i18n/locales/zh-TW/translation.json
#   src/i18n/locales/zh/translation.json
Adds the GigaAM v3 e2e_ctc engine: Russian speech recognition with punctuation, Latin characters, and digits. Uses an int8-quantized ONNX model (225 MB) and a BPE tokenizer with 257 subword tokens.
The model is currently downloaded from Hugging Face (istupakov/gigaam-v3-onnx). It needs to be mirrored to blob.handy.computer to be consistent with the other models.
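For readers unfamiliar with CTC output, here is a minimal sketch of greedy CTC decoding, the usual way per-frame scores from a CTC model are turned into a token sequence. It is not taken from transcribe-rs or from this PR: the function name `ctc_greedy_decode`, the `[frames][vocab]` layout of `logits`, and the position of the blank token are all assumptions made for illustration.

```rust
/// Greedy CTC decoding (illustrative sketch, not the transcribe-rs implementation).
///
/// `logits` holds one score vector per audio frame (`[frames][vocab]`).
/// `blank_id` is the index of the CTC blank token; which index the real
/// model uses is an assumption the caller has to supply.
pub fn ctc_greedy_decode(logits: &[Vec<f32>], blank_id: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<usize> = None;

    for frame in logits {
        // Argmax over the vocabulary for this frame.
        let best = frame
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);

        // Skip frames with no scores at all.
        let Some(best) = best else { continue };

        // Emit a token only when it differs from the previous frame's
        // argmax (collapse repeats) and is not the blank token.
        if Some(best) != prev && best != blank_id {
            out.push(best);
        }
        prev = Some(best);
    }
    out
}
```

The returned IDs would then be mapped through the BPE vocabulary to produce text. A blank frame between two identical tokens is what lets the model emit a doubled token, which is why repeats are collapsed before blanks are dropped.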