Signed-off-by: jakmro <[email protected]>
Pull request overview
This pull request adds comprehensive Voice Activity Detection (VAD) support to the Cactus Engine using the Silero VAD model. The implementation includes a new model type, graph operations for LSTM cells and activation functions (ReLU, Sigmoid), and integration with Whisper/Moonshine transcription models for automatic speech preprocessing.
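The overview lists LSTM cells plus ReLU and Sigmoid activations as new graph operations. For reference, here is a minimal scalar C++ sketch of one LSTM cell step. It assumes the PyTorch `nn.LSTMCell` weight layout and gate order (input, forget, cell, output). The names `LstmWeights`, `lstm_cell_step`, `sigmoid`, and `relu` are hypothetical. This is not the PR's NEON kernel in kernel_lstm.cpp, only a plain implementation of the same math.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical weight container, assuming PyTorch nn.LSTMCell layout:
// rows are grouped by gate in the order i, f, g, o.
struct LstmWeights {
    std::size_t input_size;
    std::size_t hidden_size;
    std::vector<float> w_ih;  // (4 * hidden_size) x input_size, row-major
    std::vector<float> w_hh;  // (4 * hidden_size) x hidden_size, row-major
    std::vector<float> b_ih;  // 4 * hidden_size
    std::vector<float> b_hh;  // 4 * hidden_size
};

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }
inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

// One time step: updates hidden state h and cell state c in place.
void lstm_cell_step(const LstmWeights& w, const std::vector<float>& x,
                    std::vector<float>& h, std::vector<float>& c) {
    const std::size_t H = w.hidden_size;
    const std::size_t I = w.input_size;
    assert(x.size() == I && h.size() == H && c.size() == H);

    // Pre-activations for all four gates, computed from the *previous* h.
    std::vector<float> gates(4 * H);
    for (std::size_t r = 0; r < 4 * H; ++r) {
        float acc = w.b_ih[r] + w.b_hh[r];
        for (std::size_t k = 0; k < I; ++k) acc += w.w_ih[r * I + k] * x[k];
        for (std::size_t k = 0; k < H; ++k) acc += w.w_hh[r * H + k] * h[k];
        gates[r] = acc;
    }

    // Apply gate nonlinearities and update the states.
    for (std::size_t j = 0; j < H; ++j) {
        const float i = sigmoid(gates[j]);           // input gate
        const float f = sigmoid(gates[H + j]);       // forget gate
        const float g = std::tanh(gates[2 * H + j]); // candidate cell value
        const float o = sigmoid(gates[3 * H + j]);   // output gate
        c[j] = f * c[j] + i * g;
        h[j] = o * std::tanh(c[j]);
    }
}
```

All gate pre-activations are computed before any state is written, so the new `h` never leaks into the same step's gate computations.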
Changes:
- Implemented Silero VAD model with LSTM-based speech detection
- Added VAD API endpoint (cactus_vad) for standalone speech segment detection
- Integrated VAD preprocessing into transcription workflow (enabled by default)
- Created comprehensive language bindings for Python, Flutter, Swift, and Kotlin
- Added test infrastructure and automatic VAD weight bundling during model conversion
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| cactus/models/model_silero_vad.cpp | Core VAD model implementation with STFT, encoder blocks, LSTM cell, and timestamp detection |
| cactus/models/model.h | VAD model class definition with configuration structures |
| cactus/kernel/kernel_lstm.cpp | SIMD-optimized LSTM cell kernel using ARM NEON intrinsics |
| cactus/kernel/kernel_nn.cpp | Added ReLU and Sigmoid activation functions |
| cactus/kernel/kernel.h | Exposed new kernel functions |
| cactus/graph/graph_ops_nn.cpp | LSTM cell graph operation implementation |
| cactus/graph/graph_ops_math.cpp | ReLU and Sigmoid graph operation dispatch |
| cactus/graph/graph_builder.cpp | Builder methods for LSTM and activation operations |
| cactus/graph/graph_execute.cpp | Execution dispatcher for new operations |
| cactus/graph/graph.h | OpType enum additions |
| cactus/ffi/cactus_vad.cpp | FFI implementation for VAD endpoint with JSON option parsing |
| cactus/ffi/cactus_init.cpp | Automatic VAD model initialization for Whisper/Moonshine |
| cactus/ffi/cactus_transcribe.cpp | VAD preprocessing integration in transcription |
| cactus/ffi/cactus_utils.h | Added use_vad option parsing |
| cactus/ffi/cactus_complete.cpp | Updated option parsing signature |
| cactus/ffi/cactus_ffi.h | VAD function declaration |
| cactus/engine/engine_model.cpp | Silero VAD model type registration |
| cactus/engine/engine.h | Added SILERO_VAD to ModelType enum |
| python/src/converter_silero_vad.py | New converter for Silero VAD weights |
| python/src/converter_llm.py | Automatic VAD bundling for transcription models |
| python/src/cactus.py | Python VAD API binding |
| python/src/cli.py | CLI support for VAD model download and testing |
| python/src/publish_to_hf.py | Added Silero VAD to published models |
| python/requirements.txt | Added torchaudio dependency |
| flutter/cactus.dart | Flutter VAD bindings |
| apple/Cactus.swift | Swift VAD bindings |
| android/Cactus.kt | Android VAD bindings |
| android/Cactus.common.kt | Kotlin common VAD types |
| android/Cactus.android.kt | Android-specific VAD implementation |
| android/Cactus.ios.kt | iOS Kotlin Multiplatform VAD implementation |
| tests/test_engine.cpp | VAD test implementation |
| tests/run.sh | VAD model parameter and environment setup |
| tests/android/run.sh | Android VAD test configuration |
| tests/ios/run.sh | iOS VAD test configuration |
| tests/ios/configure_xcode.rb | Xcode project VAD model copying |
| tests/ios/CactusTest/CactusTest/AppDelegate.mm | iOS VAD model setup with error handling |
| docs/cactus_engine.md | VAD API documentation |
| README.md | Added Silero VAD to supported models table |
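The table credits model_silero_vad.cpp with timestamp detection. A common way to turn per-chunk speech probabilities into segments is hysteresis thresholding. A segment opens when the probability reaches `threshold`. It closes only after the probability stays below a lower `neg_threshold` for a minimum number of chunks, and segments that are too short are dropped. The minimal C++ sketch below assumes that scheme. `detect_segments`, `SpeechSegment`, and every parameter default are hypothetical. This PR does not say whether it uses the same rules, padding, or units.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical segment type: [start, end) measured in model chunks.
struct SpeechSegment {
    std::size_t start;
    std::size_t end;
};

// Hysteresis segmenter over per-chunk speech probabilities.
// Parameter names and defaults are illustrative assumptions.
std::vector<SpeechSegment> detect_segments(const std::vector<float>& probs,
                                           float threshold = 0.5f,
                                           float neg_threshold = 0.35f,
                                           std::size_t min_speech_chunks = 2,
                                           std::size_t min_silence_chunks = 3) {
    assert(neg_threshold <= threshold);
    std::vector<SpeechSegment> segments;
    bool in_speech = false;
    bool tracking_silence = false;
    std::size_t start = 0;
    std::size_t silence_start = 0;

    for (std::size_t t = 0; t < probs.size(); ++t) {
        const float p = probs[t];
        if (!in_speech) {
            // Open a segment only on a confident speech chunk.
            if (p >= threshold) {
                in_speech = true;
                start = t;
                tracking_silence = false;
            }
            continue;
        }
        if (p >= threshold) {  // speech resumed, cancel any pending silence
            tracking_silence = false;
            continue;
        }
        if (p < neg_threshold) {
            if (!tracking_silence) {
                tracking_silence = true;
                silence_start = t;
            }
            // Close the segment once the silence has lasted long enough.
            if (t + 1 - silence_start >= min_silence_chunks) {
                if (silence_start - start >= min_speech_chunks)
                    segments.push_back({start, silence_start});
                in_speech = false;
                tracking_silence = false;
            }
        }
        // Probabilities between neg_threshold and threshold change nothing.
    }

    // Close a segment that is still open at the end of the input.
    if (in_speech && probs.size() - start >= min_speech_chunks)
        segments.push_back({start, probs.size()});
    return segments;
}
```

Probabilities between the two thresholds neither open nor close a segment. This keeps brief dips from splitting one utterance in two.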
…te documentation Signed-off-by: jakmro <[email protected]>
HenryNdubuaku left a comment
1. Are the weight combination, the switch to HF, and the --reconvert flag going to be another PR?
2. When you remove the venv and run source ./setup afresh, it breaks and crashes with torchaudio.

After those two are fixed, I will do a final test.
Signed-off-by: HenryNdubuaku <[email protected]>
@jakmro I have optimised the VAD pipeline; it should now be about 3x faster. Please study my PR to your branch carefully to understand the tricks we use to optimise.
* port silero vad
  Signed-off-by: jakmro <[email protected]>
* align silero vad conversion process
  Signed-off-by: jakmro <[email protected]>
* .
  Signed-off-by: jakmro <[email protected]>
* lstm kernel
  Signed-off-by: jakmro <[email protected]>
* bundle vad into s2t models
  Signed-off-by: jakmro <[email protected]>
* clean
  Signed-off-by: jakmro <[email protected]>
* docs
  Signed-off-by: jakmro <[email protected]>
* add silero-vad to publish list
  Signed-off-by: jakmro <[email protected]>
* return early when no speech
  Signed-off-by: jakmro <[email protected]>
* update cactus_vad return value to reflect JSON response size and update documentation
  Signed-off-by: jakmro <[email protected]>
* clean
  Signed-off-by: jakmro <[email protected]>
* refactor test_vad_process
  Signed-off-by: jakmro <[email protected]>
* update setup script to require Python 3.12
  Signed-off-by: jakmro <[email protected]>
* warning fixes
  Signed-off-by: HenryNdubuaku <[email protected]>
* Aggressively optimise VAD
  Signed-off-by: HenryNdubuaku <[email protected]>

---------

Signed-off-by: jakmro <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
No description provided.