Vad#353

Merged
HenryNdubuaku merged 16 commits into main from vad
Feb 16, 2026

Conversation

@jakmro
Collaborator

@jakmro jakmro commented Feb 15, 2026

No description provided.

Signed-off-by: jakmro <[email protected]>
.
Signed-off-by: jakmro <[email protected]>
Signed-off-by: jakmro <[email protected]>
Signed-off-by: jakmro <[email protected]>
Signed-off-by: jakmro <[email protected]>
@jakmro jakmro marked this pull request as ready for review February 15, 2026 05:35
Copilot AI review requested due to automatic review settings February 15, 2026 05:35
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive Voice Activity Detection (VAD) support to the Cactus Engine using the Silero VAD model. The implementation includes a new model type, graph operations for LSTM cells and activation functions (ReLU, Sigmoid), and integration with Whisper/Moonshine transcription models for automatic speech preprocessing.

Changes:

  • Implemented Silero VAD model with LSTM-based speech detection
  • Added VAD API endpoint (cactus_vad) for standalone speech segment detection
  • Integrated VAD preprocessing into transcription workflow (enabled by default)
  • Created comprehensive language bindings for Python, Flutter, Swift, and Kotlin
  • Added test infrastructure and automatic VAD weight bundling during model conversion
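
For readers unfamiliar with the LSTM cell and the activations listed above, the following is a minimal sketch of a textbook LSTM step in plain Python. It is not the code in this PR: the gate order (input, forget, candidate, output), the weight layout, and all function names are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def lstm_cell(x, h_prev, c_prev, w_ih, w_hh, bias):
    """One step of a textbook LSTM cell (hypothetical layout).

    x: input vector (length I); h_prev, c_prev: previous states (length H).
    w_ih: 4H rows of length I; w_hh: 4H rows of length H; bias: length 4H.
    Assumed gate order: input, forget, cell candidate, output.
    """
    hidden = len(h_prev)
    # Pre-activations for all four gates: W_ih @ x + W_hh @ h_prev + b
    gates = [
        sum(w * v for w, v in zip(w_ih[r], x))
        + sum(w * v for w, v in zip(w_hh[r], h_prev))
        + bias[r]
        for r in range(4 * hidden)
    ]
    h_new, c_new = [], []
    for j in range(hidden):
        i = sigmoid(gates[j])                  # input gate
        f = sigmoid(gates[hidden + j])         # forget gate
        g = math.tanh(gates[2 * hidden + j])   # cell candidate
        o = sigmoid(gates[3 * hidden + j])     # output gate
        c = f * c_prev[j] + i * g
        c_new.append(c)
        h_new.append(o * math.tanh(c))
    return h_new, c_new
```

A production kernel would typically vectorise the gate computation over the hidden dimension; the scalar loops here only show the arithmetic.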

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cactus/models/model_silero_vad.cpp | Core VAD model implementation with STFT, encoder blocks, LSTM cell, and timestamp detection |
| cactus/models/model.h | VAD model class definition with configuration structures |
| cactus/kernel/kernel_lstm.cpp | SIMD-optimized LSTM cell kernel using ARM NEON intrinsics |
| cactus/kernel/kernel_nn.cpp | Added ReLU and Sigmoid activation functions |
| cactus/kernel/kernel.h | Exposed new kernel functions |
| cactus/graph/graph_ops_nn.cpp | LSTM cell graph operation implementation |
| cactus/graph/graph_ops_math.cpp | ReLU and Sigmoid graph operation dispatch |
| cactus/graph/graph_builder.cpp | Builder methods for LSTM and activation operations |
| cactus/graph/graph_execute.cpp | Execution dispatcher for new operations |
| cactus/graph/graph.h | OpType enum additions |
| cactus/ffi/cactus_vad.cpp | FFI implementation for VAD endpoint with JSON option parsing |
| cactus/ffi/cactus_init.cpp | Automatic VAD model initialization for Whisper/Moonshine |
| cactus/ffi/cactus_transcribe.cpp | VAD preprocessing integration in transcription |
| cactus/ffi/cactus_utils.h | Added use_vad option parsing |
| cactus/ffi/cactus_complete.cpp | Updated option parsing signature |
| cactus/ffi/cactus_ffi.h | VAD function declaration |
| cactus/engine/engine_model.cpp | Silero VAD model type registration |
| cactus/engine/engine.h | Added SILERO_VAD to ModelType enum |
| python/src/converter_silero_vad.py | New converter for Silero VAD weights |
| python/src/converter_llm.py | Automatic VAD bundling for transcription models |
| python/src/cactus.py | Python VAD API binding |
| python/src/cli.py | CLI support for VAD model download and testing |
| python/src/publish_to_hf.py | Added Silero VAD to published models |
| python/requirements.txt | Added torchaudio dependency |
| flutter/cactus.dart | Flutter VAD bindings |
| apple/Cactus.swift | Swift VAD bindings |
| android/Cactus.kt | Android VAD bindings |
| android/Cactus.common.kt | Kotlin common VAD types |
| android/Cactus.android.kt | Android-specific VAD implementation |
| android/Cactus.ios.kt | iOS Kotlin Multiplatform VAD implementation |
| tests/test_engine.cpp | VAD test implementation |
| tests/run.sh | VAD model parameter and environment setup |
| tests/android/run.sh | Android VAD test configuration |
| tests/ios/run.sh | iOS VAD test configuration |
| tests/ios/configure_xcode.rb | Xcode project VAD model copying |
| tests/ios/CactusTest/CactusTest/AppDelegate.mm | iOS VAD model setup with error handling |
| docs/cactus_engine.md | VAD API documentation |
| README.md | Added Silero VAD to supported models table |
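
The file summary mentions timestamp detection without showing how per-frame speech probabilities become segments. The sketch below illustrates one common approach (thresholding with a short silence hangover). It is an illustration only, not the logic in model_silero_vad.cpp; the function name and the `threshold`, `min_speech_frames`, and `min_silence_frames` parameters and their defaults are hypothetical.

```python
def speech_segments(probs, threshold=0.5, min_speech_frames=3, min_silence_frames=2):
    """Group per-frame speech probabilities into (start, end) frame ranges.

    A segment opens at the first frame with prob >= threshold and closes once
    `min_silence_frames` consecutive frames fall below it. Segments shorter
    than `min_speech_frames` are dropped. `end` is exclusive.
    All parameter names and defaults are illustrative assumptions.
    """
    segments = []
    start = None   # frame index where the current segment began
    silence = 0    # consecutive below-threshold frames inside a segment
    for idx, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = idx - silence + 1  # index of the first silent frame
                if end - start >= min_speech_frames:
                    segments.append((start, end))
                start, silence = None, 0
    # Close a segment that is still open when the audio ends.
    if start is not None:
        end = len(probs) - silence
        if end - start >= min_speech_frames:
            segments.append((start, end))
    return segments
```

Frame indices convert to seconds by multiplying by the frame duration (frame size in samples divided by the sample rate).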

Collaborator

@HenryNdubuaku HenryNdubuaku left a comment

  1. Is the weight combination, the switch to HF, and the --reconvert flag going to be another PR?
  2. When you remove the venv and run source ./setup afresh, it breaks and crashes with torchaudio.

Once those two are fixed, I will do the final test.

jakmro and others added 3 commits February 15, 2026 21:14
Signed-off-by: HenryNdubuaku <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
@HenryNdubuaku
Collaborator

@jakmro I have optimised the VAD pipeline; it should now be 3x faster. Study my PR to your branch carefully to understand the tricks we use to optimise:

  • Fused STFT+magnitude kernel
  • Avoid repeated memory allocation: pre-allocate the process_chunk buffers and reuse them in get_speech_timestamps
  • Not every convolution/matmul op benefits from threading, so best to use a threshold to decide when to thread
  • Bulk WAV read
  • Windowed-sinc resampler
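
As an illustration of the last item, here is a minimal windowed-sinc resampler in plain Python. The PR comment does not say which window or tap count is used, so the Hann window, the `half_width` value, and the function names are assumptions; this is a sketch of the general technique, not the optimised implementation.

```python
import math


def sinc(x: float) -> float:
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def resample(samples, src_rate, dst_rate, half_width=16):
    """Resample with a Hann-windowed sinc interpolation kernel.

    half_width is the number of sinc zero crossings kept on each side of the
    kernel centre (an illustrative value, not taken from the PR).
    """
    # When downsampling, lower the cutoff to the new Nyquist to limit aliasing.
    cutoff = min(1.0, dst_rate / src_rate)
    n_out = len(samples) * dst_rate // src_rate
    support = half_width / cutoff  # kernel half-length in input samples
    out = []
    for n in range(n_out):
        center = n * src_rate / dst_rate  # output position on the input axis
        lo = max(0, math.ceil(center - support))
        hi = min(len(samples) - 1, math.floor(center + support))
        acc = 0.0
        for k in range(lo, hi + 1):
            t = k - center
            window = 0.5 * (1.0 + math.cos(math.pi * t / support))  # Hann
            acc += samples[k] * cutoff * sinc(cutoff * t) * window
        out.append(acc)
    return out
```

For example, `resample(audio, 48000, 16000)` would bring 48 kHz samples down to 16 kHz. A faster version would precompute the kernel taps once per fractional phase instead of evaluating `sin` and `cos` for every sample.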

@HenryNdubuaku HenryNdubuaku merged commit 678a6b4 into main Feb 16, 2026
1 of 2 checks passed
ncylich pushed a commit that referenced this pull request Feb 24, 2026
* port silero vad

Signed-off-by: jakmro <[email protected]>

* align silero vad conversion process

Signed-off-by: jakmro <[email protected]>

* .

Signed-off-by: jakmro <[email protected]>

* lstm kernel

Signed-off-by: jakmro <[email protected]>

* bundle vad into s2t models

Signed-off-by: jakmro <[email protected]>

* clean

Signed-off-by: jakmro <[email protected]>

* docs

Signed-off-by: jakmro <[email protected]>

* add silero-vad to publish list

Signed-off-by: jakmro <[email protected]>

* return early when no speech

Signed-off-by: jakmro <[email protected]>

* update cactus_vad return value to reflect JSON response size and update documentation

Signed-off-by: jakmro <[email protected]>

* clean

Signed-off-by: jakmro <[email protected]>

* refactor test_vad_process

Signed-off-by: jakmro <[email protected]>

* update setup script to require Python 3.12

Signed-off-by: jakmro <[email protected]>

* warning fixes

Signed-off-by: HenryNdubuaku <[email protected]>

* Aggresively optimise VAD

Signed-off-by: HenryNdubuaku <[email protected]>

---------

Signed-off-by: jakmro <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026
3 participants