MatteoFasulo/silent_speech
Voicing Silent Speech

This repository contains code for synthesizing speech audio from silently mouthed words captured with electromyography (EMG). It is a fork of David Gaddy's original Silent Speech repository, updated to support the TinyMyo EMG foundation model for transduction and recognition tasks.

Key Changes in this Fork

  • TinyMyo Integration: Added support for pre-training and fine-tuning using the TinyMyo foundation model.
  • Improved Tooling: Integrated uv for faster dependency management and PyTorch setup.
  • Streamlined Dataset Pipeline: Added HDF5 dataset building scripts for much faster training initialization.
  • WandB Logging: Added experiment tracking via Weights & Biases.
  • Updated Decoding: Replaced DeepSpeech dependencies with SpeechBrain and torchaudio built-in CTC decoders for better compatibility with modern hardware.

Environment Setup

The code requires Python 3.10+. We recommend using uv to sync dependencies:

uv sync

You must also initialize submodules for Hifi-GAN and phoneme alignment data:

git submodule update --init
# Extract alignments if not already done
tar -xvzf text_alignments/text_alignments.tar.gz

Data & Preprocessing

The EMG and audio data can be downloaded from Zenodo; the download_data.py script automates this process. After downloading, you can optionally clean and resample the audio ahead of time so it does not need to be done on-the-fly during training.

An HDF5 dataset builder is also included to convert the raw data into a format that allows for much faster loading during training, suitable for HPC environments.

1. Download Data

Configure your $DATA_PATH in config/transduction_model.json, then run:

python download_data.py
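The config key shown below is illustrative, not the repo's exact schema; check config/transduction_model.json for the actual field names. The idea is simply that the download target directory lives in the JSON config:

```json
{
  "data_path": "/path/to/emg_data"
}
```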

2. Audio Cleaning (Optional)

Speeds up training by saving resampled audio files:

python data_collection/clean_audio.py

3. Build HDF5 Dataset

Required for performant training. This builds the dataset once rather than on-the-fly:

python build_hdf5.py
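The exact layout produced by build_hdf5.py is internal to this repo, but the general pattern it relies on — writing each utterance's feature arrays into one HDF5 file once, then slicing them lazily at train time — can be sketched as follows (group and dataset names here are hypothetical):

```python
import h5py
import numpy as np

# Write once: store each utterance's EMG and acoustic features as datasets.
utterances = [(np.zeros((500, 8)), np.zeros((100, 26)))]  # dummy data
with h5py.File("train.h5", "w") as f:
    for i, (emg, mfccs) in enumerate(utterances):
        grp = f.create_group(f"utterance_{i}")
        grp.create_dataset("emg", data=emg, compression="gzip")
        grp.create_dataset("mfccs", data=mfccs, compression="gzip")

# Read at train time: slices load lazily, with no per-file parsing overhead.
with h5py.File("train.h5", "r") as f:
    emg = f["utterance_0/emg"][:]
print(emg.shape)  # (500, 8)
```

This one-time conversion is what makes training initialization fast on HPC filesystems, where opening thousands of small raw files is expensive.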

EMG to Audio (Transduction)

This model synthesizes audio features (MFCCs) from EMG signals, which are then converted to a waveform by a vocoder.
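At inference time this is a two-stage composition: EMG frames go through the transduction model to produce acoustic feature frames, which the vocoder expands into audio samples. The sketch below uses stub functions with made-up shapes, not the repo's actual API:

```python
import numpy as np

def transduction_model(emg):
    """Stub: map EMG frames to MFCC frames (the real model is a neural network)."""
    n_frames = emg.shape[0] // 8     # EMG is sampled faster than acoustic features
    return np.zeros((n_frames, 26))  # 26 acoustic features per frame (assumed)

def vocoder(mfccs, hop_length=160):
    """Stub: expand feature frames to audio samples (the real model is HiFi-GAN)."""
    return np.zeros(mfccs.shape[0] * hop_length)

emg = np.random.randn(800, 8)        # 800 EMG frames, 8 electrode channels
wav = vocoder(transduction_model(emg))
print(wav.shape)                      # (16000,)
```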

Training

python transduction_model.py

Configuration (hyperparameters, paths, WandB) is managed via config/transduction_model.json.

Note: If the HiFi-GAN vocoder checkpoint is not found locally, the script will automatically download and extract it from Zenodo to ensure a seamless setup.
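The auto-download behaviour amounts to a check-then-fetch guard around the checkpoint path; the sketch below illustrates the pattern with a stubbed fetch function rather than the actual Zenodo URL:

```python
import tempfile
from pathlib import Path

def ensure_checkpoint(path, fetch):
    """Download a checkpoint only if it is not already present locally."""
    ckpt = Path(path)
    if not ckpt.exists():
        ckpt.parent.mkdir(parents=True, exist_ok=True)
        fetch(ckpt)  # in the real script: download and extract from Zenodo
    return ckpt

# Stubbed fetch for illustration: just create an empty file.
with tempfile.TemporaryDirectory() as d:
    ckpt = ensure_checkpoint(Path(d) / "hifigan" / "generator.pt",
                             lambda p: p.touch())
    assert ckpt.exists()
```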

Evaluation

To evaluate a saved model checkpoint and generate audio samples:

python transduction_model.py --evaluate_saved "./output/model_best.pt" --output_dir "./eval_results"

EMG to Text (Recognition)

Directly convert silent speech to text using a CTC decoder.

For recognition, we use SpeechBrain's built-in CTC decoder with a KenLM n-gram language model for improved performance. The get_lexicon.py script generates the necessary lexicon from the training data.
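SpeechBrain provides the full beam search with KenLM scoring; the core CTC decoding rule it builds on (collapse consecutive repeats, then drop blanks) fits in a few lines:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse consecutive repeated tokens, then remove blank tokens."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frame-level argmax ids: 0 = blank, 1 = 'c', 2 = 'a', 3 = 't'
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 0, 3, 0]))  # [1, 2, 3]
```

Beam search generalizes this by keeping multiple collapse hypotheses alive and rescoring them with the n-gram language model, which is why the KenLM model and lexicon below improve word error rate over greedy decoding.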

Setup Decoder

Download the pre-trained KenLM language model and generate the custom lexicon from your dataset in one step:

python get_lexicon.py

This script downloads the Librispeech 4-gram model into the KenLM/ directory and creates gaddy_lexicon.txt containing all unique words found in the local HDF5 datasets.
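The details of get_lexicon.py live in the repo, but the essence of lexicon generation is collecting the unique words from the transcripts and spelling each one out as the token sequence the decoder emits. A minimal sketch, assuming character-level lexicon entries:

```python
def build_lexicon(transcripts):
    """Collect unique lowercase words, each spelled as space-separated characters."""
    words = sorted({w for line in transcripts for w in line.lower().split()})
    return {w: " ".join(w) for w in words}

lex = build_lexicon(["hello world", "hello there"])
print(lex["hello"])  # h e l l o
print(sorted(lex))   # ['hello', 'there', 'world']
```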

Run

python recognition_model.py                                      # Train
python recognition_model.py --evaluate_saved "path/to/model.pt"  # Evaluate

Configuration (hyperparameters, paths, WandB) is managed via config/recognition_model.json.

Documentation

This project uses MkDocs with the Material theme and mkdocstrings for API documentation.

To build and view the documentation locally:

# Serve the documentation
mkdocs serve

The documentation includes a Quick Start guide, detailed project sections, and automatically generated API references for all core modules.
