This repository contains code for synthesizing speech audio from silently mouthed words captured with electromyography (EMG). This is a fork of David Gaddy's original Silent Speech repository, updated to support the TinyMyo EMG foundation model for transduction and recognition tasks.
- TinyMyo Integration: Added support for pre-training and fine-tuning using the TinyMyo foundation model.
- Improved Tooling: Integrated `uv` for faster dependency management and PyTorch setup.
- Streamlined Dataset Pipeline: Added HDF5 dataset-building scripts for much faster training initialization.
- WandB Logging: Added experiment tracking via Weights & Biases.
- Updated Decoding: Replaced `DeepSpeech` dependencies with `SpeechBrain` and `torchaudio` built-in CTC decoders for better compatibility with modern hardware.
The code requires Python 3.10+. We recommend using `uv` to sync dependencies:

```bash
uv sync
```

You must also initialize submodules for HiFi-GAN and phoneme alignment data:
```bash
git submodule update --init

# Extract alignments if not already done
tar -xvzf text_alignments/text_alignments.tar.gz
```

The EMG and audio data can be downloaded from Zenodo; the `download_data.py` script automates this process. After downloading, you can optionally clean and resample the audio for faster training, rather than doing it on the fly.
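The on-the-fly resampling that the optional cleaning step avoids can be sketched with SciPy's polyphase resampler. The sample rates here (22.05 kHz source, 16 kHz target) are illustrative assumptions, not necessarily the rates the repository's scripts use:

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

def resample_audio(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D audio signal using polyphase filtering."""
    ratio = Fraction(target_sr, orig_sr)  # e.g. 16000/22050 reduces to 320/441
    return resample_poly(audio, ratio.numerator, ratio.denominator)

# One second of a 440 Hz tone at 22.05 kHz, resampled to 16 kHz
tone = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
resampled = resample_audio(tone, 22050, 16000)
print(len(resampled))  # 16000
```

Doing this once and caching the result (as `clean_audio.py` does) removes the per-epoch resampling cost from the training loop.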
An HDF5 dataset builder is also included to convert the raw data into a format that allows for much faster loading during training, suitable for HPC environments.
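As an illustration of why HDF5 helps, here is a minimal sketch of writing and reading such a dataset with `h5py`. The group and dataset names (`utterance_0000`, `emg`, `mfcc`) and the array shapes are hypothetical, not the repository's actual layout:

```python
import h5py
import numpy as np

def write_example_dataset(path: str) -> None:
    """Write a toy dataset: one group per utterance, with EMG and MFCC arrays."""
    with h5py.File(path, "w") as f:
        grp = f.create_group("utterance_0000")
        grp.create_dataset("emg", data=np.zeros((1000, 8), dtype=np.float32))
        grp.create_dataset("mfcc", data=np.zeros((100, 26), dtype=np.float32))

def load_utterance(path: str, key: str):
    # HDF5 supports partial reads, so training initialization only
    # touches the utterances it actually needs instead of re-parsing
    # the raw recordings every run.
    with h5py.File(path, "r") as f:
        return f[key]["emg"][...], f[key]["mfcc"][...]

write_example_dataset("example.h5")
emg, mfcc = load_utterance("example.h5", "utterance_0000")
print(emg.shape, mfcc.shape)  # (1000, 8) (100, 26)
```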
Configure your `$DATA_PATH` in `config/transduction_model.json`, then run:

```bash
python download_data.py
```

Optionally, speed up training by saving resampled audio files:

```bash
python data_collection/clean_audio.py
```

Building the HDF5 dataset is required for performant training; it builds the dataset once rather than on the fly:

```bash
python build_hdf5.py
```

The transduction model synthesizes audio features (MFCCs) from EMG signals, which are then converted to wav via a vocoder.
To train the model, run:

```bash
python transduction_model.py
```

Configuration (hyperparameters, paths, WandB) is managed via `config/transduction_model.json`.
Note: If the HiFi-GAN vocoder checkpoint is not found locally, the script will automatically download and extract it from Zenodo to ensure a seamless setup.
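The check-and-download pattern the note describes can be sketched as follows; the file name, URL, and function here are placeholders, not the script's actual logic:

```python
import os
import urllib.request

def ensure_checkpoint(local_path: str, url: str) -> str:
    """Return local_path, downloading it from `url` only if it is missing."""
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, local_path)
    return local_path

# With the file already present, no network access happens.
open("hifigan_checkpoint.pt", "wb").close()
path = ensure_checkpoint("hifigan_checkpoint.pt", "https://example.org/checkpoint.pt")
print(path)  # hifigan_checkpoint.pt
```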
To evaluate a saved model checkpoint and generate audio samples:
```bash
python transduction_model.py --evaluate_saved "./output/model_best.pt" --output_dir "./eval_results"
```

The recognition model converts silent speech directly to text using a CTC decoder.
For recognition, we use SpeechBrain's built-in CTC decoder with a KenLM n-gram language model for improved performance. The `get_lexicon.py` script generates the necessary lexicon from the training data.
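For intuition, CTC decoding at its simplest (greedy, with no beam search or language model, unlike the SpeechBrain decoder used here) collapses repeated per-frame predictions and then removes blank tokens:

```python
from itertools import groupby

BLANK = 0  # conventional index of the CTC blank token

def ctc_greedy_decode(frame_ids, id_to_char):
    """Collapse repeated frame predictions, then drop CTC blanks."""
    collapsed = [k for k, _ in groupby(frame_ids)]  # merge consecutive repeats
    return "".join(id_to_char[i] for i in collapsed if i != BLANK)

vocab = {0: "<blank>", 1: "h", 2: "e", 3: "l", 4: "o"}
# Per-frame argmax ids for "hello": the blank between the two l's
# prevents them from being merged into one.
frames = [1, 1, 2, 0, 3, 3, 0, 3, 4, 4, 0]
print(ctc_greedy_decode(frames, vocab))  # hello
```

A beam-search decoder with a language model explores multiple collapse hypotheses instead of taking the per-frame argmax, which is why it typically yields lower word error rates.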
Download the pre-trained KenLM language model and generate the custom lexicon from your dataset in one step:
```bash
python get_lexicon.py
```

This script downloads the Librispeech 4-gram model into the `KenLM/` directory and creates `gaddy_lexicon.txt` containing all unique words found in the local HDF5 datasets.
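The lexicon-building step can be sketched as follows; the transcripts, output file name, and the word-to-character-spelling line format (a common convention for CTC lexicons) are illustrative assumptions, not necessarily the exact format `get_lexicon.py` emits:

```python
def build_lexicon(transcripts, out_path):
    """Write one line per unique word: the word, then its spelling as
    space-separated characters."""
    words = sorted({w for line in transcripts for w in line.lower().split()})
    with open(out_path, "w") as f:
        for w in words:
            f.write(f"{w} {' '.join(w)}\n")
    return words

words = build_lexicon(["Hello world", "hello again"], "example_lexicon.txt")
print(words)  # ['again', 'hello', 'world']
```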
```bash
python recognition_model.py                                      # Train
python recognition_model.py --evaluate_saved "path/to/model.pt"  # Evaluate
```

Configuration (hyperparameters, paths, WandB) is managed via `config/recognition_model.json`.
This project uses MkDocs with the Material theme and mkdocstrings for API documentation.
To build and view the documentation locally:
```bash
# Serve the documentation
mkdocs serve
```

The documentation includes a Quick Start guide, detailed project sections, and automatically generated API references for all core modules.
- EMG Data: Zenodo (4064408)
- Transduction Models: Zenodo (6747411)
- Recognition Models: Zenodo (7183877)