A desktop GUI application for creating voice training datasets with AI-powered synthetic text generation. Supports voice cloning, TTS model training, and STT fine-tuning workflows.
Each sample is saved as a numbered folder (starting from 001) with:
- Recording (as .wav)
- Text (as .txt), including any user edits so the stored transcript, not the raw AI output, is the source of truth
- Metadata (as .json)
- High-quality WAV recording (44.1kHz/48kHz, 16-bit)
- Preferred microphone selection (auto-selected on recording)
- Microphone selection and testing
- Real-time duration tracking
- Pause/resume functionality during recording
- Retake button: One-click to discard and restart recording
- Delete recording option
- Synthetic text generation using OpenAI GPT models
- Multiple style options:
  - General Purpose
  - Colloquial
  - Voice Note
  - Technical
  - Prose
- Custom vocabulary dictionary support
- Configurable duration and speaking rate (WPM)
- Post-generation text editing
- Regenerate option (keep audio, new text)
- New Sample button: Clear state and start completely fresh
- Narration view with adjustable font size
- Organized directory structure (numbered folders)
- Automatic sample numbering
- Metadata tracking (generation parameters, timestamps)
- Session statistics
- Total duration estimation
- Personalized goal tracking: Set target duration and track progress
- Quick access to dataset folder
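Automatic sample numbering can be implemented by scanning the existing folders; a minimal sketch (illustrative only — the app's actual sample_manager logic may differ):

```python
from pathlib import Path

def next_sample_number(samples_dir: Path) -> str:
    """Return the next zero-padded sample folder name (001, 002, ...)."""
    existing = [
        int(p.name) for p in samples_dir.iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    return f"{max(existing, default=0) + 1:03d}"
```

Scanning rather than storing a counter keeps the numbering correct even if folders are deleted or added by hand.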
- Dedicated Settings Tab for easy access
- Secure API key storage (system keyring)
- API connection testing
- Preferred microphone configuration
- Configurable audio quality settings (sample rate)
- Multiple OpenAI model support (gpt-4o-mini, gpt-4o, gpt-4.1-mini)
- Persistent base path configuration
- Training goal duration (personalizes progress tracking)
- Auto-generate next sample preference
- Three-tab interface (Record & Generate, Dataset, Settings)
- Light/dark theme toggle
- Helpful tooltips and descriptions
- Real-time statistics updates
- Clear workflow with New Sample and Retake buttons
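The Record / Pause / Resume / Retake / Stop controls amount to a small state machine. A dependency-free sketch of that control flow (the real recorder buffers audio frames via sounddevice; the names here are illustrative):

```python
class RecorderState:
    """Tracks recording control state: idle -> recording <-> paused -> stopped."""

    def __init__(self):
        self.state = "idle"
        self.chunks = []  # audio frames would accumulate here

    def record(self):
        assert self.state in ("idle", "stopped")
        self.chunks = []
        self.state = "recording"

    def pause(self):
        if self.state == "recording":
            self.state = "paused"

    def resume(self):
        if self.state == "paused":
            self.state = "recording"

    def retake(self):
        # Discard everything and keep recording -- no stop/delete round trip
        self.chunks = []
        self.state = "recording"

    def stop(self):
        self.state = "stopped"
        return b"".join(self.chunks)
```

Modeling Retake as "clear and stay in recording" is what makes it one click instead of stop → delete → record.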
Standard installation method for Ubuntu/Debian systems:

1. Download or build the package:

   ```bash
   # Option A: Build from source
   git clone https://github.com/danielrosehill/Voice-Training-Data-Creator.git
   cd Voice-Training-Data-Creator
   ./build-deb.sh

   # Option B: Download release (when available)
   # Download voice-training-data-creator_1.0.0_all.deb from releases
   ```

2. Install the package:

   ```bash
   sudo dpkg -i voice-training-data-creator_1.0.0_all.deb
   sudo apt-get install -f  # Install dependencies
   ```

3. Validate installation (optional):

   ```bash
   ./validate-package.sh
   ```

4. Launch the application:
   - From application menu: Search for "Voice Training Data Creator"
   - From terminal:

     ```bash
     voice-training-data-creator
     ```
See INSTALL.md for detailed installation instructions and troubleshooting.
For development or if you prefer not to install system-wide:
- Python 3.10 or higher
- Ubuntu Desktop (tested on Ubuntu 25.04 with KDE)
- System packages: libportaudio2
- OpenAI API key (for text generation)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Voice-Training-Data-Creator.git
  cd Voice-Training-Data-Creator
  ```

- The application uses uv for virtual environment management. If you don't have it installed:

  ```bash
  pip install uv
  ```

- Create and activate the virtual environment, then install dependencies:

  ```bash
  uv venv .venv
  source .venv/bin/activate
  uv pip install -r requirements.txt
  ```

Use the provided launcher script:

```bash
./run.sh
```

Or manually:

```bash
source .venv/bin/activate
python src/main.py
```

- Open Settings Tab: Click on the "Settings" tab
- Configure Base Path: Click "Browse" to set a base directory for storing samples
- Add API Key: Enter your OpenAI API key in the API Configuration section
- Test Connection: Click "Test Connection" to verify your API key works
- Set Preferred Microphone (optional): Choose your preferred microphone device
- Set Training Goal (optional): Enter target duration in minutes (e.g., 60 for 1 hour)
- Click "Save Settings"
1. Generate Text (or use existing):
   - Set target duration (minutes)
   - Choose words per minute (WPM)
   - Select a style
   - Optionally add custom vocabulary
   - Click "Generate Text"
   - Edit if needed, or click "Regenerate" for a new version

2. Record Audio:
   - Your preferred microphone is automatically selected
   - Click "Record" to start
   - Use "Pause/Resume" if you need a break
   - Click "Retake" to discard and restart immediately
   - Click "Stop" when finished

3. Review and Save:
   - Edit the text if needed
   - Click "View Text" for a large narration view
   - Click "Save Sample" to store your recording

4. Start Next Sample:
   - Click "New Sample" to clear everything and start fresh
   - Or enable "Auto-generate next sample" in Settings for automatic workflow
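The target duration and WPM settings translate directly into a target word count for the generation request. A hedged sketch of how such a prompt might be assembled (the app's actual prompt_builder wording is not reproduced here):

```python
def build_prompt(minutes: float, wpm: int, style: str, vocabulary=None) -> str:
    """Compose a text-generation prompt targeting minutes * wpm words."""
    target_words = round(minutes * wpm)
    prompt = (
        f"Write roughly {target_words} words of {style} text "
        f"suitable for reading aloud as voice training data."
    )
    if vocabulary:
        # Custom vocabulary terms are asked for explicitly
        prompt += " Naturally include these terms: " + ", ".join(vocabulary)
    return prompt
```

For example, 2 minutes at 130 WPM yields a request for roughly 260 words.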
- New Sample Button: Clears text and audio, optionally auto-generates new text
- Retake Button: While recording, instantly discard and restart (no stop → delete needed)
- Regenerate Button: Keep your audio but generate new text with the same parameters
- Auto-generate: Enable in Settings to automatically generate text after saving
- Narration View: Use "View Text" for a large, readable display while recording
Samples are organized as follows:
```
{base_path}/
└── samples/
    ├── 001/
    │   ├── 1.wav            # Audio recording
    │   ├── 1.txt            # Source text
    │   └── 1_metadata.json  # Generation parameters
    ├── 002/
    │   ├── 2.wav
    │   ├── 2.txt
    │   └── 2_metadata.json
    └── ...
```
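Writing one sample's trio of files is straightforward; a minimal sketch (illustrative — the app's sample_manager and metadata fields may differ):

```python
import json
from pathlib import Path

def save_sample(folder: Path, index: int, audio: bytes, text: str, params: dict):
    """Write the .wav / .txt / _metadata.json trio for one sample folder."""
    folder.mkdir(parents=True, exist_ok=True)
    # Raw bytes stand in for encoded WAV data here
    (folder / f"{index}.wav").write_bytes(audio)
    (folder / f"{index}.txt").write_text(text, encoding="utf-8")
    (folder / f"{index}_metadata.json").write_text(
        json.dumps(params, indent=2), encoding="utf-8"
    )
```

Keeping the three files side by side in one numbered folder makes each sample self-describing and easy to move or prune.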
Configuration is stored in ~/.config/VoiceTrainingDataCreator/config.json.
API keys are stored securely in the system keyring.
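Reading such a config file with sensible fallbacks might look like the following (the field names here are assumptions for illustration, not the app's actual schema):

```python
import json
from pathlib import Path

# Hypothetical defaults -- the real config.py may use different keys
DEFAULTS = {"base_path": "", "sample_rate": 44100, "goal_minutes": 0}

def load_config(path: Path) -> dict:
    """Merge stored settings over defaults; a missing file yields defaults."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```

Merging over defaults means new settings added in later versions degrade gracefully for users with older config files.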
Microphone not detected:
- Ensure your microphone is connected
- Check system audio settings
- Verify permissions for microphone access

Text generation fails:
- Verify your API key is correct
- Check internet connectivity
- Ensure you have sufficient OpenAI credits

Audio is clipping or too loud:
- Reduce microphone input volume in system settings
- Move further from the microphone
- Adjust microphone gain if available
- Re-record the sample

No audio in recording:
- Check that the microphone is not muted
- Verify the correct microphone is selected
```
Voice-Training-Data-Creator/
├── src/
│   ├── main.py              # Application entry point and Flet UI (single-file architecture)
│   ├── audio/               # Audio recording modules
│   │   ├── recorder.py
│   │   └── device_manager.py
│   ├── llm/                 # LLM integration
│   │   ├── generator.py
│   │   └── prompt_builder.py
│   ├── storage/             # Configuration and data storage
│   │   ├── config.py
│   │   └── sample_manager.py
│   └── utils/               # Validation utilities
│       └── validators.py
├── requirements.txt
├── run.sh                   # Launcher script
└── README.md
```
- Flet: Modern GUI framework (Flutter-based)
- sounddevice: Audio recording
- soundfile: WAV file handling
- numpy: Audio processing
- openai: LLM text generation
- keyring: Secure credential storage
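The app records through sounddevice and writes audio via soundfile; as a dependency-free illustration of the 16-bit PCM WAV format it targets (44.1 kHz mono), the standard-library wave module can produce an equivalent file:

```python
import math
import struct
import wave

def write_tone(path: str, seconds: float = 0.1, sample_rate: int = 44100):
    """Write a 440 Hz mono sine tone as 16-bit PCM WAV."""
    n = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / sample_rate)),
        )
        for i in range(n)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

In the real recorder, the frames would come from the microphone stream rather than a synthesized tone, but the container parameters are the same.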
This application uses Flet (Flutter-based) for the GUI, providing:
- Flutter-based rendering with predictable layouts
- Declarative UI architecture
- Better cross-platform support
- Material Design by default
- Three-tab interface for organized workflow
The app was migrated from PyQt6 to Flet to resolve persistent layout issues and provide a more modern user experience.
MIT



