Voice Prompt Cleanup

A GUI application that preprocesses audio files for speech-to-text (STT) workflows. It optimizes audio before you send it to multimodal APIs such as Google Gemini, or to ASR models.

Features

PyQt6 GUI with drag-and-drop support
Batch processing - process single files, multiple files, or entire folders
Persistent settings - remembers your output folder preference
Safe processing - writes output as <filename>_processed.mp3, so originals are never overwritten
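
As an illustration of the naming rule above, here is a minimal sketch of how an output path could be derived. The function name processed_path and the output_dir parameter are hypothetical and are not taken from the project's code; only the <filename>_processed.mp3 pattern comes from this README.

```python
from pathlib import Path
from typing import Optional


def processed_path(source: Path, output_dir: Optional[Path] = None) -> Path:
    """Return <stem>_processed.mp3, beside the source or inside output_dir.

    Hypothetical helper: the real application may build the name differently.
    """
    target_dir = output_dir if output_dir is not None else source.parent
    # The original extension is dropped because the output is always MP3.
    return target_dir / f"{source.stem}_processed.mp3"
```

Note that two inputs sharing a stem (for example memo.wav and memo.flac) would map to the same output name under this sketch, so a real implementation would likely need a collision check.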

Audio Processing Pipeline

Convert to Mono - STT models don't need stereo
Downsample to 16kHz - matches the sample rate many speech APIs and ASR models use internally
Speech EQ - 80Hz-8kHz bandpass filter for voice clarity
Gentle Compression - evens out speech dynamics
Truncate Silences - removes long pauses
Normalize Audio - consistent levels without clipping
Export as MP3 - compressed format suitable for API upload
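
The steps above could be expressed as a single ffmpeg invocation. The sketch below only builds the command; it is an assumption about how such a pipeline might look, not a copy of process_audio.sh. The filter choices (highpass, lowpass, acompressor, silenceremove, loudnorm) and every parameter value are illustrative guesses, and build_ffmpeg_command is a hypothetical name.

```python
from pathlib import Path
from typing import List

# Assumed filter chain, in the order listed in this README.
# All numeric values are illustrative, not the project's actual settings.
FILTERS = [
    "aformat=channel_layouts=mono",  # convert to mono
    "aresample=16000",               # downsample to 16 kHz
    "highpass=f=80",                 # speech EQ: cut below 80 Hz
    "lowpass=f=8000",                # speech EQ: cut above 8 kHz
    "acompressor=ratio=3",           # gentle compression
    # remove pauses longer than 1 second anywhere in the file
    "silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-45dB",
    "loudnorm=I=-16:TP=-1.5",        # normalize loudness with a true-peak ceiling
]


def build_ffmpeg_command(src: Path, dst: Path) -> List[str]:
    """Build an ffmpeg argument list for the assumed pipeline."""
    return [
        "ffmpeg",
        "-n",                        # refuse to overwrite an existing output file
        "-i", str(src),
        "-vn",                       # ignore video streams in MP4/MKV/AVI/MOV inputs
        "-af", ",".join(FILTERS),
        "-ac", "1",                  # enforce mono on the output
        "-ar", "16000",              # enforce 16 kHz on the output
        "-c:a", "libmp3lame",
        "-b:a", "64k",               # 64 kbps MP3
        str(dst),
    ]
```

The list can be passed to subprocess.run(cmd, check=True). The output-side -ar 16000 is kept in addition to the aresample filter because loudnorm resamples internally, so the final rate is pinned explicitly.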

Installation

From Debian Package (Recommended)

# Clone the repository
git clone https://github.com/danielrosehill/Voice-Prompt-Cleanup-Script.git
cd Voice-Prompt-Cleanup-Script

# Build the package
./build-deb.sh

# Install
sudo apt install ./build/voice-prompt-cleanup_1.0.0-1.deb

Dependencies

If installing manually, you need:

Python 3
PyQt6 (pip install PyQt6 or sudo apt install python3-pyqt6)
ffmpeg (sudo apt install ffmpeg)

Running Without Installation

# Install dependencies
sudo apt install python3-pyqt6 ffmpeg

# Run directly
./voice_prompt_cleanup_gui.py

Usage

GUI Application

Launch from your application menu as "Voice Prompt Cleanup" or run:

voice-prompt-cleanup

Add files by dragging them onto the window, or use "Add Files..." / "Add Folder..."
Set output folder (optional) - enable custom output folder to save all processed files to one location
Click Process Files

Command Line (Script Only)

./process_audio.sh input.mp3 [output.mp3]

Updating

To update to the latest version:

cd Voice-Prompt-Cleanup-Script
./update-package.sh

This will pull the latest changes, rebuild, and reinstall the package.

Supported Formats

Input: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, OPUS, WEBM, MP4, MKV, AVI, MOV

Output: MP3 (64kbps, 16kHz mono)
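
For folder input, the format list above suggests an extension filter. The following is a rough sketch under that assumption; SUPPORTED_EXTENSIONS, is_supported, and find_supported_files are hypothetical names, and whether the application scans subfolders recursively is not stated in this README.

```python
from pathlib import Path
from typing import List

# Extensions taken from the input format list in this README.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma",
    ".opus", ".webm", ".mp4", ".mkv", ".avi", ".mov",
}


def is_supported(path: Path) -> bool:
    """Check the file extension case-insensitively."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def find_supported_files(folder: Path) -> List[Path]:
    """Collect supported files under folder (recursive scan is an assumption)."""
    return sorted(p for p in folder.rglob("*") if p.is_file() and is_supported(p))
```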

Configuration

Settings are stored in ~/.config/voice-prompt-cleanup/settings.json:

Output folder path
Whether to use custom output folder
Last used input folder
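
A minimal sketch of reading and writing such a settings file follows. The key names in DEFAULTS are hypothetical, since this README lists what is stored but not the JSON keys; only the file location comes from the text above.

```python
import json
from pathlib import Path
from typing import Any, Dict

SETTINGS_PATH = Path.home() / ".config" / "voice-prompt-cleanup" / "settings.json"

# Hypothetical key names: the README does not document the actual keys.
DEFAULTS: Dict[str, Any] = {
    "output_folder": "",
    "use_custom_output_folder": False,
    "last_input_folder": "",
}


def load_settings(path: Path = SETTINGS_PATH) -> Dict[str, Any]:
    """Load settings, falling back to defaults if the file is missing or invalid."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, unreadable file, or bad JSON
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **stored}


def save_settings(settings: Dict[str, Any], path: Path = SETTINGS_PATH) -> None:
    """Write settings as JSON, creating the config directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```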

Target Use Case

Primary target: Google Gemini Audio Understanding and similar multimodal APIs

Accept MP3, WAV, FLAC, OGG, etc.
Typically downsample to 16kHz internally
Often enforce file size limits (e.g., 20MB for Gemini)

The preprocessing optimizes for these constraints while maintaining speech quality.
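
To make the size constraint concrete, the arithmetic can be sketched as below. It assumes constant-bitrate output at the 64kbps stated in this README and treats 20MB as 20,000,000 bytes; max_duration_seconds is a hypothetical helper, and real files vary slightly because of metadata.

```python
def max_duration_seconds(size_limit_bytes: int, bitrate_kbps: int = 64) -> float:
    """Estimate how many seconds of constant-bitrate audio fit in a size limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # 64 kbps -> 8000 bytes per second
    return size_limit_bytes / bytes_per_second


# 20,000,000 bytes at 64 kbps is 2500 seconds, roughly 41 minutes of speech.
limit_seconds = max_duration_seconds(20_000_000)
```

Under these assumptions, recordings longer than about 41 minutes would exceed a 20MB limit even after preprocessing, before any time saved by silence truncation.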

License

MIT License - see LICENSE for details.
