Turn any audio into text with AI-powered transcription
Beautiful web app for transcribing audio files and YouTube videos using OpenAI Whisper
Features • Quick Start • Usage • Tech Stack
Built in ~30 minutes as a weekend project to solve a real problem: quickly transcribing audio content without dealing with expensive APIs or complex setups.
This is a fully functional web application that:
- Transcribes audio files in multiple formats (MP3, WAV, M4A, FLAC, etc.)
- Downloads and transcribes YouTube videos directly from URLs
- Supports 100+ languages with automatic detection
- Runs entirely on your machine (no API keys needed!)
- Features a clean, modern UI inspired by YC startups
Why does this exist? Sometimes you just need to transcribe something quickly without signing up for services or dealing with API limits. This tool does exactly that - simple, fast, and it just works.
Paste any YouTube URL and get a full transcription. The app automatically downloads the audio, processes it, and returns clean text.
Drag & drop or select audio files. Supports all major formats - MP3, WAV, M4A, FLAC, OGG, and more.
Works with 100+ languages including English, Russian, Kazakh, German, French, Spanish, Japanese, Chinese, and many more. Auto-detects language or specify manually for better accuracy.
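A minimal sketch of the auto-detect vs. manual choice, assuming the Hugging Face pipeline is called with `generate_kwargs` and a Transformers version whose Whisper generation accepts a `language` option. The helper name `build_generate_kwargs` is hypothetical and not taken from this project's code.

```python
from typing import Dict, Optional


def build_generate_kwargs(language: Optional[str] = None) -> Dict[str, str]:
    """Build generation options for Whisper (hypothetical helper).

    With no language given, the dict stays empty so the model detects the
    language from the audio. A code such as "en" or "kk" pins it instead.
    """
    kwargs: Dict[str, str] = {}
    if language:
        # Normalize user input like " EN " to "en".
        kwargs["language"] = language.strip().lower()
    return kwargs


# Possible use with an already-created pipeline object `pipe` (assumed):
#   result = pipe("audio.mp3", generate_kwargs=build_generate_kwargs("ru"))
```

Leaving the dict empty keeps auto-detection as the default, so callers only pay attention to the language when they know it.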
Automatically uses GPU if available for faster transcription. Falls back to CPU seamlessly.
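The GPU-or-CPU decision could look roughly like the sketch below. It assumes PyTorch is the backend and treats a missing `torch` install as "no GPU". The function names are illustrative, not the project's own.

```python
def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA GPU; False if torch is missing."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a GPU is usable and wanted, otherwise "cpu"."""
    return "cuda" if prefer_gpu and cuda_available() else "cpu"


def pipeline_device_index(device: str) -> int:
    """Map a device name to the integer index used by transformers.pipeline:
    0 selects the first GPU, -1 selects the CPU."""
    return 0 if device == "cuda" else -1
```

Calling `pick_device(prefer_gpu=False)` always yields `"cpu"`, which is one way to force CPU mode.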
Handles audio from 10-second clips to hour-long podcasts, within the 500MB upload limit. Long recordings are split into chunks for processing.
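To illustrate the chunking idea: Whisper works on 30-second windows, so long audio is usually cut into overlapping segments. The sketch below only computes segment boundaries. The 5-second overlap is an assumed value, and the project's actual chunking may differ (for example, by relying on the Transformers pipeline's `chunk_length_s` option).

```python
from typing import List, Tuple


def chunk_bounds(total_s: float, chunk_s: float = 30.0,
                 overlap_s: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole recording.

    Consecutive chunks share `overlap_s` seconds so words cut at a
    boundary appear whole in at least one chunk.
    """
    if overlap_s < 0 or chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than a non-negative overlap_s")
    total_s = float(total_s)
    step = chunk_s - overlap_s
    bounds = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break  # the last chunk already reaches the end of the audio
        start += step
    return bounds
```

For a 70-second file with the defaults this gives three chunks: 0 to 30, 25 to 55, and 50 to 70 seconds.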
Modern, responsive interface with smooth animations and intuitive UX. No clutter, just what you need.
- Python 3.8+
- FFmpeg (for audio processing)
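If you want to verify both prerequisites before installing anything else, a small check like the following may help. It is a standalone sketch using only the standard library and is not part of the repository.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```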
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
- Clone the repository:
git clone https://github.com/yerdaulet-damir/transcribe-whisper.git
cd transcribe-whisper
- Install dependencies:
pip install -r requirements.txt
- Run the app:
python app.py
- Open in browser:
http://localhost:8080
That's it! The model will download automatically on first use (~900MB for whisper-small).
- Click the "YouTube URL" tab
- Paste a YouTube video URL
- Click "Transcribe"
- Wait for processing (download + transcription)
- Copy or download the result
- Click the "Upload File" tab
- Drag & drop or select an audio file
- Click "Transcribe"
- Get your transcription once processing finishes
💡 Pro Tip: For YouTube videos, you may need to set up cookies for better reliability. See COOKIES_SETUP.md for details.
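For orientation, here is roughly how a cookies file can be handed to yt-dlp when downloading audio. This is a sketch under assumptions: the helper names, the output template, and the WAV target are illustrative, and the steps in COOKIES_SETUP.md remain the authoritative guide for this project.

```python
from typing import Any, Dict, Optional


def build_ydl_options(output_dir: str,
                      cookie_file: Optional[str] = None) -> Dict[str, Any]:
    """Build yt-dlp options for an audio-only download (hypothetical helper)."""
    opts: Dict[str, Any] = {
        "format": "bestaudio/best",                 # prefer audio-only streams
        "outtmpl": f"{output_dir}/%(id)s.%(ext)s",  # file named after video id
        "quiet": True,
        "noplaylist": True,
        # Convert the download to WAV via FFmpeg (assumed target format).
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
    }
    if cookie_file:
        # Netscape-format cookies exported from a logged-in browser session.
        opts["cookiefile"] = cookie_file
    return opts


def download_audio(url: str, output_dir: str,
                   cookie_file: Optional[str] = None) -> None:
    """Download one video's audio track; requires yt-dlp and FFmpeg."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_ydl_options(output_dir, cookie_file)) as ydl:
        ydl.download([url])
```

Keeping the cookie path optional means the same code path works for public videos and for ones that need authentication.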
# Transcribe a file
python transcribe.py --file audio.mp3
# Specify language
python transcribe.py --file audio.mp3 --language en
# Record from microphone (10 seconds)
python transcribe.py --mic --duration 10
- Flask - Lightweight web framework
- OpenAI Whisper Small - State-of-the-art speech recognition
- Hugging Face Transformers - Model loading and inference
- yt-dlp - YouTube video downloading
- librosa - Audio processing
- Vanilla JavaScript - No frameworks, just works
- Modern CSS - Clean, responsive design
- HTML5 - Semantic markup
- Whisper Small: Fast, accurate, and runs on CPU. Perfect balance of speed and quality.
- Flask: Simple, no over-engineering. Gets the job done.
- No frontend framework: Zero build step, instant loading, easy to modify.
transcriber/
├── app.py                 # Flask web server & API
├── transcribe.py          # Core transcription logic
├── requirements.txt       # Python dependencies
├── templates/
│   └── index.html         # Web interface
├── static/
│   ├── css/
│   │   └── style.css      # Styling
│   └── js/
│       └── main.js        # Frontend logic
├── COOKIES_SETUP.md       # YouTube cookies guide
├── QUICKSTART.md          # Quick setup guide
└── README.md              # This file
Default port is 8080 (to avoid conflicts with macOS AirPlay). Change it:
PORT=5001 python app.py
Currently using whisper-small for speed. To use a larger model, edit transcribe.py:
# Line 49 in transcribe.py
self.pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # or "openai/whisper-medium"
    device=0 if device == "cuda" else -1,  # 0 = first GPU, -1 = CPU
)
Model sizes:
- whisper-small: ~900MB - Fast, good quality
- whisper-medium: ~1.5GB - Better accuracy
- whisper-large-v3: ~3GB - Best quality, slower
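If editing transcribe.py by hand feels clumsy, both settings could be read from the environment in one place. The sketch below is only an idea: `PORT` matches the variable used above, while `WHISPER_MODEL` and `load_settings` are hypothetical names that the app does not currently define.

```python
import os

DEFAULT_PORT = 8080                     # documented default port
DEFAULT_MODEL = "openai/whisper-small"  # documented default model


def load_settings(env=None):
    """Read port and model name from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    port = int(env.get("PORT", DEFAULT_PORT))
    if not 1 <= port <= 65535:
        raise ValueError("PORT must be between 1 and 65535, got %d" % port)
    # WHISPER_MODEL is a hypothetical variable, shown for illustration only.
    model = env.get("WHISPER_MODEL", DEFAULT_MODEL)
    return {"port": port, "model": model}
```

Passing a plain dict instead of `os.environ` makes the function easy to try out without touching your shell.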
Most common fix for failed YouTube downloads:
pip install --upgrade yt-dlp
If still failing:
- Check if video is accessible (not private/region-locked)
- Set up cookies (see COOKIES_SETUP.md)
- Try again later (YouTube may temporarily block requests)
- Ensure sufficient disk space (~1GB for whisper-small)
- Check internet connection (model downloads on first use)
- First load takes 2-5 minutes
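The disk-space item can be checked quickly before the first run. This sketch uses only the standard library. The 1GB threshold mirrors the estimate above, and checking the home directory is an assumption based on Hugging Face's default cache living under it.

```python
import os
import shutil


def has_room_for_model(path=None, required_bytes=10**9):
    """Return True if the filesystem holding `path` has enough free space.

    `path` defaults to the user's home directory (assumed cache location).
    """
    if path is None:
        path = os.path.expanduser("~")
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if __name__ == "__main__":
    print("Enough space for whisper-small:", has_room_for_model())
```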
Make sure FFmpeg is installed correctly:
ffmpeg -version
The app automatically falls back to CPU. If you want to force CPU:
transcriber = WhisperTranscriber(device="cpu")
- File size limit: 500MB max upload
- Processing time: Long audio files take time (roughly 1/10th of audio duration on CPU)
- YouTube: Some videos may require authentication (cookies)
- First run: Model download takes a few minutes
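The first two limits are easy to reason about in code. The following sketch assumes the 500MB cap is counted in mebibytes and takes the "1/10th of audio duration" figure at face value. Both helpers are illustrative and not part of the app.

```python
# 500MB upload cap from the list above; in a Flask app such a cap is
# typically enforced through the MAX_CONTENT_LENGTH config key.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024


def upload_allowed(size_bytes: int) -> bool:
    """Accept non-empty files up to the upload cap."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def estimate_cpu_seconds(audio_seconds: float, speedup: float = 10.0) -> float:
    """Rough CPU processing time, assuming audio is handled about
    `speedup` times faster than real time."""
    return audio_seconds / speedup
```

By this estimate, a one-hour podcast (3600 seconds) would take about 360 seconds, or six minutes, on CPU. Real timings depend on hardware and model size.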
Potential improvements (PRs welcome!):
- Real-time transcription from microphone
- Batch file processing
- Export to SRT/VTT subtitles
- Speaker diarization (who said what)
- Docker containerization
- API endpoint for integrations
This project uses the OpenAI Whisper model, which is open source under the MIT License.
- OpenAI for the amazing Whisper model
- Hugging Face for making it easy to use
- yt-dlp developers for YouTube support
Made with ❤️ in ~30 minutes
Simple tools that just work
⭐ Star this repo if you find it useful!