🎙️ Transcribe AI - Audio Transcription Made Simple

Turn any audio into text with AI-powered transcription

Beautiful web app for transcribing audio files and YouTube videos using OpenAI Whisper


🚀 About This Project

Built in ~30 minutes as a weekend project to solve a real problem: quickly transcribing audio content without dealing with expensive APIs or complex setups.

This is a fully functional web application that:

Transcribes audio files in multiple formats (MP3, WAV, M4A, FLAC, etc.)
Downloads and transcribes YouTube videos directly from URLs
Supports about 100 languages with automatic detection
Runs entirely on your machine (no API keys needed!)
Features a clean, modern UI inspired by YC startups

Why does this exist? Sometimes you just need to transcribe something quickly without signing up for a service or running into API limits. This tool does exactly that: simple, fast, and it just works.

✨ Features

🎥 YouTube Integration

Paste any YouTube URL and get a full transcription. The app automatically downloads the audio, processes it, and returns clean text.

📁 File Upload

Drag & drop or select audio files. Supports all major formats - MP3, WAV, M4A, FLAC, OGG, and more.
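Upload handlers usually check the file name before saving anything. The sketch below is a minimal, hypothetical check: `ALLOWED_EXTENSIONS` and `is_allowed_audio` are illustrative names, and the whitelist simply lists the formats named above, so it is not necessarily the rule app.py applies.

```python
import os

# Assumed whitelist: the formats named in this README. FFmpeg can decode many
# more, so treat this set as a placeholder rather than the app's actual rule.
ALLOWED_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg"}


def is_allowed_audio(filename: str) -> bool:
    """Return True if the file name ends in one of ALLOWED_EXTENSIONS (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS
```

In a Flask view this is typically combined with `werkzeug.utils.secure_filename` before the upload is written to disk.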

🌍 Multi-language Support

Works with about 100 languages, including English, Russian, Kazakh, German, French, Spanish, Japanese, and Chinese. The language is detected automatically, or you can specify it manually for better accuracy.
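With the Hugging Face Transformers pipeline, the language is normally passed through `generate_kwargs`. The helper below is a hypothetical sketch (`whisper_generate_kwargs` is not part of this repo): passing no language leaves detection to Whisper, while a code such as `"kk"` pins decoding to Kazakh.

```python
from typing import Dict, Optional


def whisper_generate_kwargs(language: Optional[str] = None) -> Dict[str, str]:
    """Build generate_kwargs for a Transformers Whisper pipeline.

    None lets Whisper auto-detect the spoken language; a language code
    such as "en", "ru" or "kk" forces decoding in that language.
    """
    if language is None:
        return {"task": "transcribe"}
    return {"language": language, "task": "transcribe"}
```

Usage, assuming `pipe` is an "automatic-speech-recognition" pipeline: `pipe("talk.mp3", generate_kwargs=whisper_generate_kwargs("kk"))`.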

⚡ GPU Acceleration

Automatically uses a CUDA GPU, when one is available, for faster transcription, and falls back to the CPU otherwise.
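The device choice can be sketched as below. This assumes PyTorch is the backend; Transformers pipelines take `device=0` for the first GPU and `-1` for the CPU, as in the model-selection snippet under Configuration. `pick_device` and `pipeline_device` are illustrative names, not functions from this repo.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # without PyTorch, the CPU is the only option
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def pipeline_device(device: str) -> int:
    """Map a device name to the integer index a Transformers pipeline accepts."""
    return 0 if device == "cuda" else -1
```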

📝 Long-form Audio

Handles audio of any length, from 10-second clips to hour-long podcasts. Long recordings are processed in chunks, since Whisper itself reads audio in 30-second windows.
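Whisper works on 16 kHz audio in fixed 30-second windows, so long recordings are cut into overlapping chunks. The Transformers pipeline does this internally when given `chunk_length_s=30`; the sketch below shows only the boundary arithmetic, with an assumed 5-second overlap, and is not the pipeline's actual code.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
CHUNK_S = 30.0        # Whisper's fixed input window, in seconds
OVERLAP_S = 5.0       # assumed overlap between neighbouring chunks


def chunk_bounds(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 chunk_s: float = CHUNK_S,
                 overlap_s: float = OVERLAP_S) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end >= num_samples:  # this chunk reaches the end of the audio
            break
        start += step
    return bounds
```

For a 70-second clip this yields three chunks covering 0-30 s, 25-55 s and 50-70 s; the overlap gives each chunk some context so words cut at a boundary can be recovered when the pieces are merged.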

🎨 Beautiful UI

Modern, responsive interface with smooth animations and intuitive UX. No clutter, just what you need.

🏃 Quick Start

Prerequisites

Python 3.8+
FFmpeg (for audio processing)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Installation

Clone the repository:

git clone https://github.com/yerdaulet-damir/transcribe-whisper.git
cd transcribe-whisper

Install dependencies:

pip install -r requirements.txt

Run the app:

python app.py

Open in browser:

http://localhost:8080

That's it! 🎉 The model will download automatically on first use (~900MB for whisper-small).

📖 Usage

Web Application

Transcribe YouTube Video

Click the "YouTube URL" tab
Paste a YouTube video URL
Click "Transcribe"
Wait for processing (download + transcription)
Copy or download the result

Transcribe Audio File

Click the "Upload File" tab
Drag & drop or select an audio file
Click "Transcribe"
Copy or download the result once processing finishes

💡 Pro Tip: For YouTube videos, you may need to set up cookies for better reliability. See COOKIES_SETUP.md for details.
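A YouTube download step using yt-dlp's Python API might look like the sketch below. It is an assumption about the approach, not the code in app.py: the option names (`format`, `outtmpl`, `noplaylist`, `postprocessors`, `cookiefile`) are real yt-dlp parameters, while `ydl_audio_options` and `download_audio` are hypothetical helpers. The `cookiefile` option expects a Netscape-format cookies file.

```python
import os
from typing import Any, Dict, Optional


def ydl_audio_options(out_dir: str, cookie_file: Optional[str] = None) -> Dict[str, Any]:
    """Build yt-dlp options that fetch the best audio stream and convert it to MP3."""
    opts: Dict[str, Any] = {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "noplaylist": True,  # for a video inside a playlist, fetch only that video
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # needs FFmpeg on PATH
            "preferredcodec": "mp3",
        }],
    }
    if cookie_file:  # only needed for videos YouTube refuses without a session
        opts["cookiefile"] = cookie_file
    return opts


def download_audio(url: str, out_dir: str, cookie_file: Optional[str] = None) -> str:
    """Download a video's audio track and return the path of the resulting MP3."""
    import yt_dlp  # imported lazily so the options helper works without yt-dlp

    with yt_dlp.YoutubeDL(ydl_audio_options(out_dir, cookie_file)) as ydl:
        info = ydl.extract_info(url, download=True)
    return os.path.join(out_dir, f"{info['id']}.mp3")
```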

Command Line Interface

# Transcribe a file
python transcribe.py --file audio.mp3

# Specify language
python transcribe.py --file audio.mp3 --language en

# Record from microphone (10 seconds)
python transcribe.py --mic --duration 10
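The three commands above imply a small argument parser. A hypothetical argparse version could look like this; treating `--file` and `--mic` as mutually exclusive and defaulting `--duration` to 10 seconds are assumptions, so check transcribe.py for its actual flags and defaults.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper.")
    # Exactly one input source: a file on disk or the microphone (assumed).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="path to an audio file")
    source.add_argument("--mic", action="store_true", help="record from the microphone")
    parser.add_argument("--language", default=None,
                        help="language code such as 'en'; omit to auto-detect")
    parser.add_argument("--duration", type=float, default=10.0,
                        help="recording length in seconds when using --mic (assumed default)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```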

🛠️ Tech Stack

Backend

Flask - Lightweight web framework
OpenAI Whisper Small - State-of-the-art speech recognition
Hugging Face Transformers - Model loading and inference
yt-dlp - YouTube video downloading
librosa - Audio processing

Frontend

Vanilla JavaScript - No frameworks, just works
Modern CSS - Clean, responsive design
HTML5 - Semantic markup

Why These Choices?

Whisper Small: Fast, accurate, and runs on CPU. Perfect balance of speed and quality.
Flask: Simple, no over-engineering. Gets the job done.
No frontend framework: Zero build step, instant loading, easy to modify.

📁 Project Structure

transcribe-whisper/
├── app.py                 # Flask web server & API
├── transcribe.py          # Core transcription logic
├── example.py
├── clear_cache.py
├── requirements.txt       # Python dependencies
├── templates/
│   └── index.html         # Web interface
├── static/
│   ├── css/
│   │   └── style.css      # Styling
│   └── js/
│       └── main.js        # Frontend logic
├── COOKIES_SETUP.md       # YouTube cookies guide
├── QUICKSTART.md          # Quick setup guide
├── TROUBLESHOOTING.md     # Troubleshooting guide
├── LICENSE                # License
└── README.md              # This file

⚙️ Configuration

Change Port

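The default port is 8080 rather than Flask's default of 5000, which the AirPlay Receiver occupies on recent versions of macOS. To change it, set the PORT environment variable: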

PORT=5001 python app.py
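Reading the port from the environment might be done like this. `resolve_port` is a hypothetical sketch of what app.py presumably does with `PORT`, not its actual code.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8080


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the port from the PORT environment variable, falling back to 8080."""
    env = os.environ if env is None else env
    value = env.get("PORT", "")
    return int(value) if value.strip() else DEFAULT_PORT


# In a Flask app this would be used as: app.run(port=resolve_port())
```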

Model Selection

The app uses whisper-small by default for speed. To use a larger model, change the model argument in transcribe.py:

# In transcribe.py (around line 49)
self.pipe = pipeline("automatic-speech-recognition",
                     model="openai/whisper-large-v3",  # default: openai/whisper-small; openai/whisper-medium also works
                     device=0 if device == "cuda" else -1)  # 0 = first GPU, -1 = CPU

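Model sizes (approximate weight sizes; the actual download depends on the precision the checkpoint is stored in):

whisper-small: 244M parameters, ~1GB at fp32 - Fast, good quality
whisper-medium: 769M parameters, ~3GB at fp32 or ~1.5GB at fp16 - Better accuracy
whisper-large-v3: 1.55B parameters, ~3GB at fp16 - Best quality, slower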
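The sizes above are roughly parameter count times bytes per parameter: 4 bytes at fp32, 2 bytes at fp16. The snippet works through that arithmetic with the published Whisper parameter counts; real downloads add a few megabytes of config and tokenizer files.

```python
# Published parameter counts for the Whisper checkpoints discussed above.
PARAMS = {
    "whisper-small": 244_000_000,
    "whisper-medium": 769_000_000,
    "whisper-large-v3": 1_550_000_000,
}


def weights_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, n in PARAMS.items():
    print(f"{name}: {weights_gb(n, 4):.2f} GB at fp32, {weights_gb(n, 2):.2f} GB at fp16")
```

For whisper-small this gives about 0.98 GB at fp32, which matches the ~900MB first-run download mentioned in Quick Start.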

🐛 Troubleshooting

YouTube 403 Forbidden Error

Most common fix:

pip install --upgrade yt-dlp

If still failing:

Check if video is accessible (not private/region-locked)
Set up cookies (see COOKIES_SETUP.md)
Try again later (YouTube may temporarily block requests)

Model Not Loading

Ensure sufficient disk space (~1GB for whisper-small)
Check internet connection (model downloads on first use)
First load takes 2-5 minutes

Audio Format Not Supported

Make sure FFmpeg is installed correctly:

ffmpeg -version
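The same check can be run from Python, for example at app startup. `ffmpeg_version` is a hypothetical helper, not part of this repo; it relies only on the standard library.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

A `None` result means audio decoding and yt-dlp's MP3 conversion will both fail until FFmpeg is installed.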

CUDA Out of Memory

The app automatically falls back to CPU. If you want to force CPU:

from transcribe import WhisperTranscriber
transcriber = WhisperTranscriber(device="cpu")
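One way such a fallback can work is sketched below. It is not the app's code: `run_with_cpu_fallback` and the `run` callable (something that loads the model on the given device and transcribes) are hypothetical. It relies on PyTorch's CUDA out-of-memory error being a `RuntimeError` whose message contains "out of memory".

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(run: Callable[[str], T]) -> T:
    """Call run("cuda"); if CUDA runs out of memory, retry once with run("cpu")."""
    try:
        return run("cuda")
    except RuntimeError as err:
        # Re-raise anything that is not an out-of-memory error.
        if "out of memory" not in str(err).lower():
            raise
        return run("cpu")
```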

🚧 Known Limitations

File size limit: 500MB max upload
Processing time: Long audio files take time (roughly 1/10th of audio duration on CPU)
YouTube: Some videos may require authentication (cookies)
First run: Model download takes a few minutes
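The first two limits translate into simple numbers. This sketch assumes "500MB" means 500 MiB and that the cap is enforced through Flask's `MAX_CONTENT_LENGTH` setting, which makes Flask reject larger requests with HTTP 413. The 1/10 ratio is the rule of thumb stated above, not a measurement, and `estimated_cpu_minutes` is a hypothetical helper.

```python
MAX_UPLOAD_MB = 500
MAX_UPLOAD_BYTES = MAX_UPLOAD_MB * 1024 * 1024  # 524,288,000 bytes

# With Flask, the cap would be configured as:
#     app.config["MAX_CONTENT_LENGTH"] = MAX_UPLOAD_BYTES


def estimated_cpu_minutes(audio_minutes: float, speedup: float = 10.0) -> float:
    """Rough CPU processing time, using the ~1/10 of audio duration rule of thumb."""
    return audio_minutes / speedup
```

By that rule, an hour-long podcast would take around 6 minutes on CPU.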

🎯 What's Next?

Potential improvements (PRs welcome!):

Real-time transcription from microphone
Batch file processing
Export to SRT/VTT subtitles
Speaker diarization (who said what)
Docker containerization
API endpoint for integrations

📝 License

This project uses the OpenAI Whisper model, which is open source under the MIT License.

🙏 Acknowledgments

OpenAI for the amazing Whisper model
Hugging Face for making it easy to use
yt-dlp developers for YouTube support

Made with ❤️ in ~30 minutes

Simple tools that just work

⭐ Star this repo if you find it useful!
