yerdaulet-damir/transcribe-whisper

🎙️ Transcribe AI - Audio Transcription Made Simple

Turn any audio into text with AI-powered transcription


Beautiful web app for transcribing audio files and YouTube videos using OpenAI Whisper

Features • Quick Start • Usage • Tech Stack


🚀 About This Project

Built in ~30 minutes as a weekend project to solve a real problem: quickly transcribing audio content without dealing with expensive APIs or complex setups.

This is a fully functional web application that:

  • Transcribes audio files in multiple formats (MP3, WAV, M4A, FLAC, etc.)
  • Downloads and transcribes YouTube videos directly from URLs
  • Supports 100+ languages with automatic detection
  • Runs entirely on your machine (no API keys needed!)
  • Features a clean, modern UI inspired by YC startups

Why does this exist? Sometimes you just need to transcribe something quickly without signing up for a service or running into API limits. This tool does exactly that: it is simple, fast, and runs locally.


✨ Features

🎥 YouTube Integration

Paste any YouTube URL and get a full transcription. The app automatically downloads the audio, processes it, and returns clean text.
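
A minimal sketch of the download step, assuming the app drives yt-dlp through its Python API and lets FFmpeg extract the audio track. The helper names, the output template, and the choice of MP3 are illustrative assumptions, not taken from app.py.

```python
from pathlib import Path
from typing import Optional


def build_ydl_options(output_dir: str, cookie_file: Optional[str] = None) -> dict:
    """Build yt-dlp options that keep only the audio track (hypothetical helper)."""
    options = {
        "format": "bestaudio/best",  # prefer an audio-only stream
        "outtmpl": str(Path(output_dir) / "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
        "quiet": True,
    }
    if cookie_file:
        # Optional browser cookies for videos that require authentication
        options["cookiefile"] = cookie_file
    return options


def download_audio(url: str, output_dir: str = "downloads") -> str:
    """Download the audio of `url` and return the path of the extracted MP3."""
    import yt_dlp  # imported lazily so the options helper works without it

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    return str(Path(output_dir) / f"{info['id']}.mp3")
```

The returned path can then be handed to the transcription step like any uploaded file.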

📁 File Upload

Drag & drop or select audio files. Supports all major formats - MP3, WAV, M4A, FLAC, OGG, and more.
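
A sketch of how an upload could be checked against the formats listed above. The allow-list and the function name are assumptions for illustration; the app's real list may be longer.

```python
from pathlib import Path

# Assumed allow-list based on the formats named in this README.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is on the allow-list (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```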

🌍 Multi-language Support

Works with 100+ languages including English, Russian, Kazakh, German, French, Spanish, Japanese, Chinese, and many more. The language is auto-detected by default; specify it manually for better accuracy.
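
One way to pass an explicit language, assuming the app calls the Hugging Face ASR pipeline and forwards options through its `generate_kwargs` argument. The helper name is hypothetical.

```python
from typing import Optional


def build_generate_kwargs(language: Optional[str] = None) -> dict:
    """Return generation options; with no language, Whisper auto-detects it."""
    if not language:
        return {}
    return {"language": language.strip().lower(), "task": "transcribe"}


# Intended use with an existing pipeline object `pipe` (not created here):
#   result = pipe("audio.mp3", generate_kwargs=build_generate_kwargs("ru"))
```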

⚡ GPU Acceleration

Automatically uses GPU if available for faster transcription. Falls back to CPU seamlessly.
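
A minimal sketch of the device check, assuming it relies on PyTorch's CUDA detection. The function names are illustrative; the index mapping follows the `device=0 if device == "cuda" else -1` convention shown in the Configuration section.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so only the CPU path is possible
    return "cuda" if torch.cuda.is_available() else "cpu"


def pipeline_device_index(device: str) -> int:
    """Map the device name to the index the Transformers pipeline expects."""
    return 0 if device == "cuda" else -1
```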

📝 Long-form Audio

Handles audio of any length - from 10-second clips to hour-long podcasts. Uses efficient chunking for optimal performance.
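
To illustrate the idea behind chunking, here is a sketch that splits a recording into overlapping windows. The 30-second window and 5-second overlap are assumed values; the Transformers ASR pipeline exposes a `chunk_length_s` argument for this, and whether the app sets it is not confirmed here.

```python
from typing import List, Tuple


def chunk_windows(duration_s: float, chunk_s: float = 30.0,
                  overlap_s: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into overlapping (start, end) windows in seconds."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    windows = []
    start = 0.0
    step = chunk_s - overlap_s  # each window starts this far after the previous one
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return windows


print(chunk_windows(70.0))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```

The overlap gives the model context on both sides of each cut, so words at a boundary are not split.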

🎨 Beautiful UI

Modern, responsive interface with smooth animations and intuitive UX. No clutter, just what you need.


🏃 Quick Start

Prerequisites

  • Python 3.8+
  • FFmpeg (for audio processing)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Installation

  1. Clone the repository:
git clone https://github.com/yerdaulet-damir/transcribe-whisper.git
cd transcribe-whisper
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the app:
python app.py
  4. Open in browser:
http://localhost:8080

That's it! 🎉 The model downloads automatically on first use (~900MB for whisper-small).


📖 Usage

Web Application

Transcribe YouTube Video

  1. Click the "YouTube URL" tab
  2. Paste a YouTube video URL
  3. Click "Transcribe"
  4. Wait for processing (download + transcription)
  5. Copy or download the result

Transcribe Audio File

  1. Click the "Upload File" tab
  2. Drag & drop or select an audio file
  3. Click "Transcribe"
  4. Get your transcription as soon as processing finishes

💡 Pro Tip: For YouTube videos, you may need to set up cookies for better reliability. See COOKIES_SETUP.md for details.

Command Line Interface

# Transcribe a file
python transcribe.py --file audio.mp3

# Specify language
python transcribe.py --file audio.mp3 --language en

# Record from microphone (10 seconds)
python transcribe.py --mic --duration 10
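
The flags above could be parsed along these lines. This is a sketch of one possible argparse setup, not the actual contents of transcribe.py; the defaults and the mutual exclusion between `--file` and `--mic` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser for the CLI flags shown above (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="path to an audio file")
    source.add_argument("--mic", action="store_true",
                        help="record from the microphone")
    parser.add_argument("--language", default=None,
                        help="language code such as 'en'; omit to auto-detect")
    parser.add_argument("--duration", type=int, default=10,
                        help="seconds to record when --mic is used")
    return parser


args = build_parser().parse_args(["--file", "audio.mp3", "--language", "en"])
print(args.file, args.language)  # audio.mp3 en
```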

🛠️ Tech Stack

Backend

  • Flask - Lightweight web framework
  • OpenAI Whisper Small - State-of-the-art speech recognition
  • Hugging Face Transformers - Model loading and inference
  • yt-dlp - YouTube video downloading
  • librosa - Audio processing

Frontend

  • Vanilla JavaScript - No frameworks, just works
  • Modern CSS - Clean, responsive design
  • HTML5 - Semantic markup

Why These Choices?

  • Whisper Small: Fast, accurate, and runs on CPU. Perfect balance of speed and quality.
  • Flask: Simple, no over-engineering. Gets the job done.
  • No frontend framework: Zero build step, instant loading, easy to modify.

📁 Project Structure

transcriber/
├── app.py                 # Flask web server & API
├── transcribe.py          # Core transcription logic
├── requirements.txt       # Python dependencies
├── templates/
│   └── index.html         # Web interface
├── static/
│   ├── css/
│   │   └── style.css      # Styling
│   └── js/
│       └── main.js        # Frontend logic
├── COOKIES_SETUP.md       # YouTube cookies guide
├── QUICKSTART.md          # Quick setup guide
└── README.md              # This file

⚙️ Configuration

Change Port

The default port is 8080, chosen to avoid the conflict with the macOS AirPlay Receiver on port 5000. To change it, set the PORT environment variable:

PORT=5001 python app.py
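
On the Python side, reading that variable might look like the sketch below. The function name is hypothetical, and the actual lookup in app.py may differ.

```python
import os


def resolve_port(default: int = 8080) -> int:
    """Read the port from the PORT environment variable, else use the default."""
    return int(os.environ.get("PORT", default))


# Assumed use inside app.py, where `app` is the Flask application object:
#   app.run(host="0.0.0.0", port=resolve_port())
```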

Model Selection

The app currently uses whisper-small for speed. To use a larger model, edit transcribe.py:

# Line 49 in transcribe.py
self.pipe = pipeline("automatic-speech-recognition",
                     model="openai/whisper-large-v3",  # or "openai/whisper-medium"
                     device=0 if device == "cuda" else -1)

Model sizes:

  • whisper-small: ~900MB - Fast, good quality
  • whisper-medium: ~1.5GB - Better accuracy
  • whisper-large-v3: ~3GB - Best quality, slower

🐛 Troubleshooting

YouTube 403 Forbidden Error

Most common fix:

pip install --upgrade yt-dlp

If still failing:

  1. Check if video is accessible (not private/region-locked)
  2. Set up cookies (see COOKIES_SETUP.md)
  3. Try again later (YouTube may temporarily block requests)

Model Not Loading

  • Ensure sufficient disk space (~1GB for whisper-small)
  • Check internet connection (model downloads on first use)
  • First load takes 2-5 minutes

Audio Format Not Supported

Make sure FFmpeg is installed correctly:

ffmpeg -version
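
The same check can be run from Python, for example at app startup. This is an illustrative helper, not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run([executable, "-version"],
                            capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if ffmpeg_version() is None:
    print("FFmpeg not found; install it before transcribing.")
```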

CUDA Out of Memory

The app automatically falls back to CPU. If you want to force CPU:

transcriber = WhisperTranscriber(device="cpu")
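
The fallback itself can be sketched as a small wrapper. This assumes the loader raises PyTorch's out-of-memory error, which is a RuntimeError whose message contains "out of memory"; the wrapper name is hypothetical and the app's real logic may differ.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_cpu_fallback(load: Callable[[str], T], preferred: str = "cuda") -> T:
    """Call load(preferred); on a CUDA out-of-memory error, retry with "cpu"."""
    try:
        return load(preferred)
    except RuntimeError as error:
        # Re-raise anything that is not a GPU memory error, and avoid
        # retrying forever when the CPU attempt itself failed.
        if preferred == "cpu" or "out of memory" not in str(error).lower():
            raise
        return load("cpu")


# Assumed use with the class shown above:
#   transcriber = load_with_cpu_fallback(lambda d: WhisperTranscriber(device=d))
```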

🚧 Known Limitations

  • File size limit: 500MB max upload
  • Processing time: Long audio files take time (roughly 1/10th of audio duration on CPU)
  • YouTube: Some videos may require authentication (cookies)
  • First run: Model download takes a few minutes
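
The first two limits can be expressed as simple checks. This is a sketch with hypothetical helper names; in Flask, an upload cap like this is typically enforced through the `MAX_CONTENT_LENGTH` config key, and whether app.py does so is an assumption.

```python
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MB, the limit stated above


def is_within_upload_limit(size_bytes: int) -> bool:
    """Return True for non-empty uploads that fit under the 500 MB cap."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def estimate_cpu_seconds(audio_seconds: float, speed_factor: float = 10.0) -> float:
    """Rough CPU processing time using the ~1/10 ratio quoted above."""
    return audio_seconds / speed_factor


print(estimate_cpu_seconds(3600))  # 360.0, about 6 minutes for a 1-hour file
```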

🎯 What's Next?

Potential improvements (PRs welcome!):

  • Real-time transcription from microphone
  • Batch file processing
  • Export to SRT/VTT subtitles
  • Speaker diarization (who said what)
  • Docker containerization
  • API endpoint for integrations

📝 License

This project uses the OpenAI Whisper model, which is open source under the MIT License.


🙏 Acknowledgments

  • OpenAI for the amazing Whisper model
  • Hugging Face for making it easy to use
  • yt-dlp developers for YouTube support

Made with ❤️ in ~30 minutes

Simple tools that just work

⭐ Star this repo if you find it useful!
