OpenTranscribe

A React app for transcribing MP3 audio files using multiple open-source STT models via a Python backend API.

Features

Drag-and-drop MP3 upload
5 STT Models - Choose from Faster Whisper, OpenAI Whisper, Vosk, whisper.cpp, and Wav2Vec2
Model Comparison - Compare transcription results across models with diff highlighting
Standalone Mode - The CLI starts the API server automatically
Multi-language support: English, Spanish, and auto-detect
Verbatim text output with timestamps
Python backend API built with FastAPI

Supported Models

Model	Speed	Accuracy	GPU	Best For
Faster Whisper	⚡ Fast	★★★★★	Optional	Default choice, best balance
OpenAI Whisper	Medium	★★★★★	Optional	Original reference
Vosk	⚡⚡ Fastest	★★★☆☆	No	Real-time, embedded
whisper.cpp	Slow	★★★★★	No	CPU-only systems
Wav2Vec2	Medium	★★★☆☆	Optional	Research, fine-tuning

Prerequisites

Node.js (v18+)
Python (v3.10+)
pip (Python package manager)
NVIDIA GPU (optional, for faster transcription)

Setup

One-command install (clone, configure, and run in a single step):

curl -fsSL https://raw.githubusercontent.com/jasperan/opentranscribe/main/install.sh | bash

Advanced options

Override install location:

curl -fsSL https://raw.githubusercontent.com/jasperan/opentranscribe/main/install.sh | PROJECT_DIR=/opt/myapp bash

Or install manually:

git clone https://github.com/jasperan/opentranscribe.git
cd opentranscribe
# See below for setup instructions

1. Install Frontend Dependencies

npm install

2. Install Python Backend Dependencies

# Install all dependencies (includes all STT models)
pip install -r requirements.txt

3. Run the Application

Standalone CLI (Recommended):

cd backend
python cli.py

The CLI automatically starts the API server in the background. No separate terminal needed!

    ╔════════════════════════════════════════════════════════════════╗
    ║                 OPENTRANSCRIBE CLI                             ║
    ║           Multi-Model Audio Transcription Tool                 ║
    ║                                                                ║
    ║   API Server: ● Running                                        ║
    ╚════════════════════════════════════════════════════════════════╝

Select a Task:
 [1]  Transcribe Audio File
 [2]  Compare All Models
 [3]  List Available Models
 ────
 [4]  Manage API Server
 ────
 [0]  Exit
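As a rough illustration of how a launcher can tell that a background server is ready, the sketch below polls the GET /health endpoint until it answers. It is not taken from cli.py; the function names, timeout, and polling interval are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://127.0.0.1:8000/health") -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: run this after the server process has been launched.
    print("API ready" if wait_until_ready(http_probe, timeout=10) else "API not reachable")
```

Passing the probe in as a callable keeps the waiting logic independent of the network, so it can be exercised without a running server.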

Frontend (optional):

npm run dev

The app is then available at http://localhost:5173

CLI Usage

The interactive CLI provides:

Transcribe Audio File - Select any model and transcribe
Compare All Models - Run multiple models and see differences
List Available Models - See which models are installed
Manage API Server - Start/stop/restart the server
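One way to check which model backends are installed is to test whether their Python packages can be found. This is a minimal sketch and not the CLI's actual logic. The model IDs and import names below are assumptions, in particular the choice of pywhispercpp as the whisper.cpp binding.

```python
import importlib.util

# Assumed mapping from model ID to import name; the project's own IDs
# and bindings may differ.
MODEL_PACKAGES = {
    "faster-whisper": "faster_whisper",
    "openai-whisper": "whisper",
    "vosk": "vosk",
    "whisper-cpp": "pywhispercpp",  # hypothetical choice of whisper.cpp binding
    "wav2vec2": "transformers",
}


def installed_models(packages=MODEL_PACKAGES):
    """Map each model ID to True if its package can be located, without importing it."""
    return {
        model_id: importlib.util.find_spec(import_name) is not None
        for model_id, import_name in packages.items()
    }


if __name__ == "__main__":
    for model_id, available in installed_models().items():
        print(f"{model_id:16} {'installed' if available else 'missing'}")
```

Using find_spec avoids importing the heavy model libraries just to see whether they exist.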

Model Comparison

The comparison feature runs your audio through multiple models and shows word-level differences:

═══════════════════════════════════════════════════════════════════════
                      MODEL COMPARISON RESULTS
═══════════════════════════════════════════════════════════════════════

┌──────────────────┬─────────┬─────────┐
│ Model            │ Time    │ Match   │
├──────────────────┼─────────┼─────────┤
│ ● Faster Whisper │ 8.74s   │ baseline│
│ ● OpenAI Whisper │ 8.58s   │ 88%     │
│ ● Vosk           │ 26.39s  │ 75%     │
└──────────────────┴─────────┴─────────┘

DIFFERENCES FOUND (130 word positions differ):
────────────────────────────────────────────────────────────────────────

Word #33: "None"
  ├─ Baseline:       None
  ├─ OpenAI Whisper: the
  └─ Vosk:           the         ✓
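A word-level comparison of this kind can be approximated with Python's standard difflib module. The sketch below is an illustration only: the project's actual diff and match-percentage calculation are not documented here, and the function name and lowercase normalization are assumptions.

```python
from difflib import SequenceMatcher


def compare_words(baseline: str, candidate: str):
    """Return (match percentage, list of word-level differences).

    Each difference is (baseline word index, baseline words, candidate words).
    """
    base_words = baseline.lower().split()
    cand_words = candidate.lower().split()
    matcher = SequenceMatcher(a=base_words, b=cand_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty list on either side means a word was inserted or dropped.
            differences.append((i1, base_words[i1:i2], cand_words[j1:j2]))
    return round(matcher.ratio() * 100), differences


if __name__ == "__main__":
    match, diffs = compare_words("we went to store today",
                                 "we went to the store today")
    print(f"{match}% match")
    for position, base, other in diffs:
        print(f"Word #{position}: baseline={base or None} candidate={other or None}")
```

For the two sentences in the usage example, the function returns a 91% match and a single difference at word index 3, where the baseline has no word and the candidate has "the".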

API Endpoints

When started by the CLI, the API server listens at http://127.0.0.1:8000

GET / - API status
GET /health - Health check
GET /models - List available STT models
POST /transcribe - Transcribe audio file
  - file: Audio file (multipart/form-data)
  - model: Model ID (optional, default: faster-whisper)
  - language: Language code (optional, default: auto)
POST /compare - Compare multiple models
  - file: Audio file
  - models: Comma-separated model IDs (optional, default: all)
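The following is a minimal client sketch for POST /transcribe using only the Python standard library. It assumes that model and language are sent as form fields alongside the file and that the server replies with JSON. Neither detail is confirmed above, and the helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes,
                    content_type: str = "audio/mpeg"):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", str(value).encode()]
    parts += [f"--{boundary}".encode(),
              (f'Content-Disposition: form-data; name="{file_field}"; '
               f'filename="{filename}"').encode(),
              f"Content-Type: {content_type}".encode(),
              b"", data,
              f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "faster-whisper", language: str = "auto",
               base_url: str = "http://127.0.0.1:8000"):
    """Upload an audio file and return the decoded JSON reply (shape assumed)."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, header = build_multipart({"model": model, "language": language},
                                   "file", os.path.basename(path), audio)
    request = urllib.request.Request(f"{base_url}/transcribe", data=body,
                                     headers={"Content-Type": header},
                                     method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server running, transcribe("sample.mp3", model="vosk") would send the upload. The same build_multipart helper could be reused for POST /compare by passing a models field instead.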

Build for Production

Frontend

npm run build

Deploy the dist/ folder to any static host.

Backend

The backend can be deployed to any Python hosting service:

Heroku: Add a Procfile with the line web: uvicorn backend.main:app --host 0.0.0.0 --port $PORT
Railway/Render: Set the start command to python backend/main.py
Docker: Create a Dockerfile for containerized deployment

Note: Set the VITE_API_URL environment variable in the frontend to the URL of your deployed backend.

Supported Formats & Languages

Audio Formats: MP3, WAV, and other audio formats supported by Whisper
Languages:
- English (en)
- Spanish (es)
- Auto-detect (automatically detects the language)

Model Information

All models are free and open-source - no API keys required:

Faster Whisper: CTranslate2 reimplementation of Whisper, up to 4x faster than the original
OpenAI Whisper: Original model (~150MB for base)
Vosk: Lightweight Kaldi-based (~50MB models)
whisper.cpp: C++ implementation of Whisper for CPU inference
Wav2Vec2: Loaded through Hugging Face Transformers

Models are cached after first download in ~/.cache/ directories.
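To see how much disk space cached models occupy, a small script can total the file sizes under each cache folder. This is a sketch under assumptions: the subdirectory names below (whisper, huggingface, vosk) are the usual defaults of those libraries, but the exact locations used by this project are not stated above.

```python
from pathlib import Path

# Assumed cache folder names under ~/.cache; adjust to match your setup.
CACHE_SUBDIRS = ("whisper", "huggingface", "vosk")


def cache_usage(cache_root=None, subdirs=CACHE_SUBDIRS):
    """Return {subdir: bytes used} for each cache folder that exists."""
    root = Path(cache_root) if cache_root else Path.home() / ".cache"
    usage = {}
    for name in subdirs:
        folder = root / name
        if folder.is_dir():
            # Skip symlinks so linked model files are not counted twice.
            usage[name] = sum(
                entry.stat().st_size
                for entry in folder.rglob("*")
                if entry.is_file() and not entry.is_symlink()
            )
    return usage


if __name__ == "__main__":
    for name, size in cache_usage().items():
        print(f"{name:12} {size / 1_000_000:.1f} MB")
```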

🎨 Frontend Design

UI Screenshots

OpenTranscribe features a Sonic Precision design system with warm gold on deep ink, creating an audio-grade luxury aesthetic.

Landing Page

Marketing page showcasing features, pricing, and testimonials

Transcription Interface

Main transcription UI with audio player and text editor

Model Comparison

Compare outputs from all 5 STT models side-by-side

Export Options

Export transcriptions to multiple formats (TXT, SRT, VTT, DOCX, JSON)
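As an illustration of the SRT export, the sketch below converts timestamped segments into SubRip text. The segment shape (dicts with start, end, and text keys, times in seconds) is an assumption for this example and may not match the structure the backend returns.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical segments for demonstration only.
    demo = [
        {"start": 0.0, "end": 1.25, "text": " Hello there. "},
        {"start": 1.25, "end": 3.0, "text": "This is a test."},
    ]
    print(to_srt(demo))
```

WebVTT output would differ mainly in its WEBVTT header line and in using a period instead of a comma before the milliseconds.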

Design System

Component	Description
Color Palette	Warm gold (#F59E0B) on deep ink background, ivory light mode
Typography	Inter font family with careful attention to readability
Layout	Clean, focused layouts with generous whitespace
Animations	Subtle waveform animations, smooth transitions
Glass Effects	Backdrop blur panels for depth and hierarchy

Key UI Components

Upload Zone - Drag-and-drop area with visual feedback
Audio Player - Custom waveform visualization with playback controls
Model Selector - Radio buttons for choosing STT engines
Transcript Editor - Monaco-style text editor with line numbers
Diff Viewer - Highlighted differences between model outputs
Export Menu - Format selection with preview

Note: Screenshots are stored in assets/screenshots/. Run the application and use your browser's dev tools to capture updated screenshots as needed.
