An automated sermon content workflow system that converts video/audio sermons into high-quality Simplified Chinese subtitles using the Google Cloud Speech-to-Text v2 API.
- Multi-source ingestion: YouTube URLs and local audio/video files
- Google Cloud STT v2 API: High-accuracy Simplified Chinese transcription with batch processing
- Intelligent audio chunking: Automatic splitting of large files for optimal STT performance
- Phrase management system: Domain-specific religious terms for improved accuracy
- Subtitle generation: SRT and WebVTT formats with proper line wrapping
- REST API: Comprehensive RESTful endpoints for job and phrase management
- Batch processing: CLI tool for processing multiple files with concurrent job support
- Storage options: Local filesystem or Google Cloud Storage with automatic cleanup
- Docker support: Containerized deployment with health checks
- Cost monitoring: Real-time STT cost estimation and limits
- Comprehensive testing: Validation tools and diagnostic scripts
- Production-ready: Structured logging, monitoring, and error handling
```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│   YouTube URL   │   │   Local Files   │   │   File Upload   │
└─────────┬───────┘   └─────────┬───────┘   └─────────┬───────┘
          │                     │                     │
          └─────────────────────┼─────────────────────┘
                                │
                  ┌─────────────▼───────────────┐
                  │       FastAPI Service       │
                  └─────────────┬───────────────┘
                                │
                  ┌─────────────▼───────────────┐
                  │     Background Workers      │
                  └─────────────┬───────────────┘
                                │
            ┌───────────────────┼───────────────────┐
            │                   │                   │
 ┌──────────▼─────────┐  ┌──────▼─────┐  ┌──────────▼─────────┐
 │      YouTube       │  │   Audio    │  │   Google Cloud     │
 │     Downloader     │  │  Processor │  │   Speech-to-Text   │
 │      (yt-dlp)      │  │  (pydub)   │  │       v2 API       │
 └──────────┬─────────┘  └──────┬─────┘  └──────────┬─────────┘
            │                   │                   │
            └───────────────────┼───────────────────┘
                                │
                  ┌─────────────▼───────────────┐
                  │       Phrase Manager        │
                  │   (Domain-specific terms)   │
                  └─────────────┬───────────────┘
                                │
                  ┌─────────────▼───────────────┐
                  │      Subtitle Builder       │
                  │        (SRT/WebVTT)         │
                  └─────────────┬───────────────┘
                                │
                  ┌─────────────▼───────────────┐
                  │       Storage Manager       │
                  │   (Local / Google Cloud)    │
                  └─────────────────────────────┘
```
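The Subtitle Builder stage emits SRT and WebVTT. The SRT block layout it targets can be sketched as follows; `srt_timestamp` and `build_srt` are hypothetical helpers for illustration, not the project's actual `builder.py` API (note that SRT uses a comma before the milliseconds, while WebVTT uses a period):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def build_srt(segments) -> str:
    """Render (start_sec, end_sec, text) tuples as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

print(build_srt([(0.0, 2.5, "第一句"), (2.5, 5.0, "第二句")]))
```

The same segment tuples can feed a WebVTT writer, which differs mainly in the `WEBVTT` header and the millisecond separator.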
- Python 3.11+
- FFmpeg
- Google Cloud credentials (for STT)
- Docker (optional)
- Redis (optional, for task queue)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd sermon-workflow
   ```

2. Install dependencies

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. Configure environment

   ```bash
   cp .env.template .env
   # Edit .env with your configuration
   ```

4. Set up Google Cloud credentials
   - Create a service account in Google Cloud Console
   - Download the JSON key file
   - Set `GOOGLE_APPLICATION_CREDENTIALS` in `.env`
Local development:

```bash
uvicorn app.main:app --reload
```

Docker:

```bash
docker-compose up --build
```

Production:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

From a YouTube URL:
```bash
curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "title": "Sunday Sermon"
  }'
```

From a local file:
```bash
curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_path": "/path/to/audio.mp3",
    "title": "Wednesday Service"
  }'
```

File upload:
```bash
curl -X POST "http://localhost:8000/api/v1/jobs/transcribe/upload" \
  -F "[email protected]" \
  -F "title=Sunday Sermon"
```

Check job status:

```bash
curl "http://localhost:8000/api/v1/jobs/{job_id}"
```

List jobs:

```bash
curl "http://localhost:8000/api/v1/jobs/?limit=10&offset=0"
```

Get all phrases:
```bash
curl "http://localhost:8000/api/v1/phrases/"
```

Get phrases by language:
```bash
curl "http://localhost:8000/api/v1/phrases/language/cmn-Hans-CN"
```

Add a new phrase:
```bash
curl -X POST "http://localhost:8000/api/v1/phrases/" \
  -H "Content-Type: application/json" \
  -d '{
    "phrase": "恩典尔湾",
    "language": "chinese",
    "category": "church_names"
  }'
```

Search phrases:
```bash
curl -X POST "http://localhost:8000/api/v1/phrases/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "耶稣",
    "language": "chinese"
  }'
```

Health check:

```bash
curl "http://localhost:8000/health"
```

Use the CLI tool for processing multiple files:
1. Create a CSV input file:

   ```csv
   source_type,source,title
   youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
   file,/path/to/audio1.mp3,Wednesday Service 1
   file,/path/to/audio2.mp3,Friday Prayer
   ```

2. Run batch processing:

   ```bash
   python scripts/batch_transcribe.py input.csv
   ```

3. Options:

   ```bash
   python scripts/batch_transcribe.py input.csv \
     --output results.csv \
     --concurrent-jobs 5 \
     --timeout 7200
   ```
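Each CSV row maps onto one transcription request. A minimal sketch of how such a file could be parsed into job payloads; `load_jobs` is a hypothetical helper for illustration, not the actual `batch_transcribe.py` API:

```python
import csv
import io

def load_jobs(csv_text: str) -> list[dict]:
    """Parse the batch CSV (source_type,source,title) into job payloads."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = {"source_type": row["source_type"], "title": row["title"]}
        # YouTube rows carry a URL; file rows carry a local path.
        key = "url" if row["source_type"] == "youtube" else "file_path"
        payload[key] = row["source"]
        jobs.append(payload)
    return jobs

sample = """source_type,source,title
youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
file,/path/to/audio1.mp3,Wednesday Service 1
"""
print(load_jobs(sample))
```

The resulting dicts match the JSON bodies accepted by `POST /api/v1/jobs/transcribe` above.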
Test YouTube extraction:

```bash
python scripts/quick_youtube_test.py
```

Test STT conversion:

```bash
python scripts/quick_stt_test.py
```

Test the chunking system:

```bash
python scripts/quick_chunking_test.py
```

Validate the chunking system:

```bash
python scripts/validate_chunking_system.py audio_file.mp3
```

Test GCS STT support:

```bash
python scripts/test_gcs_stt.py
```

Diagnose Google STT issues:

```bash
python scripts/diagnose_google_stt.py
```

Run the chunked extraction test:

```bash
python tests/test_chunked_extraction.py
```

Run the comprehensive YouTube test:

```bash
python tests/test_youtube_extraction.py
```

Run the single chunk STT test:

```bash
python tests/test_single_chunk_stt.py
```

Key configuration options in `.env`:
```bash
# Google Cloud
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
GCS_BUCKET_NAME=your-bucket-name

# Speech-to-Text
STT_LANGUAGE_CODE=cmn-Hans-CN
STT_MODEL=default
STT_COST_LIMIT_USD=10.0

# Storage
STORAGE_TYPE=local  # or 'gcs'
LOCAL_STORAGE_PATH=./data/processed
MAX_FILE_SIZE_MB=500

# API
API_HOST=0.0.0.0
API_PORT=8000
API_KEY=your-api-key

# Redis (optional)
REDIS_URL=redis://localhost:6379/0

# Development
DEBUG=true
LOG_LEVEL=INFO
```

Project structure:

```
sermon-workflow/
├── app/
│   ├── __init__.py
│   ├── main.py                      # FastAPI application
│   ├── config.py                    # Configuration management
│   ├── models.py                    # Data models
│   ├── workers.py                   # Background job processing
│   ├── routers/
│   │   ├── jobs.py                  # Job management routes
│   │   └── phrases.py               # Phrase management routes
│   ├── services/
│   │   ├── ingest/
│   │   │   ├── downloader.py        # YouTube downloader
│   │   │   └── audio_extractor.py   # Audio processing & chunking
│   │   ├── stt/
│   │   │   └── google_stt.py        # Google Cloud STT v2 API
│   │   ├── subtitles/
│   │   │   └── builder.py           # Subtitle generation
│   │   ├── phrase_manager.py        # Phrase management service
│   │   └── storage.py               # Storage management
│   └── config/
│       └── phrases.json             # Domain-specific phrases
├── scripts/
│   ├── batch_transcribe.py          # Batch processing CLI
│   ├── validate_chunking_system.py  # Chunking validation
│   ├── test_gcs_stt.py              # GCS STT testing
│   ├── diagnose_google_stt.py       # STT diagnostics
│   └── quick_*.py                   # Quick test scripts
├── tests/
│   ├── test_chunked_extraction.py   # Comprehensive chunking test
│   ├── test_youtube_extraction.py   # YouTube workflow test
│   ├── test_single_chunk_stt.py     # STT conversion test
│   └── test_*.py                    # Other test files
├── data/
│   ├── raw/                         # Raw audio files
│   └── processed/                   # Processed outputs
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── .env.template
```
Run the development server:

```bash
uvicorn app.main:app --reload
```

Test with a sample YouTube video:

```bash
curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Test Video"
  }'
```

Check job status:

```bash
curl "http://localhost:8000/api/v1/jobs/{job_id}"
```

Test phrase management:

```bash
curl "http://localhost:8000/api/v1/phrases/health"
```

- Health endpoint: `GET /health`
- Statistics: `GET /stats`
- Configuration: `GET /config` (debug mode only)
- Structured logging: JSON format with configurable levels
- Performance metrics: Processing time, cost estimation, file sizes
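Structured JSON logging of the kind listed above can be approximated with the standard library alone. A minimal sketch, assuming a custom formatter (the actual service may use a dedicated library such as structlog or python-json-logger instead):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

# Attach the formatter to a stream handler for the workflow logger.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("sermon-workflow")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("job completed")
```

One JSON object per line keeps the output machine-parseable for log aggregators while remaining readable in `docker-compose logs`.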
```bash
# Basic development setup
docker-compose up --build

# With Redis for task queue
docker-compose --profile redis up -d

# With admin interface
docker-compose --profile admin up -d
```

```bash
# Production with Redis
docker-compose -f docker-compose.prod.yml --profile production up -d

# Production with monitoring stack
docker-compose -f docker-compose.prod.yml --profile production --profile monitoring up -d
```

```bash
# Copy and configure environment file
cp .env.template .env
# Edit .env with your production settings

# For production, ensure the service account is available
# The service-account.json file will be mounted into the container
```

- API key authentication (optional)
- File upload validation and size limits
- Resource limits (file size, processing time)
- Cost limits for STT usage
- Non-root container execution
- CORS configuration for web clients
- Concurrent processing: Background tasks with configurable limits
- Intelligent chunking: Automatic audio splitting for optimal STT performance
- File streaming: Efficient handling of large audio files
- Storage optimization: Automatic cleanup and lifecycle management
- Cost monitoring: Real-time STT cost estimation and limits
- Batch operations: Support for long audio files via Google Cloud Storage
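Chunk boundaries can be computed from the audio duration and a per-chunk limit, with a small overlap so speech straddling a cut is not lost. A hedged sketch of that arithmetic (the actual `audio_extractor.py` slices audio with pydub and its limits and overlap may differ; `chunk_spans` and the 300 s / 2 s values are illustrative assumptions):

```python
def chunk_spans(duration_s: float, max_chunk_s: float = 300.0,
                overlap_s: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) spans in seconds covering the whole file.

    Consecutive spans overlap by overlap_s so speech straddling a cut
    appears in both chunks and can be deduplicated after transcription.
    """
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # back up to create the overlap
    return spans

# A 700 s sermon with 300 s chunks and 2 s overlap:
print(chunk_spans(700))  # [(0.0, 300.0), (298.0, 598.0), (596.0, 700.0)]
```

Each span can then be cut from the source audio (e.g. `audio[start_ms:end_ms]` with a pydub `AudioSegment`) and sent to STT independently.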
- Database: Currently uses in-memory storage (SQLite/PostgreSQL integration planned)
- Task queue: Simple background tasks (Redis/RQ integration available)
- Authentication: Basic API key auth (OAuth2 planned for production)
- Monitoring: Basic health checks (Prometheus metrics available)
- Phase 2: Video clipping and highlight extraction
- Phase 3: Devotional content generation with LLM
- Phase 4: Multi-platform content distribution
- Database: PostgreSQL integration
- Queue: Redis/RQ for robust job processing
- Monitoring: Prometheus + Grafana dashboard
- Multi-language: Support for additional languages
- Advanced phrase adaptation: Dynamic phrase learning
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the logs: `docker-compose logs`
- Health check: `curl http://localhost:8000/health`
- Documentation: http://localhost:8000/docs
- Run diagnostics: `python scripts/diagnose_google_stt.py`
| Variable | Description | Default |
|---|---|---|
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCP service account key | Required |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Required |
| `GCS_BUCKET_NAME` | GCS bucket for file storage | Optional |
| `STT_LANGUAGE_CODE` | Speech-to-Text language | `cmn-Hans-CN` |
| `STT_MODEL` | STT model type | `default` |
| `STT_COST_LIMIT_USD` | Maximum STT cost per job | `10.0` |
| `STORAGE_TYPE` | Storage backend (`local` or `gcs`) | `local` |
| `LOCAL_STORAGE_PATH` | Local storage directory | `./data/processed` |
| `MAX_FILE_SIZE_MB` | Maximum file size for processing | `500` |
| `API_HOST` | API server host | `0.0.0.0` |
| `API_PORT` | API server port | `8000` |
| `API_KEY` | API authentication key | Optional |
| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` |
| `DEBUG` | Enable debug mode | `true` |
| `LOG_LEVEL` | Logging level | `INFO` |
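Reading these variables could look like the following stdlib-only sketch; the actual `config.py` may use pydantic settings instead, and `load_settings` is a hypothetical helper whose keys mirror a subset of the table:

```python
import os

def load_settings() -> dict:
    """Collect workflow settings from the environment, using the table defaults."""
    return {
        "stt_language_code": os.getenv("STT_LANGUAGE_CODE", "cmn-Hans-CN"),
        "storage_type": os.getenv("STORAGE_TYPE", "local"),
        "stt_cost_limit_usd": float(os.getenv("STT_COST_LIMIT_USD", "10.0")),
        "max_file_size_mb": int(os.getenv("MAX_FILE_SIZE_MB", "500")),
        "api_port": int(os.getenv("API_PORT", "8000")),
        "debug": os.getenv("DEBUG", "true").lower() == "true",
    }

print(load_settings())
```

Numeric and boolean values arrive as strings from the environment, so they are coerced explicitly here; a pydantic-based config would do this coercion automatically.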
- Google Cloud Speech-to-Text v2 API: Full support for the latest API with improved accuracy
- Intelligent Audio Chunking: Automatic splitting of large files to stay within Google STT limits
- Phrase Management System: Domain-specific religious terms for improved transcription accuracy
- Comprehensive Testing Suite: Validation tools and diagnostic scripts for troubleshooting
- Batch Processing: CLI tool with concurrent job support for processing multiple files
- Production-Ready: Structured logging, health checks, and monitoring endpoints
- Audio Processing: Optimized for STT with automatic format conversion and quality preservation
- Error Handling: Robust error handling with detailed logging and recovery mechanisms
- Performance: Efficient processing pipeline with configurable concurrency limits
- Scalability: Support for both local and cloud storage with automatic cleanup
- Monitoring: Real-time cost tracking and performance metrics
- Docker Optimization: Multi-stage builds, proper file mounting, and production-ready configurations