JonathanJing/AI-sermon-workflow

Sermon Workflow - Phase 1: Speech-to-Text Service

Automated sermon content workflow system that converts video/audio sermons into high-quality Simplified Chinese subtitles using Google Cloud Speech-to-Text v2 API.

🎯 Features

  • Multi-source ingestion: YouTube URLs and local audio/video files
  • Google Cloud STT v2 API: High-accuracy Simplified Chinese transcription with batch processing
  • Intelligent audio chunking: Automatic splitting of large files for optimal STT performance
  • Phrase management system: Domain-specific religious terms for improved accuracy
  • Subtitle generation: SRT and WebVTT formats with proper line wrapping
  • REST API: Comprehensive RESTful endpoints for job and phrase management
  • Batch processing: CLI tool for processing multiple files with concurrent job support
  • Storage options: Local filesystem or Google Cloud Storage with automatic cleanup
  • Docker support: Containerized deployment with health checks
  • Cost monitoring: Real-time STT cost estimation and limits
  • Comprehensive testing: Validation tools and diagnostic scripts
  • Production-ready: Structured logging, monitoring, and error handling
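
The subtitle output mentioned above can be sketched with a minimal SRT cue formatter. This is illustrative only: the project's actual builder lives in app/services/subtitles/builder.py and may differ, and Chinese text would need character-count wrapping rather than the word-based wrapping shown here.

```python
import textwrap


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt_cue(index: int, start: float, end: float, text: str,
                  width: int = 32) -> str:
    """Build one SRT cue, wrapping long lines to `width` characters."""
    lines = textwrap.wrap(text, width=width) or [text]
    return (
        f"{index}\n"
        f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
        + "\n".join(lines) + "\n"
    )


print(build_srt_cue(1, 0.0, 2.5, "In the beginning was the Word"))
```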

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   YouTube URL   │    │   Local Files   │    │   File Upload   │
│                 │    │                 │    │                 │
└─────────┬───────┘    └─────────┬───────┘    └─────────┬───────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 │
                    ┌─────────────▼───────────────┐
                    │     FastAPI Service         │
                    │                             │
                    └─────────────┬───────────────┘
                                  │
                    ┌─────────────▼───────────────┐
                    │   Background Workers        │
                    │                             │
                    └─────────────┬───────────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   │                   │
    ┌─────────▼──────────┐ ┌─────▼──────┐ ┌─────────▼──────────┐
    │   YouTube          │ │   Audio    │ │   Google Cloud     │
    │   Downloader       │ │   Processor│ │   Speech-to-Text   │
    │   (yt-dlp)         │ │   (pydub)  │ │   v2 API           │
    └─────────┬──────────┘ └─────┬──────┘ └─────────┬──────────┘
              │                  │                  │
              └──────────────────┼──────────────────┘
                                 │
                    ┌─────────────▼───────────────┐
                    │   Phrase Manager            │
                    │   (Domain-specific terms)   │
                    └─────────────┬───────────────┘
                                  │
                    ┌─────────────▼───────────────┐
                    │   Subtitle Builder          │
                     │   (SRT/WebVTT)              │
                    └─────────────┬───────────────┘
                                  │
                    ┌─────────────▼───────────────┐
                    │   Storage Manager           │
                    │   (Local / Google Cloud)    │
                    └─────────────────────────────┘
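
The stages in the diagram can be pictured as a plain function pipeline. The stage signatures below are assumptions for illustration, not the project's real interfaces:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Sketch of the processing chain; each callable stands in for a service."""
    download: Callable[[str], str]          # source URL/path -> local media path
    extract_audio: Callable[[str], list]    # media path -> list of audio chunk paths
    transcribe: Callable[[str], str]        # chunk path -> transcript text
    build_subtitles: Callable[[str], str]   # transcript -> SRT text
    store: Callable[[str], str]             # SRT text -> storage URI

    def run(self, source: str) -> str:
        media = self.download(source)
        chunks = self.extract_audio(media)
        # Chunk transcripts are concatenated in order before subtitle building.
        transcript = "".join(self.transcribe(c) for c in chunks)
        srt = self.build_subtitles(transcript)
        return self.store(srt)
```

Swapping any stage (e.g. local storage for GCS) means swapping one callable, which mirrors the modular layout under app/services/.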

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • FFmpeg
  • Google Cloud credentials (for STT)
  • Docker (optional)
  • Redis (optional, for task queue)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd sermon-workflow
  2. Install dependencies

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure environment

    cp .env.template .env
    # Edit .env with your configuration
  4. Set up Google Cloud credentials

    • Create a service account in Google Cloud Console
    • Download the JSON key file
    • Set GOOGLE_APPLICATION_CREDENTIALS in .env
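
As a quick sanity check that the downloaded key file is usable, a stdlib-only validator might look like the sketch below. The required-field list is an assumption based on the standard service-account key format, not something this project defines:

```python
import json

# Fields expected in a standard Google service-account key file (assumed).
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def validate_key_info(info: dict) -> str:
    """Check that a parsed service-account key has the expected shape."""
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got {info['type']!r}")
    return info["project_id"]


def check_service_account(path: str) -> str:
    """Load and validate the file referenced by GOOGLE_APPLICATION_CREDENTIALS."""
    with open(path) as f:
        return validate_key_info(json.load(f))
```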

Running the Service

Local development:

uvicorn app.main:app --reload

Docker:

docker-compose up --build

Production:

uvicorn app.main:app --host 0.0.0.0 --port 8000

📋 API Usage

1. Create Transcription Job

From YouTube URL:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "title": "Sunday Sermon"
  }'

From local file:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_path": "/path/to/audio.mp3",
    "title": "Wednesday Service"
  }'

File upload:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe/upload" \
  -F "[email protected]" \
  -F "title=Sunday Sermon"

2. Check Job Status

curl "http://localhost:8000/api/v1/jobs/{job_id}"
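
Because transcription runs in the background, a client typically polls this endpoint until the job finishes. A hedged sketch follows; the terminal status names "completed" and "failed" are assumptions, not documented values from the service:

```python
import time
from typing import Callable


def wait_for_job(fetch: Callable[[str], dict], job_id: str,
                 timeout: float = 3600.0, interval: float = 5.0) -> dict:
    """Poll a job until it reaches a terminal state.

    `fetch` is any callable returning the job JSON, e.g. one wrapping
    requests.get(f"http://localhost:8000/api/v1/jobs/{job_id}").json().
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch(job_id)
        if job.get("status") in {"completed", "failed"}:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```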

3. List Jobs

curl "http://localhost:8000/api/v1/jobs/?limit=10&offset=0"

4. Phrase Management

Get all phrases:

curl "http://localhost:8000/api/v1/phrases/"

Get phrases by language:

curl "http://localhost:8000/api/v1/phrases/language/cmn-Hans-CN"

Add new phrase:

curl -X POST "http://localhost:8000/api/v1/phrases/" \
  -H "Content-Type: application/json" \
  -d '{
    "phrase": "恩典尔湾",
    "language": "chinese",
    "category": "church_names"
  }'

Search phrases:

curl -X POST "http://localhost:8000/api/v1/phrases/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "耶稣",
    "language": "chinese"
  }'

5. Health Check

curl "http://localhost:8000/health"

🔧 Batch Processing

Use the CLI tool for processing multiple files:

  1. Create CSV input file:

    source_type,source,title
    youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
    file,/path/to/audio1.mp3,Wednesday Service 1
    file,/path/to/audio2.mp3,Friday Prayer
  2. Run batch processing:

    python scripts/batch_transcribe.py input.csv
  3. Options:

    python scripts/batch_transcribe.py input.csv \
      --output results.csv \
      --concurrent-jobs 5 \
      --timeout 7200
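
The CSV input can be validated before submission with a few lines of stdlib Python; the accepted source_type values mirror the examples above:

```python
import csv
import io


def read_batch_input(text: str) -> list[dict]:
    """Parse the batch CSV format (source_type,source,title) shown above."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        if row["source_type"] not in {"youtube", "file"}:
            raise ValueError(f"unknown source_type: {row['source_type']!r}")
    return rows


sample = "source_type,source,title\nfile,/path/to/audio1.mp3,Wednesday Service 1\n"
print(read_batch_input(sample))
```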

🧪 Testing & Validation

Quick Tests

Test YouTube extraction:

python scripts/quick_youtube_test.py

Test STT conversion:

python scripts/quick_stt_test.py

Test chunking system:

python scripts/quick_chunking_test.py

Comprehensive Validation

Validate chunking system:

python scripts/validate_chunking_system.py audio_file.mp3

Test GCS STT support:

python scripts/test_gcs_stt.py

Diagnose Google STT issues:

python scripts/diagnose_google_stt.py

Test Suites

Run chunked extraction test:

python tests/test_chunked_extraction.py

Run comprehensive YouTube test:

python tests/test_youtube_extraction.py

Run single chunk STT test:

python tests/test_single_chunk_stt.py

⚙️ Configuration

Key configuration options in .env:

# Google Cloud
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
GCS_BUCKET_NAME=your-bucket-name

# Speech-to-Text
STT_LANGUAGE_CODE=cmn-Hans-CN
STT_MODEL=default
STT_COST_LIMIT_USD=10.0

# Storage
STORAGE_TYPE=local  # or 'gcs'
LOCAL_STORAGE_PATH=./data/processed
MAX_FILE_SIZE_MB=500

# API
API_HOST=0.0.0.0
API_PORT=8000
API_KEY=your-api-key

# Redis (optional)
REDIS_URL=redis://localhost:6379/0

# Development
DEBUG=true
LOG_LEVEL=INFO
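
For reference, the .env format above is plain KEY=value lines with # comments. A minimal parser makes the semantics concrete; the service itself presumably loads these through app/config.py, so this is purely illustrative (and it does not handle values that legitimately contain #):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value .env content, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Strip trailing inline comments such as "local  # or 'gcs'".
        env[key.strip()] = value.split("#", 1)[0].strip()
    return env
```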

📁 Project Structure

sermon-workflow/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── models.py               # Data models
│   ├── workers.py              # Background job processing
│   ├── routers/
│   │   ├── jobs.py             # Job management routes
│   │   └── phrases.py          # Phrase management routes
│   ├── services/
│   │   ├── ingest/
│   │   │   ├── downloader.py   # YouTube downloader
│   │   │   └── audio_extractor.py  # Audio processing & chunking
│   │   ├── stt/
│   │   │   └── google_stt.py   # Google Cloud STT v2 API
│   │   ├── subtitles/
│   │   │   └── builder.py      # Subtitle generation
│   │   ├── phrase_manager.py   # Phrase management service
│   │   └── storage.py          # Storage management
│   └── config/
│       └── phrases.json        # Domain-specific phrases
├── scripts/
│   ├── batch_transcribe.py     # Batch processing CLI
│   ├── validate_chunking_system.py  # Chunking validation
│   ├── test_gcs_stt.py         # GCS STT testing
│   ├── diagnose_google_stt.py  # STT diagnostics
│   └── quick_*.py              # Quick test scripts
├── tests/
│   ├── test_chunked_extraction.py  # Comprehensive chunking test
│   ├── test_youtube_extraction.py  # YouTube workflow test
│   ├── test_single_chunk_stt.py    # STT conversion test
│   └── test_*.py                   # Other test files
├── data/
│   ├── raw/                    # Raw audio files
│   └── processed/              # Processed outputs
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── .env.template

🔍 Testing

Run the development server:

uvicorn app.main:app --reload

Test with sample YouTube video:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Test Video"
  }'

Check job status:

curl "http://localhost:8000/api/v1/jobs/{job_id}"

Test phrase management:

curl "http://localhost:8000/api/v1/phrases/health"

📊 Monitoring

  • Health endpoint: GET /health
  • Statistics: GET /stats
  • Configuration: GET /config (debug mode only)
  • Structured logging: JSON format with configurable levels
  • Performance metrics: Processing time, cost estimation, file sizes
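
Cost estimation follows directly from audio duration and a per-minute rate, gated by STT_COST_LIMIT_USD. A sketch of that check; the default rate below is a placeholder, so consult current Google Cloud Speech-to-Text pricing before relying on it:

```python
def estimate_stt_cost(duration_seconds: float,
                      rate_per_minute_usd: float = 0.016,
                      limit_usd: float = 10.0) -> float:
    """Estimate STT cost for an audio file and enforce the configured limit.

    rate_per_minute_usd is an illustrative placeholder, not a quoted price.
    """
    cost = (duration_seconds / 60.0) * rate_per_minute_usd
    if cost > limit_usd:
        raise RuntimeError(
            f"estimated cost ${cost:.2f} exceeds limit ${limit_usd:.2f}")
    return cost
```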

🐳 Docker Deployment

Development

# Basic development setup
docker-compose up --build

# With Redis for task queue
docker-compose --profile redis up -d

# With admin interface
docker-compose --profile admin up -d

Production

# Production with Redis
docker-compose -f docker-compose.prod.yml --profile production up -d

# Production with monitoring stack
docker-compose -f docker-compose.prod.yml --profile production --profile monitoring up -d

Environment Setup

# Copy and configure environment file
cp .env.template .env
# Edit .env with your production settings

# For production, ensure service account is available
# The service-account.json file will be mounted into the container

🔐 Security

  • API key authentication (optional)
  • File upload validation and size limits
  • Resource limits (file size, processing time)
  • Cost limits for STT usage
  • Non-root container execution
  • CORS configuration for web clients

📈 Performance & Scaling

  • Concurrent processing: Background tasks with configurable limits
  • Intelligent chunking: Automatic audio splitting for optimal STT performance
  • File streaming: Efficient handling of large audio files
  • Storage optimization: Automatic cleanup and lifecycle management
  • Cost monitoring: Real-time STT cost estimation and limits
  • Batch operations: Support for long audio files via Google Cloud Storage
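
The chunking step reduces to computing time windows no longer than a maximum duration. A minimal sketch; the 480-second cap is illustrative, not the service's actual tuning:

```python
def chunk_boundaries(total_seconds: float,
                     max_chunk_seconds: float = 480.0) -> list[tuple[float, float]]:
    """Split a total duration into consecutive (start, end) chunk windows."""
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

Each window would then be cut from the source audio (e.g. with pydub slicing) and submitted to STT independently.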

🚧 Known Limitations

  1. Database: Currently uses in-memory storage (SQLite/PostgreSQL integration planned)
  2. Task queue: Simple background tasks (Redis/RQ integration available)
  3. Authentication: Basic API key auth (OAuth2 planned for production)
  4. Monitoring: Basic health checks (Prometheus metrics available)

🛣️ Roadmap

  • Phase 2: Video clipping and highlight extraction
  • Phase 3: Devotional content generation with LLM
  • Phase 4: Multi-platform content distribution
  • Database: PostgreSQL integration
  • Queue: Redis/RQ for robust job processing
  • Monitoring: Prometheus + Grafana dashboard
  • Multi-language: Support for additional languages
  • Advanced phrase adaptation: Dynamic phrase learning

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

  • Check the logs: docker-compose logs
  • Health check: curl http://localhost:8000/health
  • Documentation: http://localhost:8000/docs
  • Run diagnostics: python scripts/diagnose_google_stt.py

📋 Environment Variables Reference

Variable                         Description                         Default
GOOGLE_APPLICATION_CREDENTIALS   Path to GCP service account key     Required
GOOGLE_CLOUD_PROJECT             GCP project ID                      Required
GCS_BUCKET_NAME                  GCS bucket for file storage         Optional
STT_LANGUAGE_CODE                Speech-to-Text language             cmn-Hans-CN
STT_MODEL                        STT model type                      default
STT_COST_LIMIT_USD               Maximum STT cost per job            10.0
STORAGE_TYPE                     Storage backend (local or gcs)      local
LOCAL_STORAGE_PATH               Local storage directory             ./data/processed
MAX_FILE_SIZE_MB                 Maximum file size for processing    500
API_HOST                         API server host                     0.0.0.0
API_PORT                         API server port                     8000
API_KEY                          API authentication key              Optional
REDIS_URL                        Redis connection URL                redis://localhost:6379/0
DEBUG                            Enable debug mode                   true
LOG_LEVEL                        Logging level                       INFO

🎯 Key Features Summary

Latest Enhancements

  • Google Cloud Speech-to-Text v2 API: Full support for the latest API with improved accuracy
  • Intelligent Audio Chunking: Automatic splitting of large files to stay within Google STT limits
  • Phrase Management System: Domain-specific religious terms for improved transcription accuracy
  • Comprehensive Testing Suite: Validation tools and diagnostic scripts for troubleshooting
  • Batch Processing: CLI tool with concurrent job support for processing multiple files
  • Production-Ready: Structured logging, health checks, and monitoring endpoints

Technical Improvements

  • Audio Processing: Optimized for STT with automatic format conversion and quality preservation
  • Error Handling: Robust error handling with detailed logging and recovery mechanisms
  • Performance: Efficient processing pipeline with configurable concurrency limits
  • Scalability: Support for both local and cloud storage with automatic cleanup
  • Monitoring: Real-time cost tracking and performance metrics
  • Docker Optimization: Multi-stage builds, proper file mounting, and production-ready configurations
