Go Transcribe API

A serverless Go API that transcribes audio and video files with AWS Transcribe. It features S3 event-driven processing and API key authentication, and is designed for deployment on AWS Lambda.

Features

  • ✅ Pure Go implementation using only net/http (no external frameworks)
  • ✅ S3 event-driven transcription processing
  • ✅ API key authentication via X-API-Key header
  • ✅ Health check endpoint at /health
  • ✅ Automatic transcription on file upload to S3
  • ✅ Multiple audio/video format support:
    • Audio: MP3, WAV, FLAC, OGG, AMR
    • Video: MP4, WebM
  • ✅ Automatic output to separate S3 bucket
  • ✅ Asynchronous job processing with polling
  • ✅ Graceful shutdown support
  • ✅ Request logging middleware
  • ✅ AWS Lambda deployment ready with Serverless Framework
  • ✅ Dual deployment: Run locally as HTTP server or deploy to AWS Lambda

Project Structure

/
├── cmd/
│   ├── api/
│   │   └── main.go           # Local server entry point
│   └── lambda/
│       └── main.go           # AWS Lambda entry point (S3 + API Gateway)
├── internal/
│   ├── handlers/
│   │   ├── health.go         # Health check handler
│   │   ├── transcribe.go     # Transcription handler (audio & video)
│   │   ├── health_test.go    # Health handler tests
│   │   └── transcribe_test.go # Transcription tests
│   ├── middleware/
│   │   ├── auth.go           # Authentication middleware
│   │   └── auth_test.go      # Middleware tests
│   └── server/
│       ├── server.go         # Server setup and configuration
│       └── server_test.go    # Server tests
├── tests/
│   ├── e2e_test.go           # End-to-end integration tests
│   └── data/                 # Test data files (audio/video samples)
├── serverless.yml            # Serverless Framework configuration
├── Taskfile.yml              # Task runner configuration
├── .air.toml                 # Hot reload configuration
├── Dockerfile                # Docker configuration
├── .gitignore                # Git ignore file
├── go.mod                    # Go module file
└── README.md                 # This file

Architecture

S3 Event-Driven Flow

  1. Upload audio/video file to Input S3 Bucket
  2. S3 triggers Lambda function automatically
  3. Lambda starts AWS Transcribe job
  4. Lambda polls for job completion
  5. Transcription result saved to Output S3 Bucket
    • AWS Transcribe raw output saved to transcribe-{filename}-{timestamp}.json
    • Processed result saved to results/{filename}.json
    • Plain text transcript saved to transcripts/{filename}.txt
  6. AWS Transcribe output triggers webhook (if configured)
    • S3 event fires when transcribe-*.json file is created
    • Lambda downloads both AWS Transcribe output and processed result
    • Webhook receives complete payload with both JSONs and metadata
  7. Original transcript available in AWS Transcribe output location
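The naming scheme in steps 5 and 6 can be sketched as a few small helpers. This is an illustration only, not the project's code: the function names are hypothetical, and the job-name layout (transcribe-{filename}-{timestamp} with a Unix timestamp) is an assumption inferred from the webhook payload example in this README.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// baseName strips the directory and extension from an S3 object key,
// for example "uploads/my-audio.mp3" becomes "my-audio".
func baseName(key string) string {
	base := path.Base(key)
	return strings.TrimSuffix(base, path.Ext(base))
}

// jobName builds a timestamped job name. The layout is an assumption
// based on the README's examples, not confirmed project behavior.
func jobName(key string, unixSeconds int64) string {
	return fmt.Sprintf("transcribe-%s-%d", baseName(key), unixSeconds)
}

// outputKeys returns the output-bucket keys for the processed JSON
// result and the plain-text transcript.
func outputKeys(key string) (result, transcript string) {
	name := baseName(key)
	return "results/" + name + ".json", "transcripts/" + name + ".txt"
}

func main() {
	key := "my-audio.mp3"
	result, transcript := outputKeys(key)
	fmt.Println(jobName(key, 1234567890)) // transcribe-my-audio-1234567890
	fmt.Println(result)                   // results/my-audio.json
	fmt.Println(transcript)               // transcripts/my-audio.txt
}
```

Deriving every key from the same base name keeps the three output files for one upload easy to correlate.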

Components

  • Input Bucket: go-transcribe-api-{stage}-input
  • Output Bucket: go-transcribe-api-{stage}-output
  • Lambda Function: Handles both S3 events and HTTP API requests
  • AWS Transcribe: Performs actual audio/video transcription

Local Development Setup

Prerequisites

  • Go 1.21 or higher
  • Git
  • Task - Task runner (recommended)
  • AWS CLI configured with appropriate credentials
  • (Optional) Serverless Framework for deployment
  • (Optional) Docker for containerized deployment

Installation

  1. Clone the repository:
git clone https://github.com/nicobistolfi/go-transcribe-api.git
cd go-transcribe-api
  2. Install dependencies:
go mod download
# or using Task
task mod
  3. Create a .env file from the example:
cp .env.example .env
# Edit .env and set your API_KEY and bucket names
  4. Install Task runner (if not already installed):
# macOS
brew install go-task/tap/go-task

# Linux
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d

# Windows (using Scoop)
scoop install task

Quick Start with Task

View all available tasks:

task --list
# or simply
task

Common operations:

# Run the server locally
task run

# Run tests
task test

# Run tests with coverage
task test-coverage

# Build the binary
task build

# Format code
task fmt

# Start development server with hot reload
task dev

Environment Variables Configuration

The application uses the following environment variables:

| Variable | Description | Default | Required |
|---|---|---|---|
| API_KEY | API key for authentication | - | Yes |
| PORT | Port to run the server on | 8080 | No |
| AWS_REGION | AWS region for Transcribe service | - | Yes |
| AWS_ACCESS_KEY_ID | AWS access key ID | - | Yes (unless using IAM roles) |
| AWS_SECRET_ACCESS_KEY | AWS secret access key | - | Yes (unless using IAM roles) |
| INPUT_BUCKET | S3 bucket for input files | - | Yes (for Lambda) |
| OUTPUT_BUCKET | S3 bucket for output transcripts | - | Yes (for Lambda) |
| WEBHOOK_URL | Webhook URL to receive transcription results | - | No |
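
As a rough illustration of how these variables could be read, the sketch below loads them into a struct, applies the documented PORT default, and rejects a missing API_KEY or AWS_REGION. The Config type and loadConfig function are hypothetical names, not the project's actual code. The lookup function is passed in so the example runs without touching the real environment; a real server would pass os.Getenv.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the documented variables (hypothetical type).
type Config struct {
	APIKey       string
	Port         string
	AWSRegion    string
	InputBucket  string
	OutputBucket string
	WebhookURL   string
}

// loadConfig reads settings through getenv, fills in the PORT default,
// and fails when a variable documented as required is missing.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		APIKey:       getenv("API_KEY"),
		Port:         getenv("PORT"),
		AWSRegion:    getenv("AWS_REGION"),
		InputBucket:  getenv("INPUT_BUCKET"),
		OutputBucket: getenv("OUTPUT_BUCKET"),
		WebhookURL:   getenv("WEBHOOK_URL"),
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // documented default
	}
	if cfg.APIKey == "" {
		return Config{}, errors.New("API_KEY is required")
	}
	if cfg.AWSRegion == "" {
		return Config{}, errors.New("AWS_REGION is required")
	}
	return cfg, nil
}

func main() {
	// A map stands in for the process environment in this example.
	env := map[string]string{"API_KEY": "dev-key", "AWS_REGION": "us-west-1"}
	cfg, err := loadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("port:", cfg.Port) // port: 8080
}
```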

Running the Server Locally

Using Task (Recommended)

# Run with default dev API key
task run

# Run with custom API key
API_KEY="your-secret-api-key" task run

# Run on custom port
PORT=3000 task run

Using Go directly

  1. Set the required environment variables:
export API_KEY="your-secret-api-key"
export PORT="8080"  # Optional, defaults to 8080
export INPUT_BUCKET="go-transcribe-api-dev-input"
export OUTPUT_BUCKET="go-transcribe-api-dev-output"
  2. Run the server:
go run cmd/api/main.go

Using Docker

# Build and run in Docker
API_KEY="your-secret-api-key" task docker

The server will start on http://localhost:8080 (or the port specified).

Testing the API

Health check (no authentication required):

curl http://localhost:8080/health
# or pretty-print the response with jq
curl http://localhost:8080/health | jq .

Expected response:

{"status":"ok"}

Check transcription status (with authentication):

curl -H "X-API-Key: your-secret-api-key" http://localhost:8080/transcribe/status

Expected response:

{"message":"Transcription is triggered automatically via S3 uploads to the input bucket"}

Running Tests

Using Task (Recommended)

# Run all tests
task test

# Run tests with coverage
task test-coverage

# Run end-to-end tests (requires AWS credentials)
task test:e2e

# Run linter
task lint

# Clean build artifacts
task clean

Using Go directly

Run all tests with coverage:

go test -v -cover ./...

Run tests for a specific package:

go test -v ./internal/handlers
go test -v ./internal/middleware
go test -v ./internal/server

Run end-to-end tests (requires AWS credentials):

# Set up AWS credentials first
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-west-1"
export INPUT_BUCKET="go-transcribe-api-dev-input"
export OUTPUT_BUCKET="go-transcribe-api-dev-output"

# Run e2e tests
go test -v ./tests

Generate coverage report:

go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html

Serverless Deployment

Prerequisites

  1. Install Serverless Framework:
npm install -g serverless
  2. Install dependencies:
npm install
  3. Configure AWS credentials:
aws configure

Deployment Steps

Using Task (Recommended)

Make sure you have a .env file with your API_KEY set, or pass it explicitly:

# Deploy using .env file
task deploy

# Or deploy with explicit API_KEY
API_KEY="your-api-key" task deploy

# Deploy to specific stage
STAGE=production task deploy

# View logs
task logs

Using Serverless directly

  1. Set your API key as an environment variable:
export API_KEY="your-production-api-key"
  2. Deploy to AWS:
serverless deploy --stage production --region us-west-1
  3. Deploy to a specific stage:
serverless deploy --stage dev
serverless deploy --stage staging
serverless deploy --stage production

Post-Deployment

After deployment, the Serverless Framework will output:

  • API Gateway URL
  • Lambda function names
  • S3 bucket names

Note: The input bucket is automatically created by the S3 event configuration. You can upload files directly to the input bucket to trigger transcription.

Viewing Logs

View function logs:

serverless logs -f transcribe --tail
# or using Task
task logs

Removing the Deployment

Remove the deployed service:

serverless remove --stage production
# or set the stage through the STAGE environment variable
# (assumes serverless.yml reads STAGE)
STAGE=production serverless remove

Usage

Transcribing Audio/Video Files

  1. Upload file to Input S3 Bucket:
aws s3 cp my-audio.mp3 s3://go-transcribe-api-dev-input/
  2. Lambda automatically processes the file:

    • Detects file format
    • Starts AWS Transcribe job
    • Polls for completion
    • Saves result to output bucket
  3. Retrieve transcription results:

# Download full JSON result
aws s3 cp s3://go-transcribe-api-dev-output/results/my-audio.json ./

# Download plain text transcript
aws s3 cp s3://go-transcribe-api-dev-output/transcripts/my-audio.txt ./
  4. Webhook Notification (optional):

    • If WEBHOOK_URL is configured, webhook is triggered when AWS Transcribe completes
    • S3 event fires when transcribe-*.json file is created in output bucket
    • Lambda downloads both AWS Transcribe raw output and processed result
    • Authentication: Includes X-Api-Key header with the API_KEY value
    • Timeout: 30 seconds
    • Content-Type: application/json
    • Payload includes:
      • Event metadata (bucket, key, timestamp, region)
      • Complete AWS Transcribe JSON output
      • Processed result JSON with extracted transcript
  5. Webhook Request Headers:

Content-Type: application/json
User-Agent: go-transcribe-api/1.0
X-Api-Key: <your-api-key>
  6. Webhook Payload Example:
{
  "event": {
    "bucket": "go-transcribe-api-dev-output",
    "key": "transcribe-my-audio-1234567890.json",
    "timestamp": "2024-01-15T10:30:00Z",
    "region": "us-west-1"
  },
  "transcribe_output": {
    "jobName": "transcribe-my-audio-1234567890",
    "accountId": "123456789",
    "results": {
      "transcripts": [{
        "transcript": "Full transcript text from AWS Transcribe..."
      }],
      "items": [...]
    },
    "status": "COMPLETED"
  },
  "processed_result": {
    "filename": "my-audio.mp3",
    "job_name": "transcribe-my-audio-1234567890",
    "status": "COMPLETED",
    "transcript": "Full transcript text...",
    "language_code": "en-US",
    "success": true
  }
}
  7. Processed result format:
{
  "filename": "my-audio.mp3",
  "job_name": "transcribe-my-audio-1234567890",
  "status": "COMPLETED",
  "transcript": "This is the full transcript of the audio...",
  "language_code": "en-US",
  "output_file_url": "https://...",
  "success": true
}
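
A webhook receiver might decode the payload along these lines. This is a minimal sketch under stated assumptions: the WebhookPayload type and decodeWebhook function are hypothetical, the field names are taken from the payload example in this section, and the raw AWS Transcribe output is kept as json.RawMessage because its full schema is not described here. A real receiver would also compare the X-Api-Key request header against its expected key before trusting the body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// samplePayload is a trimmed copy of the payload example in this README.
const samplePayload = `{
  "event": {"bucket": "go-transcribe-api-dev-output", "key": "transcribe-my-audio-1234567890.json",
            "timestamp": "2024-01-15T10:30:00Z", "region": "us-west-1"},
  "transcribe_output": {"jobName": "transcribe-my-audio-1234567890", "status": "COMPLETED"},
  "processed_result": {"filename": "my-audio.mp3", "job_name": "transcribe-my-audio-1234567890",
                       "status": "COMPLETED", "transcript": "Full transcript text...",
                       "language_code": "en-US", "success": true}
}`

// WebhookPayload models the three top-level sections of the payload.
type WebhookPayload struct {
	Event struct {
		Bucket    string `json:"bucket"`
		Key       string `json:"key"`
		Timestamp string `json:"timestamp"`
		Region    string `json:"region"`
	} `json:"event"`
	// Left undecoded so the receiver can store or forward it as-is.
	TranscribeOutput json.RawMessage `json:"transcribe_output"`
	ProcessedResult  struct {
		Filename     string `json:"filename"`
		JobName      string `json:"job_name"`
		Status       string `json:"status"`
		Transcript   string `json:"transcript"`
		LanguageCode string `json:"language_code"`
		Success      bool   `json:"success"`
	} `json:"processed_result"`
}

// decodeWebhook parses a webhook request body into a WebhookPayload.
func decodeWebhook(body []byte) (WebhookPayload, error) {
	var p WebhookPayload
	err := json.Unmarshal(body, &p)
	return p, err
}

func main() {
	p, err := decodeWebhook([]byte(samplePayload))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(p.Event.Key, p.ProcessedResult.Status, p.ProcessedResult.Success)
}
```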

Supported File Formats

The API automatically detects file formats based on file extensions:

  • Audio Files:

    • MP3 (.mp3)
    • WAV (.wav)
    • FLAC (.flac)
    • OGG (.ogg)
    • AMR (.amr)
  • Video Files:

    • MP4 (.mp4)
    • WebM (.webm)
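
Extension-based detection could look roughly like the sketch below. The mediaFormat function is a hypothetical name, and matching extensions case-insensitively is an assumption about the project's behavior. The returned strings are the lowercase media format names that AWS Transcribe accepts for these file types.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mediaFormat maps an S3 object key to a media format name based on its
// file extension. The second return value is false for unsupported files.
func mediaFormat(key string) (string, bool) {
	ext := strings.ToLower(strings.TrimPrefix(path.Ext(key), "."))
	switch ext {
	case "mp3", "wav", "flac", "ogg", "amr", "mp4", "webm":
		return ext, true
	default:
		return "", false
	}
}

func main() {
	for _, key := range []string{"uploads/Talk.MP3", "clip.webm", "notes.txt"} {
		format, ok := mediaFormat(key)
		fmt.Printf("%s -> %q supported=%t\n", key, format, ok)
	}
}
```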

API Documentation

Endpoints

GET /health

Health check endpoint that returns the service status.

Authentication: Not required

Response:

  • Status: 200 OK
  • Body: {"status": "ok"}

GET /transcribe/status

Informational endpoint that describes how transcription is triggered.

Authentication: Required (X-API-Key header)

Response:

  • Status: 200 OK
  • Body: {"message": "Transcription is triggered automatically via S3 uploads to the input bucket"}

Authentication

All endpoints (except /health) require API key authentication via the X-API-Key header.

Example:

curl -H "X-API-Key: your-api-key" https://your-api-url.com/transcribe/status

Error Responses:

  • 401 Unauthorized - Missing or invalid API key
    • {"error": "Missing API key"}
    • {"error": "Invalid API key"}
    • {"error": "API key not configured"}
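
To make the three error cases concrete, here is one way such a middleware could be written with only the standard library. It is a sketch, not the project's middleware: authMiddleware takes the expected key as a parameter (the real AuthMiddleware presumably reads API_KEY itself), the order in which the checks run is an assumption, and statusFor is a hypothetical helper used only to exercise the code.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeUnauthorized sends a 401 response with a JSON error body.
func writeUnauthorized(w http.ResponseWriter, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusUnauthorized)
	json.NewEncoder(w).Encode(map[string]string{"error": msg})
}

// authMiddleware wraps next and only calls it when the X-API-Key header
// matches expected. The comparison is constant-time.
func authMiddleware(expected string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		switch {
		case expected == "":
			writeUnauthorized(w, "API key not configured")
		case got == "":
			writeUnauthorized(w, "Missing API key")
		case subtle.ConstantTimeCompare([]byte(got), []byte(expected)) != 1:
			writeUnauthorized(w, "Invalid API key")
		default:
			next(w, r)
		}
	}
}

// statusFor sends one request through the middleware and returns the
// response status code and body.
func statusFor(expected, header string) (int, string) {
	h := authMiddleware(expected, func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/transcribe/status", nil)
	if header != "" {
		req.Header.Set("X-API-Key", header)
	}
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	for _, header := range []string{"secret", "wrong", ""} {
		code, body := statusFor("secret", header)
		fmt.Printf("header=%q -> %d %s\n", header, code, body)
	}
}
```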

Development Guidelines

Development Tools

Install development dependencies:

go install github.com/air-verse/air@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest

This installs:

  • air - Hot reload for development
  • golangci-lint - Linting tool

Start development server with hot reload:

task dev

Run code checks:

# Format code
task fmt

# Run linter (if installed)
task lint

# Run default task (format, test, build)
task default

Clean build artifacts:

task clean

Transcription Processing

Processing Flow

  1. File Upload: User uploads audio/video to input S3 bucket
  2. Event Trigger: S3 triggers Lambda with event details
  3. Job Start: Lambda starts AWS Transcribe job with appropriate media format
  4. Polling: Lambda polls job status every 5 seconds (max 10 minutes)
  5. Completion: When completed, downloads transcript from AWS output
  6. Storage: Saves formatted result to output S3 bucket

Features

  • Automatic Format Detection: Determines media format from file extension
  • Asynchronous Processing: Uses AWS Transcribe's async API
  • Polling with Timeout: Polls every 5 seconds, max 120 attempts
  • Error Handling: Individual file processing errors logged and stored
  • Unique Job Names: Timestamp-based unique identifiers prevent conflicts

AWS Transcribe Configuration

The implementation uses:

  • Language Code: English (en-US) by default
  • Media Format: Auto-detected from extension
  • Output Location: Controlled by OUTPUT_BUCKET environment variable
  • Job Timeout: 10 minutes maximum polling duration

Adding New Endpoints

  1. Create a new handler in internal/handlers/
  2. Add authentication by wrapping with middleware.AuthMiddleware()
  3. Register the route in internal/server/server.go
  4. Write comprehensive tests

Example:

// In internal/server/server.go
mux.HandleFunc("/api/jobs", middleware.AuthMiddleware(handlers.JobsHandler))

Code Style

  • Follow standard Go conventions
  • Use gofmt for formatting
  • Keep functions small and focused
  • Write tests for all new functionality
  • Use meaningful variable and function names

Troubleshooting

Common Issues

  1. Server fails to start

    • Check if the port is already in use
    • Ensure all environment variables are set correctly
  2. Authentication failures

    • Verify the API_KEY environment variable is set
    • Check that the X-API-Key header matches exactly
  3. Transcription not triggering

    • Verify file is uploaded to correct S3 bucket
    • Check Lambda CloudWatch logs for errors
    • Ensure file extension is supported
    • Verify IAM permissions for S3 and Transcribe
  4. Deployment issues

    • Ensure AWS credentials are configured
    • Check Serverless Framework version compatibility
    • Verify the Go version matches the Lambda runtime
    • Ensure the binary is built for Linux (GOOS=linux GOARCH=amd64)
  5. Transcription jobs failing

    • Check file format is supported by AWS Transcribe
    • Verify IAM role has transcribe:StartTranscriptionJob permission
    • Check CloudWatch logs for detailed error messages
    • Ensure output bucket has write permissions

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with AWS Transcribe for high-quality speech-to-text
  • Uses AWS Lambda for serverless scalability
  • Inspired by modern serverless architectures
