A serverless Go API for transcribing audio and video files with AWS Transcribe. It features S3 event-driven processing and API key authentication, and is designed for deployment on AWS Lambda.
- ✅ Pure Go implementation using only `net/http` (no external frameworks)
- ✅ S3 event-driven transcription processing
- ✅ API key authentication via `X-API-Key` header
- ✅ Health check endpoint at `/health`
- ✅ Automatic transcription on file upload to S3
- ✅ Multiple audio/video format support:
- Audio: MP3, WAV, FLAC, OGG, AMR
- Video: MP4, WebM
- ✅ Automatic output to separate S3 bucket
- ✅ Asynchronous job processing with polling
- ✅ Graceful shutdown support
- ✅ Request logging middleware
- ✅ AWS Lambda deployment ready with Serverless Framework
- ✅ Dual deployment: Run locally as HTTP server or deploy to AWS Lambda
/
├── cmd/
│   ├── api/
│   │   └── main.go             # Local server entry point
│   └── lambda/
│       └── main.go             # AWS Lambda entry point (S3 + API Gateway)
├── internal/
│   ├── handlers/
│   │   ├── health.go           # Health check handler
│   │   ├── transcribe.go       # Transcription handler (audio & video)
│   │   ├── health_test.go      # Health handler tests
│   │   └── transcribe_test.go  # Transcription tests
│   ├── middleware/
│   │   ├── auth.go             # Authentication middleware
│   │   └── auth_test.go        # Middleware tests
│   └── server/
│       ├── server.go           # Server setup and configuration
│       └── server_test.go      # Server tests
├── tests/
│   ├── e2e_test.go             # End-to-end integration tests
│   └── data/                   # Test data files (audio/video samples)
├── serverless.yml              # Serverless Framework configuration
├── Taskfile.yml                # Task runner configuration
├── .air.toml                   # Hot reload configuration
├── Dockerfile                  # Docker configuration
├── .gitignore                  # Git ignore file
├── go.mod                      # Go module file
└── README.md                   # This file
- Upload audio/video file to Input S3 Bucket
- S3 triggers Lambda function automatically
- Lambda starts AWS Transcribe job
- Lambda polls for job completion
- Transcription result saved to Output S3 Bucket
  - AWS Transcribe raw output saved to `transcribe-{jobname}-{timestamp}.json`
  - Processed result saved to `results/{filename}.json`
  - Plain text transcript saved to `transcripts/{filename}.txt`
- AWS Transcribe output triggers webhook (if configured)
  - S3 event fires when `transcribe-*.json` file is created
  - Lambda downloads both AWS Transcribe output and processed result
  - Webhook receives complete payload with both JSONs and metadata
- Original transcript available in AWS Transcribe output location
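
The webhook step in the workflow above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `webhookPayload`, `newWebhookRequest`, and `sendWebhook` are hypothetical names. The three headers and the 30-second timeout follow the webhook notes in the usage section of this README; treating any non-2xx response as an error is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// webhookPayload mirrors the three top-level keys of the documented payload.
// json.RawMessage passes the two JSON documents through without re-parsing.
type webhookPayload struct {
	Event            map[string]string `json:"event"`
	TranscribeOutput json.RawMessage   `json:"transcribe_output"`
	ProcessedResult  json.RawMessage   `json:"processed_result"`
}

// newWebhookRequest (hypothetical helper) builds the POST request and sets
// the documented Content-Type, User-Agent, and X-Api-Key headers.
func newWebhookRequest(url, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("User-Agent", "go-transcribe-api/1.0")
	req.Header.Set("X-Api-Key", apiKey)
	return req, nil
}

// sendWebhook serializes the payload and posts it with a 30-second timeout.
// Assumption: any status outside the 2xx range counts as a failed delivery.
func sendWebhook(url, apiKey string, p webhookPayload) error {
	body, err := json.Marshal(p)
	if err != nil {
		return err
	}
	req, err := newWebhookRequest(url, apiKey, body)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Build a request without sending it, to inspect the headers.
	req, err := newWebhookRequest("https://example.com/hook", "your-api-key", []byte(`{}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.Header.Get("User-Agent"))
}
```

A receiver can authenticate the call by comparing the `X-Api-Key` header with the key it shares with the deployment.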
- Input Bucket: `go-transcribe-api-{stage}-input`
- Output Bucket: `go-transcribe-api-{stage}-output`
- Lambda Function: Handles both S3 events and HTTP API requests
- AWS Transcribe: Performs actual audio/video transcription
- Go 1.21 or higher
- Git
- Task - Task runner (recommended)
- AWS CLI configured with appropriate credentials
- (Optional) Serverless Framework for deployment
- (Optional) Docker for containerized deployment
- Clone the repository:
git clone https://github.com/nicobistolfi/go-transcribe-api.git
cd go-transcribe-api
- Install dependencies:
go mod download
# or using Task
task mod
- Create a `.env` file from the example:
cp .env.example .env
# Edit .env and set your API_KEY and bucket names
- Install Task runner (if not already installed):
# macOS
brew install go-task/tap/go-task
# Linux
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d
# Windows (using Scoop)
scoop install task
View all available tasks:
task --list
# or simply
task
Common operations:
# Run the server locally
task run
# Run tests
task test
# Run tests with coverage
task test-coverage
# Build the binary
task build
# Format code
task fmt
# Start development server with hot reload
task dev
The application uses the following environment variables:
| Variable | Description | Default | Required |
|---|---|---|---|
| `API_KEY` | API key for authentication | - | Yes |
| `PORT` | Port to run the server on | `8080` | No |
| `AWS_REGION` | AWS region for Transcribe service | - | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key ID | - | Yes (unless using IAM roles) |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key | - | Yes (unless using IAM roles) |
| `INPUT_BUCKET` | S3 bucket for input files | - | Yes (for Lambda) |
| `OUTPUT_BUCKET` | S3 bucket for output transcripts | - | Yes (for Lambda) |
| `WEBHOOK_URL` | Webhook URL to receive transcription results | - | No |
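
As an illustration of how these variables might be read at startup, here is a minimal sketch. The `Config` type and `loadConfig` function are hypothetical and are not taken from the project; the default port and the required variables follow the table above. The lookup function is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the settings listed in the environment variable table.
type Config struct {
	APIKey       string
	Port         string
	Region       string
	InputBucket  string
	OutputBucket string
	WebhookURL   string
}

// loadConfig reads settings through getenv (os.Getenv in production),
// applies the documented default for PORT, and rejects missing values
// for the variables marked as always required.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		APIKey:       getenv("API_KEY"),
		Port:         getenv("PORT"),
		Region:       getenv("AWS_REGION"),
		InputBucket:  getenv("INPUT_BUCKET"),
		OutputBucket: getenv("OUTPUT_BUCKET"),
		WebhookURL:   getenv("WEBHOOK_URL"), // optional
	}
	if cfg.Port == "" {
		cfg.Port = "8080"
	}
	if cfg.APIKey == "" {
		return Config{}, errors.New("API_KEY is required")
	}
	if cfg.Region == "" {
		return Config{}, errors.New("AWS_REGION is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("server would listen on port", cfg.Port)
}
```

The AWS credential variables are left out on purpose: the AWS SDK resolves them itself, including from IAM roles.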
# Run with default dev API key
task run
# Run with custom API key
API_KEY="your-secret-api-key" task run
# Run on custom port
PORT=3000 task run
- Set the required environment variables:
export API_KEY="your-secret-api-key"
export PORT="8080" # Optional, defaults to 8080
export INPUT_BUCKET="go-transcribe-api-dev-input"
export OUTPUT_BUCKET="go-transcribe-api-dev-output"
- Run the server:
go run cmd/api/main.go
# Build and run in Docker
API_KEY="your-secret-api-key" task docker
The server will start on http://localhost:8080 (or the port specified).
Health check (no authentication required):
curl http://localhost:8080/health
# or pretty-print the response with jq
curl http://localhost:8080/health | jq .
Expected response:
{"status":"ok"}
Check transcription status (with authentication):
curl -H "X-API-Key: your-secret-api-key" http://localhost:8080/transcribe/status
Expected response:
{"message":"Transcription is triggered automatically via S3 uploads to the input bucket"}
# Run all tests
task test
# Run tests with coverage
task test-coverage
# Run end-to-end tests (requires AWS credentials)
task test:e2e
# Run linter
task lint
# Clean build artifacts
task clean
Run all tests with coverage:
go test -v -cover ./...
Run tests for a specific package:
go test -v ./internal/handlers
go test -v ./internal/middleware
go test -v ./internal/server
Run end-to-end tests (requires AWS credentials):
# Set up AWS credentials first
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-west-1"
export INPUT_BUCKET="go-transcribe-api-dev-input"
export OUTPUT_BUCKET="go-transcribe-api-dev-output"
# Run e2e tests
go test -v ./tests
Generate coverage report:
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
- Install Serverless Framework:
npm install -g serverless
- Install dependencies:
npm install
- Configure AWS credentials:
aws configure
Make sure you have a `.env` file with your `API_KEY` set, or pass it explicitly:
# Deploy using .env file
task deploy
# Or deploy with explicit API_KEY
API_KEY="your-api-key" task deploy
# Deploy to specific stage
STAGE=production task deploy
# View logs
task logs
- Set your API key as an environment variable:
export API_KEY="your-production-api-key"
- Deploy to AWS:
serverless deploy --stage production --region us-west-1
- Deploy to a specific stage:
serverless deploy --stage dev
serverless deploy --stage staging
serverless deploy --stage production
After deployment, the Serverless Framework will output:
- API Gateway URL
- Lambda function names
- S3 bucket names
Note: The input bucket is automatically created by the S3 event configuration. You can upload files directly to the input bucket to trigger transcription.
View function logs:
serverless logs -f transcribe --tail
# or using Task
task logs
Remove the deployed service:
serverless remove --stage production
# or using Task
STAGE=production serverless remove
- Upload file to Input S3 Bucket:
aws s3 cp my-audio.mp3 s3://go-transcribe-api-dev-input/
- Lambda automatically processes the file:
  - Detects file format
  - Starts AWS Transcribe job
  - Polls for completion
  - Saves result to output bucket
- Retrieve transcription results:
# Download full JSON result
aws s3 cp s3://go-transcribe-api-dev-output/results/my-audio.json ./
# Download plain text transcript
aws s3 cp s3://go-transcribe-api-dev-output/transcripts/my-audio.txt ./
- Webhook Notification (optional):
  - If `WEBHOOK_URL` is configured, the webhook is triggered when AWS Transcribe completes
  - S3 event fires when `transcribe-*.json` file is created in output bucket
  - Lambda downloads both AWS Transcribe raw output and processed result
  - Authentication: Includes `X-Api-Key` header with the `API_KEY` value
  - Timeout: 30 seconds
  - Content-Type: application/json
  - Payload includes:
    - Event metadata (bucket, key, timestamp, region)
    - Complete AWS Transcribe JSON output
    - Processed result JSON with extracted transcript
- Webhook Request Headers:
Content-Type: application/json
User-Agent: go-transcribe-api/1.0
X-Api-Key: <your-api-key>
- Webhook Payload Example:
{
"event": {
"bucket": "go-transcribe-api-dev-output",
"key": "transcribe-my-audio-1234567890.json",
"timestamp": "2024-01-15T10:30:00Z",
"region": "us-west-1"
},
"transcribe_output": {
"jobName": "transcribe-my-audio-1234567890",
"accountId": "123456789",
"results": {
"transcripts": [{
"transcript": "Full transcript text from AWS Transcribe..."
}],
"items": [...]
},
"status": "COMPLETED"
},
"processed_result": {
"filename": "my-audio.mp3",
"job_name": "transcribe-my-audio-1234567890",
"status": "COMPLETED",
"transcript": "Full transcript text...",
"language_code": "en-US",
"success": true
}
}
- Processed result format:
{
"filename": "my-audio.mp3",
"job_name": "transcribe-my-audio-1234567890",
"status": "COMPLETED",
"transcript": "This is the full transcript of the audio...",
"language_code": "en-US",
"output_file_url": "https://...",
"success": true
}
The API automatically detects file formats based on file extensions:
- Audio Files:
  - MP3 (`.mp3`)
  - WAV (`.wav`)
  - FLAC (`.flac`)
  - OGG (`.ogg`)
  - AMR (`.amr`)
- Video Files:
  - MP4 (`.mp4`)
  - WebM (`.webm`)
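
Extension-based detection of this kind can be sketched with a simple lookup table. The `detectMediaFormat` function and `supportedFormats` map below are hypothetical names, not the project's own; the returned strings are the lowercase media format values that AWS Transcribe accepts. Matching extensions case-insensitively is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// supportedFormats maps a file extension to the media format value
// passed to AWS Transcribe when starting a job.
var supportedFormats = map[string]string{
	".mp3":  "mp3",
	".wav":  "wav",
	".flac": "flac",
	".ogg":  "ogg",
	".amr":  "amr",
	".mp4":  "mp4",
	".webm": "webm",
}

// detectMediaFormat returns the media format for an S3 object key and
// reports whether the extension is supported. path.Ext is used because
// S3 keys are slash-separated on every platform.
func detectMediaFormat(key string) (string, bool) {
	ext := strings.ToLower(path.Ext(key))
	format, ok := supportedFormats[ext]
	return format, ok
}

func main() {
	for _, key := range []string{"my-audio.mp3", "uploads/Talk.WEBM", "notes.txt"} {
		format, ok := detectMediaFormat(key)
		fmt.Printf("%s: format=%q supported=%t\n", key, format, ok)
	}
}
```

Returning a boolean alongside the format lets the caller skip unsupported uploads instead of starting a job that would fail.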
Health check endpoint that returns the service status.
Authentication: Not required
Response:
- Status: `200 OK`
- Body: `{"status": "ok"}`
Information endpoint about transcription triggering.
Authentication: Required (X-API-Key header)
Response:
- Status: `200 OK`
- Body: `{"message": "Transcription is triggered automatically via S3 uploads to the input bucket"}`
All endpoints (except /health) require API key authentication via the X-API-Key header.
Example:
curl -H "X-API-Key: your-api-key" https://your-api-url.com/transcribe/status
Error Responses:
- `401 Unauthorized` - Missing or invalid API key
  - `{"error": "Missing API key"}`
  - `{"error": "Invalid API key"}`
  - `{"error": "API key not configured"}`
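
A middleware producing these three responses might look like the sketch below. It is not the project's `middleware.AuthMiddleware`: the names `checkAPIKey` and `authMiddleware`, the explicit `apiKey` parameter, the order of the checks, and the constant-time comparison are all assumptions made for illustration.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkAPIKey returns an empty string when the request may proceed,
// otherwise one of the three documented error messages.
func checkAPIKey(configured, provided string) string {
	switch {
	case configured == "":
		return "API key not configured"
	case provided == "":
		return "Missing API key"
	case subtle.ConstantTimeCompare([]byte(configured), []byte(provided)) != 1:
		return "Invalid API key"
	}
	return ""
}

// authMiddleware wraps next and answers 401 with a JSON error body
// unless the X-API-Key header matches apiKey.
func authMiddleware(apiKey string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if msg := checkAPIKey(apiKey, r.Header.Get("X-API-Key")); msg != "" {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusUnauthorized)
			json.NewEncoder(w).Encode(map[string]string{"error": msg})
			return
		}
		next(w, r)
	}
}

func main() {
	protected := authMiddleware("your-api-key", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"message":"ok"}`)
	})

	// Exercise the handler in-process with a request that has no key.
	req := httptest.NewRequest(http.MethodGet, "/transcribe/status", nil)
	rec := httptest.NewRecorder()
	protected(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the decision in a pure function makes each error case easy to unit test without constructing HTTP requests.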
Install development dependencies:
go install github.com/cosmtrek/air@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
This installs:
- `air` - Hot reload for development
- `golangci-lint` - Linting tool
Start development server with hot reload:
task dev
Run code checks:
# Format code
task fmt
# Run linter (if installed)
task lint
# Run default task (format, test, build)
task default
Clean build artifacts:
task clean
- File Upload: User uploads audio/video to input S3 bucket
- Event Trigger: S3 triggers Lambda with event details
- Job Start: Lambda starts AWS Transcribe job with appropriate media format
- Polling: Lambda polls job status every 5 seconds (max 10 minutes)
- Completion: When completed, downloads transcript from AWS output
- Storage: Saves formatted result to output S3 bucket
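
The polling step can be sketched as a bounded loop: 120 attempts at 5-second intervals gives the 10-minute ceiling mentioned above. The `pollUntilDone` function and `errPollTimeout` value are hypothetical names; the status check is injected so the loop can be run without AWS. `COMPLETED` and `FAILED` are the terminal job statuses reported by AWS Transcribe.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPollTimeout is returned when the job is still running after the
// last allowed attempt.
var errPollTimeout = errors.New("transcription job did not finish within the polling window")

// pollUntilDone calls check up to maxAttempts times, sleeping interval
// between attempts, and stops early on a terminal status or an error.
func pollUntilDone(check func() (string, error), interval time.Duration, maxAttempts int) (string, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := check()
		if err != nil {
			return "", err
		}
		switch status {
		case "COMPLETED":
			return status, nil
		case "FAILED":
			return status, errors.New("transcription job failed")
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return "", errPollTimeout
}

func main() {
	// Fake status source: in progress twice, then completed.
	calls := 0
	fake := func() (string, error) {
		calls++
		if calls < 3 {
			return "IN_PROGRESS", nil
		}
		return "COMPLETED", nil
	}

	// The documented settings would be 5*time.Second and 120 attempts;
	// a short interval keeps this demonstration fast.
	status, err := pollUntilDone(fake, 10*time.Millisecond, 120)
	fmt.Println(status, err, calls)
}
```

In a Lambda deployment the function timeout has to be at least as long as the polling window, otherwise the invocation ends before the loop does.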
- Automatic Format Detection: Determines media format from file extension
- Asynchronous Processing: Uses AWS Transcribe's async API
- Polling with Timeout: Polls every 5 seconds, max 120 attempts
- Error Handling: Individual file processing errors logged and stored
- Unique Job Names: Timestamp-based unique identifiers prevent conflicts
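
A timestamp-based job name in the `transcribe-{jobname}-{timestamp}` shape used elsewhere in this README could be derived as in the sketch below. The `jobName` function is hypothetical, and the exact rule the project uses is not documented here. Replacing characters outside `0-9a-zA-Z._-` is an assumption, made because AWS Transcribe restricts job names to that character set.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
	"time"
)

// invalidJobChars matches runs of characters that are not allowed in an
// AWS Transcribe job name.
var invalidJobChars = regexp.MustCompile(`[^0-9a-zA-Z._-]+`)

// jobName builds "transcribe-{name}-{timestamp}" from an S3 object key:
// the directory part and the extension are dropped, and disallowed
// characters are replaced with "-".
func jobName(key string, unix int64) string {
	base := path.Base(key)
	base = strings.TrimSuffix(base, path.Ext(base))
	base = invalidJobChars.ReplaceAllString(base, "-")
	return fmt.Sprintf("transcribe-%s-%d", base, unix)
}

func main() {
	fmt.Println(jobName("uploads/my-audio.mp3", time.Now().Unix()))
}
```

With second-resolution timestamps, two uploads of the same file name within the same second would produce the same job name, so a finer-grained timestamp or a random suffix may be worth considering.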
The implementation uses:
- Language Code: English (en-US) by default
- Media Format: Auto-detected from extension
- Output Location: Controlled by `OUTPUT_BUCKET` environment variable
- Job Timeout: 10 minutes maximum polling duration
- Create a new handler in `internal/handlers/`
- Add authentication by wrapping with `middleware.AuthMiddleware()`
- Register the route in `internal/server/server.go`
- Write comprehensive tests
Example:
// In internal/server/server.go
mux.HandleFunc("/api/jobs", middleware.AuthMiddleware(handlers.JobsHandler))
- Follow standard Go conventions
- Use `gofmt` for formatting
- Keep functions small and focused
- Write tests for all new functionality
- Use meaningful variable and function names
- Server fails to start
  - Check if the port is already in use
  - Ensure all environment variables are set correctly
- Authentication failures
  - Verify the `API_KEY` environment variable is set
  - Check that the `X-API-Key` header matches exactly
- Transcription not triggering
  - Verify file is uploaded to correct S3 bucket
  - Check Lambda CloudWatch logs for errors
  - Ensure file extension is supported
  - Verify IAM permissions for S3 and Transcribe
- Deployment issues
  - Ensure AWS credentials are configured
  - Check Serverless Framework version compatibility
  - Verify the Go version matches the Lambda runtime
  - Ensure the binary is built for Linux (`GOOS=linux GOARCH=amd64`)
- Transcription jobs failing
  - Check file format is supported by AWS Transcribe
  - Verify IAM role has `transcribe:StartTranscriptionJob` permission
  - Check CloudWatch logs for detailed error messages
  - Ensure output bucket has write permissions
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with AWS Transcribe for high-quality speech-to-text
- Uses AWS Lambda for serverless scalability
- Inspired by modern serverless architectures