
plexusone/omnivoice


OmniVoice


Batteries-included voice pipeline framework for Go. This package provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) with all providers included.

For a minimal dependency footprint, use omnivoice-core instead.

Features

  • 🎯 Unified Interface: Single API for all STT and TTS providers
  • 🗂️ Provider Registry: Get providers by name - no need to import individual provider packages
  • 🔌 Multiple Providers: OpenAI, Deepgram, ElevenLabs, Twilio, Telnyx
  • ⚡ Streaming Support: Real-time transcription and synthesis
  • 🚀 Easy Integration: Import and use with minimal configuration

Installation

```shell
go get github.com/plexusone/omnivoice
```

CLI

OmniVoice includes a command-line tool for transcription.

Install CLI

```shell
go install github.com/plexusone/omnivoice/cmd/omnivoice@latest
```

Usage

```shell
# Set your API key
export DEEPGRAM_API_KEY="your-api-key"

# Basic transcription (stdout)
omnivoice transcribe podcast.mp3

# Save to file
omnivoice transcribe -p deepgram -o transcript.txt podcast.mp3

# JSON output with full metadata (OmniVoice Transcript format)
omnivoice transcribe -p deepgram --diarize --timestamps -f json -o transcript.json podcast.mp3

# Generate SRT subtitles
omnivoice transcribe -p deepgram -f srt -o subtitles.srt podcast.mp3

# Generate WebVTT subtitles
omnivoice transcribe -p deepgram -f vtt -o subtitles.vtt podcast.mp3

# List available providers
omnivoice providers list
```

Output Formats

| Format | Description                                    |
|--------|------------------------------------------------|
| text   | Plain transcript text (default)                |
| json   | OmniVoice Transcript format with full metadata |
| srt    | SubRip subtitles                               |
| vtt    | WebVTT subtitles                               |

Environment Variables

| Variable           | Provider   |
|--------------------|------------|
| DEEPGRAM_API_KEY   | Deepgram   |
| OPENAI_API_KEY     | OpenAI     |
| ELEVENLABS_API_KEY | ElevenLabs |

Quick Start (Library)

```go
import (
    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all" // Register all providers
)
```

Usage

```go
package main

import (
    "context"
    "log"
    "os"

    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all"
)

func main() {
    ctx := context.Background()

    // Get providers by name using the registry
    sttProvider, err := omnivoice.GetSTTProvider("deepgram",
        omnivoice.WithAPIKey(os.Getenv("DEEPGRAM_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    ttsProvider, err := omnivoice.GetTTSProvider("elevenlabs",
        omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    // Transcribe audio
    result, err := sttProvider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
        Language:             "en",
        EnableWordTimestamps: true,
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %s", result.Text)

    // Synthesize speech
    audio, err := ttsProvider.Synthesize(ctx, "Hello, world!", omnivoice.SynthesisConfig{
        VoiceID: "pNInz6obpgDQGcFmaJgB", // Adam
    })
    if err != nil {
        log.Fatal(err)
    }
    // audio.Audio contains the synthesized audio bytes; using it here also
    // avoids Go's "declared and not used" compile error for audio.
    log.Printf("Synthesized %d bytes of audio", len(audio.Audio))
}
```

Provider Registry

Get providers by name at runtime. Once github.com/plexusone/omnivoice/providers/all is blank-imported (as in the Quick Start), you don't need to import individual provider packages:

```go
// Available providers: "openai", "elevenlabs", "deepgram", "twilio"
ttsProvider, err := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")))
if err != nil {
    log.Fatal(err)
}
sttProvider, err := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(os.Getenv("DEEPGRAM_API_KEY")))
if err != nil {
    log.Fatal(err)
}

// List registered providers
fmt.Println(omnivoice.ListTTSProviders()) // [openai elevenlabs deepgram twilio]
fmt.Println(omnivoice.ListSTTProviders()) // [openai elevenlabs deepgram twilio]
```

Language Codes

OmniVoice accepts language codes in BCP 47 format, which covers ISO 639-1 two-letter codes (en) and regional variants (en-US).

Common codes:

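| Code  | Language              |
|-------|-----------------------|
| en    | English               |
| en-US | English (US)          |
| en-GB | English (UK)          |
| es    | Spanish               |
| es-MX | Spanish (Mexico)      |
| fr    | French                |
| de    | German                |
| it    | Italian               |
| pt    | Portuguese            |
| pt-BR | Portuguese (Brazil)   |
| ja    | Japanese              |
| ko    | Korean                |
| zh    | Chinese               |
| zh-CN | Chinese (Simplified)  |
| zh-TW | Chinese (Traditional) |
| ar    | Arabic                |
| hi    | Hindi                 |
| ru    | Russian               |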

Notes:

  • Use simple codes (en) for broad compatibility across providers
  • Use regional variants (en-US) when accent/dialect matters for TTS
  • Provider support varies; see provider documentation for full language lists
  • STT providers generally support automatic language detection when no code is specified
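Following the notes above, one way to use a regional tag where it is supported and fall back otherwise is to try the full tag, then its primary language subtag, and finally leave the code empty so an STT provider can auto-detect. This is a minimal sketch under assumptions: the supported map is hypothetical input (real lists come from provider documentation), baseLanguage and pickLanguage are illustrative helpers rather than OmniVoice functions, and tags are compared exactly rather than case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// baseLanguage returns the primary language subtag of a BCP 47 tag,
// e.g. "pt-BR" -> "pt". It also accepts "_" as a separator and does
// not validate the tag.
func baseLanguage(tag string) string {
	tag = strings.ReplaceAll(tag, "_", "-")
	if i := strings.IndexByte(tag, '-'); i >= 0 {
		tag = tag[:i]
	}
	return strings.ToLower(tag)
}

// pickLanguage returns tag if supported, else its base language if
// supported, else "" so an STT provider can fall back to auto-detection.
func pickLanguage(tag string, supported map[string]bool) string {
	if supported[tag] {
		return tag
	}
	if base := baseLanguage(tag); supported[base] {
		return base
	}
	return ""
}

func main() {
	// Hypothetical capability set for some provider.
	supported := map[string]bool{"en": true, "es": true, "pt-BR": true}
	for _, tag := range []string{"en-GB", "pt-BR", "ja"} {
		fmt.Printf("%s -> %q\n", tag, pickLanguage(tag, supported))
	}
}
```

For TTS, where an empty code may not make sense, a caller would more likely treat the "" result as an error than as a request for auto-detection.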

Included Providers

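| Provider   | STT           | TTS            | Registry Name |
|------------|---------------|----------------|---------------|
| OpenAI     | Whisper       | TTS-1/TTS-1-HD | "openai"      |
| ElevenLabs | Scribe        | Multilingual v2 | "elevenlabs" |
| Deepgram   | Nova-2        | Aura           | "deepgram"    |
| Twilio     | Media Streams | Media Streams  | "twilio"      |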

Related Packages

License

MIT License - see LICENSE for details.
