Voice
The @voltagent/voice package provides text-to-speech (TTS) and speech-to-text (STT) capabilities through various providers. It allows your applications to speak text and transcribe audio with minimal setup.
Installation
Install the package using your preferred package manager:
npm install @voltagent/voice
yarn add @voltagent/voice
pnpm add @voltagent/voice
Supported Providers
- OpenAI: High-quality voices and transcription.
- ElevenLabs: Realistic, customizable voices.
- xsAI: Lightweight OpenAI-compatible voice API.
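The examples below call `speak(text, options)` on a provider instance and treat the result as a Node.js `Readable`. Assuming every provider exposes that same shape, application code can depend on a small interface rather than a concrete class. That makes it easy to swap OpenAI, ElevenLabs, or xsAI, or to substitute a fake in tests. `SpeechProvider`, `FakeVoiceProvider`, and `narrate` are hypothetical names used for illustration. They are not exports of `@voltagent/voice`.

```typescript
import { Buffer } from "node:buffer";
import { Readable } from "node:stream";

// Hypothetical interface (not exported by @voltagent/voice) describing the
// shape assumed for every provider: speak() takes text plus optional
// provider-specific options and resolves to a Readable stream of audio bytes.
interface SpeechProvider {
  speak(text: string, options?: Record<string, unknown>): Promise<Readable>;
}

// Hypothetical stand-in provider for tests: instead of calling a TTS API,
// it emits the UTF-8 bytes of the text as a single chunk.
class FakeVoiceProvider implements SpeechProvider {
  async speak(text: string): Promise<Readable> {
    return Readable.from([Buffer.from(text, "utf8")]);
  }
}

// Application code depends only on the interface, so any provider
// (or the fake above) can be passed in unchanged.
async function narrate(provider: SpeechProvider, lines: string[]): Promise<Readable[]> {
  const streams: Readable[] = [];
  for (const line of lines) {
    // Skip blank lines rather than sending them to the provider.
    if (line.trim() === "") continue;
    streams.push(await provider.speak(line));
  }
  return streams;
}
```

In production you would pass a real provider instance, such as `openAIVoice`, where the fake is used here.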
Basic Usage
First, initialize a voice provider instance.
Initialize a Voice Provider
import {
  ElevenLabsVoiceProvider,
  OpenAIVoiceProvider,
  XsAIVoiceProvider,
} from "@voltagent/voice";

// Initialize with OpenAI
const openAIVoice = new OpenAIVoiceProvider({
  apiKey: process.env.OPENAI_API_KEY, // Ensure the API key is set in environment variables
  ttsModel: "tts-1",
  voice: "alloy", // Available voices: alloy, echo, fable, onyx, nova, shimmer
});

// Or initialize with ElevenLabs
const elevenLabsVoice = new ElevenLabsVoiceProvider({
  apiKey: process.env.ELEVENLABS_API_KEY, // Ensure the API key is set
  ttsModel: "eleven_multilingual_v2",
  voice: "Rachel", // Example voice ID
});

// Or initialize with xsAI
const xsAIVoice = new XsAIVoiceProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  ttsModel: "tts-1",
  voice: "alloy",
  // To target an OpenAI-compatible API other than OpenAI, also set `baseURL`.
});
Note: It's recommended to manage API keys securely, for example, using environment variables.
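One way to follow this advice while failing fast is a small lookup helper. It throws a descriptive error when a key is missing, instead of handing `undefined` to a provider or silencing the compiler with `!` as the xsAI example does. `requireEnv` is a hypothetical helper, not part of `@voltagent/voice`.

```typescript
// Hypothetical helper (not part of @voltagent/voice): returns the value of a
// required environment variable, or throws a descriptive error if it is
// missing or blank. The `env` parameter defaults to process.env and can be
// overridden in tests.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage with a provider from the examples above (not executed here):
// const openAIVoice = new OpenAIVoiceProvider({
//   apiKey: requireEnv("OPENAI_API_KEY"),
//   ttsModel: "tts-1",
//   voice: "alloy",
// });
```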
Text-to-Speech (TTS)
Convert text into an audio stream. You can then process this stream, for example, by saving it to a file.
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises"; // pipeline propagates errors and destroys both streams on failure

// Uses `openAIVoice` from the initialization example above.
// Top-level await requires running this file as an ES module.

// --- Example 1: Basic Speak and Save to File ---
console.log("Generating audio...");
// Get the audio stream for the text
const audioStream = await openAIVoice.speak("Hello from VoltAgent!");

console.log("Saving audio to output.mp3...");
// Create a file stream to write the audio
const fileStream = createWriteStream("output.mp3");
try {
  // Pipe the audio stream to the file stream and wait for completion
  await pipeline(audioStream, fileStream);
  console.log("Audio successfully saved to output.mp3");
} catch (error) {
  console.error("Failed to save audio:", error);
}
// --- Example 2: Speak with Options and Save ---
console.log("Generating custom audio...");
const customAudioStream = await openAIVoice.speak("Speaking faster now.", {
  // Options are provider-specific. For OpenAI:
  voice: "nova", // Override the default voice
  speed: 1.5, // Adjust speaking speed (1.0 is the default)
});

console.log("Saving custom audio to custom_output.mp3...");
const customFileStream = createWriteStream("custom_output.mp3");
try {
  // Pipe the custom audio stream to the file stream
  await pipeline(customAudioStream, customFileStream);
  console.log("Custom audio successfully saved to custom_output.mp3");
} catch (error) {
  console.error("Failed to save custom audio:", error);
}
// Note: The audioStream is a standard Node.js Readable stream.
// You can pipe it to other destinations or process it directly.
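Processing the stream directly can be as simple as reading it into memory, for example to return the audio in an HTTP response or to base64-encode it. The sketch below relies on `for await`, which works on any Node.js `Readable`. `streamToBuffer` is a hypothetical helper name. Buffering holds the whole clip in memory, so for long audio, piping to a destination as in Example 1 scales better.

```typescript
import { Buffer } from "node:buffer";
import { Readable } from "node:stream";

// Hypothetical helper: reads every chunk of an audio stream and joins them
// into one Buffer. Depending on how the stream was created, chunks may be
// Buffers, Uint8Arrays, or strings, so each one is normalized to a Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage, assuming `audioStream` from Example 1:
// const audio = await streamToBuffer(audioStream);
// const base64Audio = audio.toString("base64");
```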