Section X: ElevenLabs v3 - Advanced Prompting for Expressive TTS
Overview:
ElevenLabs’ v3 model moves text-to-speech from plain reading to full performance:
whispers, laughter, accents, multi-speaker dialogue, and nuanced emotional depth.
To get the most from it, combine a suitable voice, a stability setting, and audio tags
in your prompts.
1. Voice Selection
● Choose a voice from the “Best Voices for V3” list (marked ✔️). These are pre-tuned
for maximum expressiveness.
● Custom voices (your own recordings) require fine-tuning before v3 can apply
advanced emotional tags.
2. Stability Setting
Stability controls the trade-off between adherence to the recorded voice and creative delivery:
● Creative → Most emotional and flexible, with a higher risk of hallucinations.
● Natural → Balanced, closest to recorded style.
● Robust → Most stable, least responsive to tags.
Tip: For tag-driven emotion, start at Natural or Creative; for repeatable
automation, slide toward Robust.
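The tip above can be written down as a small decision helper. This is only a sketch: `Stability` and `starting_presets` are hypothetical names, not part of any ElevenLabs SDK, and letting repeatability win when both flags are set is an assumption drawn from the tip, not a documented rule.

```python
from enum import Enum


class Stability(Enum):
    """The three presets named in this section."""
    CREATIVE = "Creative"
    NATURAL = "Natural"
    ROBUST = "Robust"


def starting_presets(tag_driven: bool, repeatable: bool) -> list[Stability]:
    """Return the presets worth trying first, in order.

    Assumption: repeatability takes priority, because Robust is described
    as the least responsive to tags.
    """
    if repeatable:
        return [Stability.ROBUST]
    if tag_driven:
        return [Stability.NATURAL, Stability.CREATIVE]
    # Assumption: with no special requirement, begin at the balanced preset.
    return [Stability.NATURAL]


if __name__ == "__main__":
    for preset in starting_presets(tag_driven=True, repeatable=False):
        print(preset.value)
```

For a tag-heavy prompt this suggests trying Natural first and moving to Creative if the delivery feels flat.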
3. Prompt Structure & Length
● Prompts of 250–1,000 characters yield the most consistent, expressive output.
● Ellipses (…) introduce natural pauses.
● Em-dashes (—) or parentheses add emphasis or side comments.
● Capitalization highlights key words: e.g. “WOW,” “UNBELIEVABLE.”
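A quick pre-flight check can catch prompts that fall outside the suggested length or carry no pacing cues. The sketch below is a rough heuristic, not an official validator: `inspect_prompt` and `PromptReport` are hypothetical names, and treating any run of two or more capital letters as an emphasized word is an assumption that will also match acronyms.

```python
import re
from dataclasses import dataclass

MIN_CHARS = 250    # lower bound quoted in this section
MAX_CHARS = 1000   # upper bound quoted in this section


@dataclass
class PromptReport:
    length: int            # total characters in the prompt
    in_range: bool         # True when MIN_CHARS <= length <= MAX_CHARS
    pauses: int            # ellipses, written as "…" or "..."
    emphasized: list[str]  # fully capitalized words of two or more letters


def inspect_prompt(text: str) -> PromptReport:
    """Summarize the length and punctuation cues of a prompt."""
    length = len(text)
    pauses = text.count("…") + text.count("...")
    emphasized = re.findall(r"\b[A-Z]{2,}\b", text)
    return PromptReport(
        length=length,
        in_range=MIN_CHARS <= length <= MAX_CHARS,
        pauses=pauses,
        emphasized=emphasized,
    )


if __name__ == "__main__":
    print(inspect_prompt("WOW… that is UNBELIEVABLE..."))
```

A short test line like the one above is flagged as out of range, which is a reminder to pad it with context or combine it with neighboring lines.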
4. Audio Tags
Embed tags inline to direct emotion, delivery, and non-verbal cues. Place a tag before the
text it affects or mid-sentence:
● Voice Cues:
○ Whisper: “…”
○ Laughs: | Laughs harder: | Chuckles:
○ Sigh: | Exhale:
○ Sarcastic: | Curious: | Excited:
● Sound Effects:
○ Applause: | Gunshot: | Footsteps: | [Music]
● Accent Tags:
○ Strong French accent: | Mild Australian accent: | etc.
● Multi-Speaker:
○ Speaker A: / Speaker B: to alternate voices in the same clip.
Example Prompt:
Whisper: “Hey, Jessica…”
Laughs: “That is absolutely incredible!”
Strong Australian accent: “Fancy a cup of tea?”
Applause: “Thank you, everyone!”
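When prompts are generated programmatically, a small helper keeps the tag notation consistent. The sketch below reproduces the Tag: “text” layout of the example above; `tagged_line` and `build_prompt` are hypothetical names, and the notation itself is taken from this section and not checked against the model, which may expect a different form (the [Music] entry above hints at brackets).

```python
def tagged_line(tag: str, text: str) -> str:
    """Format one line as  Tag: “text”  to match this section's example."""
    tag = tag.strip().rstrip(":")
    if not tag:
        raise ValueError("tag must not be empty")
    return f"{tag}: “{text.strip()}”"


def build_prompt(lines: list[tuple[str, str]]) -> str:
    """Join (tag, text) pairs into a multi-line prompt, one tagged line each."""
    return "\n".join(tagged_line(tag, text) for tag, text in lines)


if __name__ == "__main__":
    print(build_prompt([
        ("Whisper", "Hey, Jessica…"),
        ("Laughs", "That is absolutely incredible!"),
        ("Strong Australian accent", "Fancy a cup of tea?"),
    ]))
```

Keeping the formatting in one function means that a change of notation only has to be made in `tagged_line`.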
5. Multi-Speaker Dialogue
● Prefix each line: A: / B: / C:
● Choose different V3 voices for each speaker.
● Maintain Natural or Creative stability for fluid back-and-forth.
Example:
A: Sigh: “I may have tried to debug myself…”
B: Sarcastic: “Oh wow, you really did it this time.”
A: Curious: “Why does my voice keep glitching?”
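The three rules for dialogue (prefix each line, give every speaker a different voice, keep the exchange in one script) can be enforced before anything is sent for synthesis. The following is a minimal sketch under stated assumptions: `Turn` and `render_dialogue` are hypothetical names, the voice identifiers are placeholders and not real ElevenLabs voice IDs, and no API call is made.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # speaker label such as "A", "B", or "C"
    text: str      # the spoken line, without quotation marks
    tag: str = ""  # optional delivery tag such as "Sigh"


def render_dialogue(turns: list[Turn], voices: dict[str, str]) -> str:
    """Render turns as  A: Tag: “text”  lines after checking voice assignments."""
    speakers = {turn.speaker for turn in turns}
    missing = sorted(speakers - set(voices))
    if missing:
        raise ValueError(f"no voice assigned to: {', '.join(missing)}")
    used = [voices[speaker] for speaker in speakers]
    if len(set(used)) != len(used):
        raise ValueError("each speaker needs a different voice")
    lines = []
    for turn in turns:
        delivery = f"{turn.tag}: " if turn.tag else ""
        lines.append(f"{turn.speaker}: {delivery}“{turn.text}”")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder identifiers; substitute the voices chosen for each speaker.
    voices = {"A": "voice-a-placeholder", "B": "voice-b-placeholder"}
    script = [
        Turn("A", "I may have tried to debug myself…", tag="Sigh"),
        Turn("B", "Oh wow, you really did it this time.", tag="Sarcastic"),
    ]
    print(render_dialogue(script, voices))
```

Failing early on a missing or shared voice is cheaper than discovering after synthesis that two speakers sound identical.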
Best Practices Recap:
1. Voice: Pick a voice from the “Best Voices for V3” list.
2. Stability: Start at Natural/Creative; adjust toward Robust for repeatability.
3. Prompt Length: Aim for 250–1,000 characters for nuance and consistency.
4. Tags: Sprinkle emotional, delivery, and sound-effect tags inline.
5. Punctuation: Use ellipses and em-dashes for natural rhythm.
6. Multi-Speaker: Label each speaker and assign separate voices.