ElevenLabs API Case Study: How an Indie Game Studio Generated 200+ NPC Dialogue Lines in 48 Hours

From Casting Calls to API Calls: Replacing Traditional Voice Acting Pipelines

For indie game studios, voice acting is one of the most expensive and time-consuming production bottlenecks. Casting agencies, recording sessions, retakes, and post-processing can consume weeks of calendar time and thousands of dollars — even for a modest RPG with a handful of NPCs. This case study documents how a fictional but representative indie studio, Ironpine Games, used the ElevenLabs API to generate over 200 fully voiced NPC dialogue lines in just 48 hours. The workflow leveraged three core ElevenLabs features: Projects API, Voice Design presets, and Pronunciation Dictionaries — replacing what would have traditionally required a 3-week casting and recording pipeline.

The Challenge

- 212 dialogue lines across 14 unique NPCs for a fantasy RPG vertical slice
- Budget constraint: under $500 total voice production cost
- Timeline: 48 hours before a publisher demo
- Each NPC needed a distinct, consistent voice with correct pronunciation of 30+ fictional proper nouns (place names, spells, lore terms)

Step 1: Environment Setup and Installation

Install the ElevenLabs Python SDK

pip install elevenlabs

Configure Your API Key

# Set your API key as an environment variable (shell)
export ELEVENLABS_API_KEY=YOUR_API_KEY

# Python initialization: read the key from the environment instead of hard-coding it
import os
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

Step 2: Design Unique NPC Voices with Voice Design

Instead of auditioning voice actors, Ironpine used the Voice Design API to generate distinct voice profiles for each NPC archetype: grizzled blacksmith, young apprentice, ancient oracle, and so on.

# Design a grizzled blacksmith voice
# Note: the API may reject very short preview text; lengthen the sample line if the call fails.
blacksmith_preview = client.text_to_voice.create_previews(
    voice_description="A gruff, deep-voiced male blacksmith in his 50s with a slight rasp",
    text="Aye, that blade will cost ye three gold crowns. No less.",
)

# Listen to the generated previews, then save the best one as a persistent voice
blacksmith_voice = client.text_to_voice.create_voice_from_preview(
    voice_name="Blacksmith_Gorath",
    voice_description="Gruff male blacksmith NPC",
    generated_voice_id=blacksmith_preview.previews[0].generated_voice_id,
)

print(f"Created voice: {blacksmith_voice.voice_id}")

The team repeated this for all 14 NPCs, generating 2-3 preview variations per character and selecting the best fit — a process that took roughly 3 hours compared to weeks of casting calls.
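The per-character repetition described above could be scripted along the following lines. This is a minimal sketch, assuming the same `create_previews` and `create_voice_from_preview` calls shown earlier in this step. The `NPC_SPECS` structure, the `design_voices` and `save_voice_map` helpers, the `NPC_` name prefix, and the `voice_map.json` filename are all hypothetical. The sketch also picks the first preview by default, whereas the team chose each voice by ear.

```python
import json
from pathlib import Path

# Hypothetical spec list: one entry per NPC (name, description, audition line).
NPC_SPECS = [
    {
        "name": "Gorath",
        "description": "A gruff, deep-voiced male blacksmith in his 50s with a slight rasp",
        "sample": "Aye, that blade will cost ye three gold crowns. No less.",
    },
    # Add the remaining NPC entries here.
]


def design_voices(client, specs, choose=lambda previews: previews[0]):
    """Create one saved voice per spec and return {npc_name: voice_id}.

    `choose` stands in for the human listening step; by default it
    simply takes the first preview.
    """
    voice_map = {}
    for spec in specs:
        result = client.text_to_voice.create_previews(
            voice_description=spec["description"],
            text=spec["sample"],
        )
        picked = choose(result.previews)
        voice = client.text_to_voice.create_voice_from_preview(
            voice_name=f"NPC_{spec['name']}",
            voice_description=spec["description"],
            generated_voice_id=picked.generated_voice_id,
        )
        voice_map[spec["name"]] = voice.voice_id
    return voice_map


def save_voice_map(voice_map, path="voice_map.json"):
    """Write the mapping to disk so it can be committed to version control."""
    text = json.dumps(voice_map, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")
```

Saving the returned mapping gives the whole team one file to reference when later scripts look up a voice by NPC name.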

Step 3: Create a Pronunciation Dictionary for Lore Terms

Fantasy games are full of invented words. Without a pronunciation dictionary, the TTS engine has to guess, and it often guesses wrong. ElevenLabs Pronunciation Dictionaries let you specify the pronunciation of each term explicitly.

# Create a pronunciation dictionary from a lexicon file.
# pronunciation_lexicon.pls is a PLS (Pronunciation Lexicon Specification) XML file.
with open("pronunciation_lexicon.pls", "rb") as f:
    dictionary = client.pronunciation_dictionary.add_from_file(
        file=f,
        name="ironpine_rpg_lore",
        description="Pronunciation rules for all fantasy proper nouns",
    )

print(f"Dictionary ID: {dictionary.id}")
print(f"Version ID: {dictionary.version_id}")

Example PLS Lexicon File



  
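<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Valdrethar</grapheme>
    <phoneme>vɑːl.drɛ.θɑːr</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Kythira</grapheme>
    <phoneme>kɪ.θaɪ.rə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Aethermancy</grapheme>
    <phoneme>iː.θɜːr.mæn.si</phoneme>
  </lexeme>
</lexicon>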
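With 30+ lore terms, hand-editing XML gets error-prone, so the lexicon file could instead be generated from a plain term list. The following is a sketch using only the Python standard library. The `LORE_TERMS` dictionary and the `build_pls` helper are hypothetical names; the three entries are the ones from the example above, and the `en-US` language tag is an assumption.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Lore terms and IPA transcriptions taken from the lexicon example above.
LORE_TERMS = {
    "Valdrethar": "vɑːl.drɛ.θɑːr",
    "Kythira": "kɪ.θaɪ.rə",
    "Aethermancy": "iː.θɜːr.mæn.si",
}


def build_pls(terms, lang="en-US"):
    """Return PLS XML (as UTF-8 bytes) with one <lexeme> per term."""
    # Serialize the PLS namespace as the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, ipa in terms.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = ipa
    return ET.tostring(lexicon, encoding="utf-8", xml_declaration=True)
```

Writing the bytes returned by `build_pls(LORE_TERMS)` to `pronunciation_lexicon.pls` in binary mode yields a file suitable for the upload call in this step. Keeping the term list in version control also makes dictionary changes easy to review.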

Step 4: Batch Generate All Dialogue with the Projects API

The Projects API is where the entire pipeline comes together. It allows you to organize chapters, assign voices per character, attach pronunciation dictionaries, and batch-convert an entire script.

# Create a project for the RPG vertical slice
project = client.projects.add(
    name="Ironpine RPG - Vertical Slice",
    default_model_id="eleven_multilingual_v2",
    pronunciation_dictionary_versions_locators=[
        {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
    ],
    default_paragraph_voice_id=blacksmith_voice.voice_id,
)

print(f"Project created: {project.project_id}")

# Add a chapter for each game area or quest
chapter = client.projects.add_chapter(
    project_id=project.project_id,
    name="Chapter 1 - Village of Valdrethar",
)
print(f"Chapter ID: {chapter.chapter_id}")

Bulk Upload Dialogue Lines via Script

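import csv
import os
import time

npc_voices = {
    "Gorath": "voice_id_blacksmith",
    "Lyra": "voice_id_apprentice",
    "Elder Morvyn": "voice_id_oracle",
    # ... 11 more NPC voice mappings
}

os.makedirs("audio", exist_ok=True)  # the output folder must exist before writing

with open("dialogue_script.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)  # columns: npc_name, line_id, text
    for row in reader:
        voice_id = npc_voices.get(row["npc_name"])
        if not voice_id:
            # Report unmapped NPCs instead of skipping them silently
            print(f"Skipping {row['line_id']}: no voice mapped for {row['npc_name']}")
            continue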

        audio = client.text_to_speech.convert(
            voice_id=voice_id,
            text=row["text"],
            model_id="eleven_multilingual_v2",
            pronunciation_dictionary_locators=[
                {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
            ]
        )

        filename = f"audio/{row['line_id']}.mp3"
        with open(filename, "wb") as out:
            for chunk in audio:
                out.write(chunk)

        print(f"Generated: {filename}")
        time.sleep(0.5)  # respect rate limits
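The loop above processes one line at a time. If that becomes the bottleneck, a bounded-concurrency variant might look like the sketch below. It is not the studio's script: `generate_all` is a hypothetical helper, and it wraps a blocking callable with `asyncio.to_thread` (Python 3.9+) rather than issuing raw HTTP requests. The default of three concurrent requests is an assumption to be replaced with your plan's actual limit.

```python
import asyncio


async def generate_all(rows, synthesize, max_concurrent=3):
    """Run `synthesize(row)` for every row, at most `max_concurrent` at once.

    `synthesize` is a blocking callable, for example a function wrapping
    the convert-and-save body of the sequential loop. Each call runs in a
    worker thread. Returns a list of (row, result_or_exception) pairs in
    input order, so one failed line does not abort the whole batch.
    """
    gate = asyncio.Semaphore(max_concurrent)

    async def worker(row):
        async with gate:
            try:
                return row, await asyncio.to_thread(synthesize, row)
            except Exception as exc:  # collect the failure and keep going
                return row, exc

    return await asyncio.gather(*(worker(row) for row in rows))
```

A call such as `asyncio.run(generate_all(rows, synthesize_line))` would drive it, where `synthesize_line` is a hypothetical function that generates and saves one CSV row. Rows paired with an exception can be collected afterwards and retried on their own.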

Results Summary

Metric	Traditional Pipeline	ElevenLabs API Pipeline
Casting & auditions	5-7 days	3 hours (Voice Design)
Recording sessions	3-5 days	0 (API batch generation)
Pronunciation retakes	1-2 days	0 (dictionary-driven)
Post-processing	2-3 days	Minimal normalization
Total elapsed time	~15-20 days	~48 hours
Cost (212 lines)	$3,000-$8,000+	~$80-$150 API credits

## Pro Tips for Power Users

- **Version your pronunciation dictionaries.** As your lore evolves during development, update the dictionary and re-generate only the affected lines. The version ID system makes this traceable.
- **Use voice settings for emotional variation.** Adjust `stability` (lower = more expressive) and `similarity_boost` per line to convey anger, whispers, or excitement without needing separate voice profiles.
- **Parallelize with async requests.** Use asyncio and httpx to generate multiple lines concurrently. Respect the concurrency limits on your plan tier.
- **Export a voice map JSON.** Keep a single source-of-truth mapping npc_name → voice_id in version control so your entire team references the same voices.
- **Tag lines with SSML-style markers.** Insert a `<break>` tag with a `time` attribute in dialogue text for natural pauses between sentences, which is especially useful for dramatic NPC monologues.

## Troubleshooting Common Issues

Error / Symptom	Cause	Fix
`401 Unauthorized`	Invalid or expired API key	Regenerate your key at elevenlabs.io dashboard and update the environment variable
`429 Too Many Requests`	Rate limit exceeded	Add exponential backoff or `time.sleep(1)` between calls; upgrade plan tier if persistent
Pronunciation dictionary not applied	Missing or incorrect `version_id`	Always pass both `pronunciation_dictionary_id` and `version_id` in the locator object
Voice sounds inconsistent between lines	Stability set too low	Increase `stability` to 0.6-0.75 for NPC dialogue; reserve low values for emotional peaks
Generated audio has clipping	Text contains unusual punctuation or symbols	Sanitize input text; remove stray unicode characters and excessive exclamation marks
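The exponential backoff suggested for `429 Too Many Requests` can be sketched as a small wrapper. `with_backoff` and its parameters are hypothetical names, not part of the ElevenLabs SDK, and the one-second base delay and 25% jitter are illustrative defaults.

```python
import random
import time


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry with exponential backoff plus jitter.

    `is_retryable(exc)` decides whether an exception (such as a 429
    surfaced by the SDK) is worth retrying. Any other exception, or the
    final failed attempt, is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 25% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
```

One possible predicate is `lambda exc: getattr(exc, "status_code", None) == 429`. This assumes the SDK's exception object exposes a `status_code` attribute, which is worth checking against the version you have installed.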

## Frequently Asked Questions

Can I use ElevenLabs-generated voices commercially in a shipped game?

Yes. ElevenLabs allows commercial usage of generated audio on paid plans. Voices created through Voice Design are synthetic rather than cloned from a real person, which avoids most likeness-rights concerns and makes them a good fit for indie game distribution on Steam, itch.io, or console storefronts. Always review the current terms of service for your specific plan tier.

How do I maintain voice consistency when generating hundreds of lines over multiple sessions?

Once you save a designed voice via create_voice_from_preview, it receives a persistent voice_id. All subsequent TTS calls using that ID produce consistent output. Keep stability at 0.5 or higher and use the same model_id across all generations. Avoid regenerating the voice profile mid-production.

What happens if I need to add new dialogue lines after the initial batch?

Simply run the same script with additional CSV rows. The voice IDs, pronunciation dictionary, and model settings remain unchanged. New lines will sound consistent with previously generated audio. For large additions, consider using the Projects API to organize new content into separate chapters for easier management.
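When the CSV grows, re-running the whole script would regenerate audio that already exists and spend credits on it again. One way to avoid that is to filter for rows that have no output file yet. This is a sketch assuming the `audio/<line_id>.mp3` naming used by the generation script; `pending_rows` is a hypothetical helper.

```python
import csv
from pathlib import Path


def pending_rows(csv_path, audio_dir="audio"):
    """Return the CSV rows whose <audio_dir>/<line_id>.mp3 does not exist yet."""
    out_dir = Path(audio_dir)
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            row
            for row in csv.DictReader(f)
            if not (out_dir / f"{row['line_id']}.mp3").exists()
        ]
```

Feeding only `pending_rows("dialogue_script.csv")` into the generation loop leaves existing files untouched. To force a re-record after a lore or script change, delete the affected `.mp3` files first.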
