ElevenLabs API Case Study: How an Indie Game Studio Generated 200+ NPC Dialogue Lines in 48 Hours

From Casting Calls to API Calls: Replacing Traditional Voice Acting Pipelines

For indie game studios, voice acting is one of the most expensive and time-consuming production bottlenecks. Casting agencies, recording sessions, retakes, and post-processing can consume weeks of calendar time and thousands of dollars, even for a modest RPG with a handful of NPCs. This case study documents how a fictional but representative indie studio, Ironpine Games, used the ElevenLabs API to generate over 200 fully voiced NPC dialogue lines in 48 hours. The workflow relied on three ElevenLabs features (the Projects API, Voice Design, and Pronunciation Dictionaries) and replaced what would traditionally have been a roughly three-week casting and recording pipeline.

The Challenge

  • 212 dialogue lines across 14 unique NPCs for a fantasy RPG vertical slice
  • Budget constraint: under $500 total voice production cost
  • Timeline: 48 hours before a publisher demo
  • Each NPC needed a distinct, consistent voice with correct pronunciation of 30+ fictional proper nouns (place names, spells, lore terms)

Step 1: Environment Setup and Installation

Install the ElevenLabs Python SDK

pip install elevenlabs

Configure Your API Key

# Set your API key as an environment variable (shell)
export ELEVENLABS_API_KEY=YOUR_API_KEY

# Python initialization
import os

from elevenlabs.client import ElevenLabs

# Read the key from the environment instead of hard-coding it
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
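A missing or placeholder key is an easy mistake to make right before a deadline run, and it otherwise surfaces only as a 401 on the first request. The following is a minimal sketch of a guard that fails fast; the helper name `load_api_key` is hypothetical and not part of the ElevenLabs SDK.

```python
import os


def load_api_key(env=os.environ, var_name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, failing fast if it is absent.

    Hypothetical helper: it only inspects the environment mapping and never
    contacts the API, so it cannot tell whether the key is actually valid.
    """
    key = env.get(var_name, "").strip()
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            f"{var_name} is not set; export it before running the pipeline"
        )
    return key


# Usage (requires the elevenlabs package and a real key):
# from elevenlabs.client import ElevenLabs
# client = ElevenLabs(api_key=load_api_key())
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.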

Step 2: Design Unique NPC Voices with Voice Design

Instead of auditioning voice actors, Ironpine used the Voice Design API to generate distinct voice profiles for each NPC archetype: grizzled blacksmith, young apprentice, ancient oracle, and so on.

# Design a grizzled blacksmith voice.
# Note: the API may enforce a minimum length on the preview text;
# lengthen the sample line if the request is rejected.
blacksmith_preview = client.text_to_voice.create_previews(
    voice_description="A gruff, deep-voiced male blacksmith in his 50s with a slight rasp",
    text="Aye, that blade will cost ye three gold crowns. No less.",
)

# Listen to the generated previews, then save the best one as a persistent voice
blacksmith_voice = client.text_to_voice.create_voice_from_preview(
    voice_name="Blacksmith_Gorath",
    voice_description="Gruff male blacksmith NPC",
    generated_voice_id=blacksmith_preview.previews[0].generated_voice_id,
)

print(f"Created voice: {blacksmith_voice.voice_id}")

The team repeated this for all 14 NPCs, generating 2-3 preview variations per character and selecting the best fit. The whole process took roughly 3 hours, compared with the 5-7 days a traditional casting round typically needs.
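Repeating the design step for every NPC can be wrapped in a small loop. The sketch below assumes `client` exposes the same two `text_to_voice` methods used in Step 2; the function names, the `NPC_` naming scheme, and the choice of always taking the first preview are illustrative assumptions (the team chose previews by ear). It also writes the resulting name-to-voice mapping to JSON so the bulk script can load it later.

```python
import json


def design_npc_voices(client, npc_specs, sample_text):
    """Create one saved voice per NPC and return {npc_name: voice_id}.

    `npc_specs` maps an NPC name to its voice description. This sketch
    always picks the first preview as a stand-in for a human choice.
    """
    voice_map = {}
    for npc_name, description in npc_specs.items():
        previews = client.text_to_voice.create_previews(
            voice_description=description,
            text=sample_text,
        )
        chosen = previews.previews[0]  # placeholder for listening and choosing
        voice = client.text_to_voice.create_voice_from_preview(
            voice_name=f"NPC_{npc_name}",
            voice_description=description,
            generated_voice_id=chosen.generated_voice_id,
        )
        voice_map[npc_name] = voice.voice_id
    return voice_map


def save_voice_map(voice_map, path):
    """Write the mapping as JSON so it can be committed to version control."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(voice_map, f, indent=2, sort_keys=True)
```

Because the client is passed in as an argument, the loop can be dry-run against a stub before any credits are spent.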

Step 3: Create a Pronunciation Dictionary for Lore Terms

Fantasy games are full of invented words. Without a pronunciation dictionary, the TTS engine will guess, often incorrectly. ElevenLabs Pronunciation Dictionaries let you specify the pronunciation of each term explicitly.

# Create a pronunciation dictionary from a lexicon file.
# pronunciation_lexicon.pls is a PLS (Pronunciation Lexicon Specification) XML file.
with open("pronunciation_lexicon.pls", "rb") as f:
    dictionary = client.pronunciation_dictionary.add_from_file(
        file=f,
        name="ironpine_rpg_lore",
        description="Pronunciation rules for all fantasy proper nouns",
    )

print(f"Dictionary ID: {dictionary.id}")
print(f"Version ID: {dictionary.version_id}")
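With 30+ lore terms, hand-editing XML is error-prone. As a rough sketch, a PLS lexicon can be generated from a plain mapping of term to IPA string using only the standard library. The function name `build_pls` and the default `en-US` language tag are assumptions for illustration; the namespace is the one defined by the W3C PLS 1.0 specification.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries, lang="en-US"):
    """Return PLS XML text for a {grapheme: ipa_phoneme} mapping."""
    # Serialize PLS elements without a prefix (default namespace).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    body = ET.tostring(lexicon, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    lore_terms = {"Valdrethar": "vɑːl.drɛ.θɑːr", "Kythira": "kɪ.θaɪ.rə"}
    # Write as UTF-8 so the IPA characters match the XML declaration.
    with open("pronunciation_lexicon.pls", "w", encoding="utf-8") as f:
        f.write(build_pls(lore_terms))
```

Keeping the terms in a mapping (or a CSV that feeds it) means writers can add lore words without touching XML, and the generated file can be diffed in version control.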

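Example PLS Lexicon File

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Valdrethar</grapheme>
    <phoneme>vɑːl.drɛ.θɑːr</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Kythira</grapheme>
    <phoneme>kɪ.θaɪ.rə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Aethermancy</grapheme>
    <phoneme>iː.θɜːr.mæn.si</phoneme>
  </lexeme>
</lexicon>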

Step 4: Batch Generate All Dialogue with the Projects API

The Projects API is where the entire pipeline comes together. It allows you to organize chapters, assign voices per character, attach pronunciation dictionaries, and batch-convert an entire script.

# Create a project for the RPG vertical slice
project = client.projects.add(
    name="Ironpine RPG - Vertical Slice",
    default_model_id="eleven_multilingual_v2",
    pronunciation_dictionary_versions_locators=[
        {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
    ],
    default_paragraph_voice_id=blacksmith_voice.voice_id,
)

print(f"Project created: {project.project_id}")

# Add a chapter for each game area or quest
chapter = client.projects.add_chapter(
    project_id=project.project_id,
    name="Chapter 1 - Village of Valdrethar",
)

print(f"Chapter ID: {chapter.chapter_id}")

Bulk Upload Dialogue Lines via Script

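import csv
import os
import time

npc_voices = {
    "Gorath": "voice_id_blacksmith",
    "Lyra": "voice_id_apprentice",
    "Elder Morvyn": "voice_id_oracle",
    # ... 11 more NPC voice mappings
}

os.makedirs("audio", exist_ok=True)  # the output directory must exist before writing

with open("dialogue_script.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)  # columns: npc_name, line_id, text
    for row in reader:
        voice_id = npc_voices.get(row["npc_name"])
        if not voice_id:
            # Report unmapped NPCs instead of dropping their lines silently
            print(f"Skipping {row['line_id']}: no voice mapped for {row['npc_name']}")
            continue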

        audio = client.text_to_speech.convert(
            voice_id=voice_id,
            text=row["text"],
            model_id="eleven_multilingual_v2",
            pronunciation_dictionary_locators=[
                {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
            ]
        )

        filename = f"audio/{row['line_id']}.mp3"
        with open(filename, "wb") as out:
            for chunk in audio:
                out.write(chunk)

        print(f"Generated: {filename}")
        time.sleep(0.5)  # respect rate limits
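The loop above issues one request at a time. When the plan tier allows several concurrent requests, a bounded-concurrency variant can shorten the batch. The following is a sketch of the pattern only: `synthesize` is a hypothetical async callable that performs a single text-to-speech request, and `max_concurrency` should be set from your plan's actual limit rather than the default shown here.

```python
import asyncio


async def generate_all(lines, synthesize, max_concurrency=4):
    """Run `synthesize(line)` for every line, at most `max_concurrency` at once.

    `synthesize` is a hypothetical async callable supplied by the caller.
    Results are returned in the same order as `lines`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(line):
        # The semaphore caps how many requests are in flight at any moment.
        async with semaphore:
            return await synthesize(line)

    return await asyncio.gather(*(bounded(line) for line in lines))


# Usage sketch:
# results = asyncio.run(generate_all(rows, synthesize_row, max_concurrency=2))
```

A semaphore is used instead of fixed-size batches so that a slow line does not hold up the lines queued behind it.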

Results Summary

| Metric | Traditional Pipeline | ElevenLabs API Pipeline |
| --- | --- | --- |
| Casting & auditions | 5-7 days | 3 hours (Voice Design) |
| Recording sessions | 3-5 days | 0 (API batch generation) |
| Pronunciation retakes | 1-2 days | 0 (dictionary-driven) |
| Post-processing | 2-3 days | Minimal normalization |
| Total elapsed time | ~15-20 days | ~48 hours |
| Cost (212 lines) | $3,000-$8,000+ | ~$80-$150 API credits |

## Pro Tips for Power Users

- **Version your pronunciation dictionaries.** As your lore evolves during development, update the dictionary and re-generate only the affected lines. The version ID system makes this traceable.
- **Use voice settings for emotional variation.** Adjust stability (lower = more expressive) and similarity_boost per line to convey anger, whispers, or excitement without needing separate voice profiles.
- **Parallelize with async requests.** Use asyncio and httpx to generate multiple lines concurrently. Respect the concurrency limits on your plan tier.
- **Export a voice map JSON.** Keep a single source-of-truth mapping of npc_name → voice_id in version control so your entire team references the same voices.
- **Tag lines with SSML-style markers.** Insert break tags such as <break time="1.0s" /> in dialogue text for natural pauses between sentences. This is especially useful for dramatic NPC monologues.

## Troubleshooting Common Issues

| Error / Symptom | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or expired API key | Regenerate your key at the elevenlabs.io dashboard and update the environment variable |
| 429 Too Many Requests | Rate limit exceeded | Add exponential backoff or time.sleep(1) between calls; upgrade plan tier if persistent |
| Pronunciation dictionary not applied | Missing or incorrect version_id | Always pass both pronunciation_dictionary_id and version_id in the locator object |
| Voice sounds inconsistent between lines | Stability set too low | Increase stability to 0.6-0.75 for NPC dialogue; reserve low values for emotional peaks |
| Generated audio has clipping | Text contains unusual punctuation or symbols | Sanitize input text; remove stray unicode characters and excessive exclamation marks |

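For the 429 case, exponential backoff might look like the sketch below. The wrapper name `with_backoff` is hypothetical, and the caller decides which exceptions count as retryable. The usage comment checks a `status_code` attribute on the exception, which is an assumption to verify against your SDK version.

```python
import random
import time


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry with exponential backoff plus jitter.

    `is_retryable(exc)` decides whether an exception is worth retrying.
    The last failure, or any non-retryable one, is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Usage sketch (the status_code attribute is an assumption):
# audio = with_backoff(
#     lambda: client.text_to_speech.convert(voice_id=voice_id, text=text),
#     is_retryable=lambda exc: getattr(exc, "status_code", None) == 429,
# )
```

Retrying only on rate-limit errors matters here: a 401 will never succeed on retry and should surface immediately.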
## Frequently Asked Questions

Can I use ElevenLabs-generated voices commercially in a shipped game?

Yes. ElevenLabs allows commercial usage of generated audio on paid plans. The voices created through Voice Design are fully owned synthetic voices with no likeness rights concerns, making them ideal for indie game distribution on Steam, itch.io, or console storefronts. Always review the current terms of service for your specific plan tier.

How do I maintain voice consistency when generating hundreds of lines over multiple sessions?

Once you save a designed voice via create_voice_from_preview, it receives a persistent voice_id. All subsequent TTS calls using that ID produce consistent output. Keep stability at 0.5 or higher and use the same model_id across all generations. Avoid regenerating the voice profile mid-production.
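One way to keep settings from drifting between sessions is to pin them in code rather than typing values per run. The sketch below uses plain dictionaries; the preset names and numbers are hypothetical starting points, not tuned values, and the `min_stability` floor reflects the 0.5 guideline above.

```python
# Hypothetical presets; the numbers are illustrative starting points only.
EMOTION_PRESETS = {
    "neutral": {"stability": 0.70, "similarity_boost": 0.75},
    "angry": {"stability": 0.35, "similarity_boost": 0.75},
    "whisper": {"stability": 0.50, "similarity_boost": 0.80},
}


def settings_for(emotion, presets=EMOTION_PRESETS, min_stability=0.0):
    """Return a copy of the preset for `emotion`, falling back to neutral.

    `min_stability` raises the stability value to a floor, which is useful
    when consistency matters more than expressiveness.
    """
    chosen = dict(presets.get(emotion, presets["neutral"]))
    chosen["stability"] = max(chosen["stability"], min_stability)
    return chosen
```

The returned dictionary can then be turned into whatever voice-settings object your SDK version expects when calling the text-to-speech endpoint; check the API reference for the exact parameter name.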

What happens if I need to add new dialogue lines after the initial batch?

Simply run the same script with additional CSV rows. The voice IDs, pronunciation dictionary, and model settings remain unchanged. New lines will sound consistent with previously generated audio. For large additions, consider using the Projects API to organize new content into separate chapters for easier management.
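Re-running the full script would regenerate, and pay for, every line again. A small filter can limit the run to rows that have no audio file yet. This sketch assumes the same CSV columns (npc_name, line_id, text) and the same `audio/<line_id>.mp3` naming as the bulk script; the function name `pending_rows` is hypothetical.

```python
import csv
import os


def pending_rows(csv_path, audio_dir):
    """Return the CSV rows whose audio file does not exist yet."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [
        row
        for row in rows
        if not os.path.exists(os.path.join(audio_dir, f"{row['line_id']}.mp3"))
    ]


# Usage sketch:
# for row in pending_rows("dialogue_script.csv", "audio"):
#     ...generate audio for row["text"] as in the bulk script...
```

Note that this only detects missing files. Lines whose text changed after generation would need a separate check, such as a stored hash of the text.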
