# Perplexity API Complete Setup Guide: API Key, Python SDK, Citation Parsing & Search Models
Perplexity AI offers a powerful API that combines large language models with real-time web search capabilities. This guide walks you through everything from obtaining your API key to parsing source citations and selecting the right search-augmented model for your use case.
## Step 1: Create a Perplexity Account and Generate Your API Key
- Visit perplexity.ai and sign up for an account (or log in if you already have one).
- Navigate to Settings → API or go directly to the API settings page.
- Click Generate API Key. Copy the key immediately — it will only be displayed once.
- Add billing information. Perplexity API uses a pay-per-request pricing model. You must have a valid payment method on file before making API calls.
- Store your key securely. Never commit it to version control.
### Store Your Key as an Environment Variable
```bash
# Linux / macOS
export PERPLEXITY_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:PERPLEXITY_API_KEY="YOUR_API_KEY"

# Or add to a .env file (never commit this file)
echo "PERPLEXITY_API_KEY=YOUR_API_KEY" >> .env
```
## Step 2: Install the Required Python Packages
Perplexity's API is compatible with the OpenAI SDK, so you can use the official openai Python library as your client.
```bash
# Install the OpenAI Python SDK
pip install openai

# Optional: for environment variable management
pip install python-dotenv

# Optional: for async workflows
pip install httpx
```
## Step 3: Configure the Python Client
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PERPLEXITY_API_KEY", "YOUR_API_KEY"),
    base_url="https://api.perplexity.ai"
)
```
That's it: a few lines and you have a fully configured client. The key difference from standard OpenAI usage is the base_url parameter, which points to Perplexity's endpoint.
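If you stored the key in a .env file (Step 1), load it into the environment before constructing the client. The python-dotenv package's load_dotenv() does this for you; as a dependency-free sketch, a minimal stdlib equivalent might look like the following (load_env_file is illustrative, not part of any SDK):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    copy KEY=VALUE lines from a file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

With python-dotenv installed, the equivalent is simply `from dotenv import load_dotenv; load_dotenv()`.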
## Step 4: Make Your First API Call
```python
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant. Be concise and cite sources."},
        {"role": "user", "content": "What are the latest developments in quantum computing in 2026?"}
    ]
)

print(response.choices[0].message.content)
```
## Step 5: Parse Source Citations
One of Perplexity's most valuable features is returning source citations alongside generated answers. Citations are returned in the response object.
```python
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "What is retrieval-augmented generation?"}
    ]
)

# Access the answer
answer = response.choices[0].message.content
print("Answer:", answer)

# Access citations (returned in the response metadata)
if hasattr(response, 'citations'):
    citations = response.citations
    print("\nSources:")
    for i, citation in enumerate(citations, 1):
        print(f" [{i}] {citation}")
```
### Building a Citation Formatter
```python
def format_response_with_citations(response):
    """Format Perplexity response with numbered source citations."""
    content = response.choices[0].message.content
    citations = getattr(response, 'citations', [])
    if not citations:
        return content
    formatted = content + "\n\n--- Sources ---\n"
    for i, url in enumerate(citations, 1):
        formatted += f"[{i}] {url}\n"
    return formatted

# Usage
result = format_response_with_citations(response)
print(result)
```
## Step 6: Choose the Right Search-Augmented Model
Perplexity offers several models optimized for different use cases:
| Model | Best For | Context Window | Key Feature |
|---|---|---|---|
| sonar | Quick lookups, simple Q&A | 128k tokens | Fast, cost-effective search |
| sonar-pro | Complex research, multi-step reasoning | 200k tokens | Multi-search, deeper analysis |
| sonar-reasoning | Math, logic, scientific tasks | 128k tokens | Extended thinking with search |
| sonar-reasoning-pro | Advanced reasoning with research | 128k tokens | Best reasoning + search combo |
| sonar-deep-research | Comprehensive research reports | 128k tokens | Exhaustive multi-step research |
```python
# Example: Using sonar-reasoning for a complex query
response = client.chat.completions.create(
    model="sonar-reasoning",
    messages=[
        {"role": "user", "content": "Compare the environmental impact of lithium-ion vs solid-state batteries."}
    ]
)

print(response.choices[0].message.content)
```
## Step 7: Advanced Configuration Options
Perplexity-specific parameters are not part of the OpenAI SDK's method signature, so pass them through extra_body:

```python
# Fine-tune search behavior with additional parameters
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Latest Python 3.14 features"}
    ],
    temperature=0.2,   # Lower = more factual
    max_tokens=1024,   # Control response length
    top_p=0.9,         # Nucleus sampling
    extra_body={
        "search_recency_filter": "week",    # Filter: day, week, month, year
        "return_related_questions": True    # Get follow-up suggestions
    }
)
```
## Pro Tips for Power Users
- **Use streaming for long responses:** Add stream=True to your API call and iterate over chunks for real-time output display.
- **System prompts matter:** A well-crafted system prompt dramatically improves citation quality. Tell the model to "always cite sources" and "prefer recent publications."
- **Batch requests efficiently:** For multiple queries, use Python's asyncio with httpx to run concurrent API calls and reduce total latency.
- **Cache frequent queries:** Implement a simple dictionary or Redis cache keyed by query hash to avoid redundant API calls and reduce costs.
- **Filter by recency:** Use search_recency_filter when freshness matters — set to "day" for breaking news or "month" for recent developments.
- **Monitor usage:** Check your API dashboard regularly. Set billing alerts to avoid unexpected charges.
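The streaming tip can be sketched as follows. print_stream is a hypothetical helper for consuming OpenAI-format stream chunks; the commented-out call assumes the client configured in Step 3:

```python
def print_stream(stream):
    """Print streamed content deltas as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Usage (assumes the client from Step 3):
# stream = client.chat.completions.create(
#     model="sonar-pro",
#     messages=[{"role": "user", "content": "Summarize this week's AI news."}],
#     stream=True,
# )
# full_text = print_stream(stream)
```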
## Troubleshooting Common Errors
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Verify your key is correct and the environment variable is loaded. Regenerate the key if needed. |
| 403 Forbidden | No billing method on file | Add a payment method in your Perplexity API settings before making calls. |
| 429 Too Many Requests | Rate limit exceeded | Implement exponential backoff. Default rate limits vary by plan tier. |
| model_not_found | Incorrect model name | Double-check the model ID. Use sonar, sonar-pro, or sonar-reasoning. |
| Connection refused | Wrong base URL | Ensure base_url is set to https://api.perplexity.ai exactly. |
```python
# Robust error handling template
import os
import time
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai"
)

def query_with_retry(prompt, model="sonar-pro", max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )
            return response
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIError as e:
            print(f"API Error: {e}")
            raise
    raise Exception("Max retries exceeded")
```
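The same exponential-backoff pattern can be factored into a reusable wrapper. with_backoff is an illustrative helper, not part of any SDK; the injectable sleep parameter makes it testable without actually waiting:

```python
import time

def with_backoff(fn, max_retries=3, base=2, sleep=time.sleep,
                 retry_on=(Exception,)):
    """Call fn(); on a retryable error, wait base ** attempt seconds and retry.
    Re-raises the last error once max_retries is exhausted."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            sleep(base ** attempt)
```

For example, wrap a raw API call with `with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`.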
## Frequently Asked Questions
### Is the Perplexity API compatible with the OpenAI Python SDK?
Yes. Perplexity's API follows the OpenAI chat completions format. You simply install the openai package and point the base_url to https://api.perplexity.ai. All standard parameters like temperature, max_tokens, and stream work as expected, with additional Perplexity-specific options like search_recency_filter.
### How do I access and parse citations from Perplexity API responses?
Citations are returned as part of the response object. Access them via response.citations, which provides a list of source URLs referenced in the generated answer. You can pair these with bracketed numbers in the response text to build fully attributed outputs for research or content workflows.
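As a sketch of that pairing (link_citations is an illustrative helper, not part of the SDK), assuming the answer text uses bracketed markers like [1] and response.citations holds the matching URLs:

```python
import re

def link_citations(text, citations):
    """Replace bracketed markers like [1] with markdown links to the
    corresponding entry in the citations list."""
    def repl(match):
        idx = int(match.group(1))
        if 1 <= idx <= len(citations):
            return f"[{idx}]({citations[idx - 1]})"
        return match.group(0)  # Leave out-of-range markers untouched
    return re.sub(r"\[(\d+)\]", repl, text)
```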
### Which Perplexity model should I use for my project?
Use sonar for fast, simple lookups where speed and cost matter most. Choose sonar-pro for complex research requiring multiple search passes and deeper analysis. Pick sonar-reasoning or sonar-reasoning-pro for tasks that involve logic, math, or scientific analysis combined with real-time web data. For exhaustive multi-step reports, sonar-deep-research is the most thorough option.
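This guidance can be encoded as a small lookup table. MODEL_FOR and pick_model are illustrative helpers; the model IDs come from the table in Step 6:

```python
# Map use cases to Perplexity model IDs (from the Step 6 table)
MODEL_FOR = {
    "quick_lookup": "sonar",
    "complex_research": "sonar-pro",
    "reasoning": "sonar-reasoning",
    "advanced_reasoning": "sonar-reasoning-pro",
    "deep_report": "sonar-deep-research",
}

def pick_model(use_case):
    """Return the suggested model for a use case, defaulting to sonar."""
    return MODEL_FOR.get(use_case, "sonar")
```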