Perplexity Sonar API Developer Guide: Build AI Search into Your Applications
What Is the Perplexity Sonar API and Why Developers Need It
The Perplexity Sonar API provides AI-powered web search with citations through a simple API call. Unlike traditional search APIs that return a list of links, Sonar returns synthesized answers grounded in real-time web sources — complete with inline citations you can verify.
For developers, this solves a common problem: building search or research features that provide answers, not just links. A customer support chatbot can answer questions with cited sources. A research tool can synthesize information from multiple web pages. A content application can fact-check claims against live web data.
The API follows the OpenAI chat completions format, making integration straightforward for anyone familiar with LLM APIs. The key difference is that Sonar searches the web in real time and includes source URLs in every response.
Getting Started
API Key Setup
- Create an account at perplexity.ai
- Navigate to API Settings
- Generate an API key
- Store the key securely (environment variable, secrets manager)
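In Python, the last step usually means reading the key from the environment rather than hardcoding it. A minimal sketch — the variable name `PPLX_API_KEY` is a convention used here, not something the API requires:

```python
import os

def load_api_key(var_name="PPLX_API_KEY"):
    # Fail fast with a clear message if the key is missing,
    # instead of sending unauthenticated requests later.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```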
Your First API Call
```python
import requests

API_KEY = "your-perplexity-api-key"

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful research assistant. Provide concise, well-cited answers."
            },
            {
                "role": "user",
                "content": "What are the latest trends in AI code generation tools in 2026?"
            }
        ]
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print("\nSources:")
for citation in data.get("citations", []):
    print(f"  - {citation}")
```
Response Structure
```json
{
  "id": "chatcmpl-abc123",
  "model": "sonar",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI code generation has evolved significantly in 2026... [1][2][3]"
      },
      "finish_reason": "stop"
    }
  ],
  "citations": [
    "https://example.com/ai-code-tools-2026",
    "https://example.com/developer-survey-results",
    "https://example.com/github-copilot-vs-cursor"
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 312,
    "total_tokens": 357
  }
}
```
The citations array contains URLs that correspond to the [1][2][3] markers in the response text.
Choosing the Right Sonar Model
| Model | Speed | Depth | Cost | Best For |
|---|---|---|---|---|
| sonar | Fast (2-5s) | Standard | $ | Quick answers, chatbots, autocomplete |
| sonar-pro | Medium (5-15s) | Deep | $$ | Research, detailed analysis, comparisons |
| sonar-deep-research | Slow (30-120s) | Comprehensive | $$$ | Due diligence, market research, academic |
Model Selection Logic
```python
def choose_model(query_type):
    if query_type == "quick_answer":
        return "sonar"  # Fast, cheap, good for simple questions
    elif query_type == "research":
        return "sonar-pro"  # Balanced depth and speed
    elif query_type == "deep_analysis":
        return "sonar-deep-research"  # Maximum depth, accepts latency
    else:
        return "sonar"  # Default to fast model
```
Advanced API Features
Streaming Responses
For real-time display (like ChatGPT’s typing effect):
```python
import json

import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": messages,
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel in SSE responses
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    if delta.get("content"):
        print(delta["content"], end="", flush=True)
```
Search Domain Filtering
Restrict search to specific domains:
```json
{
  "model": "sonar",
  "messages": [...],
  "search_domain_filter": ["arxiv.org", "nature.com", "science.org"],
  "search_recency_filter": "week"
}
```
This is powerful for:
- Academic research: limit to `.edu` and journal domains
- News monitoring: limit to news outlets
- Technical documentation: limit to official docs sites
Recency Filtering
Control how recent sources must be:
```json
{
  "search_recency_filter": "week"
}
```
Accepted values: `"day"` (last 24 hours), `"week"` (last 7 days), `"month"` (last 30 days), `"year"` (last 12 months).
Essential for news, market data, and any time-sensitive research.
System Prompt Engineering for Sonar
The system prompt significantly affects output quality:
```text
// For factual Q&A
"You are a factual research assistant. Answer based only on the search results. If the search results do not contain enough information, say so. Always cite your sources with [1], [2], etc."

// For competitive analysis
"You are a market research analyst. Synthesize information from multiple sources into a structured analysis. Include data points with citations. Flag any conflicting information between sources."

// For customer support
"You are a support agent for [Product]. Search for relevant documentation and community discussions. Provide step-by-step solutions. Cite the specific documentation page for each recommendation."
```
Building Production Features
Research Assistant Integration
```python
import requests

class ResearchAssistant:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.perplexity.ai/chat/completions"

    def research(self, query, depth="standard", domains=None):
        model = {
            "quick": "sonar",
            "standard": "sonar-pro",
            "deep": "sonar-deep-research"
        }.get(depth, "sonar-pro")

        payload = {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a thorough research assistant. Provide detailed, well-cited answers organized by topic."},
                {"role": "user", "content": query}
            ]
        }
        if domains:
            payload["search_domain_filter"] = domains

        response = requests.post(
            self.base_url,
            headers={"Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json"},
            json=payload
        )
        data = response.json()
        return {
            "answer": data["choices"][0]["message"]["content"],
            "citations": data.get("citations", []),
            "tokens_used": data["usage"]["total_tokens"]
        }

# Usage
assistant = ResearchAssistant("your-api-key")
result = assistant.research(
    "Compare pricing models of top 5 CRM platforms for small businesses",
    depth="standard",
    domains=["g2.com", "capterra.com", "trustradius.com"]
)
print(result["answer"])
print(f"Sources: {len(result['citations'])}")
```
Citation Display Component
Parse citation markers and link them to source URLs:
```javascript
function formatCitations(text, citations) {
  // Replace [1], [2], etc. with links to the matching source URLs
  return text.replace(/\[(\d+)\]/g, (match, num) => {
    const index = parseInt(num, 10) - 1;
    if (citations[index]) {
      const domain = new URL(citations[index]).hostname;
      return `<a href="${citations[index]}" title="${domain}">[${num}]</a>`;
    }
    return match;
  });
}
```
Caching Strategy
Sonar searches the live web, so identical queries may return different results over time. Implement tiered caching:
```python
import hashlib
from datetime import datetime, timedelta

class SonarCache:
    def __init__(self, cache_store):
        self.cache = cache_store

    def get_or_search(self, query, model, max_age_hours=1):
        cache_key = hashlib.sha256(f"{model}:{query}".encode()).hexdigest()

        # Check cache
        cached = self.cache.get(cache_key)
        if cached and cached["timestamp"] > datetime.now() - timedelta(hours=max_age_hours):
            return cached["result"]

        # Fresh search (_call_sonar wraps the request shown earlier)
        result = self._call_sonar(query, model)
        self.cache.set(cache_key, {
            "result": result,
            "timestamp": datetime.now()
        })
        return result
```
Cache durations by use case:
- News/current events: 15-30 minutes
- Market data: 1-4 hours
- Reference information: 24-48 hours
- Historical facts: 7+ days
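These durations can be encoded as a small lookup and passed to `get_or_search` as `max_age_hours`. A sketch — the category names are illustrative, not part of the API:

```python
# Suggested cache TTLs in hours, keyed by illustrative use-case names.
CACHE_TTL_HOURS = {
    "news": 0.5,        # 30 minutes
    "market_data": 4,
    "reference": 24,
    "historical": 168,  # 7 days
}

def ttl_for(use_case):
    # Fall back to the shortest TTL when the use case is unknown,
    # so unclassified queries never serve stale results.
    return CACHE_TTL_HOURS.get(use_case, 0.5)
```

Usage: `cache.get_or_search(query, "sonar", max_age_hours=ttl_for("news"))`.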
Rate Limiting and Error Handling
```python
import time

import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3)
)
def sonar_search(query, model="sonar"):
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": query}]}
    )
    if response.status_code == 429:
        # Honor the server's Retry-After hint before tenacity retries
        retry_after = int(response.headers.get("Retry-After", 5))
        time.sleep(retry_after)
        raise Exception("Rate limited")
    if response.status_code != 200:
        raise Exception(f"API error: {response.status_code}")
    return response.json()
```
Cost Management
Token Usage Optimization
- Be specific in queries: “pricing of HubSpot CRM Starter plan in 2026” costs less than “tell me about CRM pricing”
- Use appropriate models: do not use sonar-deep-research for simple lookups
- Cache aggressively: identical queries within cache window should not hit the API
- Limit response length: use the `max_tokens` parameter for concise answers
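A request that caps completion length follows the same payload shape used throughout this guide; `max_tokens` mirrors the OpenAI chat-completions convention. The `build_payload` helper below is a hypothetical convenience, not part of the API:

```python
def build_payload(query, model="sonar", max_tokens=300):
    # Cap completion length to keep answers concise and costs predictable.
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "max_tokens": max_tokens,
    }
```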
Monitoring Costs
Track token usage per feature:
```python
from datetime import datetime

def track_usage(feature_name, usage_data):
    log_entry = {
        "feature": feature_name,
        "model": usage_data["model"],
        "prompt_tokens": usage_data["usage"]["prompt_tokens"],
        "completion_tokens": usage_data["usage"]["completion_tokens"],
        "total_tokens": usage_data["usage"]["total_tokens"],
        "estimated_cost": calculate_cost(usage_data),
        "timestamp": datetime.now().isoformat()
    }
    metrics_store.append(log_entry)
```
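The `calculate_cost` helper referenced above is not shown. A minimal sketch, using placeholder per-million-token prices — these numbers are assumptions, not real Sonar pricing, so substitute the rates from the current pricing page:

```python
# Placeholder prices in USD per million tokens -- NOT real Sonar pricing.
PRICE_PER_MTOK = {
    "sonar": {"prompt": 1.0, "completion": 1.0},
    "sonar-pro": {"prompt": 3.0, "completion": 15.0},
}

def calculate_cost(usage_data):
    # Unknown models fall back to the cheapest tier's rates.
    prices = PRICE_PER_MTOK.get(usage_data["model"], PRICE_PER_MTOK["sonar"])
    usage = usage_data["usage"]
    return (usage["prompt_tokens"] * prices["prompt"]
            + usage["completion_tokens"] * prices["completion"]) / 1_000_000
```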
Frequently Asked Questions
Is the Sonar API compatible with OpenAI’s SDK?
Yes. The API follows the OpenAI chat completions format. You can use the OpenAI Python SDK by pointing the base_url to Perplexity’s endpoint and using your Perplexity API key.
How fresh are search results?
Sonar searches the live web in real time. Results are as current as what is indexed by search engines — typically within hours for major sites and minutes for news.
Can I use Sonar for real-time monitoring?
You can poll the API on a schedule, but there is no webhook or push notification feature. For monitoring, set up a cron job that queries Sonar and diffs against previous results.
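For the diffing step, hashing the normalized answer text is one simple change-detection strategy (semantic diffing would take more work). A sketch:

```python
import hashlib

def answer_fingerprint(answer_text):
    # Normalize whitespace so trivial reflows don't register as changes.
    normalized = " ".join(answer_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(previous_fingerprint, answer_text):
    # Compare the stored fingerprint against the latest poll's answer.
    return answer_fingerprint(answer_text) != previous_fingerprint
```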
What are the rate limits?
Rate limits depend on your plan tier. Check the API documentation for current limits. Implement exponential backoff for production reliability.
Can Sonar search in languages other than English?
Yes. Sonar supports queries and results in multiple languages. The depth of results depends on the availability of content in that language on the web.
Does the API support conversation context?
Yes. Pass multiple messages in the messages array for multi-turn conversations. Sonar maintains context across turns within a single API call.
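A follow-up question simply rides along as prior turns in the same `messages` array. A sketch of building that history between calls (the example content is illustrative):

```python
def add_turn(messages, role, content):
    # Append a turn and return the list so calls can be chained.
    messages.append({"role": role, "content": content})
    return messages

conversation = []
add_turn(conversation, "user", "Who won the 2024 Formula 1 drivers' championship?")
# ... send `conversation` to the API, then record the assistant's reply:
add_turn(conversation, "assistant", "Max Verstappen won the 2024 drivers' title. [1]")
add_turn(conversation, "user", "How many races did he win that season?")
# `conversation` now carries full context for the next request.
```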