Perplexity Sonar API Developer Guide: Build AI Search into Your Applications

What Is the Perplexity Sonar API and Why Developers Need It

The Perplexity Sonar API provides AI-powered web search with citations through a simple API call. Unlike traditional search APIs that return a list of links, Sonar returns synthesized answers grounded in real-time web sources — complete with inline citations you can verify.

For developers, this solves a common problem: building search or research features that provide answers, not just links. A customer support chatbot can answer questions with cited sources. A research tool can synthesize information from multiple web pages. A content application can fact-check claims against live web data.

The API follows the OpenAI chat completions format, making integration straightforward for anyone familiar with LLM APIs. The key difference is that Sonar searches the web in real time and includes source URLs in every response.

Getting Started

API Key Setup

  1. Create an account at perplexity.ai
  2. Navigate to API Settings
  3. Generate an API key
  4. Store the key securely (environment variable, secrets manager)

Your First API Call

import os

import requests

# Read the key from the environment rather than hardcoding it (see step 4 above)
API_KEY = os.environ["PERPLEXITY_API_KEY"]

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful research assistant. Provide concise, well-cited answers."
            },
            {
                "role": "user",
                "content": "What are the latest trends in AI code generation tools in 2026?"
            }
        ]
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print("\nSources:")
for citation in data.get("citations", []):
    print(f"  - {citation}")

Response Structure

{
  "id": "chatcmpl-abc123",
  "model": "sonar",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI code generation has evolved significantly in 2026... [1][2][3]"
      },
      "finish_reason": "stop"
    }
  ],
  "citations": [
    "https://example.com/ai-code-tools-2026",
    "https://example.com/developer-survey-results",
    "https://example.com/github-copilot-vs-cursor"
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 312,
    "total_tokens": 357
  }
}

The citations array contains URLs that correspond to the [1][2][3] markers in the response text.

Choosing the Right Sonar Model

| Model | Speed | Depth | Cost | Best For |
|---|---|---|---|---|
| sonar | Fast (2-5s) | Standard | $ | Quick answers, chatbots, autocomplete |
| sonar-pro | Medium (5-15s) | Deep | $$ | Research, detailed analysis, comparisons |
| sonar-deep-research | Slow (30-120s) | Comprehensive | $$$ | Due diligence, market research, academic work |

Model Selection Logic

def choose_model(query_type):
    if query_type == "quick_answer":
        return "sonar"  # Fast, cheap, good for simple questions
    elif query_type == "research":
        return "sonar-pro"  # Balanced depth and speed
    elif query_type == "deep_analysis":
        return "sonar-deep-research"  # Maximum depth, accepts latency
    else:
        return "sonar"  # Default to fast model

Advanced API Features

Streaming Responses

For real-time display (like ChatGPT’s typing effect):

import json

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": messages,
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    if delta.get("content"):
        print(delta["content"], end="", flush=True)

Search Domain Filtering

Restrict search to specific domains:

{
    "model": "sonar",
    "messages": [...],
    "search_domain_filter": ["arxiv.org", "nature.com", "science.org"],
    "search_recency_filter": "week"
}

This is powerful for:

  • Academic research: limit to .edu and journal domains
  • News monitoring: limit to news outlets
  • Technical documentation: limit to official docs sites

Recency Filtering

Control how recent sources must be. The search_recency_filter parameter accepts a single value per request:

  • "day": last 24 hours
  • "week": last 7 days
  • "month": last 30 days
  • "year": last 12 months

Essential for news, market data, and any time-sensitive research.
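One way to keep this consistent across an application is to map use cases to recency values when building the request. A small sketch; the use-case names and mapping are ours, not part of the API:

```python
# Illustrative mapping from our use cases to the API's recency values.
RECENCY_BY_USE_CASE = {
    "breaking_news": "day",
    "market_data": "week",
    "product_research": "month",
    "background": "year",
}

def with_recency(payload, use_case):
    """Return a copy of the request payload with a recency filter applied."""
    out = dict(payload)
    out["search_recency_filter"] = RECENCY_BY_USE_CASE.get(use_case, "month")
    return out
```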

System Prompt Engineering for Sonar

The system prompt significantly affects output quality:

// For factual Q&A
"You are a factual research assistant. Answer based only on the search
results. If the search results do not contain enough information, say so.
Always cite your sources with [1], [2], etc."

// For competitive analysis
"You are a market research analyst. Synthesize information from multiple
sources into a structured analysis. Include data points with citations.
Flag any conflicting information between sources."

// For customer support
"You are a support agent for [Product]. Search for relevant documentation
and community discussions. Provide step-by-step solutions. Cite the
specific documentation page for each recommendation."

Building Production Features

Research Assistant Integration

import requests

class ResearchAssistant:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.perplexity.ai/chat/completions"

    def research(self, query, depth="standard", domains=None):
        model = {
            "quick": "sonar",
            "standard": "sonar-pro",
            "deep": "sonar-deep-research"
        }.get(depth, "sonar-pro")

        payload = {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a thorough research assistant. Provide detailed, well-cited answers organized by topic."},
                {"role": "user", "content": query}
            ]
        }

        if domains:
            payload["search_domain_filter"] = domains

        response = requests.post(
            self.base_url,
            headers={"Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json"},
            json=payload
        )
        response.raise_for_status()  # surface HTTP errors before parsing

        data = response.json()
        return {
            "answer": data["choices"][0]["message"]["content"],
            "citations": data.get("citations", []),
            "tokens_used": data["usage"]["total_tokens"]
        }

# Usage
assistant = ResearchAssistant("your-api-key")
result = assistant.research(
    "Compare pricing models of top 5 CRM platforms for small businesses",
    depth="standard",
    domains=["g2.com", "capterra.com", "trustradius.com"]
)
print(result["answer"])
print(f"Sources: {len(result['citations'])}")

Citation Display Component

Parse citation markers and link them to source URLs:

function formatCitations(text, citations) {
  // Replace [1], [2], etc. with links to the matching source URL
  return text.replace(/\[(\d+)\]/g, (match, num) => {
    const index = parseInt(num, 10) - 1;
    if (citations[index]) {
      const domain = new URL(citations[index]).hostname;
      // Render a titled link so users see the source domain on hover
      return `<a href="${citations[index]}" title="${domain}">[${num}]</a>`;
    }
    return match; // no matching citation; leave the marker as-is
  });
}

Caching Strategy

Sonar searches the live web, so identical queries may return different results over time. Implement tiered caching:

import hashlib
from datetime import datetime, timedelta

class SonarCache:
    def __init__(self, cache_store):
        self.cache = cache_store

    def get_or_search(self, query, model, max_age_hours=1):
        cache_key = hashlib.sha256(f"{model}:{query}".encode()).hexdigest()

        # Check cache
        cached = self.cache.get(cache_key)
        if cached and cached["timestamp"] > datetime.now() - timedelta(hours=max_age_hours):
            return cached["result"]

        # Fresh search; _call_sonar wraps the actual API request (not shown)
        result = self._call_sonar(query, model)
        self.cache.set(cache_key, {
            "result": result,
            "timestamp": datetime.now()
        })
        return result

Cache durations by use case:

  • News/current events: 15-30 minutes
  • Market data: 1-4 hours
  • Reference information: 24-48 hours
  • Historical facts: 7+ days
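The tiers above can be encoded directly, alongside the cache-key derivation SonarCache uses. A minimal sketch; the tier names and TTL values are ours, chosen to mirror the list above:

```python
import hashlib

# Illustrative TTL tiers in hours, mirroring the durations listed above.
TTL_HOURS = {
    "news": 0.5,
    "market": 4,
    "reference": 48,
    "historical": 7 * 24,
}

def cache_key(model, query):
    """Deterministic cache key for a (model, query) pair."""
    return hashlib.sha256(f"{model}:{query}".encode()).hexdigest()
```

Keying on both model and query matters: the same question answered by sonar and sonar-pro produces different results and should be cached separately.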

Rate Limiting and Error Handling

import time

import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3)
)
def sonar_search(query, model="sonar"):
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": query}]}
    )

    if response.status_code == 429:
        # Honor the server's Retry-After hint, then let tenacity retry
        time.sleep(int(response.headers.get("Retry-After", 5)))
        raise RuntimeError("Rate limited")

    response.raise_for_status()  # raise on any other non-200 status
    return response.json()

Cost Management

Token Usage Optimization

  • Be specific in queries: “pricing of HubSpot CRM Starter plan in 2026” costs less than “tell me about CRM pricing”
  • Use appropriate models: do not use sonar-deep-research for simple lookups
  • Cache aggressively: identical queries within cache window should not hit the API
  • Limit response length: use max_tokens parameter for concise answers
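The max_tokens cap from the last point fits naturally into request construction. A sketch; the helper name and default cap are ours:

```python
def build_payload(query, model="sonar", max_tokens=300):
    """Request body with a hard cap on completion length to bound cost."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # upper bound on completion tokens
        "messages": [{"role": "user", "content": query}],
    }
```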

Monitoring Costs

Track token usage per feature:

def track_usage(feature_name, usage_data):
    # calculate_cost and metrics_store are application-specific: plug in
    # your own pricing table and metrics sink.
    log_entry = {
        "feature": feature_name,
        "model": usage_data["model"],
        "prompt_tokens": usage_data["usage"]["prompt_tokens"],
        "completion_tokens": usage_data["usage"]["completion_tokens"],
        "total_tokens": usage_data["usage"]["total_tokens"],
        "estimated_cost": calculate_cost(usage_data),
        "timestamp": datetime.now().isoformat()
    }
    metrics_store.append(log_entry)
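The calculate_cost helper referenced above is application-specific. One possible sketch; the rates below are placeholders, not real pricing, so substitute the published per-token rates for your plan:

```python
# Placeholder per-million-token rates; replace with current published pricing.
RATES_PER_MILLION = {
    "sonar": {"prompt": 1.0, "completion": 1.0},
    "sonar-pro": {"prompt": 3.0, "completion": 15.0},
}

def calculate_cost(usage_data, rates=RATES_PER_MILLION):
    """Estimate request cost in dollars from a response's usage block."""
    usage = usage_data["usage"]
    rate = rates.get(usage_data["model"], rates["sonar"])
    return (usage["prompt_tokens"] * rate["prompt"]
            + usage["completion_tokens"] * rate["completion"]) / 1_000_000
```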

Frequently Asked Questions

Is the Sonar API compatible with OpenAI’s SDK?

Yes. The API follows the OpenAI chat completions format. You can use the OpenAI Python SDK by pointing the base_url to Perplexity’s endpoint and using your Perplexity API key.

How fresh are search results?

Sonar searches the live web in real time. Results are as current as what is indexed by search engines — typically within hours for major sites and minutes for news.

Can I use Sonar for real-time monitoring?

You can poll the API on a schedule, but there is no webhook or push notification feature. For monitoring, set up a cron job that queries Sonar and diffs against previous results.

What are the rate limits?

Rate limits depend on your plan tier. Check the API documentation for current limits. Implement exponential backoff for production reliability.

Can Sonar search in languages other than English?

Yes. Sonar supports queries and results in multiple languages. The depth of results depends on the availability of content in that language on the web.

Does the API support conversation context?

Yes. Pass multiple messages in the messages array for multi-turn conversations. Sonar maintains context across turns within a single API call.
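Because the API is stateless between calls, every prior turn must travel with the request. A follow-up question might be sent like this (the content is illustrative):

```python
# Prior turns go in the same messages array; the API holds no state between calls.
messages = [
    {"role": "user", "content": "Who won the 2024 Tour de France?"},
    {"role": "assistant", "content": "Tadej Pogacar won the 2024 Tour de France. [1]"},
    {"role": "user", "content": "How many stages did he win?"},
]
```

Without the first two entries, "he" in the final question would be unresolvable.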
