Perplexity Sonar API Developer Guide: Build AI Search into Your Applications
What Is the Perplexity Sonar API and Why Developers Need It
The Perplexity Sonar API provides AI-powered web search with citations through a simple API call. Unlike traditional search APIs that return a list of links, Sonar returns synthesized answers grounded in real-time web sources — complete with inline citations you can verify.
For developers, this solves a common problem: building search or research features that provide answers, not just links. A customer support chatbot can answer questions with cited sources. A research tool can synthesize information from multiple web pages. A content application can fact-check claims against live web data.
The API follows the OpenAI chat completions format, making integration straightforward for anyone familiar with LLM APIs. The key difference is that Sonar searches the web in real time and includes source URLs in every response.
Getting Started
API Key Setup
- Create an account at perplexity.ai
- Navigate to API Settings
- Generate an API key
- Store the key securely (environment variable, secrets manager)
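In Python, the last step usually means reading the key from the environment rather than hardcoding it. A minimal sketch — the variable name `PPLX_API_KEY` is a convention used here, not something the API requires:

```python
import os

def load_api_key(var_name="PPLX_API_KEY"):
    # Fail fast with a clear message if the key is missing,
    # instead of sending unauthenticated requests later.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```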
Your First API Call
```python
import requests

API_KEY = "your-perplexity-api-key"

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful research assistant. Provide concise, well-cited answers."
            },
            {
                "role": "user",
                "content": "What are the latest trends in AI code generation tools in 2026?"
            }
        ]
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print("\nSources:")
for citation in data.get("citations", []):
    print(f"  - {citation}")
```
Response Structure
```json
{
  "id": "chatcmpl-abc123",
  "model": "sonar",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI code generation has evolved significantly in 2026... [1][2][3]"
      },
      "finish_reason": "stop"
    }
  ],
  "citations": [
    "https://example.com/ai-code-tools-2026",
    "https://example.com/developer-survey-results",
    "https://example.com/github-copilot-vs-cursor"
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 312,
    "total_tokens": 357
  }
}
```
The citations array contains URLs that correspond to the [1][2][3] markers in the response text.
Choosing the Right Sonar Model
| Model | Speed | Depth | Cost | Best For |
|---|---|---|---|---|
| sonar | Fast (2-5s) | Standard | $ | Quick answers, chatbots, autocomplete |
| sonar-pro | Medium (5-15s) | Deep | $$ | Research, detailed analysis, comparisons |
| sonar-deep-research | Slow (30-120s) | Comprehensive | $$$ | Due diligence, market research, academic |
Model Selection Logic
```python
def choose_model(query_type):
    if query_type == "quick_answer":
        return "sonar"  # Fast, cheap, good for simple questions
    elif query_type == "research":
        return "sonar-pro"  # Balanced depth and speed
    elif query_type == "deep_analysis":
        return "sonar-deep-research"  # Maximum depth, accepts latency
    else:
        return "sonar"  # Default to fast model
```
Advanced API Features
Streaming Responses
For real-time display (like ChatGPT’s typing effect):
```python
import json

import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sonar",
        "messages": messages,
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel in SSE responses
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    if delta.get("content"):
        print(delta["content"], end="", flush=True)
```
Search Domain Filtering
Restrict search to specific domains:
```json
{
  "model": "sonar",
  "messages": [...],
  "search_domain_filter": ["arxiv.org", "nature.com", "science.org"],
  "search_recency_filter": "week"
}
```
This is powerful for:
- Academic research: limit to `.edu` and journal domains
- News monitoring: limit to news outlets
- Technical documentation: limit to official docs sites
Recency Filtering
Control how recent sources must be:
```json
{
  "search_recency_filter": "week"
}
```
Accepted values: `"day"` (last 24 hours), `"week"` (last 7 days), `"month"` (last 30 days), `"year"` (last 12 months).
Essential for news, market data, and any time-sensitive research.
System Prompt Engineering for Sonar
The system prompt significantly affects output quality:
```text
// For factual Q&A
"You are a factual research assistant. Answer based only on the search results. If the search results do not contain enough information, say so. Always cite your sources with [1], [2], etc."

// For competitive analysis
"You are a market research analyst. Synthesize information from multiple sources into a structured analysis. Include data points with citations. Flag any conflicting information between sources."

// For customer support
"You are a support agent for [Product]. Search for relevant documentation and community discussions. Provide step-by-step solutions. Cite the specific documentation page for each recommendation."
```
Building Production Features
Research Assistant Integration
```python
import requests

class ResearchAssistant:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.perplexity.ai/chat/completions"

    def research(self, query, depth="standard", domains=None):
        model = {
            "quick": "sonar",
            "standard": "sonar-pro",
            "deep": "sonar-deep-research"
        }.get(depth, "sonar-pro")

        payload = {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a thorough research assistant. Provide detailed, well-cited answers organized by topic."},
                {"role": "user", "content": query}
            ]
        }
        if domains:
            payload["search_domain_filter"] = domains

        response = requests.post(
            self.base_url,
            headers={"Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json"},
            json=payload
        )
        data = response.json()
        return {
            "answer": data["choices"][0]["message"]["content"],
            "citations": data.get("citations", []),
            "tokens_used": data["usage"]["total_tokens"]
        }

# Usage
assistant = ResearchAssistant("your-api-key")
result = assistant.research(
    "Compare pricing models of top 5 CRM platforms for small businesses",
    depth="standard",
    domains=["g2.com", "capterra.com", "trustradius.com"]
)
print(result["answer"])
print(f"Sources: {len(result['citations'])}")
```
Citation Display Component
Parse citation markers and link them to source URLs:
```javascript
function formatCitations(text, citations) {
  // Replace [1], [2], etc. with links to the matching source URLs
  return text.replace(/\[(\d+)\]/g, (match, num) => {
    const index = parseInt(num, 10) - 1;
    if (citations[index]) {
      const domain = new URL(citations[index]).hostname;
      return `<a href="${citations[index]}" title="${domain}">[${num}]</a>`;
    }
    return match;
  });
}
```
Caching Strategy
Sonar searches the live web, so identical queries may return different results over time. Implement tiered caching:
```python
import hashlib
from datetime import datetime, timedelta

class SonarCache:
    def __init__(self, cache_store):
        self.cache = cache_store

    def get_or_search(self, query, model, max_age_hours=1):
        cache_key = hashlib.sha256(f"{model}:{query}".encode()).hexdigest()

        # Check cache
        cached = self.cache.get(cache_key)
        if cached and cached["timestamp"] > datetime.now() - timedelta(hours=max_age_hours):
            return cached["result"]

        # Fresh search (_call_sonar wraps the request shown earlier)
        result = self._call_sonar(query, model)
        self.cache.set(cache_key, {
            "result": result,
            "timestamp": datetime.now()
        })
        return result
```
Cache durations by use case:
- News/current events: 15-30 minutes
- Market data: 1-4 hours
- Reference information: 24-48 hours
- Historical facts: 7+ days
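These durations can be encoded as a small lookup and passed to `get_or_search` as `max_age_hours`. A sketch — the category names are illustrative, not part of the API:

```python
# Suggested cache TTLs in hours, keyed by illustrative use-case names.
CACHE_TTL_HOURS = {
    "news": 0.5,        # 30 minutes
    "market_data": 4,
    "reference": 24,
    "historical": 168,  # 7 days
}

def ttl_for(use_case):
    # Fall back to the shortest TTL when the use case is unknown,
    # so unclassified queries never serve stale results.
    return CACHE_TTL_HOURS.get(use_case, 0.5)
```

Usage: `cache.get_or_search(query, "sonar", max_age_hours=ttl_for("news"))`.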
Rate Limiting and Error Handling
```python
import time

import requests
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3)
)
def sonar_search(query, model="sonar"):
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": query}]}
    )
    if response.status_code == 429:
        # Honor the server's Retry-After hint before tenacity retries
        retry_after = int(response.headers.get("Retry-After", 5))
        time.sleep(retry_after)
        raise Exception("Rate limited")
    if response.status_code != 200:
        raise Exception(f"API error: {response.status_code}")
    return response.json()
```
Cost Management
Token Usage Optimization
- Be specific in queries: “pricing of HubSpot CRM Starter plan in 2026” costs less than “tell me about CRM pricing”
- Use appropriate models: do not use sonar-deep-research for simple lookups
- Cache aggressively: identical queries within cache window should not hit the API
- Limit response length: use the `max_tokens` parameter for concise answers
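A request that caps completion length follows the same payload shape used throughout this guide; `max_tokens` mirrors the OpenAI chat-completions convention. The `build_payload` helper below is a hypothetical convenience, not part of the API:

```python
def build_payload(query, model="sonar", max_tokens=300):
    # Cap completion length to keep answers concise and costs predictable.
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "max_tokens": max_tokens,
    }
```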
Monitoring Costs
Track token usage per feature:
```python
from datetime import datetime

def track_usage(feature_name, usage_data):
    log_entry = {
        "feature": feature_name,
        "model": usage_data["model"],
        "prompt_tokens": usage_data["usage"]["prompt_tokens"],
        "completion_tokens": usage_data["usage"]["completion_tokens"],
        "total_tokens": usage_data["usage"]["total_tokens"],
        "estimated_cost": calculate_cost(usage_data),
        "timestamp": datetime.now().isoformat()
    }
    metrics_store.append(log_entry)
```
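The `calculate_cost` helper referenced above is not shown. A minimal sketch, using placeholder per-million-token prices — these numbers are assumptions, not real Sonar pricing, so substitute the rates from the current pricing page:

```python
# Placeholder prices in USD per million tokens -- NOT real Sonar pricing.
PRICE_PER_MTOK = {
    "sonar": {"prompt": 1.0, "completion": 1.0},
    "sonar-pro": {"prompt": 3.0, "completion": 15.0},
}

def calculate_cost(usage_data):
    # Unknown models fall back to the cheapest tier's rates.
    prices = PRICE_PER_MTOK.get(usage_data["model"], PRICE_PER_MTOK["sonar"])
    usage = usage_data["usage"]
    return (usage["prompt_tokens"] * prices["prompt"]
            + usage["completion_tokens"] * prices["completion"]) / 1_000_000
```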
Frequently Asked Questions
Is the Sonar API compatible with OpenAI’s SDK?
Yes. The API follows the OpenAI chat completions format. You can use the OpenAI Python SDK by pointing the base_url to Perplexity’s endpoint and using your Perplexity API key.
How fresh are search results?
Sonar searches the live web in real time. Results are as current as what is indexed by search engines — typically within hours for major sites and minutes for news.
Can I use Sonar for real-time monitoring?
You can poll the API on a schedule, but there is no webhook or push notification feature. For monitoring, set up a cron job that queries Sonar and diffs against previous results.
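For the diffing step, hashing the normalized answer text is one simple change-detection strategy (semantic diffing would take more work). A sketch:

```python
import hashlib

def answer_fingerprint(answer_text):
    # Normalize whitespace so trivial reflows don't register as changes.
    normalized = " ".join(answer_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(previous_fingerprint, answer_text):
    # Compare the stored fingerprint against the latest poll's answer.
    return answer_fingerprint(answer_text) != previous_fingerprint
```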
What are the rate limits?
Rate limits depend on your plan tier. Check the API documentation for current limits. Implement exponential backoff for production reliability.
Can Sonar search in languages other than English?
Yes. Sonar supports queries and results in multiple languages. The depth of results depends on the availability of content in that language on the web.
Does the API support conversation context?
Yes. Pass multiple messages in the messages array for multi-turn conversations. Sonar maintains context across turns within a single API call.
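A follow-up question simply rides along as prior turns in the same `messages` array. A sketch of building that history between calls (the example content is illustrative):

```python
def add_turn(messages, role, content):
    # Append a turn and return the list so calls can be chained.
    messages.append({"role": role, "content": content})
    return messages

conversation = []
add_turn(conversation, "user", "Who won the 2024 Formula 1 drivers' championship?")
# ... send `conversation` to the API, then record the assistant's reply:
add_turn(conversation, "assistant", "Max Verstappen won the 2024 drivers' title. [1]")
add_turn(conversation, "user", "How many races did he win that season?")
# `conversation` now carries full context for the next request.
```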