Perplexity API Complete Setup Guide: Key Generation, Python SDK, Citation Parsing & Search Mode

Perplexity AI offers a powerful API that combines large language models with real-time web search capabilities. This guide walks you through everything from obtaining your API key to advanced citation parsing and search-augmented generation workflows.

## Step 1: Create a Perplexity Account and Generate Your API Key

- Visit perplexity.ai and sign up or log in to your account.
- Navigate to Settings → API, or go directly to perplexity.ai/settings/api.
- Click Generate API Key and copy the key immediately; it will only be shown once.
- Add billing credits to your account. The Perplexity API uses a pay-per-use credit system that is separate from a Pro subscription.

Important: Your API key starts with pplx-. Store it securely and never commit it to version control.

## Step 2: Install the Required Python Packages

Perplexity's API is compatible with the OpenAI SDK format, so you can use the official OpenAI Python client.

    # Install the OpenAI Python SDK
    pip install openai

    # Optional: install python-dotenv for environment variable management
    pip install python-dotenv

    # Optional: install httpx for direct REST calls
    pip install httpx

## Step 3: Configure Your Environment

Create a .env file in your project root to manage your API key securely:

    # .env
    PPLX_API_KEY=pplx-YOUR_API_KEY

Then load it in Python:

    import os

    from dotenv import load_dotenv
    from openai import OpenAI

    # Read PPLX_API_KEY from the .env file into the process environment
    load_dotenv()

    # Point the OpenAI client at Perplexity's endpoint
    client = OpenAI(
        api_key=os.getenv("PPLX_API_KEY"),
        base_url="https://api.perplexity.ai",
    )
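A missing or mistyped key otherwise only surfaces later as a 401 from the API. The sketch below fails fast at startup instead. The helper name `load_api_key` is hypothetical, and the prefix check relies only on this guide's statement that keys start with pplx-.

```python
import os


def load_api_key(env_var: str = "PPLX_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it looks wrong."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    if not key.startswith("pplx-"):
        # Assumption taken from this guide: Perplexity keys start with "pplx-".
        raise RuntimeError(f"{env_var} does not start with 'pplx-'")
    return key
```

Call `load_dotenv()` first, then pass `api_key=load_api_key()` to the client constructor.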

## Step 4: Make Your First API Call

Perplexity uses a chat completions endpoint identical in structure to the OpenAI API:

    response = client.chat.completions.create(
        model="sonar",
        messages=[
            {"role": "system", "content": "You are a helpful research assistant. Be precise and cite sources."},
            {"role": "user", "content": "What are the latest developments in quantum computing in 2026?"},
        ],
    )

    print(response.choices[0].message.content)
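If you make this call in several places, a thin wrapper can help. The following is a minimal sketch; `build_messages` and `ask` are hypothetical helper names, and `client` is assumed to be an OpenAI client configured as in Step 3.

```python
def build_messages(question, system_prompt=None):
    """Build the chat message list, with an optional system prompt first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def ask(client, question, model="sonar", system_prompt=None):
    """Send one question and return the text of the first choice."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(question, system_prompt),
    )
    return response.choices[0].message.content
```

Typical usage would be `print(ask(client, "What is WebAssembly?"))`.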

### Available Models

| Model | Description | Best For |
|---|---|---|
| `sonar` | Standard search-augmented model | General queries with web sources |
| `sonar-pro` | Advanced multi-step search model | Complex research tasks |
| `sonar-reasoning` | Extended thinking with search | Deep analysis and reasoning |
| `sonar-reasoning-pro` | Premium reasoning model | Most complex research workflows |
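One way to encode the table above in code is a small selection helper. This is only a sketch: `pick_model` and its two flags are hypothetical, and the mapping simply mirrors the "Best For" column rather than any official guidance.

```python
def pick_model(complex_research: bool = False, needs_reasoning: bool = False) -> str:
    """Map two coarse task traits onto the model names listed in the table."""
    if needs_reasoning:
        # Reasoning variants: the pro tier is reserved for the most complex work.
        return "sonar-reasoning-pro" if complex_research else "sonar-reasoning"
    # Non-reasoning variants: default to the standard model.
    return "sonar-pro" if complex_research else "sonar"
```

Defaulting to `sonar` keeps simple queries on the standard model unless the caller opts in to a larger one.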

## Step 5: Parse Source Citations from Responses

One of Perplexity's most powerful features is inline citations. The API returns source URLs that you can extract and display.

    response = client.chat.completions.create(
        model="sonar",
        messages=[
            {"role": "user", "content": "What is retrieval-augmented generation?"}
        ],
    )

    # Access the main response
    answer = response.choices[0].message.content
    print("Answer:", answer)

    # Extract citations from the response object. The field is not part of
    # the OpenAI schema, so read it defensively with getattr.
    citations = getattr(response, "citations", None)
    if citations:
        print("\nSources:")
        for i, url in enumerate(citations, 1):
            print(f" [{i}] {url}")

### Advanced Citation Mapping

Map inline reference numbers in the text to their corresponding URLs:

    import re

    def parse_citations(content, citations):
        """Map inline [n] references to source URLs."""
        # Escape the brackets so the pattern matches literal "[1]", "[2]", ...
        ref_pattern = re.compile(r"\[(\d+)\]")
        refs_used = sorted(set(int(m) for m in ref_pattern.findall(content)))

        mapped = {}
        for ref_num in refs_used:
            idx = ref_num - 1  # inline references are 1-based, the list is 0-based
            if 0 <= idx < len(citations):
                mapped[ref_num] = citations[idx]

        return mapped

    citation_map = parse_citations(answer, citations or [])
    for num, url in citation_map.items():
        print(f"[{num}] → {url}")
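The same mapping can be applied in place, for example to render the answer as Markdown with clickable references. A minimal sketch, assuming references appear as `[n]` and that `citations` is the 0-based URL list from the response; `linkify_citations` is a hypothetical helper name.

```python
import re


def linkify_citations(content, citations):
    """Rewrite inline [n] markers as Markdown links to the n-th source URL."""

    def replace(match):
        idx = int(match.group(1)) - 1  # inline references are 1-based
        if 0 <= idx < len(citations):
            return f"[{match.group(1)}]({citations[idx]})"
        # Leave markers with no matching source untouched.
        return match.group(0)

    return re.sub(r"\[(\d+)\]", replace, content)
```

Markers that point past the end of the list are left as plain text rather than turned into broken links.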

## Step 6: Use Search-Enhanced Mode with Parameters

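Control the search behavior with additional parameters. The OpenAI Python client rejects keyword arguments it does not know, so pass Perplexity-specific fields such as search_domain_filter and search_recency_filter through extra_body:

    response = client.chat.completions.create(
        model="sonar",
        messages=[
            {"role": "system", "content": "Provide detailed technical analysis."},
            {"role": "user", "content": "Compare FastAPI vs Django for REST API development"},
        ],
        temperature=0.2,
        max_tokens=1024,
        # Perplexity-specific fields are merged into the JSON request body
        extra_body={
            "search_domain_filter": ["stackoverflow.com", "github.com", "docs.python.org"],
            "search_recency_filter": "week",
        },
    )

### Search Parameter Reference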

| Parameter | Type | Description |
|---|---|---|
| `search_domain_filter` | list | Limit search to specific domains (max 3) |
| `search_recency_filter` | string | Filter by time: `hour`, `day`, `week`, `month` |
| `return_related_questions` | bool | Get follow-up question suggestions |
| `temperature` | float | Controls randomness (0.0–2.0) |
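Validating these options locally catches mistakes before a request is sent. The sketch below builds a dictionary suitable for the OpenAI client's `extra_body` argument. `build_search_options` is a hypothetical helper, and the limits it enforces (three domains, four recency values) are taken from the table above, so adjust them if the current API documentation differs.

```python
ALLOWED_RECENCY = {"hour", "day", "week", "month"}  # values from the table above
MAX_DOMAINS = 3  # limit as stated in the table above


def build_search_options(domains=None, recency=None, related_questions=False):
    """Validate search parameters and return them as a request-body dict."""
    options = {}
    if domains:
        if len(domains) > MAX_DOMAINS:
            raise ValueError(f"at most {MAX_DOMAINS} domains are allowed")
        options["search_domain_filter"] = list(domains)
    if recency is not None:
        if recency not in ALLOWED_RECENCY:
            raise ValueError(f"recency must be one of {sorted(ALLOWED_RECENCY)}")
        options["search_recency_filter"] = recency
    if related_questions:
        options["return_related_questions"] = True
    return options
```

The result can then be passed as `extra_body=build_search_options(["docs.python.org"], "week")` in a `client.chat.completions.create(...)` call.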

## Step 7: Direct REST API Usage with cURL

If you prefer direct HTTP calls without an SDK:

    curl -X POST https://api.perplexity.ai/chat/completions \
      -H "Authorization: Bearer pplx-YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "sonar",
        "messages": [
          {"role": "user", "content": "Explain WebAssembly use cases"}
        ],
        "max_tokens": 512
      }'

## Step 8: Streaming Responses

For real-time output, enable streaming:

    stream = client.chat.completions.create(
        model="sonar",
        messages=[
            {"role": "user", "content": "Summarize recent AI regulation news"}
        ],
        stream=True,
    )

    # Print each text fragment as soon as it arrives
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
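Streaming prints fragments as they arrive, but you often also need the complete text afterwards, for instance to parse citations. A minimal sketch that does both; `collect_stream` is a hypothetical helper, and it assumes each chunk has the `choices[0].delta.content` shape used above.

```python
def collect_stream(stream, echo=True):
    """Consume a streaming response, optionally echoing it, and return the full text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Skip chunks that carry no choices (e.g. metadata-only chunks).
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

Collecting fragments in a list and joining once at the end avoids repeated string concatenation on long responses.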

## Pro Tips for Power Users

- System prompts matter: Use detailed system messages to control output format, citation style, and response depth. Perplexity's search results quality improves with specific system instructions.
- Chain queries for deep research: Use sonar-pro for multi-step research. Feed the output of one query as context into the next for iterative exploration.
- Domain filtering for accuracy: When researching technical topics, restrict searches to authoritative domains like official docs and peer-reviewed sources using search_domain_filter.
- Cost optimization: Use sonar for simple factual queries and reserve sonar-pro or reasoning models for complex analysis. Monitor token usage via response.usage.
- Structured output: Request JSON output by specifying format in your system prompt and parsing the response accordingly.
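For the structured-output tip, the parsing step deserves some care because models sometimes wrap JSON in a Markdown code fence. The sketch below handles both cases; `extract_json` is a hypothetical helper, and it assumes the system prompt asked for a single JSON document.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Parse JSON from a model reply, unwrapping a fenced block if present."""
    match = FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    return json.loads(candidate.strip())
```

Callers should catch `ValueError` and decide whether to retry the request or fall back to plain text.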

## Troubleshooting Common Errors

| Error | Cause | Solution |
|---|---|---|
| `401 Unauthorized` | Invalid or expired API key | Regenerate your key at perplexity.ai/settings/api |
| `403 Forbidden` | Insufficient credits | Add billing credits to your Perplexity account |
| `429 Too Many Requests` | Rate limit exceeded | Implement exponential backoff; default is 50 req/min |
| `Model not found` | Incorrect model name | Use exact model names: `sonar`, `sonar-pro`, etc. |
| `Connection error` | Wrong base_url | Ensure base_url is `https://api.perplexity.ai` |
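The exponential backoff suggested for 429 responses can be sketched as a generic retry wrapper. `with_backoff` and its parameters are hypothetical names. With the OpenAI SDK you would likely pass `retry_on=(openai.RateLimitError,)`, and the client's own `max_retries` option may already cover simple cases.

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Run call(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

A call site would look like `with_backoff(lambda: client.chat.completions.create(...))`. The `sleep` parameter exists so that tests can replace the real delay.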

## Frequently Asked Questions

Q1: Is the Perplexity API the same as a Perplexity Pro subscription?

No. The API and Pro subscription are separate products with independent billing. A Pro subscription gives you unlimited searches on the web app, while the API uses a pay-per-token credit system. You need to add API credits separately even if you have a Pro plan.

Q2: Can I use the Perplexity API without real-time search?

The Sonar models are inherently search-augmented — web search is a core feature. If you need a pure LLM without search, consider using a different provider. However, you can influence search behavior using domain filters and recency filters to narrow or focus the search scope.

Q3: How do I manage costs when using the Perplexity API?

Monitor usage via the response.usage object, which returns prompt_tokens and completion_tokens. Use max_tokens to cap response length. Choose the appropriate model tier: sonar is significantly cheaper than sonar-pro or the reasoning variants. Set up billing alerts in your Perplexity dashboard to avoid unexpected charges.
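A small accumulator makes the token counts from response.usage easier to track across many calls. This is a sketch only: `UsageTracker` is a hypothetical class, the per-million-token prices are inputs you must supply from the current pricing page, and the estimate ignores any non-token fees that may apply.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulate token counts from successive response.usage objects."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, usage):
        # Missing or None fields count as zero.
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", 0) or 0
        self.requests += 1

    def estimated_cost(self, input_price_per_million, output_price_per_million):
        """Token-only cost estimate; prices are caller-supplied assumptions."""
        return (
            self.prompt_tokens * input_price_per_million
            + self.completion_tokens * output_price_per_million
        ) / 1_000_000
```

Call `tracker.record(response.usage)` after each request and check `tracker.estimated_cost(...)` against your budget.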
