Claude Prompt Engineering Best Practices: System Prompts, Few-Shot Examples & Chain-of-Thought Techniques
Maximize Claude’s Response Quality with Proven Prompt Engineering Techniques
Getting the best results from Claude requires more than just asking questions. Strategic prompt engineering—through well-designed system prompts, carefully placed few-shot examples, and chain-of-thought reasoning—can dramatically improve output accuracy, consistency, and relevance. This guide covers practical, workflow-oriented techniques you can implement immediately.
1. System Prompt Design: Setting the Foundation
The system prompt establishes Claude’s persona, constraints, and output expectations before any user interaction begins. A well-structured system prompt is the single most impactful lever for response quality.
Anatomy of an Effective System Prompt
import anthropic
client = anthropic.Anthropic(api_key=“YOUR_API_KEY”)
response = client.messages.create(
model=“claude-sonnet-4-20250514”,
max_tokens=1024,
system="""You are a senior backend engineer specializing in Python and PostgreSQL.
RULES:
- Always provide production-ready code with error handling.
- Use type hints in all Python code.
- When suggesting database queries, include index recommendations.
- If a question is ambiguous, ask a clarifying question before answering.
OUTPUT FORMAT:
- Start with a one-sentence summary.
- Follow with code blocks.
End with potential pitfalls or edge cases.""", messages=[ {“role”: “user”, “content”: “How should I implement rate limiting for my API?”} ] ) print(response.content[0].text)
System Prompt Structure Checklist
- Role definition — Who is Claude in this context?- Behavioral constraints — What should Claude always or never do?- Output format specification — Structure, length, and style expectations.- Domain boundaries — What topics are in or out of scope?- Fallback behavior — How to handle ambiguity or missing information.
2. Few-Shot Example Placement: Teaching by Demonstration
Few-shot prompting gives Claude concrete input-output pairs so it can pattern-match your expectations. Placement and quality of examples matter significantly.
Basic Few-Shot Pattern
response = client.messages.create(
model=“claude-sonnet-4-20250514”,
max_tokens=512,
system=“You extract structured data from unstructured product reviews.”,
messages=[
{“role”: “user”, “content”: “Review: ‘The battery lasts forever but the screen is too dim outdoors.’”},
{“role”: “assistant”, “content”: ’{“sentiment”: “mixed”, “pros”: [“battery life”], “cons”: [“screen brightness outdoors”], “score”: 3.5}’},
{“role”: “user”, “content”: “Review: ‘Absolutely terrible. Broke after two days and customer support ghosted me.’”},
{“role”: “assistant”, “content”: ’{“sentiment”: “negative”, “pros”: [], “cons”: [“durability”, “customer support”], “score”: 1.0}’},
{“role”: “user”, “content”: “Review: ‘Best purchase this year. Fast shipping, great build quality, and the app integration is seamless.’”}
]
)
print(response.content[0].text)
Few-Shot Placement Rules
| Strategy | When to Use | Example Count |
|---|---|---|
| In system prompt | Universal formatting rules | 1–2 examples |
| As conversation turns | Task-specific patterns | 2–4 examples |
| Mixed (system + turns) | Complex structured outputs | 1 system + 2–3 turns |
Chain-of-thought prompting instructs Claude to show its reasoning process before arriving at a conclusion. This is critical for math, logic, multi-step analysis, and decision-making tasks.
Explicit CoT with XML Tags
response = client.messages.create(
model=“claude-sonnet-4-20250514”,
max_tokens=2048,
system="""You are a financial analyst. When answering questions:
- Think through your reasoning inside
tags.
- Show calculations step by step.
- Provide your final answer inside
tags.
The user will NOT see the
Extended Thinking (Built-in CoT)
Claude models support a native extended thinking feature via the API, which allocates a dedicated reasoning budget before generating the response.
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=8000,
thinking={
"type": "enabled",
"budget_tokens": 5000
},
messages=[
{"role": "user", "content": "Design a database schema for a multi-tenant SaaS application with row-level security."}
]
)
for block in response.content:
if block.type == “thinking”:
print(“[Reasoning]”, block.thinking)
elif block.type == “text”:
print(“[Answer]”, block.text)
4. Installation & Setup
Get started with the Anthropic Python SDK:
# Install the SDK
pip install anthropic
Set your API key as an environment variable
export ANTHROPIC_API_KEY=“YOUR_API_KEY”
Verify installation
python -c “import anthropic; print(anthropic.version)“
Or use the API directly via cURL:
curl https://api.anthropic.com/v1/messages
-H “x-api-key: YOUR_API_KEY”
-H “content-type: application/json”
-H “anthropic-version: 2023-06-01”
-d ’{
“model”: “claude-sonnet-4-20250514”,
“max_tokens”: 1024,
“system”: “You are a helpful coding assistant.”,
“messages”: [{“role”: “user”, “content”: “Explain async/await in Python.”}]
}‘
5. Pro Tips for Power Users
- Use XML tags for structure — Claude responds exceptionally well to XML-delimited sections like
,, and<output_format>within prompts.- Prefill the assistant turn — Start Claude’s response by providing an opening in the assistant message to steer format (e.g.,{“role”: “assistant”, “content”: ”{”}forces JSON output).- Separate data from instructions — Place long documents or data inside clearly labeled XML tags so Claude doesn’t confuse content with instructions.- Temperature tuning — Usetemperature=0for deterministic tasks (data extraction, classification) andtemperature=0.7–1.0for creative writing or brainstorming.- Batch API for scale — For high-volume prompt workflows, use the Message Batches API to process thousands of prompts at 50% reduced cost.- Cache system prompts — Use prompt caching with thecache_controlparameter to reduce latency and cost when reusing large system prompts.
6. Troubleshooting Common Issues
| Problem | Cause | Solution |
|---|---|---|
| Claude ignores system prompt instructions | Conflicting or vague rules | Prioritize rules with numbered lists; place the most critical constraint first. |
| Output format is inconsistent | No few-shot examples provided | Add 2–3 concrete input/output examples in the conversation turns. |
| Responses are too verbose | No length constraint specified | Add explicit instruction: "Respond in under 200 words" or "Be concise." |
| JSON output contains markdown fences | Claude defaults to markdown formatting | Prefill assistant turn with { and instruct: "Output raw JSON only, no markdown." |
| Rate limit errors (429) | Too many concurrent requests | Implement exponential backoff or switch to the Batch API. |
| Extended thinking returns empty | Budget too low for complex task | Increase budget_tokens to at least 4000–8000 for complex reasoning. |
What is the ideal length for a Claude system prompt?
There is no hard limit, but aim for 200–800 words for most use cases. Claude can handle system prompts exceeding 10,000 tokens effectively, especially with prompt caching enabled. The key is clarity and structure—use sections, numbered rules, and XML tags rather than writing dense paragraphs. Longer system prompts work well when they contain reference material, but keep behavioral instructions concise and front-loaded.
How many few-shot examples should I include for best results?
For most tasks, 2–4 examples strike the best balance between quality and token efficiency. One example establishes the pattern, two confirm it, and three to four handle edge cases. For highly nuanced tasks like sentiment analysis with custom scales, go up to 5–6 examples. Beyond that, returns diminish and you consume tokens that could be used for the actual response. Always include at least one edge case or negative example.
When should I use extended thinking versus manual chain-of-thought prompting?
Use extended thinking (the thinking parameter) when you want Claude to reason internally without exposing the reasoning to end users—ideal for production applications. Use manual CoT with XML tags like when you need to inspect, debug, or log the reasoning process during development. Extended thinking is also more effective for extremely complex tasks because it allocates dedicated compute to reasoning before the response generation begins.