How to Build a Multi-Step Document Review Pipeline with Claude API Using Tool Use and Prompt Chaining
Contract analysis demands more than a single LLM call. By combining Claude’s tool use capability with prompt chaining, you can build a robust, multi-step document review pipeline that extracts clauses, flags risks, and produces structured summaries—all orchestrated programmatically. This guide walks you through the complete implementation.
## Prerequisites
- Python 3.9 or later
- An Anthropic API key (console.anthropic.com)
- Basic familiarity with REST APIs and JSON
## Step 1: Install the Anthropic Python SDK
Set up your environment and install the official SDK:
pip install anthropic
export ANTHROPIC_API_KEY="YOUR_API_KEY"
Verify the installation:
python -c "import anthropic; print(anthropic.__version__)"
## Step 2: Define Your Tool Schemas
Tools let Claude call structured functions during generation. Define tools that map to each stage of your pipeline:
tools = [
    {
        "name": "extract_clauses",
        "description": "Extract key clauses from a legal contract, including termination, liability, indemnification, and confidentiality sections.",
        "input_schema": {
            "type": "object",
            "properties": {
                "contract_text": {
                    "type": "string",
                    "description": "The full text of the contract to analyze"
                }
            },
            "required": ["contract_text"]
        }
    },
    {
        "name": "assess_risk",
        "description": "Evaluate extracted clauses for legal risk on a scale of low, medium, or high, and provide reasoning.",
        "input_schema": {
            "type": "object",
            "properties": {
                "clauses": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of extracted clause texts"
                }
            },
            "required": ["clauses"]
        }
    },
    {
        "name": "generate_summary",
        "description": "Produce a structured executive summary with recommended actions based on the risk assessment.",
        "input_schema": {
            "type": "object",
            "properties": {
                "risk_report": {
                    "type": "string",
                    "description": "The complete risk assessment output"
                }
            },
            "required": ["risk_report"]
        }
    }
]
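A quick local check can catch schema typos before any API call is made, for example a `required` entry that does not appear under `properties`. The following is a minimal sketch: `check_tool_schema` is a hypothetical helper (not part of the Anthropic SDK), and it inspects only the fields used in this guide.

```python
def check_tool_schema(tool: dict) -> list:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    # Every tool in this guide carries these three top-level keys.
    for key in ("name", "description", "input_schema"):
        if key not in tool:
            problems.append(f"missing top-level key: {key}")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append("input_schema.type should be 'object'")
    properties = schema.get("properties", {})
    # Each required field must be declared under properties.
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"required field not in properties: {field}")
    return problems


# Deliberately broken definition, for illustration only.
broken_tool = {
    "name": "assess_risk",
    "description": "Evaluate extracted clauses for legal risk.",
    "input_schema": {
        "type": "object",
        "properties": {"clauses": {"type": "array", "items": {"type": "string"}}},
        "required": ["clause_list"],  # typo: does not match "clauses"
    },
}

print(check_tool_schema(broken_tool))
```

Running the same function over every entry in `tools` at startup would surface mismatches before Claude ever sees the schemas.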
## Step 3: Implement the Tool Handlers
Each tool call from Claude triggers a local handler. These handlers process the structured input and return results back into the conversation:
import json

def handle_tool_call(tool_name, tool_input):
    if tool_name == "extract_clauses":
        # In production, use NLP or regex-based extraction
        return json.dumps({
            "clauses": [
                {"type": "Termination", "text": tool_input["contract_text"][:200]},
                {"type": "Liability", "text": "Liability limited to fees paid in prior 12 months."},
                {"type": "Indemnification", "text": "Mutual indemnification for third-party IP claims."}
            ]
        })
    elif tool_name == "assess_risk":
        risks = []
        for clause in tool_input["clauses"]:
            risks.append({"clause": clause[:80], "risk_level": "medium", "reason": "Requires legal review"})
        return json.dumps({"risk_assessment": risks})
    elif tool_name == "generate_summary":
        return json.dumps({
            "summary": "Contract contains moderate risk. Recommend legal counsel review liability cap and indemnification scope.",
            "action_items": ["Review liability cap", "Negotiate indemnification terms", "Confirm termination notice period"]
        })
    return json.dumps({"error": "Unknown tool"})
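As the number of tools grows, a dispatch table can replace the if/elif chain, and handler failures can be reported back to Claude through the `is_error` field of a `tool_result` block instead of crashing the loop. This is a sketch under assumptions: `HANDLERS`, `build_tool_result`, and the single stand-in handler are hypothetical names introduced here, not part of the original pipeline.

```python
import json


def extract_clauses(tool_input: dict) -> dict:
    # Stand-in logic only; real extraction would go here.
    return {"clauses": [{"type": "Termination", "text": tool_input["contract_text"][:200]}]}


# Map tool names to handler functions (hypothetical registry).
HANDLERS = {"extract_clauses": extract_clauses}


def build_tool_result(tool_use_id: str, tool_name: str, tool_input: dict) -> dict:
    """Run the matching handler and wrap its output as a tool_result block."""
    try:
        handler = HANDLERS.get(tool_name)
        if handler is None:
            raise ValueError(f"Unknown tool: {tool_name}")
        content, is_error = json.dumps(handler(tool_input)), False
    except Exception as exc:
        # Report the failure to the model rather than raising.
        content, is_error = f"{type(exc).__name__}: {exc}", True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


ok = build_tool_result("toolu_example_1", "extract_clauses", {"contract_text": "abc"})
bad = build_tool_result("toolu_example_2", "not_a_tool", {})
print(ok["is_error"], bad["is_error"])
```

Returning an error block gives the model a chance to correct its call on the next turn, which is usually preferable to aborting the whole review.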
## Step 4: Build the Prompt Chain Loop
The core of the pipeline is an agentic loop that sends messages to Claude, processes tool calls, and feeds results back until the chain completes:
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

def run_pipeline(contract_text):
    messages = [
        {
            "role": "user",
            "content": f"""Analyze this contract through a complete review pipeline:
- First, extract all key clauses using the extract_clauses tool.
- Then, assess the risk of each clause using the assess_risk tool.
- Finally, generate an executive summary using the generate_summary tool.
Contract:
{contract_text}"""
        }
    ]
    # Agentic loop: keep processing until Claude stops calling tools
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        # Collect tool results for this turn
        tool_results = []
        has_tool_use = False
        for block in response.content:
            if block.type == "tool_use":
                has_tool_use = True
                result = handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
                print(f" [Step] {block.name} completed")
        if not has_tool_use:
            # No more tool calls: extract final text response
            final_text = "".join(
                block.text for block in response.content if hasattr(block, "text")
            )
            return final_text
        # Append assistant response and all tool results
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

# Execute the pipeline
contract = "This agreement between Party A and Party B governs…"
result = run_pipeline(contract)
print(result)
## Step 5: Run and Validate
Save the code from Steps 2 through 4 in a single file named contract_pipeline.py, then run it from the command line:
python contract_pipeline.py
Expected output flow:
- **extract_clauses**: Claude identifies and structures key contractual provisions
- **assess_risk**: Each clause receives a risk rating with justification
- **generate_summary**: An executive summary with action items is produced
## Pipeline Architecture Overview
| Stage | Tool Called | Input | Output |
|---|---|---|---|
| 1. Extraction | extract_clauses | Raw contract text | Structured clause list |
| 2. Risk Assessment | assess_risk | Extracted clauses | Risk ratings with reasoning |
| 3. Summarization | generate_summary | Risk report | Executive summary + action items |
- **Use claude-opus-4-6 for complex contracts**: Opus handles nuanced legal language more accurately. Switch to claude-sonnet-4-6 for cost-effective batch processing.
- **Add a validation tool**: Create a fourth tool that cross-checks extracted clauses against a compliance checklist before risk assessment.
- **Parallelize independent steps**: If clauses are independent, break the risk assessment into parallel tool calls by having Claude invoke assess_risk multiple times in one turn.
- **Cache intermediate results**: Store extraction outputs to avoid re-processing when iterating on downstream prompts.
- **Set temperature to 0**: For more repeatable legal analysis, set temperature to zero in your API call to reduce variability across runs.
- **Stream responses**: Use client.messages.stream() to get real-time feedback on long contracts instead of waiting for the full response.
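The caching tip could be implemented in several ways; one simple option is a file cache keyed by a hash of the contract text. The sketch below assumes a hypothetical `extract_fn` callable that returns JSON-serializable data, and `cached_extraction` is an illustrative name rather than anything provided by the SDK.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_extraction(contract_text, extract_fn, cache_dir):
    """Return extract_fn(contract_text), reusing a JSON file keyed by the text's hash."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(contract_text.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Cache hit: skip the expensive extraction step.
        return json.loads(path.read_text(encoding="utf-8"))
    result = extract_fn(contract_text)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


# Stand-in extractor that records how often it is called.
calls = []

def fake_extract(text):
    calls.append(text)
    return {"clauses": [text[:20]]}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_extraction("This agreement between Party A and Party B", fake_extract, tmp)
    second = cached_extraction("This agreement between Party A and Party B", fake_extract, tmp)

print(first == second, len(calls))
```

Because the key is derived from the full text, any edit to the contract produces a new cache entry, so stale extractions are not reused by accident.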
## Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| tool_use_id not found | Tool result ID doesn't match the tool call ID | Ensure you pass block.id from the tool_use block as tool_use_id in the result |
| max_tokens exceeded | Contract text plus tool outputs exceed token limit | Chunk large contracts into sections and process each chunk separately |
| authentication_error | Missing or invalid API key | Verify ANTHROPIC_API_KEY is set: echo $ANTHROPIC_API_KEY |
| tool not found in tools list | Tool name in Claude's response doesn't match defined tools | Double-check tool name strings are identical in schema and handler |
| Infinite loop | Claude keeps calling tools without converging | Add a max iteration counter (e.g., 10) to break the while loop |
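
The iteration cap from the last table row can be sketched as a bounded variant of the loop that also checks `stop_reason`. This is one possible shape, not the guide's canonical loop: `run_bounded`, its parameters, and the stub client are hypothetical and exist only so the guard can be exercised without a network call.

```python
from types import SimpleNamespace


def run_bounded(client, messages, tools, handle_tool_call, model, max_turns=10):
    """Agentic loop that gives up after max_turns API calls."""
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=4096, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            # Any other stop reason ends the chain; return whatever text exists.
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": handle_tool_call(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"Pipeline did not finish within {max_turns} turns")


# Stub client that always requests another tool call, to trigger the guard.
class _LoopingMessages:
    def create(self, **kwargs):
        block = SimpleNamespace(type="tool_use", id="toolu_stub",
                                name="extract_clauses", input={})
        return SimpleNamespace(stop_reason="tool_use", content=[block])


stub_client = SimpleNamespace(messages=_LoopingMessages())

try:
    run_bounded(stub_client, [], [], lambda name, args: "{}",
                model="stub-model", max_turns=3)
except RuntimeError as exc:
    print(exc)
```

Passing the client in as an argument keeps the loop testable; in real use it would be the `anthropic.Anthropic()` instance from Step 4.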
### Can I use prompt chaining with tool use for documents other than contracts?
Yes. The same pattern applies to any multi-step document workflow—financial reports, medical records, compliance audits, or research papers. Simply redefine your tool schemas to match the extraction and analysis steps required for your document type. The agentic loop structure remains identical.
### How do I handle contracts that exceed Claude’s context window?
For contracts longer than the model’s context limit, implement a chunking strategy. Split the document into logical sections (e.g., by article or heading), run the extraction tool on each chunk independently, then merge the extracted clauses before passing them to the risk assessment tool. This keeps each API call within token limits while preserving full coverage.
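The split-extract-merge strategy might look like the sketch below. It assumes headings begin a line with "ARTICLE" or "Section", which is only one possible convention, and `extract_fn` is a hypothetical callable that returns a list of clause dictionaries for one chunk.

```python
import re


def split_by_heading(contract_text, pattern=r"(?m)^(?=(?:ARTICLE|Section)\s+\S+)"):
    """Split text just before each line that starts with a heading keyword."""
    parts = re.split(pattern, contract_text)
    return [part.strip() for part in parts if part.strip()]


def extract_in_chunks(contract_text, extract_fn):
    """Run extract_fn on each chunk and merge the clause lists in document order."""
    merged = []
    for index, chunk in enumerate(split_by_heading(contract_text)):
        for clause in extract_fn(chunk):
            # Record which chunk each clause came from for later traceability.
            merged.append({"chunk": index, **clause})
    return merged


sample = (
    "Preamble text\n"
    "ARTICLE 1 Term\nThis agreement lasts one year.\n"
    "ARTICLE 2 Liability\nLiability is capped."
)

# Stand-in extractor: one pseudo-clause per chunk.
clauses = extract_in_chunks(sample, lambda chunk: [{"type": "Stub", "text": chunk[:9]}])
print(len(clauses), [c["text"] for c in clauses])
```

Splitting on headings keeps each clause intact inside one chunk; a fixed character window would be simpler but could cut a clause in half.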
### What is the cost of running a full contract analysis pipeline?
Cost depends on the model and contract length. As a rough estimate, a typical 10-page contract with three tool-use turns on claude-sonnet-4-6 costs approximately $0.05–$0.15 per analysis. Using claude-opus-4-6 costs more per token (check Anthropic's pricing page for the current ratio) but provides better accuracy on complex legal language. Use the Anthropic usage dashboard at console.anthropic.com to monitor spending and set budget alerts.
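A per-run estimate can be computed from the token counts the API reports on each response (`response.usage.input_tokens` and `response.usage.output_tokens`). In the sketch below the usage figures and the per-million-token prices are placeholder assumptions for illustration, not current rates, and `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(turn_usages, input_price_per_mtok, output_price_per_mtok):
    """Sum token counts across turns and convert them to dollars."""
    input_tokens = sum(u["input_tokens"] for u in turn_usages)
    output_tokens = sum(u["output_tokens"] for u in turn_usages)
    cost = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return input_tokens, output_tokens, cost


# Hypothetical usage for three tool-use turns. Input grows each turn because
# the full message history is sent again with every request.
usages = [
    {"input_tokens": 6000, "output_tokens": 400},
    {"input_tokens": 6800, "output_tokens": 500},
    {"input_tokens": 7700, "output_tokens": 600},
]

# Placeholder prices in USD per million tokens (assumed, not quoted).
tokens_in, tokens_out, cost = estimate_cost_usd(
    usages, input_price_per_mtok=3.0, output_price_per_mtok=15.0
)
print(tokens_in, tokens_out, round(cost, 4))
```

Since input tokens accumulate across turns, shortening tool outputs or trimming the contract text tends to reduce cost more than limiting the final summary length.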