How to Build a Multi-Step Document Review Pipeline with Claude API Using Tool Use and Prompt Chaining


Contract analysis demands more than a single LLM call. By combining Claude’s tool use capability with prompt chaining, you can build a robust, multi-step document review pipeline that extracts clauses, flags risks, and produces structured summaries—all orchestrated programmatically. This guide walks you through the complete implementation.

Prerequisites

- Python 3.9 or later
- An Anthropic API key (console.anthropic.com)
- Basic familiarity with REST APIs and JSON

Step 1: Install the Anthropic Python SDK

Set up your environment and install the official SDK:

    pip install anthropic
    export ANTHROPIC_API_KEY="YOUR_API_KEY"

Verify the installation:

    python -c "import anthropic; print(anthropic.__version__)"
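A quick preflight check can catch a missing key before the first API call. The sketch below is only an illustration: `api_key_is_set` is a hypothetical helper name, not part of the SDK, and it inspects an environment mapping without contacting the API, so it cannot tell whether the key is actually valid.

```python
import os

def api_key_is_set(env=None, var_name="ANTHROPIC_API_KEY"):
    """Return True if the variable exists and is not blank.

    Hypothetical helper: it does not validate the key with Anthropic.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    value = env.get(var_name, "")
    return bool(value.strip())
```

Calling `api_key_is_set()` with no arguments checks the real process environment; passing a dictionary makes the check easy to test.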

Step 2: Define Your Tool Schemas

Tools let Claude call structured functions during generation. Define tools that map to each stage of your pipeline:

    tools = [
        {
            "name": "extract_clauses",
            "description": "Extract key clauses from a legal contract, including termination, liability, indemnification, and confidentiality sections.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "contract_text": {
                        "type": "string",
                        "description": "The full text of the contract to analyze"
                    }
                },
                "required": ["contract_text"]
            }
        },
        {
            "name": "assess_risk",
            "description": "Evaluate extracted clauses for legal risk on a scale of low, medium, or high, and provide reasoning.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "clauses": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "List of extracted clause texts"
                    }
                },
                "required": ["clauses"]
            }
        },
        {
            "name": "generate_summary",
            "description": "Produce a structured executive summary with recommended actions based on the risk assessment.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "risk_report": {
                        "type": "string",
                        "description": "The complete risk assessment output"
                    }
                },
                "required": ["risk_report"]
            }
        }
    ]

## Step 3: Implement the Tool Handlers
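Before writing the handlers, it can help to sanity-check the tool definitions from Step 2, since a typo in `required` or a missing `input_schema` otherwise only surfaces when the API rejects the request. The following is a minimal sketch, not an official validator: `find_schema_problems` is a hypothetical helper, and it checks only the structural fields used in this guide.

```python
def find_schema_problems(tools):
    """Return a list of human-readable problems found in tool definitions."""
    problems = []
    seen_names = set()
    for index, tool in enumerate(tools):
        name = tool.get("name")
        label = name if name is not None else f"tool #{index}"
        # Every tool in this guide carries these three top-level keys.
        for key in ("name", "description", "input_schema"):
            if key not in tool:
                problems.append(f"{label}: missing '{key}'")
        if name is not None and name in seen_names:
            problems.append(f"{label}: duplicate tool name")
        seen_names.add(name)
        schema = tool.get("input_schema", {})
        if schema and schema.get("type") != "object":
            problems.append(f"{label}: input_schema type must be 'object'")
        # Each required field should be declared under properties.
        properties = schema.get("properties", {})
        for field in schema.get("required", []):
            if field not in properties:
                problems.append(f"{label}: required field '{field}' not in properties")
    return problems


# Example with one well-formed tool and one deliberately broken tool.
example_tools = [
    {
        "name": "extract_clauses",
        "description": "Extract key clauses from a contract.",
        "input_schema": {
            "type": "object",
            "properties": {"contract_text": {"type": "string"}},
            "required": ["contract_text"],
        },
    },
    {
        "name": "assess_risk",
        "description": "Rate clause risk.",
        "input_schema": {
            "type": "object",
            "properties": {"clauses": {"type": "array"}},
            "required": ["clause_list"],  # typo on purpose
        },
    },
]

for problem in find_schema_problems(example_tools):
    print(problem)
```

Running the same function over the real `tools` list should return an empty list if the definitions are structurally consistent.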

Each tool call from Claude triggers a local handler. These handlers process the structured input and return results back into the conversation:

    import json

    def handle_tool_call(tool_name, tool_input):
        if tool_name == "extract_clauses":
            # In production, use NLP or regex-based extraction
            return json.dumps({
                "clauses": [
                    {"type": "Termination", "text": tool_input["contract_text"][:200]},
                    {"type": "Liability", "text": "Liability limited to fees paid in prior 12 months."},
                    {"type": "Indemnification", "text": "Mutual indemnification for third-party IP claims."}
                ]
            })
        elif tool_name == "assess_risk":
            risks = []
            for clause in tool_input["clauses"]:
                risks.append({"clause": clause[:80], "risk_level": "medium", "reason": "Requires legal review"})
            return json.dumps({"risk_assessment": risks})
        elif tool_name == "generate_summary":
            return json.dumps({
                "summary": "Contract contains moderate risk. Recommend legal counsel review liability cap and indemnification scope.",
                "action_items": ["Review liability cap", "Negotiate indemnification terms", "Confirm termination notice period"]
            })
        return json.dumps({"error": "Unknown tool"})
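The `extract_clauses` handler above returns canned data. As one possible direction for the regex-based extraction its comment mentions, here is a minimal sketch that splits the text into sentences and tags each one by keyword. The keyword patterns and the function name `find_clauses` are illustrative assumptions, not a vetted legal taxonomy, and the sentence splitter is deliberately naive.

```python
import re

# Illustrative keyword patterns; a real taxonomy would need legal review.
CLAUSE_PATTERNS = {
    "Termination": re.compile(r"\bterminat\w*", re.IGNORECASE),
    "Liability": re.compile(r"\bliab\w*", re.IGNORECASE),
    "Indemnification": re.compile(r"\bindemnif\w*", re.IGNORECASE),
    "Confidentiality": re.compile(r"\bconfidential\w*", re.IGNORECASE),
}

def find_clauses(contract_text):
    """Return [{"type": ..., "text": ...}] for sentences matching a keyword."""
    # Naive sentence split: break after ., ! or ? when whitespace follows.
    sentences = re.split(r"(?<=[.!?])\s+", contract_text.strip())
    clauses = []
    for sentence in sentences:
        for clause_type, pattern in CLAUSE_PATTERNS.items():
            if pattern.search(sentence):
                clauses.append({"type": clause_type, "text": sentence})
    return clauses

sample = (
    "Either party may terminate this agreement with 30 days notice. "
    "Liability is limited to fees paid in the prior 12 months. "
    "Payment is due within 45 days."
)
for clause in find_clauses(sample):
    print(clause["type"], "->", clause["text"])
```

A sentence that mentions several topics is returned once per matching type, which keeps the output shape identical to the stub's `clauses` list.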

Step 4: Build the Prompt Chain Loop

The core of the pipeline is an agentic loop that sends messages to Claude, processes tool calls, and feeds results back until the chain completes:

    import anthropic

    client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

    def run_pipeline(contract_text):
        messages = [
            {
                "role": "user",
                "content": f"""Analyze this contract through a complete review pipeline:

    1. First, extract all key clauses using the extract_clauses tool.
    2. Then, assess the risk of each clause using the assess_risk tool.
    3. Finally, generate an executive summary using the generate_summary tool.

    Contract: {contract_text}"""
            }
        ]

        # Agentic loop: keep processing until Claude stops calling tools
        while True:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=4096,
                tools=tools,
                messages=messages
            )

            # Collect tool results for this turn
            tool_results = []
            has_tool_use = False

            for block in response.content:
                if block.type == "tool_use":
                    has_tool_use = True
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
                    print(f"  [Step] {block.name} completed")

            if not has_tool_use:
                # No more tool calls: extract the final text response
                final_text = "".join(
                    block.text for block in response.content if hasattr(block, "text")
                )
                return final_text

            # Append assistant response and all tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

    # Execute the pipeline
    contract = "This agreement between Party A and Party B governs…"
    result = run_pipeline(contract)
    print(result)
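The loop above assumes every handler succeeds. One way to keep the conversation going when a handler raises is to send the failure back to Claude as a tool result flagged with `is_error`, a field the Messages API accepts on `tool_result` blocks. The sketch below is an assumption-labeled helper: `build_tool_result` and `demo_handler` are hypothetical names, and the wrapper works with any handler that has the same signature as `handle_tool_call`.

```python
import json

def build_tool_result(tool_use_id, tool_name, tool_input, handler):
    """Run a handler and package its output as a tool_result block.

    If the handler raises, the exception text is returned with
    is_error=True so the model can see what went wrong.
    """
    try:
        content = handler(tool_name, tool_input)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    except Exception as exc:  # broad on purpose: any failure is reported back
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }

# Stand-in handler for demonstration; it fails when contract_text is missing.
def demo_handler(tool_name, tool_input):
    return json.dumps({"length": len(tool_input["contract_text"])})

ok = build_tool_result("toolu_demo_1", "extract_clauses", {"contract_text": "abc"}, demo_handler)
bad = build_tool_result("toolu_demo_2", "extract_clauses", {}, demo_handler)
print(ok)
print(bad)
```

Inside `run_pipeline`, the dictionary passed to `tool_results.append(...)` could be swapped for `build_tool_result(block.id, block.name, block.input, handle_tool_call)`.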

Step 5: Run and Validate

Execute the pipeline from the command line:

    python contract_pipeline.py

Expected output flow:

- **extract_clauses**: Claude identifies and structures key contractual provisions
- **assess_risk**: Each clause receives a risk rating with justification
- **generate_summary**: An executive summary with action items is produced

## Pipeline Architecture Overview

| Stage | Tool Called | Input | Output |
|---|---|---|---|
| 1. Extraction | extract_clauses | Raw contract text | Structured clause list |
| 2. Risk Assessment | assess_risk | Extracted clauses | Risk ratings with reasoning |
| 3. Summarization | generate_summary | Risk report | Executive summary + action items |

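Because each stage depends on the one before it, it can be useful to confirm that the tools were actually called in this order during a run. A minimal sketch follows. `stages_in_order` is a hypothetical helper, and it assumes you collect the `block.name` of every tool call into a list while the loop runs.

```python
EXPECTED_STAGES = ["extract_clauses", "assess_risk", "generate_summary"]

def stages_in_order(called_tools, expected=EXPECTED_STAGES):
    """Return True if expected appears as a subsequence of called_tools.

    Extra or repeated calls are tolerated (for example, assess_risk
    invoked several times), as long as the stages can be matched in order.
    """
    position = 0
    for stage in expected:
        try:
            # Search only from the position after the previous match.
            position = called_tools.index(stage, position) + 1
        except ValueError:
            return False
    return True

print(stages_in_order(["extract_clauses", "assess_risk", "assess_risk", "generate_summary"]))
print(stages_in_order(["assess_risk", "extract_clauses", "generate_summary"]))
```

A failed check usually means the prompt needs firmer ordering instructions or that a stage was skipped entirely.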
## Pro Tips for Power Users

- **Use claude-opus-4-6 for complex contracts**: Opus handles nuanced legal language more accurately. Switch to claude-sonnet-4-6 for cost-effective batch processing.
- **Add a validation tool**: Create a fourth tool that cross-checks extracted clauses against a compliance checklist before risk assessment.
- **Parallelize independent steps**: If clauses are independent, have Claude invoke assess_risk multiple times in one turn so the risk assessment runs as parallel tool calls.
- **Cache intermediate results**: Store extraction outputs to avoid re-processing when iterating on downstream prompts.
- **Set temperature to 0**: For more repeatable legal analysis, set temperature to zero in your API call to reduce variability across runs. Outputs can still differ slightly between runs.
- **Stream responses**: Use client.messages.stream() to get real-time feedback on long contracts instead of waiting for the full response.

## Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| tool_use_id not found | Tool result ID doesn't match the tool call ID | Ensure you pass block.id from the tool_use block as tool_use_id in the result |
| max_tokens exceeded | Contract text plus tool outputs exceed token limit | Chunk large contracts into sections and process each chunk separately |
| authentication_error | Missing or invalid API key | Verify ANTHROPIC_API_KEY is set: echo $ANTHROPIC_API_KEY |
| tool not found in tools list | Tool name in Claude's response doesn't match defined tools | Double-check tool name strings are identical in schema and handler |
| Infinite loop | Claude keeps calling tools without converging | Add a max iteration counter (e.g., 10) to break the while loop |

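The iteration cap suggested for the infinite-loop case can be sketched as follows. To keep the example runnable without network access, the API call is passed in as a function: `create_message` is a hypothetical parameter standing in for a wrapper around `client.messages.create`, and the fake response objects only mimic the `content`, `type`, `id`, `name`, `input`, and `text` attributes used in this guide.

```python
from types import SimpleNamespace

def run_bounded_loop(create_message, handle_tool_call, messages, max_turns=10):
    """Run the tool loop, raising if it does not finish within max_turns."""
    for _ in range(max_turns):
        response = create_message(messages)
        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        if not tool_blocks:
            # No tool calls left: join the text blocks into the final answer.
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": handle_tool_call(b.name, b.input),
            }
            for b in tool_blocks
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"Pipeline did not finish within {max_turns} turns")

# Fake model that requests the same tool forever, to exercise the cap.
def looping_model(messages):
    block = SimpleNamespace(type="tool_use", id="toolu_demo", name="assess_risk", input={})
    return SimpleNamespace(content=[block])

# Fake model that answers immediately with text.
def finished_model(messages):
    return SimpleNamespace(content=[SimpleNamespace(type="text", text="Done.")])

print(run_bounded_loop(finished_model, lambda name, data: "{}", []))
try:
    run_bounded_loop(looping_model, lambda name, data: "{}", [], max_turns=3)
except RuntimeError as err:
    print(err)
```

With the real client, `create_message` could be a small lambda that calls `client.messages.create` with the model, `max_tokens`, and `tools` values from Step 4 and forwards the `messages` list.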
## Frequently Asked Questions

Can I use prompt chaining with tool use for documents other than contracts?

Yes. The same pattern applies to any multi-step document workflow—financial reports, medical records, compliance audits, or research papers. Simply redefine your tool schemas to match the extraction and analysis steps required for your document type. The agentic loop structure remains identical.
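As an illustration of redefining the schemas, the sketch below builds a single-input tool definition in the same shape as the ones in Step 2. Both `make_text_tool` and the `extract_line_items` tool are hypothetical examples for a financial-report workflow and are not part of the contract pipeline.

```python
def make_text_tool(name, description, param_name, param_description):
    """Build a tool definition that takes one required string parameter."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                param_name: {"type": "string", "description": param_description}
            },
            "required": [param_name],
        },
    }

# Hypothetical first stage for a financial-report review.
financial_tools = [
    make_text_tool(
        "extract_line_items",
        "Extract revenue, expense, and cash-flow line items from a financial report.",
        "report_text",
        "The full text of the financial report",
    ),
]
print(financial_tools[0]["input_schema"]["required"])
```

The handler function then needs a matching branch for each new tool name, while the loop itself stays unchanged.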

How do I handle contracts that exceed Claude’s context window?

For contracts longer than the model’s context limit, implement a chunking strategy. Split the document into logical sections (e.g., by article or heading), run the extraction tool on each chunk independently, then merge the extracted clauses before passing them to the risk assessment tool. This keeps each API call within token limits while preserving full coverage.
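The chunk-then-merge strategy can be sketched as below. This is a minimal illustration under stated assumptions: section boundaries are detected with a simple pattern for lines that begin with "Article" or "Section" followed by a number, `extract` is any function that maps a chunk of text to a list of clauses, and both helper names are hypothetical.

```python
import re

# Assumed heading format: a line starting with "Article" or "Section" and a number.
HEADING = re.compile(r"^(?:Article|Section)\s+\d+", re.MULTILINE)

def split_by_heading(contract_text):
    """Split text into chunks, each starting at a detected heading."""
    starts = [m.start() for m in HEADING.finditer(contract_text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    ends = starts[1:] + [len(contract_text)]
    chunks = [contract_text[s:e].strip() for s, e in zip(starts, ends)]
    return [c for c in chunks if c]

def extract_from_chunks(contract_text, extract):
    """Run extract() on each chunk and merge the resulting clause lists."""
    merged = []
    for chunk in split_by_heading(contract_text):
        merged.extend(extract(chunk))
    return merged

sample = "Preamble text.\nArticle 1 Term\nLasts one year.\nArticle 2 Fees\nPaid monthly."
print(split_by_heading(sample))
```

In the pipeline, `extract` would wrap one extraction call per chunk, and the merged list would then be passed to the risk assessment step in a single request.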

What is the cost of running a full contract analysis pipeline?

Cost depends on the model and contract length. A typical 10-page contract with three tool-use turns on claude-sonnet-4-6 costs approximately $0.05–$0.15 per analysis. Using claude-opus-4-6 increases cost roughly 5x but provides better accuracy on complex legal language. Use the Anthropic usage dashboard at console.anthropic.com to monitor spending and set budget alerts.
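The arithmetic behind a per-analysis estimate is straightforward once token counts are known; each API response reports them in `response.usage.input_tokens` and `response.usage.output_tokens`. The sketch below sums those counts across turns and applies per-million-token rates. The token counts and rates in the example are placeholders chosen for easy arithmetic, not Anthropic's actual prices, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(turn_usages, input_rate_per_mtok, output_rate_per_mtok):
    """Estimate cost in dollars from a list of (input_tokens, output_tokens).

    Rates are dollars per one million tokens and must be supplied by the caller.
    """
    total_in = sum(inp for inp, _ in turn_usages)
    total_out = sum(out for _, out in turn_usages)
    return (total_in * input_rate_per_mtok + total_out * output_rate_per_mtok) / 1_000_000

# Placeholder numbers only: three turns, hypothetical rates of $1 and $2 per million tokens.
usages = [(8_000, 500), (9_000, 600), (10_000, 900)]
print(estimate_cost(usages, input_rate_per_mtok=1.0, output_rate_per_mtok=2.0))
```

Input tokens tend to grow with each turn because the full conversation, including earlier tool results, is sent again on every request.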
