How to Build Your Own AI Chatbot with ChatGPT API and Claude API - Complete Beginner's Guide
Introduction: Why Build Your Own AI Chatbot?
Off-the-shelf chatbots like ChatGPT and Claude are powerful, but they come with limitations. You can’t control the system prompt in production, you can’t embed them directly into your product, and you’re stuck with whatever interface the provider gives you. Building your own chatbot using the APIs behind these models changes everything.
This guide walks you through building a fully functional AI chatbot from scratch using both the OpenAI (ChatGPT) API and the Anthropic (Claude) API. By the end, you’ll have a working chatbot application that runs locally, maintains conversation history, and can be extended into a customer support bot, a writing assistant, or any other conversational tool you can imagine.
This guide is written for developers who know basic Python or JavaScript but have never worked with LLM APIs before. No machine learning background is required. You don’t need to understand transformers, tokenization, or neural networks — just how to write a function and call an API endpoint.
Expected time to complete: 2-3 hours for the full guide. If you just want a working prototype, you can have one running in under 30 minutes by following Steps 1 through 4.
Difficulty level: Beginner to Intermediate. If you’ve ever made an HTTP request in code, you have the prerequisite skills.
Prerequisites
Before you start building, make sure you have the following ready:
- Python 3.8+ or Node.js 18+ installed on your machine. This guide uses Python for examples, but the concepts translate directly to JavaScript.
- An OpenAI API key — Sign up at platform.openai.com. New accounts get $5 in free credits. After that, GPT-3.5 Turbo costs roughly $0.002 per 1K tokens (~750 words), and GPT-4o costs approximately $0.005 per 1K input tokens.
- An Anthropic API key — Sign up at console.anthropic.com. New accounts receive $5 in free credits. Claude 3.5 Sonnet costs about $0.003 per 1K input tokens.
- A code editor — VS Code, PyCharm, or any editor you’re comfortable with.
- Basic terminal/command line familiarity — You’ll need to run scripts and install packages.
- pip or npm — For installing the official SDK packages.
Estimated cost for following this entire guide: Under $1 in API credits if you use the recommended models.
Step-by-Step Instructions
Step 1: Set Up Your Project Environment
Create a dedicated project directory and set up a virtual environment. This keeps your chatbot dependencies isolated from other Python projects on your machine.
mkdir my-ai-chatbot
cd my-ai-chatbot
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install both SDK packages:
pip install openai anthropic python-dotenv
Create a .env file in your project root to store your API keys securely:
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
**Important:** Add .env to your .gitignore file immediately. Never commit API keys to version control. This is the single most common security mistake beginners make with API projects.
Step 2: Make Your First ChatGPT API Call
Let’s start with the OpenAI API because it’s the one most developers encounter first. Create a file called chatgpt_basic.py:
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "What is an API key and why do I need one?"}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)
Run it with python chatgpt_basic.py. You should see a clear, helpful explanation printed to your terminal. Congratulations — you just made your first LLM API call.
Key parameters to understand:
- model: Which model to use. gpt-4o is the current best balance of speed, quality, and cost. gpt-3.5-turbo is cheaper but less capable.
- messages: The conversation history array. Each message has a role (system, user, or assistant) and content.
- temperature: Controls randomness. 0 means deterministic; 1.0 means more creative. 0.7 is a solid default for conversational bots.
- max_tokens: The maximum length of the response. 1 token ≈ 0.75 words in English.
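The 4-characters-per-token heuristic is handy for quick sanity checks before you send a request. Here is a rough sketch based on that approximation and the GPT-4o input price quoted above (for exact counts, OpenAI's tiktoken library tokenizes precisely; this is only an estimate):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~1 token per 4 characters of English text."""
    return max(1, len(text) // 4)

def estimate_cost_usd(text: str, price_per_1k_tokens: float = 0.005) -> float:
    """Estimated input cost at the guide's GPT-4o price of $0.005/1K tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

prompt = "What is an API key and why do I need one?"
print(estimate_tokens(prompt))              # → 10
print(f"{estimate_cost_usd(prompt):.6f}")   # a tiny fraction of a cent
```

At this scale a single question costs thousandths of a cent; costs only become noticeable once long histories are resent on every request.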
Step 3: Make Your First Claude API Call
Now let’s do the same thing with the Anthropic Claude API. Create claude_basic.py:
import os
from dotenv import load_dotenv
import anthropic
load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "What is an API key and why do I need one?"}
    ]
)

print(response.content[0].text)
Notice the structural differences from the OpenAI API:
- The system prompt is a separate parameter in Claude, not part of the messages array.
- The response structure uses response.content[0].text instead of response.choices[0].message.content.
- In Claude, max_tokens is a required parameter, not an optional one.
Both APIs follow the same mental model — you send a conversation, you get a response — but the implementation details differ. This is exactly why building a wrapper that supports both is valuable.
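To make the difference concrete, here is a small helper (an illustrative sketch, not part of either SDK) that converts an OpenAI-style messages list into the separate system prompt and messages list that Claude expects:

```python
def openai_to_anthropic(messages):
    """Split an OpenAI-style messages list into Claude's shape:
    system messages become the separate `system` parameter,
    while user/assistant turns pass through unchanged."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return " ".join(system_parts), chat

system, chat = openai_to_anthropic([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is an API key?"},
])
print(system)  # → You are a helpful coding assistant.
print(chat)    # → [{'role': 'user', 'content': 'What is an API key?'}]
```

The unified class in the next step does essentially this translation internally.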
Step 4: Build a Unified Chatbot Class
Now let’s build something practical: a single chatbot class that works with either provider. Create chatbot.py:
import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
load_dotenv()
class AIChatbot:
    def __init__(self, provider="openai", system_prompt="You are a helpful assistant."):
        self.provider = provider
        self.system_prompt = system_prompt
        self.conversation_history = []

        if provider == "openai":
            self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
            self.model = "gpt-4o"
        elif provider == "anthropic":
            self.client = anthropic.Anthropic(
                api_key=os.getenv("ANTHROPIC_API_KEY")
            )
            self.model = "claude-sonnet-4-20250514"
        else:
            raise ValueError(f"Unknown provider: {provider}")

    def send_message(self, user_message):
        self.conversation_history.append(
            {"role": "user", "content": user_message}
        )
        if self.provider == "openai":
            response = self._call_openai()
        else:
            response = self._call_anthropic()
        self.conversation_history.append(
            {"role": "assistant", "content": response}
        )
        return response

    def _call_openai(self):
        messages = [
            {"role": "system", "content": self.system_prompt}
        ] + self.conversation_history
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=1024,
            temperature=0.7
        )
        return response.choices[0].message.content

    def _call_anthropic(self):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=self.system_prompt,
            messages=self.conversation_history
        )
        return response.content[0].text

    def reset(self):
        self.conversation_history = []
This class handles the differences between the two APIs internally. Your application code just calls send_message() regardless of which provider is active. The conversation history is maintained automatically, so the chatbot remembers context across multiple exchanges.
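The history-management pattern is worth seeing in isolation. This stripped-down stand-in makes no API calls (the echoed reply is a placeholder for a real model response), but it shows exactly how each exchange appends one user turn and one assistant turn:

```python
class FakeChatbot:
    """Mimics AIChatbot's history handling with a canned reply."""
    def __init__(self):
        self.conversation_history = []

    def send_message(self, user_message):
        self.conversation_history.append({"role": "user", "content": user_message})
        reply = f"You said: {user_message}"  # a real bot would call the API here
        self.conversation_history.append({"role": "assistant", "content": reply})
        return reply

bot = FakeChatbot()
bot.send_message("hello")
bot.send_message("how are you?")
print(len(bot.conversation_history))  # → 4 (two user turns, two assistant turns)
```

Because the full history is resent on every call, the list grows by two entries per exchange; this is also why token costs climb on long conversations (see Step 9).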
Step 5: Add a Terminal Chat Interface
Let’s make the chatbot interactive. Create main.py:
from chatbot import AIChatbot

def main():
    print("Choose your AI provider:")
    print("1. OpenAI (ChatGPT)")
    print("2. Anthropic (Claude)")
    choice = input("Enter 1 or 2: ").strip()
    provider = "openai" if choice == "1" else "anthropic"

    bot = AIChatbot(
        provider=provider,
        system_prompt="You are a friendly and knowledgeable assistant. "
                      "Keep responses concise but thorough."
    )

    print(f"\nChatbot ready ({provider}). Type 'quit' to exit.")
    print("Type 'switch' to change provider.")
    print("Type 'reset' to clear conversation history.\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "quit":
            print("Goodbye!")
            break
        if user_input.lower() == "reset":
            bot.reset()
            print("Conversation history cleared.\n")
            continue
        if user_input.lower() == "switch":
            new_provider = (
                "anthropic" if bot.provider == "openai" else "openai"
            )
            bot = AIChatbot(provider=new_provider,
                            system_prompt=bot.system_prompt)
            print(f"Switched to {new_provider}.\n")
            continue
        response = bot.send_message(user_input)
        print(f"\nBot: {response}\n")

if __name__ == "__main__":
    main()
Run it with python main.py and you have a working chatbot that can switch between ChatGPT and Claude mid-conversation.
Step 6: Add Streaming Responses
Real chatbots don’t make you wait for the entire response before showing anything. Streaming lets you display tokens as they arrive, creating a much better user experience. Here’s how to add streaming to both providers:
def _call_openai_stream(self):
    messages = [
        {"role": "system", "content": self.system_prompt}
    ] + self.conversation_history
    stream = self.client.chat.completions.create(
        model=self.model,
        messages=messages,
        max_tokens=1024,
        temperature=0.7,
        stream=True
    )
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            token = chunk.choices[0].delta.content
            print(token, end="", flush=True)
            full_response += token
    print()  # newline after streaming completes
    return full_response

def _call_anthropic_stream(self):
    full_response = ""
    with self.client.messages.stream(
        model=self.model,
        max_tokens=1024,
        system=self.system_prompt,
        messages=self.conversation_history
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    print()
    return full_response
Streaming is essential for production chatbots. A response that takes 3 seconds to generate feels instant when the first token appears in 200ms. Both OpenAI and Anthropic support streaming with minimal code changes — you just add stream=True and iterate over chunks instead of waiting for the complete response.
Step 7: Add Error Handling and Rate Limiting
Production chatbots need to handle failures gracefully. API calls can fail for many reasons — rate limits, network issues, invalid inputs, or server outages. Here’s a robust error handling pattern:
import time
import openai
import anthropic  # the exception classes live on the top-level packages

def send_message_safe(self, user_message, max_retries=3):
    self.conversation_history.append(
        {"role": "user", "content": user_message}
    )
    for attempt in range(max_retries):
        try:
            if self.provider == "openai":
                response = self._call_openai()
            else:
                response = self._call_anthropic()
            self.conversation_history.append(
                {"role": "assistant", "content": response}
            )
            return response
        except (openai.RateLimitError, anthropic.RateLimitError):
            wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except (openai.APIConnectionError,
                anthropic.APIConnectionError):
            print("Connection error. Retrying...")
            time.sleep(1)
        except Exception as e:
            self.conversation_history.pop()  # remove failed message
            raise e
    self.conversation_history.pop()
    raise Exception("Max retries exceeded")
The exponential backoff pattern (waiting 1s, then 2s, then 4s) is the industry-standard approach for handling rate limits. Both OpenAI and Anthropic enforce rate limits based on tokens per minute and requests per minute, and hitting them doesn't mean anything is broken — it just means you need to slow down briefly.
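The retry logic is easy to test in isolation. In this self-contained sketch, flaky_call is a hypothetical stand-in for an API request that fails twice before succeeding, and the sleep function is injectable so the test runs instantly instead of actually waiting:

```python
def retry_with_backoff(func, max_retries=3, sleep=lambda s: None):
    """Call func(), retrying on RuntimeError with exponential backoff.
    Returns (result, list_of_wait_times). `sleep` is injectable for tests."""
    waits = []
    for attempt in range(max_retries):
        try:
            return func(), waits
        except RuntimeError:
            wait_time = 2 ** attempt  # 1s, then 2s, then 4s
            waits.append(wait_time)
            sleep(wait_time)
    raise Exception("Max retries exceeded")

calls = {"n": 0}
def flaky_call():  # hypothetical stand-in: fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result, waits = retry_with_backoff(flaky_call)
print(result, waits)  # → ok [1, 2]
```

Injecting the sleep function is a small design choice that pays off: the same code runs in production with time.sleep and in tests with a no-op.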
Step 8: Build a Simple Web Interface with Flask
A terminal chatbot is great for development, but most users expect a web interface. Here’s a minimal Flask application that serves a chat UI:
pip install flask
Create app.py:
from flask import Flask, request, jsonify, render_template_string
from chatbot import AIChatbot

app = Flask(__name__)
bot = AIChatbot(provider="anthropic",
                system_prompt="You are a helpful assistant.")

HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head><title>AI Chatbot</title></head>
<body>
  <h1>AI Chatbot</h1>
  <div id="log"></div>
  <input id="msg" size="60"> <button onclick="send()">Send</button>
  <script>
    async function send() {
      const input = document.getElementById("msg");
      const res = await fetch("/chat", {
        method: "POST",
        headers: {"Content-Type": "application/json"},
        body: JSON.stringify({message: input.value})
      });
      const data = await res.json();
      document.getElementById("log").innerHTML +=
        "<p><b>You:</b> " + input.value + "</p>" +
        "<p><b>Bot:</b> " + data.response + "</p>";
      input.value = "";
    }
  </script>
</body>
</html>
"""

@app.route('/')
def home():
    return render_template_string(HTML_TEMPLATE)

@app.route('/chat', methods=['POST'])
def chat():
    msg = request.json.get('message', '')
    response = bot.send_message(msg)
    return jsonify({'response': response})

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Run python app.py and open http://localhost:5000 in your browser. You now have a web-based AI chatbot running locally.
Step 9: Manage Conversation Memory and Token Limits
As conversations grow longer, you’ll hit token limits. GPT-4o supports up to 128K tokens of context, and Claude supports up to 200K tokens, but sending long conversations is expensive. Here’s a practical approach to managing conversation length:
def trim_history(self, max_messages=20):
    """Keep only the most recent messages to control costs."""
    if len(self.conversation_history) > max_messages:
        # Always keep the most recent messages
        self.conversation_history = (
            self.conversation_history[-max_messages:]
        )

def get_token_estimate(self):
    """Rough estimate: 1 token per 4 characters in English."""
    total_chars = sum(
        len(m["content"]) for m in self.conversation_history
    )
    return total_chars // 4
For production applications, consider implementing a summarization strategy: when the conversation exceeds a threshold, use the LLM to summarize older messages into a compact context block, then replace the old messages with the summary. This preserves important context while keeping token usage under control.
A practical threshold is 50 messages or approximately 10,000 tokens. At that point, summarize the first 40 messages into a 500-token summary and keep the latest 10 messages verbatim. This approach can reduce costs by 60-80% on long conversations without noticeable quality loss.
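That strategy can be sketched as follows. Here summarize_fn is a placeholder; in production you would make a cheap LLM call (e.g. to GPT-3.5 Turbo or Claude Haiku) to produce the summary:

```python
def compact_history(history, summarize_fn, threshold=50, keep_recent=10):
    """If history exceeds `threshold` messages, replace everything except
    the last `keep_recent` messages with a single summary message."""
    if len(history) <= threshold:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize_fn(old)  # placeholder for an LLM summarization call
    summary_msg = {"role": "user",
                   "content": f"Summary of earlier conversation: {summary}"}
    return [summary_msg] + recent

# Usage with a trivial stand-in summarizer:
history = [{"role": "user", "content": f"msg {i}"} for i in range(60)]
compacted = compact_history(history, lambda msgs: f"{len(msgs)} earlier messages")
print(len(compacted))  # → 11 (one summary message + ten recent messages)
```

Note the summary is injected as a user-role message so the structure stays valid for both APIs, which expect alternating user/assistant turns.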
Step 10: Deploy Your Chatbot
Once your chatbot works locally, you have several deployment options:
Option A: Deploy with Docker
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:5000"]
**Option B: Deploy to a cloud platform**
- Railway / Render: Free tier available, push-to-deploy from GitHub, ideal for prototypes.
- AWS Lambda + API Gateway: Pay-per-request pricing, good for low-traffic bots. Expect ~$0.20 per million requests plus API costs.
- Google Cloud Run: Auto-scaling containers, generous free tier (2 million requests/month).
Regardless of platform, always use environment variables for API keys in production. Never hardcode credentials.
ChatGPT API vs Claude API: Key Differences
| Feature | OpenAI (ChatGPT) | Anthropic (Claude) |
|---|---|---|
| System Prompt | Inside messages array | Separate parameter |
| Latest Model | GPT-4o | Claude Sonnet 4 |
| Max Context | 128K tokens | 200K tokens |
| Input Pricing | $0.005/1K tokens (4o) | $0.003/1K tokens (Sonnet) |
| Output Pricing | $0.015/1K tokens (4o) | $0.015/1K tokens (Sonnet) |
| Streaming | stream=True | .messages.stream() |
| SDK Package | openai | anthropic |
| Rate Limits (Free) | 3 RPM | 5 RPM |
| Strengths | Ecosystem, plugins, fine-tuning | Long context, instruction following, safety |
Common Mistakes and How to Avoid Them
1. Exposing API Keys in Client-Side Code
Never put your API key in JavaScript that runs in the browser. Anyone can open DevTools and copy it. Instead, always make API calls from your backend server. Your frontend sends user messages to your server, and your server calls the AI API. This adds one network hop but protects your credentials completely.
2. Ignoring Token Costs on Long Conversations
Every message you send includes the entire conversation history. A 50-message conversation might be sending 15,000 tokens per request — and you’re paying for all of them every single time. Implement conversation trimming or summarization from day one. A user who chats for an hour could easily generate $2-5 in API costs without token management. With proper trimming, you can keep that under $0.50.
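A quick back-of-the-envelope check, using the GPT-4o input price of $0.005/1K tokens quoted earlier in this guide:

```python
def request_cost(history_tokens, price_per_1k=0.005):
    """Input cost of one request that resends the whole history."""
    return history_tokens / 1000 * price_per_1k

# A 50-message conversation resending ~15,000 tokens per request:
per_request = request_cost(15_000)
print(f"${per_request:.3f} per request")            # → $0.075 per request
print(f"${per_request * 50:.2f} over 50 requests")  # → $3.75 over 50 requests
```

That $3.75 is input cost alone; output tokens are billed on top of it, which is how a single chatty hour lands in the $2-5 range.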
3. Using the Wrong Model for the Job
Don’t default to GPT-4o or Claude Opus for every use case. If your chatbot answers simple FAQ-style questions, GPT-3.5 Turbo or Claude Haiku will respond 3-5x faster at 10-20x lower cost. Reserve the premium models for tasks that genuinely need their reasoning capabilities — complex analysis, code generation, or nuanced writing. You can even route different types of queries to different models dynamically.
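Dynamic routing can start as simply as a keyword check. In this sketch the hint keywords, length threshold, and model names are all illustrative choices, not a recommendation from either provider:

```python
COMPLEX_HINTS = ("analyze", "debug", "write code", "explain why", "compare")

def pick_model(user_message):
    """Route to a premium model only when the query looks complex.
    Hints and model names are illustrative; substitute your own."""
    text = user_message.lower()
    if any(hint in text for hint in COMPLEX_HINTS) or len(text) > 500:
        return "gpt-4o"          # premium: reasoning-heavy queries
    return "gpt-3.5-turbo"       # cheap: FAQ-style questions

print(pick_model("What are your opening hours?"))           # → gpt-3.5-turbo
print(pick_model("Analyze this stack trace and debug it"))  # → gpt-4o
```

More sophisticated routers use a cheap classifier model to triage queries, but a keyword heuristic is a reasonable first pass.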
4. Not Setting a System Prompt
A chatbot without a system prompt is like an employee without a job description. The system prompt is where you define personality, boundaries, knowledge scope, and response format. Be specific: “You are a customer support agent for Acme Corp. Only answer questions about our products listed at acme.com/products. If asked about competitors, politely redirect to our products.” Vague prompts produce vague results.
5. Sending User Input Directly Without Validation
Always validate and sanitize user input before sending it to the API. Set reasonable maximum length limits (e.g., 2,000 characters per message). Check for empty messages. This prevents accidental or intentional abuse that could rack up your API bill. A single request with a 50,000-character message could cost several dollars depending on the model.
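A minimal validation gate along those lines (the 2,000-character limit is the guide's suggested default; tune it for your use case):

```python
MAX_MESSAGE_CHARS = 2000

def validate_message(raw):
    """Return a cleaned message, or raise ValueError if it's unusable."""
    msg = raw.strip()
    if not msg:
        raise ValueError("Empty message")
    if len(msg) > MAX_MESSAGE_CHARS:
        raise ValueError(
            f"Message too long ({len(msg)} chars, max {MAX_MESSAGE_CHARS})"
        )
    return msg

print(validate_message("  Hello!  "))  # → Hello!
```

Call this in your /chat endpoint before bot.send_message() so oversized or empty input never reaches the API.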
Frequently Asked Questions
How much does it cost to run an AI chatbot?
For a small-scale chatbot handling 100 conversations per day with an average of 10 messages each, expect to spend $5-15/month using GPT-4o or Claude Sonnet. Using cheaper models like GPT-3.5 Turbo or Claude Haiku, that drops to $0.50-2/month. The main cost driver is conversation length, not the number of users. A single long conversation with 100 back-and-forth messages costs more than 50 short conversations with 2 messages each.
Can I use both ChatGPT and Claude in the same application?
Absolutely — that’s exactly what this guide teaches. Many production applications use multiple models strategically. For example, you might use Claude for tasks requiring long context or careful instruction following, and GPT-4o for tasks where the OpenAI ecosystem (function calling, vision, DALL-E) adds value. The unified chatbot class in Step 4 makes switching between providers a one-line change.
Do I need a GPU or special hardware?
No. When using APIs, all the heavy computation happens on OpenAI’s or Anthropic’s servers. Your application just sends HTTP requests and receives text responses. A $5/month VPS with 512MB of RAM can comfortably run an API-based chatbot serving hundreds of concurrent users. You only need GPUs if you’re running open-source models locally (like Llama or Mistral), which is a completely different approach.
How do I make my chatbot remember information across sessions?
The APIs themselves are stateless — they don’t remember previous conversations. To add memory across sessions, store conversation histories in a database (PostgreSQL, SQLite, or even a JSON file for prototypes). When a returning user sends a message, load their previous conversation history and include it in the API request. For longer-term memory, extract key facts from conversations and store them as a user profile that gets injected into the system prompt.
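Here is a minimal SQLite-backed store using Python's built-in sqlite3 module. It is a sketch: the table name, column names, and JSON-blob approach are illustrative choices, and a production system would likely store one row per message instead:

```python
import json
import sqlite3

class SessionStore:
    """Persist per-user conversation histories as JSON blobs in SQLite."""
    def __init__(self, path=":memory:"):  # use a file path in production
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(user_id TEXT PRIMARY KEY, history TEXT)"
        )

    def save(self, user_id, history):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
            (user_id, json.dumps(history)),
        )
        self.conn.commit()

    def load(self, user_id):
        row = self.conn.execute(
            "SELECT history FROM sessions WHERE user_id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []

store = SessionStore()
store.save("alice", [{"role": "user", "content": "hi"}])
print(store.load("alice"))  # → [{'role': 'user', 'content': 'hi'}]
print(store.load("bob"))    # → []
```

On each incoming message, load the user's history into bot.conversation_history, process the message, and save the updated history back.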
Is it safe to build a customer-facing chatbot with these APIs?
Yes, with proper guardrails. Both OpenAI and Anthropic have built-in content filters, but you should add your own layer of protection. Use the system prompt to define strict boundaries for what the bot should and shouldn’t discuss. Implement input validation to reject obviously abusive messages. Log all conversations for review. And critically, never give the chatbot access to actions it shouldn’t take (like database modifications or payment processing) without human approval in the loop.
Summary and Next Steps
Here’s what you’ve accomplished in this guide:
- Set up API access for both OpenAI and Anthropic
- Made standalone API calls to each provider
- Built a unified chatbot class that abstracts away provider differences
- Added a terminal chat interface with provider switching
- Implemented streaming responses for real-time output
- Added production-ready error handling with exponential backoff
- Created a web interface using Flask
- Learned conversation memory management and cost optimization
- Explored deployment options for making your chatbot publicly accessible
Where to go from here:
- Add function calling / tool use — Let your chatbot perform actions like searching databases, calling external APIs, or processing files. Both OpenAI and Anthropic support this feature.
- Implement RAG (Retrieval-Augmented Generation) — Connect your chatbot to a knowledge base so it can answer questions about your specific documents, products, or data.
- Build a Slack or Discord bot — Integrate your chatbot into team communication tools using their respective APIs and webhooks.
- Add voice input/output — Use the OpenAI Whisper API for speech-to-text and a TTS API for text-to-speech to create a voice-enabled chatbot.
- Fine-tune for your use case — Once you have enough conversation data, fine-tune a model to match your specific domain and tone without needing lengthy system prompts.