# Gemini API Setup Guide: From API Key to Your First Multimodal Request
Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available today. Whether you want to generate text, analyze images, or process audio, Gemini handles it all through a single unified API. This step-by-step guide walks you through everything — from getting your API key in Google AI Studio to sending your first multimodal request using the Python SDK.
## Step 1: Get Your Gemini API Key from Google AI Studio
Before writing any code, you need an API key. Google AI Studio provides a free tier that’s generous enough for development and prototyping.
- Visit Google AI Studio at [aistudio.google.com](https://aistudio.google.com)
- Sign in with your Google account
- Click “Get API Key” in the left sidebar
- Click “Create API Key” and select an existing Google Cloud project or create a new one
- Copy the generated key immediately — you won’t be able to view it again in full
**Important:** The free tier includes up to 15 requests per minute for Gemini 2.0 Flash and 2 requests per minute for Gemini 2.5 Pro. For production workloads, enable billing on your Google Cloud project.
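To stay under those per-minute quotas on the client side, you can space out requests with a small limiter. The sketch below is illustrative, not part of the SDK; the 15 RPM figure is the free-tier limit mentioned above.

```python
import time

class RateLimiter:
    """Spaces out calls so they never exceed a requests-per-minute quota."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.clock = clock              # injectable for testing
        self.sleep = sleep
        self._last = None

    def wait(self):
        """Block until it is safe to send the next request."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

# Usage: create one limiter and call limiter.wait() before each API request.
limiter = RateLimiter(rpm=15)
```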
## Step 2: Install the Google Generative AI Python SDK

The official Python SDK is the fastest way to interact with the Gemini API. You need Python 3.9 or higher.

### Create a Virtual Environment (Recommended)

```bash
python -m venv gemini-env

# On macOS/Linux
source gemini-env/bin/activate

# On Windows
gemini-env\Scripts\activate
```

### Install the SDK

```bash
pip install google-genai
```
This installs the latest google-genai package, which is the unified SDK for Gemini models (replacing the older google-generativeai package).
Verify the installation:
```bash
python -c "import google.genai; print('SDK installed successfully')"
```
## Step 3: Configure Your API Key

You have two options for providing your API key. Using an environment variable is the recommended approach.

### Option A: Environment Variable (Recommended)

```bash
# macOS/Linux
export GEMINI_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:GEMINI_API_KEY="YOUR_API_KEY"

# Windows CMD
set GEMINI_API_KEY=YOUR_API_KEY
```

### Option B: Inline in Code (Development Only)

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
```
**Never commit API keys to version control.** Add your key to a `.env` file and include `.env` in your `.gitignore`.
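If you go the `.env` route, the `python-dotenv` package is the usual choice (`pip install python-dotenv`, then call its `load_dotenv()`). As a dependency-free illustration of what it does, here is a minimal hand-rolled loader; it skips the quoting and multiline edge cases the real library handles:

```python
import os

def load_env_file(path=".env"):
    """Read simple KEY=VALUE lines into os.environ, skipping comments.

    A simplified stand-in for python-dotenv's load_dotenv(); existing
    environment variables are not overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_env_file()
# The key is now available exactly as in Option A:
# client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))
```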
## Step 4: Send Your First Text Request

Let’s verify everything works with a simple text generation call.

```python
from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in three sentences.",
)
print(response.text)
```
If you see a coherent response about quantum computing, your setup is complete.
## Step 5: Send Your First Multimodal Request

Gemini’s standout feature is native multimodal understanding. Here’s how to analyze an image with text in a single request.

### Analyze an Image from a URL

```python
from google import genai
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
image_path = "sample.jpg"
urllib.request.urlretrieve(image_url, image_path)

# Upload via the Files API and analyze
my_file = client.files.upload(file=image_path)
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        my_file,
        "Describe what you see in this image in detail. Identify the species if possible.",
    ],
)
print(response.text)
```
### Analyze a Local Image with Inline Data

```python
from google import genai
from google.genai import types
import os
import pathlib

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

image_bytes = pathlib.Path("your_photo.jpg").read_bytes()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Content(parts=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            types.Part.from_text(text="What objects are in this image? List them."),
        ])
    ],
)
print(response.text)
```
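The `mime_type` must match the actual file contents, or the API rejects the request with `400 INVALID_ARGUMENT`. A small helper (hypothetical, not part of the SDK) can derive it from the file extension using only the standard library:

```python
import mimetypes
import pathlib

def inline_part_args(path):
    """Return (bytes, mime_type) for a local file, guessing the MIME
    type from the extension so it matches what the API expects."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot guess MIME type for {path}")
    return pathlib.Path(path).read_bytes(), mime

# Usage with the inline-data example above:
# data, mime = inline_part_args("your_photo.jpg")
# types.Part.from_bytes(data=data, mime_type=mime)
```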
## Step 6: Explore Available Models

Choose the right model for your use case:

| Model | Best For | Context Window | Speed |
|---|---|---|---|
| `gemini-2.5-pro` | Complex reasoning, coding, analysis | 1M tokens | Moderate |
| `gemini-2.5-flash` | Balanced speed and quality | 1M tokens | Fast |
| `gemini-2.0-flash` | High-volume, low-latency tasks | 1M tokens | Very Fast |
| `gemini-2.0-flash-lite` | Cost-efficient, simple tasks | 1M tokens | Fastest |
You can list every model available to your key programmatically:

```python
for model in client.models.list():
    print(model.name)
```
## Step 7: Streaming Responses

For long outputs, streaming delivers tokens as they're generated rather than waiting for the full response.

```python
response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay on renewable energy.",
)
for chunk in response:
    print(chunk.text, end="", flush=True)
```
## Pro Tips for Power Users

- **Use system instructions** — Pass a `config` parameter with `system_instruction` to set persistent behavior across the conversation without repeating context in every prompt.
- **Batch with the Files API** — Upload large files (up to 2GB) via `client.files.upload()` once, then reference them across multiple requests using the returned file object. Files persist for 48 hours.
- **Control output format** — Set `response_mime_type="application/json"` in the config and provide a `response_schema` to get structured JSON output every time.
- **Token counting** — Use `client.models.count_tokens()` before sending large prompts to estimate costs and stay within limits.
- **Safety settings** — Adjust safety thresholds per request using `safety_settings` in the config if the defaults are too restrictive for your legitimate use case.
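The first and third tips combine naturally. Below is a configuration sketch using the `types.GenerateContentConfig` names from the `google-genai` SDK; treat the exact schema syntax as an assumption to verify against the current reference docs:

```python
from google.genai import types

# Configuration sketch: persistent behavior plus structured JSON output.
config = types.GenerateContentConfig(
    system_instruction="You are a concise botany expert.",
    response_mime_type="application/json",
    response_schema={
        "type": "object",
        "properties": {
            "species": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["species"],
    },
)

# Pass it alongside the prompt:
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents="Identify the plant in this photo.",
#     config=config,
# )
```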
## Troubleshooting Common Errors

| Error | Cause | Solution |
|---|---|---|
| `403 PERMISSION_DENIED` | Invalid or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| `429 RESOURCE_EXHAUSTED` | Rate limit exceeded | Implement exponential backoff or upgrade to a paid tier |
| `ModuleNotFoundError: google.genai` | SDK not installed or wrong package | Run `pip install google-genai` (not `google-generativeai`) |
| `400 INVALID_ARGUMENT` | Unsupported file type or malformed request | Verify the MIME type matches the file content and check the request structure |
| `500 INTERNAL` | Server-side issue | Wait and retry. If persistent, check the Google Cloud status dashboard |
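For the `429` case, exponential backoff can be as simple as the sketch below. The retry-on-string-match condition is a placeholder; a real integration should catch the SDK's specific rate-limit exception, which you should confirm against the current `google-genai` error types:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a zero-argument callable on rate-limit errors, doubling the
    delay each attempt and adding jitter to avoid synchronized retries."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Placeholder check; match the SDK's real 429 exception instead.
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage:
# text = with_backoff(lambda: client.models.generate_content(
#     model="gemini-2.0-flash", contents="Hello").text)
```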
## Frequently Asked Questions

### Is the Gemini API free to use?
Yes, Google offers a free tier through Google AI Studio with rate limits (e.g., 15 RPM for Gemini 2.0 Flash). This is sufficient for development and testing. For production workloads with higher rate limits and SLA guarantees, you need to enable billing on your Google Cloud project and use the paid tier.
### What file types does Gemini support for multimodal input?
Gemini supports a wide range of file types including images (JPEG, PNG, GIF, WebP), video (MP4, MPEG, MOV, AVI, WebM), audio (MP3, WAV, AIFF, FLAC, OGG), and documents (PDF, plain text). You can upload files up to 2GB through the Files API. For inline data, stick to files under 20MB.
### What is the difference between the google-genai and google-generativeai packages?

The `google-genai` package is the newer, unified SDK that provides a cleaner API with the `genai.Client` interface. The older `google-generativeai` package uses the `genai.configure()` pattern and is in maintenance mode. New projects should use `google-genai`, as it supports all the latest features and models, including the Gemini 2.0 and 2.5 series.