# Gemini API Setup Complete Guide: From API Key to Your First Multimodal Request
Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available. This step-by-step guide walks you through getting your API key from Google AI Studio, installing the Python SDK, and sending your first text and multimodal requests — all in under 15 minutes.
## Step 1: Get Your Gemini API Key from Google AI Studio
- Visit Google AI Studio — Navigate to aistudio.google.com and sign in with your Google account.
- Click “Get API Key” — In the left sidebar, click the Get API Key button.
- Create API Key — Select Create API key in new project or choose an existing Google Cloud project. Google will provision a new project automatically if needed.
- Copy and Store Your Key — Copy the generated key immediately. Store it securely — you won’t be able to view it again in the console.

```
# Store as environment variable (recommended)

# Linux / macOS
export GEMINI_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:GEMINI_API_KEY="YOUR_API_KEY"

# Persist across sessions (Linux/macOS — add to .bashrc or .zshrc)
echo 'export GEMINI_API_KEY="YOUR_API_KEY"' >> ~/.bashrc
source ~/.bashrc
```

- Verify the Key — Run a quick curl test to confirm your key works:

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_API_KEY"
```

A JSON response listing available models confirms your key is active.
## Step 2: Install the Google Generative AI Python SDK
The official Python SDK simplifies interaction with the Gemini API.
- **Ensure Python 3.9+** is installed:

```bash
python --version
```

- **Install the SDK** via pip:

```bash
pip install -U google-genai
```

- **Verify the installation:**

```bash
python -c "from google import genai; print('SDK installed successfully')"
```
## Step 3: Send Your First Text Request
Start with a simple text generation call to confirm everything works end to end.
```python
from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain how neural networks learn in 3 sentences.",
)

print(response.text)
```
Expected output: A concise explanation of neural network learning in three sentences.
## Step 4: Send Your First Multimodal Request
Gemini’s true power lies in processing text, images, audio, and video together. Here’s how to analyze a local image with a text prompt.
```python
from google import genai
from google.genai import types
import os
import pathlib

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Load a local image file
image_path = pathlib.Path("sample.jpg")
image_data = image_path.read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "Describe this image in detail. What objects are visible?",
    ],
)

print(response.text)
```
### Analyzing an Image from a URL
```python
from google import genai
from google.genai import types
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download image bytes
image_url = "https://example.com/photo.jpg"
image_data = urllib.request.urlopen(image_url).read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "What is happening in this image?",
    ],
)

print(response.text)
```
## Step 5: Streaming Responses
For long outputs, streaming delivers tokens as they are generated, reducing perceived latency.
```python
from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay about climate change.",
)

for chunk in response:
    print(chunk.text, end="", flush=True)
```
## Step 6: Configure Generation Parameters
Fine-tune output quality with generation configuration.
```python
from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a creative product tagline for a smart water bottle.",
    config=types.GenerateContentConfig(
        temperature=0.9,
        top_p=0.95,
        max_output_tokens=256,
    ),
)

print(response.text)
```
| Parameter | Range | Purpose |
|---|---|---|
| temperature | 0.0 – 2.0 | Controls randomness. Lower = deterministic, higher = creative |
| top_p | 0.0 – 1.0 | Nucleus sampling threshold |
| max_output_tokens | 1 – model max | Limits response length |
| top_k | 1 – 40 | Limits token candidates per step |
## Available Models Reference
| Model | Best For | Context Window |
|---|---|---|
| gemini-2.0-flash | Fast, cost-effective general tasks | 1M tokens |
| gemini-2.0-flash-lite | Highest speed, lowest cost | 1M tokens |
| gemini-2.5-pro | Complex reasoning, coding | 1M tokens |
| gemini-2.5-flash | Balanced speed and thinking | 1M tokens |
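To confirm which of these model identifiers your API key can actually access, you can enumerate them with the SDK. A minimal sketch (the printed format may vary slightly by SDK version):

```python
from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# List the models available to this API key and print their identifiers.
for model in client.models.list():
    print(model.name)  # e.g. "models/gemini-2.0-flash"
```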
## Pro Tips

- **System instructions** — Use the system_instruction parameter in your config to define the model's persona or constraints (see the combined sketch after this list).
- **Batch with async** — Use client.aio.models.generate_content for async calls when processing multiple requests concurrently.
- **JSON mode** — Set response_mime_type="application/json" in your config to force structured JSON output — ideal for API pipelines.
- **Safety settings** — Customize safety thresholds per category using safety_settings in your config if defaults are too restrictive for your use case.
- **Token counting** — Call client.models.count_tokens() before large requests to estimate cost and stay within rate limits.
- **Caching** — For repeated context (like a large document), use context caching to reduce latency and cost on subsequent requests.
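Here is a minimal sketch combining the system-instruction, JSON-mode, and token-counting tips; the prompts are invented for illustration, and the config fields follow the same GenerateContentConfig pattern used in Step 6:

```python
from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Persona via system_instruction plus structured JSON output (illustrative prompt).
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="List three key benefits of code review.",
    config=types.GenerateContentConfig(
        system_instruction="You are a concise technical writer.",
        response_mime_type="application/json",
    ),
)
print(response.text)  # a JSON string

# Estimate token usage before sending a large request.
token_info = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents="A long document you plan to send...",
)
print(token_info.total_tokens)
```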
## Troubleshooting Common Errors
| Error | Cause | Solution |
|---|---|---|
| 400 API_KEY_INVALID | Incorrect or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded | Implement exponential backoff (see the sketch below this table) or upgrade to a paid tier for higher quotas |
| ModuleNotFoundError: google.genai | SDK not installed or wrong package | Run pip install -U google-genai (not google-generativeai, which is the legacy package) |
| 403 PERMISSION_DENIED | API not enabled for your project | Enable the Generative Language API in your Google Cloud Console |
| 500 INTERNAL | Transient server error | Retry after a few seconds. If persistent, check the Google Cloud Status Dashboard |
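For 429 RESOURCE_EXHAUSTED and transient 500 INTERNAL errors, a retry loop with exponential backoff is usually enough. Below is a minimal sketch; it catches exceptions broadly for brevity, so in real code narrow the except clause to the SDK's specific error classes (for example those in google.genai.errors):

```python
import os
import time

from google import genai

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

def generate_with_retry(prompt, max_retries=5):
    """Retry transient failures (e.g. 429/500) with exponential backoff."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.models.generate_content(
                model="gemini-2.0-flash",
                contents=prompt,
            )
        except Exception:  # assumption: replace with the SDK's specific error types
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, 8s, ...

response = generate_with_retry("Summarize the benefits of retries in one paragraph.")
print(response.text)
```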
## Frequently Asked Questions

### Is the Gemini API free to use?
Yes, the Gemini API offers a generous free tier through Google AI Studio. The free tier includes rate-limited access to models like Gemini 2.0 Flash. For production workloads requiring higher throughput, you can enable billing in Google Cloud and pay per token. Check the official pricing page for current rates per model.
### What file types does Gemini support for multimodal input?
Gemini supports a wide range of input types: JPEG, PNG, GIF, and WebP for images; MP3, WAV, FLAC, and OGG for audio; MP4, AVI, MOV, and MKV for video; and PDF for documents. You can combine multiple file types in a single request. The Files API handles uploads larger than 20MB, while inline data works for smaller files.
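For files above the roughly 20MB inline limit, upload through the Files API first and then reference the returned file object in the request. A minimal sketch, assuming a local sample_video.mp4 exists (the exact upload keyword argument may differ between SDK versions):

```python
from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Upload a large local file (hypothetical path) through the Files API.
uploaded = client.files.upload(file="sample_video.mp4")

# Pass the uploaded file object alongside a text prompt.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[uploaded, "Summarize what happens in this video."],
)
print(response.text)
```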
### What is the difference between google-genai and google-generativeai packages?
The google-genai package is the current, recommended SDK that uses a unified client pattern (genai.Client). The google-generativeai package is the older, legacy SDK with a different API surface. New projects should always use google-genai. If you are migrating from the legacy SDK, the main change is moving from genai.configure() and genai.GenerativeModel() to the client-based approach shown in this guide.
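For reference, here is the same text request in both styles; the legacy snippet assumes the deprecated google-generativeai package is installed and is shown only to illustrate the migration:

```python
# Legacy SDK (google-generativeai) -- older pattern
import google.generativeai as legacy_genai

legacy_genai.configure(api_key="YOUR_API_KEY")
model = legacy_genai.GenerativeModel("gemini-2.0-flash")
print(model.generate_content("Hello").text)

# Current SDK (google-genai) -- recommended client-based pattern
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
print(client.models.generate_content(model="gemini-2.0-flash", contents="Hello").text)
```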