Gemini API Setup Guide: Get Your API Key, Install Python SDK & Send Your First Multimodal Request


Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available today. Whether you want to generate text, analyze images, or process audio, Gemini handles it all through a single unified API. This step-by-step guide walks you through everything — from getting your API key in Google AI Studio to sending your first multimodal request using the Python SDK.

Step 1: Get Your Gemini API Key from Google AI Studio

Before writing any code, you need an API key. Google AI Studio provides a free tier that’s generous enough for development and prototyping.

  • Visit Google AI Studio at aistudio.google.com
  • Sign in with your Google account
  • Click “Get API Key” in the left sidebar
  • Click “Create API Key” and select an existing Google Cloud project or create a new one
  • Copy the generated key immediately — you won’t be able to view it again in full

Important: The free tier includes up to 15 requests per minute for Gemini 2.0 Flash and 2 requests per minute for Gemini 2.5 Pro. For production workloads, enable billing on your Google Cloud project.
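If you stay on the free tier, a simple client-side throttle keeps you under the per-minute cap instead of discovering it via 429 errors. Below is a stdlib-only sketch; `RateLimiter` is a hypothetical helper, not part of any Google SDK.

```python
import time

class RateLimiter:
    """Client-side throttle: allow at most `rpm` calls per 60-second window."""

    def __init__(self, rpm):
        self.min_interval = 60.0 / rpm  # seconds between consecutive calls
        self.last_call = None

    def wait(self):
        """Block just long enough to respect the configured rate."""
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()
```

Call `limiter.wait()` immediately before each API request; with `RateLimiter(rpm=15)` that spaces requests at least 4 seconds apart, matching the free-tier Flash limit.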

Step 2: Install the Google Generative AI Python SDK

The official Python SDK is the fastest way to interact with the Gemini API. You need Python 3.9 or higher.

# Create and activate a virtual environment
python -m venv gemini-env

# On macOS/Linux
source gemini-env/bin/activate

# On Windows
gemini-env\Scripts\activate

# Install the SDK
pip install google-genai

This installs the latest google-genai package, which is the unified SDK for Gemini models (replacing the older google-generativeai package).

Verify the installation:

python -c "import google.genai; print('SDK installed successfully')"
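If the import succeeds but you want to confirm which package is actually installed — for example, to rule out the older google-generativeai — you can check without importing the SDK at all. A stdlib-only sketch (`installed_version` is an illustrative helper; `importlib.metadata` ships with Python 3.8+):

```python
from importlib import metadata

def installed_version(package="google-genai"):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(installed_version() or "google-genai is not installed")
```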

Step 3: Configure Your API Key

You have two options for providing your API key. Using an environment variable is the recommended approach.

Option A: Environment Variable (Recommended)

# macOS/Linux
export GEMINI_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:GEMINI_API_KEY="YOUR_API_KEY"

# Windows CMD
set GEMINI_API_KEY=YOUR_API_KEY

Option B: Inline in Code (Development Only)

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

**Never commit API keys to version control.** Add your key to a .env file and include .env in your .gitignore.
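One way to follow that advice without extra dependencies is a tiny .env loader. This is a minimal sketch — in practice the python-dotenv package is the common choice, and `load_env` here is a hypothetical helper that handles only simple `KEY=VALUE` lines:

```python
import os

def load_env(path=".env"):
    """Read simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the real environment
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

if os.path.exists(".env"):
    load_env()

api_key = os.environ.get("GEMINI_API_KEY")
```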

Step 4: Send Your First Text Request

Let’s verify everything works with a simple text generation call.

from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in three sentences.",
)

print(response.text)

If you see a coherent response about quantum computing, your setup is complete.

Step 5: Send Your First Multimodal Request

Gemini’s standout feature is native multimodal understanding. Here’s how to analyze an image with text in a single request.

Analyze an Image from a URL

from google import genai
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
image_path = "sample.jpg"
urllib.request.urlretrieve(image_url, image_path)

# Upload via the Files API, then analyze
my_file = client.files.upload(file=image_path)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        my_file,
        "Describe what you see in this image in detail. Identify the species if possible.",
    ],
)

print(response.text)

Analyze a Local Image with Inline Data

from google import genai
from google.genai import types
import pathlib
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

image_bytes = pathlib.Path("your_photo.jpg").read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Content(parts=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            types.Part.from_text(text="What objects are in this image? List them.")
        ])
    ]
)

print(response.text)

Step 6: Explore Available Models

Choose the right model for your use case:

| Model | Best For | Context Window | Speed |
| --- | --- | --- | --- |
| gemini-2.5-pro | Complex reasoning, coding, analysis | 1M tokens | Moderate |
| gemini-2.5-flash | Balanced speed and quality | 1M tokens | Fast |
| gemini-2.0-flash | High-volume, low-latency tasks | 1M tokens | Very Fast |
| gemini-2.0-flash-lite | Cost-efficient, simple tasks | 1M tokens | Fastest |
List models programmatically:

for model in client.models.list():
    print(model.name)

Step 7: Streaming Responses

For long outputs, streaming delivers tokens as they're generated rather than waiting for the full response.

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay on renewable energy.",
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Pro Tips for Power Users

  • Use system instructions — Pass a config parameter with system_instruction to set persistent behavior across the conversation without repeating context in every prompt.
  • Batch with the Files API — Upload large files (up to 2GB) via client.files.upload() once, then reference them across multiple requests using the returned file object. Files persist for 48 hours.
  • Control output format — Set response_mime_type="application/json" in the config and provide a response_schema to get structured JSON output every time.
  • Token counting — Use client.models.count_tokens() before sending large prompts to estimate costs and stay within limits.
  • Safety settings — Adjust safety thresholds per request using safety_settings in the config if the defaults are too restrictive for your legitimate use case.
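For the token-counting tip above, `client.models.count_tokens()` is the authoritative source. If you just want a rough offline estimate before making an API call, a common rule of thumb — an approximation only, not Gemini's actual tokenizer — is about four characters per token:

```python
def estimate_tokens(text):
    """Very rough estimate: ~4 characters per token (heuristic, not the real tokenizer)."""
    return max(1, len(text) // 4)

prompt = "Explain quantum computing in three sentences."
print(estimate_tokens(prompt))  # ballpark only; use count_tokens() for the real number
```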

Troubleshooting Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| 403 PERMISSION_DENIED | Invalid or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded | Implement exponential backoff or upgrade to a paid tier |
| ModuleNotFoundError: google.genai | SDK not installed or wrong package | Run pip install google-genai (not google-generativeai) |
| 400 INVALID_ARGUMENT | Unsupported file type or malformed request | Verify the MIME type matches the file content and check the request structure |
| 500 INTERNAL | Server-side issue | Wait and retry. If persistent, check the Google Cloud status dashboard |
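For the 429 case, a generic retry-with-exponential-backoff wrapper looks like this. It is a sketch: `with_backoff` is a hypothetical helper, and in real code you would catch the SDK's specific rate-limit exception rather than a bare `Exception`.

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrap the API call in a zero-argument function or lambda, e.g. `with_backoff(lambda: client.models.generate_content(...))`.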

Frequently Asked Questions

Is the Gemini API free to use?

Yes, Google offers a free tier through Google AI Studio with rate limits (e.g., 15 RPM for Gemini 2.0 Flash). This is sufficient for development and testing. For production workloads with higher rate limits and SLA guarantees, you need to enable billing on your Google Cloud project and use the paid tier.

What file types does Gemini support for multimodal input?

Gemini supports a wide range of file types including images (JPEG, PNG, GIF, WebP), video (MP4, MPEG, MOV, AVI, WebM), audio (MP3, WAV, AIFF, FLAC, OGG), and documents (PDF, plain text). You can upload files up to 2GB through the Files API. For inline data, stick to files under 20MB.
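Before uploading, you can sanity-check a file's MIME type and choose between inline data and the Files API using only the standard library. `SUPPORTED_MIME` below is an illustrative subset of the formats listed above, and the 20MB threshold follows the same guidance:

```python
import mimetypes

# Illustrative subset of MIME types Gemini accepts for multimodal input
SUPPORTED_MIME = {
    "image/jpeg", "image/png", "image/gif", "image/webp",
    "audio/mpeg", "audio/wav", "video/mp4", "application/pdf", "text/plain",
}

INLINE_LIMIT = 20 * 1024 ** 2  # keep inline payloads under ~20MB

def detect_mime(path):
    """Guess the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(path)
    return mime

def choose_transport(size_bytes):
    """Inline data for small files; the Files API for anything larger."""
    return "inline" if size_bytes < INLINE_LIMIT else "files_api"
```

Passing the detected MIME type to `types.Part.from_bytes(data=..., mime_type=...)` avoids the 400 INVALID_ARGUMENT errors covered in the troubleshooting table.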

What is the difference between google-genai and google-generativeai packages?

The google-genai package is the newer, unified SDK that provides a cleaner API with the genai.Client interface. The older google-generativeai package uses the genai.configure() pattern and is in maintenance mode. New projects should use google-genai as it supports all the latest features and models including Gemini 2.0 and 2.5 series.
