Google DeepMind

Gemini 1.5 Pro

1M Context
Multimodal
Grounding

Gemini 1.5 Pro features a breakthrough 1-million-token context window, enabling analysis of entire codebases, hour-long videos, and massive document collections in a single prompt.

Get API Access

API Performance

Latency: 400ms avg time to first token
Context: 1M tokens context window
Input: $1.25 per 1M input tokens (up to 128K)
Output: $5.00 per 1M output tokens
Uptime: 99.9% 99.9%

About the model

Gemini 1.5 Pro — Infinite Context

Gemini 1.5 Pro is Google DeepMind’s most capable model, featuring a revolutionary 1-million-token context window powered by a new Mixture-of-Experts (MoE) architecture. It can process and reason over entire codebases, hour-long videos, or thousands of documents simultaneously.

With native multimodal capabilities across text, image, video, and audio, and Google Search grounding, it’s uniquely positioned for knowledge-intensive enterprise applications.

Input / Output

Input

Text
Images
Documents
Video

Output

Text
Code
Structured Data

Core capabilities

What Gemini 1.5 Pro can do

Use cases

Designed for scale

Large Codebase Analysis
Video Content Analysis
Research with Web Grounding
Multi-document QA
Enterprise RAG Systems
Long-form Content Creation

Pricing

API Pricing Comparison

Gemini 1.5 Pro uses tiered pricing based on context size. Prompts up to 128K tokens are billed at a lower rate.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
Google AI Studio	Global	400ms	60 tok/s	99.9%	$1.25/1M (<128K)	$5.00/1M	1M
Google Vertex AI	Multi-region	420ms	55 tok/s	99.95%	$1.25/1M	$5.00/1M	1M
LLMAPI BEST	Global	370ms	80 tok/s	99.99%	$1.10/1M	$4.40/1M	1M

Performance benchmarks

Technical Specifications

Metric	Specification	Gemini 1.5 Pro
Context window	1,000,000 tokens	1,000,000 tokens
Max output	8,192 tokens	8,192 tokens
Multimodal	Text, Image, Video, Audio	Text, Image, Video, Audio
Architecture	MoE (Mixture of Experts)	Distilled MoE

Enterprise scale, Google reliability

1M: Context tokens
99.9%: Uptime SLA
4+: Modalities
Google: Search grounding

Start with Gemini 1.5 Pro

Architecture

Mixture-of-Experts Design

Gemini 1.5 Pro uses a sparse Mixture-of-Experts architecture for efficient large-context processing.

Sparse MoE Architecture

Only activates relevant expert networks per token, enabling massive scale with controlled compute.
Core
Long Context Attention

Novel attention mechanism enabling efficient 1M token processing without quadratic memory cost.
Context
Google Search Integration

Native retrieval-augmented generation using real-time Google Search results.
Grounding

Decision guide

Is Gemini 1.5 Pro right for you?

Use Gemini 1.5 Pro when

You need to process massive amounts of content (1M+ tokens)
Your use case involves video analysis
You need real-time web search grounding
You're working with entire codebases or large document sets

Consider alternatives when

Latency is critical (GPT-4o or Claude are faster)
You need the highest accuracy for pure text reasoning
You require a well-established API ecosystem

FAQ

Frequently asked questions

What is the 1 million token context window?

1 million tokens is roughly equivalent to 700,000 words of text, 11 hours of audio, 1 hour of video, or 30,000 lines of code. This allows Gemini 1.5 Pro to reason over entire repositories or document collections at once.
How does Google Search grounding work?

When enabled, Gemini 1.5 Pro can query Google Search in real-time to retrieve current information, then ground its responses in verified, up-to-date sources.
Is Gemini 1.5 Pro available through Google Cloud?

Yes, it's available through Google AI Studio (for development) and Google Vertex AI (for production enterprise use with enhanced SLAs and data residency options).

Ready to integrate Gemini 1.5 Pro?

Start Free Trial