Google DeepMind

Gemini 1.5 Pro

  • 1M Context
  • Multimodal
  • Grounding

Gemini 1.5 Pro features a breakthrough 1-million-token context window, enabling analysis of entire codebases, hour-long videos, and massive document collections in a single prompt.


Gemini 1.5 Pro: 1M-Token Context

Gemini 1.5 Pro is a highly capable Google DeepMind model with a 1-million-token context window, built on a sparse Mixture-of-Experts (MoE) architecture. It can process and reason over entire codebases, hour-long videos, or thousands of documents in a single prompt.

With native multimodal support for text, image, video, and audio, plus Google Search grounding, it is well suited to knowledge-intensive enterprise applications.

What Gemini 1.5 Pro can do

Designed for scale

  • Large Codebase Analysis
  • Video Content Analysis
  • Research with Web Grounding
  • Multi-document QA
  • Enterprise RAG Systems
  • Long-form Content Creation

API Pricing Comparison

Gemini 1.5 Pro uses tiered pricing based on context size. Prompts up to 128K tokens are billed at a lower rate.

Provider          Region        Latency  Throughput  Uptime  Input ($/1M)   Output ($/1M)  Context
Google AI Studio  Global        400ms    60 tok/s    99.9%   $1.25 (<128K)  $5.00          1M
Google Vertex AI  Multi-region  420ms    55 tok/s    99.95%  $1.25          $5.00          1M
LLMAPI (BEST)     Global        370ms    80 tok/s    99.99%  $1.10          $4.40          1M
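
The tiered pricing described above can be sketched as a small cost estimator. This is a rough illustration, not an official billing formula: the short-prompt rates come from the table, while the long-prompt rates and the rule that the whole request is billed at the tier set by prompt length are assumptions made for the example. The names `Rates` and `estimate_cost` are hypothetical.

```python
from dataclasses import dataclass

# Tier boundary stated on this page: prompts up to 128K tokens get the lower rate.
TIER_THRESHOLD_TOKENS = 128_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates listed in the table for prompts up to 128K tokens.
SHORT_PROMPT_RATES = Rates(input_per_million=1.25, output_per_million=5.00)

# ASSUMPTION: placeholder rates for prompts above 128K tokens.
# The page does not list them; replace with the provider's current figures.
LONG_PROMPT_RATES = Rates(input_per_million=2.50, output_per_million=10.00)


def estimate_cost(prompt_tokens: int, output_tokens: int,
                  short: Rates = SHORT_PROMPT_RATES,
                  long: Rates = LONG_PROMPT_RATES) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: the entire request is billed at the tier selected by the
    prompt length, rather than splitting tokens across tiers.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = short if prompt_tokens <= TIER_THRESHOLD_TOKENS else long
    return (prompt_tokens * rates.input_per_million
            + output_tokens * rates.output_per_million) / 1_000_000
```

For instance, `estimate_cost(100_000, 1_000)` prices a 100K-token prompt with a 1K-token reply at the lower tier, while a 200K-token prompt falls into the assumed higher tier.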

Technical Specifications

Specification   Gemini 1.5 Pro             Gemini 1.5 Flash
Context window  1,000,000 tokens           1,000,000 tokens
Max output      8,192 tokens               8,192 tokens
Multimodal      Text, Image, Video, Audio  Text, Image, Video, Audio
Architecture    MoE (Mixture of Experts)   Distilled MoE

Enterprise scale, Google reliability

  • 1M context tokens
  • 99.9% uptime SLA
  • 4+ modalities
  • Google Search grounding

Mixture-of-Experts Design

Gemini 1.5 Pro uses a sparse Mixture-of-Experts architecture for efficient large-context processing.

  • Sparse MoE Architecture (Core)

    Only activates relevant expert networks per token, enabling massive scale with controlled compute.

  • Long Context Attention (Context)

    Novel attention mechanism enabling efficient 1M token processing without quadratic memory cost.

  • Google Search Integration (Grounding)

    Native retrieval-augmented generation using real-time Google Search results.
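
The idea of activating only the relevant experts per token can be illustrated with a generic top-k routing sketch. This is a textbook-style toy in plain Python, not Gemini's actual router, whose details are not described on this page. The function names, the choice of `top_k=2`, and the fact that gate scores are passed in directly (a real model computes them with a learned router) are all assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits: Sequence[float], top_k: int = 2) -> List[Tuple[int, float]]:
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs; the weights are a softmax over
    the selected experts' gate scores, so they sum to 1.
    """
    if not 1 <= top_k <= len(gate_logits):
        raise ValueError("top_k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token: Vector, gate_logits: Sequence[float],
              experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Run only the selected experts and mix their outputs by gate weight.

    Experts that are not selected are never called, which is where the
    compute saving of a sparse layer comes from.
    """
    output = [0.0] * len(token)
    for index, weight in route_token(gate_logits, top_k):
        expert_output = experts[index](token)
        output = [acc + weight * val for acc, val in zip(output, expert_output)]
    return output
```

With four experts and `top_k=2`, each token triggers two expert evaluations instead of four, so per-token compute grows with `top_k` rather than with the total number of experts.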

Is Gemini 1.5 Pro right for you?

Use Gemini 1.5 Pro when

  • You need to process massive amounts of content (up to 1M tokens in a single prompt)
  • Your use case involves video analysis
  • You need real-time web search grounding
  • You're working with entire codebases or large document sets

Consider alternatives when

  • Latency is critical (GPT-4o or Claude may respond faster)
  • You need the highest accuracy for pure text reasoning
  • You require a well-established API ecosystem

Frequently asked questions

  • What is the 1 million token context window?

    1 million tokens is roughly equivalent to 700,000 words of text, 11 hours of audio, 1 hour of video, or 30,000 lines of code. This allows Gemini 1.5 Pro to reason over entire repositories or document collections at once.

  • How does Google Search grounding work?

    When enabled, Gemini 1.5 Pro can query Google Search in real time to retrieve current information, then ground its responses in the retrieved, up-to-date sources.

  • Is Gemini 1.5 Pro available through Google Cloud?

    Yes, it's available through Google AI Studio (for development) and Google Vertex AI (for production enterprise use with enhanced SLAs and data residency options).
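
The rough equivalences given above for the 1-million-token window (700,000 words, 11 hours of audio, 1 hour of video, 30,000 lines of code) can be turned into a quick capacity check. This is a back-of-the-envelope sketch only: real token counts depend on the tokenizer and the content, and the linear scaling and the names `estimate_tokens` and `fits_in_context` are assumptions for illustration.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000

# Approximate amount of each content type that fills the whole window,
# taken from the FAQ answer on this page.
FULL_WINDOW_EQUIVALENTS = {
    "words": 700_000,
    "audio_hours": 11,
    "video_hours": 1,
    "code_lines": 30_000,
}


def estimate_tokens(**amounts: float) -> int:
    """Estimate total tokens for a mix of content.

    ASSUMPTION: token usage scales linearly with the amount of content.
    Example keyword arguments: words=..., audio_hours=..., code_lines=...
    """
    total = 0.0
    for kind, amount in amounts.items():
        if kind not in FULL_WINDOW_EQUIVALENTS:
            raise ValueError(f"unknown content type: {kind}")
        if amount < 0:
            raise ValueError("amounts must be non-negative")
        total += amount / FULL_WINDOW_EQUIVALENTS[kind] * CONTEXT_WINDOW_TOKENS
    return round(total)


def fits_in_context(**amounts: float) -> bool:
    """Return True if the estimated total stays within the 1M-token window."""
    return estimate_tokens(**amounts) <= CONTEXT_WINDOW_TOKENS
```

For example, `fits_in_context(video_hours=0.5, code_lines=15_000)` checks whether half an hour of video plus a 15,000-line repository would plausibly fit in one prompt. Treat the result as a first filter and count tokens with the provider's own tooling before relying on it.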
