Gemini Embedding 2

Text Embeddings
Semantic Search
Fast Inference

Gemini Embedding 2 is Google's natively multimodal embedding model that maps text, images, video, audio, and documents into a single semantic vector space. It is notable for unifying many media types in one model to power cross-modal search and retrieval.

API Performance

Latency: ~0.4s avg response
Input: $0.02 per 1M tokens
Uptime: 99%

About the model

What is Gemini Embedding 2?

Gemini Embedding 2 is a proprietary multimodal embedding model from Google that produces numerical vector representations for text, images, audio, video, and documents in a unified space. Its main use cases include powering retrieval-augmented generation, semantic search, recommendation, and classification across mixed media, and enabling cross-modal applications like using a text query to retrieve relevant images or video clips. It is part of Google’s Gemini Embedding family and succeeds earlier text-focused Gemini embedding models.

Input / Output

Input

Text (natural language strings)
Images (e.g. JPEG, PNG)
Audio files
Video files
Documents (e.g. PDF and similar file types)

Output

Dense embedding vectors (numeric representations)

Model capabilities

5 Core Capabilities

Text Embedding

Generates dense vector representations for text inputs, enabling semantic similarity, retrieval, clustering, and downstream ML tasks.

Multilingual Embeddings

Produces embeddings for multiple languages, allowing cross-lingual semantic search and comparison in a shared vector space.

Long-Text Support

Embeds relatively long text segments, enabling document-level search, recommendation, and topic grouping across larger content pieces.

Semantic Retrieval

Supports building semantic search systems where queries and documents are embedded and matched via vector similarity instead of keywords.

Recommendation Support

Provides text embeddings suitable for powering recommendation, personalization, and content discovery based on semantic similarity.
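
To make the "matched via vector similarity" idea concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. The three-dimensional vectors and document IDs are toy values made up for illustration; real embeddings from the model would be much longer, and the function names are not part of any API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (illustrative only).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-quickstart": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
results = rank_documents(query, docs, top_k=2)
```

In production this brute-force loop would usually be replaced by a vector index, but the scoring idea is the same.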

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Document Clustering
Recommendation Systems
Code Snippet Retrieval
Question Answer Retrieval
Multilingual Text Matching

Transparent pricing

Cost Comparison

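LLM.API offers competitive embedding pricing. At the rates listed below, it is about 31% below OpenAI and Azure OpenAI and 55% below Google's direct price per 1M tokens, while Amazon Bedrock's Titan Embeddings is listed at a lower rate.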

Provider	Region	Uptime	Input ($/1M tokens)	Context
LLM.API	Global	—	$0.09	8K tokens
Google (Gemini API / Vertex AI)	Global	99.9%	$0.20	8K tokens
OpenAI	Global	99.9%	$0.13	8K tokens
Azure OpenAI	Global	99.9%	$0.13	8K tokens
Amazon Bedrock (Titan Embeddings)	US & Selected Regions	99.9%	$0.02	—
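
As a rough worked example of what these per-1M-token prices mean for a workload, the sketch below computes the cost of embedding a given number of tokens and the percentage difference between two rates. The prices are copied from the table above; the function names and the 50M-token workload are illustrative assumptions.

```python
# Input prices in USD per 1M tokens, copied from the comparison table.
PRICE_PER_MILLION = {
    "LLM.API": 0.09,
    "Google (Gemini API / Vertex AI)": 0.20,
    "OpenAI": 0.13,
    "Azure OpenAI": 0.13,
    "Amazon Bedrock (Titan Embeddings)": 0.02,
}


def embedding_cost(tokens, price_per_million):
    """Cost in USD of embedding `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million


def savings_percent(price, baseline):
    """How much cheaper `price` is than `baseline`, as a percentage."""
    return (baseline - price) / baseline * 100


# Hypothetical workload: 50M input tokens per month.
monthly_tokens = 50_000_000
monthly_costs = {
    provider: embedding_cost(monthly_tokens, price)
    for provider, price in PRICE_PER_MILLION.items()
}
```

Because embedding requests are billed on input tokens only, the monthly figure scales linearly with the amount of text you embed.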

Performance benchmarks

Technical Specifications

Metric	Gemini Embedding 2	OpenAI text-embedding-3-large	Cohere Embed v3 English	AWS Titan Text Embeddings V2
Embedding Dimensions	3072	3072	1024	1024
Max Input Tokens	8,192	—	—	8,000
Price per 1M Tokens (Input)	$0.02	$0.13	$0.10	$0.12
Price per 1M Tokens (Output)	—	—	—	—
Modalities Supported	Text, Image, Audio, Video, Documents	Text	Text	Text
Throughput	—	—	—	—
Avg Latency	—	—	—	—
Service Uptime (SLA)	—	—	—	—
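
Because the table lists a maximum input of 8,192 tokens, longer texts have to be split before embedding. The sketch below shows one common approach: fixed-size windows with an optional overlap. It assumes whitespace-separated words as a crude stand-in for tokens, since the model's real tokenizer is not described here; actual token counts will differ, so leave headroom in practice.

```python
def chunk_tokens(tokens, max_tokens=8192, overlap=0):
    """Split a token list into windows of at most `max_tokens` items.

    Consecutive windows share `overlap` tokens so that context at the
    boundaries is not lost entirely.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end
    return chunks


def chunk_text(text, max_tokens=8192, overlap=0):
    """Chunk text using whitespace words as an approximate token unit."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, max_tokens, overlap)]
```

Each chunk is then embedded separately, and search results are mapped back to the parent document.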

30-day usage via LLM API

3.8B: Text chunks embedded (30 days)
520M: API requests (30 days)
45K: Active developer accounts (30 days)
99.97%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, capability, and policies. No client changes, just better defaults.
One endpoint, every model

Cost-Aware Controls

Define per-project or per-tenant budgets, choose cost ceilings, and let LLM.API pick the cheapest model that still meets your quality and latency targets.
Lower spend, same output

Resilient Fallback Logic

Eliminate single-vendor outages with built-in failover across providers, automatic retries, and policy-based degradation that keeps your product responsive.
Never ship 500s again

End-to-End Observability

Get unified logs, traces, and metrics for every provider (latency, errors, token usage, and prompts), all correlated to requests and tenants in one place.
See every token flow

Task-Level Orchestration

Describe tasks, constraints, and tools once; LLM.API handles model selection, tool calling, and execution flow so you focus on product logic, not glue code.
Ship workflows, not wiring

High-Throughput Batch APIs

Process millions of inferences efficiently with bulk submission, concurrency control, and automatic chunking tuned for each provider’s limits and quotas.
Scale from 10 to millions
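
To illustrate the general idea behind failover with retries, here is a client-side sketch of the pattern. It is not LLM.API's implementation: the provider names, the `embed_fn` callables, and the retry parameters are all hypothetical stand-ins.

```python
import time


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries."""


def embed_with_fallback(text, providers, retries_per_provider=2, backoff_seconds=0.0):
    """Try each (name, embed_fn) pair in order until one succeeds.

    Returns (provider_name, vector). Each provider is retried up to
    `retries_per_provider` times, with exponential backoff if
    `backoff_seconds` is non-zero, before moving to the next one.
    """
    errors = []
    for name, embed_fn in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, embed_fn(text)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append((name, attempt, repr(exc)))
                if backoff_seconds:
                    time.sleep(backoff_seconds * (2 ** attempt))
    raise AllProvidersFailed(errors)
```

A managed router moves this logic server-side, so the client sends one request and does not need to know which provider answered.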

Decision guide

When to Use — When NOT to Use

Use it if...

You need general-purpose text embeddings for semantic search, clustering, or retrieval applications.
You need multilingual embeddings that handle many languages consistently within a single vector space.
Your use case involves building recommendation systems based on textual similarity or user profiles.
You need embeddings optimized for low latency and reasonable cost on Google Cloud.
Your use case involves hybrid search, combining Gemini Embedding 2 with keyword or metadata filters.
You need embeddings well-integrated with other Google Vertex AI or Gemini-based workflows.
Your use case involves encoding short queries and longer documents into the same embedding space.
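
The hybrid search item above can be sketched as follows: filter candidates by metadata, then blend a simple keyword-overlap score with cosine similarity. The record layout (`id`, `text`, `vector`, `meta`), the 0.3 keyword weight, and the scoring formula are illustrative assumptions, not a prescribed method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, query_terms, records, required_meta=None,
                  keyword_weight=0.3, top_k=5):
    """Rank records by a weighted mix of vector and keyword scores.

    Records whose metadata does not match `required_meta` are skipped
    before any scoring happens.
    """
    required_meta = required_meta or {}
    terms = {t.lower() for t in query_terms}
    results = []
    for rec in records:
        if any(rec["meta"].get(k) != v for k, v in required_meta.items()):
            continue  # metadata filter
        words = set(rec["text"].lower().split())
        keyword_score = len(words & terms) / len(terms) if terms else 0.0
        vector_score = cosine(query_vec, rec["vector"])
        score = (1 - keyword_weight) * vector_score + keyword_weight * keyword_score
        results.append((rec["id"], score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top_k]
```

Production systems typically use BM25 or a search engine for the keyword side; the set overlap here only keeps the example short.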

Avoid if...

You need modalities or file formats beyond the text, image, audio, video, and document inputs the model accepts.
Your workload requires full generative capabilities like conversation, code synthesis, or content creation.
You need ultra-long-context document understanding beyond the maximum token limits of embeddings.
Your workload requires highly domain-specific vectors trained on proprietary data you fully control.
You need on-premise deployment without relying on Google-managed cloud infrastructure or APIs.
Your workload requires strict vendor neutrality, avoiding lock-in to any specific cloud provider.
You need binary or sparse representations instead of dense floating-point embeddings for storage efficiency.
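
To put the storage point in numbers, the sketch below estimates the raw size of dense vectors and shows sign-bit binarization as one generic client-side way to shrink them. It assumes 3072 dimensions (from the specifications table) stored as 4-byte float32 values. Binarization here is a post-processing technique you would apply yourself, not a documented feature of the model, and it trades accuracy for space.

```python
def float32_storage_bytes(num_vectors, dimensions=3072):
    """Raw storage for dense vectors at 4 bytes per dimension."""
    return num_vectors * dimensions * 4


def binarize(vector):
    """Pack one sign bit per dimension (1 if value > 0), eight per byte."""
    out = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            out[i // 8] |= 1 << (7 - i % 8)  # most significant bit first
    return bytes(out)


# One million 3072-dimensional float32 vectors, before index overhead.
dense_bytes = float32_storage_bytes(1_000_000)

# The same vectors at one bit per dimension.
binary_bytes = 1_000_000 * len(binarize([0.1] * 3072))
```

Under these assumptions the dense set takes about 12.3 GB and the binarized set about 384 MB, a 32x reduction.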

FAQ

Frequently Asked Questions

What is Gemini Embedding 2?

Gemini Embedding 2 is Google's multimodal embedding model. It generates dense vector representations of text (including code), images, audio, video, and documents for search, retrieval, and semantic similarity.

What input modalities does Gemini Embedding 2 support?

Gemini Embedding 2 accepts text, images, audio, video, and documents, and maps all of them into a single vector space.

How do I access Gemini Embedding 2 through LLM.API?

You call the unified LLM.API embeddings endpoint with the provider set to Google and model set to Gemini Embedding 2.

What is the context window of Gemini Embedding 2?

Gemini Embedding 2 supports input sequences up to 8,192 tokens, after which inputs must be truncated or chunked.

How fast is Gemini Embedding 2 for generating embeddings via LLM.API?

Embedding requests average about 0.4 s per response on LLM.API; actual latency varies with batch size and network conditions.

How is pricing for Gemini Embedding 2 handled on LLM.API?

LLM.API bills Gemini Embedding 2 by input tokens, quoted per 1M tokens, with the exact rate shown in your LLM.API pricing and usage dashboard.

How does Gemini Embedding 2 compare to other embedding models on LLM.API?

Gemini Embedding 2 offers strong multilingual and code understanding, often outperforming many older open-source embedding models in retrieval and semantic similarity benchmarks.

What are the main limitations of Gemini Embedding 2?

Gemini Embedding 2 cannot generate text, has a fixed maximum context length, and may encode biases present in its training data.

Can I use Gemini Embedding 2 for multilingual applications?

Yes, Gemini Embedding 2 supports many languages and produces a shared embedding space suitable for cross-lingual retrieval and semantic search.

Does Gemini Embedding 2 support batching through LLM.API?

Yes, you can send an array of input texts in a single embeddings request to Gemini Embedding 2 to reduce per-item latency and cost.
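
The answers above describe setting a provider and model on the embeddings endpoint and sending an array of inputs per request. The sketch below only builds such request bodies and makes no network call. The field names (`provider`, `model`, `input`), the model identifier `gemini-embedding-2`, and the batch size are assumptions for illustration; check the LLM.API reference for the actual request schema.

```python
import json


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_embedding_requests(texts, batch_size=64,
                             provider="google", model="gemini-embedding-2"):
    """Build one request body per batch of input texts.

    The payload shape is hypothetical; only the batching logic is the point.
    """
    return [
        {"provider": provider, "model": model, "input": list(batch)}
        for batch in batched(texts, batch_size)
    ]


# Five short inputs split into batches of two.
payloads = build_embedding_requests([f"doc {i}" for i in range(5)], batch_size=2)
body = json.dumps(payloads[0])  # JSON body for the first request
```

Sending several inputs per request amortizes network overhead, which is where the per-item latency saving comes from.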

Start in 2 lines of code
