Gemini 2.5 Flash Live Preview

Google · Gemini 2.5

Google's established 2.5-era native-audio Live API model for realtime multimodal voice agents.

Type: multimodal
Context: 131K tokens
Max output: 8K tokens
Status: preview
Input: $0.50/1M text tokens
Output: $2.00/1M text tokens
API access: Yes
License: proprietary
live-api native-audio multimodal voice-agents realtime preview
Released December 2025 · Updated May 16, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

Gemini 2.5 Flash Live Preview is Google’s native-audio Live API route for realtime voice and multimodal agent experiences. It is still listed in the current docs alongside the newer Gemini 3.1 Flash Live Preview, so teams should evaluate it as the established 2.5-era live route rather than Google's newest voice model.

Capabilities

This model is built for conversational interfaces where low-latency turn-taking, natural-sounding speech, and multimodal context all matter at once. Google’s pricing docs highlight better pacing, voice naturalness, verbosity control, and mood, which makes it a better fit for voice agents and guided realtime experiences than a text-first Flash route.

It also supports function calling and search grounding, so a Live API session can trigger actions in your own systems and pull in fresh information, behaving like an operational voice agent rather than a read-only speech demo.
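As a rough illustration of how those two capabilities are switched on, the sketch below builds a Live API session config as a plain dict with one function declaration and Google Search grounding. The dict keys (`response_modalities`, `tools`, `function_declarations`, `google_search`) follow the google-genai Python SDK's Live API examples as we understand them; the `get_order_status` tool and the `build_live_config` helper are hypothetical, and field names should be re-checked against Google's current docs.

```python
# Model code as listed in this profile.
MODEL_ID = "gemini-2.5-flash-native-audio-preview-12-2025"


def build_live_config(enable_search: bool = True) -> dict:
    """Hypothetical helper: Live API session config with tools enabled."""
    # Hypothetical tool the agent may call to look up an order.
    get_order_status = {
        "name": "get_order_status",
        "description": "Look up the shipping status of a customer order.",
        "parameters": {
            "type": "OBJECT",
            "properties": {
                "order_id": {"type": "STRING", "description": "Order identifier."}
            },
            "required": ["order_id"],
        },
    }
    tools: list[dict] = [{"function_declarations": [get_order_status]}]
    if enable_search:
        # Empty dict enables Google Search grounding for the session.
        tools.append({"google_search": {}})
    return {
        # Native-audio Live models answer with speech.
        "response_modalities": ["AUDIO"],
        "tools": tools,
    }
```

With the google-genai SDK installed and an API key configured, this dict would be passed as `config` to `client.aio.live.connect(model=MODEL_ID, config=build_live_config())` inside an `async with` block; the loop that answers the model's tool calls is omitted here.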

Technical Details

Google’s current model docs still list Gemini 2.5 Flash Live with:

  • Model code: gemini-2.5-flash-native-audio-preview-12-2025
  • Input token limit: 131,072
  • Output token limit: 8,192
  • Inputs: audio, video, and text
  • Outputs: audio and text
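One way to use these limits is a client-side pre-flight check before opening or extending a session. The sketch below records the documented values in a hypothetical `LiveModelSpec` dataclass and flags requests that exceed the token limits or use an unlisted input modality. It is only a coarse guard: a Live session accumulates context over many turns, so the real token count is whatever the server reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveModelSpec:
    """Hypothetical record of documented limits; re-verify before relying on it."""
    model_code: str
    input_token_limit: int
    output_token_limit: int
    inputs: frozenset[str]
    outputs: frozenset[str]

    def check_request(self, prompt_tokens: int, max_output_tokens: int,
                      modalities: set[str]) -> list[str]:
        """Return a list of problems with a planned request (empty if none)."""
        problems = []
        if prompt_tokens > self.input_token_limit:
            problems.append(f"input {prompt_tokens} > {self.input_token_limit}")
        if max_output_tokens > self.output_token_limit:
            problems.append(f"output {max_output_tokens} > {self.output_token_limit}")
        unsupported = modalities - self.inputs
        if unsupported:
            problems.append(f"unsupported input modalities: {sorted(unsupported)}")
        return problems


# Values copied from the list above (May 2026 snapshot).
GEMINI_25_FLASH_LIVE = LiveModelSpec(
    model_code="gemini-2.5-flash-native-audio-preview-12-2025",
    input_token_limit=131_072,
    output_token_limit=8_192,
    inputs=frozenset({"audio", "video", "text"}),
    outputs=frozenset({"audio", "text"}),
)
```

For example, `GEMINI_25_FLASH_LIVE.check_request(100_000, 8_192, {"audio", "text"})` returns an empty list, while a 131,073-token prompt would be flagged.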

The same docs also list this preview as the replacement for earlier live model IDs, which suggests Google is consolidating around this native-audio route rather than maintaining older live variants.

Pricing & Access

Google’s current pricing docs list paid-tier pricing at:

  • Input: $0.50 per 1M text tokens
  • Input: $3.00 per 1M audio or video tokens
  • Output: $2.00 per 1M text tokens
  • Output: $12.00 per 1M audio tokens

Signal Lens stores only the text input and text output prices in frontmatter for baseline comparability, but real deployment cost will depend heavily on audio traffic, which is billed at six times the text rate on both input ($3.00 vs $0.50) and output ($12.00 vs $2.00). Availability is through the Gemini Live API preview surface.
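To see how much audio dominates the bill, here is a small cost estimator using the paid-tier prices listed above. The function name and the example token counts are illustrative only; the prices are this profile's May 2026 snapshot and should be re-checked against Google's pricing page.

```python
from decimal import Decimal

# Paid-tier USD prices per 1M tokens, as listed in this profile.
TEXT_IN = Decimal("0.50")
AUDIO_VIDEO_IN = Decimal("3.00")
TEXT_OUT = Decimal("2.00")
AUDIO_OUT = Decimal("12.00")
MILLION = Decimal(1_000_000)


def estimate_cost(text_in: int = 0, audio_video_in: int = 0,
                  text_out: int = 0, audio_out: int = 0) -> Decimal:
    """Estimated USD cost for one session's token counts (Decimal avoids float drift)."""
    return (text_in * TEXT_IN + audio_video_in * AUDIO_VIDEO_IN
            + text_out * TEXT_OUT + audio_out * AUDIO_OUT) / MILLION


if __name__ == "__main__":
    # A hypothetical voice session: mostly audio in and out.
    print(estimate_cost(text_in=20_000, audio_video_in=200_000,
                        text_out=5_000, audio_out=150_000))
    # The same token volume if it were all billed at text rates.
    print(estimate_cost(text_in=220_000, text_out=155_000))
```

The mixed session comes to $2.42, while the same volume priced at text rates would be $0.42, which is why the text-only figures in the summary understate voice workloads.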

Best Use Cases

Use Gemini 2.5 Flash Live Preview for realtime tutoring, guided product walkthroughs, conversational support flows, or voice-first assistants that need to react to speech and visual context together. Test Gemini 3.1 Flash Live when the newest preview model matters more than compatibility.

Comparisons

  • Gemini 2.5 Pro TTS Preview (Google): Better for one-way high-quality speech generation, while Flash Live is built for two-way realtime interaction.
  • GPT Realtime-style flows: Similar broad category, with platform choice usually driven by stack alignment and tooling preferences.
  • ElevenLabs conversational agents: A more fully productized voice platform, while Gemini Flash Live is the model-layer option inside Google’s ecosystem.