Gemini 2.5 Flash Live Preview
Google · Gemini 2.5
Google's stable 2.5-era native-audio Live API model for realtime multimodal voice agents.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.
Gemini 2.5 Flash Live Preview is Google’s native-audio Live API route for realtime voice and multimodal agent experiences. It remains listed in the current docs alongside the newer Gemini 3.1 Flash Live Preview, so teams should evaluate it as the stable 2.5-era live route rather than as Google’s newest voice model.
Capabilities
This model is built for conversational interfaces where low-latency turn-taking, more natural voice behavior, and multimodal context all matter together. Google’s pricing docs emphasize higher-quality pacing, voice naturalness, verbosity control, and mood, which makes it more relevant for voice agents and guided realtime experiences than a plain text-first Flash route.
It also supports function calling and search grounding, so a Live API session can call external tools and ground its answers in search results, behaving more like an operational voice agent than a read-only speech demo.
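As a rough illustration of what function calling implies on the application side, the sketch below shows a tool declaration and a small dispatcher in plain Python. The tool name `lookup_order_status`, the registry, and the response wrapper are hypothetical and exist only for this example. The declaration loosely follows the name/description/parameters shape used for Gemini function declarations, and the code does not call the Live API itself.

```python
from typing import Any, Callable, Dict


def lookup_order_status(order_id: str) -> Dict[str, Any]:
    """Hypothetical tool: a stand-in for a real backend lookup."""
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical declaration describing the tool to the model.
ORDER_STATUS_DECLARATION = {
    "name": "lookup_order_status",
    "description": "Return the shipping status for an order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "lookup_order_status": lookup_order_status,
}


def dispatch_tool_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run a registered tool and wrap its result (or an error) for the model."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"name": name, "response": {"error": f"unknown tool: {name}"}}
    try:
        return {"name": name, "response": handler(**args)}
    except TypeError as exc:
        # Raised when the model supplies missing or unexpected arguments.
        return {"name": name, "response": {"error": str(exc)}}
```

In a real session, the application would pass the name and arguments from each tool call the model emits into `dispatch_tool_call` and send the wrapped result back. Returning errors as data rather than raising keeps the voice turn alive when the model requests an unknown tool or malformed arguments.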
Technical Details
Google’s current model docs still list Gemini 2.5 Flash Live with:
- Model code: gemini-2.5-flash-native-audio-preview-12-2025
- Input token limit: 131,072
- Output token limit: 8,192
- Inputs: audio, video, and text
- Outputs: audio and text
The same docs also show the current preview replacing earlier live model IDs, which is a useful signal that Google is consolidating around this native-audio route rather than older live variants.
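The token limits listed above can be turned into a simple pre-flight check. This is a minimal sketch: the function name and the idea of validating before opening a session are assumptions made for illustration, and the token counts are expected to come from whatever counting method the application already uses.

```python
from typing import List

# Limits as listed in this profile; they may change between doc revisions.
INPUT_TOKEN_LIMIT = 131_072
OUTPUT_TOKEN_LIMIT = 8_192


def check_token_budget(input_tokens: int, max_output_tokens: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if input_tokens > INPUT_TOKEN_LIMIT:
        problems.append(
            f"input of {input_tokens} tokens exceeds limit of {INPUT_TOKEN_LIMIT}"
        )
    if max_output_tokens > OUTPUT_TOKEN_LIMIT:
        problems.append(
            f"requested output of {max_output_tokens} tokens exceeds limit of {OUTPUT_TOKEN_LIMIT}"
        )
    return problems
```

Calling `check_token_budget(200_000, 4_096)` would report only the input overrun, which is the kind of signal a long-running voice session might use to decide when to trim or summarize its context.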
Pricing & Access
Google’s current pricing docs list paid-tier pricing at:
- Input: $0.50 per 1M text tokens
- Input: $3.00 per 1M audio or video tokens
- Output: $2.00 per 1M text tokens
- Output: $12.00 per 1M audio tokens
Signal Lens stores the text input and text output prices in frontmatter for baseline comparability, but real deployment cost will depend heavily on audio traffic. Availability is through the Gemini Live API preview surface.
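To see how much audio traffic dominates cost, the listed rates can be plugged into a small estimator. This is a sketch under stated assumptions: the function name and the usage dictionary layout are hypothetical, and the token counts are inputs you supply, since this profile does not state how seconds of audio or video map to tokens.

```python
from typing import Dict, Tuple

# Paid-tier prices in USD per 1M tokens, as listed in this profile.
PRICE_PER_MILLION_USD: Dict[Tuple[str, str], float] = {
    ("input", "text"): 0.50,
    ("input", "audio_video"): 3.00,
    ("output", "text"): 2.00,
    ("output", "audio"): 12.00,
}


def estimate_cost_usd(usage: Dict[Tuple[str, str], int]) -> float:
    """Sum the cost of token usage keyed by (direction, modality)."""
    total = 0.0
    for key, tokens in usage.items():
        # A KeyError here means the usage dict names an unpriced category.
        total += tokens / 1_000_000 * PRICE_PER_MILLION_USD[key]
    return total


# Hypothetical workload: mostly audio, with a little text input.
example_usage = {
    ("input", "text"): 200_000,
    ("input", "audio_video"): 1_000_000,
    ("output", "audio"): 500_000,
}
example_cost = estimate_cost_usd(example_usage)
```

For this hypothetical workload the text input contributes $0.10, while the audio input and audio output contribute $3.00 and $6.00, so a comparison based only on the text prices would understate the bill considerably.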
Best Use Cases
Use Gemini 2.5 Flash Live Preview for realtime tutoring, guided product walkthroughs, conversational support flows, or voice-first assistants that need to react to speech and visual context together. Test Gemini 3.1 Flash Live when the newest preview model matters more than compatibility.
Comparisons
- Gemini 2.5 Pro TTS Preview (Google): Better for one-way high-quality speech generation, while Flash Live is built for two-way realtime interaction.
- GPT Realtime-style flows: Similar broad category, with platform choice usually driven by stack alignment and tooling preferences.
- ElevenLabs conversational agents: Stronger productized voice platform, while Gemini Flash Live is the model-layer route inside Google’s ecosystem.