Gemini 2.5 Pro TTS Preview

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

Gemini 2.5 Pro TTS Preview is Google’s higher-end 2.5 text-to-speech model in the Gemini API. Google’s docs now also list Gemini 3.1 Flash TTS Preview, so read this entry as the 2.5 Pro speech option rather than as Google’s newest speech model.

Capabilities

This model targets narrated explainers, voice-rich educational content, spoken summaries, and product experiences where the speech itself carries more value than a generic utility voice. Compared with lower-cost TTS options, the practical difference is less about abstract audio quality and more about how much control and naturalness survive when prompts become more specific or brand-sensitive, such as prescribing tone, pacing, or a consistent brand voice.

Technical Details

This is an audio-generation model, so Signal Lens stores contextWindow and maxOutput as 0 for UI consistency and treats token-style limits as N/A in summary views. Google’s current model docs still expose useful limits for implementation work:

Model code: gemini-2.5-pro-preview-tts
Input token limit: 8,192
Output token limit: 16,384
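Because a long narration script can exceed the 8,192-token input limit, one practical pattern is to split it into sentence-aligned chunks and synthesize each chunk separately. The sketch below is a minimal illustration, assuming roughly four characters per token (a heuristic, not a Gemini guarantee) and a default budget of half the limit to leave headroom for style instructions. `chunk_script` is a hypothetical helper; exact counts should come from the API's own token counting.

```python
import re

# Documented input limit for gemini-2.5-pro-preview-tts (from the profile above).
INPUT_TOKEN_LIMIT = 8_192

# ASSUMPTION: ~4 characters per token is a rough English-text heuristic,
# not a Gemini guarantee. Use the API's token counting for exact numbers.
CHARS_PER_TOKEN = 4


def chunk_script(text: str,
                 token_budget: int = INPUT_TOKEN_LIMIT // 2,
                 chars_per_token: int = CHARS_PER_TOKEN) -> list[str]:
    """Split a script into sentence-aligned chunks under an estimated token budget."""
    max_chars = token_budget * chars_per_token
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Crude fallback: hard-split any single sentence that exceeds the budget.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps chunk breaks at natural pauses; the hard split for oversized sentences can cut mid-word and is only a last resort.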

The model is available only as a preview and does not support the Live API, function calling, or structured outputs. It is a focused TTS route rather than a general multimodal runtime.

Pricing & Access

Google’s current Gemini API pricing lists Gemini 2.5 Pro Preview TTS at:

Standard input: $1.00 per 1M text tokens
Standard output: $20.00 per 1M audio tokens
Batch input: $0.50 per 1M text tokens
Batch output: $10.00 per 1M audio tokens
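To turn these rates into a per-job estimate, multiply each token count by its rate and divide by one million. A minimal sketch, with the rates hard-coded from this snapshot (they will drift) and `estimate_cost` as a hypothetical helper name:

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing snapshot above; verify before use.
RATES = {
    "standard": {"input": Decimal("1.00"), "output": Decimal("20.00")},
    "batch": {"input": Decimal("0.50"), "output": Decimal("10.00")},
}


def estimate_cost(input_text_tokens: int, output_audio_tokens: int,
                  tier: str = "standard") -> Decimal:
    """Estimated USD cost for one request or job at the given pricing tier."""
    rate = RATES[tier]
    total = (Decimal(input_text_tokens) * rate["input"]
             + Decimal(output_audio_tokens) * rate["output"])
    return total / Decimal(1_000_000)
```

For example, 2,000 input text tokens and 10,000 output audio tokens come to $0.202 at standard rates and $0.101 in batch. Output audio tokens dominate the bill, so narration length matters far more than prompt length.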

Access is through the Gemini API, using the model code gemini-2.5-pro-preview-tts on surfaces where Google has enabled the preview TTS models.
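The request flow can be sketched with only the Python standard library. This is an illustration, not an official client: the endpoint, the `x-goog-api-key` header, and the JSON shape (`responseModalities`, `speechConfig`, `prebuiltVoiceConfig`) follow Google's published REST examples for Gemini TTS, the voice name `Kore` is one of Google's prebuilt voices, and the 24 kHz, 16-bit mono PCM output format is taken from Google's TTS docs. Verify all of these against current documentation. The helper names and the `style` prefix convention are assumptions for illustration.

```python
import base64
import io
import json
import urllib.request
import wave

MODEL = "gemini-2.5-pro-preview-tts"
# ASSUMPTION: endpoint shape follows Google's public REST examples; verify before use.
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_tts_request(text: str, api_key: str, voice: str = "Kore",
                      style: str = "") -> urllib.request.Request:
    """Build (but do not send) a single-speaker TTS request."""
    # Style is steered with a plain-language instruction prepended to the text,
    # e.g. "Say warmly: ...". This prefix convention is an illustrative choice.
    prompt = f"{style}: {text}" if style else text
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def synthesize(request: urllib.request.Request) -> bytes:
    """Send the request and return raw PCM bytes from the first candidate."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    b64 = payload["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]
    return base64.b64decode(b64)


def pcm_to_wav(pcm: bytes, rate: int = 24_000, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV container (defaults assume 24 kHz, 16-bit mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Typical use is `pcm_to_wav(synthesize(build_tts_request(script, key, style="Say calmly")))`, with the result written to a `.wav` file. Keeping request construction separate from sending makes the payload easy to inspect and test offline.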

Best Use Cases

Use Gemini 2.5 Pro TTS Preview for narrated explainers, multilingual educational or internal-communications content, and other speech-generation workflows where output quality matters more than the lowest-cost synthesis. It is a stronger fit than a live voice model when the task is deliberate speech generation rather than conversation, but teams starting a new evaluation should also test Gemini 3.1 Flash TTS Preview.

Comparisons

Gemini 2.5 Flash TTS Preview (Google): Better cost/performance for high-volume narration, while Pro TTS is the higher-quality route.
Gemini 2.5 Flash Live Preview (Google): Better for realtime interactive voice agents, while Pro TTS is built for one-way speech output.
ElevenLabs / Eleven v3: Stronger dedicated voice ecosystem, while Gemini Pro TTS is the more natural fit for teams already building on Google’s stack.