Gemini 2.5 Flash

Google · Gemini 2.5

Stable Gemini 2.5 Flash route balancing multimodal capability, latency, and production cost.

Type
multimodal
Context
1M tokens
Max Output
66K tokens
Status
current
Input
$0.3/1M tok
Output
$2.5/1M tok
API Access
Yes
License
proprietary
fast multimodal cost-efficient assistant production
Released April 2025 · Updated May 16, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

Gemini 2.5 Flash is Google’s stable price-performance 2.5 route for teams that need responsive output at scale. Gemini 3 Flash is now the newer preview/flex lane in the docs, but 2.5 Flash remains the safer choice when a production app wants stable model naming and well-understood pricing.

Capabilities

Flash handles summarization, extraction, content transformation, and many coding-adjacent tasks with strong latency characteristics. It is useful for applications where user experience depends on fast model response without dropping multimodal support, search grounding, or tool-use capability.

Technical Details

Google’s current model docs still list Gemini 2.5 Flash with a 1,048,576 token input window and a 65,536 token output limit. It supports text, image, video, audio, and PDF inputs, plus code execution, file search, function calling, structured outputs, search grounding, Google Maps grounding, thinking, and URL context.

Pricing & Access

Current Gemini API pricing lists Gemini 2.5 Flash at 0.30per1Mtext/image/videoinputtokensand0.30 per 1M text/image/video input tokens and 2.50 per 1M output tokens, with audio input at 1.00per1Mtokens.Batchpricingishalfthatlevel,andcontextcachingisnowlistedat1.00 per 1M tokens. Batch pricing is half that level, and context caching is now listed at 0.03 for text/image/video tokens. Access is available through Google AI Studio and Vertex AI.

Best Use Cases

Strong choice for customer support assistants, internal copilots, UI-driven chat tools, agentic workflows that need low latency, and automation tasks requiring fast response with good quality without adopting newer preview lifecycle risk.

Comparisons

Compared with Gemini 2.5 Pro, Flash is cheaper and faster but less capable on the hardest reasoning tasks. Compared with Gemini 3 Flash Preview, Flash is the safer stable route while 3-series Flash is the newer preview lane. Compared with Claude Haiku 4.5, Flash usually offers broader multimodal and tool support at a different cost profile.