Gemini 2.5 Flash-Lite

Google · Gemini 2.5

Stable budget Gemini 2.5 tier for large-scale assistant and automation workloads.

Type
multimodal
Context
1M tokens
Max Output
66K tokens
Status
current
Input
$0.1/1M tok
Output
$0.4/1M tok
API Access
Yes
License
proprietary
low-cost high-throughput assistant automation multimodal
Released June 2025 · Updated May 16, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

Gemini 2.5 Flash-Lite targets high-throughput workloads where cost control and response speed are primary constraints. Google still lists it as the smallest and most cost-effective 2.5 model, while Gemini 3.1 Flash-Lite has become the newer line to test when preview or newer-stable behavior is acceptable.

Capabilities

The model is practical for classification, extraction, concise summarization, translation, and routine assistant tasks. It can handle many day-to-day workflows when prompts are structured and outputs are validated, and it remains one of Google’s more practical routes for scaled agentic back-end work where the harder reasoning can be routed elsewhere.

Technical Details

Google’s current model docs still list Gemini 2.5 Flash-Lite with a 1,048,576 token input window and a 65,536 token output limit. It supports the same broad input modalities and many of the same agent-oriented capabilities as Flash, but with a lower quality ceiling on difficult tasks. Google still describes it as a full multimodal model with thinking, caching, code execution, file search, function calling, grounding, structured outputs, and URL context.

Pricing & Access

Current Gemini API pricing lists Gemini 2.5 Flash-Lite at 0.10per1Mtext/image/videoinputtokensand0.10 per 1M text/image/video input tokens and 0.40 per 1M output tokens, with audio input at 0.30per1Mtokens.Batchpricingishalfthatlevel,andcontextcachingisnowlistedat0.30 per 1M tokens. Batch pricing is half that level, and context caching is now listed at 0.01 for text/image/video tokens. Access is available through Google AI Studio and Vertex AI where the stable Flash-Lite SKU is enabled.

Best Use Cases

Best for ticket triage, data normalization, lightweight support automation, translation, and high-volume internal tooling where responsiveness and budget matter.

Comparisons

Compared with Gemini 2.5 Flash, Flash-Lite is more cost-focused with a lower quality ceiling on difficult tasks. Compared with Gemini 3.1 Flash-Lite, the 2.5 model is the safer stable route while 3.1 is the newer preview lane. Compared with Claude Haiku 4.5, choice depends on latency profile, output style, and integration requirements.