Gemini 2.5 Flash-Lite
Google · Gemini 2.5
Stable budget Gemini 2.5 tier for large-scale assistant and automation workloads.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.
Gemini 2.5 Flash-Lite targets high-throughput workloads where cost control and response speed are primary constraints. Google still lists it as the smallest and most cost-effective 2.5 model, while Gemini 3.1 Flash-Lite has become the newer line to test when preview or newer-stable behavior is acceptable.
Capabilities
The model is practical for classification, extraction, concise summarization, translation, and routine assistant tasks. It can handle many day-to-day workflows when prompts are structured and outputs are validated, and it remains one of Google’s more practical routes for scaled agentic back-end work where the harder reasoning can be routed elsewhere.
Technical Details
Google’s current model docs still list Gemini 2.5 Flash-Lite with a 1,048,576 token input window and a 65,536 token output limit. It supports the same broad input modalities and many of the same agent-oriented capabilities as Flash, but with a lower quality ceiling on difficult tasks. Google still describes it as a full multimodal model with thinking, caching, code execution, file search, function calling, grounding, structured outputs, and URL context.
Pricing & Access
Current Gemini API pricing lists Gemini 2.5 Flash-Lite at 0.40 per 1M output tokens, with audio input at 0.01 for text/image/video tokens. Access is available through Google AI Studio and Vertex AI where the stable Flash-Lite SKU is enabled.
Best Use Cases
Best for ticket triage, data normalization, lightweight support automation, translation, and high-volume internal tooling where responsiveness and budget matter.
Comparisons
Compared with Gemini 2.5 Flash, Flash-Lite is more cost-focused with a lower quality ceiling on difficult tasks. Compared with Gemini 3.1 Flash-Lite, the 2.5 model is the safer stable route while 3.1 is the newer preview lane. Compared with Claude Haiku 4.5, choice depends on latency profile, output style, and integration requirements.