Gemini 3.1 Flash-Lite — Signal Lens

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on June 8, 2026.

Gemini 3.1 Flash-Lite is Google’s stable low-cost Flash model in the Gemini API catalog. It extends the Flash-Lite idea into the Gemini 3.x generation for teams that want a current high-volume route for lightweight multimodal automation, simple extraction, translation, transcription, and latency-sensitive assistant work.

Capabilities

This model is aimed at classification, extraction, concise summarization, translation, transcription, and other high-throughput assistant or automation tasks where budget and response speed matter more than frontier-level reasoning. It is the kind of model you consider when the system needs to do a lot of work reliably, not when every single answer needs the highest available quality ceiling.

Technical Details

Google’s current Gemini API model catalog lists Gemini 3.1 Flash-Lite with:

1,048,576 token context window
65,536 max output tokens
multimodal input support across text, images, audio, video, and files
stable model ID: gemini-3.1-flash-lite

Like the rest of the current Flash line, it is part of Google’s broader agentic and multimodal model surface rather than a text-only budget SKU. Google’s deprecation table lists a May 7, 2027 shutdown date for this endpoint, so teams with long-lived deployments should still track migration notices.

Pricing & Access

Google’s current Gemini API pricing lists Gemini 3.1 Flash-Lite at:

Input: $0.25 per 1M text, image, or video tokens
Input: $0.50 per 1M audio tokens
Output: $1.50 per 1M tokens
Batch and Flex: $0.125 text/image/video input,$ 0.25 audio input, and $0.75 output per 1M tokens
Priority: $0.45 text/image/video input,$ 0.90 audio input, and $2.70 output per 1M tokens

Search and Maps grounding share Google’s Gemini 3 allowance before per-query billing. Access is through the Gemini API and related Google developer surfaces.

Best Use Cases

Use Gemini 3.1 Flash-Lite for ticket triage, structured extraction, batch classification, internal tooling, translation, transcription, and cost-sensitive assistants where multimodal support still matters. It is a practical option when Gemini 3.5 Flash feels richer than the task actually requires.

Comparisons

Gemini 3.5 Flash (Google): Better quality ceiling for harder agentic and coding tasks, while Flash-Lite is the cheaper stable route.
Gemini 2.5 Flash-Lite (Google): Older stable endpoint with an October 16, 2026 shutdown date.
GPT-5 nano (OpenAI): Another high-volume automation model, with the choice often driven by ecosystem and modality requirements.