Gemini 3.1 Flash-Lite

Google · Gemini 3.1

Google's newer low-cost Gemini preview tier for high-throughput multimodal assistant and automation workloads.

Type
multimodal
Context
1M tokens
Max Output
66K tokens
Status
preview
Input
$0.1/1M tok
Output
$0.4/1M tok
API Access
Yes
License
proprietary
multimodal flash-lite preview cost-efficient high-throughput
Released March 2026 · Updated April 4, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on April 4, 2026.

Gemini 3.1 Flash-Lite is Google’s newer low-cost Flash preview model in the Gemini API catalog. It extends the Flash-Lite idea into the Gemini 3.x generation for teams that want Google’s newest low-cost path while still keeping a stable 2.5 route available for production fallback.

Capabilities

This model is aimed at classification, extraction, concise summarization, translation, and other high-throughput assistant or automation tasks where budget and response speed matter more than frontier-level reasoning. It is the kind of model you consider when the system needs to do a lot of work reliably, not when every single answer needs the highest available quality ceiling.

Technical Details

Google’s current Gemini API model catalog lists Gemini 3.1 Flash-Lite with:

  • 1,048,576 token context window
  • 65,536 max output tokens
  • multimodal input support across text, images, audio, video, and files

Like the rest of the current Flash line, it is part of Google’s broader agentic and multimodal model surface rather than a text-only budget SKU.

Pricing & Access

Google’s current Gemini API pricing lists Gemini 3.1 Flash-Lite at:

  • Input: $0.10 per 1M text, image, or video tokens
  • Output: $0.40 per 1M tokens

Audio input is priced separately at a higher rate. Access is through the Gemini API and related Google developer surfaces where preview models are enabled.

Best Use Cases

Use Gemini 3.1 Flash-Lite for ticket triage, structured extraction, batch classification, internal tooling, and cost-sensitive assistants where multimodal support still matters. It is a practical option when Gemini 3 Flash feels richer than the task actually requires.

Comparisons

  • Gemini 3 Flash (Google): Better quality ceiling for harder tasks, while Flash-Lite is the cheaper preview route.
  • Gemini 2.5 Flash-Lite (Google): Stable production alternative with similar cost posture.
  • GPT-5 nano (OpenAI): Another high-volume automation model, with the choice often driven by ecosystem and modality requirements.