Gemini 3.5 Flash — Signal Lens

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on June 8, 2026.

Gemini 3.5 Flash is Google’s stable, fast frontier model and the first release in the agentic Gemini 3.5 generation. Google launched it at I/O 2026 on May 19 and positions it as the current high-speed route for complex coding, multimodal reasoning, and long-horizon agent workflows.

The practical change is that Flash is no longer just the cheaper Gemini lane. Gemini 3.5 Flash is now Google’s main stable fast model for sustained work: planning, coding, tool use, multimodal understanding, and sub-agent-style execution through Google Antigravity and the Gemini API.

Capabilities

Gemini 3.5 Flash supports text, image, video, audio, and PDF inputs with text output. Google’s current model page lists support for function calling, structured outputs, search grounding, Google Maps grounding, URL context, file search, code execution, batch, caching, flex inference, and priority inference. It does not currently support Computer Use, image generation, audio generation, or the Live API.
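
A pre-flight check can catch a mismatch between a workload and the feature list above before any request is sent. The following is a minimal sketch, not part of any Google SDK: the snake_case feature labels and the function name missing_features are hypothetical names chosen for illustration, and the two sets simply restate the supported and unsupported features named in the paragraph above.

```python
# Features this profile lists as supported (labels are illustrative, not API identifiers).
SUPPORTED_FEATURES = frozenset({
    "function_calling", "structured_outputs", "search_grounding",
    "maps_grounding", "url_context", "file_search", "code_execution",
    "batch", "caching", "flex_inference", "priority_inference",
})

# Features this profile lists as not currently supported.
UNSUPPORTED_FEATURES = frozenset({
    "computer_use", "image_generation", "audio_generation", "live_api",
})


def missing_features(required):
    """Return, sorted, the required features not listed as supported."""
    return sorted(set(required) - SUPPORTED_FEATURES)


if __name__ == "__main__":
    # A voice agent that also needs tool calls fails the check on the Live API.
    print(missing_features(["function_calling", "live_api"]))
```

An empty result means every required feature appears in the supported list; anything else names the features that would need a different route.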

Google’s launch framing emphasizes agentic coding and long-horizon tasks. The model is designed for rapid agent loops where the system needs to inspect context, reason, use tools, update work, and keep moving without escalating immediately to a slower Pro-tier route. It is also the model behind new Gemini app and Search agentic experiences such as Gemini Spark.

Technical Details

Current public API limits:

Model ID: gemini-3.5-flash
Input token limit: 1,048,576
Output token limit: 65,536
Input types: text, image, video, audio, and PDF
Output type: text
Stable version: gemini-3.5-flash
Latest update: May 2026
Published knowledge cutoff: January 2025
Thinking levels: minimal, low, medium, and high, with medium as the default
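
The token limits above can be enforced client-side before a request is built. This is a rough sketch under stated assumptions: the function name fits_limits is hypothetical, and the token counts are assumed to come from whatever tokenizer or counting endpoint the caller already uses. Only the two limit values are taken from the list above.

```python
# Published limits from the list above (2**20 and 2**16 tokens respectively).
INPUT_TOKEN_LIMIT = 1_048_576
OUTPUT_TOKEN_LIMIT = 65_536


def fits_limits(prompt_tokens, max_output_tokens):
    """Return True when both counts sit within the published limits."""
    input_ok = 0 <= prompt_tokens <= INPUT_TOKEN_LIMIT
    output_ok = 0 < max_output_tokens <= OUTPUT_TOKEN_LIMIT
    return input_ok and output_ok


if __name__ == "__main__":
    print(fits_limits(900_000, 8_192))
    print(fits_limits(1_200_000, 8_192))
```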

Google’s 3.5 migration guide also says Gemini 3.5 Flash preserves reasoning context across multi-turn conversations automatically when thought signatures are present. That can improve iterative agent work, but it can also raise input-token usage over long conversations.
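
To see why preserved context can raise input-token usage, consider a simplified accounting model. It is an illustration only, and it assumes that each turn resends the full prior history as input and that retained reasoning adds a fixed number of tokens per earlier turn. Neither assumption is confirmed by the source, and the function name and the token figures are hypothetical.

```python
def cumulative_input_tokens(turns, retained_reasoning_tokens_per_turn=0):
    """Sum the input tokens billed across a conversation.

    turns is a list of (user_tokens, reply_tokens) pairs. Each turn is
    assumed to send the whole prior history plus the new user message.
    """
    total_input = 0
    history = 0
    for user_tokens, reply_tokens in turns:
        total_input += history + user_tokens
        # The reply and any retained reasoning join the history for later turns.
        history += user_tokens + reply_tokens + retained_reasoning_tokens_per_turn
    return total_input


if __name__ == "__main__":
    conversation = [(100, 200)] * 3
    print(cumulative_input_tokens(conversation))
    print(cumulative_input_tokens(conversation, retained_reasoning_tokens_per_turn=50))
```

Under these assumptions the overhead grows with the square of the turn count, because every retained block is resent on each later turn, which is why long agent loops are where the effect is most visible.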

Pricing & Access

Google’s current standard paid Gemini API pricing for Gemini 3.5 Flash is:

Input: $1.50 per 1M tokens
Output, including thinking tokens: $9.00 per 1M tokens
Context caching: $0.15 per 1M tokens plus storage
Batch and Flex: $0.75 input / $4.50 output per 1M tokens
Priority: $2.70 input / $16.20 output per 1M tokens
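
The per-token rates above translate into a request cost with simple arithmetic. The sketch below is a back-of-the-envelope estimator, not a billing tool: the tier labels and the function name estimate_cost are hypothetical, the rates are copied from the list above, and context caching, storage, and grounding charges are deliberately left out.

```python
# USD per 1M tokens, copied from the pricing list above.
RATES_PER_MILLION = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch_or_flex": {"input": 0.75, "output": 4.50},
    "priority": {"input": 2.70, "output": 16.20},
}


def estimate_cost(input_tokens, output_tokens, tier="standard"):
    """Estimate USD cost; output_tokens should include thinking tokens."""
    rates = RATES_PER_MILLION[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # 200k input tokens and 20k output tokens on each tier.
    for tier in RATES_PER_MILLION:
        print(tier, round(estimate_cost(200_000, 20_000, tier), 4))
```

Because output is billed at six times the input rate on every tier, thinking-heavy requests can cost more than their prompt size suggests.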

Grounding with Google Search and Google Maps shares Google’s Gemini 3 allowance and then bills per search query. Access is through the Gemini API, Google AI Studio, Android Studio, Google Antigravity, Gemini Enterprise Agent Platform, Gemini Enterprise, the Gemini app, and AI Mode in Search where available.

Best Use Cases

Choose Gemini 3.5 Flash for agentic coding, document-heavy app workflows, multimodal assistants, extraction pipelines, UI generation loops, and production systems that need strong reasoning without Pro-tier latency or cost.

It is less appropriate for specialized browser control, realtime voice agents, image generation, or video generation. Google has separate Gemini Live, Computer Use, Nano Banana, Veo, and Gemini Omni routes for those workloads.

Comparisons

Gemini 3 Flash (Google): Older preview fast route; Gemini 3.5 Flash is the newer stable model.
Gemini 3.1 Pro Preview (Google): Higher-cost preview Pro lane for harder reasoning and multimodal tasks.
Gemini 2.5 Flash (Google): Stable older Flash route that may remain useful for compatibility and cost-sensitive deployments.
GPT-5.5 (OpenAI): Strong OpenAI frontier alternative, especially for Codex and ChatGPT-native agentic workflows.