Gemini Robotics-ER 1.6

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on June 8, 2026.

Gemini Robotics-ER 1.6 is Google’s robotics-tuned Gemini variant, published as gemini-robotics-er-1.6-preview. It is built as a vision-language model for advanced reasoning in the physical world: interpreting complex visual data, performing spatial reasoning, and planning actions from natural-language commands.

This is a specialized model. For typical chat or coding workflows, the standard Gemini 3.1 Pro line is the right reference. Robotics-ER 1.6 is the entry to consult when the question is “how does Google approach embodied reasoning for physical agents?”

Capabilities

Google’s current model and pricing docs highlight a specific capability profile:

Embodied reasoning for robots and physical-world agents.
Text, image, video, and audio inputs with text output.
Spatial and physical reasoning for interpreting visual scenes and planning actions.
Tool-oriented capabilities including function calling, code execution, file search, URL context, search grounding, Google Maps grounding, Computer Use, structured outputs, thinking, caching, batch, flex, and priority inference.
No image generation, audio generation, or Live API support is listed on the current model page.

Technical Details

Public anchors at this snapshot:

Model ID: gemini-robotics-er-1.6-preview
Input token limit: 131,072
Output token limit: 65,536
Inputs: text, images, video, and audio
Output: text
Designed as a planner and reasoning model for physical-world workflows, not as a generic chat model.
Available through the Gemini API and Google AI Studio at this snapshot.
Marked as a preview release; Google may adjust capabilities and access before general availability (GA).
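
The token limits above lend themselves to a simple pre-flight check before a request is sent. The following is a minimal sketch, assuming the input and output limits apply independently; the function name fits_limits is hypothetical, and the constants are copied from this snapshot and may change while the model is in preview.

```python
# Limits copied from the list above (point-in-time snapshot; may change in preview).
INPUT_TOKEN_LIMIT = 131_072
OUTPUT_TOKEN_LIMIT = 65_536


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if both counts are within the published limits.

    Assumption: the input and output limits are enforced separately.
    """
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= INPUT_TOKEN_LIMIT
        and max_output_tokens <= OUTPUT_TOKEN_LIMIT
    )
```

A caller would typically obtain input_tokens from a token-counting step and reject or trim the request when the check fails.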

Google’s deprecation table lists no shutdown date for gemini-robotics-er-1.6-preview at this snapshot.

Pricing & Access

Google’s current pricing page lists a separate Robotics-ER 1.6 Preview tier:

Input: $1.00 per 1M text, image, or video tokens
Input: $2.00 per 1M audio tokens
Output: $5.00 per 1M tokens
Batch: $0.50 text/image/video input, $1.00 audio input, and $2.50 output per 1M tokens
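
As a worked example of the rates listed above, the sketch below estimates the cost of a single request. The function name estimate_cost and the rate-table layout are illustrative assumptions; the per-token prices are copied from this snapshot and should be rechecked against the current pricing page.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing list above (snapshot values).
STANDARD = {
    "input": Decimal("1.00"),        # text, image, or video tokens
    "audio_input": Decimal("2.00"),  # audio tokens
    "output": Decimal("5.00"),
}
BATCH = {
    "input": Decimal("0.50"),
    "audio_input": Decimal("1.00"),
    "output": Decimal("2.50"),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, audio_tokens: int, output_tokens: int,
                  batch: bool = False) -> Decimal:
    """Estimate the USD cost of one request at the snapshot rates."""
    rates = BATCH if batch else STANDARD
    total = (
        input_tokens * rates["input"]
        + audio_tokens * rates["audio_input"]
        + output_tokens * rates["output"]
    )
    return total / TOKENS_PER_UNIT
```

For instance, 200,000 text/image/video input tokens plus 10,000 output tokens come to $0.20 + $0.05 = $0.25 at standard rates, and half that in batch mode.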

Access options:

Gemini API for developers
Google AI Studio for prototyping
Vertex AI / Gemini deployment paths where this preview model is enabled
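
For the Gemini API path, a request combining an image and a text prompt might be assembled as follows. This is a sketch under stated assumptions: it assumes the preview model is served through the standard Gemini API generateContent REST endpoint and request shape, the helper name build_request is hypothetical, and nothing is sent over the network. Authentication (an API key) is deliberately left out.

```python
import base64

MODEL_ID = "gemini-robotics-er-1.6-preview"
# Assumption: the standard Gemini API REST endpoint pattern applies to this model.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, image_bytes: bytes,
                  mime_type: str = "image/jpeg") -> tuple[str, dict]:
    """Build the URL and JSON body for a generateContent call (not sent here)."""
    url = f"{BASE_URL}/models/{MODEL_ID}:generateContent"
    body = {
        "contents": [{
            "parts": [
                # Image bytes are base64-encoded inline in the request body.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
    return url, body
```

The returned body can be serialized with json.dumps and posted with any HTTP client once a key is supplied; in practice an official SDK would usually replace this manual construction.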

Best Use Cases

Choose Gemini Robotics-ER 1.6 for:

Industrial inspection workflows where reading analog gauges and instruments matters.
Robotics planners that delegate motor control to VLA models while keeping high-level reasoning in a frontier model.
Embodied agents that need spatial reasoning, multi-view understanding, and tool-call orchestration through a single API surface.
Research and prototyping work around robotics planning, physical-scene interpretation, and tool-using embodied agents.
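
The planner pattern in the list above, where high-level reasoning stays in the model and motor control is delegated, can be sketched as a small dispatcher. Everything here is hypothetical: the skill names move_to and grasp, the SKILLS registry, and the name/args dictionary shape are stand-ins for whatever function-call format and low-level controller a real system uses.

```python
from typing import Any, Callable, Dict


# Hypothetical low-level skills; a real system would forward these
# to a VLA model or motor controller instead of returning strings.
def move_to(x: float, y: float) -> str:
    return f"moving to ({x:.2f}, {y:.2f})"


def grasp(object_id: str) -> str:
    return f"grasping {object_id}"


SKILLS: Dict[str, Callable[..., str]] = {"move_to": move_to, "grasp": grasp}


def dispatch(call: Dict[str, Any]) -> str:
    """Route one planned call (name plus args) to the matching skill."""
    name = call["name"]
    if name not in SKILLS:
        # Refuse anything the planner invents outside the registered skills.
        raise KeyError(f"unknown skill: {name}")
    return SKILLS[name](**call.get("args", {}))
```

Keeping an explicit registry means the planner can only trigger actions the integrator has exposed, which is a common safety choice for physical agents.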

This is not the right model for general assistant work, chat, or non-physical agentic coding; the standard Gemini 3.1 Pro and Flash lines are better defaults there.

Comparisons

Gemini Robotics-ER 1.5 (Google): Direct predecessor; 1.6 is the Robotics-ER preview that Google currently documents.
Gemini 3.5 Flash (Google): General-purpose fast Gemini model; Robotics-ER 1.6 is the robotics-specialized counterpart with embodied-reasoning positioning.
Other embodied-AI research models: Typically narrower research demos; Robotics-ER 1.6 differentiates through Gemini API access, physical-world reasoning, and broad tool support.