Gemini Omni Flash — Signal Lens

Overview

Freshness note: Video-model capabilities, rollout timing, API availability, and pricing can change quickly. This profile is a point-in-time snapshot last verified on July 6, 2026.

Gemini Omni Flash is the first model in Google’s Gemini Omni family. Google introduced it at I/O 2026 as a model for creating video from mixed inputs, and the current Gemini API preview model ID is gemini-omni-flash-preview.

The main product idea is to combine Gemini reasoning with creative video generation and editing, so users can start from text, images, video, or audio references and refine output through conversation. Gemini Omni Flash is now available both as a product feature in the Gemini app and Flow and as a paid-tier preview in the Gemini API.

Capabilities

Gemini Omni Flash is designed for video creation and video editing rather than text chat. Google documents several API workflow patterns:

text-to-video generation with audio
image-to-video generation
subject-reference video generation
task-specific hints such as text_to_video, image_to_video, reference_to_video, and edit
stateful video editing through previous_interaction_id
uploaded-video editing through the Files API where regionally supported
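
As a rough illustration of how the task hints listed above might be chosen in client code, the sketch below maps the inputs a caller has on hand to one of the four hint names. The hint strings and previous_interaction_id come from the list above; the function name, its parameters, and the precedence order are assumptions made for this example and do not describe documented routing behavior.

```python
from typing import Optional


def choose_task_hint(
    has_image: bool = False,
    has_subject_reference: bool = False,
    has_source_video: bool = False,
    previous_interaction_id: Optional[str] = None,
) -> str:
    """Pick a task hint from the inputs available (hypothetical helper).

    The precedence below is an assumption for illustration only.
    """
    # Continuing an earlier interaction or supplying a video to change
    # is treated as an edit.
    if previous_interaction_id is not None or has_source_video:
        return "edit"
    # A subject reference takes priority over a plain starting image.
    if has_subject_reference:
        return "reference_to_video"
    if has_image:
        return "image_to_video"
    # With only a text prompt, fall back to text-to-video.
    return "text_to_video"


print(choose_task_hint())
print(choose_task_hint(has_image=True))
print(choose_task_hint(previous_interaction_id="example-id"))
```

Keeping the choice in one small function makes it easy to adjust if the documented hint semantics turn out to differ from the precedence assumed here.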

Google says the Omni family starts with video and will add other output modalities such as image and audio over time.

Technical Details

This is a video-native model, so token-style context and output fields are stored as 0 in Signal Lens and should be treated as N/A in model comparisons.
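
A minimal sketch of that convention follows: a stored 0 is converted to N/A before it reaches a comparison table. The function names and the is_video_native flag are hypothetical and are not part of any Signal Lens schema.

```python
from typing import Optional


def token_field_or_na(value: int, is_video_native: bool) -> Optional[int]:
    """Return None (meaning N/A) for the 0 placeholder on video-native models."""
    if is_video_native and value == 0:
        return None
    return value


def format_field(value: Optional[int]) -> str:
    """Render a token field for display, showing N/A instead of a misleading 0."""
    return "N/A" if value is None else f"{value:,}"


# A video-native entry stores 0, which should display as N/A.
print(format_field(token_field_or_na(0, is_video_native=True)))
# A text model with a real context size is left untouched.
print(format_field(token_field_or_na(128000, is_video_native=False)))
```

Using None rather than 0 also keeps the placeholder out of sorts and averages, where a literal 0 would rank the model as having no context window at all.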

Current public anchors:

First model in the Gemini Omni family
API model ID: gemini-omni-flash-preview
Initial output mode: video
Input references: text, images, video, and audio
Product rollout: Gemini app, Google Flow, Google Flow Music, YouTube Shorts Remix, YouTube Create
API route: Interactions API preview
Supported aspect ratios in current examples: 16:9 and 9:16
SynthID watermarking applies to generated videos

Important current limitations:

uploaded-video editing is not available for users in the EEA, Switzerland, and the UK
audio references are not supported
multi-video reasoning is not supported
video extension/interpolation is not supported
system instructions, temperature, top_p, stop sequences, and negative prompts are not supported
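
One way to act on the unsupported-parameter limitation is a client-side check that runs before a request is sent. The sketch below is only an assumption about how such a guard might look: temperature and top_p are named in the text above, while the other key spellings (system_instruction, stop_sequences, negative_prompt) and the config dictionary shape are hypothetical.

```python
from typing import Any, Dict, List

# Key spellings other than temperature and top_p are assumed for illustration.
UNSUPPORTED_PARAMETERS = frozenset(
    {
        "system_instruction",
        "temperature",
        "top_p",
        "stop_sequences",
        "negative_prompt",
    }
)


def find_unsupported(config: Dict[str, Any]) -> List[str]:
    """Return the config keys that the limitations above list as unsupported."""
    return sorted(key for key in config if key in UNSUPPORTED_PARAMETERS)


request_config = {
    "temperature": 0.7,
    "aspect_ratio": "16:9",
    "negative_prompt": "blurry footage",
}
# Report the offending keys so they can be dropped before sending.
print(find_unsupported(request_config))
```

A check like this is most useful when a pipeline reuses one settings object across several models, since settings that are valid elsewhere would otherwise be passed to Omni Flash unchanged.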

Pricing & Access

Gemini Omni Flash is available on the paid tier of the Gemini API and through supported Google product surfaces. Current standard Gemini API pricing is:

Input: $1.50 per 1M text, image, video, or audio tokens
Output text and thinking: $9.00 per 1M tokens
Output video: $17.50 per 1M video-output tokens
Effective 720p output price: about $0.10 per second

Google’s pricing note says video billing is based on total output token consumption at 5,792 tokens per second of 720p video. Product-surface access still varies by Google AI plan, geography, and account type.
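
The effective per-second figure follows directly from the two published numbers, and the same arithmetic gives a rough estimate for a whole clip. The sketch below uses only the output-video rate and the 720p token rate quoted above; the function name and the 8-second example duration are illustrative assumptions, and input or text-output tokens are not included.

```python
VIDEO_OUTPUT_USD_PER_MILLION_TOKENS = 17.50  # output video rate quoted above
TOKENS_PER_SECOND_720P = 5_792  # billing rate for 720p video quoted above


def estimate_720p_video_cost(seconds: float) -> float:
    """Estimate the output-video cost in USD for a 720p clip of the given length."""
    tokens = seconds * TOKENS_PER_SECOND_720P
    return tokens * VIDEO_OUTPUT_USD_PER_MILLION_TOKENS / 1_000_000


# One second of 720p output: 5,792 tokens at $17.50 per 1M tokens.
print(f"${estimate_720p_video_cost(1):.5f} per second")
# A hypothetical 8-second clip.
print(f"${estimate_720p_video_cost(8):.2f} for 8 seconds")
```

One second works out to $0.10136, which matches the "about $0.10 per second" figure, and an 8-second clip comes to roughly $0.81 in output video tokens.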

Best Use Cases

Use Gemini Omni Flash for early creative exploration, video transformations, social clip ideation, reference-driven visual drafts, and explainers where conversational editing is more useful than one-shot video generation.

For developer API work, compare it against Veo 3.1, Sora 2, and grok-imagine-video. Omni is the more conversational, multimodal editing route; Veo remains the clearer Google model family when planning a conventional video generation pipeline.

Comparisons

Veo 3.1 (Google): Google's current API video family for conventional generation, with Fast/Lite variants and 4K options; Omni Flash is the newer lane for conversational multimodal creation and editing.
Nano Banana Pro (Google): Gemini-native image generation/editing; Omni Flash focuses first on video.
Sora 2 (OpenAI): OpenAI video generation route with a different product ecosystem and API posture.
Grok Imagine Video (xAI): xAI API video route with published per-second pricing.