GPT-Realtime-Translate

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on June 8, 2026.

GPT-Realtime-Translate is OpenAI’s dedicated live speech-to-speech translation model. It is not a general realtime assistant model. Its job is to translate speech while source audio is still arriving, returning translated audio and transcript deltas quickly enough for live multilingual products.

OpenAI positions it for customer support, education, events, media, creator platforms, and cross-border sales workflows where waiting for batch translation would break the experience.

Capabilities

According to OpenAI’s launch materials, the model supports more than 70 input languages and 13 output languages. It is built for speech-to-speech translation that preserves meaning while keeping pace with the speaker, and it is meant to handle regional pronunciation and domain-specific language.

Because it is a dedicated translation route, it does not support function calling or structured outputs. Treat it as a specialized audio model in a larger system rather than the agent brain for a workflow.

Technical Details

Current published limits:

Context window: 16,000 tokens
Max output: 2,000 tokens
Input: audio
Output: audio, plus text for transcripts and deltas
Streaming: supported
Function calling: not supported
Structured outputs: not supported
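
The limits above can be captured as a small capability record that a larger system checks before routing work to the model. This is a minimal sketch, not an OpenAI API: the `ModelProfile` class, the `unsupported_features` helper, and the use of the display name as an identifier are all hypothetical, and the values are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical record of a model's published limits (not an OpenAI type)."""
    name: str
    context_window_tokens: int
    max_output_tokens: int
    streaming: bool
    function_calling: bool
    structured_outputs: bool


# Values copied from the limits listed above; the name is the display name,
# not a confirmed API model identifier.
TRANSLATE_PROFILE = ModelProfile(
    name="GPT-Realtime-Translate",
    context_window_tokens=16_000,
    max_output_tokens=2_000,
    streaming=True,
    function_calling=False,
    structured_outputs=False,
)


def unsupported_features(
    profile: ModelProfile,
    *,
    needs_tools: bool = False,
    needs_structured_output: bool = False,
) -> list[str]:
    """Return the requested features that the profile does not support."""
    problems = []
    if needs_tools and not profile.function_calling:
        problems.append("function calling")
    if needs_structured_output and not profile.structured_outputs:
        problems.append("structured outputs")
    return problems
```

A caller that gets a non-empty list back would hand that part of the workflow to a different model and keep this one for the translation step only.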

The model uses the realtime translation endpoint and is priced by audio duration rather than text tokens.

Pricing & Access

Published OpenAI API pricing:

Realtime audio duration: $0.034 per minute
Equivalent per-second rate: about $0.00057

Access is through OpenAI’s realtime translation API. The billing model is duration-based, so cost estimates should start from expected conversation minutes rather than token counts.
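
A duration-based estimate can be sketched in a few lines. The helper names below are hypothetical, the rate is the published figure quoted above, and the sketch assumes cost scales linearly with audio duration; how OpenAI rounds or measures billable duration is not stated here, so treat the result as a planning number only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Published rate quoted above; re-check it before relying on the estimate.
PRICE_PER_MINUTE_USD = Decimal("0.034")


def estimate_cost_usd(audio_seconds: float) -> Decimal:
    """Estimate cost for a given audio duration, assuming linear proration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    cost = minutes * PRICE_PER_MINUTE_USD
    # Round to hundredths of a cent so short calls do not collapse to zero.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def estimate_monthly_cost_usd(
    calls_per_day: int, avg_call_minutes: float, days: int = 30
) -> Decimal:
    """Estimate a monthly bill from expected call volume and average length."""
    total_seconds = calls_per_day * avg_call_minutes * days * 60
    return estimate_cost_usd(total_seconds)
```

For example, 200 calls a day averaging 6 minutes each over 30 days comes to 36,000 minutes, or $1,224 at the listed rate.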

Best Use Cases

Use GPT-Realtime-Translate for live translated customer calls, multilingual support queues, live event interpretation, product education videos, cross-language sales conversations, and creator tools where translated audio needs to arrive while the source is still being spoken.

For offline localization or document translation, a text model with a post-editing workflow will usually be cheaper and easier to govern.

Comparisons

GPT-Realtime-2 (OpenAI): Better for reasoning and tool-using voice agents; Translate is the specialized live translation lane.
GPT-Realtime-Whisper (OpenAI): Transcribes live speech to text; Translate produces translated speech and transcript deltas.
Gemini Live-style translation flows: Similar product class; the choice depends on language coverage, latency, and platform alignment.
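
The split among the OpenAI models above can be expressed as a simple routing table. This is an illustrative sketch only: the `Task` enum and `pick_model` function are hypothetical, and the strings are the display names used in this profile rather than confirmed API model identifiers.

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories for a voice product."""
    LIVE_TRANSLATION = "live_translation"
    LIVE_TRANSCRIPTION = "live_transcription"
    VOICE_AGENT = "voice_agent"


# Display names from the comparison above, not confirmed API identifiers.
ROUTES = {
    Task.LIVE_TRANSLATION: "GPT-Realtime-Translate",
    Task.LIVE_TRANSCRIPTION: "GPT-Realtime-Whisper",
    Task.VOICE_AGENT: "GPT-Realtime-2",
}


def pick_model(task: Task) -> str:
    """Return the model this profile suggests for the given task."""
    return ROUTES[task]
```

Keeping the mapping in one table makes it easy to revisit when capabilities or pricing change.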