GPT-Realtime-Whisper — Signal Lens

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on June 8, 2026.

GPT-Realtime-Whisper is OpenAI’s realtime transcription model for applications that need transcript deltas from live audio while a person is still speaking. It extends the Whisper-style speech-to-text lane into lower-latency realtime use cases rather than replacing every offline transcription model.

Use this page when the question is “how do we transcribe live audio streams quickly?” rather than “which voice model should run a full conversation?”

Capabilities

GPT-Realtime-Whisper is designed for live captions, meetings, classrooms, broadcast workflows, customer-support monitoring, healthcare intake capture, and voice-agent pipelines that need continuous understanding of user speech.

The model supports streaming and is tuned for latency and accuracy tradeoffs in realtime audio. It does not support function calling or structured outputs, because it is a transcription model rather than a general agent model.

Technical Details

Current published limits:

Context window: 16,000 tokens
Max output: 2,000 tokens
Audio input
Text output
Streaming: supported
Function calling: not supported
Structured outputs: not supported

The model uses realtime transcription sessions and is priced by audio duration rather than text tokens.

Pricing & Access

Published OpenAI API pricing:

Realtime audio duration: $0.017 per minute
Equivalent per-second rate: about $0.00028

Access is through OpenAI’s realtime transcription API. Estimate cost from concurrent audio minutes and expected operating hours, then separately budget any downstream summarization or agent steps.

Best Use Cases

Use GPT-Realtime-Whisper for live captions, transcript feeds for voice agents, support-call indexing while the call is active, accessibility tooling, meeting notes, classroom transcription, and broadcast workflows that need low-latency text.

For batch file transcription where latency does not matter, compare against GPT-4o Transcribe, GPT-4o mini Transcribe, and legacy Whisper pricing before choosing this route.

Comparisons

GPT-4o Transcribe (OpenAI): Better fit for high-quality offline transcription and file ingestion.
GPT-4o mini Transcribe (OpenAI): Cheaper modern STT route for non-live pipelines.
GPT-Realtime-Translate (OpenAI): Translates speech across languages; Realtime Whisper focuses on same-language speech-to-text.