GPT-realtime-1.5

OpenAI · GPT Realtime

OpenAI's earlier realtime voice model for audio-in, audio-out agents, now superseded by GPT-Realtime-2 for new flagship voice work.

Type: audio
Context: 32K tokens
Max Output: 4K tokens
Status: legacy
Input: $4/1M tok
Output: $16/1M tok
API Access: Yes
License: proprietary
Tags: realtime, voice, audio, speech-to-speech, customer-support, multimodal, legacy
Released March 2026 · Updated May 8, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 8, 2026.

GPT-realtime-1.5 is now the older premium route in OpenAI’s realtime voice lineup. OpenAI’s May 7, 2026 voice launch introduced gpt-realtime-2 as the new flagship realtime voice model with GPT-5-class reasoning, a larger context window, and stronger tool-calling behavior for live voice agents.

That does not make 1.5 irrelevant. It remains a useful compatibility target for teams already pinned to its behavior, prompt shape, latency profile, or pricing assumptions. New voice-agent builds should evaluate GPT-Realtime-2 first.

Capabilities

GPT-realtime-1.5 is built for turn-taking quality, interruptions, and speech-native interaction. It supports text input and output, audio input and output, and image input. That combination still matters for real products: a support agent can listen, speak back, inspect a screenshot, and keep the conversation live without stitching together separate STT, reasoning, and TTS systems.

OpenAI’s current docs also describe WebRTC, WebSocket, and SIP-style connection patterns for the realtime stack. In practice, that makes the model relevant for browser assistants, server-mediated call flows, and telephony-style systems.
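One way to read the paragraph above is as a lookup from deployment surface to connection pattern. The sketch below is illustrative only: `Surface` and `pick_transport` are hypothetical names, not part of any OpenAI SDK, and the one-to-one pairing of surfaces to transports is an assumption that should be checked against the current docs.

```python
from enum import Enum


class Surface(Enum):
    """Hypothetical deployment surfaces for a voice agent."""
    BROWSER = "browser"
    SERVER = "server"
    TELEPHONY = "telephony"


# Assumed pairing of surface to connection pattern; verify before relying on it.
TRANSPORT_BY_SURFACE = {
    Surface.BROWSER: "webrtc",
    Surface.SERVER: "websocket",
    Surface.TELEPHONY: "sip",
}


def pick_transport(surface: Surface) -> str:
    """Return the assumed connection pattern for a deployment surface."""
    return TRANSPORT_BY_SURFACE[surface]
```

Keeping the mapping in one table makes it easy to revisit if a surface ends up needing a different transport.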

Technical Details

Current published limits:

  • Context window: 32,000 tokens
  • Max output: 4,096 tokens
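A rough budgeting sketch based on the limits above. It assumes that reserved output tokens count against the same 32,000-token window and that per-turn token counts come from your own estimator. Both are assumptions, and `trim_history` is a hypothetical helper, not an OpenAI API.

```python
CONTEXT_WINDOW = 32_000  # published context window, in tokens
MAX_OUTPUT = 4_096       # published max output, in tokens


def trim_history(turn_tokens: list[int], reserve: int = MAX_OUTPUT) -> list[int]:
    """Drop the oldest turns until the remainder fits the input budget.

    turn_tokens holds estimated token counts per turn, oldest first.
    Assumes output tokens share the context window (unverified).
    """
    budget = CONTEXT_WINDOW - reserve
    kept: list[int] = []
    total = 0
    # Walk from newest to oldest, keeping turns while they still fit.
    for tokens in reversed(turn_tokens):
        if total + tokens > budget:
            break
        kept.append(tokens)
        total += tokens
    kept.reverse()  # restore oldest-first order
    return kept
```

Long voice sessions accumulate audio tokens quickly, so a check like this is worth running before each turn rather than only at session start.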

OpenAI lists function calling as supported, but not structured outputs. That is an important implementation detail: GPT-realtime-1.5 is suitable for tool-using voice agents, but teams that require rigid JSON contracts should still design fallback paths or validation layers.
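A validation layer of the kind mentioned above can be small. The following is a minimal sketch, assuming tool-call arguments arrive as a JSON string. The `lookup_order` schema, its field names, and `validate_tool_args` are hypothetical and exist only for illustration.

```python
import json
from typing import Optional

# Hypothetical schema for a "lookup_order" tool: argument name -> expected type.
LOOKUP_ORDER_SCHEMA = {"order_id": str, "include_history": bool}


def validate_tool_args(raw: str, schema: dict) -> tuple[Optional[dict], list[str]]:
    """Parse a JSON argument string and check it against a flat schema.

    Returns (args, errors); args is None when validation fails.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return None, ["arguments must be a JSON object"]

    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing field: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    if errors:
        return None, errors
    return args, []
```

On failure, the agent can ask the caller to repeat the missing detail or retry the tool call instead of passing malformed arguments downstream.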

The model supports text and image input in addition to audio, which makes it more flexible than a pure speech stack. It remains audio-first, though, and should be treated primarily as a voice-agent model rather than a general default across all assistant workloads.

Pricing & Access

Published pricing is billed per token, with separate rates for text, audio, and image tokens:

  • Text input: $4.00 / 1M tokens
  • Text output: $16.00 / 1M tokens
  • Audio input: $32.00 / 1M tokens
  • Audio output: $64.00 / 1M tokens
  • Image input: $5.00 / 1M tokens

Access is through the OpenAI API, including its realtime-oriented endpoints. Cost discipline matters because real conversational traffic tends to be dominated by audio tokens, not text tokens. Teams comparing 1.5 against GPT-Realtime-2 should test end-to-end call success, latency, and tool reliability, not just headline token prices.
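To see how audio dominates a bill, the per-token rates listed above can be applied to a usage breakdown. This is a sketch only: the rate table copies the published figures from this profile, while `estimate_cost` and the usage keys are hypothetical names, and the token counts in the example are made up.

```python
# Per-1M-token rates in USD, copied from the list above; re-verify before use.
RATES_PER_MILLION = {
    "text_in": 4.00,
    "text_out": 16.00,
    "audio_in": 32.00,
    "audio_out": 64.00,
    "image_in": 5.00,
}


def estimate_cost(usage: dict[str, int]) -> float:
    """Estimate cost in USD from token counts keyed like RATES_PER_MILLION."""
    return sum(
        RATES_PER_MILLION[kind] * tokens / 1_000_000
        for kind, tokens in usage.items()
    )


# Hypothetical call: mostly audio, with a small amount of text.
example_call = {
    "text_in": 2_000,
    "text_out": 500,
    "audio_in": 20_000,
    "audio_out": 30_000,
}
print(f"${estimate_cost(example_call):.3f}")
```

In this made-up call, audio accounts for about $2.56 of a roughly $2.58 total, which is why audio volume is the first number to measure when forecasting spend.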

Best Use Cases

Use GPT-realtime-1.5 for existing customer-support voice agents, live product assistants, phone workflows, embedded assistants, and systems that already depend on its behavior. For new builds, start with GPT-Realtime-2 unless compatibility testing shows a concrete reason to stay on 1.5.

Comparisons

  • GPT-Realtime-2 (OpenAI): New flagship realtime voice model with GPT-5-class reasoning, 128K context, and stronger tool behavior.
  • gpt-realtime (OpenAI): Earlier general realtime route; 1.5 remains a stronger legacy voice route than the original generation.
  • gpt-realtime-mini (OpenAI): Better for lower-cost, high-volume realtime usage when premium voice quality is not required.
  • gpt-audio-1.5 (OpenAI): Better fit when you want strong audio I/O in Chat Completions-style workflows rather than the full realtime interaction model.