GPT-4o mini TTS
OpenAI · GPT-4o Audio
OpenAI text-to-speech model for responsive, API-first voice output workflows.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.
GPT-4o mini TTS is OpenAI’s text-to-speech model for generating voice output in interactive applications and automation flows. OpenAI’s current model card describes it as a TTS model built on GPT-4o mini that produces natural-sounding speech and accepts at most 2,000 input tokens.
Capabilities
The model supports programmatic voice generation for assistant responses, narrated content, and audio feedback loops. It is especially useful in systems already using OpenAI APIs for reasoning, orchestration, or realtime voice features.
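As a rough illustration of programmatic voice generation, the sketch below builds a request for OpenAI’s speech endpoint using only the Python standard library. The helper names (`build_speech_request`, `synthesize_to_file`) are hypothetical, the `alloy` voice and `mp3` format are assumed defaults that should be checked against the current voice catalog, and the official SDK may be a better fit in production.

```python
import json
import urllib.request

# Speech endpoint of the OpenAI API.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, voice="alloy",
                         instructions=None, response_format="mp3"):
    """Build (but do not send) a text-to-speech request.

    Hypothetical helper: the voice and format defaults are assumptions,
    not values taken from this profile.
    """
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if instructions:
        # Optional free-text guidance on speaking style.
        payload["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, api_key, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without an API key; `synthesize_to_file("Your order has shipped.", api_key, "notice.mp3")` would then perform the actual call.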
Technical Details
OpenAI’s current model docs specify a maximum input length of 2,000 tokens. Because the output is audio rather than text, a maximum output-token figure says little about practical limits; the input ceiling is the constraint that matters when sizing requests. Operational evaluation should prioritize voice quality, latency, speaking-style control, and stability across languages.
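One practical consequence of the 2,000-token input limit is that longer scripts need to be split before synthesis. The sketch below is one possible approach, not an official method: it splits on sentence boundaries and estimates tokens with a deliberately conservative characters-per-token heuristic. Both `split_for_tts` and the `CHARS_PER_TOKEN` value are assumptions; an exact tokenizer would give tighter chunks.

```python
import re

MAX_INPUT_TOKENS = 2000  # input limit stated in this profile
CHARS_PER_TOKEN = 3      # assumption: conservative heuristic, not a real tokenizer


def split_for_tts(text, max_tokens=MAX_INPUT_TOKENS,
                  chars_per_token=CHARS_PER_TOKEN):
    """Split text into chunks that should fit under the input limit.

    Sentences are packed greedily into chunks; a single sentence longer
    than the budget is hard-split by character count.
    """
    budget = max_tokens * chars_per_token
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that exceeds the budget on its own.
        while len(sentence) > budget:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:budget])
            sentence = sentence[budget:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= budget:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be sent as a separate request and the resulting audio segments concatenated; splitting at sentence boundaries tends to keep pauses natural at the joins.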
Pricing & Access
OpenAI’s current model docs list GPT-4o mini TTS at $12.00 per 1M audio output tokens. Because voices and controls can vary by surface, confirm the exact voice catalog before launch.
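For budgeting, the listed rate translates into a simple linear estimate. The sketch below assumes the 12.00 figure is in US dollars and covers audio output tokens only; the function name is hypothetical, and any other charges or later price changes are not modeled.

```python
# Rate from this profile: 12.00 per 1M audio output tokens (assumed USD).
PRICE_PER_MILLION_AUDIO_TOKENS = 12.00


def estimate_audio_output_cost(audio_tokens,
                               price_per_million=PRICE_PER_MILLION_AUDIO_TOKENS):
    """Return the estimated cost for a given number of audio output tokens."""
    if audio_tokens < 0:
        raise ValueError("audio_tokens must be non-negative")
    return audio_tokens / 1_000_000 * price_per_million
```

For example, a workload that produces 500,000 audio output tokens works out to 500,000 / 1,000,000 × 12.00 = 6.00 at the listed rate.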
Best Use Cases
Best for voice assistants, spoken notifications, educational narration, and multimodal interfaces needing low-friction speech output inside an OpenAI-centered stack.
Comparisons
Compared with Eleven v3, GPT-4o mini TTS offers tighter OpenAI ecosystem integration but usually less emphasis on expressive voice performance. Compared with Realtime voice pipelines, it is simpler for non-live or semi-live generation flows. Product-specific listening tests should still drive final selection.