GPT-4o mini Transcribe
OpenAI · GPT-4o Audio
Lower-cost OpenAI speech-to-text tier for high-volume transcription pipelines.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on February 15, 2026.
GPT-4o mini Transcribe is OpenAI’s efficiency-focused speech-to-text (STT) tier for teams running high-volume audio-to-text workloads. It is designed for production pipelines where cost control is a primary constraint and top-tier transcription accuracy is not required on every clip.
Capabilities
The model handles common transcription and audio normalization tasks with quality that is adequate for many operational use cases. It is well suited to routing pipelines in which most traffic goes to the cheaper tier and premium quality tiers are reserved for difficult clips.
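The routing pattern described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Clip` type, the `difficulty` score, and the `0.7` threshold are hypothetical, and how difficulty is estimated (for example, by a noise or speaker-overlap detector) is left open. The two model strings are the identifiers assumed for the mini and full transcription tiers.

```python
from dataclasses import dataclass

# Assumed API model identifiers for the two transcription tiers.
MINI_MODEL = "gpt-4o-mini-transcribe"
PREMIUM_MODEL = "gpt-4o-transcribe"


@dataclass
class Clip:
    clip_id: str
    # Hypothetical difficulty estimate in [0, 1], produced by some
    # pre-transcription check that this profile does not specify.
    difficulty: float


def choose_model(clip: Clip, threshold: float = 0.7) -> str:
    """Send difficult clips to the premium tier and everything else to mini."""
    return PREMIUM_MODEL if clip.difficulty >= threshold else MINI_MODEL


clips = [Clip("call-001", 0.15), Clip("call-002", 0.85), Clip("call-003", 0.40)]
routes = {clip.clip_id: choose_model(clip) for clip in clips}
```

Keeping the routing decision in one small function makes the threshold easy to tune against measured quality and cost once real traffic is available.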
Technical Details
Because this is a speech model, the token context and output fields are stored as 0 in this repository and shown as N/A in token-based UI. Evaluate the model on transcription quality metrics, such as word error rate, and on latency under your target audio conditions.
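One common transcription quality metric is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration that assumes simple lowercase, whitespace-based tokenization; production evaluations usually apply more careful text normalization (punctuation, numerals, casing) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Scoring the same clip set with each candidate tier gives a like-for-like comparison under the audio conditions that matter for the pipeline.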
Pricing & Access
OpenAI’s pricing docs, as of the verification date above, list GPT-4o mini Transcribe at $3.00 per 1M audio input tokens. It is exposed through OpenAI’s audio model APIs, but teams should still verify endpoint support and throughput limits before building cost or capacity forecasts.
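For rough budgeting, the listed rate converts to a cost estimate with simple arithmetic. The sketch below uses only the $3.00 per 1M audio input token figure quoted in this profile. The monthly token volume is a hypothetical placeholder, and any other billed components (such as output tokens) are not modeled here.

```python
# Rate quoted in this profile; re-check against current pricing before use.
AUDIO_INPUT_USD_PER_MILLION_TOKENS = 3.00


def estimate_audio_input_cost(audio_input_tokens: int) -> float:
    """Estimate the audio input charge in USD for a given token count."""
    if audio_input_tokens < 0:
        raise ValueError("token count must be non-negative")
    return audio_input_tokens / 1_000_000 * AUDIO_INPUT_USD_PER_MILLION_TOKENS


# Hypothetical monthly volume, for illustration only.
monthly_tokens = 250_000_000
monthly_cost = estimate_audio_input_cost(monthly_tokens)
print(f"Estimated audio input cost: ${monthly_cost:,.2f}")
```

Real token counts should come from measured usage on representative audio, since the number of audio tokens per minute of speech is not stated in this profile.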
Best Use Cases
Strong fit for large-scale meeting ingestion, support call transcription, media indexing, and telemetry-heavy voice analytics pipelines.
Comparisons
Compared with GPT-4o Transcribe, the mini tier offers lower cost, with quality tradeoffs on difficult audio. Compared with ElevenLabs audio stack options, the decision depends on end-to-end voice platform requirements and cost targets.