OpenAI Realtime API voice models reset the AI voice agent market

OpenAI Realtime API voice models launch 7 May 2026 with GPT-Realtime-2, Translate and Whisper. 128K context window, low-cost translation, GPT-5 class reasoning.

Hannah Foster | 7 May 2026 | Updated 22 Jun 2026

OpenAI's Realtime API voice models have made a generational jump with GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, and the per-minute price of translation is the headline that should worry any voice startup whose moat is latency. OpenAI announced the three models on 7 May and moved the Realtime API from beta to general availability the same day.

Key facts

OpenAI Realtime API voice models GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper launched 7 May 2026 alongside Realtime API general availability.
GPT-Realtime-2 carries GPT-5 class reasoning, a 128K context window, five reasoning effort levels and tool-use precision improvements.
GPT-Realtime-Translate handles over 70 input languages with 13 output languages at $0.034 per minute.
GPT-Realtime-Whisper delivers streaming transcription at $0.017 per minute, beating standalone Whisper-3 on common benchmarks.

Why the OpenAI Realtime API voice models change the maths

The OpenAI Realtime API voice models matter because the audio AI market spent two years building voice agents on top of cascaded ASR-LLM-TTS pipelines. Those pipelines added 700 to 1,200 ms of latency per turn and demanded a small army of fine-tuned speech models. GPT-Realtime-2 collapses that into a single endpoint that listens, reasons, calls tools, optionally accepts an image input and answers in voice, all without leaving the model. That is the architectural step Google’s Gemini Live and Anthropic’s Claude voice tier have been working toward, and OpenAI reached general availability first.

Pricing is the second part of the story. $0.034 per minute for live translation across 70-plus input languages is below Google Cloud Translate’s streaming tier and below ElevenLabs’ multilingual offering. We covered the ElevenLabs AI music push last month, and the same company is now under genuine pricing pressure on its core voice business. Whisper as a paid streaming product at $0.017 per minute also undercuts the case for self-hosting Whisper-3 for anything below several thousand hours per month.
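The per-minute rates turn into monthly bills with simple arithmetic. The sketch below uses the two rates quoted in this article ($0.034 and $0.017 per minute); the self-hosting figure in the usage note is a hypothetical placeholder, not a measured cost, and the function names are ours.

```python
# Per-minute rates quoted in this article (USD).
TRANSLATE_USD_PER_MIN = 0.034
WHISPER_USD_PER_MIN = 0.017


def monthly_cost_usd(hours_per_month: float, usd_per_min: float) -> float:
    """Cost of a month's audio at a flat per-minute rate."""
    return round(hours_per_month * 60 * usd_per_min, 2)


def break_even_hours(self_host_fixed_usd: float, usd_per_min: float) -> float:
    """Monthly hours above which a fixed-cost self-hosted setup becomes cheaper.

    self_host_fixed_usd is an assumed all-in monthly cost (GPUs, ops time).
    """
    return self_host_fixed_usd / (usd_per_min * 60)
```

At 1,000 hours a month, `monthly_cost_usd(1000, WHISPER_USD_PER_MIN)` comes to $1,020 and the translation equivalent to $2,040. With an assumed self-hosting cost of $3,000 a month, `break_even_hours(3000, WHISPER_USD_PER_MIN)` lands just under 3,000 hours, though that result moves directly with whatever fixed cost you plug in.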

GPT-Realtime-2 reasoning levels and tool calling

GPT-Realtime-2 is the first speech-to-speech model OpenAI describes as having GPT-5 class reasoning. Developers can pick from minimal, low, medium, high and xhigh reasoning effort, with low set as the default. Low is the sensible balance for call-centre work, restaurant ordering, hands-free admin and other latency-sensitive jobs. High is where the model thinks before it speaks, useful for legal triage and any voice agent that needs to weigh options before committing to an action. xhigh is the deep-reasoning lane that OpenAI is positioning for diagnostic and decision-support voice products.
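One way to keep the effort choice explicit in a codebase is to map workloads to tiers in one place. This is a minimal sketch: the five tier names and the low default come from the description above, but the workload labels, the settings keys and the `gpt-realtime-2` model string are illustrative assumptions, not the documented API schema.

```python
from enum import Enum


class ReasoningEffort(str, Enum):
    """The five effort tiers named in the article."""
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"


# Hypothetical workload labels mapped to the tiers described above.
EFFORT_BY_WORKLOAD = {
    "call_centre": ReasoningEffort.LOW,
    "restaurant_ordering": ReasoningEffort.LOW,
    "legal_triage": ReasoningEffort.HIGH,
    "decision_support": ReasoningEffort.XHIGH,
}


def pick_effort(workload: str) -> ReasoningEffort:
    """Return the tier for a workload, falling back to the default (low)."""
    return EFFORT_BY_WORKLOAD.get(workload, ReasoningEffort.LOW)


def session_settings(workload: str) -> dict:
    """Build an illustrative settings dict; key names are assumptions, not the real schema."""
    return {"model": "gpt-realtime-2", "reasoning_effort": pick_effort(workload).value}
```

Centralising the mapping means a latency complaint or an accuracy complaint becomes a one-line change rather than a hunt through call sites.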

Image: OpenAI / Wikimedia Commons

The context window jumped from 32K to 128K tokens, which is the change voice agent builders have been begging for. A 128K window means a single Realtime session can carry an entire two-hour customer history, multiple tool schemas, an MCP server’s worth of memory and the in-call audio without summarisation tricks. Performance benchmarks support the story – OpenAI says GPT-Realtime-2 scores 15.2% higher on Big Bench Audio and 13.8% higher on Audio MultiChallenge than GPT-Realtime-1.5. Those are real margins, not vendor noise.
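To see what the larger window buys, it helps to budget a session before building it. The sketch below treats "128K" as 128,000 tokens for simplicity, and every per-component number is a hypothetical estimate rather than a measurement; real token counts depend on the audio length and the schemas involved.

```python
CONTEXT_WINDOW_TOKENS = 128_000   # "128K" taken as 128,000 for simplicity
PREVIOUS_WINDOW_TOKENS = 32_000   # the earlier 32K window


def tokens_remaining(allocations: dict[str, int],
                     window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Tokens left after all allocations; a negative result means overflow."""
    return window - sum(allocations.values())


# Hypothetical per-component estimates, not measured figures.
session_plan = {
    "customer_history": 60_000,
    "tool_schemas": 8_000,
    "mcp_memory": 20_000,
    "in_call_audio": 30_000,
}
```

Under these assumed numbers, `tokens_remaining(session_plan)` leaves 10,000 tokens of headroom, while the same plan against `PREVIOUS_WINDOW_TOKENS` overflows by 86,000, which is where the summarisation tricks used to come in.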

Video: Ray Fernando – OpenAI Realtime API demo walkthrough

Translate and Whisper: the OpenAI Realtime API voice models for everyone else

Model	Price	Use case	MTW read
GPT-Realtime-2	£25 (about $32)/M audio in	Production voice agents with reasoning	The new default, kills the cascaded stack.
GPT-Realtime-Translate	$0.034/min	Live simultaneous translation	Undercuts Google's streaming translation, kills boutique startups.
GPT-Realtime-Whisper	$0.017/min	Streaming transcription	Pure utility play, hosted Whisper makes self-hosting irrational below scale.

GPT-Realtime-Translate is the model that should make UK and EU contact-centre operators rethink their localisation budgets. Seventy-plus input languages with 13 output targets cover the European and Asian markets a London-based business actually serves, and at $0.034 per minute it is now cheaper than the legacy outsourced live-translation services in finance and aviation. The catch is target-language coverage – Welsh, Irish Gaelic and Scottish Gaelic are not on the 13-output list yet, and OpenAI has been opaque about when smaller languages get parity. The Anthropic AWS $100 billion (about £79 billion) deal we wrote about in April was partly about exactly this kind of voice-translation workload, and OpenAI just sharpened the competition for it.

What UK developers should watch

UK developers should care about three things. First, the Realtime API now supports SIP phone calling and remote MCP servers, which means a voice agent can answer a real PSTN number in minutes rather than the weeks Twilio integrations used to demand. Second, image inputs landed at the same time – your voice agent can accept a photo mid-call and act on it, which closes the gap with Google’s Gemini 3.1 Flash Lite multimodal route. Third, the pricing is denominated in USD, so any UK procurement model needs an FX buffer baked in.
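The FX buffer in the third point is easy to make concrete. This is a rough sketch: the exchange rate and buffer percentage in the usage note are hypothetical inputs that a finance team would set, not figures from OpenAI or from this article.

```python
def gbp_budget(usd_cost: float, usd_per_gbp: float, buffer_pct: float) -> float:
    """Convert a USD bill to GBP and add a percentage buffer for exchange-rate moves."""
    if usd_per_gbp <= 0:
        raise ValueError("usd_per_gbp must be positive")
    if buffer_pct < 0:
        raise ValueError("buffer_pct must not be negative")
    return round(usd_cost / usd_per_gbp * (1 + buffer_pct / 100), 2)
```

For example, a $2,040 monthly bill at an assumed rate of 1.25 USD per GBP with a 5% buffer gives `gbp_budget(2040, 1.25, 5)`, or £1,713.60, against £1,632 with no buffer at all.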

The risk for OpenAI is the same as ever – reliability. We wrote about the Claude double outage in April and the OpenAI Realtime API’s beta period had its own service interruptions in Q1. General availability raises the SLA bar, and any UK-regulated voice product needs to plan for fallback ASR. The model is excellent. The dependency is real.
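Planning for fallback ASR can be as simple as trying backends in order. The sketch below is vendor-neutral by design: each backend is any callable that takes audio bytes and returns text, so the hosted and self-hosted clients are left as assumptions for the reader to supply. It covers one-shot transcription only; a streaming session would need reconnection logic on top.

```python
from typing import Callable, Sequence

# Any function that takes raw audio bytes and returns a transcript.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    backends: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, transcript) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(audio)
        except Exception as exc:  # a production system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the transcript makes it possible to log how often the fallback is actually used, which is the number a regulator or an SLA review will ask for.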

OpenAI Realtime API voice models power ChatGPT advanced voice and partner apps — Image: OpenAI / Wikimedia Commons

OpenAI Realtime API voice models and the multimodal angle

The launch ships image inputs alongside the new audio models, which closes a real gap. A voice agent on a customer support line can now accept a photo of a damaged product mid-call and respond with a description, an action recommendation and a tool call – all from the same Realtime session. The cascaded alternative would have required a separate vision model, a separate state-passing layer and at least one network hop between systems. OpenAI’s bundled approach cuts that to a single endpoint and a single billing meter, which is exactly the simplification the production voice agent market has been demanding since 2024.

The MCP integration is the other quietly important detail. Remote MCP server support means a voice agent can connect to a corporate knowledge base, a CRM, a calendar and a payment processor without rebuilding the function-calling layer for each. Anthropic’s MCP rollout passed 97 million installs last month, which we covered in our MCP adoption analysis, and OpenAI adopting the standard for Realtime API is the moment the protocol becomes the de facto inter-model integration layer. UK voice product teams that already shipped an MCP server for chat now get voice for free.

MTW verdict

OpenAI Realtime API voice models reset the voice agent market for the second time in eighteen months. GPT-Realtime-Translate at $0.034 per minute is the single product launch that will reshape pricing across UK voice startups. Build on it, but architect a fallback path to self-hosted Whisper-3 or Gemini Live before you commit a customer contract to a single vendor.

