How OpenAI Scales Low-Latency Voice AI for 900 Million Weekly Users via WebRTC Rearchitecture
Industry News · OpenAI · WebRTC · Voice AI

OpenAI has revealed the engineering strategies used to deliver low-latency voice AI at a massive scale, supporting over 900 million weekly active users. To ensure natural, real-time conversations for ChatGPT and the Realtime API, OpenAI rearchitected its WebRTC stack to address critical infrastructure constraints. The new "split relay plus transceiver" architecture overcomes challenges such as one-port-per-session limitations and the need for stable stateful session ownership. By optimizing global routing and first-hop latency, OpenAI maintains high-quality media transport with low jitter and packet loss. This technical evolution allows for crisp turn-taking and responsive AI interactions, essential for the next generation of interactive AI agents and workflows.

Source: Hacker News

Key Takeaways

  • Massive Scale Support: OpenAI's infrastructure now supports over 900 million weekly active users with low-latency voice capabilities.
  • Architectural Shift: The team moved to a "split relay plus transceiver" architecture to bypass traditional WebRTC scaling limitations.
  • Three Core Requirements: Success is defined by global reach, rapid connection setup, and stable media round-trip time with minimal jitter.
  • Infrastructure Optimization: The rearchitecture addresses port termination constraints and stateful session ownership (ICE and DTLS) to improve internal packet routing.
  • Standard Compliance: Despite internal changes, the system preserves standard WebRTC behavior for clients, utilizing ICE, DTLS, and SRTP.

In-Depth Analysis

The Challenge of Real-Time AI at Global Scale

For voice AI to feel natural, it must operate at the speed of human speech. OpenAI notes that network impairments that cause awkward pauses, clipped interruptions, or delayed "barge-in" immediately degrade the user experience. This is particularly critical for ChatGPT voice users, developers building on the Realtime API, and interactive AI agents. At a scale of 900 million weekly active users, OpenAI has set three concrete performance requirements: global reach, fast connection setup so users can start speaking the moment a session begins, and low, stable media round-trip times. Meeting these goals requires keeping jitter and packet loss to an absolute minimum so that turn-taking feels "crisp."
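To make the jitter requirement concrete, here is a minimal sketch of the interarrival jitter estimator defined for RTP in RFC 3550 (section 6.4.1), the statistic media stacks such as WebRTC's use to judge transport quality. The timestamps below are illustrative, not OpenAI's data.

```python
# Minimal sketch of the RTP interarrival jitter estimator (RFC 3550, 6.4.1).
# Media stacks track this statistic to decide whether turn-taking will feel "crisp".

def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """Smooth the jitter estimate with the 1/16 gain specified by RFC 3550."""
    d = abs(transit_now - transit_prev)  # change in one-way transit time
    return jitter + (d - jitter) / 16.0

def jitter_from_timestamps(send_ts: list[float], recv_ts: list[float]) -> float:
    """Fold a sequence of (send, receive) timestamps into a final jitter estimate."""
    jitter = 0.0
    transits = [r - s for s, r in zip(send_ts, recv_ts)]
    for prev, now in zip(transits, transits[1:]):
        jitter = update_jitter(jitter, prev, now)
    return jitter

# Packets sent every 20 ms; variable network delay makes the transit times wobble.
send = [0.000, 0.020, 0.040, 0.060]
recv = [0.050, 0.072, 0.091, 0.113]
print(round(jitter_from_timestamps(send, recv) * 1000, 3))  # jitter in ms → 0.293
```

The 1/16 smoothing gain means a single delayed packet nudges the estimate only slightly, while sustained delay variation drives it up; a stable, low value is what makes interruptions land cleanly.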

Overcoming Infrastructure Constraints

As OpenAI scaled, three primary technical constraints began to collide, necessitating a complete rearchitecture of its WebRTC stack. First, the traditional "one-port-per-session" media termination model did not align with OpenAI’s existing infrastructure. Second, stateful protocols such as Interactive Connectivity Establishment (ICE) and Datagram Transport Layer Security (DTLS) required stable session ownership, which is difficult to maintain in a dynamic, high-traffic environment. Third, global routing needed to be optimized to keep "first-hop" latency as low as possible. To resolve these issues, OpenAI developed a "split relay plus transceiver" architecture. This design allows the company to change how packets are routed internally while maintaining a standard WebRTC interface for external clients, ensuring compatibility across browsers and mobile applications.
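One plausible reading of the split design, sketched below under assumptions of my own (the node names, the `EdgeRelay` class, and the rendezvous-hash ownership rule are all hypothetical, not OpenAI's published implementation): stateless edge relays terminate the client's first hop anywhere, then deterministically forward each session's packets to the single transceiver that owns its ICE/DTLS state, keeping stateful ownership stable no matter which relay a packet enters.

```python
# Hypothetical sketch of "split relay plus transceiver" routing: stateless edge
# relays forward every packet for a session to the one transceiver that owns
# the session's ICE/DTLS state, so ownership stays stable across relays.
import hashlib

TRANSCEIVERS = ["tx-us-east", "tx-eu-west", "tx-ap-south"]  # illustrative names

def owner_for(session_id: str) -> str:
    """Deterministically map a session to its stateful owner (rendezvous-style)."""
    def score(tx: str) -> int:
        digest = hashlib.sha256(f"{session_id}:{tx}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # Highest score wins; every relay computes the same answer independently.
    return max(TRANSCEIVERS, key=score)

class EdgeRelay:
    """A stateless relay: terminates the client's first hop, then forwards."""
    def forward(self, session_id: str, packet: bytes) -> tuple[str, bytes]:
        return owner_for(session_id), packet

# Two different relay instances route the same session to the same owner.
assert EdgeRelay().forward("sess-42", b"rtp") == EdgeRelay().forward("sess-42", b"rtp")
```

The design choice this illustrates is the separation of concerns: the relay tier can be scaled, drained, and placed close to users freely, because no per-session protocol state lives there.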

Leveraging WebRTC for Interactive Media

WebRTC serves as the foundation for OpenAI’s real-time products because it standardizes the most difficult aspects of interactive media. By using WebRTC, OpenAI benefits from established protocols for connectivity and security. This includes ICE for NAT (Network Address Translation) traversal, which is essential for establishing connections across different network environments. Furthermore, the use of DTLS and SRTP (Secure Real-time Transport Protocol) ensures that media transport remains encrypted and secure. By building upon these open standards, OpenAI can focus on the specialized routing and scaling logic required to handle nearly a billion users while ensuring that the underlying media negotiation—including codec negotiation—remains robust and interoperable.
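The ordering these standards impose on connection setup can be sketched as a small state machine; this is a simplification of real WebRTC stacks, but the sequence itself is standard: ICE validates a candidate pair, DTLS handshakes over it, and the DTLS-derived keys then protect media as SRTP.

```python
# Simplified state machine for the standard WebRTC media-setup sequence:
# ICE finds a working candidate pair (NAT traversal), DTLS handshakes over it,
# and the keys exported from DTLS then encrypt media as SRTP.
from enum import Enum, auto

class State(Enum):
    NEW = auto()
    ICE_CONNECTED = auto()   # a candidate pair was validated
    DTLS_COMPLETE = auto()   # handshake finished; SRTP keys derived
    MEDIA_FLOWING = auto()   # encrypted RTP (SRTP) is being exchanged

VALID = {
    State.NEW: State.ICE_CONNECTED,
    State.ICE_CONNECTED: State.DTLS_COMPLETE,
    State.DTLS_COMPLETE: State.MEDIA_FLOWING,
}

def advance(state: State) -> State:
    """Take the single legal next step; anything else is a protocol violation."""
    if state not in VALID:
        raise ValueError(f"no transition out of {state}")
    return VALID[state]

s = State.NEW
for _ in range(3):
    s = advance(s)
print(s)  # State.MEDIA_FLOWING
```

Because clients only ever see this standard sequence, OpenAI's internal routing changes remain invisible to browsers and mobile apps.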

Industry Impact

The rearchitecture of OpenAI's voice infrastructure sets a new benchmark for the AI industry, particularly for developers building interactive workflows. By solving the "speed of speech" problem at a scale of 900 million users, OpenAI demonstrates that real-time AI agents are no longer limited by traditional networking bottlenecks. The transition to a split relay architecture suggests that as AI models become more integrated into daily communication, the underlying infrastructure must evolve from standard peer-to-peer models to highly optimized, server-side relay systems. This shift will likely influence how other AI companies approach the deployment of Realtime APIs and voice-first applications, prioritizing first-hop latency and stateful session stability as core metrics for user engagement.

Frequently Asked Questions

Question: Why did OpenAI move away from a traditional one-port-per-session model?

The one-port-per-session media termination model did not fit well with OpenAI's large-scale infrastructure. As the number of users grew to 900 million weekly, managing individual ports for every session became a constraint that hindered efficient scaling and internal packet routing.
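A common alternative in large relay deployments, and a way to see why one-port-per-session becomes a bottleneck, is to share a single listening port and demultiplex incoming datagrams by the client's source address, much as the 5-tuple established during ICE identifies a session. This sketch uses invented session names and addresses, not OpenAI's scheme:

```python
# Hypothetical sketch of single-port demultiplexing: instead of binding one UDP
# port per session, one shared listener keys each incoming datagram by the
# client's (ip, port) source tuple and routes it to the matching session.

Sessions = dict[tuple[str, int], str]

def demux(sessions: Sessions, src: tuple[str, int], payload: bytes) -> str:
    """Route a datagram arriving on the shared port to its owning session."""
    if src not in sessions:
        raise KeyError(f"no session for {src}")
    return sessions[src]

sessions: Sessions = {
    ("203.0.113.7", 52001): "sess-alice",
    ("198.51.100.9", 40822): "sess-bob",
}
print(demux(sessions, ("203.0.113.7", 52001), b"rtp"))  # sess-alice
```

With this pattern, session count is bounded by table capacity rather than by the host's port range, which is why per-session ports stop scaling long before a billion weekly users.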

Question: What are the three main requirements for OpenAI's voice AI?

OpenAI focuses on three specific goals: global reach for its massive user base, fast connection setup so users can speak the moment a session begins, and low, stable media round-trip times with minimal jitter to ensure natural conversation flow.

Question: How does the "split relay plus transceiver" architecture help?

This architecture allows OpenAI to optimize how packets are routed within its internal infrastructure to reduce latency and manage stateful sessions (like ICE and DTLS) more effectively, all while appearing as a standard WebRTC connection to the user's device.

Related News

OpenAI President Greg Brockman Testifies in Musk Lawsuit: Journal Evidence and Evasive Tactics Take Center Stage
Industry News

In a significant development in the legal battle between Elon Musk and OpenAI, OpenAI President Greg Brockman took the stand, revealing the critical role of his personal journals in the case. The testimony, which occurred on May 4, 2026, was marked by an unusual procedural sequence where Brockman was cross-examined before his direct examination. Observers noted Brockman's defensive and evasive communication style, described as reminiscent of a high school debate club, as he avoided direct answers to key questions. Musk’s legal team appears to be leveraging Brockman’s own written records as a primary pillar of their argument. This analysis delves into the procedural anomalies of the testimony and the potential impact of internal documentation on the future of AI industry litigation.

Exploring the Nature of AI Character: An Analysis of the Clippy vs Anton Utility Debate
Industry News

This report examines the conceptual divide between AI as a persona and AI as a functional tool, as highlighted in the recent Latent Space reflection. The analysis focuses on the 'Clippy vs Anton' debate, which serves as a framework for understanding the nature of AI 'character.' By distinguishing between 'The Other' (AI as a distinct entity) and 'The Utility' (AI as a seamless instrument), the news highlights a fundamental philosophical shift in how artificial intelligence is perceived and developed. On a quiet day in the industry, this reflection provides a deeper look into the psychological and functional roles that AI agents occupy in the current technological landscape, questioning whether the future of AI lies in personified companionship or invisible efficiency.

Why AI Coding Agents Need Senior Engineering Scaffolding: An Analysis of the Agent Skills Project
Industry News

The 'Agent Skills' project, authored by Addy Osmani, addresses a fundamental flaw in current AI coding agents: their tendency to act like junior developers by prioritizing the shortest path to completion. While agents excel at generating code, they often bypass critical 'invisible' tasks such as writing specifications, creating tests, and ensuring code reviewability. Agent Skills introduces a framework of markdown-based 'skills' injected into an agent's context to enforce senior-level engineering discipline. By mapping these skills to established Software Development Life Cycles (SDLC) and Google’s engineering practices, the project aims to move AI beyond simple code generation toward reliable, scalable software engineering. With over 26,000 stars, the project highlights a significant industry demand for tools that bridge the gap between functional code and professional engineering standards.