OpenAI Partners with Cerebras for 'Near-Instant' Code Generation with GPT-5.3-Codex-Spark, Diversifying Beyond Nvidia
OpenAI has launched GPT-5.3-Codex-Spark, a coding model designed for near-instantaneous response times, marking the company's first major inference partnership outside its traditionally Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a chipmaker specializing in low-latency AI workloads. The launch comes as OpenAI navigates a complicated relationship with Nvidia, faces criticism over ads in ChatGPT, secures a Pentagon contract, and works through internal organizational changes. An OpenAI spokesperson said GPUs remain foundational, with Cerebras complementing them in workflows that demand extremely low latency, such as real-time coding. Codex-Spark is OpenAI's first model built for real-time coding collaboration, claimed to deliver more than 1,000 tokens per second on ultra-low-latency hardware, though the company did not share specific latency metrics.
OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times. This deployment signifies the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model operates on hardware provided by Cerebras Systems, a Sunnyvale-based chipmaker renowned for its wafer-scale processors that specialize in low-latency AI workloads.
This partnership emerges at a critical juncture for OpenAI. The company is currently navigating a strained relationship with its long-standing chip supplier, Nvidia. Concurrently, it faces increasing criticism regarding its decision to introduce advertisements into ChatGPT, has recently announced a Pentagon contract, and is experiencing internal organizational upheaval, including the disbandment of a safety-focused team and the resignation of at least one researcher in protest.
An OpenAI spokesperson clarified the strategic importance of this new collaboration to VentureBeat, stating, "GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage." The spokesperson added, "Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so use cases such as real-time coding in Codex feel more responsive as you iterate." This careful articulation, emphasizing the foundational role of GPUs while positioning Cerebras as a complement, highlights OpenAI's delicate balancing act as it diversifies its chip suppliers without alienating Nvidia, which remains the dominant force in AI accelerators.
OpenAI acknowledges that these speed gains come with capability tradeoffs it believes developers will accept. Codex-Spark is presented as OpenAI's first model designed specifically for real-time coding collaboration, and the company says it can deliver more than 1,000 tokens per second when served on ultra-low-latency hardware. However, OpenAI declined to provide specific latency metrics, such as time-to-first-token figures, saying only that "Codex-Spark is optimized to feel near-instant."
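To see why both throughput and time-to-first-token (TTFT) matter for a "near-instant" feel, here is a minimal arithmetic sketch. The 1,000 tokens-per-second figure is OpenAI's stated claim; the TTFT values and the 200-token completion length are hypothetical, chosen purely for illustration.

```python
def completion_time(num_tokens: int, tokens_per_sec: float, ttft_sec: float) -> float:
    """Seconds until a streamed completion finishes: time-to-first-token
    plus the time to generate the remaining tokens at a steady rate."""
    return ttft_sec + num_tokens / tokens_per_sec

# A hypothetical 200-token inline code suggestion at the claimed 1,000 tok/s,
# under three assumed TTFT values (OpenAI has not published these numbers):
for ttft in (0.05, 0.2, 0.5):
    total = completion_time(200, 1000.0, ttft)
    print(f"TTFT {ttft:.2f}s -> total {total:.2f}s")
```

Even at very high throughput, the sketch shows TTFT dominating short completions: a 200-token suggestion takes only 0.2s of generation, so the perceived responsiveness hinges on how quickly the first token arrives, which is presumably why OpenAI emphasizes low-latency serving rather than raw throughput alone.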