Technology · AI · Chips · Innovation

OpenAI Partners with Cerebras for 'Near-Instant' Code Generation with GPT-5.3-Codex-Spark, Diversifying Beyond Nvidia

OpenAI has launched GPT-5.3-Codex-Spark, a new coding model designed for near-instantaneous response times, marking its first major inference partnership outside its traditionally Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a chipmaker specializing in low-latency AI workloads. The move comes as OpenAI navigates a complex relationship with Nvidia, faces criticism over ChatGPT ads, secures a Pentagon contract, and undergoes internal organizational changes. While an OpenAI spokesperson stated that GPUs remain foundational, Cerebras complements them by excelling at workflows that require extremely low latency, making real-time coding feel more responsive. Codex-Spark is OpenAI's first model built for real-time coding collaboration, claiming more than 1,000 tokens per second on ultra-low-latency hardware, though specific latency metrics were not provided.

VentureBeat

OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times. This deployment signifies the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model operates on hardware provided by Cerebras Systems, a Sunnyvale-based chipmaker renowned for its wafer-scale processors that specialize in low-latency AI workloads.

This partnership emerges at a critical juncture for OpenAI. The company is currently navigating a strained relationship with its long-standing chip supplier, Nvidia. Concurrently, it faces increasing criticism regarding its decision to introduce advertisements into ChatGPT, has recently announced a Pentagon contract, and is experiencing internal organizational upheaval, including the disbandment of a safety-focused team and the resignation of at least one researcher in protest.

An OpenAI spokesperson clarified the strategic importance of this new collaboration to VentureBeat, stating, "GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage." The spokesperson added, "Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so use cases such as real-time coding in Codex feel more responsive as you iterate." This careful articulation, emphasizing the foundational role of GPUs while positioning Cerebras as a complement, highlights OpenAI's delicate balancing act as it diversifies its chip suppliers without alienating Nvidia, which remains the dominant force in AI accelerators.

OpenAI acknowledges that these speed gains come with capability tradeoffs, which the company believes developers will accept. Codex-Spark is presented as OpenAI's first model specifically designed for real-time coding collaboration. The company asserts that the model can deliver more than 1,000 tokens per second when served on ultra-low-latency hardware. However, OpenAI declined to provide specific latency metrics, such as time-to-first-token figures, stating only that "Codex-Spark is optimized to feel near-instant."
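To see why raw throughput alone does not capture the "near-instant" claim, a back-of-the-envelope calculation helps: perceived latency is roughly time-to-first-token (TTFT) plus generation time at the sustained throughput. The sketch below uses the claimed 1,000 tokens per second; the TTFT figures and the comparison throughput are hypothetical assumptions for illustration, not numbers published by OpenAI.

```python
# Back-of-the-envelope latency arithmetic. All TTFT values and the
# comparison throughput below are assumed for illustration; OpenAI has
# not published time-to-first-token figures for Codex-Spark.

def completion_time(num_tokens: int, tokens_per_sec: float, ttft_sec: float) -> float:
    """Rough wall-clock time to stream a completion:
    time to first token, plus steady-state generation time."""
    return ttft_sec + num_tokens / tokens_per_sec

# A 200-token code suggestion at the claimed 1,000 tokens/sec,
# with a hypothetical 100 ms TTFT:
fast = completion_time(200, 1000.0, ttft_sec=0.1)

# The same suggestion at an assumed 100 tokens/sec with 500 ms TTFT,
# as a stand-in for a slower serving stack:
slow = completion_time(200, 100.0, ttft_sec=0.5)

print(f"{fast:.2f}s vs {slow:.2f}s")  # 0.30s vs 2.50s
```

Under these assumptions, the difference between roughly a third of a second and two and a half seconds is what separates a suggestion that feels instant from one that interrupts a developer's flow.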

Related News

Technology

Microsoft's HVE Core: Streamlined Hyper-Velocity Engineering Components for Project Acceleration and Copilot Integration

Microsoft has released 'hve-core,' a collection of refined hyper-velocity engineering components designed to accelerate project initiation and enhance existing projects. These components, which include instructions, prompts, agents, and skills, are developed specifically to help projects take full advantage of various Copilots. The initiative aims to provide essential building blocks for developers looking to optimize their workflows and integrate advanced AI assistance into their development processes.

Technology

MiroFish: A Concise and Universal Swarm Intelligence Engine for Omnipresent Prediction Trends on GitHub

MiroFish, developed by 666ghj, is introduced as a concise and universal swarm intelligence engine designed for predicting a wide range of phenomena. The project, trending on GitHub since March 9, 2026, aims to leverage collective intelligence to offer predictive capabilities across various domains. Its core functionality focuses on providing a streamlined and adaptable solution for 'predicting all things,' highlighting its broad applicability in the realm of intelligent systems.

Technology

Alibaba's Page Agent: A JavaScript GUI Proxy for Natural Language Web Interface Control

Alibaba has released 'Page Agent,' a JavaScript-based GUI proxy designed to enable natural language control over web page interfaces. This tool, currently trending on GitHub, aims to simplify web interaction by allowing users to manage graphical user interfaces within web pages using natural language commands. The project is developed by Alibaba and was published on March 9, 2026.