OpenAI has partnered with AI chipmaker Cerebras in a multi-year agreement to add 750 megawatts of ultra-low-latency compute to OpenAI's platform, with the goal of speeding up AI responses for customers. The arrangement focuses on AI inference (running models to generate outputs) rather than training; OpenAI says the aim is to make its AI respond much faster across tasks such as answering hard questions, generating code, creating images, and running AI agents.
Cerebras said the deployment will roll out in multiple stages beginning in 2026, calling it the largest high-speed AI inference deployment in the world. OpenAI said the capacity will come online in multiple tranches through 2028 as the company integrates it into its inference stack in phases and expands across workloads. TechCrunch reported that the deal is worth over $10 billion, citing a source familiar with the details.
What OpenAI and Cerebras announced
OpenAI said it is partnering with Cerebras to add 750 megawatts of ultra-low-latency AI compute to its platform. Cerebras said it has signed a multi-year agreement with OpenAI to deploy 750 megawatts of Cerebras wafer-scale systems to serve OpenAI customers. TechCrunch reported that Cerebras will deliver 750 megawatts of compute to OpenAI starting this year and continuing through 2028.
In its announcement, OpenAI described Cerebras as building purpose-built AI systems designed to accelerate long outputs from AI models, with speed coming from putting massive compute, memory, and bandwidth together on a single giant chip and removing bottlenecks that slow inference on conventional hardware. OpenAI said adding this low-latency capacity is intended to make AI responses faster, arguing that real-time responses lead users to do more, stay longer, and run higher-value workloads.
Why the deal targets “real-time” inference
Both companies framed the partnership around faster outputs for OpenAI's customers, with OpenAI saying the systems will speed up responses that currently take longer to process. OpenAI described AI usage as a repeated loop of request, model "thinking," and response, and said lowering latency makes that loop feel real-time. In Cerebras' post, CEO Andrew Feldman compared the shift to how broadband changed the internet, saying real-time inference will transform AI.
Cerebras also claimed that large language models running on its systems can deliver responses up to 15 times faster than GPU-based systems, including in use cases such as coding agents and voice chat. TechCrunch similarly noted that Cerebras claims its AI-focused systems are faster than GPU-based systems such as Nvidia’s offerings.
How Cerebras fits OpenAI’s compute mix
OpenAI said integrating Cerebras is part of a broader compute strategy built around a "resilient portfolio" that matches the right systems to the right workloads. In a quote shared by both OpenAI and Cerebras, OpenAI's Sachin Katti said Cerebras adds a dedicated low-latency inference solution, which OpenAI expects to support faster responses and more natural interactions, and to help scale real-time AI to more people.
Network World reported that OpenAI will use chips designed by Cerebras to run parts of its ChatGPT inference workload and that the commitment involves purchasing up to 750 megawatts of computing capacity over three years, citing a Wall Street Journal report. The same Network World report said the move reflects the pressure large-scale AI services put on power availability, networking, and inter-data-center connectivity, as OpenAI looks for faster and more cost-efficient alternatives to Nvidia's dominant GPUs.
Network World also reported that OpenAI has pursued infrastructure diversification in other ways, including work on a custom AI chip with Broadcom and plans to deploy AMD’s latest accelerators. In that report, analysts described a broader industry trend toward more heterogeneous infrastructure strategies rather than relying on one accelerator model for everything.
Business context around Cerebras
TechCrunch reported that Cerebras has been around for over a decade and gained momentum after the launch of ChatGPT in 2022 and the AI boom that followed. The same report said Cerebras filed for an IPO in 2024 but has pushed it back multiple times while continuing to raise large amounts of money. TechCrunch also reported that the company was said to be in talks to raise another $1 billion at a $22 billion valuation, and noted that OpenAI CEO Sam Altman is already an investor and that OpenAI once considered acquiring Cerebras.
What comes next
OpenAI said it will integrate the low-latency capacity into its inference stack in phases and expand it across workloads. Cerebras said the rollout will happen in multiple stages beginning in 2026. OpenAI and Cerebras presented the effort as part of a push to bring faster, “frontier” AI experiences to far more users as real-time inference becomes a bigger focus.
