Amazon Web Services (AWS) and artificial intelligence startup Cerebras Systems have struck a major partnership to bring ultra-fast AI inference to the cloud. Announced on Friday, March 13, 2026, the agreement integrates Cerebras’s high-performance WSE-3 chips into AWS data centers to accelerate generative AI and large language model workloads.
This collaboration aims to solve critical speed bottlenecks in AI inference by deploying Cerebras CS-3 systems alongside AWS Trainium-powered servers on the Amazon Bedrock platform. The integration creates a powerful, highly optimized cloud computing environment designed for developers and enterprises that require real-time responses for demanding applications, such as interactive chatbots and coding assistants.
A Disaggregated Architecture for AI Processing
To achieve these performance gains, AWS and Cerebras are introducing a novel “disaggregated architecture” for AI inference workloads. Inference is the process by which a trained AI model takes a user request and generates a response. Traditionally, the entire process runs on a single computing system; the new partnership instead splits the workload into two distinct phases, each handled by hardware optimized for that specific task.
The first phase, known as the “prefill” stage, ingests a user’s prompt: the input text is broken into tokens, and the model then processes all of those tokens at once to build the context it needs before it can begin answering. Under the new architecture, Amazon’s custom Trainium AI chips will manage this prefill phase.
The second phase is the “decode” stage, in which the model generates its response one token at a time until the answer is complete. The wafer-scale Cerebras chips will be solely responsible for this decoding process. The two systems will be interconnected using AWS’s Elastic Fabric Adapter, a high-speed networking technology that lets the two stages hand requests off to each other with minimal delay.
Cerebras CEO Andrew Feldman referred to this strategy as a “divide and conquer” approach, separating prompt processing from token generation to maximize efficiency across the computing pipeline.
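The pattern is easier to see in miniature. The sketch below is purely illustrative rather than AWS or Cerebras code: every class name and the toy tokenizer are invented here, but the control flow mirrors the divide-and-conquer design described above, with one worker ingesting the whole prompt, a second emitting tokens one at a time, and the request's state handed off between them.

```python
# Illustrative sketch only; this is NOT AWS or Cerebras code. Every name is
# invented. The point is the control flow of disaggregated inference:
# a prefill worker ingests the whole prompt, a decode worker emits tokens
# one at a time, and the request's state is handed off between them.

from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Intermediate state passed from the prefill stage to the decode stage."""
    prompt_tokens: list
    context: list = field(default_factory=list)


class PrefillWorker:
    """Stands in for the Trainium-backed stage: process the full prompt at once."""

    def run(self, prompt: str) -> RequestState:
        tokens = prompt.split()  # toy "tokenizer" for illustration
        return RequestState(prompt_tokens=tokens, context=list(tokens))


class DecodeWorker:
    """Stands in for the Cerebras-backed stage: generate the reply token by token."""

    def run(self, state: RequestState, max_tokens: int = 5) -> list:
        output = []
        for step in range(max_tokens):
            token = f"<tok{step}>"       # a real model would sample from logits here
            state.context.append(token)  # decoding extends the running context
            output.append(token)
        return output


def serve(prompt: str) -> str:
    # In the real deployment this handoff crosses the network between machines;
    # here it is just a Python object changing hands.
    state = PrefillWorker().run(prompt)
    return " ".join(DecodeWorker().run(state))


if __name__ == "__main__":
    print(serve("explain disaggregated inference"))
```

In the toy version the handoff is just a Python object passed between functions; in the real deployment, that hop is what the Elastic Fabric Adapter links carry between the Trainium and Cerebras machines.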
Inside the Cerebras CS-3 Hardware
The hardware driving the decode phase represents a significant departure from standard AI processors. The Cerebras WSE-3 is a wafer-scale chip featuring 900,000 cores and 44 gigabytes of on-chip SRAM. Unlike the GPUs produced by competitors, the Cerebras architecture does not depend on costly external high-bandwidth memory.
Instead, the chip keeps all model weights in its on-chip memory, which is the source of its exceptional speed. The chips are housed within the Cerebras CS-3, a water-cooled appliance roughly the size of a mini-fridge that combines the massive chip with the networking equipment and other components necessary for data center integration.
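A back-of-envelope calculation suggests why that on-chip capacity matters. The sketch below uses no vendor sizing data: it assumes weights dominate memory use, ignores activations and per-request state, and treats one billion parameters as roughly one gigabyte per byte of numeric precision, then estimates how many 44 GB wafers are needed just to hold a model's weights.

```python
# Back-of-envelope arithmetic, not vendor sizing data. Assumes model weights
# dominate memory use (ignores activations and per-request state) and treats
# one billion parameters as roughly 1 GB per byte of numeric precision.

import math

WSE3_SRAM_GB = 44  # on-chip SRAM per WSE-3, per the figure cited above


def wafers_for_weights(params_billions: float, bytes_per_param: float) -> int:
    """Minimum number of wafers needed just to hold the model weights on-chip."""
    weight_gb = params_billions * bytes_per_param
    return math.ceil(weight_gb / WSE3_SRAM_GB)


for params, label, bpp in [(8, "FP16", 2), (70, "FP16", 2), (70, "FP8", 1)]:
    print(f"{params}B params @ {label}: ~{params * bpp} GB of weights "
          f"-> {wafers_for_weights(params, bpp)} wafer(s)")
```

By this rough math, an 8-billion-parameter model at 16-bit precision fits within a single wafer, while a 70-billion-parameter model would need either several wafers or lower-precision weights, hinting at why larger deployments span multiple CS-3 systems.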
By bringing the CS-3 to AWS data centers, the partnership allows organizations to access this unique hardware without the friction of traditional procurement. Customers can leverage the performance benefits of the WSE-3 chip entirely through the cloud.
Executive Perspectives on the Collaboration
Leaders from both companies emphasize that this deal will democratize access to top-tier AI hardware. Feldman noted that every type of customer, from solo developers to the world’s largest financial institutions, utilizes AWS. He stated that this partnership will streamline access to Cerebras hardware, making the powerful technology available with just a simple click.
David Brown, Vice President of Compute and Machine Learning Services at AWS, highlighted the importance of speed in the current AI landscape. He explained that inference is where artificial intelligence delivers actual value to customers, but processing speed remains a critical bottleneck for demanding, real-time workloads.
However, the companies recognize that a disaggregated approach is not a one-size-fits-all solution. James Wang, Director of Product Marketing at Cerebras, explained that the disaggregated architecture is best suited to large, stable workloads. Because most customers run a mix of tasks with varying prefill-to-decode ratios, the traditional aggregated approach remains a better fit for many scenarios. Consequently, the companies expect most customers will want access to both architectures depending on their specific needs.
Future Rollouts and Cloud Expansion
While the financial terms of the agreement between Amazon and the $23.1 billion chip startup were not disclosed, the deployment is expected to roll out rapidly. Later this year, AWS plans to make leading open-source large language models, as well as its proprietary Amazon Nova models, available to run on the Cerebras hardware through the Amazon Bedrock service.
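If the rollout follows the existing Bedrock access pattern, developers would reach the new hardware through the same runtime APIs they already use. The sketch below calls the real Amazon Bedrock Converse API through boto3, but the model identifier shown is a placeholder: AWS has not published IDs for Cerebras-served models, so it stands in for whatever ships later this year.

```python
# Hedged sketch of cloud access via the existing Amazon Bedrock runtime
# Converse API (boto3). The modelId is a placeholder (AWS has not published
# identifiers for Cerebras-served models); substitute a real ID once the
# rollout ships.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="example.placeholder-model-id",  # hypothetical, not a real model ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize disaggregated inference."}]},
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

# The Converse API returns the assistant's reply under output.message.content.
print(response["output"]["message"]["content"][0]["text"])
```

Nothing about the disaggregated backend would surface at this layer; the routing across Trainium and Cerebras hardware would happen inside the managed service.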
The partnership arrives during a period of massive infrastructure expansion for Amazon. In its fourth-quarter earnings report, Amazon announced plans for $200 billion in capital expenditures for 2026, the vast majority of which is dedicated to expanding AWS capacity. Furthermore, the Cerebras deal coincides with a reported 11-part, $37 billion bond sale by Amazon, which is specifically aimed at funding its ongoing artificial intelligence infrastructure buildout.
Ultimately, this alliance between a dominant cloud provider and an innovative chipmaker aims to set a new standard for efficient and scalable AI inference, driving significant advancements across the tech industry.
