Amazon Web Services (AWS) has partnered with artificial intelligence chipmaker Cerebras Systems to deploy the world’s largest AI processor in its cloud data centers. The multi-year agreement, announced on Friday, will make Cerebras’ Wafer-Scale Engine 3 (WSE-3) chips available to developers via the Amazon Bedrock managed service in the coming months. By combining Cerebras hardware with Amazon’s custom Trainium processors, the companies expect to increase the speed at which AI models generate output by a factor of five.
The collaboration introduces a disaggregated architecture designed to tackle the distinct computational challenges of AI inference. Inference, the stage where trained models generate responses to user prompts, is divided into two main phases known as prefill and decode.
During the prefill stage, the user’s prompt is broken into tokens and processed all at once, a computationally intensive and naturally parallel step. The decode phase follows, generating the model’s response sequentially, one token at a time. Decoding is less demanding on raw computation but requires massive memory bandwidth to constantly move data between logic circuits and memory.
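To make the two phases concrete, here is a minimal NumPy sketch of a single attention layer: prefill processes every prompt token in one parallel pass and produces a key-value (KV) cache, while each decode step does comparatively little arithmetic but re-reads the entire cache. The shapes, weights, and function names are illustrative only, not the software that will actually run on Trainium or the WSE-3.

```python
# Toy illustration of prefill vs. decode for one attention layer.
# All dimensions, weights, and names are illustrative assumptions.
import numpy as np

d_model = 64                       # hidden size (illustrative)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefill(prompt_embeddings):
    """Process every prompt token at once: one large, parallel matmul pass."""
    q = prompt_embeddings @ Wq
    k = prompt_embeddings @ Wk
    v = prompt_embeddings @ Wv
    scores = softmax(q @ k.T / np.sqrt(d_model))
    return scores @ v, (k, v)               # output plus the KV cache

def decode_step(token_embedding, kv_cache):
    """Generate one token: little arithmetic, but the whole cache is re-read."""
    k_cache, v_cache = kv_cache
    q = token_embedding @ Wq
    k_cache = np.vstack([k_cache, token_embedding @ Wk])
    v_cache = np.vstack([v_cache, token_embedding @ Wv])
    scores = softmax(q @ k_cache.T / np.sqrt(d_model))
    return scores @ v_cache, (k_cache, v_cache)

prompt = rng.standard_normal((128, d_model))    # 128 prompt tokens
_, cache = prefill(prompt)                      # compute-bound, parallel
token = rng.standard_normal((1, d_model))
for _ in range(16):                             # bandwidth-bound, sequential
    token, cache = decode_step(token, cache)
```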
Traditionally, a single chip handles both phases of this process. The AWS and Cerebras partnership, however, splits the workload across specialized hardware. Amazon’s proprietary Trainium chips will handle the prefill stage, while the Cerebras WSE-3 processors will take over the decode phase. The two systems will be linked using Amazon’s Elastic Fabric Adapter (EFA), a custom network device that bypasses the host server’s operating system to accelerate connections and prevent network congestion.
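The handoff pattern behind such a split can be sketched with ordinary Python processes: a prefill worker produces the KV cache and ships it to a separate decode worker, which then generates tokens sequentially. This is only an illustration of the disaggregated flow; the actual Trainium-to-WSE-3 transfer over EFA is not public and is not modeled here.

```python
# Sketch of disaggregated serving: prefill on one worker, decode on another,
# with the KV cache handed over in between. The "model" math is faked; only
# the handoff pattern is the point.
from multiprocessing import Process, Pipe

def prefill_worker(conn, prompt_tokens):
    # Stand-in for the compute-heavy parallel pass over the prompt.
    kv_cache = [(t, t * 2) for t in prompt_tokens]   # fake (key, value) pairs
    conn.send(kv_cache)                              # ship the cache to the decode side
    conn.close()

def decode_worker(conn, num_new_tokens):
    kv_cache = conn.recv()                           # receive the prefix cache
    output = []
    for _ in range(num_new_tokens):                  # sequential, one token at a time
        next_token = len(kv_cache)                   # fake "model" output
        kv_cache.append((next_token, next_token * 2))
        output.append(next_token)
    print("generated:", output)

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=prefill_worker, args=(child_conn, [1, 2, 3]))
    d = Process(target=decode_worker, args=(parent_conn, 4))
    p.start(); d.start(); p.join(); d.join()
```

In a real deployment the cache transfer sits on the critical path between the two stages, which is why a low-latency interconnect that bypasses the operating system, such as EFA, matters.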
David Brown, Vice President of Compute and Machine Learning Services at AWS, highlighted that speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications. With the workload split across Trainium and Cerebras systems, each chip can perform the specific tasks it handles best. This approach is expected to deliver inference speeds an order of magnitude faster than current cloud offerings.
Cerebras Systems Founder and Chief Executive Officer Andrew Feldman stated that the disaggregated inference solution will bring blisteringly fast AI performance to a global customer base within their existing AWS environments.
The Massive Scale of the WSE-3 Processor
Cerebras has gained industry attention for its unconventional approach to semiconductor manufacturing. While traditional methods involve cutting a silicon wafer into numerous smaller chips, Cerebras uses an entire wafer to build a single massive processor.
The WSE-3 chip features approximately four trillion transistors and 900,000 AI-optimized cores. It also includes 44 gigabytes of on-chip memory. Cerebras packages this processor within a water-cooled system known as the CS-3, an appliance roughly the size of a mini-fridge that houses the WSE-3 alongside external memory and networking equipment.
This massive scale provides the WSE-3 with 27 petabytes per second of internal memory bandwidth. According to the company, this bandwidth is more than 200 times greater than what is offered by Nvidia’s NVLink interconnect technology. That enormous data movement capability makes the WSE-3 well suited to the demanding memory requirements of the decode phase in AI inference.
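A back-of-envelope calculation shows why that bandwidth matters for decode: if every model weight must be read from memory for each generated token, the attainable token rate is roughly bandwidth divided by model size in bytes. Only the 27 PB/s figure below comes from the article; the model size and precision are illustrative assumptions, and real systems fall well short of this ceiling.

```python
# Rough upper bound on decode throughput from memory bandwidth alone.
# Model size and precision are illustrative assumptions, not measured values.
bandwidth_bytes_per_s = 27e15          # 27 petabytes per second (from the article)
params = 70e9                          # hypothetical 70B-parameter model
bytes_per_param = 2                    # 16-bit weights (assumption)

weights_read_per_token = params * bytes_per_param    # every weight touched once per token
tokens_per_s_ceiling = bandwidth_bytes_per_s / weights_read_per_token
print(f"~{tokens_per_s_ceiling:,.0f} tokens/s ceiling")   # ~193,000 tokens/s in this toy case
```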
Through Amazon Bedrock, customers will be able to use this hardware without managing the physical infrastructure directly. The service will support popular open-source large language models as well as Amazon’s proprietary generative AI systems, including the Nova model family.
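For developers, access would look like any other Bedrock call. The sketch below uses boto3’s Converse API with an example model ID; how requests would be routed to Cerebras-backed capacity has not been detailed publicly, so nothing in the code depends on it.

```python
# Minimal sketch of calling a model through Amazon Bedrock with boto3's Converse API.
# The model ID and region are examples; substitute any model enabled in your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",   # example model ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize wafer-scale chips in one sentence."}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```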
Rising Competition in AI Hardware
The AWS and Cerebras partnership underscores the intensifying battle for dominance in the AI hardware market. Currently, Nvidia and its graphics processing unit (GPU) accelerators hold a commanding market share. The explosive adoption of generative AI has led to surging demand for these chips, prompting major cloud providers to seek alternative architectures and develop custom silicon.
Google relies on its proprietary Tensor Processing Units (TPUs) to power AI models across its ecosystem. Microsoft recently introduced its Maia AI accelerator and Cobalt central processing units. Similarly, Meta Platforms has deployed its custom Meta Training and Inference Accelerator (MTIA) chips for workloads on Facebook and Instagram.
For Cerebras, the AWS collaboration follows significant business momentum. The startup recently secured a computing infrastructure deal with OpenAI, agreeing to supply 750 megawatts of computing capacity through 2028. This agreement, reportedly worth over $10 billion, arrived between two funding rounds that raised more than $2 billion for Cerebras. The company is reportedly preparing for an initial public offering as soon as the second quarter, and these high-profile cloud partnerships could bolster investor confidence ahead of the listing.
