Amazon Web Services (AWS) has partnered with artificial intelligence chipmaker Cerebras Systems to deploy the world’s largest AI processor in its cloud data centers. The multi-year agreement, announced on Friday, will make Cerebras’ Wafer-Scale Engine 3 (WSE-3) chips available to developers via the Amazon Bedrock managed service in the coming months. By combining Cerebras hardware with Amazon’s custom Trainium processors, the companies expect to increase the speed at which AI models generate output by a factor of five.
The collaboration introduces a disaggregated architecture designed to tackle the distinct computational challenges of AI inference. Inference, the stage where trained models generate responses to user prompts, is divided into two main phases known as prefill and decode.
During the prefill stage, the user’s prompt is broken into tokens and processed all at once, a computationally intensive and naturally parallel step. The decode phase follows, generating the model’s response sequentially, one token at a time. Decoding is less demanding on raw computation but requires massive memory bandwidth to constantly move data between logic circuits and memory.
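To make the two phases concrete, here is a minimal NumPy sketch of a single attention layer: prefill processes every prompt token in one parallel pass and produces a key-value (KV) cache, while each decode step does comparatively little arithmetic but re-reads the entire cache. The shapes, weights, and function names are illustrative only, not the software that will actually run on Trainium or the WSE-3.

```python
# Toy illustration of prefill vs. decode for one attention layer.
# All dimensions, weights, and names are illustrative assumptions.
import numpy as np

d_model = 64                       # hidden size (illustrative)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefill(prompt_embeddings):
    """Process every prompt token at once: one large, parallel matmul pass."""
    q = prompt_embeddings @ Wq
    k = prompt_embeddings @ Wk
    v = prompt_embeddings @ Wv
    scores = softmax(q @ k.T / np.sqrt(d_model))
    return scores @ v, (k, v)               # output plus the KV cache

def decode_step(token_embedding, kv_cache):
    """Generate one token: little arithmetic, but the whole cache is re-read."""
    k_cache, v_cache = kv_cache
    q = token_embedding @ Wq
    k_cache = np.vstack([k_cache, token_embedding @ Wk])
    v_cache = np.vstack([v_cache, token_embedding @ Wv])
    scores = softmax(q @ k_cache.T / np.sqrt(d_model))
    return scores @ v_cache, (k_cache, v_cache)

prompt = rng.standard_normal((128, d_model))    # 128 prompt tokens
_, cache = prefill(prompt)                      # compute-bound, parallel
token = rng.standard_normal((1, d_model))
for _ in range(16):                             # bandwidth-bound, sequential
    token, cache = decode_step(token, cache)
```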
Traditionally, a single chip handles both phases of this process. The AWS and Cerebras partnership, however, splits the workload across specialized hardware. Amazon’s proprietary Trainium chips will handle the prefill stage, while the Cerebras WSE-3 processors will take over the decode phase. The two systems will be linked using Amazon’s Elastic Fabric Adapter (EFA), a custom network device that bypasses the host server’s operating system to accelerate connections and prevent network congestion.
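The handoff pattern behind such a split can be sketched with ordinary Python processes: a prefill worker produces the KV cache and ships it to a separate decode worker, which then generates tokens sequentially. This is only an illustration of the disaggregated flow; the actual Trainium-to-WSE-3 transfer over EFA is not public and is not modeled here.

```python
# Sketch of disaggregated serving: prefill on one worker, decode on another,
# with the KV cache handed over in between. The "model" math is faked; only
# the handoff pattern is the point.
from multiprocessing import Process, Pipe

def prefill_worker(conn, prompt_tokens):
    # Stand-in for the compute-heavy parallel pass over the prompt.
    kv_cache = [(t, t * 2) for t in prompt_tokens]   # fake (key, value) pairs
    conn.send(kv_cache)                              # ship the cache to the decode side
    conn.close()

def decode_worker(conn, num_new_tokens):
    kv_cache = conn.recv()                           # receive the prefix cache
    output = []
    for _ in range(num_new_tokens):                  # sequential, one token at a time
        next_token = len(kv_cache)                   # fake "model" output
        kv_cache.append((next_token, next_token * 2))
        output.append(next_token)
    print("generated:", output)

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=prefill_worker, args=(child_conn, [1, 2, 3]))
    d = Process(target=decode_worker, args=(parent_conn, 4))
    p.start(); d.start(); p.join(); d.join()
```

In a real deployment the cache transfer sits on the critical path between the two stages, which is why a low-latency interconnect that bypasses the operating system, such as EFA, matters.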
David Brown, Vice President of Compute and Machine Learning Services at AWS, highlighted that speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications. With the workload split across Trainium and Cerebras systems, each chip can perform the specific tasks it handles best. This approach is expected to deliver inference speeds an order of magnitude faster than current cloud offerings.
Cerebras Systems Founder and Chief Executive Officer Andrew Feldman stated that the disaggregated inference solution will bring blisteringly fast AI performance to a global customer base within their existing AWS environments.
The Massive Scale of the WSE-3 Processor
Cerebras has gained industry attention for its unconventional approach to semiconductor manufacturing. While traditional methods involve cutting a silicon wafer into numerous smaller chips, Cerebras uses an entire wafer to build a single massive processor.
The WSE-3 chip features approximately four trillion transistors and 900,000 AI-optimized cores. It also includes 44 gigabytes of on-chip memory. Cerebras packages this processor within a water-cooled system known as the CS-3, an appliance roughly the size of a mini-fridge that houses the WSE-3 alongside external memory and networking equipment.
This massive scale provides the WSE-3 with 27 petabytes per second of internal memory bandwidth. According to the company, this bandwidth is more than 200 times greater than what is offered by Nvidia’s NVLink interconnect technology. That enormous data movement capability makes the WSE-3 well suited to the demanding memory requirements of the decode phase in AI inference.
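A back-of-envelope calculation shows why that bandwidth matters for decode: if every model weight must be read from memory for each generated token, the attainable token rate is roughly bandwidth divided by model size in bytes. Only the 27 PB/s figure below comes from the article; the model size and precision are illustrative assumptions, and real systems fall well short of this ceiling.

```python
# Rough upper bound on decode throughput from memory bandwidth alone.
# Model size and precision are illustrative assumptions, not measured values.
bandwidth_bytes_per_s = 27e15          # 27 petabytes per second (from the article)
params = 70e9                          # hypothetical 70B-parameter model
bytes_per_param = 2                    # 16-bit weights (assumption)

weights_read_per_token = params * bytes_per_param    # every weight touched once per token
tokens_per_s_ceiling = bandwidth_bytes_per_s / weights_read_per_token
print(f"~{tokens_per_s_ceiling:,.0f} tokens/s ceiling")   # ~193,000 tokens/s in this toy case
```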
Through Amazon Bedrock, customers will be able to use this hardware without managing the physical infrastructure directly. The service will support popular open-source large language models as well as Amazon’s proprietary generative AI systems, including the Nova model family.
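For developers, access would look like any other Bedrock call. The sketch below uses boto3’s Converse API with an example model ID; how requests would be routed to Cerebras-backed capacity has not been detailed publicly, so nothing in the code depends on it.

```python
# Minimal sketch of calling a model through Amazon Bedrock with boto3's Converse API.
# The model ID and region are examples; substitute any model enabled in your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",   # example model ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize wafer-scale chips in one sentence."}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```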
Rising Competition in AI Hardware
The AWS and Cerebras partnership underscores the intensifying battle for dominance in the AI hardware market. Currently, Nvidia and its graphics processing unit (GPU) accelerators hold a commanding market share. The explosive adoption of generative AI has led to surging demand for these chips, prompting major cloud providers to seek alternative architectures and develop custom silicon.
Google relies on its proprietary Tensor Processing Units (TPUs) to power AI models across its ecosystem. Microsoft recently introduced its Maia AI accelerator and Cobalt central processing units. Similarly, Meta Platforms has deployed its custom Meta Training and Inference Accelerator (MTIA) chips for workloads on Facebook and Instagram.
For Cerebras, the AWS collaboration follows significant business momentum. The startup recently secured a computing infrastructure deal with OpenAI, agreeing to supply 750 megawatts of computing capacity through 2028. This agreement, reportedly worth over $10 billion, arrived between two funding rounds that raised more than $2 billion for Cerebras. The company is reportedly preparing for an initial public offering as soon as the second quarter, and these high-profile cloud partnerships could bolster investor confidence ahead of the listing.
