Amazon Web Services has reached a major agreement with semiconductor startup Cerebras Systems to integrate the startup's hardware into its cloud infrastructure. The collaboration focuses on accelerating AI inference, the stage at which trained artificial intelligence models generate responses to user requests. By combining computing power from both companies, Amazon aims to speed up chatbots, coding assistants, and other interactive tools.
The new cloud computing service is scheduled to launch in the second half of 2026. While the specific financial terms of the agreement remain undisclosed, the companies have been laying the groundwork for this integration for several years. The partnership marks a significant milestone, as Amazon Web Services becomes the first major cloud provider, or hyperscaler, to officially commit to offering Cerebras technology to its vast network of customers.
A Divided Approach to Faster Processing
To achieve faster response times, the two companies are using a technique known as inference disaggregation. Rather than relying on a single piece of hardware to handle the entire workload, Amazon and Cerebras will split the computation into two distinct stages. Company leadership describes the workflow as a divide-and-conquer strategy designed to overcome traditional processing delays.
When a user submits a prompt, the artificial intelligence model must first digest the request. This initial stage is called the prefill phase: the prompt is broken into data tokens, and the model processes all of them at once to build the context it needs before it can answer. Under the new system architecture, Amazon's proprietary Trainium3 chips will exclusively handle these highly parallel prefill calculations.
Once the prefill phase is complete, the workload moves to the decode stage. During this second step, the model generates and delivers the answer token by token. Cerebras' massive Wafer Scale Engine processors, which are optimized for rapid token generation, will take over to complete the decode phase. By dedicating specialized chips to the different parts of the inference process, the companies expect to drastically reduce latency for tasks that require immediate, iterative feedback.
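To make the hand-off concrete, here is a minimal, self-contained sketch of that two-stage flow in Python. It is purely illustrative: the device classes, method names, and toy arithmetic are hypothetical stand-ins rather than AWS or Cerebras APIs, and a real transformer would pass along a key/value cache of attention states rather than a simple list of numbers.

from dataclasses import dataclass, field

@dataclass
class KVCache:
    # Stands in for the key/value cache a real model builds during prefill.
    states: list = field(default_factory=list)

class PrefillDevice:
    # Plays the Trainium3 role: ingests the whole prompt in one parallel pass.
    def prefill(self, prompt_tokens):
        return KVCache(states=[t * 2 for t in prompt_tokens])

class DecodeDevice:
    # Plays the Wafer Scale Engine role: emits one token per step, so low
    # per-step latency is what matters here.
    def decode_step(self, last_token, cache):
        next_token = (last_token + sum(cache.states)) % 50_000
        cache.states.append(next_token * 2)
        return next_token

def generate(prompt_tokens, max_new_tokens):
    prefill_chip, decode_chip = PrefillDevice(), DecodeDevice()
    cache = prefill_chip.prefill(prompt_tokens)   # stage 1: parallel prefill
    token, answer = prompt_tokens[-1], []
    for _ in range(max_new_tokens):               # stage 2: sequential decode
        token = decode_chip.decode_step(token, cache)
        answer.append(token)
    return answer

print(generate([101, 2023, 3793], max_new_tokens=5))

The structure shows why specialization pays off: prefill is one large batch computation that rewards raw parallel throughput, while decode is an inherently serial loop whose speed is bounded by how quickly a single chip can produce each next token.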
Integrating Hardware Inside the Cloud
As part of the hardware arrangement, massive processors from the startup will be physically installed inside Amazon Web Services data centers. The third-party processors will be directly linked to Amazon’s custom Trainium3 hardware using the cloud provider’s proprietary networking technology. This deep physical and digital integration ensures that the divided workload can move seamlessly between the two different types of silicon without unnecessary communication delays.
Nafea Bshara, a vice president at Amazon Web Services, noted that the integrated chip solution is particularly valuable for customers working in scenarios where time is money. He also indicated that the cloud provider plans to deploy as many of the startup’s chips as necessary to meet overall market demand.
For Cerebras, gaining a footprint within the world’s largest cloud computing platform offers immense visibility. Chief Executive Officer Andrew Feldman emphasized the vast reach of the cloud provider, noting that the customer base ranges from individual independent developers to massive global financial institutions. By embedding their hardware directly into this existing ecosystem, the startup hopes to make accessing its specialized computing power as simple as a single click for users around the world.
Challenging the Market Leader
The partnership arrives as technology companies scramble to build enough infrastructure to support the surging demand for artificial intelligence capabilities. Cerebras, which is currently valued at $23.1 billion, is positioning its technology as a unique alternative to traditional hardware. The company is actively preparing for an initial public offering and seeks to capture a larger share of the enterprise market.
Rather than following the flagship GPU designs sold by market leader Nvidia, the startup has engineered a fundamentally different architecture. The company relies on exceptionally large chips that can process massive volumes of data simultaneously, eliminating the need for the expensive high-bandwidth memory that typical graphics processing units require. In addition to this new cloud partnership, the startup recently secured a $10 billion contract to supply hardware to OpenAI, the creator of ChatGPT.
While Amazon remains a major purchaser of Nvidia hardware, it continues to invest heavily in developing its own custom silicon to improve data center efficiency and offer distinct services. By bringing a new, highly capitalized hardware partner into its data centers, the cloud giant is expanding the options available to artificial intelligence developers. The collaboration ultimately gives enterprise customers a new, highly specialized avenue for running complex models at high speeds.
