Microsoft has unveiled its Maia 200 AI accelerator, a custom chip built to make running large AI models faster and more cost-efficient across its Azure cloud. The company is positioning Maia 200 as a breakthrough for AI inference, the stage where trained models answer user prompts and power real-world applications like chatbots and productivity tools.
Maia 200 is designed specifically for large-scale inference rather than training, and Microsoft says it delivers better performance per dollar than the hardware it currently uses in its data centers. It will power services such as Microsoft 365 Copilot and Azure AI Foundry, workloads from the Microsoft Superintelligence team, and models including OpenAI’s latest GPT‑5.2 family.
A custom AI chip focused on inference
Microsoft describes Maia 200 as a “breakthrough inference accelerator” tuned for the heavy workloads of modern reasoning and language models. Unlike some rival chips built to handle both training and inference, the design is, according to Microsoft and industry analysts, optimized for the production side of AI, where efficiency and throughput matter most.
The chip is fabricated on Taiwan Semiconductor Manufacturing Company’s 3‑nanometer process and contains more than 140 billion transistors. According to Microsoft, Maia 200 is its most performant first‑party silicon yet and the most efficient inference system it has deployed, delivering 30% better performance per dollar than the latest generation of hardware in its fleet.
Performance numbers and comparisons
Each Maia 200 accelerator can deliver more than 10 petaFLOPS of compute at 4‑bit precision (FP4) and over 5 petaFLOPS at 8‑bit precision (FP8), within a 750‑watt system‑on‑chip power envelope. Microsoft says this is enough for a single chip to run today’s largest AI models while still leaving room for even bigger models in the future.
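Taken at face value, those figures imply a rough efficiency profile. The per‑watt numbers below are derived here from the stated specifications purely for illustration; they are not figures Microsoft has quoted:

```python
# Rough efficiency math from the publicly stated figures; these derived
# per-watt numbers are illustrative, not Microsoft's own claims.
fp4_pflops = 10.0      # >10 petaFLOPS at FP4 (stated)
fp8_pflops = 5.0       # >5 petaFLOPS at FP8 (stated)
power_watts = 750.0    # system-on-chip power envelope (stated)

# Convert petaFLOPS to teraFLOPS (1 PFLOPS = 1,000 TFLOPS) and divide by watts.
fp4_tflops_per_watt = fp4_pflops * 1000 / power_watts
fp8_tflops_per_watt = fp8_pflops * 1000 / power_watts

print(f"FP4: ~{fp4_tflops_per_watt:.1f} TFLOPS per watt")   # ~13.3
print(f"FP8: ~{fp8_tflops_per_watt:.1f} TFLOPS per watt")   # ~6.7
```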
Microsoft also draws direct comparisons with rival cloud providers’ in‑house AI chips. The company claims Maia 200 delivers three times the FP4 performance of Amazon’s third‑generation Trainium and FP8 performance above Google’s seventh‑generation TPU. Analysts note that Maia 200 uses a more advanced 3‑nanometer manufacturing node than the 5‑nanometer or 7‑nanometer processes used in these competing chips, and say it shows Microsoft is closing earlier gaps in custom silicon.
Microsoft and external commentators emphasize that customers will still need to validate real‑world performance and pricing within Azure before shifting workloads from other vendors, including Nvidia. One analyst also points out that enterprises will want to see how much of Microsoft’s own infrastructure savings from Maia 200 are eventually reflected in cloud subscription costs.
Memory, networking and system design
Maia 200’s architecture centers on feeding data to AI models as efficiently as possible, not just pushing raw compute power. Each chip includes 216GB of HBM3e memory delivering 7 TB/s of bandwidth, along with 272MB of on‑chip SRAM and specialized data‑movement engines to keep large models highly utilized.
Microsoft redesigned the memory subsystem around low‑precision data types, a dedicated direct memory access engine, on‑die SRAM, and a custom network‑on‑chip fabric to move data quickly and increase token throughput during inference. At the system level, each accelerator exposes 2.8 TB/s of bidirectional scale‑up bandwidth and connects into a two‑tier network built on standard Ethernet rather than proprietary fabrics.
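One way to see why the memory subsystem gets this much attention is a rough, memory‑bound estimate of decoding throughput. The 200‑billion‑parameter model below is a hypothetical example, and real throughput also depends on batching, KV‑cache traffic, and kernel efficiency:

```python
# Illustrative memory-bound decoding estimate using the stated HBM figures.
# The model size is a hypothetical example; treat the result as an upper bound.
hbm_bandwidth_tb_s = 7.0     # 7 TB/s of HBM3e bandwidth (stated)
params_billion = 200         # hypothetical 200B-parameter model
bytes_per_param = 0.5        # FP4 weights: 4 bits = 0.5 bytes

weight_bytes = params_billion * 1e9 * bytes_per_param   # 100 GB of weights
bandwidth_bytes = hbm_bandwidth_tb_s * 1e12             # 7e12 bytes/s

# In low-batch decoding, every generated token rereads the weights once,
# so bandwidth divided by weight size bounds tokens per second per chip.
max_tokens_per_s = bandwidth_bytes / weight_bytes
print(f"~{max_tokens_per_s:.0f} tokens/s upper bound (batch size 1)")  # ~70
```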
Clusters can scale up to 6,144 Maia 200 accelerators, using the same communication protocols within trays, racks, and across the data center. Within each tray, four accelerators are directly linked to keep high‑bandwidth communication local, which Microsoft says helps reduce power use and total cost of ownership for dense inference deployments.
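For a sense of scale, the cluster and tray figures above translate into the following numbers, derived here for illustration only:

```python
# Derived topology figures from the stated cluster numbers; illustrative only.
max_accelerators = 6144   # maximum Maia 200 accelerators per cluster (stated)
per_tray = 4              # accelerators directly linked within a tray (stated)
scale_up_tb_s = 2.8       # bidirectional scale-up bandwidth per chip (stated)

trays = max_accelerators // per_tray
print(f"{trays} trays at full scale")                       # 1536 trays
print(f"{per_tray * scale_up_tb_s:.1f} TB/s of combined "
      f"accelerator-side scale-up bandwidth per tray")      # 11.2 TB/s
```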
Role in Azure and Microsoft AI services
Maia 200 is already deployed in Microsoft’s US Central data center region near Des Moines, Iowa, and will roll out next to the US West 3 region near Phoenix, Arizona, with more regions planned later. The chip is integrated with Azure as part of a heterogeneous AI infrastructure that also includes other types of accelerators.
Microsoft says Maia 200 will run multiple models, including OpenAI’s GPT‑5.2 family, and will support workloads for Azure AI Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team plans to use the chip for reinforcement learning and synthetic data generation, which are key for improving future in‑house AI models and speeding up the creation of domain‑specific training data.
Industry analysts argue that Microsoft’s long experience with enterprise IT gives it an advantage in embedding Maia‑based inference services directly into the broader Azure platform. Commentators also stress that Microsoft’s strategy is to complement, not outright replace, other vendors like Nvidia and AMD, while offering customers more options for high‑throughput, memory‑intensive AI inference.
Developer tools and future roadmap
To encourage early adoption, Microsoft is offering a preview of the Maia software development kit. The SDK supports popular AI frameworks, including PyTorch, and includes a Triton compiler, an optimized kernel library, access to Maia’s low‑level NPL programming language, a simulator, and a cost calculator for tuning workloads.
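Microsoft has not published Maia‑specific sample code here, but since the SDK exposes a Triton compiler and PyTorch support, development would look broadly like writing standard Triton kernels. The sketch below uses only generic open‑source Triton and PyTorch; nothing in it is Maia‑specific API:

```python
# A generic Triton kernel sketch: standard open-source Triton syntax, not
# Maia-specific API. The Maia SDK's Triton compiler would consume kernels
# of roughly this shape.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one program per 1,024-element block.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

In practice, Microsoft says developers would then use the SDK’s simulator and cost calculator to tune such kernels and workloads for Maia before deployment.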
Microsoft says its silicon program used a sophisticated pre‑silicon environment to model large language model workloads and validate networking and cooling systems, including a second‑generation closed‑loop liquid cooling unit. According to the company, this approach allowed AI models to run on Maia 200 within days of first silicon and cut the time from first chip to data center deployment by more than half compared with earlier infrastructure programs.
The company describes Maia as a multi‑generation accelerator family and says it is already designing future versions while it deploys Maia 200 across its global infrastructure. As these chips scale out, Microsoft expects continued improvements in performance per dollar and per watt for its most important AI workloads in Azure.
