Microsoft has introduced the Maia 200, a custom-designed AI inference accelerator aimed at significantly improving the performance and efficiency of large-scale AI workloads across its cloud ecosystem. Built on TSMC’s advanced 3-nanometer process, Maia 200 is optimized for demanding inference tasks behind cutting-edge models and services, including GPT-5.2 and Microsoft Copilot.
A key highlight of the chip is its massive 216 GB of HBM3e high-bandwidth memory, enabling faster data access and smoother handling of complex, data-intensive models. Its tensor cores support FP8 and FP4 data formats, and the chip delivers more than 10 petaflops of compute performance at 4-bit precision. According to Microsoft, this translates into a roughly 30 percent improvement in performance per dollar compared with its previous AI inference hardware, helping lower operating costs at scale.
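To put the memory and precision figures in perspective, the back-of-the-envelope sketch below estimates how much weight storage a model would need at FP16, FP8, and FP4, and whether it fits within Maia 200's 216 GB of HBM3e. The parameter counts are illustrative placeholders, not figures disclosed by Microsoft, and the calculation ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-footprint estimate at different precisions, relative to
# Maia 200's 216 GB of HBM3e. Model sizes below are hypothetical examples.

HBM_CAPACITY_GB = 216  # on-package HBM3e capacity cited in the announcement

BYTES_PER_PARAM = {
    "FP16": 2.0,  # 16-bit weights
    "FP8": 1.0,   # 8-bit weights
    "FP4": 0.5,   # 4-bit weights
}

def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (excludes KV cache and activations)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    for params in (70e9, 180e9, 400e9):  # hypothetical model sizes
        for precision in ("FP16", "FP8", "FP4"):
            gb = weight_footprint_gb(params, precision)
            verdict = "fits" if gb <= HBM_CAPACITY_GB else "exceeds"
            print(f"{params/1e9:.0f}B params @ {precision}: {gb:6.1f} GB "
                  f"({verdict} {HBM_CAPACITY_GB} GB HBM)")
```

Under these assumptions, dropping from FP16 to FP4 cuts the weight footprint by a factor of four, which is the main reason low-precision formats matter for serving very large models from on-package memory.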
Maia 200 has been designed specifically for hyperscale data center environments, with early deployments already underway in select Azure data centers. Microsoft plans to expand availability across its global cloud infrastructure in the coming months. The company also announced that a dedicated software development kit will be released to help developers optimize and deploy workloads on Maia-based systems.
With Maia 200, Microsoft is reinforcing its strategy of building in-house silicon to tightly integrate hardware, software, and AI services, while reducing reliance on third-party accelerators and strengthening Azure’s competitiveness in the rapidly evolving AI infrastructure market.
