SC23: Microsoft Now Has the Third-Fastest Computer in the World
A supercomputer powering Microsoft's AI services has unexpectedly been listed as the third-fastest supercomputer in the world.
The surprise ranking was part of the latest Top500 list, which ranks the world's fastest supercomputers twice a year, in June and November. The listings typically feature government-funded systems in national labs that are used for scientific research tied to national security and national interests. Participation in the list is voluntary.
The latest Top500 list was released ahead of the Supercomputing 2023 (SC23) show in Denver, Colorado.
The Azure supercomputer, called Eagle, delivered 561 petaflops of performance. The system has 1.12 million computing cores in total, drawn from Nvidia H100 GPUs and Intel Xeon Platinum 8480C CPUs.
Eagle was no match for the top-ranked Frontier, which is the only system on the Top500 list to clear the exaflop performance mark.
Frontier, which is installed at Oak Ridge National Laboratory, retained its top rank from the most recent Top500 listing in June. Frontier runs on AMD hardware: 3rd Generation Epyc CPUs and Instinct MI250X GPUs.
The second-fastest computer in the world was first-time entrant Aurora, which recently became operational at Argonne National Laboratory. The system delivered 585.34 petaflops of horsepower.
Aurora’s performance was disappointing as the system was supposed to top the list with performance exceeding 2 exaflops. Top500 organizers said that Aurora may ultimately take the top spot as benchmarking wasn’t complete.
Eagle's rise came as Microsoft scrambled to expand computing capacity following the integration of AI into its applications. The company was caught flat-footed by the AI boom and has since built its AI and high-performance computing capacity around Nvidia GPUs.
“This is the highest rank a cloud system has ever achieved on the TOP500. In fact, it was only 2 years ago that a previous Azure system was the first cloud system ever to enter the TOP10 at spot No. 10,” Top500 organizers said in a statement.
Microsoft relies heavily on Nvidia as it doesn’t have its own AI hardware. By comparison, rivals Amazon and Google have homegrown AI chips in their own cloud infrastructure.
Google may have a top-10 system with its A3 supercomputer, which can host up to 26,000 Nvidia H100 GPUs. But the company has not advertised its overall performance.
The Eagle system uses InfiniBand networking, and its GPUs can bypass the CPUs and communicate with each other directly. That direct communication also helps pool GPU memory into a larger space for running AI training and inference.
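The communication pattern behind this kind of direct GPU-to-GPU exchange can be illustrated with a plain-Python sketch of a ring all-reduce, the collective commonly used to sum gradients across GPUs during training. This is a conceptual model, not Azure's actual networking stack: each list in `buffers` stands in for one GPU's memory, and each copy or add stands in for a peer-to-peer transfer that would travel over InfiniBand without staging through a CPU.

```python
def ring_allreduce(buffers):
    """Sum equal-length buffers across simulated devices arranged in a ring.

    Each step, device i hands one chunk to device (i + 1) % n -- the
    peer-to-peer pattern GPUs use when they can talk to each other
    directly instead of routing data through host CPUs.
    """
    n = len(buffers)                       # number of simulated devices
    size = len(buffers[0])
    assert all(len(b) == size for b in buffers) and size % n == 0
    c = size // n                          # chunk length per device
    data = [list(b) for b in buffers]      # working copies

    # Phase 1: reduce-scatter. After n-1 steps, device i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        for i in range(n):
            k = (i - step) % n             # chunk device i forwards
            dst = (i + 1) % n
            for j in range(k * c, (k + 1) * c):
                data[dst][j] += data[i][j]

    # Phase 2: all-gather. Each completed chunk travels the ring so
    # every device ends up with the full summed buffer.
    for step in range(n - 1):
        for i in range(n):
            k = (i + 1 - step) % n         # completed chunk to forward
            dst = (i + 1) % n
            data[dst][k * c:(k + 1) * c] = data[i][k * c:(k + 1) * c]

    return data


# Four simulated "GPUs", each holding a local gradient buffer.
gpus = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [5, 6, 7, 8]]
result = ring_allreduce(gpus)
print(result[0])  # [116, 228, 340, 452] -- every device holds the sum
```

The bandwidth-optimal property of this pattern (each device sends only its own share of the data per step) is why fast device-to-device links matter so much for large-scale training.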
Supercomputing 2023 this year is true marvel. The people are passionate, excited, fantastic. Record 13,823 attendees, 400 remote. An anti-CES.
— Agam Shah (@agamsh) November 14, 2023
The Eagle system is reserved for customers willing to shell out hefty premiums to run applications and train AI models.
The Azure cloud has other Nvidia GPUs, including the older V100 and A100 GPUs. Queries on Bing are typically routed to even older GPUs or CPUs.
Eagle knocked Japan's Fugaku supercomputer, which benchmarked at 442.01 petaflops, down to fourth place. Rounding out the top five was EuroHPC's LUMI supercomputer, which delivered 379.70 petaflops.
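Top500 ranks systems by their measured Linpack score (Rmax), highest first. A short sketch tabulates the top five from the figures reported here; note that Frontier's 1,194-petaflop figure is an assumption drawn from the published November 2023 list, since the article only says it cleared the exaflop mark.

```python
# Top500 (November 2023) systems mentioned in this article.
# Rmax values are in petaflops; Frontier's figure is an assumption
# (about 1.194 exaflops on the published list), not stated in the text.
systems = {
    "Frontier": 1194.0,
    "Aurora": 585.34,
    "Eagle": 561.0,
    "Fugaku": 442.01,
    "LUMI": 379.70,
}

# Rank descending by Rmax, as the Top500 list does.
ranking = sorted(systems.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, pflops) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {pflops:,.2f} petaflops")
```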
Microsoft may hold on to the No. 3 spot for only a limited time, as many new supercomputers are coming online.
A two-exaflop system called El Capitan, which has AMD CPUs and GPUs, will start operating soon at the Lawrence Livermore National Laboratory.
Jupiter, an exascale supercomputer based on ARM CPUs and Nvidia GPUs, will go online next year at the Jülich Supercomputing Centre in Germany.
Overall, the US had 161 supercomputers on the Top500 list, followed by China with 104, and then Germany and Japan. China has its own exascale computers, but has not submitted them to Top500 given the charged political environment and chip export bans.
Supercomputers are now also being designed to handle AI and high-performance computing workloads. GPUs and other accelerators are now a must-have in such systems. Nvidia leads the market, with AMD and Intel way behind.
Microsoft also claimed a record GPT-3 training time on Eagle using the MLPerf benchmarking suite. The system trained a GPT-3-class generative model with 175 billion parameters in four minutes. The run used 1,344 Azure ND H100 v5 virtual machines with 10,752 Nvidia H100 GPUs.
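The fleet size for that run follows from the VM shape. A back-of-the-envelope check, assuming the published eight-GPU configuration of the ND H100 v5 virtual machine:

```python
# Each Azure ND H100 v5 VM carries 8 Nvidia H100 GPUs (published VM spec;
# treated here as an assumption, since the article does not state it).
vms = 1_344
gpus_per_vm = 8

total_gpus = vms * gpus_per_vm
print(total_gpus)  # 10752
```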
Correction: A previous version of this article incorrectly stated that Microsoft’s Azure supercomputer, Eagle, was built for the Oak Ridge National Laboratory.