Nvidia GPU Dominance at a Crossroads

Thanks to the surprise success of ChatGPT, Nvidia enjoys a lead in the market for GPUs and AI accelerators, but rivals such as AMD, Google, Meta and others are rapidly catching up with their own technologies.
Dec 12th, 2023 3:00am
Nvidia CEO Jensen Huang at AWS re:Invent last month. AWS will be the first cloud service to offer the next-gen Nvidia GH200 to its customers.

When ChatGPT was released a year ago, every chip maker except Nvidia was caught sleeping. Nvidia's GPUs powered the AI chatbot's initial surge of users.

The AI rush resulted in overwhelming demand for Nvidia's technology. The waiting list included Elon Musk, who needed the GPUs to improve the AI backend for Tesla's self-driving cars. Even large cloud providers waited as long as six months to receive their GPU orders.

But a year on, chip makers have woken up with their own AI chips, which are being marketed as alternatives to Nvidia’s latest H100 GPUs.

AI chips such as AMD's MI300X and Google's TPU v5p, both introduced this month, will not carry the same long waits. AMD's chip will be cheaper, but AI software support for the GPU is still a work in progress.

Nvidia knows the competition is coming and has taken steps to maintain its dominance. The first, and perhaps most significant, is the acceleration of the GPU product release cycle.

Nvidia's Strategy for Dominance

Nvidia will now release a new GPU every year, a step up from its previous two-year release cycle. The company's H100 GPUs are still in short supply, but last month it announced the H200 GPU, which delivers the same performance with more memory capacity. Larger memory can store more data as AI jobs get longer and queries get more complex.

Nvidia H200 (Nvidia)

Nvidia’s former enemies, bitcoin miners, are now turning into allies. Cryptocurrency hunters are shying away from mining and turning data centers into AI computing centers. The miners will provide computing capacity on Nvidia H100 GPUs at significantly cheaper prices than conventional cloud providers.

Crusoe Energy is borrowing $200 million to acquire 20,000 H100 GPUs. The GPUs will become available to customers in the first quarter of next year. Crusoe used the GPUs as collateral to secure financing.

Nonprofit AI cloud provider Voltage Park, founded by blockchain billionaire Jed McCaleb, acquired 24,000 Nvidia H100 GPUs, which were ordered in April 2023. Voltage Park aims to be like eBay, with the highest bidder receiving AI computing time on the H100.

Mining company Bit Digital has acquired a fleet of H100 GPUs that will be deployed in a data center, the company said in a filing with the Securities and Exchange Commission. The company has also signed a multimillion-dollar contract for three years with a customer committed to using the GPUs.

Nvidia Skirting Export Restrictions

The China market is important to Nvidia, and the U.S. government has put the company under the microscope.

The U.S. government has twice imposed restrictions on sales of some of Nvidia's top GPUs to China, but Nvidia creatively changed the chips' specifications to keep sales to China flowing.

One such chip, the H800, a China-market variant of the H100, has since been withdrawn. The U.S. government announced new restrictions in October that banned the sale of the H800, and server makers stopped sales shortly after. Lenovo pulled the GPU from its China server products on Oct. 31.

Nvidia has announced H20, a variant of its fastest GPU, for the China market but has delayed its release.

However, the chip maker is also facing competition from local AI chip makers. Chinese firms Huawei and Biren Technology have developed GPUs.

AMD's GPU Is Faster Than Nvidia's Fastest

AMD was sleeping when ChatGPT was released. Its GPU wasn’t ready for AI, its software stack was broken, and AI wasn’t featured on its long-term roadmap.

But the company knows how to catch up quickly.

AMD this week launched its new MI300X, which company CEO Lisa Su claimed is the world's fastest AI accelerator. The GPU has more memory capacity than, and comparable throughput to, Nvidia's H100.

The AMD MI300X Accelerator.

AMD has claimed better raw AI and computing performance. But those metrics are highly reliant on the foundational model, algorithms, software tools, compilers, and other variables. Nvidia has a stronger software stack in CUDA, and AI applications are more thoroughly tuned to its chips.

Performance aside, most AI chip makers other than Nvidia are still trying to prove the viability of their silicon. But AMD is eroding that dominance by scoring major customers, including Microsoft, Meta, and Oracle, which are putting the MI300X in their data centers.

Microsoft’s infrastructure is heavily reliant on Nvidia hardware, and GPT-4 is now loaded on the MI300X. Microsoft also announced a preview of an MI300X virtual machine, and Oracle is testing the MI300X in its cloud services.

Meta plugged the MI300X into an OCP-compliant server in what was one of the fastest server deployments in Meta's history, said Ajit Mathews, senior director of engineering at Meta, in an on-stage appearance.

AMD CEO Lisa Su said there was space for many AI chip makers. “We’re now expecting that the data center accelerator TAM will grow more than 70% annually over the next four years to over $400 billion in 2027,” Su said.
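Su's forecast implies a compound growth rate that can be checked with quick arithmetic. A minimal sketch, assuming a 2023 baseline of roughly $45 billion (the figure AMD cited alongside the forecast; treated as an assumption here):

```python
# Back-of-the-envelope check of the data center accelerator TAM claim.
# Assumption: ~$45B baseline in 2023 (figure AMD cited with the forecast).
base_2023 = 45e9     # dollars, assumed baseline
target_2027 = 400e9  # Su's 2027 figure
years = 4

# Compound annual growth rate implied by reaching $400B in 2027
implied_cagr = (target_2027 / base_2023) ** (1 / years) - 1
print(f"Implied CAGR: {implied_cagr:.1%}")  # roughly 73%

# Forward check: $45B growing at exactly 70% per year for four years
compounded = base_2023 * 1.70 ** years
print(f"$45B at 70%/yr for 4 years: ${compounded / 1e9:.0f}B")
```

The implied rate lands just above 70% a year, consistent with Su's "more than 70% annually" framing.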

Google’s TPU v5p 

Google's AI chips, called TPUs, have been around for a decade but were not very accessible and were mostly used internally. Google last week released the TPU v5p, its first AI training chip to be made widely available.

The chip's release coincided with the launch of Gemini, Google's next-generation large language model. Google also announced a new type of supercomputer, called Hypercomputer, which connects the conventional cloud-based consumption model with a supercomputing infrastructure.

The TPU v5p chips are limited to running AI workloads, while GPUs are designed to run general-purpose workloads. The TPU v5p chips are only available through Google Cloud but are easily accessible. A new feature called Dynamic Workload Scheduler provides flexible on-demand or scheduled availability.

Source: Google

The current version of Gemini was trained on the TPU v4 and TPU v5e chips. The TPU v5e is designed more for inferencing, while the TPU v5p is a beefier variant that can handle training.

The Hypercomputer server system bunches together 8,960 Cloud TPU v5p chips in a pod. The pods are interconnected via optical circuit switches (OCS), which provide a throughput of 4,800 Gbps. The Hypercomputer supports GPUs, but the optical interconnect, which is faster than copper wires, is reserved for the TPUs.
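Taken at face value, those pod figures imply an enormous aggregate interconnect capacity. A rough sketch, under the assumption that the 4,800 Gbps figure is per-chip bandwidth (an assumption, not stated explicitly above):

```python
# Rough aggregate-bandwidth estimate for a TPU v5p pod.
# Assumption: 4,800 Gbps is per-chip interconnect bandwidth.
chips_per_pod = 8960
per_chip_gbps = 4800

aggregate_gbps = chips_per_pod * per_chip_gbps
aggregate_pbps = aggregate_gbps / 1e6  # gigabits -> petabits
print(f"Aggregate pod bandwidth: {aggregate_pbps:.1f} Pbps")  # ~43 Pbps
```

Even if the per-chip assumption is off, the scale illustrates why Google reserves the faster optical interconnect for the TPUs rather than the GPUs.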

The TPU v5p is a result of hardware-software co-design, Mark Lohmeyer, Google vice president and general manager of compute and machine learning infrastructure, told The New Stack.

"Google is uniquely able to do that because of our depth in research, our own in-house models, our ecosystem of partner models, our experience scaling those, and applications that served multiple billions of consumers on top of this infrastructure," Lohmeyer said.

Many AI companies are running on Google infrastructure and will use TPU v5p chips, Lohmeyer said.

More Competition on Tap

Intel's best AI bet for now remains its CPUs, which are increasingly tuned for inference. The Xeon server chips include extensions such as AMX that speed up inferencing on models such as Llama 2.

Intel has multiple AI accelerators, but none have caught fire. The general-purpose Ponte Vecchio GPU, which powers Aurora, the world's second-fastest supercomputer, has found limited adoption. Another AI chip, Gaudi2, shows more promise: it matches the H100 on AI performance in some cases and is going into an AI supercomputer under construction for StabilityAI.

Intel canceled the successor to Ponte Vecchio and revised its roadmap to release its next-generation GPU in 2025. The AI megachip, called Falcon Shores, merges the Gaudi accelerators with Intel’s GPU accelerators.

TNS owner Insight Partners is an investor in: Bit.