Nvidia Homes in on Apple-Like Approach to AI with CUDA
“The iPhone moment of AI has started,” said Nvidia CEO Jensen Huang during a keynote at the company’s GTC conference this week.
He was referring to OpenAI’s ChatGPT, which has attracted millions of users in just over half a year. User queries are processed on Nvidia’s GPUs, which generate the responses.
Just like Apple, which owns its hardware and software stack, Nvidia is using its proprietary hardware and software tools to lock developers into its ecosystem. To that end, Nvidia at its GPU Technology Conference announced new workflows to develop applications like ChatGPT with large language models.
To be sure, Nvidia, like Apple, has the best hardware and software stack for artificial intelligence. But once developers are locked into Nvidia’s proprietary ecosystem, staying could prove expensive, and getting out could be hard.
But Nvidia dominates AI and has a leg up on bringing coders to its side. Nvidia’s GPUs perform best when coded in the chip maker’s CUDA parallel programming framework, which compiles code and dispatches work and data to GPUs.
Huang highlighted the ongoing shift toward generative AI, which still needs a human touch. Tools like ChatGPT can automate coding, but there are new considerations developers need to account for, such as tuning code to hardware accelerators to deliver the fastest results.
Conventional coding relied on the CPU, whose performance has hit a wall. With AI, the software needs to talk to specialized accelerators like GPUs, which take in queries, weigh various parameters, and spit out the best possible answers.
Open Sourcing CUDA
Nvidia is open sourcing CUDA libraries that make it easier to move workloads to the programming framework. Developers can fold those libraries into their applications and modify them as they see fit, which gives them an easier on-ramp to CUDA and GPU acceleration.
“Accelerated computing is not easy. It requires full stack invention — from chips, systems, networking acceleration libraries, to refactoring the applications,” Huang said during the keynote.
ChatGPT at times was unavailable to users because the servers reached peak capacity. The sudden interest in generative AI has created a shortage of hardware on which to run algorithms. First dibs on AI hardware go to cloud-native companies like Facebook, Google, and Microsoft, which are designing data centers to handle AI applications.
There’s a fundamental coding problem: ISO C++ has no native way to target GPUs. Programmers use frameworks like Nvidia’s CUDA, which compiles code to harness the computing power of GPUs.
CUDA provides libraries and frameworks that include highly tuned math libraries, core libraries for data structures and algorithms, and communication libraries to scale up applications. CUDA supports C++, Fortran, and Python, and works with software libraries such as TensorFlow or PyTorch.
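The data-parallel style CUDA accelerates can be sketched in plain Python (a CPU-only illustration using the standard library's thread pool; the `saxpy` helper is a hypothetical example, not a CUDA API): every output element is computed independently, which is exactly what lets CUDA fan the same computation out across thousands of GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    """Compute a*x[i] + y[i] for every i. Each element is independent
    of the others, so the work can be mapped onto parallel workers --
    threads here, GPU threads under CUDA."""
    def one_element(i):
        return a * x[i] + y[i]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(one_element, range(len(x))))

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

On a GPU, the per-element function becomes a kernel and the loop disappears entirely; the hardware schedules the elements across its cores.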
“It’s not only the languages which are supported by [CUDA] directly, there are literally dozens of other languages which are created by other companies and groups, which also compile on to and run on the GPU,” said Stephen Jones, the principal software architect at Nvidia, at a break-out session about CUDA at the trade show.
CUDA can compile code from many programming languages, but WebAssembly is not among them.
“I think I’ve heard of a couple of academic projects where people are looking at it, even. I think it’s one of many directions multinode systems are going: there’s not really a one-size-fits-all. The universality of WebAssembly is very attractive though,” said an Nvidia moderator during the CUDA session.
CUDA’s reach also extends to quantum computing, which can be simulated on Nvidia hardware. Programmers can take regular code, recompile it with CUDA, and see how it runs in surrogate quantum-computer environments simulated on GPUs.
But the GTC focus was squarely on AI and Nvidia’s GPUs. Only four years ago, the in-person GTC conference had 8,000 attendees, and this year there are about 250,000 developers tuned into the virtual conference, Huang said.
“Generative AI is a new kind of computer… everyone can direct a computer to solve problems. This was a domain only for computer programmers. Now, everyone is a programmer,” Huang said during the keynote.
That may sound like bad news for coders, but Nvidia executives at the show said that accelerated computing will help developers move to a new era of probabilistic computing that involves reasoning and predicting results.
Application development is transitioning to creating AI models, said Manuvir Das, vice president of enterprise computing, during a press conference.
Nvidia announced pre-packaged AI models called Foundations so coders and data scientists can develop their own chatbots and image and video generators.
One service called NeMo will allow the development of a ChatGPT-like experience where the AI can generate summaries, seek out market intelligence, or respond to questions. Another module called Picasso is for generating images, videos, or 3D models, and BioNeMo is for protein structures and other biotech applications.
“Customers can bring their model or start with the NeMo pre-trained language models ranging from GPT-8, GPT-43 and GPT-530 billion parameters throughout the entire process,” Das said.
Each model can be connected to proprietary datasets and can be improved over time as more data is added. There are also guardrails to prevent the AI from getting too emotional or responding with undesirable answers, which happened to Microsoft’s Bing AI and Google’s Bard.
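The guardrail idea can be illustrated with a deliberately minimal sketch (the `guardrail` function and topic list are hypothetical; production guardrails use trained classifier models rather than substring matching, but the control flow is the same: screen the model's output before it reaches the user).

```python
# Hypothetical policy list -- real systems use classifiers, not keywords.
BLOCKED_TOPICS = {"medical advice", "financial advice"}

def guardrail(response: str) -> str:
    """Pass the model's response through unless it strays into a
    blocked topic, in which case substitute a refusal."""
    lowered = response.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I'm not able to help with that topic."
    return response

print(guardrail("Our Q3 report ships next week."))  # passes through unchanged
```

The interesting engineering lives in the check itself; the wrapper pattern around the model stays this simple.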
Developers can access the models through APIs. Each modality includes tuned inference engines, frameworks for data processing, and vector databases.
No pricing was provided for access to the Foundation models, but judging by other Nvidia hardware and software announcements, it may require a sizeable investment.
The NeMo and Picasso services are accessible through Nvidia’s DGX Cloud hardware, which was also announced at the show.
The DGX Cloud provides access to Nvidia’s latest GPUs and its AI Enterprise software toolkit in the cloud, starting at $37,000 a month. That is roughly double the price of Azure’s Nvidia GPU instances, which max out at around $20,000 per month. The DGX Cloud service will be available to customers through public cloud providers, including Azure and Oracle.
Investments in AI could help lower costs, said Justin Boitano, vice president of EGX computing at Nvidia.
“Ultimately, if you can get the business outcome you need at a lower cost that usually frees up investment into new areas,” Boitano said.
Nvidia also launched other tools, including CvCUDA, for video processing that speeds multimedia delivery to smartphones and other devices. CvCUDA provides 30 operators with Python and C++ bindings that apply accelerated computing to image warping, video editing, and image processing.
“What we wanted to do was look at the bottlenecks that are adjacent to the AI in TensorFlow and Pytorch and make sure that there was very efficient video pre-processing and post-processing,” Boitano said.
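What such pre-processing looks like can be sketched in pure Python (a hypothetical `preprocess` helper; CvCUDA performs these steps with tuned GPU operators rather than Python loops): a nearest-neighbour resize plus normalisation of 8-bit pixel values into the 0–1 range a model expects.

```python
def preprocess(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid of 0-255 pixel values to
    out_h x out_w, normalised to floats in [0, 1] -- the kind of
    pre-processing step that sits in front of an AI model."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] / 255.0
         for c in range(out_w)]
        for r in range(out_h)
    ]

# Upscale a 2x2 checkerboard to 4x4.
print(preprocess([[0, 255], [255, 0]], 4, 4))
```

Done per-frame on the CPU this becomes the bottleneck Boitano describes, which is why CvCUDA moves it onto the GPU alongside the model.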
CvCUDA’s source code is available in a GitHub repository. The company will engage with developers who want to build on the library.
“The openness of it basically allows people also to then ultimately fork it if they want, they can create their own copy and enhancements to it. And then they have a license to run it in production,” Boitano said.
Another CUDA tool, called CuOpt, solves what Boitano called a “traveling salesman” problem: finding the best routes.
Developers can pull data from ESRI’s ArcGIS cloud service and build cost models around routes. Companies can choose which depots to pick up from, how many vehicles they need, and where stores sit, then optimize routes for the shortest time and lowest cost.
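CuOpt's own API wasn't shown, but the problem it targets can be illustrated with a brute-force sketch (the `best_route` helper and the cost table are hypothetical; real solvers rely on heuristics, since exhaustive search grows factorially with the number of stops).

```python
from itertools import permutations

def best_route(depot, stops, dist):
    """Try every visiting order of the stops and return the cheapest
    round trip from the depot; dist is a symmetric cost lookup."""
    best_order, best_cost = None, float("inf")
    for order in permutations(stops):
        route = (depot,) + order + (depot,)
        cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_order, best_cost = route, cost
    return best_order, best_cost

# Four locations with made-up pairwise travel costs.
dist = {
    "depot": {"a": 1, "b": 4, "c": 3},
    "a": {"depot": 1, "b": 2, "c": 5},
    "b": {"depot": 4, "a": 2, "c": 1},
    "c": {"depot": 3, "a": 5, "b": 1},
}
route, cost = best_route("depot", ["a", "b", "c"], dist)
print(route, cost)  # ('depot', 'a', 'b', 'c', 'depot') 7
```

With three stops there are only six orderings to check; at thirty stops the search space exceeds 10^32 routes, which is why GPU-accelerated heuristic solvers like CuOpt exist.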
“We will also look to making an API available for people that want to consume it just as a service and not stand up the infrastructure in their data centers,” Boitano said.
Microsoft is also betting its software and cloud future on Nvidia hardware. The software giant will deploy Nvidia’s OVX-2 servers to bring metaverse applications to Microsoft Office 365. Microsoft’s AI-powered Bing runs on Nvidia’s A100 GPUs, and the company is building a supercomputer with the latest H100 GPUs, based on an architecture called Hopper.