Cloud Services / Containers / Machine Learning

Nvidia AI Suite Runs Workloads in Containers, VMs

26 Jan 2022 5:00am

Enterprises can now leverage Nvidia’s AI Enterprise software suite to develop and run artificial intelligence workloads in Kubernetes containers or virtual machines (VMs) on VMware’s vSphere platform.

Nvidia, which for several years has made AI and machine learning central to its growth plans, last year made AI Enterprise available to run on vSphere with Tanzu — VMware’s Kubernetes platform — on a trial basis. Last week, the company said it was available with production support.

“Among the top customer-requested features in Nvidia AI Enterprise 1.1 is production support for running on VMware vSphere with Tanzu, which enables developers to run AI workloads on both containers and virtual machines within their vSphere environments,” John Fanelli, vice president of product for Nvidia, wrote in a blog post. “This new milestone in the AI-ready platform curated by Nvidia and VMware provides an integrated, complete stack of containerized software and hardware optimized for AI, all fully managed by IT.”

Nvidia last year unveiled AI Enterprise, a suite of AI software and frameworks that can run on Nvidia-certified hardware from a range of server makers either in enterprises’ on-premises data centers or as-a-service on bare-metal systems housed in Equinix data centers. The goal is to give developers a turnkey solution that organizations can leverage for their AI workloads.

Nvidia and VMware Grow Partnership

It’s also the latest expansion of the Nvidia-VMware multilevel partnership, which was announced at VMware’s virtual VMworld event in 2020. Nvidia has pushed to expand its AI capabilities through its GPUs, software and systems like the DGX-2. VMware, long a major player in data centers, is aggressively extending its reach into the cloud through its adoption of such technologies as Kubernetes.

Now with AI Enterprise 1.1, developers can take advantage of both containers and VMs via vSphere.

A broad array of established IT players like IBM, Google and Microsoft, along with myriad startups, are competing in a global AI software market that is expected to reach $62.5 billion this year, a 21.3% jump over 2021, according to Gartner analysts.

Nvidia has done a good job growing its presence in the space over the years, including in developing alliances with the likes of VMware and an expanding number of hardware makers, according to Rob Enderle, principal analyst with The Enderle Group.

“Nvidia started just positioning their GPU technology as a part of the solution but then realized that there needed to be an ecosystem to foster successful deployment … so they developed that ecosystem,” Enderle told The New Stack. “I see Nvidia as the leading driver of successful AI deployments. Most early AI efforts have failed to meet expectations due to many factors like training cost. Nvidia has been attacking those factors aggressively and currently has the most comprehensive set of tools to take AI from concept to deployment.”

Nvidia's Presence in AI

Most significant AI implementations in such markets as autonomous cars, robotics and backend systems have Nvidia technology involved, the analyst said, adding that Nvidia officials understood early that simulation would be a critical tool for training and developed that technology before others did.

With AI Enterprise 1.1, Nvidia is recognizing enterprises' growing demand to use containerized development for AI, which brings a number of benefits but also heightened complexity. It requires orchestration across multiple layers of infrastructure, from AI and analytics software frameworks and hardware to containers and VMs. As a full-stack solution, AI Enterprise is designed to reduce that complexity and make it easier for enterprises to embrace AI, Fanelli said.
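To illustrate the container side of that stack: on a Kubernetes platform such as vSphere with Tanzu, a GPU-accelerated workload is typically scheduled as a pod that requests GPU resources through the NVIDIA device plugin. The manifest below is a hypothetical sketch, not an AI Enterprise artifact; the pod name, image tag and training script are placeholders, while the `nvidia.com/gpu` resource key follows the standard device-plugin convention:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job        # hypothetical workload name
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    # Placeholder framework image from Nvidia's NGC registry;
    # AI Enterprise ships its own curated containerized frameworks.
    image: nvcr.io/nvidia/pytorch:23.10-py3
    command: ["python", "train.py"]   # hypothetical training script
    resources:
      limits:
        nvidia.com/gpu: 1       # request one GPU via the NVIDIA device plugin
```

Once applied with `kubectl apply -f`, the scheduler places the pod only on a node that exposes a free GPU, which is the orchestration step the suite aims to manage for IT.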

A Boost for Developers

It also makes life easier for developers who are creating AI applications, Enderle said. Enterprise products must conform to enterprise policies in areas such as security and auditability so that what is created doesn't detract from the company's objectives, he said. The suite also removes barriers to entry and helps ensure that the enterprise's intended result is realized.

“AI Enterprise solutions have built into them critical functions that allow them to perform in an enterprise, so developers don’t have to create these functions themselves, speeding up development and potential sales volume to enterprises and government significantly while also improving the viability of the resulting offerings across all markets,” Enderle said.

Nvidia and VMware plan to continue easing access to AI Enterprise, with vSphere with Tanzu soon to be added to the GPU maker's LaunchPad program, where organizations can test and prototype AI workloads for free and learn how to develop and manage them. The program is available at nine Equinix data center locations worldwide.

Cisco, Hitachi Vantara Join Server Lineup

Nvidia has a growing roster of hardware makers that offer Nvidia-certified systems to run AI Enterprise, including Dell EMC, Hewlett Packard Enterprise, Lenovo, SuperMicro, Atos and Gigabyte. With the AI Enterprise 1.1 announcement, Nvidia is adding Cisco Systems and Hitachi Vantara to the lineup.

Cisco is rolling out the UCS C240 M6 rack server, a two-socket, 2U system powered by Nvidia's A100 Tensor Core GPUs that can run a range of storage- and interconnect-intensive workloads, including big data analytics, databases, collaboration and high-performance computing (HPC). The first such system from Hitachi Vantara is the Hitachi Advanced Server DS220, which also includes two sockets and the A100 Tensor Core GPUs.

Jeremy Sawyer, senior director for cloud technology and software for Cisco's Global Strategic Partner Program, wrote in a blog post that enterprises are migrating to hybrid cloud environments that span multiple locations and on-premises failover infrastructures. At the same time, developers are increasingly adopting cloud native technologies as they shift to containers and use Kubernetes to manage those environments.

"In addition, many of these companies are running compute-intensive, container-based workloads — such as AI training and inferencing, data analytics, and HPC — in their hybrid cloud environments," Sawyer wrote. "To help accelerate these workloads, many of our customers turn to Nvidia GPUs and the Nvidia AI Enterprise software suite."

Given that, it made sense for Cisco to develop an Nvidia-certified server, he wrote.

Featured image via Pixabay.