VMware’s vSphere Gets Direct Access to Nvidia’s AI Frameworks and GPUs

12 Mar 2021 10:09am

Direct access to GPU giant Nvidia’s AI Enterprise suite will allow VMware vSphere customers to benefit from a number of Nvidia frameworks and its GPU hardware to scale AI applications and their development across multicloud virtual infrastructures.

With the release of VMware vSphere 7 Update 2, vSphere now supports Nvidia’s AI frameworks, CUDA applications, models and SDKs, under the terms of the licensing agreement between the two companies.

“VMware and NVIDIA have teamed up to drive AI adoption in the enterprise where large-scale deployments with enterprise resiliency depends on a virtualized infrastructure,” Lee Caswell, vice president, cloud platform business unit for VMware, told The New Stack.

Before the release, Caswell said, vSphere customers often deployed AI applications on bare-metal servers “where performance was the top priority and limited scale was required.”

“Oftentimes, this bare-metal ‘infrastructure’ was serviced from shadow-IT foxholes without attention to enterprise requirements for resiliency, quality of service, and scale,” Caswell said.

In cloud environments, vSphere customers’ AI deployments were limited to public cloud solutions for on-demand infrastructure, Caswell said.

With the collaboration between VMware and Nvidia, vSphere customers benefit from Nvidia’s AI hardware and computing designs, including its A100 GPU, based on the Ampere architecture, for computationally intensive parallel computing and other applications, in combination with VMware’s hypervisor design. “This performance on offer is virtually identical to bare metal and scales out linearly,” Caswell said.

VMware also worked with Nvidia to add VMware’s “resiliency features,” such as vMotion and Distributed Resource Scheduler (DRS), to applications running on Nvidia GPUs, Caswell said. “With this release, it is possible to manage AI workloads with the same vMotion and DRS enterprise features” used for traditional applications, Caswell said. To meet scaling requirements, the joint solution allows Nvidia GPUs to be time-sliced and shared across nodes with low-latency RDMA connections.

“Data scientists don’t want to worry about infrastructure and enterprises are intent on getting AI systems in production more quickly,” Caswell said. “By working together, we are accelerating AI deployment and improving total cost of ownership with shared resources — all without compromising performance.”

While developers benefit from the reduced complexity of deploying AI applications, data scientists can build applications that scale using Nvidia’s AI Enterprise frameworks. This eliminates AI silos, or, as Caswell described it, “shadow AI.” “It reduces the risk of pulling together and testing disparate pieces and enables [DevOps teams] to get started quickly,” Caswell said.

Caswell listed several possible usage scenarios for AI-oriented DevOps teams running AI projects on vSphere 7:

  • TensorFlow and PyTorch for machine learning.
  • Nvidia TensorRT, for GPU-optimized deep learning inference, with Nvidia Triton Inference Server to deploy trained AI models at scale.
  • RAPIDS, for data science and analytics pipelines.
