Nvidia Intros Large Language Model Customization, Services
At its annual GPU Technology Conference (GTC) developer event today, Nvidia is announcing two new cloud services based on large language model (LLM) technology. One service lets users customize pre-trained LLMs for their own specific use cases, and the other caters to biomedical research using trained protein models. Both services are built on Nvidia’s NeMo Megatron framework, which is now entering a public beta phase. Less than two months ago, Nvidia announced improvements to NeMo Megatron that it said shave up to 30% off LLM training times.
The New Stack was briefed by Paresh Kharya, Nvidia’s Senior Director of Product Management and Marketing. Kharya explained that LLMs are based on the transformer architecture, invented by Google. That architecture is based on the premise that “AI can understand which parts of a sentence or which parts of an image, or even very disparate data points, are relevant to each other.” Kharya also said transformers can even train on unlabeled data sets, which expands the volume of data on which they can be trained.
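To make the idea of parts of a sentence being "relevant to each other" concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer architecture. It is an illustration only: the three token vectors are made-up toy values, and the sketch uses the raw vectors as queries, keys and values, whereas real transformers apply learned projections first.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the relevance-weighted average of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights

# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
tokens = ["bank", "river", "money"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

In this toy setup, "bank" assigns more weight to "river" than to "money" because their vectors point in similar directions. That is the sense in which the model works out which inputs are relevant to each other.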
It turns out that even fully trained LLMs can be used for a range of use cases (including those beyond language), as long as their massive foundation training is augmented with some additional training on a customer’s own data. Using a new technique called “prompt learning,” an LLM is exposed to a small volume of example data, as few as a few hundred examples, and can then be used for the customer’s scenario. The training generates a “prompt token,” effectively a companion model that provides context, which is then combined with the foundation model to deliver higher accuracy for that customer-specific use case.
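The mechanics of prompt learning can be sketched in a few lines. The example below is a toy illustration, not Nvidia's implementation: the "foundation model" is a hypothetical frozen two-weight classifier, the data is invented, and the only thing trained is a small prompt vector that is prepended to each input. All names here (`FROZEN_W`, `train_prompt`, and so on) are assumptions made for the sketch.

```python
import math

# Stand-in for the foundation model's weights. They are never updated.
FROZEN_W = (1.0, -1.0)
FROZEN_B = 0.0

def predict(tokens, prompt=None):
    """Mean-pool the (optional prompt + input) vectors, then apply the frozen model."""
    seq = ([prompt] if prompt is not None else []) + tokens
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    z = sum(w * x for w, x in zip(FROZEN_W, pooled)) + FROZEN_B
    return 1.0 / (1.0 + math.exp(-z))

def train_prompt(examples, steps=300, lr=1.0):
    """Gradient descent on the prompt vector only, using cross-entropy loss."""
    prompt = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for tokens, label in examples:
            residual = predict(tokens, prompt) - label  # dLoss/dz
            scale = residual / (len(tokens) + 1)        # dz/dprompt via mean-pooling
            for i, w in enumerate(FROZEN_W):
                grad[i] += scale * w
        prompt = [p - lr * g / len(examples) for p, g in zip(prompt, grad)]
    return prompt

def accuracy(examples, prompt=None):
    hits = sum((predict(t, prompt) > 0.5) == bool(y) for t, y in examples)
    return hits / len(examples)

# Invented "customer data": label is 1 only when the first feature
# exceeds the second by more than 2, a rule the frozen model does not know.
examples = [([[3.0, 0.0]], 1), ([[1.0, 0.0]], 0), ([[4.0, 1.0]], 1),
            ([[2.0, 1.0]], 0), ([[0.5, 0.0]], 0), ([[5.0, 2.0]], 1)]

prompt = train_prompt(examples)
print("accuracy without prompt:", accuracy(examples))
print("accuracy with learned prompt:", accuracy(examples, prompt))
```

The design point is that the large model stays untouched. Only the tiny prompt vector is learned, which is why this kind of customization can finish far faster than training the foundation model itself.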
Nvidia’s new NeMo LLM Service will allow exactly that. Users submit their data to the model, then use the prompt token-customized LLM for their own applications. Nvidia says the prompt training times range from minutes to hours, a trivial duration compared to the weeks-to-months training times required for the LLMs themselves. Beyond prompt learning, the cloud service will also allow its LLMs to be used for inference directly.
The other service, called BioNeMo and geared to “digital biology,” aims to accelerate drug discovery for pharma and biotech companies. It supports protein, DNA and biochemical data, and provides ready access to four open source protein models: ESM-1 (created by Facebook parent company Meta, and retrained by Nvidia), OpenFold, MegaMolBART and ProtT5 (developed in a collaboration led by the Technical University of Munich’s RostLab and including Nvidia).
Early Access and Developer Playground
Users of these cloud services and APIs gain access to massive LLMs, including Megatron 530B (so named because it has 530 billion parameters), without having to host the model themselves or own any GPU hardware, be it on-premises or in the cloud. Instead, it’s all managed by Nvidia. Developers need only make the right API calls.
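A rough back-of-the-envelope calculation shows why hosted access matters for a model of this size. The sketch below estimates only the memory needed to hold the weights, assuming 2 bytes per parameter (16-bit precision) and a hypothetical 80 GB of memory per GPU. Both figures are assumptions for illustration, and real deployments need additional memory beyond the weights.

```python
import math

def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

def min_gpus_for_weights(num_parameters, gpu_memory_gb=80, bytes_per_parameter=2):
    """Smallest GPU count whose combined memory can hold the weights alone."""
    total_gb = weight_memory_gb(num_parameters, bytes_per_parameter)
    return math.ceil(total_gb / gpu_memory_gb)

megatron_530b = 530e9  # parameter count implied by the model's name

print(weight_memory_gb(megatron_530b))      # 1060.0
print(min_gpus_for_weights(megatron_530b))  # 14
```

Under these assumptions the weights alone come to roughly a terabyte, more than a dozen high-end GPUs' worth of memory before any actual work is done.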
The two services will go into early access next month. And during the early access period, their use will be free. Developers who are interested can apply to Nvidia to be part of the early access program (although the company provided no link for, or details around, the application process). Nvidia will even provide a “playground” for no-code experimentation and interaction with the models.
Chief LLM Enthusiast
Nvidia Founder and CEO Jensen Huang is pretty jazzed about these new services. “Large language models hold the potential to transform every industry,” Huang said. “The ability to tune foundation models puts the power of LLMs within reach of millions of developers who can now create language services and power scientific discoveries without needing to build a massive model from scratch.”
Enabling the massive models as a service (MMaaS?) is a smart and logical move for the leader in GPUs, on which those models can best be trained. In fact, Nvidia is also announcing at GTC that its new H100 GPUs, which have transformer engines built in and will significantly accelerate LLM training, are now in full production.
All of this, especially early access to the NeMo cloud services, provides a cool opportunity for developers to work with the LLMs, with very little barrier to entry.