Nvidia Launches AI Guardrails: LLM Turtles All the Way Down
Nvidia has announced a new safety toolkit for AI chatbots called NeMo Guardrails, which acts as a kind of censor for applications built on large language models (LLMs). The software has been released as an open source project.
Jonathan Cohen, VP of Applied Research at Nvidia, spoke about the new software yesterday in a briefing with journalists.
“A guardrail is a guide that helps keep the conversation between a human and an AI on track,” said Cohen.
According to the company, NeMo Guardrails enables developers to set up three kinds of boundaries:
- Topical guardrails to “prevent apps from veering off into undesired areas.” The example Cohen used in the briefing was an employee asking an HR chatbot which employees had adopted children. The guardrails prevented the chatbot from attempting to answer this.
- Safety guardrails are a broad category that includes fact-checking (to prevent hallucinations), filtering out unwanted language, and blocking hateful content.
- Security guardrails “restrict apps to making connections only to external third-party applications known to be safe.”
Developers can also create their own custom rules “with a few lines of code.”
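As a conceptual illustration of the three boundary types (this is a toy sketch, not Nvidia's actual API; all names and lists here are made up), each can be thought of as a simple check applied before or after the LLM call:

```python
# Toy sketch of the three guardrail categories as plain checks.
# These hardcoded sets are illustrative stand-ins; the real toolkit
# uses LLM-based matching, not keyword lookups.

BANNED_TOPICS = {"adopt", "salary"}            # topical: off-limits subjects
BLOCKLIST = {"idiot"}                          # safety: unwanted language
ALLOWED_HOSTS = {"api.weather.example.com"}    # security: vetted third parties

def topical_rail(prompt: str) -> bool:
    """Return True if the prompt stays within permitted topics."""
    return not any(topic in prompt.lower() for topic in BANNED_TOPICS)

def safety_rail(reply: str) -> bool:
    """Return True if the bot's reply contains no blocked language."""
    return not any(word in reply.lower() for word in BLOCKLIST)

def security_rail(host: str) -> bool:
    """Return True only for external services on the allow-list."""
    return host in ALLOWED_HOSTS

print(topical_rail("Which employees have adopted children?"))  # False: blocked
print(security_rail("api.weather.example.com"))                # True: allowed
```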
NeMo Guardrails can run on a variety of tools that use LLMs. The primary one mentioned in the briefing was LangChain, an open source toolkit that developers use to connect LLMs to external data sources and third-party applications.
It can also work with LLM-enabled applications such as Zapier.
While NeMo Guardrails can be used on its own via GitHub, Nvidia is also integrating it into a couple of its own product offerings. It’s available in the NeMo framework, “which includes everything users need to train and tune language models using a company’s proprietary data.” In addition, Nvidia has made it available as a separate paid-for service.
Fact-Checking Using Other LLMs
Interestingly, the fact-checking mentioned as part of the safety guardrails is done not by a human… but by another LLM. Cohen explained that this is because organizations can customize and train an LLM to act as a fact-checker on their specific data.
“There [are] very general purpose language models,” he said, “but there’s also a lot of value in training a language model with a lot of data on a very specific task, and we have a lot of evidence — and the community has a lot of evidence — that when you fine-tune these models with lots of examples, they actually can perform much better.”
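The pattern Cohen describes can be sketched as a second model reviewing the first model's answer before it reaches the user. In the toy below, the checker LLM is stubbed out with a simple containment test; in the real toolkit this role is played by an actual (possibly fine-tuned) language model, and the function names here are hypothetical:

```python
# Toy sketch of a fact-checking rail: a "checker" reviews the chatbot's
# answer against retrieved evidence. The checker here is a stand-in for
# a fine-tuned fact-checking LLM, not a real API.

def checker_model(answer: str, evidence: str) -> bool:
    """Stub: treat the answer as supported if it appears in the evidence."""
    return answer.lower() in evidence.lower()

def fact_checked_reply(answer: str, evidence: str) -> str:
    """Pass the answer through only if the checker supports it."""
    if checker_model(answer, evidence):
        return answer
    return "I'm not able to verify that, so I won't state it as fact."

evidence = "NeMo Guardrails was released by Nvidia as an open source project."
print(fact_checked_reply("Nvidia", evidence))  # supported claim passes through
```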
In a technical blog post, Nvidia stated that NeMo Guardrails is built on Colang, a modeling language, and its associated runtime for conversational AI. Cohen described it as “a domain-specific language for describing conversational flows.”
According to Nvidia, interacting with Colang is “like a traditional dialog manager.” You create guardrails using key concepts that include:
- Canonical form (“a simplified paraphrase of an utterance”)
- Flows (“a tree or a graph of interactions between the user and the bot”)
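Cohen’s HR example from earlier could be expressed with these concepts roughly as follows. This is a hedged sketch in Colang syntax; the canonical-form, message, and example utterances are illustrative, not taken from Nvidia’s documentation:

```colang
define user ask about employee family status
  "which employees have adopted children"
  "does anyone on the team have kids"

define bot refuse sensitive hr question
  "Sorry, I can't share personal information about employees."

define flow
  user ask about employee family status
  bot refuse sensitive hr question
```

The first block is a canonical form for the user’s utterance, the second a canonical bot response, and the flow wires them into a guardrail.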
“Colang has this really nice property, that Colang source code reads very much like natural language,” said Cohen, “and so it’s a very easy-to-use tool. It’s very powerful and it lets you essentially script a language model in something that looks almost like English.”
As for how the guardrails are implemented, Cohen explained that it’s a Python module that runs Colang scripts. The runtime “monitors the human speaking and the bot speaking, and tracks the state of the dialogue.”
The key, according to Cohen, is that the runtime is “able to determine whether a guardrail applies or not.” Once again, however, LLMs are used to make this determination.
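That loop — match the utterance to a canonical form, track state, decide whether a rail applies — can be sketched in a few lines. This is a conceptual toy, not the actual runtime: where NeMo Guardrails would ask an LLM whether an utterance matches a canonical form, a keyword-overlap score stands in here, and all names are invented:

```python
# Toy sketch of a guardrails runtime: match a user utterance to a
# canonical form, then check whether a guardrail flow applies.
# Keyword overlap stands in for the LLM-based matching the article
# describes; the forms and flows below are illustrative.
from typing import Optional

CANONICAL_FORMS = {
    "ask sensitive hr question": {"adopted", "salary", "medical"},
    "ask benefits question": {"vacation", "insurance", "401k"},
}

GUARDRAIL_FLOWS = {
    "ask sensitive hr question": "I can't share personal employee information.",
}

def match_canonical_form(utterance: str) -> Optional[str]:
    """Return the canonical form whose keywords best overlap the utterance."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for form, keywords in CANONICAL_FORMS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = form, score
    return best

def apply_guardrails(utterance: str) -> Optional[str]:
    """Return a canned guardrail response if a flow applies, else None."""
    form = match_canonical_form(utterance)
    return GUARDRAIL_FLOWS.get(form)

print(apply_guardrails("Which employees have adopted children?"))
```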
One thing that’s noticeable about Nvidia’s new guardrails software is that it’s “LLMs all the way down,” to adapt the old “turtles all the way down” adage.
Cohen defended this by saying, “Why wouldn’t we use large language models? [It’s] such a powerful technology for in-context understanding and generalizing, and this kind of fuzzy inferencing.”
Of course, the heavy reliance on LLMs does make one wonder about the reliability of the system, from a fact-checking and security perspective. But this is surely why it’s being released as open source software — to let the community deal with that can of worms.