AI21 Labs Bets on Accuracy, Develops Approach for Factual AI
ChatGPT is impressive, but it’s missing a vital component. That’s according to Ehud Karpas, a squad director at AI21 Labs, which develops generative AI for text.
“ChatGPT is amazing. It’s impressive. It does things that are really mind-blowing,” Karpas told The New Stack. “I think I should say this: A good text needs to be fluent, and it needs to be engaging. But I don’t think that’s the whole story. I think it also needs to be reliable.”
AI21 Labs took a different approach with the launch last week of Wordtune editor’s 12 Spices — essentially, “spices” are AI-driven filters that help writers with everything from building an argument to making a relevant joke.
“We put a very strong emphasis in Wordtune Spices on being reliable, being factually accurate,” Karpas, who led the project, said. “That was one of the two key pillars in the project: capturing the writer’s intent and being factually accurate.”
To that end, three of the 12 Spices are facts: statistical facts, historical facts and nature facts. But the AI does two things to help writers ensure their work is correct: First, it sources by providing a link back to the original material; second, it conducts a search to ensure the information is up to date, Karpas explained.
“If the model generates a statistical fact — 42% of people do this and that — then it sounds very convincing. It might be complete nonsense,” Karpas said. “So we have a link, you can click on the link, and you can see where the information came from.”
The end result is a chatbot that’s more of a co-writer, helping to prompt the writer with facts, inspirational quotes, and even jokes — depending on which Spice the writer selects — to move the piece forward.
No ‘One Language Model to Rule’
One key to that is not relying on just one language model — which is a fine approach for a cool demo, Karpas quipped — but using a combination of language models, he said, like the company’s Jurassic-X.
“We’re under the idea that there is no one model to rule them all, let’s call it — if you want to build good technology, you have to combine pieces,” he said. “We think that even the best model will have weaknesses, just because it’s one model that’s good at some stuff, but it has flaws. We’re thinking that a wider set of tools brings us more robust capability.”
Last week, AI21 Labs released a research paper on its approach to language models. It noted that text-generative AI often includes factual inaccuracies or errors, which is exacerbated when dealing with uncommon domains or up-to-date information. One way to address that, the paper proposes, is through Retrieval Augmented Language Modeling (RALM), grounding the language model “during generation by conditioning on relevant documents retrieved from an external knowledge source.”
RALM systems include two high-level components, the paper explained: Document retrieval, or selecting the set of documents on which to condition; and document reading, or determining how to incorporate the selected documents into the Language Model (LM) generation process.
“In this paper, we show that substantial gains can also be made by adapting the document selection mechanism to the task of language modeling, making it possible to achieve many of the benefits of RALM while working with off-the-shelf LMs, even via API access,” the draft stated.
The paper proposes a RALM framework AI21 has dubbed “in-context RALM.” Using this approach, the company saw LM performance gains equivalent to increasing the model size two to three times across all examined text corpora, even when simple off-the-shelf retrievers were used. The paper also identified other methods the company used to improve RALM performance.
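The core idea of in-context RALM — retrieve a relevant document, prepend it to the prompt, and let an unmodified, off-the-shelf LM condition on it — can be sketched in a few lines. The retriever and LM below are toy stand-ins (word-overlap scoring and a stub function), not AI21’s implementation:

```python
# Minimal sketch of "in-context RALM": ground generation by putting a
# retrieved document into the prompt, leaving the LM itself untouched.

def retrieve(query: str, corpus: list[str]) -> str:
    """Toy retriever: pick the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q & set(doc.lower().split())))

def in_context_ralm(prompt: str, corpus: list[str], lm) -> str:
    """Prepend the retrieved document, then call the LM unchanged.

    Because the LM is only ever given a string, this works with
    off-the-shelf models, even ones reachable only via an API.
    """
    doc = retrieve(prompt, corpus)
    grounded_prompt = f"Context: {doc}\n\nQuestion: {prompt}\nAnswer:"
    return lm(grounded_prompt)

# Example usage with a stub LM that just echoes the context it was given.
corpus = [
    "The Jurassic period began roughly 201 million years ago.",
    "Honey never spoils because of its low moisture content.",
]
stub_lm = lambda p: p.split("Context: ")[1].split("\n")[0]
answer = in_context_ralm("Why does honey never spoil?", corpus, stub_lm)
# → "Honey never spoils because of its low moisture content."
```

The point of the sketch is the division of labor the paper describes: document selection happens entirely outside the model, so gains come without retraining it.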
While the specific AI driving Spices isn’t accessible to developers, AI21 Labs does offer several developer APIs for its AI21 platform, which is used to build AI applications that comprehend and generate natural language, powered by the company’s language models (LMs).
Developers can generate text completions for an input prompt with the Jurassic-1 language models using calls to the Complete API, which can be embedded in an application or service, or via AI21’s interactive web environment, which lets developers experiment with the models. The company also offers specialized models for paraphrasing via its Rewrite API and summarizing via its Summarize API.
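As an illustration only, a Complete API call over plain HTTP might be assembled as below. The endpoint path, model name, and payload fields (`prompt`, `maxTokens`, `numResults`) follow AI21’s published documentation at the time of writing, but treat them as assumptions and verify against the current API reference:

```python
import json
import os
import urllib.request

# Assumed endpoint for the Jurassic-1 "large" model's Complete API;
# check AI21's docs before relying on this path or these field names.
API_URL = "https://api.ai21.com/studio/v1/j1-large/complete"

def build_request(prompt: str, api_key: str, max_tokens: int = 64) -> urllib.request.Request:
    """Assemble (but do not send) a completion request."""
    payload = {"prompt": prompt, "maxTokens": max_tokens, "numResults": 1}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "Write a tagline for a bakery.",
    os.environ.get("AI21_API_KEY", "demo-key"),
)
# Sending is left to the caller: urllib.request.urlopen(req) returns a JSON
# body whose completions can then be parsed.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without an API key.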
Karpas offered a few takeaways for developers from his work with Spices.
“When you build something this wide with such a large scope, you have to be flexible in how you solve each problem — in our case, each of the Spices. So yeah — I would like to have one solution for all 12 Spices; life isn’t all that kind.”
But rather than code 12 different solutions, the team categorized the 12 solutions into four families and used a similar solution to solve them, he said, effectively reducing a daunting problem.
“My approach is: always take the problem, break it down, and then see what you can put together,” he said. “We started with quality first, latency and price later. Get it to work, then get it to work efficiently.”