AI21 Labs Releases Jurassic-2, its New Large Language Model
AI21 Labs, an Israeli generative AI company, today announced its latest large language model (LLM), Jurassic-2. Until now, AI21 Labs' base model has been Jurassic-1, the largest version of which has 178 billion parameters. That made it one of the largest LLMs on the market, slightly bigger than OpenAI's 175-billion-parameter GPT-3 davinci model.
However, when I spoke this week to AI21 Labs co-founder and co-CEO, Ori Goshen, he was reluctant to tell me how large Jurassic-2 is.
LLM size “plays a factor, but it’s not the only factor,” said Goshen. “So we’ve stopped referring to the size, because it can be misleading about the actual performance of the model.”
In its press release, AI21 Labs claims that “Jurassic-2 offers a more advanced baseline model, making it one of the most advanced large language models available on the market.” But this doesn’t tell us much, because every generative AI company claims it has the best models — certainly Cohere told me that when I interviewed them. So I asked Goshen how AI21’s models differ from those of OpenAI, Cohere and Anthropic.
“I think we’re heading toward different directions,” he replied. AI21 is focusing on “reading and writing related use cases,” he said, so it isn’t concerned with using generative AI for coding, or indeed for conversational use cases.
Also, he noted that LLMs — amazing though they currently are — have so far been prone to mistakes and a kind of brittleness when used by the public. To try to solve this, AI21 has added software modules to “augment and complement it [the LLM] with an additional approach that compensates for the lack of reliability and explainability.”
I asked what he meant by this — does AI21 have special algorithms it uses to further process LLM output?
“So, there’s a lot of algorithmic work and there’s a lot that will be published in the coming months about the complementary approach,” he replied. He pointed me to a paper released in May of last year, entitled “MRKL Systems.” The acronym MRKL stands for “Modular Reasoning, Knowledge and Language”; it describes a system that combines one or more LLMs with “external knowledge sources as well as symbolic reasoning experts that can handle tasks that lie beyond the reach of neural models.”
As Goshen explains it, MRKL (pronounced “miracle”) “proposes an architecture called the ‘neuro-symbolic architecture,’ where the language model offloads some of the tasks to modules that are specialized to conduct a specific task.” He gave an example of arithmetic, which can be more efficiently calculated by a special software module — rather than having an LLM essentially “predict” the answer based on statistical probabilities.
Goshen thinks adding these modules is how you “compensate for the brittleness of the pure language models.”
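To make the routing idea concrete, here is a minimal toy sketch of the MRKL pattern as Goshen describes it: a dispatcher that offloads arithmetic to an exact symbolic module rather than letting the language model "predict" the answer statistically. The function names and the regex-based router are my own illustration, not AI21's actual implementation.

```python
import re

ARITH = r"[\d\s+\-*/().]*\d[\d\s+\-*/().]*"  # a run of digits and operators

def arithmetic_expert(expression: str) -> str:
    """Symbolic module: compute the answer exactly, not probabilistically."""
    # Only allow digits and arithmetic operators before eval, for safety.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("not a pure arithmetic expression")
    return str(eval(expression))

def llm_expert(prompt: str) -> str:
    """Stand-in for a call out to a general-purpose LLM."""
    return f"[LLM free-text answer to: {prompt!r}]"

def mrkl_route(prompt: str) -> str:
    """Dispatcher: send arithmetic to the symbolic expert, else the LLM."""
    match = re.search(ARITH, prompt)
    if match and re.search(r"\d\s*[+\-*/]\s*\d", match.group()):
        return arithmetic_expert(match.group().strip())
    return llm_expert(prompt)

print(mrkl_route("What is 123 * 456?"))  # → 56088, exact, from the calculator module
print(mrkl_route("Summarize the history of the Roman Empire"))  # → routed to the LLM
```

A real MRKL system would of course use the LLM itself (or a trained router) to decide which expert handles each sub-task, but the division of labor is the same: specialized modules for tasks where statistical prediction is brittle.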
Jurassic-2, which is being released today, is AI21’s latest base model, but Goshen said the company will also “build more neuro-symbolic capabilities on top of it.”
I should also note that, like Cohere, AI21 doesn’t own the hardware it uses to run its LLMs. “We have a very close relationship with both AWS and Google Cloud,” said Goshen, “but we also built our platform in an agnostic way.”
How Developers Can Use Jurassic-2
As with Jurassic-1, Jurassic-2 will be available to developers via AI21 Studio, which the company calls “an NLP-as-a-Service developer platform.” There are three sizes, described using Starbucks-like proportions — Large, Grande, and Jumbo — and each of these has “a separate instruction-tuned version.”
In addition to working directly with any of the three LLM models, AI21 is also debuting a series of five “task-specific” APIs: Paraphrase, Summarize, Text recommendations, Grammatical error correction, and Text segmentation.
According to Goshen, the APIs were designed for “common tasks that people would like to perform, [but] it doesn’t make sense to interact directly with the language models.” He says there are “plenty of other high-level APIs” to come, in addition to the five released today.
Each API connects to a specialized version of the base LLM. For example, the Summarize model was “trained on massive amounts of summaries” and uses algorithms that are optimized for summarization, said Goshen.
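For developers wondering what calling one of these task-specific APIs might look like, here is a hedged sketch of assembling a request to the Summarize endpoint in AI21 Studio. The endpoint path, field names, and auth header shown are my assumptions for illustration; check AI21's official documentation for the actual contract.

```python
import json

API_BASE = "https://api.ai21.com/studio/v1"  # assumed base URL

def build_summarize_request(text: str, api_key: str) -> dict:
    """Assemble an HTTP request for the (assumed) Summarize task API."""
    return {
        "url": f"{API_BASE}/summarize",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # Field names here ("source", "sourceType") are assumptions.
        "body": json.dumps({"source": text, "sourceType": "TEXT"}),
    }

req = build_summarize_request(
    "Jurassic-2 is AI21 Labs' new large language model...", "YOUR_API_KEY"
)
print(req["url"])  # → https://api.ai21.com/studio/v1/summarize
# To actually send it: requests.post(req["url"], headers=req["headers"], data=req["body"])
```

The appeal of a task API over the raw model is visible even in this sketch: the developer supplies only the text to summarize, with no prompt engineering required.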
These task models are much smaller and more accurate than LLMs, he added. “They’re less prone to hallucinations and reasoning violations that you have when you use a general purpose larger model.”
Since AI21 is focused on generative AI for reading and writing use cases, I wondered whether it can do the types of business functions that Cohere and OpenAI claim to do — for instance, sentiment analysis of customer data for a retailer?
Goshen replied that it absolutely does sentiment analysis, and indeed anything that involves “dealing with information or consuming information at scale.”
“A big use case for us is helping our retail customers to create high quality product descriptions,” he said. “Some retailers have millions of new products introduced every year. And it takes a lot of effort to write these high quality product descriptions. This is something that can be extremely automated.”
This AI-based automation isn’t just about creating mass-produced content, he added. “You can create multiple versions — product descriptions that are optimized for different types of audiences. So you can really customize the content.”
He also noted that although AI21 itself isn’t focused on conversational AI, its LLMs can be used by other companies to create those types of products — chatbots are commonly used these days by retailers, for example.
AI Won’t Replace Writers
This month, Goshen wrote a fascinating blog post about why AI and machine learning won’t replace human writers. “AI is not at a point where it can compete with human intelligence, since it is based on statistics and probabilistic models,” he wrote.
As a professional writer myself, I heartily agree! But I had to ask: does he think writers will need to learn new skills in order to use AI systems? For example, “prompt engineering,” the practice of crafting inputs to get the best results from a product like ChatGPT.
“There is going to be a need for it,” he replied, regarding prompt engineering, “but there are going to be a lot of tools that will abstract it away. So you as a writer won’t have to really care about the prompt.”
We’ve seen a lot of industries adapting to generative AI this year, and I like the augmentation approach that AI21 Labs is promoting — humans and AI working together. It remains to be seen whether its “neuro-symbolic architecture” gets widely adopted in the market, but it’s at least a bit different from the chat-oriented systems we’ve seen from the likes of OpenAI’s ChatGPT and Google’s Bard.