
MosaicML Launches 30B Model — Takes on LLaMA, Falcon and GPT

MosaicML has launched MPT-30B, which founder Naveen Rao claims out-performs both LLaMA and Falcon in certain use cases for enterprise devs.
Jun 22nd, 2023 7:00am

MosaicML is launching its second open source large language model (LLM), called MPT-30B, which follows on from the smaller MPT-7B model it debuted in May.

To discuss the new model and what it means for developers, I spoke to MosaicML co-founder and CEO, Naveen Rao. His previous startup was Nervana, a deep learning company that was acquired by Intel in 2016 — so he’s no johnny-come-lately in the AI industry.

As the name suggests, MPT-30B is a 30-billion parameter model. The company claims that it surpasses OpenAI’s GPT-3 in quality, despite having about 1/6th the number of parameters (GPT-3 has 175 billion). “This means MPT-30B is easier to run on local hardware and much cheaper to deploy for inference,” the company says.

MosaicML vs. LLaMA and Falcon

MPT-30B was trained on longer sequences (up to 8,000 tokens) than other models, including GPT-3, LLaMA and Falcon (2,000 tokens each). According to MosaicML, “It is designed to handle even longer sequences in practice, making it a perfect fit for data-heavy enterprise applications.”

In practice, this means that users can enter longer prompts. Indeed, MosaicML’s previous 7B parameter model comes with a fine-tuned option, called MPT-7B-StoryWriter-65k+, that has a massive context length of 65,000 tokens.

“Longer context [lengths] means more flexible usages,” said Rao. “We’re going to have fine-tuned versions that are especially good for writing prose — for writing longer outputs.”

Image: The MosaicML platform, via the company blog.

Another difference Rao wanted to highlight was MPT-30B’s attention mechanism. When Google published its now famous 2017 paper about transformer technology, “Attention Is All You Need,” it noted that “multi-headed self-attention” was the mechanism that provided its breakthrough for AI (an insight that OpenAI then borrowed to build GPT).

“Attention is the intrinsic part to transformer models,” explained Rao. “That’s actually what allows them to see connections across a sentence, or a paragraph, or a whole corpus of text.”
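To make Rao’s description concrete, here is a minimal Python sketch of single-head scaled dot-product attention, the basic operation described in “Attention Is All You Need.” It is an illustration only: the function names are hypothetical, it uses plain lists rather than GPU tensors, and it is neither MosaicML’s code nor FlashAttention (which computes the same result with a more memory-efficient algorithm).

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Each query is compared with every key; the resulting weights decide
    how much of each value vector flows into that query's output.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Because every token’s query is scored against every other token’s key, each output can draw on any position in the input, which is the “connections across a sentence” that Rao refers to.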

Rao told me that MosaicML utilizes a technique called “FlashAttention,” which was the subject of a 2022 academic paper.

“It enables you to have faster inference and training — both Falcon and LLaMA do not have this,” he said. “So ours are actually more efficient from a computing perspective.”

Rao added that the new model is more appropriate for enterprise use, because it is “right-sized” to “fit into the constraints of real hardware.” He noted that deep-learning GPUs typically have 40 to 80 gigabytes of memory. According to Rao, the open source Falcon LLM struggles with this constraint.

“Oddly enough, the Falcon model that they released is a 40 billion parameter model. This doesn’t fit very easily into an 80-gig GPU, because it’s butting right up against the edge.”

He added that MosaicML’s own 30 billion parameter model is smaller in order to better optimize for GPUs. “It doesn’t really hurt us on performance and it will allow you to very easily fit into an 80-gig GPU,” he said.
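A rough back-of-the-envelope check illustrates the point. The sketch below assumes weights are stored at 16-bit precision (2 bytes per parameter) and counts only the weights, ignoring activations and other runtime memory; neither assumption comes from Rao, and the function names are illustrative.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption,
    not a figure stated in the interview.
    """
    return n_params * bytes_per_param / 1e9


def headroom_gb(n_params, gpu_memory_gb, bytes_per_param=2):
    """GPU memory left over for everything else once the weights are loaded."""
    return gpu_memory_gb - weight_memory_gb(n_params, bytes_per_param)


if __name__ == "__main__":
    for name, n_params in [("30B model", 30e9), ("40B model", 40e9)]:
        print(name, weight_memory_gb(n_params), "GB of weights,",
              headroom_gb(n_params, 80), "GB headroom on an 80 GB GPU")
```

Under these assumptions, a 30 billion parameter model needs about 60 GB for its weights, leaving roughly 20 GB free on an 80 GB GPU, while a 40 billion parameter model needs about 80 GB and leaves none.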

Rao claims that MosaicML’s new 30B parameter model also compares favorably to both LLaMA and Falcon in performance.

“We’re actually training on less compute, because of our efficiency methods, than LLaMA and Falcon. So it’s actually much cheaper to train. But we’re basically on parity. It depends on the evaluation metric — like, for coding, this model actually does considerably better than those two. On other things, it’s a little bit worse.”

Of course, the people behind LLaMA and Falcon might contest that. But it’s difficult to independently verify MosaicML’s claims, because none of the three open source LLM projects Rao talks about (MosaicML, LLaMA or Falcon) has yet been tested using Stanford’s HELM benchmark.

MosaicML vs. OpenAI

So how does MosaicML’s model compare to OpenAI’s GPT-4? Rao acknowledged that GPT-4 is superior in terms of its capabilities, across most aspects. However, he reiterated that MosaicML’s model offers a longer context length, which allows for unique use cases — such as generating an epilogue to F. Scott Fitzgerald’s famous novel, ‘The Great Gatsby.’ (Aside: as a former English Literature major, this is the last thing I want from LLMs!).

The main challenge with large models like GPT-4, said Rao, is the high cost of running them, which makes them impractical for most enterprises. MosaicML also focuses on helping companies with specific data, including sensitive data, fine-tune models for their specific industries.

In terms of use cases, Rao explained that industries like healthcare and banking can benefit from MosaicML’s ability to interpret and summarize large amounts of data. In the medical field, for instance, the model can interpret lab results and provide insights into a patient’s history by analyzing various inputs.

Rao emphasized the importance of open source models in these scenarios, as the nature of health (or indeed financial) data requires secure handling behind a firewall, rather than sending it over an API to the likes of OpenAI.

How Developers Can Use MosaicML

I asked how developers can start using MosaicML’s platform. Rao replied that MosaicML offers various options, depending on the developer’s needs and expertise. For simple integration, it provides an API similar to those of other companies (like OpenAI), which allows developers to easily incorporate MosaicML’s models into their frontend applications. He claims that MosaicML’s models are more cost-effective than similar-sized models from other providers.

Developers also have the option of customizing a MosaicML model by fine-tuning it with their own data. They can download the model, make modifications, and create their own API with the customized version.

For more advanced developers with ample data, Rao said that MosaicML’s tools can be used to pre-train custom models from scratch, and serve them using MosaicML’s platform.

I then asked about the compatibility of MosaicML with popular third-party tools, like LangChain.

“All the tools that you get with LangChain work with our APIs,” he replied. “And what’s really cool about it is, you can use those tools on top of a custom model that you build with us. So we basically give the developer incredible power in terms of customization, even owning the whole model. All your data that went into that model, the weights, everything, are owned by you, so full customization is possible. That’s what we enable. With these API providers [like OpenAI], you get what you get; there is zero customization.”

Team Open Source

Despite talking a little smack about LLaMA and Falcon during our interview, ultimately Rao thinks they’re all on the same team — and that it’s proprietary platforms like OpenAI that are the true competition.

“This puts the power back in the hands of the enterprise developer,” he said, about open source LLMs. “Having all of that in one centralized place, where you get what you get, is a big negative outcome.”

He also insisted that the open source LLMs are “closing the gap to these closed source models.” Maybe not completely yet, he acknowledged, but he thinks open LLMs have “crossed the threshold where these models are actually extremely useful.”
