Top 5 Large Language Models and How to Use Them Effectively
Modern Large Language Models (LLMs) are pre-trained on a large corpus of text using self-supervised learning and are then tuned to human preferences via techniques such as reinforcement learning from human feedback (RLHF).
LLMs have seen rapid advances over the last decade or so, and particularly since OpenAI’s development of GPT (generative pre-trained transformer) in 2018. Google’s BERT, also introduced in 2018, represented a significant advance in capability and architecture and was followed by OpenAI’s release of GPT-3 in 2020, and GPT-4 this year.
At the same time, while open sourcing AI models is controversial given the potential for abuse in everything from generating spam and disinformation to misuse in synthetic biology, we have also seen a number of open source alternatives in the last few months, such as the recently introduced Llama 2 from Meta.
Use Cases for LLMs
Given how new this all is, we’re still getting to grips with what may or may not be possible with the technology. But the capabilities of LLMs are undoubtedly remarkable, with a wide range of potential applications in business. These include chatbots in customer support settings, code generation for developers and potentially business users as well, audio transcription, summarizing and paraphrasing, translation, and content generation.
You can imagine, for example, customer meetings could be both transcribed and summarized by a suitably trained LLM in near real-time, with the results shared with the sales, marketing and product teams. Or an organization’s web pages might automatically be translated into different languages. In both cases, the results would be imperfect but could be quickly reviewed and fixed by a human reviewer as needed.
In a coding context, many of the popular integrated development environments now support some level of AI-powered code completion, with GitHub Copilot and Amazon CodeWhisperer among the leading examples. Other related applications, such as natural language database querying, also show promise. LLMs might also be able to generate developer documentation from source code.
LLMs could prove useful when working with other forms of unstructured data in particular industries. “In wealth management,” Madhukar Kumar, CMO of SingleStore, a relational database company, told The New Stack, “we are working with customers who have a huge amount of unstructured data, such as legal documents stored in PDFs, and want to be able to query them in plain English using a Large Language Model.”
SingleStore is seeing clients using LLMs to perform both deterministic and non-deterministic querying at the same time.
“For example, in wealth management, I might want to be able to say, ‘Show me the income statements of everybody between 45 and 55 years old who recently quit their job,’ because I think they are right for my 401(k) product,” Kumar said.
“This requires both database querying via SQL and the ability to work with that corpus of unstructured PDF data. This is the sort of use case we are seeing a lot.”
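The pattern Kumar describes, a structured SQL filter combined with a search over unstructured documents, can be sketched roughly as follows. This is a minimal illustration, not SingleStore’s API: it uses Python’s built-in sqlite3 module, the table and column names are hypothetical, and the `relevance` function is a toy word-overlap score standing in for the embedding-based semantic similarity a real system would use.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical client and document tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, "
        "age INTEGER, recently_quit INTEGER)"
    )
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", [
        (1, "Ana", 47, 1),
        (2, "Ben", 52, 0),
        (3, "Cho", 38, 1),
        (4, "Dev", 54, 1),
    ])
    conn.execute("CREATE TABLE documents (client_id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?)", [
        (1, "Income statement 2022: salary and dividend income"),
        (1, "Signed lease agreement for office space"),
        (3, "Income statement 2022: salary income"),
        (4, "Income statement 2022: consulting income"),
    ])
    return conn

def relevance(query: str, text: str) -> float:
    """Toy stand-in for semantic similarity: share of query words found in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().replace(":", " ").split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

def find_documents(conn, query, min_age, max_age, threshold=0.5):
    """Filter clients with SQL, then keep only documents relevant to the query."""
    rows = conn.execute(
        "SELECT c.name, d.body FROM clients c "
        "JOIN documents d ON d.client_id = c.id "
        "WHERE c.age BETWEEN ? AND ? AND c.recently_quit = 1 "
        "ORDER BY c.id",
        (min_age, max_age),
    ).fetchall()
    return [(name, body) for name, body in rows if relevance(query, body) >= threshold]

if __name__ == "__main__":
    for name, body in find_documents(build_demo_db(), "income statement", 45, 55):
        print(name, "|", body)
```

The SQL step is the deterministic part of the query (age range, employment status), while the relevance step is where an LLM or vector search would handle the unstructured text.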
Large language models have been applied to areas such as sentiment analysis. This can be useful for organizations looking to gather data and feedback to improve customer satisfaction. Sentiment analysis is also helpful for identifying common themes and trends in a large body of text, which may assist with both decision-making and more targeted business strategies.
We should note though that LLMs are not factually reliable, and therefore shouldn’t be used without human oversight in any setting where accuracy matters.
Training an LLM from scratch remains a major undertaking, so it makes more sense to build on top of an existing model where possible. This is still a rapidly evolving space, but with Kumar’s help we’ve compiled a list of what we think are the five most important LLMs at the moment. If you are looking to explore potential uses for LLMs yourself, these are the ones we think you should definitely consider.
The Top 5 LLMs
1. GPT-4
GPT-4 is probably top of the tree at the moment, and OpenAI has built an impressive product around it, with an effective ecosystem that allows you to create plugins, as well as execute code and functions. It is particularly good at text generation and summarization.
“If you look at GPT-4,” Kumar said, “it is a little bit more conservative but it is far more accurate than 3.5 was, particularly around code generation.”
2. Claude 2
The main advantage of Claude 2, from Anthropic, is the size of its context window, which was recently expanded from 9K to 100K tokens, considerably more than the maximum 32K tokens supported by GPT-4 at the time of this writing. A 100K-token window corresponds to around 75,000 words, which allows a business to submit hundreds of pages of material for Claude to digest.
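The arithmetic behind those figures can be sketched as below. This is a rough estimate only: it assumes about 0.75 words per token, the ratio implied by 100K tokens being around 75,000 words, whereas real tokenizers vary by language and content. The function names are illustrative, not part of any vendor’s API.

```python
# Rough context-window arithmetic. Assumption: ~0.75 words per token,
# the ratio implied by "100K tokens is around 75,000 words".
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a document of word_count words will use."""
    return round(word_count / WORDS_PER_TOKEN)

def max_words(context_tokens: int) -> int:
    """Estimate how many words fit in a context window of context_tokens."""
    return int(context_tokens * WORDS_PER_TOKEN)

def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Check whether a document is likely to fit in the given context window."""
    return estimate_tokens(word_count) <= context_tokens

if __name__ == "__main__":
    for window in (32_000, 100_000):
        print(f"{window} tokens is roughly {max_words(window)} words")
```

Under this assumption, a 60,000-word report would fit in a 100K-token window but not in a 32K-token one.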
3. Llama 2
Llama 2, from Meta, is available for free for both research and commercial use, though there are some oddly specific restrictions in the license. For example, if the technology is used in an application or service with more than 700 million monthly active users, a special license is required from Meta. The community agreement also forbids the use of Llama 2 to train other language models.
While there are advantages to open source, particularly for research, the high cost of training and fine-tuning models means that, at least at the moment, commercial LLMs will generally perform better.
As the Llama 2 whitepaper described, “[C]losed product LLMs are heavily fine-tuned to align with human preferences, which greatly enhances their usability and safety. This step can require significant costs in compute and human annotation, and is often not transparent or easily reproducible, limiting progress within the community to advance AI alignment research.”
In February, Meta released the precursor of Llama 2, LLaMA, as source-available with a non-commercial license. It soon leaked, and spawned a number of fine-tuned models built on top of it, including Alpaca from Stanford University, and Vicuna, developed by a team from the University of California, Berkeley, Carnegie Mellon University, Stanford, and UC San Diego.
Both of these models were trained using synthetic instructions, but while they show promise, the Llama 2 paper again suggested that “they fall short of the bar set by their closed-source counterparts.”
That said, you don’t have to pay to use an open source model, so while you are trying to decide whether this technology is useful in your particular use case, Llama 2 could be a good place to start.
4. Orca
Orca, from Microsoft Research, is the most experimental model we’ve selected, but it is interesting in part because it is a smaller open source model, and uses a different technique called progressive learning to learn from large foundation models.
What this means is that Orca can learn from models like GPT-4 through imitation, improving its own reasoning capabilities. This may be indicative of a way that open source models can better compete with their closed-source counterparts in the future, and as such Orca is an interesting model to keep an eye on.
5. Cohere
Cohere is another commercial offering, and the company behind it was co-founded by Aidan Gomez, a co-author of the seminal transformer research paper “Attention Is All You Need.” Cohere is being positioned as a cloud-neutral vendor, and it is clearly targeting enterprises, as indicated by the company’s recently announced partnership with McKinsey.
Picking an LLM
Once you’ve drawn up a shortlist of LLMs, and have identified one or two low-risk use cases to experiment with, you have the option of running multiple tests using different models to see which one works best for you, as you might do if you were evaluating an observability tool or similar.
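A side-by-side trial of this kind can be sketched as a small harness. The following is a minimal sketch, not a definitive benchmark: the two “models” are hypothetical stand-in functions rather than real API clients, and scoring by expected substring is an assumption chosen for simplicity. In practice you would wrap each vendor’s client in a function with the same shape and use test cases drawn from your own use case.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that maps a prompt to a response string.
Model = Callable[[str], str]

def score_models(models: Dict[str, Model],
                 cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per model, the fraction of cases whose response contains the expected text."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores = {}
    for name, model in models.items():
        hits = sum(
            1 for prompt, expected in cases
            if expected.lower() in model(prompt).lower()
        )
        scores[name] = hits / len(cases)
    return scores

def best_model(scores: Dict[str, float]) -> str:
    """Pick the name of the highest-scoring model."""
    return max(scores, key=scores.get)

# Hypothetical stand-ins for real LLM clients.
def model_a(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I am not sure"

def model_b(prompt: str) -> str:
    answers = {"France": "Paris", "Japan": "Tokyo"}
    for country, capital in answers.items():
        if country in prompt:
            return f"The capital is {capital}."
    return "Unknown"

CASES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

if __name__ == "__main__":
    results = score_models({"model_a": model_a, "model_b": model_b}, CASES)
    print(results, "best:", best_model(results))
```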
It’s also worth considering whether you can use multiple LLMs in concert. “I think that the future is not just picking one but an ensemble of LLMs that are good at different things,” Kumar told us.
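One simple reading of an “ensemble” is task-based routing, where each kind of request goes to the model that handles it best. The sketch below illustrates that idea under stated assumptions: the `ModelRouter` class, the task labels and the stand-in model functions are all hypothetical, and Kumar did not describe a specific implementation.

```python
from typing import Callable, Dict

# A "model" here is any function that maps a prompt to a response string.
Model = Callable[[str], str]

class ModelRouter:
    """Send each task type to a registered model, falling back to a default."""

    def __init__(self, default: Model) -> None:
        self._default = default
        self._routes: Dict[str, Model] = {}

    def register(self, task: str, model: Model) -> None:
        """Associate a task label (e.g. "code") with a model."""
        self._routes[task] = model

    def run(self, task: str, prompt: str) -> str:
        """Dispatch the prompt to the model registered for this task."""
        model = self._routes.get(task, self._default)
        return model(prompt)

# Hypothetical stand-ins for real LLM clients.
def general_model(prompt: str) -> str:
    return f"[general] {prompt}"

def code_model(prompt: str) -> str:
    return f"[code] {prompt}"

def long_context_model(prompt: str) -> str:
    return f"[long-context] {prompt}"

if __name__ == "__main__":
    router = ModelRouter(default=general_model)
    router.register("code", code_model)
    router.register("summarize_long_document", long_context_model)
    print(router.run("code", "Write a function that reverses a string"))
    print(router.run("translate", "Translate 'hello' into French"))
```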
Of course, none of this is particularly useful to you unless you have timely access to data. During our conversation, Kumar suggested that this was where contextual databases like SingleStore come in.
“To truly use the power of LLMs,” he said, “you need the ability to do both lexical and semantic search, manage structured and unstructured data, handle both metadata and the vectorized data, and handle all of that in milliseconds, as you are now sitting between the end user and the LLM’s response.”