A New Tool for the Open Source LLM Developer Stack: Aviary
The company behind Ray, an open source AI framework that helps power ChatGPT, has just released a new tool to help developers work with large language models (LLMs). Called Aviary, Anyscale describes it as the “first fully free, cloud-based infrastructure designed to help developers choose and deploy the right technologies and approach for their LLM-based applications.” Like Ray, Aviary is being released as an open source project.
I spoke to Anyscale’s Head of Engineering, Waleed Kadous, to discuss the new tool and its impact on LLM applications.
The goal of Aviary is to enable developers to identify the best open source platform to fine-tune and scale an LLM application. Developers can submit test prompts to a pre-selected set of LLMs, including Llama, CarperAI, Dolly 2.0, Vicuna, StabilityAI, and Amazon’s LightGPT.
The Emergence of an Open Source LLM Stack
I told Kadous that there’s an emerging developer ecosystem building up around AI and LLMs; I mentioned LangChain and also Microsoft’s new Copilot stack as examples. I asked how Aviary fits into this new ecosystem.
He replied that we are witnessing the development of an open source LLM stack. He drew a parallel to the LAMP stack of the 1990s and early 2000s (which I also did, in my LangChain post). In the open source LLM stack, he continued, Ray serves as the bottom layer for orchestration and management. Above that, there is an interface for model storage and retrieval — something like Hugging Face. Then there are tools like LangChain “that kind of glues it all together and does all the prompt adjustments.”
Aviary is essentially the back end to run something like LangChain, he explained.
“LangChain is really good for a single query, but it doesn’t really have an off-the-shelf deployment suite,” he said.
So why does this LLM stack have to be open source, especially considering the strength of OpenAI and the other big tech companies (like Google) when it comes to LLMs?
Kadous noted the downsides of LLMs owned by companies (such as OpenAI or Google), since their inner workings are often not well understood. They wanted to create a tool that would help access open source LLMs, which are more easily understood. Initially, he said, they intended to just create a comparison tool — which turned out to be the first part of Aviary. But as they worked on the project, he continued, they realized there was a significant gap in the market. There needed to be a way for developers to easily deploy, manage and maintain their chosen open source model. So that became the second half of what Aviary offers.
How a Dev Uses Aviary
Kadous explained that there are two main tasks involved in choosing and then setting up an LLM for an application. The first is comparing different LLM models, which can be done through Aviary’s frontend website, or via the command line.
Aviary currently supports nine different open source LLMs, ranging from small models with 2 billion parameters to larger ones with 30 billion parameters. He said that it took them “a fair amount of effort” to get the comparison engine up to par.
“Each one [LLM] has unique stop tokens [and] you have to kind of tailor the process a little bit,” he said. “In some cases, you can accelerate them using something like DeepSpeed, which is a library that helps to make models run faster.”
One interesting note here is that for the evaluation process, they use OpenAI’s GPT-4 (not an open source LLM!). Kadous said they chose this because it’s currently considered the most advanced model globally. The GPT-4 evaluation provides rankings and comparisons for each prompt, across whichever models were selected.
The second key task for a developer is getting the chosen model into production. The typical workflow involves downloading a model from a repository like Hugging Face. But then additional considerations arise, said Kadous, such as understanding stop tokens, implementing learning tweaks, enabling auto-scaling, and determining the required GPU specifications.
He said that Aviary simplifies the deployment process by allowing users to configure the models through a config file. The aim is to make deployment as simple as running a few command lines, he added.
Aviary’s main connection with Ray, the distributed computing framework that Anyscale is best known for, is that it uses a library called Ray Serve, which is described as “a scalable model serving library for building online inference APIs.” I asked Kadous to explain how this works.
Ray Serve is specifically designed for serving machine learning models and handling model traffic, he replied. It enables the inference process, where models respond to queries. One of its benefits, he said, is its flexibility and scalability — which allows for easy service deployment and scaling from one instance to multiple instances. He added that Ray Serve incorporates cost-saving features like utilizing spot instances, which he said are significantly cheaper than on-demand instances.
Kadous noted that Ray Serve’s capabilities are particularly important when dealing with large models that require coordination across multiple machines. For example, Falcon LLM has 40 billion parameters, which necessitates running on multiple GPUs. Ray Serve leverages the Ray framework to handle the coordination between those GPUs and manage workloads distributed across multiple machines, which in turn enables Aviary to support these complex models effectively.
Customized Data Requirements
I wanted to know how a developer with a specific use case — say, someone who works for a small insurance company — might use Aviary. Can they upload insurance-related data to Aviary and test it against the models?
Kadous said that developers can engage with Anyscale and request their own customized version of Aviary, which allows them to set up a fine-tuned model. For example, an insurance company might fine-tune a model to generate responses to insurance claims. By comparing the prompts sent to the original model and the fine-tuned model, developers can assess if the fine-tuning has produced the desired differences, or if any unexpected behavior occurs.
Examples of LLM Apps
Finally, I asked Kadous what are the most impressive applications built on top of open LLMs that he’s seen so far.
He mentioned the prevalence of retrieval Q&A applications that utilize embeddings. Embeddings involve converting sentences into sequences of numbers that represent their semantic meaning, he explained. He thinks open source engines have proven to be particularly effective in generating these embeddings and creating semantic similarity.
Additionally, open source models are often used for summarizing the results obtained from retrieval-based applications, he added.