Generative AI: How Companies Are Using and Scaling AI Models
Robert Nishihara is the co-founder and CEO of Anyscale, the company behind the open source platform Ray, a distributed machine learning framework used by highly scaled products and companies such as ChatGPT and Uber. I spoke to Nishihara to find out how ordinary companies can take advantage of AI technology and how Ray fits into the enterprise space.
In this new era of generative AI, frameworks like Ray will become increasingly important; perhaps as important as Kubernetes has been over the past decade for building modern applications at scale.
Nishihara was one of the graduate students under Ion Stoica at Berkeley’s RISELab. It was there that he, Stoica and Philipp Moritz (another graduate student) created Ray in 2016. Anyscale was founded a few years later to commercialize the technology.
It’s worth noting that Ray was originally envisioned as “a general purpose system that would make distributed computing easy to do,” as Nishihara put it in a recent LinkedIn post. So it’s designed for “all scalable computing,” not just machine learning workloads. That said, its main use case right now is scaling machine learning.
Generative AI in the Enterprise
I asked Nishihara for an example of Ray helping a business use generative AI. He pointed to Cohere AI, a company that gives developers and businesses access to natural language processing (NLP) via large language models. Cohere AI uses Ray to train those models, he said.
Anyscale is also talking to a lot of early-stage startups wanting to take advantage of the generative AI trend, he added.
“I feel like every week, or maybe multiple times a week, we talk with new startups that are trying to do, like, fine-tuned Stable Diffusion models, and serve them, and build products around that. So we do work with a lot of companies like this.”
The generative AI use cases he’s seeing are mostly based on language and image data, though there are some audio and video use cases as well.
As for what data is being used in enterprises for AI workloads, Nishihara confirmed that it is mostly data owned by those companies. However, they typically combine it with some public data.
“They do use off-the-shelf, pre-trained models,” he said. “Sometimes things are done end-to-end with their own data, but it depends on how much data they have. It is common to use pre-trained models that are trained on public data, and then fine-tune with your own data.”
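The pattern Nishihara describes (start from a pre-trained model, then fine-tune on your own data) can be sketched in PyTorch. The two-layer backbone below is a stand-in for a real pre-trained model loaded from a model hub; all of the names, shapes and data here are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone; in practice this would be
# loaded from a model hub rather than built from scratch.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Freeze the pre-trained weights so only the new head is updated.
for p in backbone.parameters():
    p.requires_grad = False

# New task-specific head, trained on the company's own data.
head = nn.Linear(32, 2)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy fine-tuning loop over random stand-in data.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Freezing the backbone is what makes this cheap: the expensive general-purpose training on public data has already been done, and only the small head is fit to proprietary data.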
How Uber Uses Ray
When it comes to companies operating at super-scale, Uber is one of the biggest users of Ray.
“Uber runs all of their machine learning on Ray,” Nishihara told me. “And machine learning — this is deep learning, classical machine learning, XGBoost. […] They use machine learning all over the place, for things like…if you get in an Uber, it tells you what time you’re going to arrive — your ETA. And that ETA is a prediction made by a deep learning model.”
Nishihara explained that Uber built its own in-house distributed system for scaling deep learning training. That system is Horovod, a distributed deep learning framework for TensorFlow that Uber released as open source in 2017. Uber later added Ray to its workflow to better scale the data ingestion and pre-processing stages.
“The limitation they [Uber] were running into, as they were scaling on more and more GPUs […] was that the bottleneck shifted from scaling on GPUs to how quickly they could ingest and pre-process new data — to feed it into the training,” Nishihara explained.
Uber still uses Horovod for ML training, but it now runs on top of Ray, which handles the data ingestion and pre-processing via a Ray library called Ray Datasets. Ray performs those steps on CPUs, while Horovod’s training runs on GPUs. Another advantage of Ray, said Nishihara, is that it can scale ML workloads across CPUs and GPUs at the same time.
The Role of Foundation Models
Recently Nishihara noted on Twitter that “training and serving foundation models are some of the fastest growing use cases we see” on Ray and Anyscale. A foundation model is an AI model trained on broad data and at a massive scale, with the goal of using it for many different types of tasks. It’s a relatively new phenomenon, with OpenAI and Stability AI’s Stable Diffusion helping to popularize the approach over the last couple of years.
I asked Nishihara to unpack how Ray helps organizations train and serve foundation models, in this new era of generative AI. He used the example of an image-based foundation model to help explain.
“You could repurpose it to recognize objects, or maybe to predict depth in an image, or to segment the different objects out — [there are] a lot of different use cases. And the idea is, you would have one base model that is large and understands images and is trained with a lot of compute and a lot of data, and then other people can […] fine-tune it and adapt it for different purposes.”
So training a foundation model involves two stages: training the initial base model on broad data, and fine-tuning it for specific purposes. As for serving, he said that it also has “two different flavors,” which he characterized as online and offline.
“You want to deploy the model in a real-time kind of way, where there’s a service running that can be queried and you get the response back in milliseconds, or whatever. And then the other version is offline, where you have the model and you want to apply it to a ton of data.”
Typically, he says, Anyscale’s customers — and other open source users of Ray — will do all of these different steps. So for example, a startup may use an image model, fine-tune it for their specific business purpose, and then deploy the model “as a service” to their users.
Kind of Like Kubernetes, But for Python Devs
All of Ray’s functionality is designed to be easy for developers to use, which Nishihara contrasted with the notoriously difficult Kubernetes developer experience. That’s partly because Kubernetes targets DevOps engineers, whereas Ray is aimed specifically at Python developers.
“The experience we’re trying to deliver […] is about getting to the point where, as a developer, you know how to program in Python on your laptop — and that’s enough,” he said. “Then you can build these scalable applications. […] You can do as OpenAI does, but you don’t have to learn about scheduling across multiple machines, about fault tolerance, about auto-scaling.”
He added that a lot of people choose to run ML applications on top of Kubernetes, but that the ML scaling itself is handled by Ray. So it’s probably better to think of Ray as another much-needed piece of the overall scaling solution for developers.