Baseten: Deploy and Share ML Models with Python
In enterprises that employ machine learning, data scientists typically build and train models, then pass them to an MLOps team to deploy them. But startups and many small businesses don’t have such a team, leaving them in the lurch when it comes to actually putting those models to use in their organizations.
That’s the situation in which Baseten co-founders Tuhin Srivastava, Amir Haghighat and Philip Howes found themselves at digital publishing platform Gumroad. In effect, they had to become full-stack engineers to get models into production.
“We just kept seeing this issue over and over again, of companies making a big investment in machine learning and believing that this is going to have a transformative effect on their organizations, but really coming up short when it came to tangible value that execs could point to around machine learning,” Srivastava said.
In talking to around 100 machine learning practitioners and dozens of teams, they found organizations struggling.
“We realized that a lot of organizations, and in particular machine learning teams, were struggling more with the software engineering side of getting their models connected to value than with the machine learning itself,” he said.
Complex ML Infrastructure
“Today there is no efficient bridge between the creation of ML models and the process of getting them into production,” Luis Ceze, co-founder and CEO at OctoML, wrote in a previous post at The New Stack.
He put the average time to production for ML models at 12 weeks, though other surveys differ while still making clear it’s a lengthy process.
“Today the process is so challenging that even skilled data scientists and AI practitioners get it wrong — models end up in their own unique pipeline, more often than not,” Ceze wrote. “With few exceptions, the pipelines are custom-assembled and fragile. Changes made to the deployment hardware choice, the environment, training framework, software library or integration stack can necessitate a thorough debugging or even a complete rebuild.”
San Francisco-based Baseten aims to abstract away the complexities of data infrastructure, enabling data science teams to put ML models into production faster and more reliably — and with less reliance on engineering help. And the technology, which is in public beta, is designed so users don’t have to know anything but Python, which they normally already use in their Jupyter Notebooks.
“According to Omdia research, even though nearly two-thirds of U.S. companies are investigating or building [machine learning] pilot use cases, only 9% have been able to bring those efforts through to production and even fewer (6%) have done so at scale across the business,” said Bradley Shimmin, chief analyst for AI platforms, analytics and data management at Omdia.
“What’s holding them back? Often, a simple impedance mismatch between developing a working model and integrating that model within the context of business apps at the point of action. With the ability to rapidly create APIs and embed machine learning models directly into shareable apps, Baseten promises to minimize this mismatch and accelerate time to value.”
A free tool built on Baseten around GFP-GAN (Generative Facial Prior-Generative Adversarial Network), a model developed by Tencent researchers that can restore damaged and low-resolution photos, has been getting some buzz lately.
The inspiration for our photo restoration app (w/GFP-GAN) was to showcase more delightful, “just because” applications of ML. We’re so thrilled that it’s touched the hearts of so many.
— Baseten (@basetenco) August 1, 2022
Down a Rabbit Hole
When asked what they had to learn at Gumroad, Srivastava replied, “Oh gosh, so much.
“But the first thing we had to learn was how to actually get that model served behind an API. And so we went down this long rabbit hole doing a bunch of infrastructure work, had to learn about Amazon Web Services, had to learn about Docker. Kubernetes didn’t exist then, but we had to learn about orchestration.
“And honestly, that took me about a quarter. But at the end of it all, we had this pretty brittle system that was able to serve this API with a model behind it. But then we realized we have to wrap that model that was being served behind the infrastructure with some business logic. So you know, a model takes a specific type of input that oftentimes doesn’t map directly to how a business wants to use that model.
“So we ended up having to learn a bunch of backend engineering and server-side engineering, like building a Ruby on Rails app and Django apps, to be able to wrap that model in business logic and integrate it back into existing systems.”
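The wrapping Srivastava describes can be illustrated in a few lines of plain Python. This is a hypothetical sketch, not Baseten’s API: the model and the `score_transaction` wrapper are stand-ins showing how business inputs get mapped onto a model’s feature format and how a raw score becomes a business decision.

```python
# Hypothetical sketch: wrapping a raw model in business logic.
# fraud_model and score_transaction are illustrative names,
# not part of any real SDK.

def fraud_model(features):
    """Stand-in model: takes a numeric feature vector,
    returns a raw score in [0, 1]."""
    weight = 0.2
    return min(1.0, sum(features) * weight / len(features))

def score_transaction(transaction):
    """Business-logic wrapper: maps a domain object onto the
    model's input format and turns the raw score into a decision."""
    features = [
        transaction["amount"] / 1000.0,              # normalize amount
        1.0 if transaction["new_account"] else 0.0,  # encode flag
    ]
    score = fraud_model(features)
    return {"score": score, "flag_for_review": score > 0.5}

print(score_transaction({"amount": 9000, "new_account": True}))
# → {'score': 1.0, 'flag_for_review': True}
```

The point is that none of this mapping logic lives in the model itself, which is why teams historically reached for Rails or Django apps just to host it.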
Then there was the problem of models that needed a human review step.
“So [we] ended up learning a bunch of frontend skills, to be able to put those interfaces out so the analysts could make those decisions. …We really did traverse the entire gamut of software engineering just to get models kind of liberated and adding value to the business,” he explained.
Python, Just Python
Baseten focuses on those three pillars: serving the model behind an API, backend infrastructure and frontend interfaces.
“That’s the whole premise of Baseten is that machine learning teams are really, really good at Python. They’re really, really good at models, and that’s all they have to focus on. We kind of abstract away everything else,” Srivastava said.
The Python SDK enables users to deploy TensorFlow, scikit-learn, PyTorch or custom models right from a Jupyter Notebook. Baseten’s serverless infrastructure enables chaining model outputs and pre- and post-processing code. It offers a library of pre-trained models that allow users to build and deploy an application around models for tasks like sentiment analysis, image classification and speech transcription.
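The chaining of pre-processing, a model call and post-processing can be sketched in plain Python. All names here are hypothetical stand-ins, not the Baseten SDK; the toy “sentiment model” just shows the shape of such a pipeline.

```python
# Illustrative sketch only: preprocess, sentiment_model and
# postprocess are hypothetical stand-ins, not Baseten's API.

def preprocess(text):
    """Normalize raw input into the model's expected format."""
    return text.lower().split()

def sentiment_model(tokens):
    """Toy stand-in model: counts positive vs. negative words."""
    positive = {"great", "good", "love"}
    negative = {"bad", "awful", "hate"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def postprocess(score):
    """Map the raw score to a label consumers can use."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def pipeline(text):
    """Chain pre-processing, the model and post-processing."""
    return postprocess(sentiment_model(preprocess(text)))

print(pipeline("I love this great product"))  # → positive
```

In a notebook, a data scientist writes exactly this kind of chain; the platform’s job is to run it behind an endpoint without the author managing servers.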
“Baseten gets the process of tool-building out of the way, so we can focus on our key skills: modeling, measurement, and problem solving,” said Nikhil Harithas, senior machine learning engineer at Patreon.
As part of the abstraction, it created and open sourced Truss, which helps data scientists deploy models trained with any framework to run in any environment. It turns your Python model into a microservice with a production-ready API endpoint, without the need for Flask or Django.
“The data scientist’s development environment needs to be flexible and permissive,” Howes wrote in a blog post introducing Truss.
“Model serving, or making a model available to other systems, is critical; a model is not very useful unless it can operate in the real world. We built Truss as a standard for serving models that takes advantage of proven technologies like Docker but abstracts away complexity regardless of model framework.”
“Baseten provides an easy way for us to host our models, iterate on them and experiment without worrying about any of the DevOps involved,” said Faaez Ul Haq, head of data science at Pipe.
In April, the company announced it had raised $8 million in seed funding co-led by Greylock and South Park Commons Fund and $12 million in Series A funding led by Greylock. The company has grown from just eight employees in January to 22 now.
“What we see with customers is that they’re sticking all three of those pieces together in a really piecemeal way, and that’s why we think that the bundle is so important,” he said.
“Baseten is very, very nascent as a product, and so we’re ensuring that we have feature completeness,” he said.
“It’s time to build a lot of features that larger organizations need, you know, integration with the local development workflow, integration with GitHub and GitLab. And then also really thickening up a bunch of work around model deployment.
“So we think our model deployment pieces are pretty novel, and we’ll actually be opening that up in a pretty open sourcy way. …But it’s really around creating long-term value for our customers and also making the novel parts of Baseten more accessible.”