From his experience working at Lyft, Snap and Twitter, Harish Doddi saw a lot of friction between those who create machine learning models (the data science team) and those who put them into production (the machine learning engineering or DevOps side of the organization).
Data scientists, he said, tend to work on historical data in siloed environments, usually the confines of their laptops. But when a model goes into production, the way a model behaves with real-time data can be very different.
“I started thinking about how the production environment is very different from the way models are being developed, and how there is a bigger problem. If things go wrong in the production environment, then things go wrong in a development environment,” he said.
So along with Jerry Xu, an alum of Lyft, Box and Twitter, Doddi founded Datatron, which offers a model management and governance platform for enterprise machine learning and artificial intelligence initiatives.
Managing Models in Production
Data scientists work with the business units to sort out the business problem the model is designed to solve. While some data scientists also handle deployment, in many cases, handling the operational aspects falls to a relatively new role, the machine learning engineer, as Levon Paradzhanyan, system architect at EastBanc Technologies, pointed out in a previous post.
Analyst Lawrence Hecht reported that an Algorithmia survey found that in a majority of enterprises, it takes more than a month to create a model and at least another month to put the model into production.
There’s the old axiom that nearly 90 percent of models never make it to production, but even reaching that point is just the beginning, as Doddi points out. There’s a range of other considerations, such as workflows, caching, monitoring, versioning, security and more.
“Our strong conviction is this whole production environment is separate from the development environments. That’s why we don’t provide APIs or we don’t provide SDKs to data scientists. … They’re not end-users of our platform,” Doddi said, adding that its customers are large enterprises. They include Johnson & Johnson, Comcast, Ford, and Domino’s Pizza.
Domino’s, for instance, uses Datatron to help manage models to provide guidance on store placement, staffing and ways to improve customer experience.
“Enterprises have very large legacy systems that you need to work with. So the way we architected the system has a seamless integration to those type of systems,” Doddi said.
Its stack is built on top of Kubernetes, Docker and Python technology on the backend; the front end is completely based on React.
“We have chosen, consciously … Kubernetes because one of the hardest challenges we observed is you have to have a product that fits into their legacy systems very nicely. … And that’s where the enterprise architecture integration, how we absorb the models into our system, how we do orchestration of our software, how do we do autoscale, all of those things come into picture,” he said.
Language, Framework Agnostic
The San Francisco-based startup calls its model management platform ModelOps, a framework to help organizations conduct continuous integration, development and delivery of AI/ML models at scale.
It allows data scientists to work with any language, library and framework they choose. A model catalog provides full transparency and lineage for each model throughout its life cycle: every model gets a unique identifier, and the platform records every action taken on it, along with its version number, metadata, tags, model location, input and output features and more.
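As a rough illustration only (Datatron does not expose its catalog API to data scientists, and these names are hypothetical), a minimal catalog record with a unique identifier and an append-only action history might look like:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    """One entry in a hypothetical model catalog (all names illustrative)."""
    name: str
    version: str
    location: str  # e.g. an object-store path to the model artifact
    tags: list = field(default_factory=list)
    model_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    history: list = field(default_factory=list)

    def record(self, action: str) -> None:
        # Append a timestamped action so lineage can be reconstructed later.
        self.history.append((datetime.now(timezone.utc).isoformat(), action))

entry = CatalogEntry("churn-model", "1.2.0", "s3://models/churn-model/1.2.0")
entry.record("registered")
entry.record("deployed")
```

The point of the unique `model_id` is that every downstream event (deployment, scoring, alert) can be tied back to exactly one catalog entry.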
It supports models for inferencing with real-time and streaming data. With batch scoring, it lets users work with very large datasets offline, including pulling and joining data from multiple sources and storing the output to a designated location. Whether using batch or streaming data, it employs parallel processing to speed up scoring and make efficient use of compute infrastructure.
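The parallel-scoring pattern can be sketched as follows; `predict` here is a toy stand-in for a real model's scoring function, not Datatron's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def predict(row):
    # Stand-in for a real model's predict(); returns a toy average score.
    return sum(row) / len(row)

def score_batch(rows, workers=4):
    """Fan rows out across a worker pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(predict, rows))

scores = score_batch([[1, 2, 3], [4, 5, 6]])
```

In a production batch job the rows would be chunks pulled from multiple sources, and the results written to the designated output location rather than returned in memory.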
It has patented what it calls the Publisher/Challenge Gateway, designed to enhance collaboration between the development and operations teams and to let users experiment with different models in production through release options such as canary deployments and A/B testing. A shadow mode allows a test model to run alongside a live one to determine whether the updated model will behave as expected, and a failover model can take over if the primary model does not meet defined criteria.
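The shadow-mode idea, sketched minimally (this is the general pattern, not Datatron's gateway code): the caller always receives the live model's answer, while the shadow model runs on the same input and disagreements are logged for offline review.

```python
import logging

logger = logging.getLogger("shadow")

def serve(request, primary, shadow):
    """Answer from the primary model; run the shadow (challenger) model on
    the same input and log any disagreement for offline comparison."""
    live = primary(request)
    try:
        candidate = shadow(request)
        if candidate != live:
            logger.info("disagreement: primary=%r shadow=%r", live, candidate)
    except Exception:
        # A broken shadow model must never affect the live response.
        logger.exception("shadow model failed")
    return live

# The caller only ever sees the primary model's output.
result = serve({"amount": 120},
               primary=lambda r: "approve",
               shadow=lambda r: "review")
```

A failover arrangement inverts the priority: if the primary's output fails a defined check, a designated backup model's answer is returned instead.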
It provides infrastructure monitoring as an out-of-the-box feature for all machines in a cluster, tracking CPU usage, memory usage, health check and more.
In October, Datatron unveiled a new release that includes deployment of ML models in complex multitenant environments, monitoring for customer-defined KPIs, new explainability capabilities, native support for Jupyter notebooks, and a new rapid setup and deployment process that supports deployment of APIs for real-time or batch inferencing in less than 10 minutes.
Focus on Model Governance
Datatron takes the view that there's a triumvirate involved in model management: the data science team, the operations team and the risk team. It strives to bridge all three concerns.
Multiple startups are tackling the challenges of debugging and managing machine learning models in production for evidence of drift and other problems, such as Tecton and WhyLabs as well as the major cloud vendors. Datatron, however, is leaning more heavily toward model governance.
“When we started, we started as an operations platform. But after working with our customers very, very closely, we got into governance because operations is generating so much data … which the compliance people need,” Doddi said.
“In any large organization, these models make decisions. They need to have accountability.
“You need to have proper accountability by bringing transparency and traceability to how the decision was made, and even an understanding of [model] behavior on certain metrics relating to [bias] or an anomaly, or even the risk of the model to the organization. You need one place where you [have] proper evidence that can be presented to an audit team or a regulator.”
To that end, its dashboard provides a high-level overview of your AI/ML program. It allows users to further investigate models and make better decisions about them.
Six types of metrics are calculated based on how the model is performing: bias, anomaly, drift, performance, operational, and business key performance indicators (KPIs) that the business customizes.
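Datatron does not publish its metric formulas, but as an illustration, a common way to quantify drift is the population stability index (PSI), which compares the bucketed distribution of a feature at training time against recent production traffic:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two bucket-proportion lists.
    0 means no shift; values above roughly 0.2 are often read as
    significant drift (an industry rule of thumb, not a Datatron threshold)."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training time
current = [0.10, 0.20, 0.30, 0.40]    # distribution in recent production data
drift = psi(baseline, current)
```

Each term is non-negative, so the index only grows as the two distributions diverge; a monitoring system can recompute it on a schedule and compare it against a configured ceiling.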
“We give this global view of what’s going on across your organization. How many models have bias issues, how many models have drift issues, how many models have business issues, for example,” he said.
Then it computes an overall health score outlining the model's risk to the organization at a global level. If the score falls below a certain threshold, the system flags the model and triggers alerts.
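How per-metric scores, a global score and a threshold might combine can be sketched as follows; the weights and the 0.7 threshold are made up for illustration, since Datatron's actual scoring formula is not public:

```python
def health_score(metrics, weights):
    """Weighted average of per-metric scores, each already scaled to [0, 1]."""
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

def evaluate(metrics, weights, threshold=0.7):
    # Returns (score, red_flag); a red flag would feed the alert pipeline.
    score = health_score(metrics, weights)
    return score, score < threshold

metrics = {"bias": 0.9, "drift": 0.4, "performance": 0.8}
weights = {"bias": 2.0, "drift": 1.0, "performance": 1.0}
score, red_flag = evaluate(metrics, weights)
```

The weighting is the interesting design lever: an organization in a regulated industry might weight bias far above operational metrics, so the same raw numbers produce a red flag in one business unit and not in another.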
Customers can set alerts via email, Slack, PagerDuty or other methods, and set automatic shutdown defaults triggered when the model varies from pre-defined performance thresholds.
It provides a full activity log and audit trail, enabling users to go back and see how the model performed at a particular time in the past.
Going forward, its customers are asking for reports they can show compliance auditors. The company expects to release that feature in the first quarter next year, Doddi said.
Designed as a cloud native platform, Datatron can be deployed anywhere: in your own on-premises data center, in public clouds, or even in air-gapped environments.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.