“Model velocity… is the rate of delivering new [machine learning] models or improving on those models. It is more than just getting a model into production quickly, but also how fast and how frequently can you iterate on that model so that it can take advantage of current business conditions.”
That’s what Bob Laurent, Sr. Director of Product Marketing at Domino Data Lab, told The New Stack about his company’s new release, Domino 5.0. Domino is an enterprise data science platform with MLOps capabilities, and its new 5.0 release is designed to help users increase model velocity via flexible infrastructure, scalable compute resources, the ability to reuse prior work, enhanced collaboration features, and easier ways to get models into production.
Although the release was announced in January, its focus matters for the machine learning space, and we thought it merited some investigation and analysis now.
Technologically sophisticated organizations, and those with aspirations to get there, are past the “gee whiz” phase with machine learning. Their proofs of concept are done. Their models are in production, and they’re working. But the process is still too bespoke, and more of a craft than an engineering discipline. So MLOps-focused vendors like Domino are working hard to help these organizations raise the bar. Domino 5.0 is focused on this, and its efforts rest on four key elements:
- Autoscaling of distributed compute clusters. With the 5.0 release, Domino dynamically grows or shrinks the distributed computing clusters used to train ML models, or score data against them, based on workload. For data science teams, this eliminates the need to specify parameters for processor power or the amount of memory set aside for distributed computing, making provisioning easier. For IT, it eliminates the problem of overcommitting resources to jobs and helps realize corresponding cost savings. Autoscaling capability is available both on-premises and in the cloud, and it supports all cluster types already available with Domino: Spark, Ray and Dask.
- Easier data source connection. With the 5.0 release, Domino offers pre-built connectors for Snowflake, Redshift, and Amazon S3, which data scientists can use to connect to their data sources just by supplying their credentials. And unlike generic ODBC or JDBC drivers, Domino’s connectors are designed to work with specialized machine learning frameworks like PyTorch, to generate compatible code, and to help build and train models. This eliminates much of the ad hoc coding work that would otherwise be necessary. For data sources other than Snowflake, Redshift, or S3, users can create their own connectors and share them with their colleagues. This increases collaboration because, in addition to code and data artifacts, users can now share the connectors used to create their data sets.
- Integrated Monitoring Capabilities. With the new 5.0 release, Domino brings its Domino Model Monitor (DMM) capabilities into the core Domino platform. All Domino users who upgrade to 5.0 get access to model monitoring, including capabilities like data drift detection and ground truth tracking, to help maintain model quality and accuracy. This is where the iteration component of model velocity kicks in, allowing data science teams to monitor their models in production without using separate products. If a data quality issue is detected, Domino can reconstitute the original development environment with a single click, then retrain the model or run a champion/challenger test to find the best model and push that into production.
- Automated Insights for Model Drift. If model drift is detected during monitoring, Automated Insights can provide explanations, exposing the drift’s root cause, by examining a model’s features (input variables) and how their values have drifted over time. These insights are delivered via generated, customizable reports.
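The article does not say how Domino computes drift or generates its explanations, so the following is only a generic sketch of the idea described in the last two items: compare each feature's production values against its training values and rank the features that have moved the most. It uses the Population Stability Index (PSI), one common drift measure. The function names `psi` and `rank_drifted_features`, the bin count, and the 0.2 threshold are illustrative assumptions, not part of Domino's product or API.

```python
import math


def psi(baseline, current, bins=10):
    """Population Stability Index for one numeric feature.

    Returns 0.0 when both samples fall into the bins identically;
    larger values mean the distribution has moved further.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion so empty bins do not cause log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def rank_drifted_features(baseline, current, threshold=0.2):
    """Return (feature, psi) pairs above the threshold, most drifted first.

    `baseline` and `current` map feature names to lists of numeric values.
    The threshold is an illustrative assumption, not a Domino setting.
    """
    scores = {name: psi(baseline[name], current[name]) for name in baseline}
    drifted = [(name, score) for name, score in scores.items() if score > threshold]
    return sorted(drifted, key=lambda item: item[1], reverse=True)


# Hypothetical data: "income" shifts upward in production, "age" does not.
training = {
    "age": [20 + (i % 50) for i in range(500)],
    "income": [30_000 + 100 * (i % 400) for i in range(500)],
}
production = {
    "age": [20 + (i % 50) for i in range(500)],
    "income": [50_000 + 100 * (i % 400) for i in range(500)],
}
for feature, score in rank_drifted_features(training, production):
    print(f"{feature}: PSI = {score:.2f}")
```

A report built on a ranking like this points a data scientist at the input variables most likely responsible for degraded predictions, which is the kind of root-cause hint the Automated Insights feature is described as providing.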
Blinded By ‘Science?’
Despite its name, data science has manifested as more of a craft than a repeatable, scientific process. Pressure is on vendors throughout the AI and analytics arenas to close this gap. That provides an opportunity to those who rise to it, but poses a threat to vendors that don’t accommodate it.
With its new release, Domino brings MLOps to the cloud, allows tighter integration with development and deployment environments, and simplifies the data science workflow. The company is clearly heeding the call to enable its customers to approach AI in a more engineering-oriented manner. Other vendors have begun to do so, too, and will likely up the ante even further, not just easing the process of deploying models to production, but also monitoring and retraining them when needed.