Integrate Quality, Infrastructure for Maximum ML Velocity

Many organizations have already identified machine learning as a transformative technology for improving their products and overall business. However, they often treat ML as an additive technology to their existing operations rather than redesigning systems and organizations to fully integrate ML.
This approach makes it difficult to holistically combine the infrastructure and quality perspectives that are necessary to extract the maximum value from ML.
The infrastructure perspective, exemplified by engineers who build production systems, centers on how to best integrate ML into the software stack.
The quality perspective, characterized by analysts and data scientists who focus on business metrics, captures the analysis and modeling that ultimately drive business impact. ML systems work best when their product surface and underlying software are jointly designed to account for all the interactions between infrastructure and quality work.
Better ML = Infrastructure + Quality
Traditional ML life cycles typically lean into only one perspective, either infrastructure or quality. When both views are not captured simultaneously, work tends to fragment into incompatible components. For example, it is common for quality teams to develop models and require analyses that the infrastructure cannot support.
Organizations generally know how to frame ML opportunities in the context of business problems, such as how a retailer can increase customer lifetime value. However, business metrics such as revenue are difficult to optimize directly and slow to measure after a new model is deployed.
Therefore, it is important to identify proxy metrics such as the likelihood that a user will interact with a recommended item, which are easier to optimize and measure.
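To make the contrast concrete, here is a minimal sketch (with hypothetical log fields) of how a proxy metric such as CTR can be computed directly from impression logs, long before revenue attribution is available:

```python
# Minimal sketch: CTR as a proxy metric computed straight from logs.
# The field names (item_id, clicked) are hypothetical.
from collections import defaultdict

def ctr_by_item(impressions):
    """Click-through rate per item from a list of impression records."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        shown[imp["item_id"]] += 1
        clicked[imp["item_id"]] += imp["clicked"]  # 0 or 1
    return {item: clicked[item] / shown[item] for item in shown}

logs = [
    {"item_id": "ad_1", "clicked": 1},
    {"item_id": "ad_1", "clicked": 0},
    {"item_id": "ad_2", "clicked": 0},
]
print(ctr_by_item(logs))  # {'ad_1': 0.5, 'ad_2': 0.0}
```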
The diagram below illustrates a proposed ML + business metrics life cycle that jointly captures the infrastructure and quality work needed to develop, deploy and iteratively improve ML systems in the context of business and proxy metrics.
The black arrows capture the infrastructure work, and the blue arrows capture the quality work.
Let’s consider how an advertising company might apply the life cycle to map a standard problem to an ML system: selecting the best two ads to show a website visitor out of 1 million candidate ads.
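At serving time, the core operation is a top-k selection over model scores. A minimal sketch of that step, with a random stand-in where a real CTR model would go:

```python
import heapq
import random

def predict_ctr(ad_id, page_context):
    """Stand-in for a trained CTR model; returns a score in [0, 1]."""
    return random.random()

candidate_ads = [f"ad_{i}" for i in range(1_000_000)]
page_context = {"url": "example.com/article"}

# Select the best two ads by predicted CTR without fully sorting 1M candidates.
best_two = heapq.nlargest(
    2, candidate_ads, key=lambda ad: predict_ctr(ad, page_context)
)
```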
From the quality perspective, the steps will include how to:
- Identify a business metric to optimize, such as long-term revenue.
- Conduct data analysis to determine a proxy metric, such as predicting the click-through rate (CTR).
- Organize data and build a model to predict CTRs, for example, by joining ad impressions with clicks and other attributes (see the sketch after this list).
- Evaluate and improve the model on historical data.
- Determine how to deploy the model and measure impact on proxy and business metrics, for instance, by running an A/B experiment.
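To ground the middle steps, here is a minimal sketch of the join-and-train work, assuming hypothetical column names and a deliberately tiny data set:

```python
# Sketch of the join-and-train step; column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

impressions = pd.DataFrame({
    "impression_id": [1, 2, 3, 4],
    "ad_id": [10, 11, 10, 12],
    "position": [1, 2, 1, 2],  # slot on the page
})
clicks = pd.DataFrame({"impression_id": [1, 4]})  # impressions that were clicked

# Join impressions with clicks to produce labeled examples.
data = impressions.assign(
    clicked=impressions["impression_id"].isin(clicks["impression_id"]).astype(int)
)

# Train a simple CTR model on the joined data. A real system would encode
# ad_id as a categorical feature; this only shows the shape of the workflow.
model = LogisticRegression()
model.fit(data[["ad_id", "position"]], data["clicked"])
print(model.predict_proba(data[["ad_id", "position"]])[:, 1])  # predicted CTRs
```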
From the infrastructure perspective, the steps will include how to:
- Source raw data from production systems.
- Process the data for ML training, inference, analysis, etc.
- Enable offline training, evaluation and experimentation of different models.
- Enable deployment, evaluation and monitoring of production models.
- Debug and diagnose quality and infrastructure issues during development and in production.
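A skeleton of how these stages might fit together, with each function a hypothetical stub standing in for a real component, so that every stage has a well-defined interface:

```python
# Skeleton of the infrastructure stages; all functions are hypothetical stubs.

def source_raw_data():
    """Pull raw impression and click events from production systems."""
    return [{"impression_id": 1, "ad_id": 10, "clicked": 1}]

def process(raw_events):
    """Turn raw events into features and labels for training and analysis."""
    return [(e["ad_id"], e["clicked"]) for e in raw_events]

def train(examples):
    """Offline training; returns a model artifact (here, a mean-CTR baseline)."""
    total_clicks = sum(label for _, label in examples)
    return {"mean_ctr": total_clicks / len(examples)}

def deploy_and_monitor(model):
    """Push the model to serving and start logging predictions for monitoring."""
    print("deployed:", model)

deploy_and_monitor(train(process(source_raw_data())))
```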
Initially, it might seem that many of the steps described above can be designed and implemented independently. In practice, the details really matter. For example, at training time, we have only two ads per web page, and we know exactly where they were located on the page.
At inference time, we have 1 million ads to consider per web page, and we need to predict a CTR before we know on which part of the page the ad may be shown. We must carefully consider both the quality and infrastructure implications and how they intersect when deciding how to incorporate these details into the overall design.
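One common (though not the only) way this detail shows up in code is to include the logged slot position as a training feature and score candidates at a fixed reference position at inference time, since the final slot is unknown until after ranking. A sketch, with hypothetical field names:

```python
# Sketch: position is known when building training data, but not at inference.
REFERENCE_POSITION = 1  # score every candidate as if shown in the top slot

def training_features(impression):
    # At training time, the logged slot position is available.
    return {"ad_id": impression["ad_id"], "position": impression["position"]}

def inference_features(ad_id):
    # At inference time, the slot is not yet decided, so we fix a reference
    # position; the final slot is chosen only after ranking.
    return {"ad_id": ad_id, "position": REFERENCE_POSITION}
```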
ML differs from other software systems because data sets and models are difficult to verify, monitor and debug. Suppose we monitor the revenue for an advertising system and, all of a sudden, observe a large, unexpected drop.
From an infrastructure perspective, we can check whether the model is computing predictions correctly or whether the data is being generated incorrectly by testing well-defined inputs and outputs.
From a quality perspective, we can look at changes in the correlation between proxy and business metrics, or at shifts in the distributions of features or model predictions. This quality debugging work is much harder because we don’t know in advance which changes are significant or how to easily find their root causes. We need to anticipate where ML projects will encounter these challenges when designing an ML system.
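As one illustration of the quality-side check, the sketch below compares today’s prediction distribution against a healthy reference window using a two-sample Kolmogorov-Smirnov test; the distributions, window sizes and threshold are all placeholder choices:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_preds = rng.beta(2, 50, size=10_000)  # predictions from a healthy week
current_preds = rng.beta(2, 30, size=10_000)    # predictions from today

# A significant shift in the prediction distribution is a debugging lead,
# though it does not by itself reveal the root cause.
stat, p_value = ks_2samp(reference_preds, current_preds)
if p_value < 0.01:
    print(f"prediction distribution shifted (KS statistic={stat:.3f})")
```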
Get One Byte Flowing Through the Entire ML System
A key goal companies should embrace with every ML project is to have a prototype of the complete ML + business metrics life cycle in place very early in development, so it’s possible to run one byte through the entire system. This best practice encourages organizations to adopt an ML-centric approach as projects get underway and keeps the infrastructure and quality views of ML in harmony.
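In practice, this can start as an end-to-end smoke test that pushes a single synthetic record through stand-ins for every stage; a minimal sketch:

```python
# Smoke test: run one synthetic record through the full life cycle.
# Every stage is a hypothetical stub standing in for a real component.

def test_one_byte_end_to_end():
    raw = [{"ad_id": 10, "position": 1, "clicked": 1}]                       # source
    examples = [((e["ad_id"], e["position"]), e["clicked"]) for e in raw]    # process
    model = {"mean_ctr": sum(y for _, y in examples) / len(examples)}        # train
    prediction = model["mean_ctr"]                                           # serve
    assert 0.0 <= prediction <= 1.0                                          # monitor
    print("one byte flowed through the entire system")

test_one_byte_end_to_end()
```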
Taking a more holistic approach to ML that integrates infrastructure and quality from the beginning enables organizations to experiment and iterate more quickly, which shortens the time needed to develop, deploy and improve models.