Stages of a Human-in-the-Loop Machine Learning Application
In previous posts we introduced Human-in-the-Loop (HITL) as a concept, explained how it can be applied in practice, and presented the metrics that the most successful HITL operations use to drive improvements. In this article, we focus on the different phases a routine work operation goes through on its journey from fully manual to completely automated, improving its efficiency and quality along the way.
Human-in-the-Loop applications generally imply a combination of computer- and human-based decision making. We broaden the scope of what HITL means by also including cases that are completely human-dependent, as long as they have the characteristics that make them suitable for eventual automation. Manual workflows that can be automated must be routine and deterministic. This is what makes a process reliably replicable, and thus safely automatable.
The lifecycle of any Human-in-the-Loop machine learning (ML) application will contain the following stages as it matures:
- Inception. A manual workflow gets created.
- Iteration. The workflow is improved iteratively.
- Transition. The workflow transitions from fully manual to semi-automated.
- Monitoring. Full automation shifts HITL into a validation mechanism.
Inception
In the Inception phase, we define the goal of the application and conceptualize a manual process to accomplish it. Then we implement it by establishing a team with the right tooling. This will ensure successful routine execution.
In order to be able to automate this process later on, we need to streamline it, by taking a “divide and conquer” approach. The more we can break down the process into smaller independent components, the more feasible it will be to train accurate machine learning models on their input and output data. Each one of these components will act as a step function within the process.
You should aim to visualize your process as a sequence of blocks, with data flowing out of a block to flow into the next block and so on. If you structure your manual workflows this way, you’re probably already halfway to successfully automating them.
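The block structure described above can be sketched in code. The step names and payload shapes below are hypothetical, purely to illustrate data flowing out of one block and into the next:

```python
from typing import Any, Callable

# Hypothetical workflow steps. Each step is an independent function that
# consumes the previous step's output, so any one of them can later be
# replaced by a trained model without touching the rest of the pipeline.
def extract_fields(document: dict) -> dict:
    # e.g. a human (or later, a model) pulls structured tokens out of raw input
    return {"fields": document.get("text", "").split()}

def classify(payload: dict) -> dict:
    # e.g. assign a category based on the extracted fields
    category = "invoice" if "invoice" in payload["fields"] else "other"
    return {**payload, "category": category}

def run_pipeline(document: dict, steps: list[Callable[[Any], Any]]) -> Any:
    result = document
    for step in steps:
        result = step(result)  # each block's output feeds the next block
    return result

output = run_pipeline({"text": "invoice 42 acme"}, [extract_fields, classify])
print(output["category"])  # invoice
```

Because each block only depends on its input payload, you can log every block's inputs and outputs independently, which is exactly the training data you will need later.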
This strategy will also help you acquire a more granular understanding of how your workflow is functioning so you can detect flaws and fix them, which takes us to the next stage.
Iteration
Prior to starting any automation work, we want to maximize the efficiency (time cost) and the quality (accuracy) of each human decision taken. Increasing efficiency will help you scale more cost-effectively later on. Improving quality will help you achieve better outcomes and obtain higher-quality data for the ML models you'll develop when you start automating in the next stage.
Before you can even know what to improve, you will need to define efficiency and quality Key Performance Indicators (KPIs) at each decision step of your HITL process. Once you have these in place, you will have a much better idea of where the most valuable opportunities to drive those KPIs are.
It is worth noting that quality and efficiency are often diametrically opposed. So, if you're aiming to move your quality metric, you should use efficiency as your counter-metric, and ensure you're gaining more than you're losing. Setting a guard-rail, or a budget for how much you're willing to trade off, is generally a good idea.
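One way to make that guard-rail concrete is a simple acceptance check for proposed workflow changes. The function name, budget, and numbers below are illustrative assumptions, not from the article:

```python
# Illustrative guard-rail: accept a quality-focused change only if the
# efficiency (time-cost) regression stays within a pre-agreed budget.
def within_guardrail(quality_gain: float, efficiency_loss: float,
                     max_efficiency_loss: float = 0.05) -> bool:
    """True if the change improves quality without blowing the
    efficiency trade-off budget (default: a 5% slowdown)."""
    return quality_gain > 0 and efficiency_loss <= max_efficiency_loss

# A change that raises accuracy by 3 points at a 2% slowdown passes:
print(within_guardrail(quality_gain=0.03, efficiency_loss=0.02))  # True
# The same quality gain at a 10% slowdown exceeds the budget:
print(within_guardrail(quality_gain=0.03, efficiency_loss=0.10))  # False
```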
When it comes to iterating on the components of your workflow, you will benefit from having a toolset that allows you to cheaply make changes to the structure of your workflow and the incoming and outgoing data of each step function in your routine workflow.
If you are utilizing the Human Lambdas platform to power your workflows, measuring and making the necessary changes to drive improvements becomes trivial. If you're using your own internal tooling, make sure the right engineering teams commit bandwidth to the initiative, and keep them in the loop so you can preserve iterative agility.
Transition
Once you have reached your desired human efficiency and quality levels, it is time to build the automation mechanisms. Those could be rule sets, machine learning models, or any other kind of AI-based system.
Coupling those automated mechanisms with the human component of your operation creates the actual Human-in-the-Loop machine learning system. As part of the transition, you might progressively adopt these schemes:
- Fully manual. The human receives the automated system’s decision alongside the input data, which can be used to start evaluating the model’s performance before it is allowed to act independently.
- Augmented manual. The human is augmented by the automation, acting as a gate-keeper for automated decisions, approving or declining — and correcting — those automated decisions.
- Semi-automated. A fraction of the automation’s decisions stop being supervised by a human. The rest remains under the Augmented manual model. This can be completely at random or weighted by the model’s predictive confidence or probability.
- Automated. The model reaches a desired accuracy level and humans stop being involved in the loop.
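The semi-automated scheme above can be sketched as a routing function. The threshold, sampling rate, and function name are hypothetical choices for illustration:

```python
import random

# Hypothetical routing for the semi-automated scheme: high-confidence
# decisions are auto-approved for a fraction of traffic, and everything
# else stays under the augmented manual model.
def route_decision(confidence, auto_threshold=0.9, auto_rate=0.8, rng=None):
    """Return "automated" or "human_review" for a single model decision."""
    rng = rng or random.Random()
    if confidence >= auto_threshold and rng.random() < auto_rate:
        return "automated"     # decision ships without human supervision
    return "human_review"      # a human approves, declines, or corrects it

# Low-confidence decisions always go to a human:
print(route_decision(0.40))  # human_review
```

Raising `auto_rate` from 0 toward 1 is one way to make the gradual handover from augmented manual to fully automated explicit and reversible.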
Monitoring
A golden rule of Human-in-the-Loop machine learning systems is to never fully trust what is automated. So, even if you reach the Automated phase, you should still leverage the human part of the system to verify that your automated processes still act as intended. Keeping these checks and balances in place will help you eliminate, or at least anticipate, potential risks down the line.
If cost is a concern, monitoring can be set up in a way that only requires the humans in the loop to approve or decline a sample of all the decisions taken. Ideally, you should budget enough human reviews to obtain precise statistical measures on your system’s accuracy. This is because sometimes ML systems degrade in subtle ways, and you want to make sure you catch these events as soon as possible.
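To put a number on "enough human reviews", the standard normal-approximation sample-size formula is a reasonable back-of-the-envelope tool. This sketch assumes you want to estimate the system's accuracy within a fixed margin of error; the figures are illustrative:

```python
import math

# How many automated decisions must humans review to estimate system
# accuracy within a given margin of error? Uses the normal-approximation
# sample-size formula n = z^2 * p * (1 - p) / e^2, with the conservative
# worst case p = 0.5 by default.
def reviews_needed(margin_of_error: float, z: float = 1.96, p: float = 0.5) -> int:
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Reviews needed to pin accuracy down to +/- 5% at 95% confidence:
print(reviews_needed(0.05))  # 385
```

If your review volume falls well below this kind of figure, your accuracy estimate will be too noisy to catch the subtle degradations mentioned above.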
Regarding which decisions to sample, you can select them entirely at random, or weight the sampling by the model’s confidence or predicted probability. We discussed this in more detail in another note.
Once you have operationalized this monitoring process, you should ensure those checks roll up into a precision or accuracy metric of some sort. Additionally, your Data Science and Engineering teams will certainly be interested in analyzing the erroneous decisions to find potential gaps in the automation system.
Feature image via Pixabay.