Stop Talking about Responsible AI and Put It into Practice
Over the past few years, many organizations have been racing to build, launch, and scale their artificial intelligence (AI) and machine learning (ML) models. However, the lack of guardrails and governance means that black-box ML models are released into the real world, where they have significant and often unintended impacts on the business and the public. This is where the concept and practice of Responsible AI come into play.
Responsible AI (RAI) combines methodology and tools that must be in place for society to adopt AI and enjoy its game-changing benefits while minimizing unintended consequences. Over the last year, Responsible AI has been discussed at length by the AI community, with many large organizations — such as Microsoft and Google — creating their own RAI guidelines and governments proposing legislation. While it’s important to have this conversation, what tends to get lost is putting these concepts into practice.
By prioritizing Responsible AI, a variety of ML issues, including model performance degradation, prediction drift, and data drift, can be avoided or fixed before they impact the business or users. Implementing RAI concepts can even help prevent unintentionally biased models from discriminating against users (e.g., Steve Wozniak and his wife received significantly different credit limits, even though they share the same assets).
It’s precisely these issues that bring us back to the question: How do you implement RAI?
Let’s take a deep dive into four quick wins that will help you put Responsible AI into practice.
Model Visibility Is Key
A single dashboard, a centralized place where everyone in the organization can see and track the health of a model, analyze it, and explore it from both a functional and a business perspective, is at the core of implementing Responsible AI.
Everyone in the organization needs to assume ownership of RAI. That makes it crucial to have a single place where stakeholders and ML teams alike can see which models are running in production: what predictions the models are making, what decisions they are making for their users, and how well each model is actually performing compared to its training.
Once your model’s visibility mechanism is in place and you can see and investigate its true (rather than expected) performance, you are ready to avoid many of the pitfalls that your model and business can fall into. Most notable are models that may be operating with unintended bias: for example, are 25-35-year-olds in Alabama receiving the same credit score as their counterparts in California, and is that the intended outcome? Fairness is one of the main drivers of the success of AI, so having the right visibility system in place must be a top priority, both to avoid model degradation and to keep unintended biases in check.
Implementing Visibility in Your Team:
1. Store inference data from production in your data lake, including model inputs and outputs.
2. Track all models in production, including model artifacts, training data and metrics from different data slices.
3. Create a centralized overview of all models and their status — and ensure they are accessible and understood by all team members and stakeholders.
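As a minimal sketch of step 1, the snippet below logs each prediction’s inputs and output as a JSON record in a date-partitioned directory standing in for a data lake. The path, model name, and record schema are all hypothetical; a real system would write to object storage or a warehouse table instead.

```python
import json
import time
import uuid
from pathlib import Path

# Hypothetical data-lake location; in practice this would be object
# storage such as S3, or a table in your warehouse.
LAKE_DIR = Path("data_lake/inference_logs")

def log_inference(model_name: str, model_version: str,
                  features: dict, prediction) -> Path:
    """Append one inference record (inputs and output) to the data lake."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model_name,
        "version": model_version,
        "features": features,
        "prediction": prediction,
    }
    # Partition by day so downstream jobs can scan recent slices cheaply.
    day_dir = LAKE_DIR / time.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    out_path = day_dir / f"{record['event_id']}.json"
    out_path.write_text(json.dumps(record))
    return out_path

# Log a single (hypothetical) credit-scoring prediction.
saved = log_inference("credit_model", "1.3.0", {"age": 29, "state": "AL"}, 710)
```

Storing the raw features next to the prediction is what later makes slice-level analysis (step 3) possible without re-running the model.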
Be Proactive with ML Events
Being prepared to act the moment an ML event occurs is incredibly important.
With endless unique edge cases in AI, and given its black-box nature, it’s impossible to precisely define all unexpected behavior in advance. Therefore, organizations must define what they do know about their model and set alerts for any predictions outside of its known norm.
Once deployed into production, your ML model is highly affected by the data it’s being fed. From your data pipeline to the interface with the real world through web applications, mobile apps, APIs, etc., there’s a large and complex array of moving parts that impact it. So, when a global event like the COVID-19 pandemic strikes, it changes the way people behave; the data becomes skewed, which in turn affects the model’s predictions and, ultimately, its performance.
It makes sense to have an alert system in place so that when game-changing events alter the ML lifecycle, you’re the first to know. At its core, every data scientist wants to know that their model, once in production, is performing according to expectations. If it isn’t, the business may lose credibility, which could result in financial loss, negating the purpose of the model in the first place. An alert system tells you about any performance degradation before it impacts your customers, giving you the precious time needed to respond.
As another example, just imagine that one of your customers begins tweeting about discriminatory drift in your model’s performance before you even know about it. A discriminatory model not only raises serious social justice and fairness concerns but is also misaligned with the business goals. Having an automated ML monitoring system in place that constantly tracks the data, the predictions, and the business and data science KPIs ensures that you always know how your ML model is impacting your business. It is imperative that this system integrates seamlessly into the organization’s main communication channels (e.g., Slack, Teams, email) so that team members can respond to an incident efficiently without disrupting their workflow.
Implementing Proactive Alerts in Your Team:
1. Define the expected behavior of your model: expected prediction distribution, input data distributions, data stats, data science KPIs and business KPIs.
2. Set alerts for any deviation out of the defined standards.
3. Integrate your organization’s main communications channel to make alerts accessible to all users and encourage discussion.
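Steps 1 and 2 can be sketched with a deliberately simple drift check: compare the live prediction mean against a stored baseline and alert when it shifts by more than a tolerance. Real monitoring systems use richer statistics (e.g., PSI or Kolmogorov-Smirnov tests), and the webhook call here is only a print placeholder for a Slack/Teams integration.

```python
import statistics

def check_drift(baseline: list, live: list, threshold: float = 0.2) -> bool:
    """Flag drift when the live mean shifts by more than `threshold`
    baseline standard deviations (a deliberately simple heuristic)."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(live) - base_mean) / base_std
    return shift > threshold

def send_alert(channel: str, message: str) -> None:
    # In production you would POST to your team's Slack/Teams webhook;
    # here we just print so the sketch stays self-contained.
    print(f"[ALERT -> {channel}] {message}")

# Hypothetical score distributions: training baseline vs. recent production.
baseline_scores = [0.42, 0.45, 0.44, 0.47, 0.43, 0.46]
live_scores = [0.61, 0.66, 0.58, 0.64, 0.63, 0.60]

if check_drift(baseline_scores, live_scores):
    send_alert("#ml-alerts", "Prediction distribution drifted beyond tolerance")
```

The same pattern applies per input feature and per business KPI; each defined expectation from step 1 becomes one such check.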
Production Performance Preview
By this step, you have set up a centralized overview of your different models and can see their predictions; you have clear visibility into their performance and an automated alert system in case issues arise. The next piece of the Responsible AI puzzle is setting up a regular cadence of meetings with the relevant stakeholders to review the model’s results and performance.
You should aim for a weekly or bi-weekly meeting with:
- Product managers
- Internal ML clients, if any
- Data scientists
- Engineering teams
These meetings will raise questions about goals, performance and other key metrics, and they will also help define ownership for action items derived from the meeting. To ensure progress is being made, the next meeting should start by reviewing the action items from the previous one, making sure that all the boxes are checked and that the tasks are fulfilled. Over time, you will see clear and consistent improvements.
Implementing Production Performance:
1. Set a weekly or bi-weekly meeting with relevant stakeholders to evaluate the production performance of the model.
2. Write an agenda for that meeting so it will be effective. Make sure to do the following:
1. Review action items from the previous meeting.
2. Review actual performance metrics vs. expected KPIs.
3. Drill down into unexpected performance gaps and derive action items — what should be done, who owns the task and what’s the timeline to complete the task.
3. List open questions and record the agreed action items and owners for the next review.
Prepare an Incident-Response Workflow
Finally, it’s important to define a workflow for handling different events: drifts, performance degradation and potential bias. When an incident arises, chaos tends to take control, and the stress can delay or confuse the response. Who’s responsible? Is it the data scientist? The data engineer or the software engineer? What do we tell the customers? Defining clear ownership and roles for incident response ensures everyone knows what to do and avoids preventable, catastrophic failure.
Before we dive into the details, let’s clear up the difference between response and remediation. Both are equally important steps in reacting to an incident and fixing your troubled model. Response is the immediate action taken once an issue has surfaced: the measures you take to put out the fire. It may include short-term fixes and patches that maintain business continuity, allow the team to investigate the root cause more thoroughly, and ultimately decide the best path for remediation. Remediation is the long-term solution implemented once the investigation has concluded, ensuring that the same issue doesn’t start any new fires.
Now that everyone knows their roles, it’s useful to create a decision tree so that everyone understands how different events should be handled. The predefined decision tree should determine the correct response workflow such that if an alert goes off at two in the morning, everyone from the on-call engineer to the data scientist knows exactly what to do based on the characteristics of the incident.
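In its simplest form, such a decision tree can be a lookup table mapping incident characteristics to a first responder and an immediate action. The incident types, severities, and roles below are illustrative assumptions, not a prescribed taxonomy.

```python
# Hypothetical routing table: (incident type, severity) -> (owner, action).
PLAYBOOK = {
    ("data_drift", "low"): ("data_scientist", "investigate during business hours"),
    ("data_drift", "high"): ("on_call_engineer", "activate fallback model"),
    ("pipeline_failure", "high"): ("data_engineer", "rerun pipeline, page owner"),
    ("bias_alert", "high"): ("ml_lead", "route predictions to human review"),
}

def route_incident(kind: str, severity: str):
    """Return (owner, action) for an incident; unknown cases default
    to the on-call engineer so no alert ever goes unowned."""
    return PLAYBOOK.get((kind, severity), ("on_call_engineer", "triage and escalate"))

owner, action = route_incident("data_drift", "high")
```

The important design choice is the default branch: any incident the playbook does not anticipate still lands with a named owner instead of falling through silently.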
This leads to the next step: setting up fallback mechanisms. This could mean reverting to a previous model version that was operating as intended, taking the human-in-the-loop (HITL) route and having a human review predictions in the meantime, or activating the non-ML algorithm or heuristic that was used before the ML model was deployed. With these fallback mechanisms in place, you should be ready for any scenario without needing to consider halting production.
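One way to sketch the heuristic fallback, assuming a hypothetical pre-ML scoring rule kept around from before the model was deployed (the model interface and threshold values here are invented for illustration):

```python
def heuristic_score(features: dict) -> float:
    """Hypothetical pre-ML scoring rule, retained as a last-resort fallback."""
    return 0.5 if features.get("income", 0) > 40_000 else 0.2

def predict_with_fallback(model, features: dict) -> float:
    """Try the current model; on failure, degrade gracefully to the heuristic."""
    try:
        return model.predict(features)
    except Exception:
        # In production you would also log the failure and page the
        # on-call engineer before returning the fallback score.
        return heuristic_score(features)

class BrokenModel:
    """Stand-in for a model whose artifact fails to load."""
    def predict(self, features):
        raise RuntimeError("model artifact failed to load")

# The broken model raises, so the heuristic answers instead.
score = predict_with_fallback(BrokenModel(), {"income": 55_000})
```

The same wrapper shape works for the other fallbacks mentioned above: swap the heuristic for a previous model version or a queue feeding human reviewers.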
Lastly, after the incident has been detected and handled, the investigation takes priority. You want to summarize the incident, learn from it and explain it to the relevant stakeholders. This is how organizations improve their processes and practices before the next incident.
Implementing Incident-Response Workflow:
- Define a workflow for handling different ML events — what is the question flow and decision tree that the first responder should follow? Who responds first and how? What questions come up during and after the event?
- Be objective when defining urgency.
- Decide which fallback mechanisms should be activated in each scenario.
- Summarize the incident and learn how to improve your response for the next incident.
As machine learning models gain more traction for business applications, and governments and regulatory agencies call for more responsibility and regulations, it’s crucial to implement Responsible AI practices into your ML workflow. Implementing these four quick wins will enable you to achieve your AI goals and ensure that your organization is adopting AI efficiently and responsibly.