The Hidden Biases that Can Lurk in Artificial Intelligence Systems
Artificial intelligence and “automated decision-making” systems are being implemented in many crucial real-world settings, cutting humans out of the decisions that these systems are making. But how sure are we that the decisions they arrive at are fundamentally correct?
Researchers are finding a startling number of cases of human bias that have crept into AI models and decision-making algorithms, which in turn can unwittingly lead to discriminatory practices on the part of companies and governments deploying these systems.
“I think we are kind of rushing into the world of tomorrow with big-data risk assessment, without properly vetting, studying and ensuring that we minimize a lot of these potential biases in the data,” Ezekiel Edwards, director of the ACLU’s Criminal Law Reform Project, told the New York Times last year.
This week at a conference at the Massachusetts Institute of Technology, the American Civil Liberties Union and data scientists announced a new ongoing effort to highlight examples of algorithmic bias, working with the “AI Now” initiative, a team led by Kate Crawford, a researcher at Microsoft, and Meredith Whittaker, a researcher at Google. September saw the launch of the Partnership on AI to Benefit People and Society, a nonprofit backed by Amazon, Google, Facebook, IBM, Microsoft, and Apple, “to advance public understanding … and formulate best practices on the challenges and opportunities within the field.”
In November MIT’s Technology Review reported that “computers are inheriting gender bias implanted in language data sets.”
Massive training datasets that are widely used for everything from chat-bots to image-captioning systems “considered the word ‘programmer’ closer to the word ‘man’ than ‘woman,’ and that the most similar word for ‘woman’ is ‘homemaker’,” they report, citing research conducted by James Zou of Microsoft Research New England. Speaking to the results, Harvard professor Barbara Grosz warned about a society that’s “trying to change the future to be not like the past…there’s an ethical question about whether we’re inhibiting the very evolution that we want.
“It’s not that you can avoid all these kinds of bias, but we need to be mindful in our design, and we need to be mindful about what we claim about our programs and their results.”
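To make the idea of one word sitting “closer” to another concrete: systems of this kind typically represent each word as a list of numbers (a vector) and compare words by the angle between their vectors. The sketch below is only an illustration of that comparison. The three-number vectors are invented by hand for this example and are not taken from the research cited above, and `gender_lean` is a hypothetical helper name.

```python
import math

# Hypothetical toy vectors, written by hand for illustration only.
# Real word embeddings have hundreds of dimensions and are learned from text.
TOY_VECTORS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "programmer": [0.8, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def gender_lean(word, vectors=TOY_VECTORS):
    """Positive if `word` is closer to 'man', negative if closer to 'woman'."""
    v = vectors[word]
    return (cosine_similarity(v, vectors["man"])
            - cosine_similarity(v, vectors["woman"]))


if __name__ == "__main__":
    print(gender_lean("programmer"))
```

With these made-up numbers the score for “programmer” comes out positive, meaning the toy vector leans toward “man.” An audit of a real model would run the same comparison against the vectors the model actually learned.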
There have also been allegations of another kind of bias in the risk-assessment algorithms that are widely used in the criminal justice system. “The questions on risk assessments never ask directly about race or income,” notes a 2015 report on Nate Silver’s FiveThirtyEight site. “But the answers can end up being proxies for race and class anyway.” They offer an interactive tool that lets readers play with the data themselves. “For example, if you’re a man without a high school diploma, you’re more likely to be poor and black or Hispanic. Same if you’re single and don’t have a job.”
And for years Columbia law professor Bernard E. Harcourt has been warning that a popular risk-assessment question — past convictions — has the same effect. “Prior criminality has become a proxy for race,” he wrote in a 2010 paper.
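One way to see what “proxy” means here is to check how much a seemingly neutral answer reveals about group membership. The following is a minimal sketch with entirely made-up records: the groups “A” and “B”, the yes/no answers, and the function name `group_share_by_answer` are all hypothetical and do not come from any real questionnaire or dataset.

```python
from collections import defaultdict

# Entirely made-up records for illustration; not real survey data.
TOY_RECORDS = (
    [{"group": "A", "answer": "yes"}] * 3
    + [{"group": "A", "answer": "no"}] * 1
    + [{"group": "B", "answer": "yes"}] * 1
    + [{"group": "B", "answer": "no"}] * 3
)


def group_share_by_answer(records, answer_key="answer", group_key="group"):
    """For each answer, return the share of respondents from each group."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        counts[record[answer_key]][record[group_key]] += 1
    shares = {}
    for answer, by_group in counts.items():
        total = sum(by_group.values())
        shares[answer] = {g: n / total for g, n in by_group.items()}
    return shares


if __name__ == "__main__":
    for answer, by_group in group_share_by_answer(TOY_RECORDS).items():
        print(answer, by_group)
```

In this toy data, three out of four people who answer “yes” belong to group A, so a formula that never sees the group label can still sort people largely by group just by reading the answer.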
By 2014, U.S. Attorney General Eric Holder had his own related concerns about the impact of algorithms on the sentences for minorities and the socially disadvantaged — this time citing yet another kind of data that might skew results. Algorithms using education levels and employment history could benefit “those on the white collar side who may have advanced degrees and who may have done greater societal harm — if you pull back a little bit — than somebody who has not completed a master’s degree, doesn’t have a law degree, is not a doctor…
“I’m really concerned that this could lead us back to a place we don’t want to go.”
Even by 2014, Time was reporting that “Virtually every state has used such risk assessments to varying degrees over the past decade, and many have made them mandatory for sentencing and corrections as a way to reduce soaring prison populations, cut recidivism and save money.”
In 2016, ProPublica — a Pulitzer Prize-winning nonprofit site for investigative journalism — checked the accuracy of “risk assessment” algorithms being used for sentencing in Florida’s criminal justice system, cross-checking each of its predictions against the actual outcome for 7,000 convicts over a period of two years in Broward County. It found a startling bias against black criminals, compared to white ones.
“The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.”
Even when they factored out past crimes and convictions, “Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.”
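The comparison of “falsely flagged” and “mislabeled as low risk” rates can be sketched as a per-group error tally. This is a rough illustration under stated assumptions, not ProPublica’s actual method or data: the groups “X” and “Y”, the records, and the function name `error_rates` are all invented for the example.

```python
# Made-up outcomes for illustration: (group, flagged_high_risk, reoffended).
TOY_OUTCOMES = (
    [("X", True, False)] * 2 + [("X", False, False)] * 2
    + [("X", True, True)] * 3 + [("X", False, True)] * 1
    + [("Y", True, False)] * 1 + [("Y", False, False)] * 3
    + [("Y", True, True)] * 2 + [("Y", False, True)] * 2
)


def error_rates(records):
    """Return false positive and false negative rates for each group.

    False positive: flagged high risk but did not reoffend.
    False negative: not flagged but did reoffend.
    """
    tallies = {}
    for group, flagged, reoffended in records:
        t = tallies.setdefault(group, {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
        if reoffended:
            t["pos"] += 1
            if not flagged:
                t["fn"] += 1
        else:
            t["neg"] += 1
            if flagged:
                t["fp"] += 1
    return {
        group: {
            "false_positive_rate": t["fp"] / t["neg"] if t["neg"] else None,
            "false_negative_rate": t["fn"] / t["pos"] if t["pos"] else None,
        }
        for group, t in tallies.items()
    }


if __name__ == "__main__":
    for group, rates in error_rates(TOY_OUTCOMES).items():
        print(group, rates)
```

In the toy data, group X is wrongly flagged twice as often as group Y, while group Y is wrongly cleared twice as often as group X. That is the shape of disparity the quoted passage describes, even though the numbers here are invented.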
ProPublica concludes that’s the biggest issue with risk-assessment algorithms: their effectiveness just hasn’t been studied. One 2013 study examined 19 different risk-assessment methodologies used in the U.S. and concluded “in most cases, validity had only been examined in one or two studies” and that “frequently, those investigations were completed by the same people who developed the instrument.”
For its analysis, ProPublica used one of the most widely used risk assessment algorithms, from a for-profit company named Northpointe, which “disputes our analysis.”
ProPublica is also quick to point out that it’s closed-source software, “so it is not possible for either defendants or the public to see what might be driving the disparity.” Northpointe acknowledged to the site that they were using educational history and whether the defendant was currently employed as part of their formula, but insisted that their specific calculations were proprietary.
It’s not the first time that closed-source algorithms have come under fire.
Sometimes the allegations of bias aren’t about gender or race. Sometimes the biases can just be weirdly geographical.
In June the ACLU reported on a case involving 4,000 Medicaid recipients in Idaho who had suddenly found their benefits being cut 20 to 30 percent based on their answers to an assessment questionnaire. When they asked why, they were told, “it’s a trade secret.”
Richard Eppink, the ACLU’s legal director in Idaho, summarized their legal team’s response: “You can’t just be coming up with these numbers using a secret formula.” A judge agreed — citing a basic issue of due process (as well as the specifics of the Medicaid Act, which requires explanations). “That was five years ago…”
The ACLU eventually got the algorithm, and their analysis concluded that it was based on a dataset that had discarded two-thirds of the records because of “data entry errors and data that didn’t make sense.” Their analysis concluded that the final formula was statistically flawed, too. But it also appeared to be disproportionately (and inexplicably) impacting different parts of the state.
“Last year the court held that the formula itself was so bad that it was unconstitutional — violated due process — because it was effectively producing arbitrary results for a large number of people.”
The ACLU’s blog sees this as a teaching moment. “As our technological train hurtles down the tracks, we need policymakers at the federal, state, and local level who have a good understanding of the pitfalls involved in using computers to make decisions that affect people’s lives.”
It’s a concern that’s echoed by Cathy O’Neil, the New York data scientist who authored the best-selling book “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.” Last week she told MIT’s Technology Review that algorithms “replace human processes, but they’re not held to the same standards. People trust them too much.”
The ACLU’s Eppink also notes “this bias we all have for computerized results — we don’t question them…. My hunch is that this kind of thing is happening a lot across the United States and across the world as people move to these computerized systems. Nobody understands them, they think that somebody else does—but in the end, we trust them.”
And he also worries about how motivated private businesses will be to test the accuracy of their systems — “unless the cost is internalized on them through litigation.”
Kristian Hammond, a computer science professor at Northwestern University, argues in a paper that most systems have biases, which can creep in from multiple sources, most of which go undetected. These sources can include the training data, or even the unconscious bias of human “mentors” for AI systems. But his article ends on a positive note.
“By understanding the biases themselves and the source of the problems, we can actively design systems to avoid them.”
Feature image by Andrew Worley via Unsplash.