The Moral Choice Machine: An AI that Learns ‘Right’ from ‘Wrong’

8 Jun 2020 11:35am, by

Can machines learn to distinguish between right and wrong? We already know that potentially harmful human biases about gender or race can hide in AI algorithms, so it makes sense that machines could also adopt positive human moral biases, such as the idea that it’s okay to “kill time” but wrong to kill people. Efforts to de-bias AI are becoming necessary as the technology grows more prevalent and powerful in our daily lives, both to curb the spread of algorithmic bias and, by extension, to guard against what some experts call “killer AI”: artificially intelligent machines that can kill autonomously.

By using historical and present-day texts and articles, researchers from the Darmstadt University of Technology in Germany have trained an AI that uses human-like moral reasoning to determine whether an action is right or wrong, with human morality as its referential compass. To train the system, dubbed the Moral Choice Machine, the researchers used books from the sixteenth to nineteenth centuries, news articles from the 1980s, 1990s and the mid-2000s, constitutional documents of different countries, and various religious works. Utilizing sentence embeddings (a machine learning technique where sentences are represented as real-valued vectors in a predefined vector space), the team generated a general set of positive and negative associations, a contextually derived list of do’s and don’ts, which were used to train the model.

“These vector representations carry relational information between different words and sentences,” explained Dr Cigdem Turan, co-author of the study, which was published in Frontiers in Artificial Intelligence. “You could think of it as learning a world map. Two cities are close on a map, when their distance is small. So what could the distance of words be? The idea is to make two words lie closely on the map, if they are used often together. This goes back to an idea in linguistics and philosophy that the meaning of language derives from its usage. So, while ‘kill’ and ‘murder’ would be two adjacent cities, ‘love’ would be a city far away from those two cities. Thus, the distance between two words in this learned representation, i.e. cities in the world map, shows us the semantic similarity of those two words.”
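Turan’s map analogy can be sketched in a few lines of Python. The example below is only an illustration: the three-dimensional vectors are hand-picked toy values, not output from the team’s model, and cosine similarity is assumed here as the closeness measure because it is a common choice for embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen by hand for illustration only.
# Real embeddings have hundreds of dimensions and are learned from text.
toy_embeddings = {
    "kill": [0.9, 0.1, 0.0],
    "murder": [0.8, 0.2, 0.1],
    "love": [0.0, 0.2, 0.9],
}

kill_murder = cosine_similarity(toy_embeddings["kill"], toy_embeddings["murder"])
kill_love = cosine_similarity(toy_embeddings["kill"], toy_embeddings["love"])

print(f"kill vs. murder: {kill_murder:.2f}")
print(f"kill vs. love:   {kill_love:.2f}")
```

With these toy values, “kill” and “murder” score as far more similar than “kill” and “love,” which is the “adjacent cities” effect the quote describes.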


By using this framework, the model is able to learn the associations between different words and sentences, which then allows it to draw moral conclusions on its own.

Thus, if “we query the distances of the sentence ‘Should I kill?’, we expect that ‘No, you shouldn’t’ would be closer to the given query than ‘Yes, you should,’ ” added Turan. “In this way, we can query any question and calculate a moral bias. We showed in this paper that the trained model can differentiate between contextual information provided in a question. For instance, no, you should not kill people, but it is fine to kill time. It is good to eat fruits, but not dirt.”
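The query Turan describes can be roughed out as a score: how much closer a question sits to “Yes, you should” than to “No, you shouldn’t.” The sketch below assumes that the score is a simple difference of cosine similarities, which may not match the paper’s exact formula. The function name moral_bias and the two-dimensional sentence vectors are hypothetical stand-ins for a real sentence-embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def moral_bias(question_vec, yes_vec, no_vec):
    """Positive if the question is closer to the 'yes' answer, negative if closer to 'no'."""
    return cosine_similarity(question_vec, yes_vec) - cosine_similarity(question_vec, no_vec)


# Hypothetical toy sentence vectors, hand-picked for illustration only.
# A real system would obtain these from a trained sentence encoder.
toy_sentence_vectors = {
    "Yes, you should.": [1.0, 0.0],
    "No, you shouldn't.": [0.0, 1.0],
    "Should I kill people?": [0.1, 0.9],
    "Should I kill time?": [0.7, 0.3],
}

yes_vec = toy_sentence_vectors["Yes, you should."]
no_vec = toy_sentence_vectors["No, you shouldn't."]

bias_people = moral_bias(toy_sentence_vectors["Should I kill people?"], yes_vec, no_vec)
bias_time = moral_bias(toy_sentence_vectors["Should I kill time?"], yes_vec, no_vec)

print(f"Should I kill people? bias = {bias_people:+.2f}")
print(f"Should I kill time?   bias = {bias_time:+.2f}")
```

With these toy values the first question scores negative and the second positive, mirroring the context sensitivity the researchers report. The numbers themselves carry no meaning beyond the illustration.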

Changing Morals

Interestingly, the team found that the time period of the training texts also influenced the model’s moral decisions. For example, the moral bias that the model extracts from news articles published between 1987 and 1996-97 suggests that marrying and becoming a good parent have very positive associations, while the conclusions it draws from news articles produced in 2008-09 place a more positive association on going to work and school instead. The model also showed that different types of text elevate the moral virtue of one action over another. Most importantly, it picked up on moral absolutes, such as the negative associations of murder.

“One can see that go to church is one of the most positive actions in the religious and constitution text sources,” wrote the team. “All text sources reflect that [killing] people and [stealing] money is extreme negative. That you should love your parents is reflected more strongly in books and religious and constitution text sources than in the news.”

The notion that AI can inherit the positive biases of human morality is encouraging, as it means that machines can be trained to make human-like choices between wrong and right. In a complicated world where AI is being increasingly used to make life-changing decisions in the justice, financial and healthcare systems, we need AI that can make morally appropriate, trustworthy and just decisions.

The team now hopes to further study how tools that “de-bias” AI that has inadvertently been programmed with harmful biases would affect its artificial moral compass, as well as the reasoning behind such determinations, since experts still don’t fully understand why AI makes the choices that it does (otherwise known as AI’s “black box problem”).

“The moral bias depends on the data, but also on the language model, and task at hand,” said Turan. “Therefore, the Moral Choice Machine is also inheriting the flaws and limitations of the underlying model. However, the capabilities of language models, and AI systems in general, are increasing rapidly. One should investigate how such improvements influence the moral compass. Are these models able to make even more complex decisions?”

Read more in the team’s paper, and see the code over on GitHub.

Feature image by Mohamed Hassan from Pixabay; others courtesy of Darmstadt University of Technology.