Machine Learning

MIT Algorithm Sniffs Out Sites Dedicated to ‘Fake News’

15 Nov 2018 3:00pm, by

The rise of the Internet has empowered ordinary people to report newsworthy events as they happen, spreading information through blogs or social media. The ongoing democratization of information means people can easily access alternatives to large media outlets, but the flip side is that it’s now also possible to generate and disseminate all kinds of disinformation — from digitally altered images or videos to websites or self-serving politicians promulgating wild conspiracy theories. Basically, it’s getting harder and harder to tell what’s true and what’s not.

Thankfully, there are a number of fact-checking websites that serve as a vital counterpoint to this growing problem of “fake news.” But like an uncontrollable wildfire, it seems that false information can spread much more quickly and has the potential to cause much more damage in real life, even if it is eventually debunked (case in point is the bizarre #Pizzagate conspiracy theory).

But what if there were a way to automatically detect fake news at its source, rather than painstakingly picking out individual untruths to remedy? That was the goal of a team of researchers from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and the Qatar Computing Research Institute (QCRI), who recently developed a new machine learning algorithm that is capable of determining whether an entire news site, and not just an individual article, is routinely presenting accurate or false information. The idea is to create a tool that can automatically identify fake news at its root, before it spreads virally.

“If a website has published fake news before, there’s a good chance they’ll do it again,” explained Ramy Baly, the study’s lead author, in a press release. “By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.”

Analyzing Whole Sites, Not Just One Article

According to the team, the algorithm focuses on evaluating a source’s overall reliability, whereas other AI models tackle other aspects of veracity, such as checking individual facts or identifying trolls who manipulate opinions surrounding an issue. To create this new system, the team used data gleaned from the website Media Bias/Fact Check (MBFC), which uses human researchers to manually assess the accuracy of a variety of news sources, from big media outlets to smaller operations.

The team ran MBFC’s data through their machine learning algorithm, and trained it to classify over 1,000 news sites according to the textual, syntactic and semantic analysis of site content, focusing on features like structure, sentiment, engagement, topic, complexity, bias, and morality. According to the team, the new system only needs to analyze about 150 articles from a particular site in order to reliably evaluate whether a news source is trustworthy or not.
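To give a sense of how article-level measurements could become a single site-level profile, here is a minimal Python sketch. It assumes each article has already been reduced to a dictionary of numeric scores and simply averages them across a site's articles. The function name `site_profile`, the feature names and the numbers are hypothetical illustrations, not the team's actual pipeline.

```python
from statistics import fmean


def site_profile(article_features):
    """Average per-article feature dicts into one site-level dict.

    Assumption: every article is a dict mapping a feature name to a number.
    A feature missing from an article is treated as 0.0.
    """
    if not article_features:
        raise ValueError("need at least one article")
    # Collect every feature name seen in any article.
    keys = sorted({key for features in article_features for key in features})
    return {
        key: fmean(features.get(key, 0.0) for features in article_features)
        for key in keys
    }


# Invented scores for two articles from one hypothetical site.
articles = [
    {"sentiment": 0.2, "complexity": 10.0},
    {"sentiment": 0.4, "complexity": 14.0},
]
profile = site_profile(articles)
```

A profile like this, built from roughly 150 articles per site, is the kind of input a classifier could be trained on against the human-assigned MBFC labels.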

For instance, more reliable sites will often use neutral language in their reporting, whereas fake news sites will use language that is more provocative and exaggerated — words like “crisis actor,” “hoax,” and “demonic.” Even the Wikipedia pages, Twitter feeds and URLs of sites were analyzed: longer Wikipedia entries about a news site translated to more credibility, while URLs containing special characters or weird subdirectories were considered less reputable.
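The cues described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code: the word list contains only the three examples quoted in this article, and the function names and the definition of a "special character" (anything in the hostname other than a letter, digit or dot) are hypothetical choices.

```python
import re
from urllib.parse import urlparse

# Hypothetical word list: only the examples quoted in the article.
LOADED_TERMS = ("crisis actor", "hoax", "demonic")


def url_features(url):
    """Count simple surface cues in a site URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Non-empty path segments stand in for the "subdirectories" cue.
    segments = [segment for segment in parsed.path.split("/") if segment]
    # Assumption: anything but letters, digits and dots counts as special.
    special = len(re.findall(r"[^a-z0-9.]", host))
    return {"special_chars": special, "path_depth": len(segments)}


def loaded_term_rate(text):
    """Return occurrences of loaded terms divided by the word count."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in LOADED_TERMS
    )
    return hits / len(words)
```

Numbers like these would be only a few of the many inputs to a classifier. On their own they say little about whether any single site is trustworthy.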

The team notes that about 65 percent of the time, the algorithm was able to accurately determine whether a certain news source has a low, medium or high level of factuality. In addition, about 70 percent of the time, the machine learning model was able to correctly categorize news sites as sitting on the left, center or right of the political spectrum. While it’s not necessarily as accurate as a human performing the same task, one can imagine that when the algorithm is deployed in conjunction with human fact-checkers, the monumental chore of exposing false news becomes more automated, and therefore, much easier.
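To put those percentages in context: accuracy here means the share of sites whose predicted label matches the human-assigned one, and guessing uniformly at random among three labels would be right about 33 percent of the time. The short Python sketch below computes accuracy and a simple majority-class baseline. The labels are invented toy data, not the team's results, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(actual):
    """Accuracy of always predicting the most common label."""
    if not actual:
        raise ValueError("need at least one label")
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Invented toy labels for six hypothetical sites.
actual = ["high", "high", "low", "medium", "low", "high"]
predicted = ["high", "medium", "low", "medium", "high", "high"]

model_accuracy = accuracy(predicted, actual)
baseline_accuracy = majority_baseline(actual)
```

Comparing a model against a baseline like this is a common way to judge whether a three-way accuracy figure reflects real signal.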

Though the team is now working on refining the algorithm further to include news sites in other languages and other kinds of biases beyond the typical left-versus-right framework, their research has nevertheless resulted in building what is currently the world’s largest database containing the factuality and bias ratings of over 1,000 news sources. In the future, the team is aiming to potentially develop an app that would serve up factual news items from different parts of the political spectrum for users to read. The hope is that this would expand views on a particular issue beyond one’s familiar and closely held biases, and create a possible antidote to increasing political polarization in societies around the world.

To find out more, read the team’s paper and database.

Images: rawpixel on Unsplash; MIT/QCRI