Our world is defined by complex networks — whether that’s through the interconnections of species in an ecosystem or those in an online social media network. Nowadays, analyzing such complex networks with AI models and algorithms forms the backbone of many useful tools like search engines and recommendation systems. But do these machine learning methods actually work as well as their designers intended? The answer seems to be no, according to one recent study which demonstrates that some of these widely used AI models cannot accurately capture the real-world complexities found in such networks.
In particular, the machine learning technique known as low-dimensional embeddings was the focus of the research paper, which was recently published in the Proceedings of the National Academy of Sciences. Embeddings and similar machine learning techniques are used to generate predictions by analyzing the relationships found in a diverse range of complex networks, from biology and epidemiology to social media.
“Low-dimensional embeddings are a popular machine learning technique that converts ‘nodes’ — or people in a social network — into a shortlist of numbers or ‘vectors’,” explained the paper’s lead author C. “Sesh” Seshadhri, an associate professor of computer science and engineering at the University of California, Santa Cruz.
“These numbers are then used to predict what a person might buy, what content to recommend, and who might be friends with whom. However, our paper argues that these methods do not work as well as people have thought,” he said.
Community Structures Lost
While newer and better embedding methods are constantly being developed, the researchers contend that these techniques all share the same deficiencies. Notably, they aren’t all that good at adequately representing some key facets of complex networks, such as sets of tightly connected nodes, known in network science parlance as “community structures.” In the real world, community structures might be found in groups of people that share common interests.
“We discovered that low-dimensional embeddings often lose the community structure of social networks, which is probably the most important feature,” said Seshadhri. “Meaning that in a social network, the people or nodes are organized into a large number of tightly-knit communities. This information is lost in many low-dimensional embedding methods. We mathematically prove that, for some classes of low-dimensional embeddings, such a loss is inevitable.”
Embedding techniques convert a person’s position in a social network into the coordinates of a point in a geometric space, producing a “low-dimensional” representation: a short list of numbers that can then be used as input for an algorithm. As the researchers show in their paper, however, important information is lost because this geometric approach disregards the larger social network once the conversion is done, making predictions based only on the relationships between these points in space. Using mathematical proofs and by testing different embedding techniques on various networks, the researchers demonstrate that important structural qualities of these networks are lost by such methods.
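To make the geometric idea concrete, here is a minimal sketch of how predictions are made from embeddings alone. The people, vectors, and scoring function are all hypothetical illustrations, not the paper’s actual method or data: once each person is reduced to a short vector, a link prediction typically comes down to a simple geometric comparison, such as a dot product, with the original network no longer consulted.

```python
import numpy as np

# Hypothetical 2-dimensional embeddings for four people in a tiny social
# network. These values are made up for illustration; a real method like
# node2vec would learn them from the graph itself.
embeddings = {
    "alice": np.array([0.9, 0.1]),
    "bob":   np.array([0.8, 0.2]),
    "carol": np.array([0.1, 0.9]),
    "dave":  np.array([0.2, 0.8]),
}

def link_score(u, v):
    """Score how likely u and v are to be connected, using only geometry:
    the dot product of their embedding vectors. Note the network itself
    is never consulted here, which is the paper's core concern."""
    return float(np.dot(embeddings[u], embeddings[v]))

# Vectors pointing in similar directions score higher, so the model would
# predict a link between alice and bob before one between alice and carol.
print(link_score("alice", "bob"))    # dot product of their vectors
print(link_score("alice", "carol"))
```

Everything downstream (friend suggestions, content recommendations) then runs on these scores, which is why structure that the vectors fail to encode is simply invisible to the predictions.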
For instance, one structural feature that isn’t well-represented by embedding techniques is the density of triangles, which represent mutual connections among three people. The presence of many triangles in a social network indicates an abundance of community structure. However, many of these data-rich triangles disappear in the embedding process.
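Triangle counting itself is straightforward on the original graph, which is what makes the loss easy to measure: one can compare the triangle count of a real network against that of a network reconstructed from its embedding. The toy friendship graph below is a made-up example, not data from the study:

```python
from itertools import combinations

# A toy friendship network as an edge list (hypothetical data).
# a, b, and c all know each other, forming one triangle; d knows only c.
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}

# Build an adjacency map: node -> set of neighbors.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def count_triangles(neighbors):
    """Count unordered triples of nodes that are mutually connected."""
    count = 0
    for u, v, w in combinations(sorted(neighbors), 3):
        if v in neighbors[u] and w in neighbors[u] and w in neighbors[v]:
            count += 1
    return count

print(count_triangles(neighbors))  # one triangle: (a, b, c)
```

The researchers’ point is that a graph regenerated from a low-dimensional embedding tends to have far fewer such triangles than the network it was meant to represent.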
“We suspected that low-dimensional embeddings couldn’t perform well in capturing community structure, but we were surprised at how poorly they performed,” said Seshadhri.
The team’s findings challenge the view that these embeddings are powerful machine learning methods, since the short list of numbers they generate cannot fully express the inherent complexity of social networks. Of course, embeddings aren’t the only method available for analysis and prediction. The team’s work suggests that it may be wiser to use a variety of machine learning methods, rather than relying on a single technique, and to examine the assumptions people hold about any given method.
“I think work such as this is useful to practitioners, who will get a more complete picture of machine learning techniques,” noted Seshadhri. “Often, these techniques are highly mathematical and involve complicated code. It is hard for a non-expert to understand what works and what doesn’t. So naturally, people resort to looking at common research trends and follow the popular techniques. Ultimately, we need more work on investigating whether these techniques can truly deliver what they promise.”
Read more over at Proceedings of the National Academy of Sciences.
Feature image by Alina Grubnyak via Unsplash.
The New Stack is a wholly owned subsidiary of Insight Partners.