How Knowledge Graphs Make Data More Useful to Organizations
Knowledge graphs have emerged as an essential component underpinning the data science revolution. Machine learning and artificial intelligence (AI) can be used to draw inferences about relations between objects in what initially appears to be a disparate set of data points. In knowledge graphs, these inferences are visualized — with or without AI — and designed so the human mind can easily process them.
The inferences and analyses that knowledge graphs can make possible are typically unavailable in other types of data visualizations — and they can often reveal business insights.
For a supply chain, a knowledge graph can reveal complex distribution models. For example, it can clearly show the paths and connections between an inventory item and its use in constructing a car on the other side of the world.
But they’re not just good at mapping supply chains. With their ability to reveal the connections between different data points and sets, knowledge graphs can help support IT and DevOps, as well as prove useful in transportation, the hard sciences, health care, sociology, crime investigations, fintech, and numerous other industries.
The web of connections in a knowledge graph allows business users to make “broader conclusions about what’s going on in the real world,” Torsten Volk, an analyst for Enterprise Management Associates, told The New Stack.
For instance, he added, you could “predict what style of craft beer someone likes, based on seemingly unrelated data points showing details about this person’s job history, the cars this person has recently bought, and the party he or she is registered to vote for.”
Indeed, a knowledge graph can be thought of as “an augmented model view of a master data-management solution,” Jesús Barrasa, senior director, sales engineering EMEA for Neo4j, told The New Stack.
“So, you have information about your customers and about your products — the key entities in your business are represented as connected entities that become a graph. It’s a knowledge base, but in the form of a graph.”
Graph Tech on the Rise
Knowledge graphs also represent a major segment of the broader graph and data visualization landscape, a market that’s growing rapidly. By 2025, according to Gartner data, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, “facilitating rapid decision-making across the enterprise,” the analyst firm forecast.
The human brain processes knowledge graphs particularly well because they contain real-life relationships and dependencies between entities in the real world and in abstract settings, such as code or scientific theories, said Volk.
“This takes an enormous amount of pressure off deep learning models, as we can now provide these relationships as input factors instead of making the model’s probabilistic algorithms figure out these relationships based on a ton of examples,” he said.
“This ability to drastically reduce the number of examples to train AI models is a big deal, and so is the fact that the graph database reveals and can partially explain causal relationships between seemingly disconnected data points in a simple manner.”
Still, AI can play a crucial role in knowledge graphs, especially when they’re used for modern intelligent business applications and, more specifically, process augmentation, Gartner wrote in an October 2021 report on emerging technologies and trends. AI technologies like knowledge graphs, along with machine learning, decision intelligence, and explainable AI, offer more value to the business user by visualizing data more intelligently.
“In the future, process augmentation can be extended further to identify patterns of work, from which process models can be built and executed,” the Gartner report stated. “When processes or recommendations change due to AI, the business user responsible for the process and decision being taken must understand the reason for the changes — hence, the use of explainable AI.”
Why Do Knowledge Graphs Matter?
What is it about knowledge graphs that makes them so unique? And why are they of such interest to data scientists, data analysts and the developer community?
To answer this question, we can begin with defining what a knowledge graph is. But while people generally recognize a knowledge graph when they see one, definitions vary.
For the purposes of this article, a knowledge graph is a visualization of the connected nature of different data sets.
As an example, consider a diagram of the connections between Harry Potter characters.
This graph can become even more interesting by connecting these characters from the Harry Potter stories with objects from different data sources, Volk noted. These data sources might include spells, all the various potions name-checked in the Potter books, or all food and restaurants mentioned in them.
“This provides us with a unified data model that could instantly reveal actionable insights that would otherwise have remained hidden,” Volk said. “Connecting this model to data models from other ‘worlds’ would be a logical next step, as we could then have a deep learning model predict the behavior or characteristics of a wizard from ‘Game of Thrones’ or ‘Lord of the Rings,’ based on similarities and differences of wizards in these other worlds.”
In this way, a knowledge graph is created and deployed on top of a graph platform.
“You might find companies that call themselves ‘knowledge graph providers,’” said Barrasa. “My view is that a knowledge graph is an approach and more of a type of data solution than a product.”
Studying Data Patterns to Create Visualizations
It is simple to conceptualize the kinds of data and inferences that knowledge graphs offer. Through Facebook and LinkedIn, for example, a user might be directly connected to 100 people, while each of those users is connected to 200 more people. Some may belong to groups that the user shares and have more than 25 friends in common in each group.
A knowledge graph, through nodes, can illustrate how each of these people is connected.
“Facebook and LinkedIn, in this case, use AI algorithms to study patterns in the graph and use that to personalize your experience and create recommendations for you,” Barrasa said. “That’s exactly what Neo4j provides, by offering an environment where you can replicate these types of solutions and data products in a really straightforward and easy way.”
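The friend-of-a-friend pattern described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the people and friendships are invented, the function names are hypothetical, and the ranking rule (count shared connections) is a simple stand-in, not the method Facebook, LinkedIn or Neo4j actually use.

```python
from collections import Counter

# Hypothetical friendship data, invented for illustration only.
friendships = [
    ("ana", "ben"),
    ("ana", "cal"),
    ("ben", "dee"),
    ("cal", "dee"),
    ("cal", "eli"),
]


def build_graph(edges):
    """Build an undirected adjacency map: person -> set of direct connections."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def suggest_connections(graph, person):
    """Rank people two hops away by how many connections they share with `person`."""
    direct = graph.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and anyone already directly connected.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most shared connections first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


graph = build_graph(friendships)
suggestions = suggest_connections(graph, "ana")
print(suggestions)
```

In this toy data set, the strongest suggestion for "ana" is the person reachable through the most mutual connections, which is the kind of indirect relationship a graph makes easy to see.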
It is easy to get started creating and drawing inferences from a knowledge graph, as many possibilities exist for the beginner or citizen developer.
With Neo4j’s Sandbox, for example, it is possible to use the company’s Cypher language to visualize in knowledge graphs movies released after 2000. With it, you can limit the results to a specific number, such as five movies, while also visualizing the actor, producer and other connections to those movies. This data graph can be generated in just a few minutes on the Sandbox site.
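To make the logic of such a query concrete, here is a rough in-memory Python sketch: filter movies by release year, cap the number of results, then collect the people connected to each movie. It does not use Neo4j or Cypher, and the movie titles, people, relationship types and function name are all invented for illustration rather than taken from the Sandbox data.

```python
# Hypothetical sample data, invented for illustration; not the Sandbox data set.
movies = {
    "m1": {"title": "Alpha", "released": 1998},
    "m2": {"title": "Bravo", "released": 2003},
    "m3": {"title": "Charlie", "released": 2008},
    "m4": {"title": "Delta", "released": 2012},
}
people = {"p1": "Pat", "p2": "Quinn", "p3": "Rae"}

# Each relationship is (person id, relationship type, movie id).
relationships = [
    ("p1", "ACTED_IN", "m1"),
    ("p1", "ACTED_IN", "m2"),
    ("p2", "PRODUCED", "m2"),
    ("p2", "ACTED_IN", "m3"),
    ("p3", "DIRECTED", "m4"),
]


def movies_released_after(year, limit):
    """Return up to `limit` movies released after `year`, oldest first,
    each with the people connected to it."""
    # In Cypher, the filtering step might look roughly like this (assuming a
    # Movie label with a `released` property):
    #   MATCH (m:Movie) WHERE m.released > 2000 RETURN m LIMIT 5
    ordered = sorted(movies.items(), key=lambda kv: kv[1]["released"])
    selected = [(mid, m) for mid, m in ordered if m["released"] > year][:limit]

    results = []
    for movie_id, movie in selected:
        connections = [
            (people[person_id], rel_type)
            for person_id, rel_type, target in relationships
            if target == movie_id
        ]
        results.append(
            {
                "title": movie["title"],
                "released": movie["released"],
                "connections": connections,
            }
        )
    return results


recent = movies_released_after(2000, 2)
for entry in recent:
    print(entry)
```

A graph database performs the same kind of traversal at scale and renders the result as connected nodes rather than a list of dictionaries.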
Again, while the results of a knowledge graph are straightforward and accessible, the computing — and how Neo4j algorithms mine the data sets behind the scenes — are anything but, Barrasa said.
“A data set is pretty complex, but it looks simple in a knowledge graph because that’s the way we think before you put that to a table or create a SQL environment for the data sets,” he said. “But still, these kinds of indirect-connection analysis are sophisticated.”
Visualizing Complex Relationships
Moving beyond the very simple types of knowledge graphs that the citizen developer can create, Neo4j’s platforms are used in thousands of scenarios. They can be challenging to set up initially, given the complexities of scale and seemingly different data types and sources that are pulled together.
A couple of the more interesting and more complex projects were demoed during NODES 22, Neo4j’s annual developers’ virtual conference, held in November.
One particular presentation at NODES 22, by Alex Kaskasoli, an infrastructure and security engineer for DeepMind, spotlighted how insecure a GitOps repository can be and how knowledge graphs can offer insights about compromised access to secrets and information about the attacker’s movements.
In the scenario Kaskasoli created for his presentation, an attacker used privileges from an admin named “Alice” to gain access to the secrets file in a GitHub repository. With the use of a GitHub token, a map was created, and the data was visualized with a Neo4j knowledge graph. This allowed the attack paths to be queried.
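Querying an attack path amounts to asking whether a chain of relationships leads from one node to another. The sketch below shows that idea with a breadth-first search over a tiny directed graph in plain Python. It is an assumption-laden illustration, not Kaskasoli's actual model: the node names, relationship types and the `find_path` helper are all hypothetical.

```python
from collections import deque

# Hypothetical access graph loosely inspired by the "Alice" scenario.
# Each edge is (source node, relationship type, target node); all names are invented.
edges = [
    ("attacker", "COMPROMISED", "alice"),
    ("alice", "OWNS", "github_token"),
    ("github_token", "GRANTS_ACCESS_TO", "repository"),
    ("repository", "CONTAINS", "secrets_file"),
    ("bob", "CAN_READ", "repository"),
]


def find_path(edges, start, goal):
    """Breadth-first search over directed edges.

    Returns the shortest path as a list of (source, relationship, target)
    steps, an empty list if start == goal, or None if no path exists.
    """
    outgoing = {}
    for source, rel_type, target in edges:
        outgoing.setdefault(source, []).append((rel_type, target))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel_type, target in outgoing.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, rel_type, target)]))
    return None


attack_path = find_path(edges, "attacker", "secrets_file")
for source, rel_type, target in attack_path:
    print(f"{source} -[{rel_type}]-> {target}")
```

Because the edges are directed, the search finds a route from the attacker to the secrets file but none in the opposite direction, which mirrors how an access graph distinguishes who can reach what.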
“We can see that we have nodes of different types and we’ve got labels here to help us identify what they are,” Kaskasoli said at the conference, pointing out details on the knowledge graph he created in the “Alice” example.
As part of an ambitious project involving a “longitudinal investigation” of sports rivalries with data from over 30,000 respondents across more than 60 sports leagues, two researchers demonstrated Neo4j queries and knowledge graph analysis at NODES 22.
The Know Rivalry research project connects disparate data sources and models sport leagues’ inconsistent hierarchical structures, according to presenters B. David Tyler, an associate professor at the University of Massachusetts Amherst, and Joe Cobbs, a professor and department chair at the Northern Kentucky University Haile College of Business.
In many ways, the knowledge graph data modeling is the springboard for a wealth of information and inferences that become accessible once the modeling is set up. “Modeling is where we spent so much of the time and was so hard, and what Neo4j did so well,” Tyler said. “But another thing that we can do really well is this database integration with other systems.”
Data sources included a number of databases and survey results as well as external data sources, such as Wikidata. “Let’s say we want to know the capacity for different venues,” Tyler said. “We can just run a query on that. We can get the different venues and their capacities and integrate that into our database.”
To take knowledge graphs for a spin and see how they might help your organization visualize the connections between data points and enhance your data analysis capabilities, check out Neo4j’s Sandbox.