The 3 Underrated Strengths of a Native Graph Database
The real problems a graph database solves for you are the big ones: Where are the faults in the system? Which patterns of transactions are likely indicators of malicious intent? Which combinations of treatments are the best treatments for the most inexplicable diseases?
When you build connections into your data, your data reveals connections when you need them most.
It’s not just about visualizing circles and arrows. A visualization add-on is not the same thing as a native graph database such as Neo4j. A native graph tool describes relationships to the database, along with all the concepts that sprout from those relationships (such as relevance, integrity, probability, reliability, deviation and vulnerability). This way, the database itself can locate the information in those relationships, analyze it and explain it to you.
A Matter of Perception
Humans generally look at their world, then process the elements they see in terms of interconnectedness, Michael Hunger, senior director of user innovation at graph database producer Neo4j, told The New Stack.
But when it comes time for these individuals to represent these same elements with a relational database, Hunger added, “the problem is, as soon as they get to their technology — their database — they need to forget all that interconnectedness.”
Relationships can be built into relational databases, but with considerable effort, plus a gargantuan amount of memory and storage — resources that come with giant price tags when they’re delivered by cloud platforms.
So to save time and money, organizations tend to downplay their need for understanding interconnectedness. Later, when they need intensive analysis, they rely on add-ons and extensions that can only deduce information from the surface levels of database schemas.
“It is not just that finance companies need to understand fraud detection, or that companies need to be able to see whether or not an employee is going to quit next month,” said Bradley Shimmin, chief analyst for artificial intelligence platforms, analytics and data management at Omdia, a technology research firm.
“What graph analysis is, is the ability to look across dimensions, to add context and meanings to data that tabular data just doesn’t know about.”
What Is a Native Graph Database?
In the traditional relational database model, data is distributed over multiple tables, linked by keys. Running a SQL query typically means joining several tables and performing multiple index lookups.
By comparison, in a Neo4j property graph model, the emphasis is on the relationships between data elements. These elements are stored as “nodes,” whose properties may be represented by any number of key/value pairs.
Nodes can be connected by any number of relationships. Both nodes and relationships may have “properties.” In a graph model, a row in a relational database table typically corresponds to a node, and that row's columns correspond to the node's properties.
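To make the property graph model described above concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how Neo4j stores data: the `Node`, `Relationship` and `connect` names, and the sample people and companies, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass(eq=False)
class Node:
    """A node with labels and key/value properties (hypothetical sketch)."""
    labels: Set[str]
    properties: Dict[str, Any] = field(default_factory=dict)
    # Each node keeps direct references to its own relationships.
    outgoing: List["Relationship"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    """A typed, directed connection that can carry its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
    """Create a relationship and attach it to its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


# What would be a row in a Person table becomes a node;
# its columns become the node's properties.
joe = Node({"Person"}, {"name": "Joe", "born": 1980})
acme = Node({"Company"}, {"name": "Acme"})

# The relationship is stored as data in its own right, with a property.
works_at = connect(joe, "WORKS_AT", acme, since=2015)

for rel in joe.outgoing:
    print(joe.properties["name"], rel.rel_type, rel.end.properties["name"], rel.properties)
```

The point of the sketch is that the relationship is a stored object with its own type and properties, rather than something recomputed from matching keys at query time.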
With a native graph database like Neo4j’s, the entire stack is optimized around this data model, from the query language to the file store. A native graph database will be more efficient at analyzing these relationships, as they are baked into the model from the beginning.
It’s important to distinguish between a native graph database and a graph layer that runs on top of a relational database. The latter serves up results as graphs, but still must do joins and other operations to pull together data from across the entire database. This leads to latency and excessive consumption of compute resources, both of which become more pronounced as you scale up.
Developing a graph database does involve extra work during the creation process, Hunger acknowledged, “because you also have to insert these relationships.” The labor involved pays off later, he asserted, allowing you to more easily deduce connections between the data.
The payoff comes by way of three underappreciated advantages:
1. Traversability: Making Connections
Graph databases enable you to “follow” relationships. “You get a big benefit,” noted Hunger, “when you read the data and you want to follow through, or traverse, the relationships.”
William Lyon, a developer relations engineer at Neo4j, pointed out a feature called “index-free adjacency, which is very specific to graph databases. It basically means that you can traverse from one node to any other node without going to do an index lookup.”
This traversability doesn’t just make it easier to track the relationships between data. It also reduces the compute required to perform a query, and thus the amount of human effort required to manage the compute process.
With a relational database, Hunger said, “you have to compute what goes together, and then you have to have some way of merging them together.” This means more joins or hops. The larger the amount of data in the database, the more joins and hops are inevitably required.
By comparison, with a graph database, Hunger said, “it doesn’t matter if I have eight billion people in my database. If I’m focused on Joe, I’m only interested in Joe’s relationships. I can completely ignore the others.”
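Hunger's point about Joe can be sketched in a few lines of Python. This is a toy model of the idea behind index-free adjacency, not Neo4j's storage engine: the `Person` class, the `befriend` and `friends_of_friends` helpers, and the data are all assumptions for illustration. Each node holds direct references to its neighbours, so the work done depends on Joe's neighbourhood and not on the size of the dataset.

```python
class Person:
    """A node that holds direct references to its neighbours (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.friends = []


def befriend(a, b):
    """Record an undirected friendship on both nodes."""
    a.friends.append(b)
    b.friends.append(a)


def friends_of_friends(start):
    """Return (names two hops from start, number of relationships followed)."""
    direct = {friend.name for friend in start.friends}
    followed = 0
    result = set()
    for friend in start.friends:
        followed += 1
        for candidate in friend.friends:
            followed += 1
            if candidate is not start and candidate.name not in direct:
                result.add(candidate.name)
    return result, followed


joe, ann, raj, li = (Person(n) for n in ("Joe", "Ann", "Raj", "Li"))
befriend(joe, ann)
befriend(joe, raj)
befriend(ann, li)
befriend(raj, li)

# A large crowd of unrelated people: they grow the dataset,
# but the traversal from Joe never touches them.
crowd = [Person(f"p{i}") for i in range(10_000)]
for a, b in zip(crowd, crowd[1:]):
    befriend(a, b)

names, followed = friends_of_friends(joe)
print(names, followed)
```

Adding another million unrelated people to `crowd` would leave `followed` unchanged, which is the behaviour the quote describes.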
This also makes graph databases particularly useful for analyzing hierarchical data, he added, such as that of a company with a 200,000-strong workforce and tens of layers of management.
Thanks to traversability, Neo4j’s Cypher query language reveals a three-layer-deep relationship in four lines, rather than a complex script.
“Doing operations on this sort of large and complex hierarchy in a relational database means hundreds of thousands of self-joins: joining the person or the employee table, with a manager, with their manager, or with the manager for each of the people designated.”
With a graph database, it’s simply a question of following relationships up and down the hierarchy, Hunger said.
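As a rough illustration of following relationships up and down a hierarchy, the Python sketch below walks a tiny reporting tree. It stands in for a graph query and is not Cypher; the `Employee` class, the `chain_of_command` and `all_reports` helpers, and the names are hypothetical.

```python
class Employee:
    """A node with a link up to its manager and links down to its reports."""

    def __init__(self, name, manager=None):
        self.name = name
        self.manager = manager
        self.reports = []
        if manager is not None:
            manager.reports.append(self)


def chain_of_command(employee, max_levels=None):
    """Follow the manager link upward, optionally stopping after max_levels."""
    chain = []
    current = employee.manager
    while current is not None and (max_levels is None or len(chain) < max_levels):
        chain.append(current.name)
        current = current.manager
    return chain


def all_reports(manager):
    """Collect everyone below a manager by walking the links downward."""
    found = []
    stack = list(manager.reports)
    while stack:
        person = stack.pop()
        found.append(person.name)
        stack.extend(person.reports)
    return found


ceo = Employee("Morgan")
vp = Employee("Pat", ceo)
director = Employee("Chris", vp)
engineer = Employee("Dana", director)
analyst = Employee("Lee", director)

print(chain_of_command(engineer, max_levels=3))
print(sorted(all_reports(vp)))
```

Neither function compares keys across a table. Each step simply follows a link that is already stored on the node.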
The graph database format is also particularly attractive to data scientists — who, he argued, “don’t like working with relational databases because there’s a strict schema that’s imposed. You have to define the schema of your data and impose that up front before you start working.”
The graph approach is quicker and more intuitive by comparison, making it easier for you to evolve a data model iteratively.
2. Interpretability: A Portable Whiteboard
If humans see the world in terms of relationships and connections, databases should be capable of deducing what relationships and connections mean.
Think of this principle as the “whiteboard friendliness” of the outputs, meaning the results you’re trying to find. When research physicians communicate results to colleagues, those results are typically presented in Excel sheets or some other tabular format. At some base level, these formats equate to complete databases.
A graph, meanwhile, presents an immediately informational result to human beings. “You can take this rich model that you have on your whiteboard and put it into the database, because the relationships and entities are first-class citizens,” said Hunger, who also noted, “you don’t lose all the business people.”
Recommendation systems are a core use case for graph databases, said Lyon: “If you’re able to say, ‘We recommend this book to you, because it has similar elements of other books that you’ve rated highly,’ that’s a much more valuable recommendation.”
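The kind of explainable recommendation Lyon describes can be sketched as a walk from a reader to the books they rated highly, on to the elements of those books, and out to other books sharing those elements. The path that was walked doubles as the explanation. The Python below is a toy version under assumed data: the `ratings` and `elements` dictionaries, the `recommend` function and the rating threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: reader -> {book: rating}, and book -> its elements.
ratings = {"you": {"Dune": 5, "Neuromancer": 4, "Emma": 2}}
elements = {
    "Dune": {"science fiction", "politics"},
    "Neuromancer": {"science fiction", "cyberpunk"},
    "Emma": {"romance"},
    "Foundation": {"science fiction", "politics"},
    "Persuasion": {"romance"},
}


def recommend(reader, min_rating=4):
    """Return [(book, [(liked_book, shared_element), ...])], best first."""
    rated = ratings[reader]
    liked = {book for book, rating in rated.items() if rating >= min_rating}
    reasons = defaultdict(set)
    for book in liked:
        for element in elements[book]:
            for candidate, candidate_elements in elements.items():
                if candidate not in rated and element in candidate_elements:
                    # Remember the path that led here: it is the explanation.
                    reasons[candidate].add((book, element))
    ranked = sorted(reasons.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(candidate, sorted(paths)) for candidate, paths in ranked]


for candidate, paths in recommend("you"):
    print(f"We recommend {candidate} because:")
    for liked_book, element in paths:
        print(f"  it shares '{element}' with {liked_book}, which you rated highly")
```

In this data, "Persuasion" is never suggested, because its only connection to the reader runs through a book they rated poorly.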
Network characteristics, added Hunger, enable you to deduce levels of influence and impact that some nodes may have upon others: “What are the clusters? Who is most influential? Who connects clusters? Who is essential?”
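Two of Hunger's questions, who is most connected and who connects clusters, can be approximated on a small graph with a few lines of Python. This is a simplified sketch, not the algorithms a graph database would ship: the sample `graph`, the use of plain degree as a proxy for influence, and the brute-force "remove a node and see whether the graph falls apart" test are all assumptions chosen for readability.

```python
# Hypothetical undirected graph: two tight groups joined through node D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F", "G"},
    "F": {"E", "G"},
    "G": {"E", "F"},
}


def components(adj, removed=frozenset()):
    """Return the connected groups of nodes, ignoring any removed nodes."""
    seen = set(removed)
    parts = []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            part.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(part)
    return parts


def most_connected(adj):
    """Nodes with the highest number of relationships (a crude influence proxy)."""
    top = max(len(neighbours) for neighbours in adj.values())
    return sorted(node for node, neighbours in adj.items() if len(neighbours) == top)


def connectors(adj):
    """Nodes whose removal splits the graph into more groups than before."""
    baseline = len(components(adj))
    return sorted(node for node in adj if len(components(adj, {node})) > baseline)


print(most_connected(graph))
print(connectors(graph))
```

Here D has only two relationships, yet removing it separates the two groups, which is the sense in which a node can be essential without being the most connected.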
It’s a level of analysis that can be accomplished with relational databases only through very sophisticated processes of aggregation and so-called “knowledge discovery iteration,” all of which make the SQL cross-joining marathon depicted previously look more like a 100-yard sprint.
Indiana University researchers seeking patterns in the genetic relationships among the various causes of Parkinson’s disease chose Neo4j as their native graph database for some very profound reasons. Among them: The pattern returned by a graph query is itself representable as a graph.
From their perspective, it is not just the database that is represented as a knowledge graph, but its outputs as well.
3. Fairness: Putting Raw Data in Context
High-quality results require high-quality data. Graph databases reveal more underappreciated strengths with respect to fairness and eliminating bias, particularly in the huge datasets used in machine learning and artificial intelligence.
Bias can come from two angles, Hunger said: skewed data being fed into algorithms, and bias from the researchers in terms of the questions they’re asking. With a graph model, potential bias can be highlighted and eliminated in the design phase.
Data scientists will naturally want to make sure their training data is representative and unbiased before it is fed into their machine learning algorithms, Lyon said. Using a native graph database in the data preparation stage may help put the raw information in context, and head off the possibility of feeding bias into the model.
“So you look at the clustering of your graph and if you have just one big cluster, then you have a bias and you have a problem,” he said. “You need to go back and make sure you have nice, distributed clusters.”
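A rough version of that cluster check can be written in Python. The sketch below uses connected components as a crude stand-in for clustering, where a real pipeline would more likely use a community detection algorithm. The `cluster_sizes` and `largest_cluster_share` helpers and the sample edge lists are hypothetical, and what share counts as "one big cluster" is a judgment call left to the reader.

```python
from collections import Counter


def cluster_sizes(edges, nodes):
    """Group nodes with a simple union-find and return group sizes, largest first."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk to the group's representative, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return sorted(Counter(find(node) for node in nodes).values(), reverse=True)


def largest_cluster_share(edges, nodes):
    """Fraction of all nodes that fall into the single largest group."""
    sizes = cluster_sizes(edges, nodes)
    return sizes[0] / sum(sizes)


nodes = list(range(10))
skewed = [(i, i + 1) for i in range(8)]  # nodes 0..8 chained together, 9 alone
balanced = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (8, 9)]

print(cluster_sizes(skewed, nodes), largest_cluster_share(skewed, nodes))
print(cluster_sizes(balanced, nodes), largest_cluster_share(balanced, nodes))
```

A share close to 1.0, as in the `skewed` example, is the "one big cluster" warning sign from the quote, while the `balanced` example spreads nodes across several groups of similar size.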
As Hunger pointed out, neural networks themselves are graphs. “That means as you train a model, the neural network changes,” he said. “If you present a graph and make it accessible and variable and visualizable, then you can identify how the network is acquiring its weights.”
In other words, any given output generated from the model can be easily traced back to its origins. Traversability works both ways.
Do all these strengths mean the relational model’s days are numbered? Unlikely. But, noted Shimmin, graph databases are “a vital technology, and something that I very much want to see more deeply, entirely integrated into traditional analytical workloads.”
Where to Begin
- Get started free with Neo4j AuraDB native graph database
- Learn the basics of Neo4j AuraDB native graph in just one hour
- Register now for Neo4j NODES 2022 Online Developer Education Summit November 16