What I Learned at Neo4j’s NODES 22 Conference

The event showcased some of the latest innovations in data science made possible by graph data tools, and explored use cases in fields including transportation and journalism.
Dec 8th, 2022 12:51pm by

Let’s just say there is a lot happening in data science these days, particularly in data science assisted by artificial intelligence and machine learning. It is already having profound effects on real-world computing applications in a number of areas, spanning the sciences, economics, industrial applications and health care.

In many ways, we are just on the cusp of learning what’s possible. Gartner predicts that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision-making across the organization on a business level.

In parallel, the tools used for data science — and more specifically, data visualization and graphical representation — are fueling these advances.

Screenshot of Luke Gannon giving a presentation at the virtual NODES 22 event

“Graph data science is when you want to answer questions, not just with your data, but also the connections between the data points,” said Luke Gannon, product manager at Neo4j, during his talk at NODES 22, Neo4j’s annual virtual developer conference, held in November.

“And this is really important because when you have these connections, they allow you to answer new questions, like who is the most important, or what’s the best choice for one thing or what might happen next?”
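To make the "who is the most important" question concrete, here is a minimal sketch in plain Python. It is not Neo4j's implementation or anything shown at the conference: the people and connections are hypothetical, and degree centrality (the share of other nodes a node touches directly) is just one simple way to score importance.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is an undirected connection between two people.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cho"),
    ("Ana", "Dev"),
    ("Ben", "Cho"),
    ("Dev", "Eli"),
]

def degree_centrality(edge_list):
    """Return each node's share of the other nodes it is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    others = len(neighbors) - 1  # every node except the one being scored
    return {node: len(adjacent) / others for node, adjacent in neighbors.items()}

scores = degree_centrality(edges)
most_important = max(scores, key=scores.get)
print(most_important, scores[most_important])
```

The answer comes entirely from the connections, not from any attribute stored on the people themselves, which is the distinction Gannon draws.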

Graph data science is but one of the more exciting subjects that fall under the data science umbrella. Graph databases, data visualization, AI/ML pipelines and applications were among the use cases covered during NODES 22. The conference also served as a launch party of sorts for Neo4j 5, a major new release of the company’s signature technology.

Here’s what I learned at NODES 22 about what’s possible from data graphs and data science.

Social Connections: More Powerful Than You Think

Many of the data use cases described during the conference show how ML, applied to analysis of data involving human subjects, illuminates how interactions in a network can have major influences on human behavior. The term “social contagion” aptly applies in these cases, showing how the direct and indirect actions between social connections can alter not only individual human behavior but change the general behavior of entire groups.

The ability to draw inferences and patterns based on relationship data in groups’ networks may have either noble or nefarious purposes. Point-of-contact data was particularly useful to help trace the source of contagion for Covid-19 patients during the earlier stages of the pandemic.

On the other hand, Russian government-backed groups often use bots to influence behavior by targeting people in the U.S. and Europe who are susceptible to propaganda — and who might have strong influence over others in their network.

Most use cases usually fall somewhere in between these extremes of benevolence and malice, such as using data graphs to pinpoint those in a social network who have the most influence over others to purchase products.

“Human beings are embedded in social networks. These networks obey very particular biological, psychological, sociological and mathematical principles. And taking this into account offers us tremendous opportunities to gain new insights into behaviors and also to change,” said Nicholas A. Christakis, a Yale University professor and freelance scientific adviser, during a keynote speech.

“We can use an understanding of social network structure and function for good to intervene in both online and offline worlds in order to enhance our health and our well-being, our public policy and our business.”

Taking a closer look, through data science, at how humans and their networks are embedded involves a shift in focus to the “externalities of intervention,” Christakis said. “It engages us in the exploration of how it is that when we intervene in a group, how we affect not just the people that we target, but also all the other people around them.”

Using data graphs and analysis and visualization tools, Christakis has uncovered some often startling results about how human behavior affects others who are not directly connected to them.

Indeed, he said, “one of the most bizarre results that has come out of my lab in the last few years” was after he used simple artificial linear networks to trace back the sequence of interactions that individuals had. The effects of altruism and how they affect both direct and indirect social connections were analyzed in the study.

“What we were able to find is that this kind of altruistic effect could spread from person to person,” he said.

Presentation slide titled "Spread Across People and Time." It shows the concept of "social contagion," how ideas move among people in a group.

How Jay treats Brecken depends on how Eleni treats Lucas, although neither Jay nor Brecken ever saw or interacted with Eleni or Lucas.

Social contagion can apply to any social setting. How two conference attendees treat each other may depend on how two other attendees treat each other, even though no one in the first pair ever interacted with anyone in the second, Christakis said.

“This is experimental documentation of social contagion,” he said.

App Interfaces Can Be Simple but Powerful

The rapid adoption of data science tools, in particular data visualization and data graph tools, is largely attributed not only to their power but also to their ease of use. Neo4j has capitalized on this trend by supporting projects outside the traditional sphere of data science, for applications beyond the IT sector.

One case in point: A group of journalists used data science tools such as Neo4j to trace the connections of more than 400,000 individuals linked to secret offshore accounts. The project, called the “Paradise Papers,” was written by Frederik Obermaier and Bastian Obermayer, both German reporters, for the Süddeutsche Zeitung newspaper.

A sample of the data queries conducted by the "Paradise Papers" reporters.

The journalists’ use of Neo4j graph databases allowed them to better visualize and analyze connections between individuals and organizations, such as hidden offshore banks and companies. These data points can now be accessed with just a few commands, in a more intuitive and intelligent way than SQL, NoSQL or other kinds of databases would have provided.
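The kind of question a graph makes easy is "how are these two parties connected?" The sketch below illustrates the idea in plain Python with a breadth-first search. It is only an illustration under stated assumptions: the records, names and relationship types are hypothetical, not drawn from the Paradise Papers data, and the reporters' actual queries are not reproduced here.

```python
from collections import deque

# Hypothetical toy records: (source, relationship, target). Not real data.
relationships = [
    ("Person A", "OFFICER_OF", "Shell Co 1"),
    ("Shell Co 1", "REGISTERED_AT", "Address X"),
    ("Shell Co 2", "REGISTERED_AT", "Address X"),
    ("Person B", "OFFICER_OF", "Shell Co 2"),
    ("Person C", "OFFICER_OF", "Shell Co 3"),
]

def build_adjacency(records):
    """Index every record in both directions so paths can be walked either way."""
    adjacency = {}
    for source, _relationship, target in records:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(adjacency[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_adjacency(relationships)
path_ab = shortest_path(graph, "Person A", "Person B")
path_ac = shortest_path(graph, "Person A", "Person C")
print(path_ab)  # the chain of entities linking the two people
print(path_ac)  # None: no connection exists in this toy data
```

In a relational database, the same question would typically require a chain of self-joins whose length has to be known in advance, which is one reason a graph model tends to feel more natural for this sort of investigation.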

NODES 22 showed, through demos and talks, how Neo4j continues to improve its platform and tools so that graph-data calculations and visualizations can be achieved in a scalable way.

By leveraging machine learning, Neo4j can now be used to run different algorithms and processes involving billions of nodes and relationships. The ML pipelines can be integrated with Python and other ML frameworks, while different sets of databases can be integrated into a single data visualization panel.

Presentation slide for NODES 22 conference titled "Boring it is!" showing an ordinary data graph made with Neo4j tools.

Neo4j’s capabilities, such as data inferences and governance, may be innovative, but graph and other data visualizations should be simple to use. The graph data in the Neo4j deployment at J.B. Hunt Transport Services, a supply chain and transportation services provider, could be described as “boring,” acknowledged Donovan Bergin, a technical solutions architect at J.B. Hunt, during his talk.

But that’s exactly what his company sometimes needs, he added: “We built a boring graph: You just got your equipment. You’ve got operations and other stuff that I’m not allowed to talk about today.”

J.B. Hunt uses the graph data to monitor equipment, with telemetry and sensor readings for location tracking, device alerts if voltages are too low, and sensor readings if temperatures are too high or too low or if “we’re going in the wrong direction,” Bergin said.

Other data provided can include visualizations of how different hubs, or nodes, are connected by rail or other links. Data science can be used to, among other things, determine and predict how critical different nodes are for logistics.
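One simple way to think about how critical a node is: take it offline and count how many pairs of other hubs can no longer reach each other. The following plain-Python sketch does exactly that. It is an assumption-laden illustration, not J.B. Hunt's method: the hub names and links are hypothetical, and the network is assumed to be fully connected before any hub is removed.

```python
# Hypothetical hub network: each pair is a two-way rail or road link.
links = [
    ("Hub A", "Hub B"),
    ("Hub A", "Hub C"),
    ("Hub B", "Hub C"),
    ("Hub B", "Hub D"),
    ("Hub D", "Hub E"),
]

def reachable(adjacency, start, removed):
    """Return all hubs reachable from start while the removed hub is offline."""
    seen = {start}
    stack = [start]
    while stack:
        hub = stack.pop()
        for neighbor in adjacency[hub]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

def criticality(link_list):
    """For each hub, count the pairs of other hubs that cannot reach each other if it fails.

    Assumes the network is connected to begin with; otherwise pairs that were
    already disconnected would be counted too.
    """
    adjacency = {}
    for a, b in link_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = {}
    for removed in adjacency:
        remaining = [hub for hub in adjacency if hub != removed]
        cut_pairs = 0
        for i, hub in enumerate(remaining):
            component = reachable(adjacency, hub, removed)
            cut_pairs += sum(1 for other in remaining[i + 1:] if other not in component)
        scores[removed] = cut_pairs
    return scores

hub_scores = criticality(links)
most_critical = max(hub_scores, key=hub_scores.get)
print(most_critical, hub_scores[most_critical])
```

A production system would more likely rely on an established measure such as betweenness centrality, but the brute-force version keeps the underlying idea visible.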

Neo4j Has Made Significant Improvements

Neo4j 5, a major release, offers a number of new features, such as the ability to integrate multiple data graphs, along with improvements to scalability and flexibility. Improvements have also been made to drivers, to query functionality in Neo4j’s graph query language, Cypher, and to indexing. The latest version of the platform was also designed to make it easier to run and manage Neo4j clusters.

The autonomous clustering functionality, which enables elasticity within a cluster, is “perhaps one of the most sophisticated clustering architectures in the database,” according to Stu Moore, product manager at Neo4j.

“The key kind of innovation and change within string technology has been that you no longer need to run a copy of the database on every single server within the cluster,” Moore said.

Another key feature is how server-side routing is turned on by default in Neo4j 5 for the use of load balancers and other network technologies on the cloud. With it, queries are internally routed to appropriate database management servers.

Stu Moore of Neo4j presents a slide at NODES 22 conference titled "Scalability and Availability"

John Stegeman of Neo4j presents a slide at the NODES 22 conference titled "Neo4j 5 Continuous Support Model"

Neo4j 5 represents a new release model for the graph database management provider. Previously, each new release was followed by incremental bug fixes or security patches that represented subsequent versions of the main release.

However, the new release model “is really going to be more like what you would expect from a cloud-first vendor that releases software in a continuous fashion,” said John Stegeman, graph database product specialist, during his conference talk.

The new release model will follow that of Neo4j’s managed-cloud platform, AuraDB. “This is always what we’ve done with our Aura product, as new features of 5.1, 5.2, 5.3, etc. are coming more or less on a continuous basis over time,” Stegeman said.

“What’s changing with Neo4j 5 is we’re bringing the experience to the self-hosted community: people who are running Neo4j in their own data center, or are self-managing their own deployments on the cloud vendors,” Stegeman said.

Neo4j 5 no longer supports B-tree indexes for database queries, which have been replaced with more elegant and nimble range and point indexes. This represents a “significant” change in Neo4j 5, Stegeman said.

Neo4j 5 is intended to serve as a point of departure for the company, as it seeks to improve and keep up with the scaling and performance demands required for many of the use cases described at NODES 22.

Said Moore during his talk, “This is really an exciting release for us — not just thanks to the features that we’re delivering — but because this is the first time we’ve released our entire Neo4j product platform at the same time.”

Data Science Is Beautiful and Scary

Anyone remotely familiar with data science these days knows how it has transformed computing and applications in a number of sectors. But, in many ways, what this conference conveyed was not so much what was said but what was left unsaid.

Christakis noted how inferences and analyses used to affect social connections and influences can be used for the good of society — without delving into how these very high-powered ML-assisted applications can also be used for nefarious purposes.

Also, as the conference talks reflected, data science should have an even more profound effect on a number of industries in the future than it has in the past. It will certainly be exciting to see what comes next.

Where to Go from Here

TNS owner Insight Partners is an investor in: Pragma, The New Stack.