Neo4j 5 Hits GA with Major Performance, Scalability Improvements

Graph database stalwart Neo4j is releasing the fifth major version of its eponymous graph database into general availability (GA), with major improvements in performance, scalability and operational agility. The company describes Neo4j 5 as next-generation database technology focused on integrating data from multiple graphs and on providing the scalability and flexibility needed to power such scenarios. Ramanan Balakrishnan, Neo4j's Senior Director of Product Marketing, talked to The New Stack and gave us the skinny on many of the new goodies in version 5.
Performance: Too Much Ain’t Enough
Balakrishnan, who said "Neo4j 5 is very pivotal to us," began his explainer of the new version by discussing performance, where things look very intriguing indeed. Specifically, for so-called k-hop queries, which involve a large number of node traversals (or "hops") through a graph, Balakrishnan said performance has increased by as much as 1000x.
Does that mean everyday queries are 1000 times faster? Probably not, but according to a technical blog post from Neo4j, that kind of speedup can be realized when the number of hops (k) reaches around 8, and the more hops, the bigger the performance gain. Balakrishnan attributes the improvements to better query plans and to new, more expressive constructs in the Cypher query language for disjunctions (OR), conjunctions (AND), negations (NOT) and repetition. That's noteworthy in itself, given Cypher's role, through the openCypher project, as an open query language for graph databases.
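Neo4j's explanation isn't detailed here, so the following is only a hedged illustration of why the number of hops matters so much. It is plain Python, not Cypher or Neo4j code, and the function names `count_walks` and `k_hop_neighbors` are hypothetical. On a densely connected graph, the number of walks a naive traversal enumerates grows exponentially with k, while the number of distinct nodes within k hops can never exceed the size of the graph. A traversal that remembers which nodes it has already visited therefore does far less work as k grows. This is one plausible source of gains of that shape, not a description of Neo4j's actual planner.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

# Adjacency list: node -> list of neighboring nodes.
Graph = Dict[Hashable, List[Hashable]]


def count_walks(graph: Graph, start: Hashable, k: int) -> int:
    """Naive expansion: count every walk of length 1..k from start (revisits allowed)."""
    total = 0
    frontier = [start]
    for _ in range(k):
        # Extend every walk in the frontier by one more hop, with no deduplication.
        frontier = [nbr for node in frontier for nbr in graph.get(node, [])]
        total += len(frontier)
    return total


def k_hop_neighbors(graph: Graph, start: Hashable, k: int) -> Set[Hashable]:
    """Breadth-first search: each node is visited at most once, however many paths reach it."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    seen.discard(start)  # report the other nodes only, not the start node itself
    return seen


if __name__ == "__main__":
    # Complete graph on six nodes: every node links to every other node.
    graph = {i: [j for j in range(6) if j != i] for i in range(6)}
    print(count_walks(graph, 0, 5))              # 3905
    print(sorted(k_hop_neighbors(graph, 0, 5)))  # [1, 2, 3, 4, 5]
```

Even on this tiny six-node graph, the naive expansion touches 3,905 walks (5 + 25 + 125 + 625 + 3,125) to discover just five distinct nodes, and the gap widens with every extra hop.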
Scale Out and Federate
Moving on to scalability, Neo4j is introducing a very interesting technology called Autonomous Clustering. Essentially, this feature lets administrators avoid the painstaking configuration of multiserver clusters. Instead, customers can effectively throw a bunch of servers at Neo, tell it how many primary and secondary database instances are needed, and then let the software automatically distribute database copies across all the hardware assets.
While that's a nice capability all by itself, especially in this era of autonomous databases, what's really compelling is the heterogeneous nature of the server resources you can use. Servers in the cluster can be a mix of bare metal machines, virtual machines, and containers, across on-premises data centers and one or multiple clouds. The architecture apparently scales horizontally to hundreds of machines (of any of the aforementioned types) and allows servers to be added and dropped on the fly, with rebalancing handled automatically.
When Balakrishnan explained this, I got a little greedy in my thinking and asked if the servers could even include IaaS spot instances in the cloud, which are always in danger of expiring and failing out of the cluster. The best I was hoping for was to be told that such a configuration “should” work, but that it hadn’t been tested. To my surprise, Balakrishnan said it was fully supported, and that the very purpose of autonomous clustering was to offer the flexibility and resiliency required in such cases. “That’s exactly the point…the system will automatically reallocate and move the databases accordingly, just to make sure your fault tolerance is maintained,” Balakrishnan said, adding “and that’s exactly the premise.”
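Neo4j's actual placement algorithm isn't described here, so what follows is a toy model, in plain Python, of the behavior Balakrishnan describes: you declare how many primary and secondary copies each database needs, and the software decides which servers host them, recreating copies when a server disappears and spreading load when one joins. `ToyCluster`, its methods, and the "least-loaded server wins" rule are all hypothetical assumptions for illustration; a real cluster also has to handle replication, consensus, and data movement, none of which this sketch models.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Copy = Tuple[str, str]  # (database name, role), role is "primary" or "secondary"


@dataclass
class ToyCluster:
    """Hypothetical model of declarative database placement; not Neo4j's algorithm."""
    servers: List[str]
    topology: Dict[str, Tuple[int, int]]  # database -> (primaries, secondaries)
    placement: Dict[str, List[Copy]] = field(default_factory=dict)

    def load(self, server: str) -> int:
        return len(self.placement.get(server, []))

    def dbs_on(self, server: str) -> Set[str]:
        return {db for db, _ in self.placement.get(server, [])}

    def count(self, db: str, role: str) -> int:
        return sum(copies.count((db, role)) for copies in self.placement.values())

    def _place(self, copy: Copy) -> None:
        # Never put two copies of the same database on one server (fault tolerance).
        candidates = [s for s in self.servers if copy[0] not in self.dbs_on(s)]
        if not candidates:
            raise RuntimeError(f"not enough servers for another copy of {copy[0]!r}")
        target = min(candidates, key=self.load)  # assumption: least-loaded server wins
        self.placement.setdefault(target, []).append(copy)

    def allocate(self) -> None:
        # Top every database up to its declared number of primaries and secondaries.
        for db, (primaries, secondaries) in self.topology.items():
            for role, wanted in (("primary", primaries), ("secondary", secondaries)):
                for _ in range(wanted - self.count(db, role)):
                    self._place((db, role))

    def rebalance(self) -> None:
        # Move copies from the busiest to the idlest server until loads differ by at most 1.
        while True:
            busiest = max(self.servers, key=self.load)
            idlest = min(self.servers, key=self.load)
            if self.load(busiest) - self.load(idlest) <= 1:
                return
            movable = [c for c in self.placement[busiest] if c[0] not in self.dbs_on(idlest)]
            if not movable:
                return  # simplification: give up rather than search other server pairs
            self.placement[busiest].remove(movable[0])
            self.placement.setdefault(idlest, []).append(movable[0])

    def add_server(self, server: str) -> None:
        self.servers.append(server)
        self.rebalance()

    def remove_server(self, server: str) -> None:
        # Simulates a server (for example, a reclaimed spot instance) dropping out.
        self.servers.remove(server)
        self.placement.pop(server, None)
        self.allocate()  # recreate the lost copies on the remaining servers
        self.rebalance()


if __name__ == "__main__":
    cluster = ToyCluster(["a", "b", "c", "d"], {"orders": (3, 0), "users": (1, 1)})
    cluster.allocate()
    cluster.remove_server("b")  # "b" vanishes; its copy is recreated elsewhere
    cluster.add_server("e")     # the new server picks up some of the load
    print(cluster.placement)
```

In this sketch, losing a server triggers `allocate()` to restore each database's declared copy count, which only succeeds while enough servers remain to keep every copy of a database on a different machine. Declaring the desired topology, rather than wiring each server by hand, is what lets the cluster absorb servers coming and going.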
Neo4j has adopted a technology mission statement of sorts built around something it calls the "enterprise knowledge lake": the ability to have a virtually unified graph across all of an organization's individual graph databases, creating a knowledge graph of the entire Neo4j data estate. The Fabric feature, introduced in Neo4j 4, already provided for federated queries, which allowed multiple graphs to be virtually integrated. But with Neo4j 5, that federation can now span local and remote clusters, which Balakrishnan said "provides massive scale out opportunities for very large workloads."
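One common way to implement federation is scatter-gather: the same query goes out to every participating graph, local or remote, and the partial results are merged into one answer. The sketch below shows that pattern in plain Python under loudly stated assumptions: each graph is modeled as a hypothetical callable that takes a query string and returns rows, and `federated_query` is an illustrative name, not a Neo4j or Fabric API. Query routing, network transport, and partial-failure handling are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# Hypothetical stand-in for a local or remote graph: query string in, rows out.
Source = Callable[[str], Iterable[Row]]


def federated_query(sources: Dict[str, Source], query: str) -> List[Row]:
    """Scatter the same query to every graph concurrently, then gather the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Scatter: one task per graph, keyed by the graph's name.
        futures = {name: pool.submit(lambda fn: list(fn(query)), fn)
                   for name, fn in sources.items()}
        results: List[Row] = []
        # Gather: merge rows in source order, tagging each with where it came from.
        for name, future in futures.items():
            for row in future.result():
                results.append({"graph": name, **row})
    return results
```

Because every graph receives the same query and works in parallel, adding more graphs (or clusters) adds capacity rather than a longer serial chain, which is the intuition behind federation as a scale-out strategy.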
An Ops GUI, and Release Versatility
On the operational agility side, the new Neo4j Ops Center will provide a UI console "cockpit" for database administration, offering operational control and monitoring of various metrics across databases, server instances and whole clusters. Such tooling brings Neo4j into the world of mature databases (especially relational ones) that have long featured tools letting database administrators (DBAs) get their work done without having to drop down to the command line or acquire third-party tools. I don't mean that assessment condescendingly; the point is that non-relational databases, including graph databases, are now coming into their own in terms of enterprise manageability. Other enterprise-class operational improvements include point-in-time recovery and differential backups.
Sticking with that enterprise theme, Neo4j is announcing an evolution in release cadence strategy and, at the same time, the ability for enterprises to minimize disruption when new releases occur. To that end, Neo will be moving from a cadence of a couple of major releases each year to a system of continuous releases, approximately monthly in frequency. In Neo’s managed cloud platform, AuraDB, those releases will be deployed automatically. For self-hosted Neo4j implementations, customers can opt into those releases or can elect to skip the minor ones. A forthcoming long-term support model will allow customers to continue to use older releases, with full support from Neo. When a customer does decide to adopt a new release, new “any-to-any” zero-downtime rolling version upgrades will be possible.
Teach a DBA to Fish
The best thing a database vendor can do is to empower non-black-belt customers to adopt use cases and capabilities that would previously have required advanced expertise and the fearless demeanor to take on what were high-risk implementations. When a platform vendor makes it feasible for mere mortals to take on these scenarios, it makes the platform more powerful, and it makes customers more confident and enthusiastic about using it. Assuming Neo4j is describing these new innovations in a matter-of-fact fashion, and not just aspirationally, Neo4j 5 looks to be one of those watershed releases.
Neo4j says Nodes, its annual developer conference, to be held Nov. 16-17, will have lots of content on version 5. The company will also host a webinar on Dec. 8 dedicated to the subject.