The Enterprise of the Future Will Need Connected Big Data
“We are entering the connected age,” Emil Eifrem, Founder and CEO of Neo Technology, said in his keynote at the GraphConnect conference in San Francisco. The future “is highly, highly connected.”
Oh, and he just happens to have an app for that.
In 2027, 75 percent of the companies currently on the S&P 500 will no longer be on the list. What will keep the 25 percent on the list? Data.
“Data is the new oil,” Eifrem said, referencing a commodity that, along with railroads, made the millionaires of the Gilded Age. But it’s not just data.
While data in isolation is valuable, said Eifrem, “the ability to connect the data is extraordinarily valuable.” The most powerful database on the planet: the human brain. Its neurons connect to other neurons through synapses. And that looks very much like a graph.
What’s amazing, he said, is that the brain doesn’t just store data; it makes sense of it.
It’s hard to imagine now, he told the keynote audience of 1,200, but Google was the fifteenth search engine that came on the market back in the late 90s. What Google did differently, he explained, was to rank its search results; it took data core to its business and connected it.
Fundamentally, a database just stores and retrieves data. But the future, declared Eifrem, belongs to those who can connect the data and make it instantly useful to decision-makers, whether they be customers seeing recommendations in real time, CEOs seeing at a glance how suppliers are performing, or security experts seeing fraud that was previously invisible.
A graph database is different from the standard relational database in that it is structured like the human brain, Eifrem explained. It makes connections between the data that help you make sense of that data. “The fact that you can attach properties to the relationships sounds like a small thing, sounds like a simple thing, sounds like an obvious thing, but that is what makes the database expressive.”
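To make the idea of properties on relationships concrete, here is a minimal sketch in plain Python. It is not Neo4j's API; the `Node` and `Relationship` classes, the `STAYED_AT` relationship type, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    # The relationship itself carries data, not just the nodes it joins.
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: a customer who stayed at a hotel.
alice = Node("Customer", {"name": "Alice"})
hotel = Node("Hotel", {"city": "San Francisco"})
stay = Relationship(alice, "STAYED_AT", hotel, {"nights": 3, "rating": 5})


def long_stays(relationships, min_nights):
    """Return the relationships whose 'nights' property meets a threshold."""
    return [r for r in relationships if r.properties.get("nights", 0) >= min_nights]


matches = long_stays([stay], min_nights=2)
```

Because the number of nights and the rating live on the connection rather than on either node, a question such as "which stays lasted at least two nights?" can be answered by inspecting the relationships directly.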
It is this sort of contextualizing of the data that is the future of the Connected Enterprise. “If you have an enterprise where all of your data is connected, your supply chain is connected to your CRM, that’s connected to your marketing technologies, that is connected to your logistics, that is connected to your customers that is connected to your payment history, everything is directly or indirectly connected, that will be an extraordinarily powerful thing,” Eifrem stated.
The companies that do this are the ones that will remain in the 25 percent.
But How Do We Get There from Here?
In a press event following the keynote, Eifrem declared that the idea of the “one size fits all” database is over. The interesting question that data architects are now trying to solve is: what is a good fit? Relational Database Management Systems (RDBMS) hold their data in silos, and connecting data across silos is complicated and messy.
Neo is seeing businesses holding on to their RDBMS with silos and adding graph databases to store the connections across the different RDBMSes.
Scott Grimes, who is the eCommerce and CMS senior director/architect at Marriott Hotels, described how this works when company policy mandates the use of specific databases (e.g. IBM or Oracle): Over time, they ended up with data 10 levels deep.
The hotel chain started using graph databases to manage the relationships between the siloed data. “Each node is almost like a row,” Grimes explained, “so you can seamlessly follow that relationship with the data.”
Data retrieval went from 30 seconds to 25 milliseconds.
The key, said Grimes, is making the relationship a “first-class citizen.” Adding new relationships to a graph database is much easier than going into a SQL database and establishing foreign keys throughout the system.
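The contrast Grimes draws can be sketched with a toy in-memory graph in Python. This is an illustration under stated assumptions, not Marriott's system or Neo4j's API; the `Graph` class, its `relate` and `follow` methods, and the node and relationship names are all hypothetical.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        # node -> relationship type -> list of neighboring nodes
        self._out = defaultdict(lambda: defaultdict(list))

    def relate(self, start, rel_type, end):
        """Record a relationship; no schema has to be declared first."""
        self._out[start][rel_type].append(end)

    def follow(self, start, *rel_types):
        """Follow a chain of relationship types, one hop per type."""
        frontier = [start]
        for rel_type in rel_types:
            frontier = [n for node in frontier for n in self._out[node][rel_type]]
        return frontier


g = Graph()
g.relate("brand:A", "OWNS", "hotel:1")
g.relate("hotel:1", "HAS_ROOM", "room:101")
g.relate("room:101", "HAS_RATE", "rate:weekend")

# Three hops, expressed as a path rather than as three joins.
rates = g.follow("brand:A", "OWNS", "HAS_ROOM", "HAS_RATE")

# A brand-new kind of relationship is added later with a single call.
g.relate("hotel:1", "NEAR", "airport:SFO")
nearby = g.follow("brand:A", "OWNS", "NEAR")
```

In this sketch, introducing the `NEAR` relationship touches no existing data and requires no new key columns, which is the sense in which the relationship is treated as a first-class citizen.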
Indeed, the new clustering architecture is designed for the cloud mindset, said Eifrem, where you have elastic things that are easy to set up and tear down as you scale up and down.
Oh, BTW, Hardware Is Now Sexy
The bad news from the hardware side is that Moore’s Law is dead, asserted Doug Balog, GM of IBM Power Systems, during his keynote talk. Adding new hardware will no longer double capacity. Instead, Balog sees acceleration technology as bringing the next generation of performance improvements, which is why IBM is working with Neo to use graph databases to improve acceleration on IBM’s Power chips.
“I disrupted myself in hardware,” he said. It’s no longer just about scale, he declared, it’s about how to take advantage of that acceleration to rapidly drive insights. There is incredible tech disruption on the lowest level of the stack.
“Hardware,” Balog declared, “is an exciting place to be.”
Micro vs. Monolith
What is the difference between setting up a graph database in a microservices environment, as opposed to a monolith? The short answer, said Eifrem, is none, technically speaking.
The more nuanced answer is to view microservices as an accelerator or enabler. You are no longer forced to choose the exact same data model for each piece of the backend, he clarified. The architecture is much more agile. For example, you can put your products in a graph database, so your customers have access to real-time recommendations, and put finances in a tabular database.
The open question is whether there really is less complexity with a big microservice-based architecture at scale. Very few companies have done this (aside from Netflix), so it’s not clear yet. At scale, there are a lot more run-time dependencies that need to be managed, as opposed to the more static run-time dependencies found in more monolithic structures. Time will tell.
What is clear, at least to Eifrem, is that at scale, a graph is needed in order to make sense of the architecture. The next generation of startups, he told us, are using Neo4j to manage the dependencies out of the gate.
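As a rough illustration of treating service dependencies as a graph, the sketch below uses plain Python rather than Neo4j. The service names, the `depends_on` map, and both helper functions are hypothetical and stand in for whatever a real dependency inventory would contain.

```python
from collections import deque

# Hypothetical services: each key depends directly on the services in its list.
depends_on = {
    "checkout": ["payments", "cart"],
    "cart": ["catalog"],
    "payments": ["fraud-check"],
    "catalog": [],
    "fraud-check": [],
}


def transitive_dependencies(service, graph):
    """Breadth-first walk collecting everything a service relies on."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


def impacted_by(service, graph):
    """Services that directly or indirectly depend on the given service."""
    return {s for s in graph if service in transitive_dependencies(s, graph)}


checkout_deps = transitive_dependencies("checkout", depends_on)
catalog_impact = impacted_by("catalog", depends_on)
```

With five services the answers are easy to see by eye; the point of modeling the dependencies as a graph is that the same two questions stay answerable when there are hundreds of services.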
Graph databases can also support machine learning. Balog posited that machine learning is going to get so much better when we start focusing on the relationships instead of the data itself.
It’s easy to move from data that’s in a graph to machine learning, said Eifrem. “Of course,” he said, “I drink my own Kool-Aid.”
Even so, it makes sense that a graph database, which mimics the structure of the human brain, would contribute greatly to machine learning, where the interaction between objects is critical.
Balog pointed out that a stumbling block for self-driving cars is the need for a contextual understanding of objects, which depends, at least in part, on the relationships between things. For example, it’s easy to put a tree in as a database object, but one also needs to identify the characteristics of the tree, namely that it is stationary and will not jump out in front of the car. And if you’re headed straight towards it, it will not jump out of your way.
Ah Yes, But How Do You Convert the C-suite?
Most companies will stay with their relational databases until the pain is so huge they finally realize another solution is needed, Eifrem acknowledged.
A lot of companies are scaling up and moving into levels of data thought impossible a few years ago. For example, the optimal limit of joins in SQL is three, which is rapidly becoming outmoded.
Marriott didn’t start looking for a new solution until the data got 10 levels deep, said Grimes. But the faster processing speeds were a key factor in the ROI and getting the go-ahead. Now the development team can’t imagine doing the work without Neo4j.
Balog said that companies can no longer use the old manufacturing approach of throwing hardware at the problem. Programming and new software are required to make further gains in efficiency.
The advantage is the simplicity of graph databases and the reduced time to market, he said, pointing out that you can get a full Neo4j instance running in production in a matter of weeks.
Want to Get Involved?
There are three avenues of open source contribution if you want to be part of Neo’s connected future.
The source code for Neo4j is open source. Neo has also built an open repository of Neo4j procedures called APOC, which currently has over 200 procedures. Eifrem said that while this is a great way for the community to contribute procedures, it also provides the basis for new features built into future releases of the core product. For example, the feature released last week that allows you to add schema to your graphs started out in APOC.
Neo’s next big project is extremely ambitious. OpenCypher is a community-based project whose goal is to create a free and open, vendor-neutral language for graphs (think of it as SQL for graphs, said Eifrem). Launched a year ago, it is Apache licensed and based on Cypher, which has been around a long time.
IBM is a sponsor of The New Stack.