Walking In a Graph Database and the Meaning it Holds
The graph database is still not quite understood. For that matter, we’re still adapting to a post-relational world — a new era that relates to the increasingly horizontal and distributed nature of application architectures. But there is something more that needs to be understood, and that’s the way databases affect our everyday life. We can’t truly understand the meaning of a graph database until we see how services change the way we view how we live and work.
Take a map for example: it has any number of points on it. Infinite “nodes” that represent land masses and the “things” on them. Until recently, the connections of these things meant nothing to us. A pole is a pole is a pole. But not really, anymore. A pole is almost better understood as a data object which can be viewed on an online map, or now increasingly, as something that connects to other things. Which gets us to the next question:
How Does This Change Our View of a Database?
Until now, a database was seen as the thing that managed bank records, receipts, bills, and data about people, animals and so on. But with graph databases, it’s a bit like we are in the database itself. We can see a pole. We can see a street lamp. But we will soon be able to see them in multiple other dimensions. As we walk, the data changes with the differing relationships of the various objects in our path. It’s kind of like walking through a graph, linking us to each node across a data fabric.
Through this lens comes a new view about applications. We are starting to see how the companies that build data infrastructures find context in the data and use it to build new applications. The process is an infinite loop. Data is collected, analyzed, patterns get discovered and applications are developed. Graph databases speed this process by connecting seemingly unrelated points of information, such as with bank fraud, described by Philip Rathle last spring on The New Stack:
Unlike most other ways of displaying data, graphs are designed to express relatedness. Graph databases can uncover patterns that are difficult to detect when using traditional representations, such as tables. An increasing number of companies are using graph databases to solve a multitude of connected data problems, including fraud detection.
The Web is viewed by many technologists as a database. Now the database is turning inside out, exposing itself to the real world. A relational database may work for processing structured, highly uniform datasets that are managed in an isolated manner, as Matt Asay points out in a ReadWrite post, referring to a report on the IoT by Machina Research. With “sensors, devices and gateways by the millions,” there is the need for new kinds of databases that can process the meaning of a pole, house or tree.
Thinking too much about this can send one down an endless rabbit hole. The work today is on developing the databases that make the future now. That means sticking to the reality of thinking through distributed architectures and how graph databases function in those environments.
Graph Databases in a Distributed Platform
Over the past several months it has become apparent that the dynamics of a distributed, connected universe are delivering a new generation of graph database providers. Further, hybrids are emerging that connect data platforms with graph database technologies. Joining graph databases with a distributed platform makes sense from a scaling perspective. As graph databases expand, handling high-volume read and write operations at scale will be the next challenge for the maturing market.
For example, integrate Cassandra’s NoSQL platform with TitanDB, the graph database DataStax recently gained through its acquisition of Aurelius, build a node network with it that scales horizontally, and you begin to see this hybrid emerge.
Cassandra, with its masterless architecture, is designed to withstand data center outages with no disruption in service. Netflix, a DataStax customer, uses Amazon Web Services (AWS). In 2012, AWS had an outage at one of its data centers during a big storm. None of Netflix’s customers lost service, which is in part attributed to Cassandra’s platform capabilities. With Cassandra, TitanDB can be implemented across any data center, giving customers a stability they can build into their operations.
TitanDB is designed to scale out across a distributed cluster. With Cassandra, it has a distributed storage engine to scale the database as it adds nodes. In comparison, Neo4j, the leading graph database, scales up and has a master/slave architecture, which requires more powerful machines for scaling. Storage distributed across a cluster is TitanDB’s advantage.
Neo4j and Titan take fundamentally different approaches, said Emil Eifrem, founder of Neo4j. He said Neo4j decided to build a graph database from scratch for reasons of performance and reliability. Tradeoffs come when building a graph database on top of a database not built for graphs. Neo4j built its own graph database management system and a query language for it. For the record, the company announced Neo4j 2.2 this week with added read and write scalability.
TitanDB takes the other approach, building a non-native graph database on top of another database. Eifrem said databases like Cassandra are great at handling large volumes and ingesting lots of data. But he questions how fast queries run and thus how much they satisfy real-time business needs.
OrientDB, according to its web site, has a hybrid document-graph engine. It is built on SQL but adds extensions to enable tree and graph manipulation, while also offering a multi-master and sharded architecture to overcome the master-slave bottleneck in write operations. Their site has a detailed comparative analysis of the differences between Orient and Neo4j.
Appbase: Graph Database as a Service
Meanwhile, new database contender Appbase is hoping its graph database as a service (DBaaS) offering will entice app makers to use its service when building their data engines for search and real-time social networking products.
Appbase offers a DBaaS product that allows developers to quickly scale up real-time databases and back-end services, making access to the data available via a single API. Appbase believes it is unique in how it is storing data: in graph format. This means that instead of rows and columns, data is organized by vertices and edges.
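To make the contrast with rows and columns concrete, here is a minimal sketch of data organized by vertices and edges. It is an illustrative in-memory structure only; the `Graph` class, its method names and the sample data are assumptions made for this example and are not Appbase’s API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny in-memory graph: vertices hold properties, edges hold labels."""
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (source id, label, target id)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target):
        # Refuse edges that point at vertices that do not exist yet.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append((source, label, target))

    def neighbors(self, vertex_id, label):
        """Return ids reachable from vertex_id over outgoing edges with this label."""
        return [t for s, l, t in self.edges if s == vertex_id and l == label]


# Hypothetical sample data for a small social app.
graph = Graph()
graph.add_vertex("alice", kind="user")
graph.add_vertex("bob", kind="user")
graph.add_vertex("post-1", kind="post", text="hello")
graph.add_edge("alice", "follows", "bob")
graph.add_edge("bob", "wrote", "post-1")

print(graph.neighbors("alice", "follows"))  # ['bob']
```

In a table-based model the “follows” and “wrote” links would typically live in separate join tables; here each link is a first-class edge that can be followed directly from a vertex.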
“If you were to model a social network in an SQL database, doing it in real-time wouldn’t be possible,” said Appbase Founder Siddharth Kothari. “Capturing objects in two different tables just isn’t possible. When you think about the whole data model as a network, vertices apply to how many rows. Edges means every object is linked by an edge. So as you grow, you need more vertices and more edges in your graph.”
Neo4j thinks of the graph database as a computational layer, Kothari said. What Appbase tries to do is think of a graph as a data structure.
While allowing flexibility and quick scale-up, Appbase’s offering does come with some tradeoffs. For starters, as a DBaaS, Appbase will make its own decisions on how to store end users’ data. As their website reads, users cannot pick the databases themselves: “Our goal is to create a database service that abstracts the underlying internals — so we can pick the best of everything and offer it as a single, consistent API. At the same time, we maintain transparency about our stack and offer data-modeling guidelines that work best with it.”
Similar services, like GrapheneDB, are built as a DBaaS on Neo4j’s open source tools, meaning business users may feel more in control over such decisions.
Is the Market Ready to Use Graph Databases?
Search trends definitely point to the growing interest in graph databases.
But respected industry services like Orchestrate are seeing less interest in their graph database search queries than some would expect.
“Graph workloads will be an area of interest for organizations contending with ever-expanding and ever-diversifying data sets. Graph databases provide ways to query data that can power sought-after features like recommendation engines. They can also be used for more pedestrian use cases, like describing relationships between items,” says Ian Plosker, CTO of Orchestrate, which has been offering a graph search query function on its Database-as-a-single-API service since 2013. “We provide a solid graph API that meets the vast majority of our users’ needs, but to be honest, graph is the least-requested feature from our developer community. Our biggest customers and workloads come to Orchestrate for our JSON document-oriented model with powerful search and time-series queries, the sorts of queries massive scale ‘Internet of Things’ projects require.
“Developers make use of our graph API to describe relationships between items. For example, a graph relation might be used to describe a friendship between two users, or a user’s ownership of a document. This allows developers to query all the friends of a user, or all of the documents a user owns. We don’t have limits on the number of relationships in a data set.”
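The kind of relationship queries Plosker describes can be sketched in a few lines. This is not Orchestrate’s API: the `relate` and `related` helpers, the key format and the sample users and documents are all hypothetical, chosen only to show friendships and ownership stored as relations and then queried.

```python
from collections import defaultdict

# (source item, relation kind) -> set of target items
relations = defaultdict(set)


def relate(source, kind, target, mutual=False):
    """Record a relation; mutual=True stores it in both directions."""
    relations[(source, kind)].add(target)
    if mutual:
        relations[(target, kind)].add(source)


def related(source, kind):
    """Return every item linked from source by this relation kind, sorted."""
    return sorted(relations.get((source, kind), set()))


# Hypothetical data: friendships are mutual, ownership is one-way.
relate("user:ann", "friend", "user:raj", mutual=True)
relate("user:ann", "friend", "user:li", mutual=True)
relate("user:ann", "owns", "doc:budget")
relate("user:raj", "owns", "doc:notes")

print(related("user:ann", "friend"))  # ['user:li', 'user:raj']
print(related("user:ann", "owns"))    # ['doc:budget']
```

Indexing by the pair of item and relation kind keeps both example queries, all friends of a user and all documents a user owns, to a single lookup.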
Appbase, a Techstars alum, is so far finding that its product is mostly being used by other startups and entrepreneurs. But if the company is to solidify its position in the market, it will need to move beyond the startup use case and reach the enterprise.
“Startups are the quickest to jump on the new trends, and they are the ones that realize these problems more quickly,” said Kothari. “But enterprises are much more like silos; how we will reach out to them is not something we have plotted out already. We are more interested in taking it organically.”
Despite not yet having an open source roadmap, Kothari confirms that end users will not be locked in to Appbase. Data is accessible via the REST API. The database has no built-in export mechanism, but a programmer can retrieve the data through that API.
Graph Databases on the Road to Maturity
While the graph database market is strengthening (leading analysts such as Forrester predict a 25 percent industry uptake by 2017), there is still a long way to go before it becomes an established database offering.
Database expert Curt Monash is an industry analyst and founder of Monash Research, which publishes the industry blog DBMS2. While optimistic, he still sees graph databases as having a fair way to go. The split between the leading products illustrates this: Neo4j has native graph storage and native graph processing, while TitanDB can be used across a distributed infrastructure to process data at a much higher volume.
“The closest thing to a standard graph language is SPARQL, which is meant for a limited set of graphs and graph use cases only,” Monash said. “So yes, it would be nice to see agreement on a more expressive graph language.”
The relative youth of graph databases points to the emerging importance of TinkerPop, the new Apache Software Foundation Incubator project that aims to standardize graph databases. Members participating include Neo4j, Titan, MongoDB and FoundationDB. It’s this kind of effort that will help graph databases mature.
Generally, the two things Monash looks for are:
- Flexibility as to the data types, or indeed document types, labeling nodes and edges.
- A rich operator set — centrality, path length, and so on.
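The two operators named in the list above, centrality and path length, can be illustrated with a short sketch. It assumes the simplest variants (degree centrality and unweighted shortest-path length via breadth-first search) on a made-up undirected graph; it is not drawn from any particular graph database’s operator set.

```python
from collections import deque

# Undirected toy graph as an adjacency mapping (illustrative data only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def degree_centrality(graph):
    """Fraction of the other vertices each vertex is directly connected to."""
    others = max(len(graph) - 1, 1)  # avoid dividing by zero on a single vertex
    return {v: len(nbrs) / others for v, nbrs in graph.items()}


def path_length(graph, start, goal):
    """Number of edges on a shortest path, or None if goal is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, dist = queue.popleft()
        if vertex == goal:
            return dist
        for nbr in graph[vertex]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


print(degree_centrality(graph)["b"])  # 0.75
print(path_length(graph, "a", "e"))   # 3
```

Richer operator sets would add measures such as betweenness or weighted paths; the point here is only that these questions are asked of the connections rather than of individual records.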
The future is predictable, but only insofar as we learn how disparate points of information are related. Without that capability, traveling through data will be a muddled, cumbersome experience.