All of a sudden, the thing you want to be, if you’re a component of software infrastructure lurking around a data center, is a persistent data store. A container or virtual machine can be as stateless and ephemeral as it wants to be, like a presidential campaign promise; meanwhile, you’re the stateful part that everyone can rely on. No plugins, no tunnels, no network functions virtualization required. Was it really this easy all along?
So it is not surprising that Red Hat has expanded its Red Hat JBoss Data Grid, allowing the one-time Java caching software to work as a general purpose distributed cache, NoSQL database, or event broker.
Back in 2004, the JBoss developer community created JBoss Cache, a way for Java programs running on various servers in a cluster to share state in a replicated cache. The project eventually gave rise to something called Infinispan, a cache whose API enables it to be used as a networked, highly available key/value store.
In 2011, Red Hat began producing a commercial version of Infinispan, dubbed JBoss Enterprise Data Grid. At this point, it was a highly scalable data tier, with high resilience and transactional data access. It was touted as the “official” version of Infinispan. One of its principal contributors at the time, Manik Surtani, described the concept behind both products as appealing to financial services, specifically as a way to devise risk engines without introducing the level of latency that a typical database would introduce.
While JBoss Cache’s original purpose, Surtani said at the time, was to provide a convenient, and later scalable, memory store for databases, it turned out that customers were using it without the databases attached, as a data grid — and without mentioning Java in the least. That’s not how JBoss Cache was intended to be used, and he noted that folks associated with JBoss let their users know they were using it wrong.
“But that’s kind of where the penny drops,” he continued, “saying, there is a need for a data grid, people clearly want it, and they’re using whatever they have, even though whatever they have is inappropriate.”
Inappropriate or not for 2011, JBoss Data Grid 7 is now the culmination of the project that started out as a little database cache. And Red Hat is now fully embracing the in-memory aspect of the product as its purpose in life.
“Whether you use a disk as your data management storage layer or you want to use RAM… because your application is loaded up in RAM, guess what? Your data that your application is going to work on, should also be in RAM as well,” said Syed Rasheed, Red Hat’s director of middleware solutions marketing, in a discussion with The New Stack.
“Data Grid basically provides you a way to manage your application data in memory. Traditionally, a distributed cache has been a very popular use case. However, people use big data that does not conform to relational models really well. They come up with NoSQL databases. But Data Grid is also a NoSQL database that allows you to store any kind of data, without the restriction of applying a schema first, or conformity to the data.”
System designers have known for several decades that in-memory caches are responsible for orders of magnitude greater performance, compared to reading data from disk. But with respect to large — and later, huge — tabular or regulated data sets, RAM used to be a precious commodity, so only small portions could be loaded into memory at any one time. Retrieving a plurality of records that met a set of criteria often involved generating a data set in memory, then fetching a few dozen or a hundred records, and iterating through each one in sequence. This is how Java and Visual Basic applications worked with databases for much of the 1990s, and into the 2000s. And if your client/server application has a copyright date in that range, that’s probably how it still works today.
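The batch-and-iterate retrieval pattern described above can be sketched roughly as follows. This is only an illustration: Python's built-in `sqlite3` module stands in for a JDBC or ODBC connection, and the `items` table, `fetch_matching`, and `build_demo_db` are hypothetical names invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create a small throwaway database (hypothetical schema) to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, float(i)) for i in range(1, 251)],
    )
    conn.commit()
    return conn


def fetch_matching(conn, min_price, batch_size=100):
    """Yield rows that meet a criterion, pulling them from the cursor in small batches."""
    cur = conn.cursor()
    cur.execute(
        "SELECT id, price FROM items WHERE price >= ? ORDER BY id",
        (min_price,),
    )
    while True:
        batch = cur.fetchmany(batch_size)  # only a slice of the result set at a time
        if not batch:
            break
        for row in batch:  # the application walks each record in sequence
            yield row
```

Only `batch_size` rows are held by the application at any moment, which is the trade-off the paragraph describes: modest memory use, paid for with repeated trips back to the database.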
When virtualization made in-memory databases feasible for the first time, few were willing to accept the news that vast memory caches, and the first data grids, boosted performance by five or six orders of magnitude over their JDBC or ODBC-oriented counterparts.
But the in-memory concept called into question the continued viability of the cache-oriented data retrieval model. The benefits of in-memory cast doubt on the way every application bound to a relational database had ever worked. Apparently, well before the issue was settled in the minds of vendors and software producers, the users had already come to a clear decision: Data grids, technically speaking, work.
“People want to take immediate action, as soon as something changes in the data,” explained Red Hat’s Rasheed. “One of our customers is using Data Grid because they want to monitor real-time change in pricing of their catalog, and they want to correlate that event with, who looked at this item in the past five minutes, so they can alert them in real-time. ‘Ah, there’s a price drop in the item you just looked at in your previous session, five minutes ago. Do you want to buy it?’ It’s that kind of event broker.”
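To make Rasheed's example concrete, here is a minimal, in-process sketch of that kind of correlation: a price change is matched against views of the same item within a five-minute window. It is not Data Grid's API. The class `PriceDropBroker` and its methods are hypothetical, and the sketch assumes views arrive in timestamp order.

```python
import time
from collections import defaultdict, deque


class PriceDropBroker:
    """Toy event correlator: who viewed an item shortly before its price dropped?"""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.prices = {}                   # item_id -> last known price
        self.views = defaultdict(deque)    # item_id -> deque of (timestamp, user_id)

    def record_view(self, user_id, item_id, now=None):
        now = time.time() if now is None else now
        self.views[item_id].append((now, user_id))

    def update_price(self, item_id, new_price, now=None):
        """Store the new price; return users to alert if it is a drop."""
        now = time.time() if now is None else now
        old_price = self.prices.get(item_id)
        self.prices[item_id] = new_price
        if old_price is None or new_price >= old_price:
            return []  # first price seen, or not a drop: nobody to alert
        recent = self.views[item_id]
        # Evict views that fall outside the correlation window.
        while recent and recent[0][0] < now - self.window:
            recent.popleft()
        # Alert each recent viewer once, in the order they first appeared.
        users = []
        for _, user_id in recent:
            if user_id not in users:
                users.append(user_id)
        return users
```

A caller would invoke `record_view` for each page view and `update_price` for each catalog change, then send the alert to whatever list of users comes back.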
Remember, we started this article talking about a temporary storage unit for data, on its way from a relational database or data warehouse into a display gadget such as an on-screen input form. Now it’s an event broker, or at the very least, a platform where the complex evaluation of data may take place in real-time. And not just with Java, but with any number of languages by way of APIs.
With the Data Grid, “You can use clients in virtually any language: we support Java, C#, C++, Python, Ruby on Rails,” said Rasheed. There are a handful of API styles supported, he added. One is REST, and another is the memcached API, which addresses a data store through a small set of simple commands such as get and set.
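For readers unfamiliar with the memcached style of access, the sketch below builds the raw bytes of a `set` and a `get` request as defined by the standard memcached text protocol. The helper names are hypothetical, and the article does not say which memcached commands or options Data Grid's endpoint accepts, so treat this as an illustration of the wire format only.

```python
def memcached_set(key, value, flags=0, exptime=0):
    """Frame a memcached text-protocol 'set' request as bytes.

    Layout: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    """
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("memcached keys must be non-empty and contain no whitespace")
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def memcached_get(key):
    """Frame a memcached text-protocol 'get' request as bytes."""
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("memcached keys must be non-empty and contain no whitespace")
    return f"get {key}\r\n".encode("ascii")
```

Because the protocol is plain text over a socket, almost any language can speak it, which is part of why it is a common choice for cross-language cache access.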
It’s here where the architecture makes yet another progressive evolutionary step. It’s become an event broker, which means it’s given the role that used to be attributed to entire data warehouses. But from here, it becomes a persistent data store platform for new classes of applications.
“One problem with the classic relational database — which is the single-record data store — is that it’s the ultimate monolith when it comes to microservices architecture,” said Rich Sharples, Red Hat’s senior director of middleware product management. “It’s what we describe as an anti-pattern. You can decompose and break services into smaller, atomic fragments. But if they’re all talking to the same database, and that database has a big schema, you’ve broken everything. You may as well go back to a monolithic architecture.”
Now, Red Hat is asking developers to consider Data Grid — version 7 of which was released last month — as a candidate for a persistent data store. As a container-based store, it could provide a stateful layer alongside stateless services, placing JBoss Data Grid in direct competition with one of the key features being touted by Mesosphere for DC/OS, in the wake of Apache Mesos declaring its 1.0 release.
“The idea is, Data Grid becomes your storage layer,” advised Rasheed. “Data Grid will also manage how and when it will write data back to disk, or keep data active in memory. It handles persistence. So when a developer [thinks], ‘Oh, I need to write the data back to the disk,’ why do you care? This provides a level of abstraction that makes the process a lot easier.”
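The idea Rasheed describes, a store that keeps data in memory and decides for itself when to write to disk, can be sketched in a few lines. This is a toy under stated assumptions, not how Data Grid implements persistence: the class `WriteBehindStore`, the JSON file format, and the "flush after N writes" policy are all invented for illustration.

```python
import json
import os


class WriteBehindStore:
    """In-memory key/value store that writes itself to disk on its own schedule."""

    def __init__(self, path, flush_every=3):
        self.path = path
        self.flush_every = flush_every   # hypothetical policy: flush after N writes
        self.data = {}
        self.dirty = 0                   # writes not yet persisted
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def put(self, key, value):
        # The caller only ever touches memory; persistence is the store's concern.
        self.data[key] = value
        self.dirty += 1
        if self.dirty >= self.flush_every:
            self.flush()

    def get(self, key, default=None):
        return self.data.get(key, default)

    def flush(self):
        # Write to a temporary file, then swap it in, so a crash mid-write
        # does not leave a half-written store behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
        self.dirty = 0
```

Application code calls only `put` and `get`. When the data reaches disk is hidden behind the store, which is the separation of concerns the quote is pointing at.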
Cover image of William S. Burroughs’ first 1891 glass-sided adding machine in the public domain.