“What we’re talking about, in opening up this continuous analytics, are these operational types of applications that, until now, were somewhat separate,” says Jack Norris, MapR’s CMO, in an interview with The New Stack. “If you look at classic environments, you’ve got production flows that are loaded into operational data stores, which in turn are loaded into a data warehouse; and you’ve got a long time lag between the collection of that data and the ultimate analytics.
Going Home to OJAI
Today, a cottage industry has already emerged in the category of JSON-centric data warehouses. For better or worse, JSON has become an all-purpose expression format for data in transit. Because it’s rigidly formatted, the data JSON expresses is also highly normalized. So for ad hoc reporting, the job of moving data into a format to which a NoSQL database can effectively apply analytics is not very complex — just very big.
One company, jSONAR, was founded on this concept alone [see this PDF white paper for details], producing a JSON-only data warehouse system called SonarW, for NoSQL databases to utilize normalized data from Hadoop.
MapR’s innovation would effectively cut SonarW off at the pass, producing the JSON data that analytics tools can reference immediately, without the extra step.
“The ability to get to the analytics directly is key.” says Norris. “Whether that’s an appetizing dashboard for understanding what’s going on with the ad campaign now, or a recommendation application engine incorporated into an online retail environment — while the customer is personalizing her experience — all those require a real-time database. The fact that now you can do those applications directly against the JSON files is a huge advantage.”
The latest MapR DB release will utilize the Open JSON Application Interface (created, in part, just so they could use the acronym OJAI) for accessing an in-memory document structure via APIs. In a blog post Tuesday, MapR principal software engineer Bharat Baddepudi explained how a typical JSON in-memory document may appear, and how API function calls can be crafted to perform the typical CRUD (create, read, update, delete) operations that would be performed in an ordinary data repository.
Writes Baddepudi, “OJAI includes a backend document store interface referred to as a table, which is used to insert and retrieve documents and perform other such CRUD operations. Each such user document is stored as a row in the table, which is accessed using a unique row key.”
A JSON data set is expressed as a document using syntax that can be easily parsed by both humans and language interpreters. So if in one such document, field names are given to each of the properties or columns in a table, then the values for each of those fields may be written to using an API method
.set, placed to the
MapRDB object, in which the field name and setting are passed as parameters. Updates to individual records may be performed as “mutations” to memory, addressing the contents of memory as though they were arrays with numeric indices.
It’s a deceptively simple system, but here’s the point: Certainly no greater amount of effort, and possibly less effort, is expended creating the scripts for parsing JSON data being supplied directly from MapR DB, than producing a similar set of scripts for MongoDB. But since you’re addressing the database directly and not a rendering of it, you get all the benefits of Hadoop real-time processing.
Reconciliation Time Again
There’s a potential architectural shift that could emerge from this mode of processing. Real-world database applications that, in the past, depended on operational analytics and reporting functions to be triggered functionally, may now assume those functions are already done. Perhaps in later revisions, these applications can simply address the results of continually updated variables in memory.
MapR’s Jack Norris speaks to that point: “In our platform, we currently support table replication, and now JSON document replication, not only from the data’s perspective but in a preferred format. So you can consume it as a search index — you can have an application that is search-based, in sync with the same data space as your database table. That way, when you have a customer-facing application, there’s not a batch dependency of when was the data loaded, so that customers are getting different views of their support tickets or bills.”
Norris also perceives a possibility where up-to-the-millisecond operational analytics will become critical — for example, real-time fraud detection, where the operational logs produced by servers in retail outlets scan for unusual activity 24/7/365.
“How do you simplify the flow of data within an organization? How do you simplify the applications, so that the cycle between the data creation and the data collection is compressed?” Norris asks, admittedly rhetorically. “I think it’s increasingly clear: the differences in the platforms that support, to a much greater degree, this ability to do real-time difference functions, impacting business as they happen.”