Basho’s new data platform reflects the variety of services available to developers building applications on open source technologies, which may encompass different databases, analytics tools or search offerings. Basho is catering to this growing movement with a data platform that integrates with Apache Spark, Redis and Apache Solr, the open source search technology.
At the Gluecon conference last week, Tyler Hannan, Basho’s director of technical marketing, sat down with The New Stack founder Alex Williams for a demo of the data platform, showing how it simplifies the deployment and management of the technologies behind modern, big data applications. Basho is the creator and maintainer of Riak, a distributed NoSQL database and object storage solution.
Companies have adopted NoSQL technologies for point solutions, said Tyler in the demo.
“As enterprises and startups alike saw the value in that adoption and in those deployments, they began to deploy them in multi-model solutions, with people leveraging both key-value stores and object storage alongside each other. And then they began to adopt more technology components.”
Basho’s Data Platform core services can be thought of “as a framework that enables scalability, distribution and fault-tolerance for a variety of resources,” he said. These are the core services that manage the replication and synchronization between storage instances — such as Riak KV and Riak S2 — and service instances.
Tyler displayed a set of instances that ran the Basho Data Platform. He issued a Riak admin command — a cluster status — which showed three instances of Riak joined together in a cluster. A Riak platform admin command returned not only what services were running, but also what services were available to deploy.
Basho’s Apache Spark Add-On integration offers the ability to “write it like Riak, analyze it like Spark,” offering a deeper analysis capability.
For reference, Apache Spark is in-memory data analytics software that has proven enormously popular. In November, Datanami cited Google Trends data showing search interest in Spark surpassing that of Apache Hadoop, the well-known distributed data processing framework. Since then, Spark has continued to show its strength.
Tyler pointed to the “Key: (C) = Claimant” string in the output, which indicates Riak Ensemble, Basho’s consensus protocol implementation. Riak Ensemble allows large Spark clusters to be deployed without Apache ZooKeeper. “You can actually leverage Riak’s core claimant capabilities, provided by Riak Ensemble, to do that,” said Tyler.
Next, he kicked off a Spark job that read football data from Riak, ran a series of analyses against that data in Spark, and then persisted that data back to Riak, demonstrating how Basho’s Redis integration offers the ability to “write it like Riak, cache it like Redis,” enabling the reduced latency that Redis provides.
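The round trip Tyler demonstrated follows a simple pattern: read records out of Riak, aggregate them, and persist the result back. The sketch below illustrates that flow in plain Python; the dicts stand in for a Riak bucket and the aggregation stands in for the distributed Spark job. The names and data here are hypothetical, not the actual Spark-Riak connector API.

```python
# A "bucket" of football match records, as they might be stored in Riak KV.
# All names and values here are illustrative.
riak_bucket = {
    "match-1": {"team": "Arsenal", "goals": 3},
    "match-2": {"team": "Chelsea", "goals": 1},
    "match-3": {"team": "Arsenal", "goals": 2},
}

def analyze(records):
    """Aggregate goals per team -- the kind of job Spark would distribute."""
    totals = {}
    for rec in records:
        totals[rec["team"]] = totals.get(rec["team"], 0) + rec["goals"]
    return totals

# Read from the bucket, analyze, then persist the result back under a new key.
result = analyze(list(riak_bucket.values()))
riak_bucket["analysis:goals-per-team"] = result

print(result)  # {'Arsenal': 5, 'Chelsea': 1}
```

In the real integration, `analyze` would be a Spark job running across the cluster, with the connector handling reads from and writes to Riak.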
The integration takes the advanced caching capability of Redis and makes it enterprise-grade.
Then he added Redis as a service within the platform, along with the Basho Data Proxy, and started each with platform admin commands. Tyler entered the port number where the Redis proxy was listening, then used the Redis command line interface to get the desired data key from Redis.
“It’s actually done a read from Riak KV,” he explained, “but, importantly, it’s persisted that result to Redis.”
“If I’m leveraging Redis, I already have applications that know how to talk to Redis, and know how to interface with Redis,” said Tyler. “Instead of changing my tool-set entirely, I simply point that toward the Basho Data Platform, and I’m able to enjoy the benefits of Redis caching, with the persistence of Riak.”
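The behavior Tyler describes is a read-through cache: on a cache miss, the proxy reads the key from Riak KV and persists the result to Redis, so subsequent reads are served from cache. The sketch below illustrates that logic in plain Python, with dicts standing in for the two stores; it is an assumption-laden model of the pattern, not the proxy’s actual code.

```python
# Illustrative model of read-through caching: dicts stand in for the stores.
riak_kv = {"user:42": "Alice"}   # durable store (Riak KV)
redis_cache = {}                 # cache (Redis), initially cold

def proxy_get(key):
    """Serve from the cache if present; otherwise read the durable store
    and persist the result to the cache before returning it."""
    if key in redis_cache:
        return redis_cache[key], "cache-hit"
    value = riak_kv[key]          # read from Riak KV on a miss
    redis_cache[key] = value      # persist the result to Redis
    return value, "cache-miss"

print(proxy_get("user:42"))  # ('Alice', 'cache-miss') -- populates the cache
print(proxy_get("user:42"))  # ('Alice', 'cache-hit')  -- served from Redis
```

This is why existing applications need no changes: they speak the Redis protocol to the proxy, and the proxy transparently backs the cache with Riak’s persistence.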
In summary, Tyler said, “It’s available; it’s scalable; it’s simple. It takes all of these different applications that we deploy as part of a modern application stack, starting with Redis, with Apache Spark, with Solr, and it enables you to deploy and manage them very simply, and it scales.”
Basho is a sponsor of The New Stack.