Making MapR’s Data-Centric Platform More Elastic with Mesos and Yarn
MapR has a new release today that offers some perspective on the influences shaping new data-centric architectures. In particular, it shows the importance of Yarn and Mesos, and the continued role that Docker plays as the need grows for new patterns that reflect the forces of data gravity and container density.
That’s a lot to process, but it can be summed up in Myriad, a new open source project that MapR, eBay and Mesosphere launched last week. Myriad combines the orchestration of Mesos and the resource management of Yarn. It illustrates how data is dominating the way apps are developed. It also shows there is new thinking about how to make platforms more elastic so analytics can be more accessible and widely available.
“Real-time” is the more common term for how data is influencing the way we develop and manage apps. But for real-time to work, we need a better understanding of how data follows patterns that change according to the compute demands of the workload.
The concept speaks to why analytics is no longer just one aspect of a business. Analytics is represented across a company’s entire operation and much more deeply in application development. The question becomes how compute resources are routed to make analytics easy to do. The need, therefore, is for the data to be as close to the user as possible. And that is why the concept of data gravity is so pertinent. As data gains mass, it gets too heavy to move. Instead, the compute resources will increasingly swarm to the data and then move on to another task. “There is no separate analytics piece,” said Jack Norris, Chief Marketing Officer at MapR. Analytics “happens across the platform.”
The difference in delivering compute resources has always been about speed. With services like AWS Lambda, compute can be deployed in milliseconds, as Battery Ventures’ Adrian Cockcroft discussed last December at DockerCon. Here’s an image from one of his slides:
Information is moved at such scale that architectures will need to be built for big and fast data, Norris said.
“That is why we were so involved in the Myriad project. It was driven by the need to be more elastic.”
That gives MapR the underpinnings for how to sync data in different ways, such as bi-directionally across data centers.
The heart of the matter is the need to make data faster. That’s where Myriad comes into play. O’Reilly has a post on the topic, which is well worth the read. Here’s how it breaks down. Mesos can be described as a distributed kernel. It has the capability to scale the number and size of apps as well as the compute, storage and other resources that are needed for different types of workloads. Its core is in the kernel, which performs the main functions, such as allocating resources to apps. Yarn allocates resources within Hadoop. Combined, Mesos could orchestrate a Hadoop cluster, while Yarn would be used to spin up the cluster itself.
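One way to picture the split described above is as two levels of scheduling: a cluster-wide allocator hands out machine resources, and a Hadoop-level scheduler carves what it receives into job containers. The Python below is only a toy model of that idea, not Myriad, Mesos or Yarn code. Every name in it (Offer, NodeManager, ToyYarn, the node-a style host names and the resource sizes) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A resource offer from the cluster-level allocator (the Mesos role)."""
    host: str
    cpus: float
    mem_mb: int


@dataclass
class NodeManager:
    """Capacity handed to the Hadoop-level scheduler (the Yarn role)."""
    host: str
    cpus: float
    mem_mb: int
    containers: list = field(default_factory=list)

    def free(self):
        """Return the (cpus, mem_mb) not yet claimed by containers."""
        used_cpus = sum(c[1] for c in self.containers)
        used_mem = sum(c[2] for c in self.containers)
        return self.cpus - used_cpus, self.mem_mb - used_mem


class ToyYarn:
    """Hypothetical stand-in for a Yarn-like scheduler; not a real API."""

    def __init__(self, nm_cpus, nm_mem_mb):
        self.nm_cpus = nm_cpus
        self.nm_mem_mb = nm_mem_mb
        self.node_managers = []

    def consider(self, offer):
        """Level 1: accept an offer if it can hold one node manager."""
        if offer.cpus >= self.nm_cpus and offer.mem_mb >= self.nm_mem_mb:
            self.node_managers.append(
                NodeManager(offer.host, self.nm_cpus, self.nm_mem_mb))
            return True
        return False

    def run_container(self, name, cpus, mem_mb):
        """Level 2: place a container on the first node manager that fits."""
        for nm in self.node_managers:
            free_cpus, free_mem = nm.free()
            if free_cpus >= cpus and free_mem >= mem_mb:
                nm.containers.append((name, cpus, mem_mb))
                return nm.host
        return None  # no capacity left in this toy model


# Made-up offers: node-b is too small to host a node manager.
offers = [
    Offer("node-a", 8, 16384),
    Offer("node-b", 2, 2048),
    Offer("node-c", 8, 16384),
]
yarn = ToyYarn(nm_cpus=4, nm_mem_mb=8192)
accepted = [o.host for o in offers if yarn.consider(o)]
placements = [yarn.run_container(f"task-{i}", 2, 4096) for i in range(5)]
print(accepted)
print(placements)
```

In this sketch the fifth task finds no room, which is the point where an elastic setup would request more resources from the cluster-level allocator rather than leave the job waiting.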
What this brings MapR and its customers is a data-centric architecture. It embraces containers, it has portability and it respects data gravity. It puts the database operations and Hadoop analytics into one deployment.
Here’s a fuller representation for how MapR represents a data-centric architecture:
MapR is also introducing what it calls “quick start” solutions. These are templates that users can deploy. For example, one would allow companies to use Hadoop to examine several months of data and so better detect anomalies across a broader time span.
It’s arguable that Cloudera, Hortonworks and the rest of the Hadoop ecosystem trod the path for customers so they could get a taste of what analytics can offer. But does Hadoop seem just a bit outdated? In some respects, it does when compared to new plays such as Docker and CoreOS. But this is also why Myriad seems to have some legs. It shows what can be done when data center orchestration is combined with resource management of Hadoop clusters. That marries container ecosystems with the capabilities of data analytics platforms, a combination that, if packaged correctly, can have some impact on the ways we view the data center and its future as a provider of truly elastic services.