
MapR Brings Apache Spark and Apache Drill to Kubernetes

2 Apr 2019 9:00am

As with most abstractions, Kubernetes’ distributed application architecture solves some problems but presents others. For example, Kubernetes makes it possible to host an application across numerous locations, whether on-prem, in a public or private cloud, or in a hybrid combination thereof. It also allows for a truly cloud native approach to application development with continuous delivery. At the same time, the persistent, stateful applications that developers have become accustomed to are no longer available by default. Rather, containerized applications are inherently stateless: containers are intended to be ephemeral, created and destroyed quickly without regard for the state they hold, which is what enables truly continuous delivery.

Last year, data platform MapR tackled this problem by providing a persistent storage layer for Kubernetes, and now the company is extending that functionality with new integrations for Apache Spark and Apache Drill. With these new features, MapR lets users deploy Spark and Drill as compute containers orchestrated by Kubernetes, thereby allowing “end users including data engineers to run compute workloads in a Kubernetes cluster that is independent of where the data is stored or managed,” according to a company statement.

The new functionality is an extension of last year’s release that, according to the statement, will “make it easy to better manage highly elastic workloads while also facilitating in-time deployments and the ability to separately scale compute and storage.”

“We are extending the capabilities that we already know and have time tested on our MapR data platform and we are integrating them with Kubernetes. We are integrating through the APIs that Kubernetes offers and offering end users a way to run Spark containers and Drill containers as pre-packaged images,” said Suzy Visvanathan, MapR senior director, product management. “Kubernetes has a concept of autoscaling horizontally. Containers, by their inherent nature, only hold application information. Meaning if you have an application that edits a JPG image, that container will do only that. It does not hold any information about the OS or about the system libraries. They are almost residing in a different place. Because of that, containers are lightweight, elastic and portable, because they contain information only about that application and runtime.”

Visvanathan went on to explain that the new features allow an organization to scale its use of Spark and Drill in much the same way it can scale other aspects of containerized applications. For example, if a team needs to run a quarterly report requiring thousands of Spark jobs, the organization can spin up compute jobs on demand, whether on-prem or in the cloud, using MapR’s new integration, rather than giving the team a dedicated cluster.

“MapR is paving the way for enterprise organizations to easily do two key things: start separating compute and storage and quickly embrace Kubernetes when running analytical AI/ML apps,” said Suresh Ollala, senior vice president of Engineering at MapR, further explaining the integration in a company statement. “Deep integration with Kubernetes core components, like operators and namespaces, allows us to define multiple tenants with resource isolation and limits, all running on the same MapR platform. This is a significant enabler for not only applications that need the flexibility and elasticity but also for apps that need to move back and forth from the cloud.”
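To make the idea of tenants with resource isolation and limits concrete, here is a minimal sketch of how a tenant could be expressed using standard Kubernetes objects: a Namespace plus a ResourceQuota that caps the CPU and memory the namespace may consume. This illustrates the general Kubernetes mechanism only, not MapR's implementation; the helper name `tenant_manifests`, the tenant name and the quota values are all hypothetical.

```python
import json


def tenant_manifests(tenant, cpu_limit, memory_limit):
    """Build Namespace and ResourceQuota manifests for one tenant.

    Hypothetical helper for illustration; not part of any MapR tooling.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        # "hard" caps the summed limits of all pods in the namespace.
        "spec": {"hard": {"limits.cpu": cpu_limit, "limits.memory": memory_limit}},
    }
    return [namespace, quota]


if __name__ == "__main__":
    # Assumed example values: 16 CPU cores and 64 GiB of memory.
    for manifest in tenant_manifests("analytics-team", "16", "64Gi"):
        print(json.dumps(manifest, indent=2))
```

Each printed document could be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML. With the quota in place, workloads in one tenant's namespace cannot claim more than their share and starve another tenant.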

The company says that the integration will offer several concrete benefits, including isolating resources to prevent scenarios where applications starve each other of resources, allowing for a multitenant environment, the ability to run different Spark and Drill versions on the same platform, the ability to handle compute bursts, and the ability to dynamically scale Drillbits based on load and demand. The new integration will also, at its core, allow users to “deploy Spark and Drill container applications, along with MapR volumes, across multicloud environments, including private, hybrid and public clouds.”
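For a sense of what scaling "based on load and demand" means in Kubernetes terms, the sketch below reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of observed to target metric, rounded up. Whether MapR scales Drillbits with this mechanism is an assumption; the function name and the sample numbers are illustrative only.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the documented HPA rule:
    ceil(current_replicas * current_metric / target_metric).
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is close enough to the target
    # (0.1 is the documented default tolerance).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Assumed scenario: 4 replicas at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

In this scenario the ratio is 1.5, so the calculation asks for six replicas; once load falls back toward the target, the same formula shrinks the count again.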

Currently, the features are being tested in private beta and will be generally available sometime in mid-2019.

Feature image by DarkWorkX from Pixabay.
