Unravel Data Tackles Application Performance Management for the Big Data Stack

10 Jul 2017 3:00am

Does Big Data require its own application performance management (APM) platform? Unravel Data believes so, citing the complexity of managing distributed computing and parallel processing applications.

It set out to simplify all the operations in the Big Data stack.

“It’s really complex. There’s a stack: Hadoop, Spark, Impala, Cloudera, Hbase, Kudu… And there’s a huge lack of talent in market — no one knows how to do data engineering and data science really well,” said Unravel CEO Kunal Agarwal.

Older APM tools gather logs and metrics, create graphs and dashboards, then say, “This is what happened in your system. You go figure out where the problem is,” according to Agarwal.

Unravel decided that giving people hundreds of graphs and screens wouldn’t help them figure out what’s going on.

“We decided to use Big Data technology ourselves to help customers uncover problems — to tell them in plain English: This is the problem. This is what happened. Here’s how to resolve it,” he said.

Unravel gathers information from every layer of the stack, from applications down to infrastructure, then applies machine learning to do three things in one platform:

  • Optimize: Speed up your app, reduce resource footprint, etc.
  • Troubleshoot: Find the root cause of problems and resolve them.
  • Analyze: Determine who’s using what and how much. Can I plan my cluster better? Can I do charge-back and show-back for my multitenant cluster to allocate costs?

Different Problems

Agarwal maintains that Big Data requires its own APM tools.

With web apps, for example, you’re looking at transaction times, he said.

“In Big Data, the timing and kinds of problems are completely different. … Say you want to do a sales report. You’re breaking that work down into 100 equal chunks. You take a piece of code and run it on 100 parallel machines so this job can finish fast. The problems here lie in load and balance, parallelism, did one task on one machine do more work than other machines. So the problems are very different. The kinds of data you have to get for detecting and diagnosing these problems is different. And the resolution you can get for solving these problems is different. So you really have to design APM from the ground up for Big Data,” he said.
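To make the imbalance problem concrete, here is a minimal Python sketch that flags tasks running far longer than their peers. It is an illustration only, not Unravel's method: the function name `find_stragglers`, the 2x-median threshold and the sample durations are all assumptions made for this example.

```python
from statistics import median


def find_stragglers(task_seconds, threshold=2.0):
    """Return (task_id, seconds) pairs that ran longer than threshold x the median.

    The threshold is an arbitrary, hypothetical cutoff chosen for illustration.
    """
    typical = median(task_seconds.values())
    return sorted(
        (task_id, secs)
        for task_id, secs in task_seconds.items()
        if secs > threshold * typical
    )


# Hypothetical durations for 100 parallel tasks: one task received far more work.
durations = {task_id: 60.0 for task_id in range(100)}
durations[42] = 600.0

print(find_stragglers(durations))  # [(42, 600.0)]
```

The median is used rather than the mean so that a single slow task does not shift the baseline it is compared against.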

Unravel promises to discover and correct application performance and reliability issues, improve cluster utilization and help optimize storage by enforcing policies concerning which data to place in memory and which to move to archive.

For instance, with a recommendation engine job that needs to finish by 8 a.m. every day, people often see drastic declines in performance from one day to the next, he said. Unravel can analyze various runs and pinpoint what changed to cause the problem.

“It could be I’m processing a lot more data. Or it could be somebody changed configuration settings on the app, or even worse, it could be there was another job running on the cluster at the same time and it stole all the resources. Rather than you having to connect all the dots yourself, Unravel will sift through all this history, analyze the performance patterns and tell you in plain English: This is what happened today out of 100 possible problems, and this is how you can fix it,” he said.
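The three causes Agarwal lists (more input data, a changed setting, a competing job) can be pictured as a comparison between two runs of the same job. The Python sketch below is a simplified, hypothetical illustration and not Unravel's analysis: the record layout, the `diff_runs` name and the 25 percent tolerance are assumptions, and `spark.executor.memory` is used only as a familiar example of a Spark setting.

```python
def diff_runs(baseline, current, tolerance=0.25):
    """Describe what changed between two runs of the same job.

    Each run is a dict with hypothetical keys: "config" (a dict of settings),
    "input_gb" and "concurrent_jobs".
    """
    findings = []

    # Any configuration key whose value differs between the runs.
    for key in sorted(set(baseline["config"]) | set(current["config"])):
        old = baseline["config"].get(key)
        new = current["config"].get(key)
        if old != new:
            findings.append(f"config {key}: {old} -> {new}")

    # Numeric metrics that grew by more than the tolerance.
    for metric in ("input_gb", "concurrent_jobs"):
        old, new = baseline[metric], current[metric]
        if new > old * (1 + tolerance):
            findings.append(f"{metric} rose from {old} to {new}")

    return findings


yesterday = {"config": {"spark.executor.memory": "8g"}, "input_gb": 100, "concurrent_jobs": 0}
today = {"config": {"spark.executor.memory": "4g"}, "input_gb": 180, "concurrent_jobs": 0}

for finding in diff_runs(yesterday, today):
    print(finding)
```

A real system would weigh many more signals across many historical runs; the point here is only that each candidate cause maps to a concrete difference between runs.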

“People can only use this open source technology if it’s not rough around the edges, if it will return the answers to them in time to meet their SLAs.”

It also can be set to dynamically allocate resources to jobs running in multitenant environments and provide a hot, warm, cold analysis of your data to optimize storage.
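One common way to think about a hot, warm, cold analysis is to bucket data sets by how recently they were accessed. The Python sketch below assumes that approach purely for illustration; the article does not say how Unravel classifies data, and the `temperature` function and its 30-day and 180-day cutoffs are hypothetical.

```python
from datetime import date


def temperature(last_access, today, warm_after_days=30, cold_after_days=180):
    """Classify a data set as hot, warm or cold by days since last access.

    The cutoffs are hypothetical defaults chosen for this example.
    """
    age_days = (today - last_access).days
    if age_days >= cold_after_days:
        return "cold"
    if age_days >= warm_after_days:
        return "warm"
    return "hot"


today = date(2017, 7, 10)
print(temperature(date(2017, 7, 1), today))   # hot
print(temperature(date(2017, 5, 1), today))   # warm
print(temperature(date(2016, 7, 10), today))  # cold
```

Under a policy like this, cold data would be a candidate for archive storage and hot data a candidate for memory or faster disks.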

Growing Portfolio

Unravel Data claims to be the industry’s only APM platform offering a full-stack solution for Big Data, though it’s adding to its portfolio incrementally.

Founded in 2013, the Menlo Park, Calif.-based company launched with support for Hadoop and Spark and recently added support for the open-source stream processing platform Apache Kafka and the massively parallel processing (MPP) engine Impala backed by Cloudera. It also supports Amazon EMR.

Next up: NoSQL support. It has been working with companies in a private beta for four or five months because of the popularity of Cassandra in use with Spark and Hadoop stacks.

However, Alameda, Calif.-based startup MityLytics also is focused on APM for Big Data, providing predictive modeling and a simulation environment for testing out changes to Big Data workloads.

AppDynamics, which has a representative on Unravel Data’s board, has been touting a Kafka extension since the beginning of the year, and the engineering process for performance management has been a topic of discussion for some time.

Concurrent launched APM for Hadoop in 2015. It later changed its name to Driven and was acquired by data integration company Xplenty. OpsClarity, bought out earlier this year by Lightbend, also introduced an anomaly detection and event correlation solution for Big Data in 2016.

Use of ‘Sensors’

Unravel is deployed as its own server in on-premises installations, though some customers also use it as a SaaS offering.

“Of the 100 percent of the data we need to perform our analysis, 75 percent already exists in some form on your own cluster. Hadoop itself is gathering a bunch of this information — cluster information like CPU, IO, metrics from your cluster management and monitoring tools. We absorb all that data and the correlation and analysis are the value proposition for Unravel,” Agarwal said.

It uses what it calls sensors to fill in the rest.

“These sensors are different from agents in one way: We don’t need to deploy sensors as root as agents do. These sensors do not need to run 24 hours, putting unnecessary overhead on the cluster,” he said.

“Sensors can dynamically instrument any app, any user, all the apps in the cluster, for example — but only absorb the piece of information that’s missing at the frequency you define to send that information back to Unravel.”

This is proprietary technology. If you have 100 machines and your job is going to take 20 of them, the sensor code works like a JAR file, so it goes only on those 20 machines. You can deploy it with your Spark job, and it absorbs data only while the job is running; then it folds itself back and is gone from the cluster. It absorbs data only at the time you specify for the level of job you’re running, he said.

It’s all API-driven, so Unravel can talk to your current monitoring systems. If you want to monitor Oracle and Hadoop together, for instance, you can bring all your Oracle monitoring data into Unravel, or feed Unravel data to any messaging, monitoring or alerting system you already have.

Feature image via Pixabay.

