
Unravel Data Takes Big Data Application Performance Management to the Cloud

16 Apr 2019 12:12pm, by

Unravel Data made its name as an application performance management (APM) platform for the Big Data stack — Hadoop, Spark, Kafka, Impala, Cloudera, and so on.

More recently, it’s been helping customers take their Big Data applications to the cloud. For most, it’s a journey.

“Nobody turns off the on-premise switch one day and turns on the cloud switch the next day,” said Unravel CEO Kunal Agarwal.

“They go through this hybrid environment. They take a couple of years to go fully cloud and some companies are just hybrid for the rest of their lives because there are some use cases and some data sets that are just better to run on-premise.”

Unravel uses machine learning and artificial intelligence on top of log data to monitor application performance, detect anomalous behavior and offer recommendations to remedy the situation.

Unravel tells you, “This is the problem, this is what happened today, this is how you can resolve it,” Agarwal explained.

Whether on-premise or in the cloud, the problems can be similar: the application is slow today when it wasn't yesterday, or it's too costly to run.

The tools in the cloud might be similar, yet they're still different: you might run Amazon Kinesis instead of Kafka, Amazon EMR or Azure HDInsight rather than your own Hadoop or Spark clusters, or Redshift for fast SQL. The same holds true for the associated problems.

With the cloud, you pick your environment, load the data, then run analytics on top. The workload could be a batch process or a real-time process like IoT. It could be fast SQL lookups. With any of these, there is a common set of problems.

“If you have a failure, you’d have to dig into 20 different tools because the reason could be anywhere in the stack. It could be because your code was bad or because your containers weren’t sized properly. In a multinode environment, if you have 100 machines, it could be because machine 84 wasn’t working yesterday,” Agarwal said.

Unravel takes all that full-stack data and runs it through its analytics engine, whose algorithms detect these problems, pinpoint them, then offer a recommendation or an automatic fix.
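To make the “machine 84” scenario concrete, here is a minimal sketch of one simple way to pinpoint a misbehaving node: compare each machine's task failure rate with the failure rate of the rest of the cluster. This is not Unravel's algorithm, which the company doesn't describe here; the `(node, succeeded)` input format, the function name `suspect_nodes` and its thresholds are all hypothetical.

```python
from collections import defaultdict


def suspect_nodes(task_results, factor=3.0, min_rate=0.1, min_tasks=20):
    """Return nodes whose task failure rate stands out from the rest of the cluster.

    task_results: iterable of (node_id, succeeded) pairs (hypothetical format).
    A node is flagged when its failure rate exceeds both `factor` times the
    failure rate of all *other* nodes and an absolute floor `min_rate`.
    """
    failures = defaultdict(int)
    totals = defaultdict(int)
    for node, succeeded in task_results:
        totals[node] += 1
        if not succeeded:
            failures[node] += 1

    all_failures = sum(failures.values())
    all_tasks = sum(totals.values())
    suspects = []
    for node, n in totals.items():
        if n < min_tasks:
            continue  # too few tasks to judge this node fairly
        rate = failures[node] / n
        # Leave this node out so a single bad machine doesn't inflate the baseline.
        other_tasks = all_tasks - n
        other_rate = (all_failures - failures[node]) / other_tasks if other_tasks else 0.0
        if rate > max(factor * other_rate, min_rate):
            suspects.append(node)
    return sorted(suspects)


# Hypothetical 100-node cluster: node-84 fails half its tasks, node-7 fails one.
results = []
for i in range(1, 101):
    for j in range(30):
        failed = (i == 84 and j < 15) or (i == 7 and j == 0)
        results.append((f"node-{i}", not failed))

print(suspect_nodes(results))  # ['node-84']
```

The absolute floor keeps a node with one stray failure (node-7 here) from being flagged just because the rest of the cluster is nearly perfect.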

The analytics engine learns about your ecosystem. It learns the seasonality of your cluster. It will understand the priority of applications, based on user profiles or how often these jobs repeat. When it makes recommendations, it puts them within the boundaries of what it’s learned about your environment, Agarwal said.
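One common way to account for seasonality, sketched here under our own assumptions rather than as Unravel's method, is to judge a job's runtime against a baseline for the same hour of the week. The sketch uses the median and the median absolute deviation (MAD) so a few past outliers don't skew the baseline. The hour-of-week encoding (0 to 167, counting from Monday 00:00), the 3.0 threshold and the 1-second floor are hypothetical choices.

```python
from collections import defaultdict
from statistics import median

MAD_TO_STDDEV = 1.4826  # scales MAD to match the standard deviation for normal data


def build_baseline(history):
    """history: iterable of (hour_of_week, runtime_seconds), hour_of_week in 0..167."""
    buckets = defaultdict(list)
    for hour_of_week, runtime in history:
        buckets[hour_of_week].append(runtime)
    baseline = {}
    for hour_of_week, values in buckets.items():
        med = median(values)
        mad = median([abs(v - med) for v in values])
        baseline[hour_of_week] = (med, mad)
    return baseline


def is_anomalous(baseline, hour_of_week, runtime, threshold=3.0, min_scale=1.0):
    """Flag a runtime that is far from the typical runtime for that hour of the week."""
    if hour_of_week not in baseline:
        return False  # no history for this slot yet, so nothing to compare against
    med, mad = baseline[hour_of_week]
    scale = max(MAD_TO_STDDEV * mad, min_scale)  # floor avoids dividing by zero
    return abs(runtime - med) / scale > threshold


# Hypothetical history: a job is fast at 09:00 Monday (hour 9) but
# routinely slow at 02:00 Monday (hour 2), when batch loads run.
history = [(9, r) for r in (100, 105, 98, 102, 101)] + [(2, r) for r in (300, 310, 295, 305)]
baseline = build_baseline(history)
print(is_anomalous(baseline, 2, 300))  # False: normal for 2 a.m.
print(is_anomalous(baseline, 9, 300))  # True: unusual for 9 a.m.
```

The same 300-second runtime is routine in one slot and a red flag in another, which is the point of learning seasonality instead of using one global threshold.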

“We want to understand what’s possible in this ecosystem, then how do we recommend the right parameters, or right output of this so we’re improving the overall performance and reliability of your entire ecosystem,” he said.

The mechanism for autoscaling in the cloud exists, but how do you know your application needs 100 machines instead of 64?

Unravel can tell you not only how many machines you need for a given app but also which kind, since the vendor might offer 40 different machine options, each with different specs: memory, CPU and price.
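The sizing question can be illustrated with a deliberately simple model: given an app's total CPU and memory needs, compute how many machines of each type would cover both, and pick the cheapest fleet. This is a sketch, not Unravel's recommender; the catalog entries, their prices and the single-instance-type restriction are hypothetical simplifications.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_price: float  # USD per instance-hour (hypothetical)


def cheapest_fleet(catalog, vcpus_needed, memory_gib_needed):
    """Return (name, count, hourly_cost) of the cheapest single-type fleet
    that covers both the CPU and the memory requirement."""
    best = None
    for it in catalog:
        # Whichever resource runs out first decides how many machines we need.
        count = max(math.ceil(vcpus_needed / it.vcpus),
                    math.ceil(memory_gib_needed / it.memory_gib))
        cost = count * it.hourly_price
        if best is None or cost < best[2]:
            best = (it.name, count, cost)
    return best


# Hypothetical catalog; real providers offer dozens of types at other prices.
catalog = [
    InstanceType("general", 4, 16, 0.20),
    InstanceType("compute", 8, 16, 0.34),
    InstanceType("memory", 4, 32, 0.26),
]
print(cheapest_fleet(catalog, vcpus_needed=256, memory_gib_needed=1024))
print(cheapest_fleet(catalog, vcpus_needed=64, memory_gib_needed=1024))
```

For the balanced workload the sketch picks 64 “general” machines; for the memory-heavy one it picks 32 “memory” machines, which is cheaper than 64 general-purpose ones. The machine count and the machine type are decided together.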

“[Users] are running blind today, using some guesstimates. [So they order a bunch of stuff] and get a shock at the end of the month,” he said.

“Amazon doesn’t break down usage; it just says your EC2 cost was X, your EMR cost was Y and your S3 cost was Z and that equals $5,000.”

Unravel can help customers break down the bill by application.

“The auto-scaling — helping companies figure out what resources they need any given time of day, any day of the week is something that Unravel does. Then we use that same data to help customers break down the bill into a phone bill to see whether they really need to spend so much and can they lower the bill in any particular way,” Agarwal said.
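Breaking a cloud bill down by application usually comes down to allocating each service's line item in proportion to some usage measure. The sketch below assumes you already have per-application usage for each service, such as vCPU-hours for compute or GB-months for storage. That metering, the function name `allocate_bill` and the dollar figures are hypothetical. The figures are only chosen to sum to the $5,000 in Agarwal's example.

```python
from collections import defaultdict


def allocate_bill(line_items, usage):
    """Split each service's cost across applications in proportion to usage.

    line_items: {service: cost}
    usage: {service: {app: units}}, e.g. vCPU-hours for compute,
           GB-months for storage (hypothetical metering).
    """
    per_app = defaultdict(float)
    for service, cost in line_items.items():
        units = usage.get(service, {})
        total_units = sum(units.values())
        if total_units == 0:
            raise ValueError(f"no usage recorded for {service}; cannot attribute its cost")
        for app, u in units.items():
            per_app[app] += cost * u / total_units
    return dict(per_app)


# Hypothetical bill: EC2 + EMR + S3 = $5,000.
line_items = {"EC2": 3000.0, "EMR": 1500.0, "S3": 500.0}
usage = {
    "EC2": {"etl": 600, "ml": 300, "bi": 100},  # vCPU-hours
    "EMR": {"etl": 50, "ml": 50},               # instance-hours
    "S3": {"etl": 200, "ml": 100, "bi": 200},   # GB-months
}
print(allocate_bill(line_items, usage))
# {'etl': 2750.0, 'ml': 1750.0, 'bi': 500.0}
```

Raising an error on a service with no recorded usage is a design choice: it surfaces unattributed spend instead of silently dropping it from the per-application totals.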

Especially in an age of microservices, determining cost has been a tricky proposition. Tel Aviv-based startup Lumigo, for one, touts its ability to break down cost per transaction on AWS.

Unravel can also help customers identify which workloads are good candidates for the cloud. Those are usually workloads whose performance changes drastically from one day to the next, or applications that often run out of resources.
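Those two criteria can be turned into a rough screening rule. The sketch flags an application if its daily resource use swings widely, measured by the coefficient of variation (standard deviation divided by mean), or if it hits resource exhaustion often. Both thresholds, the input format and the function name `cloud_candidates` are hypothetical. Unravel's actual criteria aren't described in this article.

```python
from statistics import mean, pstdev


def cloud_candidates(daily_usage, exhaustion_events, cv_threshold=0.5, event_threshold=3):
    """Return apps whose daily resource use swings widely or that often run out of resources.

    daily_usage: {app: [resource units used per day]}
    exhaustion_events: {app: count of out-of-resource failures in the same window}
    """
    candidates = []
    for app, series in daily_usage.items():
        avg = mean(series)
        # Coefficient of variation: day-to-day spread relative to the typical level.
        cv = pstdev(series) / avg if avg else 0.0
        if cv >= cv_threshold or exhaustion_events.get(app, 0) >= event_threshold:
            candidates.append(app)
    return sorted(candidates)


daily_usage = {
    "nightly_report": [100, 100, 100, 100, 100, 100, 100],  # steady
    "ad_hoc_sql": [10, 200, 15, 180, 20, 5, 250],           # spiky
    "stream_ingest": [50, 52, 49, 51, 50, 50, 48],          # steady but starved
}
exhaustion_events = {"stream_ingest": 5}
print(cloud_candidates(daily_usage, exhaustion_events))  # ['ad_hoc_sql', 'stream_ingest']
```

Spiky workloads benefit from paying only for peak capacity when they need it, and starved workloads benefit from capacity that can grow on demand. The steady report stays on-premise under this rule.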

It can also determine the resource footprint of each application individually, which helps a company decide whether to move a related set of applications to the cloud as a whole, such as everything the marketing department runs.
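Once per-application footprints exist, sizing a group move is an aggregation. The sketch below rolls hypothetical per-app metrics up to department totals so a whole department can be weighed as one migration unit. The metric names, the app-to-department mapping and the function name `group_footprint` are assumptions for illustration.

```python
from collections import defaultdict


def group_footprint(app_footprints, app_to_group):
    """Roll per-application resource footprints up to per-group totals.

    app_footprints: {app: {metric: value}}
    app_to_group: {app: group}; unmapped apps land in "unassigned".
    """
    totals = defaultdict(lambda: defaultdict(float))
    for app, footprint in app_footprints.items():
        group = app_to_group.get(app, "unassigned")
        for metric, value in footprint.items():
            totals[group][metric] += value
    return {group: dict(metrics) for group, metrics in totals.items()}


app_footprints = {
    "campaign_etl": {"vcpu_hours": 120, "memory_gib_hours": 480},
    "attribution_model": {"vcpu_hours": 80, "memory_gib_hours": 640},
    "payroll_batch": {"vcpu_hours": 40, "memory_gib_hours": 100},
}
app_to_group = {"campaign_etl": "marketing", "attribution_model": "marketing",
                "payroll_batch": "finance"}
print(group_footprint(app_footprints, app_to_group))
# {'marketing': {'vcpu_hours': 200.0, 'memory_gib_hours': 1120.0},
#  'finance': {'vcpu_hours': 40.0, 'memory_gib_hours': 100.0}}
```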
