Rocana Could Eliminate the Blame Game from Microservices Troubleshooting

Apr 5th, 2016 3:54am by

“It’s not my side that’s broken” is the opening line of a seemingly never-ending game, played most often by teams that spend too much time determining the root cause of a problem. Add microservices to the mix and the blame game gets even more complicated, as teams are divided into subsets based on the application or service they work on. Deploying individual tools to monitor containers, systems, and microservices can provide insight into each of these infrastructure components, but it leaves Ops teams at a disadvantage, since each tool sees only its own slice of the data.

When Rocana teamed up with Spiceworks to better understand how DevOps teams drill down to resolve their infrastructure challenges, they found that 94 percent of IT teams disagree about the root cause of issues. More often than not, each team has its own subset of data, which makes even basic comparisons a daunting task. Without the ability to correlate data from across their infrastructure, teams have more to sift through before finding a solution.

Shedding Light on Your Data, for Free

Scaling can present a challenge for Ops teams, as many current workflow toolsets weren’t designed to handle the amount of operational data that newer systems generate.

“Popular log aggregation products with query-based interfaces were good tools a handful of years ago. But now, there is so much data that query response times are painfully slow,” said Rocana CEO Omer Trajman.

When massive amounts of data are added to an already complex microservice infrastructure or system, some search-based solutions struggle to isolate issues efficiently. As a result, enterprises often have to prioritize which data to collect and leave the rest behind, even though that data could be used to prevent outages, Trajman noted. To address this, Rocana recently announced Rocana One, a free version of the company’s Rocana Ops tool that lets users analyze up to 1TB of data per day, with unlimited retention.

Rocana: Line-based data visualization, broken down by service

Containers, and the microservices deployed within them, present a unique challenge for monitoring performance and health, as they are deployed dynamically at scale and often have short lifespans. Legacy tools can’t make sense of this data, no matter how solid one’s DevOps team may be. Filtering out the noise helps users better detect what they should be focusing on, and when.

Rocana can collect and analyze data from many sources, ranging from Syslog, application logs, and host metrics to API-based data sources. When setting up Rocana in their stack, developers will most often be ready to go in minutes, as the necessary data is already available from system metrics or other sources. Trajman explained that if one wanted a code-level view of their analytics, Rocana could also be used alongside metrics tools such as StatsD to instrument systems for later analysis.
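StatsD-style instrumentation works by having application code emit small plain-text metric lines over UDP to a collecting daemon. The sketch below shows that application-side wire format in Python. It assumes a StatsD daemon listening on its default UDP port 8125, and the metric names are hypothetical; how Rocana itself ingests StatsD data is not described in this article, so treat this only as an illustration of the instrumentation side.

```python
import socket

# Metric types this sketch supports (a subset of StatsD's types):
# "c" = counter, "ms" = timer in milliseconds, "g" = gauge.
SUPPORTED_TYPES = {"c", "ms", "g"}


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Build one StatsD line in the form '<name>:<value>|<type>'."""
    if metric_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line to a StatsD daemon over UDP.

    UDP has no handshake or acknowledgement, so the application does not
    wait on the metrics daemon; the trade-off is that packets can be lost.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, `send_metric(format_metric("checkout.db_query", 42, "ms"))` would report a 42 ms timing for a hypothetical checkout service's database query. Choosing UDP keeps instrumentation cheap enough to leave on in production, at the cost of occasionally dropped samples.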

Connecting it All Together

Rocana focuses on making life better for IT teams, DevOps, and system administrators. In particular, Rocana shines as a tool for consolidating monitoring silos. Instead of continuing to spend on maintaining these separate tools, users can consolidate their monitoring information into a single warehouse of Ops data. This includes not only traditional monitoring analytics but also powerful search and query capabilities, and anomaly detection.

When operating at scale, efficiency is the name of the game. Components can easily number in the hundreds of thousands, each with a swath of unique variables to monitor. Rocana Ops uses a unique WARN (Weighted Analytic Risk Notification) score to identify individual components, hosts, services, and even locations that are exhibiting anomalous behavior, Trajman noted.
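The article does not describe how the WARN score is computed, and it is Rocana’s own method. As a generic illustration of the underlying idea, ranking many components by how unusual their current behavior is, here is a plain z-score ranking in Python. This is a swapped-in textbook technique, not Rocana’s algorithm, and the host names and values are hypothetical.

```python
from statistics import mean, stdev


def anomaly_scores(
    history: dict[str, list[float]], current: dict[str, float]
) -> list[tuple[str, float]]:
    """Rank components by how many standard deviations their current value
    sits from their own historical mean (a plain, unweighted z-score)."""
    scores = []
    for component, samples in history.items():
        if component not in current or len(samples) < 2:
            continue  # stdev needs at least two samples
        mu = mean(samples)
        sigma = stdev(samples)
        value = current[component]
        if sigma == 0:
            # Flat baseline: any change at all is maximally surprising.
            score = 0.0 if value == mu else float("inf")
        else:
            score = abs(value - mu) / sigma
        scores.append((component, score))
    # Most anomalous first, so operators can start at the top of the list.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical per-component history, e.g. request latency in ms.
history = {
    "host-a": [10, 12, 11, 9, 10],
    "host-b": [50, 52, 48, 51, 49],
    "db-1": [5, 5, 6, 5, 4],
}
current = {"host-a": 11, "host-b": 90, "db-1": 5}
ranked = anomaly_scores(history, current)
```

Scoring each component against its own baseline is what lets a ranking like this scale to very large fleets: a host that normally runs at 50 ms and suddenly hits 90 ms stands out, even though other components have very different absolute values.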

There are a variety of solutions available to developers and IT Ops professionals in this space, including tools such as Wavefront, Spiceworks, and Zabbix.

“This is the correlation of the haystack against one needle. At the low end, users don’t need this — It’s overkill. They’re not looking for a needle in a haystack. With 5 or 10,000 machines, you are,” said Dev Nag, chief technology officer of Wavefront, discussing the issue in general.

When it comes to application performance monitoring (APM), Rocana enables enterprises to perform deep dives into their infrastructure to identify performance issues. Even something as simple as latency can affect overall performance, since a single service call can interact with dozens of components, such as servers, databases, web applications, and routers, on its way to completing its task. With Rocana Ops and Rocana One, developers can be better equipped to trace issues from end to end and resolve them before a cascading system failure occurs.

Spiceworks offers a wide selection of tools for those working in an IT help desk environment, including inventory assessment, network monitoring, ping checking, and service monitoring. Zabbix also offers a variety of analytics tools, including KPIs, capacity monitoring, and network capacity overviews. While end-to-end monitoring can be a challenge, these tools offer a strong selection of solutions for nearly any infrastructure.

Better Monitoring, Better Solutions

Building new infrastructure around best-of-breed monitoring tools is often a solid decision when migrating from a legacy solution to cloud-based technology. However, most enterprises hit a roadblock when their data has to be integrated into their existing operations setup.

Rocana: Plotting Search Queries Against One Another

While microservice monitoring platforms may be able to identify a sick or unresponsive service, this doesn’t help DevOps and IT teams determine the true cause of an issue. “If that microservice makes a call to an external database, is the performance problem caused by an overloaded database, a congested WAN segment, or a host-based memory utilization problem?” said Trajman.
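Trajman’s question is essentially a correlation problem: which of several candidate signals moves together with the slow service’s latency? One generic way to shortlist suspects is to compute a Pearson correlation between the latency series and each candidate metric, as sketched below in Python. The metric names and samples are hypothetical, the article does not say Rocana works this way, and a high correlation only points at something to investigate; it does not prove cause.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series cannot co-vary with anything
    return cov / (sx * sy)


def rank_suspects(
    latency: list[float], candidates: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Order candidate metrics by the strength of their correlation with latency."""
    scored = [(name, pearson(latency, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical samples for the three suspects, taken at the same six timestamps.
latency_ms = [120, 125, 180, 240, 130, 122]
candidates = {
    "db.active_connections": [40, 42, 75, 98, 45, 41],
    "wan.packet_loss_pct": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
    "host.memory_used_pct": [61, 61, 62, 61, 62, 61],
}
suspects = rank_suspects(latency_ms, candidates)
```

With these made-up samples the database connection count tracks the latency spikes almost exactly and lands at the top of the list, while packet loss and memory use barely move with latency. This is the "one needle against the haystack" pattern in miniature: one symptom series compared against many candidate series.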

Whether building out a legacy application, structuring a microservice framework, or trying to make the most of your data, understanding how best to implement a monitoring solution is crucial. Rocana offers solutions based not only on system monitoring but also on debugging, operations, and issue resolution across one’s infrastructure. If you’re interested in signing up for Rocana One, you can apply here.

Images from Rocana.
