Monitoring Metrics You Can’t Afford to Ignore
Raygun sponsored this post.
With the DevOps movement on the rise, there’s a new focus on web-application monitoring tools. This is a great thing. Monitoring web applications (especially in a production environment) is often an afterthought — it’s usually put in place after a few incidents. By that time, value has already been lost — whether from crashing, poor performance or security flaws.
A strong monitoring strategy collects information about the health of your web application. When troubleshooting, having hard information to use as a guide does wonders. However, there can be too much of a good thing. Too much information causes information fatigue, which is just as bad as not having enough. If your monitoring presents more than you can act on, chances are it’ll eventually be tuned out. When this happens, it’s the same as not having monitoring in place.
In this article, I’ll talk about some of the most important metrics to consider for monitoring. In addition, we’ll go over ways to present that information so it reaches you with little effort. Let’s get started.
Picking the Most Important Metrics
As mentioned above, any information you can’t act on is mental weight. At best, you’ll ignore it; at worst, it turns your monitoring into a jumbled mess that provides little value. A good starting point is to show only the critical metrics, whether you are trimming an existing setup or building a monitoring strategy from scratch.
Let’s consider a couple of key metrics, starting with errors.
There are a few ways to monitor errors in your application:
- Using a tool such as Raygun’s Crash Reporting to visualize errors occurring in your web application. A tool like this is a turnkey way to see which errors are occurring, how often, and which ones to prioritize;
- Using the metric reporting built into your hosting environment. For example, if you’re using Microsoft Azure for hosting, you can set up alerts that email you about any 5xx errors that occur on the server;
- Building error handling into your application. This is usually the most economical and bare-bones method, but it can easily get out of hand (also, make sure your error handling isn’t producing its own errors).
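To make the third option concrete, here is a minimal sketch of built-in error handling in Python. Everything in it is an assumption for illustration: the `monitored` decorator, the `record_error` helper, the in-memory `error_counts` tally, and the `parse_quantity` example function are hypothetical names, not part of any product mentioned here. Note how the reporting path is wrapped so that it cannot raise errors of its own.

```python
import functools
import logging
from collections import Counter

logger = logging.getLogger("app.errors")

# Hypothetical in-memory tally; a real application would ship this somewhere durable.
error_counts = Counter()


def record_error(exc):
    """Tally an exception by type without ever raising from the handler itself."""
    try:
        error_counts[type(exc).__name__] += 1
        logger.error("unhandled %s: %s", type(exc).__name__, exc)
    except Exception:
        # Reporting must never mask the original failure.
        pass


def monitored(func):
    """Record any exception raised by func, then re-raise it unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            record_error(exc)
            raise
    return wrapper


@monitored
def parse_quantity(raw):
    """Example function under monitoring: fails on non-numeric input."""
    return int(raw)
```

Calling `error_counts.most_common()` then gives a frequency-ordered view of the errors seen so far, which is the bare-bones equivalent of prioritizing by frequency.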
Application Performance Monitoring: The Internal Workings
Next, Application Performance Monitoring (APM) is critical for your application. APM tools provide a way to monitor the internal workings of your application, and their most useful capability is pinpointing bottlenecks. APM consists of two major sets of metrics:
- Performance experienced by the end users of applications. This includes load times, the volume of requests, and more;
- The computational resources used for the application. This allows for determining any hardware bottlenecks in the application.
For example, let’s say performance is becoming an issue for your application, and users are starting to experience slowness. These kinds of issues can be murky and difficult to pin down. Unlike errors, performance issues sit on a sliding scale. Maybe it’s just slow because the internet connection is poor? Maybe the user will just blame themselves instead? Since performance isn’t absolute, the point at which users consider the application too slow varies.
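Because slowness is a sliding scale, one common way to look at it is through percentiles rather than a single average. The sketch below is one possible approach, not a description of how any particular APM tool works; the function names and the sample data are made up for illustration.

```python
import statistics


def summarize_load_times(samples_ms):
    """Return the median and 95th-percentile load time for a list of samples."""
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p95": cuts[94]}


def slow_share(samples_ms, threshold_ms):
    """Fraction of requests slower than a threshold you choose for your users."""
    slow = sum(1 for ms in samples_ms if ms > threshold_ms)
    return slow / len(samples_ms)


# Hypothetical samples: most requests are fast, a few are very slow.
samples = [120] * 18 + [900, 2400]
summary = summarize_load_times(samples)
share = slow_share(samples, threshold_ms=500)
```

With data shaped like this, the median looks healthy while the 95th percentile exposes the slow tail that a minority of users actually experience.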
Just like with error handling, there are a few ways to handle application performance monitoring:
- An out-of-the-box tool such as Raygun’s APM. As APM can be a pretty large undertaking to bring in, this is an easy way to get a lot of value out of the monitoring immediately, without much work.
- Adding performance logging into your application manually. This includes adding debug statements for query statements, computation times, and more.
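For the manual route, a small timing helper goes a long way. The following is a minimal sketch assuming Python’s standard `logging`, `time`, and `contextlib` modules; the `timed` helper, the `timings` list, and the `load_report` function are hypothetical names, and the list comprehension merely stands in for a real database query.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("app.perf")

# Hypothetical in-memory store of (label, elapsed_ms) pairs for later inspection.
timings = []


@contextmanager
def timed(label):
    """Measure the wall-clock duration of the enclosed block and log it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings.append((label, elapsed_ms))
        logger.debug("%s took %.1f ms", label, elapsed_ms)


def load_report():
    """Example workload with a timed query step and a timed computation step."""
    with timed("report.query"):
        rows = [n * n for n in range(10_000)]  # stand-in for a database query
    with timed("report.compute"):
        return sum(rows)
```

Wrapping each step separately shows whether the query or the computation is the bottleneck, at the cost of sprinkling timing code through the application.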
So far, we’ve identified two critical metrics to use for your application. Focusing on them should cut down information fatigue and keep only the most critical information in view, so you can keep your web application in good shape. The next step is viewing that information, and presentation can make all the difference in assessing the state of your web applications.
Presentation: Digesting Information Efficiently
As important as metrics are, another critical aspect in managing monitoring capabilities is receiving the information. The ideal delivery method is one that doesn’t require a lot of work to gather the information required. There are two ways I prefer to think about this:
- Delivering critical information to me ASAP. Something like an app going down or a critical error qualifies for this;
- Getting a high-level view of the state of all applications. I should be able to see this quickly and get the working state of the application, digging deeper if I need to.
Let’s go through each of these:
#1: Critical Information Alerting
The first aspect of digesting monitoring information correctly is getting alerts for urgent situations. As with all things involving priority, it’s important to differentiate between an urgent issue and a non-urgent issue. Information overload becomes a risk here — if you start getting hundreds of emails about errors in the system, the next logical step is filtering those emails out. That puts you right back to where you were before, without good monitoring in place.
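One way to avoid the flood of hundreds of emails is to suppress repeats of the same alert for a cooldown period. The sketch below is an illustration under assumptions: the `AlertThrottle` class, the ten-minute default cooldown, and the alert keys are all hypothetical, and a production setup would more likely rely on the throttling features of its alerting tool.

```python
class AlertThrottle:
    """Allow one alert per key within a cooldown window; drop the repeats."""

    def __init__(self, cooldown_s=600):
        self.cooldown_s = cooldown_s
        self._last_sent = {}  # alert key -> timestamp of the last alert sent

    def should_send(self, key, now):
        """Return True if an alert for this key should go out at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # same alert went out recently, stay quiet
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_s=600)
first = throttle.should_send("db-timeout", now=0.0)       # first occurrence
repeat = throttle.should_send("db-timeout", now=30.0)     # inside the cooldown
later = throttle.should_send("db-timeout", now=700.0)     # cooldown has passed
```

Passing `now` in explicitly keeps the sketch easy to test; in practice it would come from `time.time()` or a similar clock.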
A simple way to put effective alerting in place is to answer two questions:
- What are the critical issues that need immediate attention? Downtime, security issues, or performance dropping past an SLA make for good candidates;
- What’s the best way to be alerted? If your team is using Slack, getting a Slack notification might be the best way to reach you immediately. Perhaps an SMS message? Email is always an option as well, although it can be a struggle to differentiate these alerts from the rest of your email clutter.
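Those two questions translate fairly directly into a routing rule: decide what counts as urgent, then decide where urgent and non-urgent alerts go. Here is a minimal sketch of that idea. The alert kinds, the channel names, and the choice to send non-urgent alerts to an email digest are assumptions made for illustration, and the sketch only picks channels; it does not call any real Slack, SMS, or email API.

```python
from dataclasses import dataclass

# Assumed set of alert kinds that need immediate attention.
URGENT_KINDS = {"downtime", "security", "sla_breach"}

# Assumed channel names; a real setup would map these to actual integrations.
URGENT_CHANNELS = ["slack", "sms"]
ROUTINE_CHANNELS = ["email_digest"]


@dataclass
class Alert:
    kind: str
    message: str


def is_urgent(alert):
    """An alert is urgent when its kind is on the agreed urgent list."""
    return alert.kind in URGENT_KINDS


def channels_for(alert):
    """Pick delivery channels based on urgency."""
    return list(URGENT_CHANNELS) if is_urgent(alert) else list(ROUTINE_CHANNELS)


outage = Alert(kind="downtime", message="checkout service is not responding")
notice = Alert(kind="deprecation", message="library X is two versions behind")
```

Keeping the urgent list short and explicit is the point of the exercise: anything not on it stays out of the channels that interrupt people.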
Revisiting the Raygun products mentioned earlier, there is a series of integrations that can make the alert method simple to implement. Whatever method you think is best for receiving alerts, Raygun should be able to cover it.
#2: Dashboards
Finally, let’s look at the last aspect of digesting monitoring data: dashboards. A dashboard gives a visual overview of the status of your applications at any given time.
Consider the dashboard that Raygun provides with its Crash Reporting application. At a quick glance, I can see data such as the following:
- The live user count currently on the application;
- The average loading time;
- The count of recent crashes.
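To show what sits behind numbers like these, here is a rough sketch that derives the same three figures from a list of raw events. It is not how Raygun computes its dashboard; the `Event` shape, the five-minute window, and the `dashboard_summary` function are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    user_id: str
    timestamp: float                      # seconds since epoch
    load_time_ms: Optional[float] = None  # set only for page-load events
    crashed: bool = False                 # set only for crash events


def dashboard_summary(events, now, window_s=300):
    """Summarize the events that fall inside the most recent time window."""
    recent = [e for e in events if now - e.timestamp <= window_s]
    loads = [e.load_time_ms for e in recent if e.load_time_ms is not None]
    return {
        "live_users": len({e.user_id for e in recent}),
        "avg_load_ms": sum(loads) / len(loads) if loads else None,
        "recent_crashes": sum(1 for e in recent if e.crashed),
    }


# Hypothetical events; the last one is too old to count as recent.
events = [
    Event("a", 1000.0, load_time_ms=200.0),
    Event("b", 1010.0, load_time_ms=400.0),
    Event("a", 1020.0, crashed=True),
    Event("c", 100.0, load_time_ms=9000.0),
]
summary = dashboard_summary(events, now=1030.0)
```

A handful of aggregated figures like these is what makes a dashboard readable at a glance, with the raw events still available when you need to dig deeper.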
All of this presents hard data on the performance of your application in an easily digestible form. If performance is improving due to development efforts, you’ll be able to show that in a meaningful way.
Wrangling Your Monitoring Capabilities
Now that you’ve gone through this guide, you should be equipped with both the critical information to monitor and the best ways to receive it. Are you drowning in metrics that you can’t figure out how to use effectively? Consider narrowing your metrics to the ones explored above, and see if it helps.