Logging and Monitoring: Why You Need Both
Logging and monitoring are like peanut butter and jelly: the two go well together, just like bread and butter or any other famous pairing you can think of. Logging and monitoring work best together because they complement each other.
On its own, logging generates a detailed record of the events that occur within your application, while error monitoring tells you whether your application is working. For example, logs can help you find the cause of an application problem, but their sheer volume can slow that search down. Integrating logging with monitoring helps you sift through this data much more quickly to solve customer experience issues, such as those caused by a slow application.
Together, monitoring your applications and logging their data help you keep your customers happy. As the systems that applications run on keep getting more complex, doing either one alone is not enough. In this post, I’m going to talk about some reasons why logging alone falls short. I’ll also share some ways you can integrate logging and monitoring so that you can more quickly solve problems that are impacting your customers’ experience.
For many years, logs have been an essential part of troubleshooting application and infrastructure performance. They help provide visibility into how our applications are running on each of the various infrastructure components.
Log data contains information such as out-of-memory exceptions or hard disk errors. This information helps us identify the “why” behind a problem, whether a user brought it to our attention or we uncovered it ourselves.
So logging data from your applications is great! But we need more.
The Log Is Not Enough
With log data, you can get a lot of key information about your applications, but that’s not enough to help you solve problems fast. There are a number of drawbacks to relying on log data alone.
One drawback is that log data is traditionally written to one or more files, sometimes in a rolling buffer, stored on the hard disks of the servers your applications run on. Depending on how much data is logged, this can lead to disk space issues pretty quickly.
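As a rough illustration of the rolling-buffer approach, here is a minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`. It caps each file at `maxBytes` and keeps at most `backupCount` older copies, so total disk use stays bounded at roughly `maxBytes * (backupCount + 1)`. Python, the file names and the size limits are assumptions chosen for illustration; logging frameworks in other languages offer similar rotation options.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(path: str, max_bytes: int, backups: int,
                         name: str = "app") -> logging.Logger:
    """Return a logger whose file output is capped at roughly
    max_bytes * (backups + 1) bytes on disk."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of any root handlers
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    logger = make_rotating_logger(os.path.join(log_dir, "app.log"),
                                  max_bytes=1_000, backups=3)
    for i in range(500):
        logger.info("request %d handled", i)
    # Older entries were discarded; only app.log and up to three backups remain.
    print(sorted(os.listdir(log_dir)))
```

The trade-off is that the oldest entries are lost once the buffer rolls over, so a bounded log may no longer hold the lines you need by the time a problem is reported.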
Most logging frameworks have a set of logging levels similar to the following: fatal, error, warning, info and debug. If you’re logging only errors, which is often the default, plus certain types of info, you may not run into any issues.
But as soon as you raise the logging level to debug, as you might in development, you could start having disk space concerns. This may not be an issue for a single application. But if multiple applications are running on the same server, system resources could be affected.
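To make the volume difference concrete, the sketch below counts how many records a hypothetical request handler emits at different thresholds. It uses Python's standard levels, where `CRITICAL` (also available as the alias `FATAL`) plays the role of “fatal”. `simulate_request` and its ratio of debug to info messages are invented for illustration, not measurements from a real application.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records instead of writing them, to compare log volume."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


def simulate_request(logger: logging.Logger, request_id: int) -> None:
    """Hypothetical request handler: six debug lines, one info line,
    and an error on every hundredth request."""
    logger.debug("request %d: parsing headers", request_id)
    for step in range(5):
        logger.debug("request %d: step %d", request_id, step)
    logger.info("request %d: handled", request_id)
    if request_id % 100 == 0:
        logger.error("request %d: upstream timeout", request_id)


def records_emitted(level: int, requests: int = 1_000) -> int:
    """Count how many records pass a logger set to `level`."""
    logger = logging.getLogger(f"volume.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False
    handler = CountingHandler()
    logger.addHandler(handler)
    try:
        for i in range(requests):
            simulate_request(logger, i)
    finally:
        logger.removeHandler(handler)
    return handler.count


if __name__ == "__main__":
    for level in (logging.ERROR, logging.INFO, logging.DEBUG):
        print(logging.getLevelName(level), records_emitted(level))
```

Running it prints ERROR 10, INFO 1010 and DEBUG 7010: in this made-up service, switching from info to debug writes almost seven times as many records.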
Too Much to Search
Another drawback is the possibility of extreme logging. Log data is often most useful for troubleshooting when you’re logging a lot of data about your application. The more relevant data you log, the better off you could be when there are problems.
The catch is that log files can grow so large that they become hard even to open and work with, which makes it difficult to search for the relevant data when there’s a problem you need to solve.
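One way to keep a huge file workable is to stream it instead of opening it in an editor. This minimal Python sketch reads a log one line at a time and keeps only a few preceding lines in memory for context, so memory use does not grow with file size. The function name and the `OutOfMemoryError` pattern in the usage note are illustrative assumptions.

```python
import re
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def search_log(lines: Iterable[str], pattern: str,
               before: int = 2) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, context) for each line matching `pattern`.

    `context` holds up to `before` preceding lines plus the match itself.
    Only that small window is kept in memory, never the whole file."""
    regex = re.compile(pattern)
    recent: Deque[str] = deque(maxlen=before)
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if regex.search(line):
            yield number, [*recent, line]
        recent.append(line)
```

For example, opening a hypothetical `app.log` with `open("app.log", encoding="utf-8")` and passing the file object to `search_log(f, r"OutOfMemoryError")` scans it without loading it all at once. It still has to read every line, though, which is why searching large volumes of log data stays slow.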
I still have nightmares of opening up large log files consisting of tens or hundreds of thousands of lines of log data that I needed to search to troubleshoot what turned out to be a Java virtual machine (JVM) memory issue.
Searching through all of this data, whether it’s stored in a single file, multiple files or a time-series database, can be time-consuming. And when there’s an end-user experience issue, time is of the essence.
A third drawback is the lack of correlation of data across systems. When your application is spread across numerous microservices and each service logs its own data, you need to be able to identify and track an end-user issue across the entire microservice architecture.
Having to look through multiple log files from multiple systems or query for the timestamps across multiple services to find why a user is experiencing slowness can take a while. And that time is money that your organization is losing.
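A common remedy is a correlation ID: generate one ID when a request enters the system, pass it to every downstream service, and stamp it on every log line, so a single query pulls the request's whole trail from every service. The minimal Python sketch below uses `contextvars` and a `logging.Filter`. The `X-Request-ID` header name is a widely used convention assumed here rather than a standard, and `handle_incoming` and `outgoing_headers` are hypothetical helpers you would call from your web framework.

```python
import contextvars
import logging
import uuid
from typing import Dict, TextIO

# ID of the request currently being handled in this execution context.
request_id_var = contextvars.ContextVar("request_id", default="-")

HEADER = "X-Request-ID"  # assumed header name, a convention rather than a standard


class RequestIdFilter(logging.Filter):
    """Stamp every record passing through the handler with the current request ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_incoming(headers: Dict[str, str]) -> str:
    """Reuse the caller's ID if it sent one; otherwise this service starts the trail."""
    request_id = headers.get(HEADER) or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach when this service calls the next one."""
    return {HEADER: request_id_var.get()}
```

Because every service logs the same ID, your logging or monitoring tool can filter on it instead of you lining up timestamps across machines.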
Error Monitoring and Logging Integrated
Even with logging tools available to store and analyze log data, many of these drawbacks remain. There are commercial tools like Splunk and Loggly, along with open-source options like the ELK (Elasticsearch, Logstash and Kibana) stack. These tools let you aggregate, store, search and visualize your log data, but you still face some of the same issues.
You still have to search your logs. And as applications grow more complex, log data is likely to grow as well, so this search can still take time. Commercial tools also introduce a cost drawback, since some charge per gigabyte of data stored. In that case, the cost of logging your data can increase dramatically.
So, rather than sending your log data only to a logging tool or, worse, to a log file, I recommend using an error monitoring and crash reporting tool that complements your logging.
Here are five ways you can do that and integrate your logging with error monitoring:
1. Log to the Error Monitoring Tool
To troubleshoot more quickly, your DevOps teams should send log data to the error monitoring tool, not just to disk or to a logging tool. As code moves through your CI/CD process and into production, its data should be sent to your monitoring tool to provide visibility into how your applications are performing.
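As a minimal sketch of sending log data to a monitoring tool alongside your existing handlers, the Python handler below forwards warnings and errors as JSON. The ingest URL and payload fields are hypothetical; a real error monitoring product ships its own SDK or documents its own API, which you should use instead. The `send` parameter exists so the transport can be swapped, for example in tests.

```python
import json
import logging
import traceback
import urllib.request
from typing import Callable, Dict, Optional

Payload = Dict[str, object]


def post_json(url: str, payload: Payload) -> None:
    """POST a JSON document to a (hypothetical) monitoring ingest endpoint."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class MonitoringHandler(logging.Handler):
    """Forward log records at or above `level` to an error monitoring API."""

    def __init__(self, url: str,
                 send: Optional[Callable[[str, Payload], None]] = None,
                 level: int = logging.WARNING) -> None:
        super().__init__(level)
        self.url = url
        self.send = send or post_json

    def emit(self, record: logging.LogRecord) -> None:
        try:
            payload: Payload = {
                "timestamp": record.created,
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            if record.exc_info:
                payload["exception"] = "".join(
                    traceback.format_exception(*record.exc_info))
            self.send(self.url, payload)
        except Exception:
            self.handleError(record)  # never let logging crash the app


# Usage (hypothetical endpoint), added next to any existing file handler:
# logging.getLogger().addHandler(
#     MonitoringHandler("https://monitoring.example.com/api/logs"))
```

This handler posts synchronously inside `emit`, which can slow the application when the endpoint is slow. Wrapping it with the standard `logging.handlers.QueueHandler` and `QueueListener` moves the network call off the request path.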
2. Verify Application Language Support
Being able to log data directly to your error monitoring tool is great. But you also need that tool to provide value almost as soon as you install it. To do that, your error monitoring tool needs to support instrumentation for your application’s programming language. If your applications are written in .NET, for example, you want a crash reporting solution that instruments your .NET applications so it can start providing error reports as quickly as possible.
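Language-specific SDKs typically capture crashes by hooking into the runtime. As a rough Python illustration of that kind of instrumentation (not how any particular product implements it), the sketch below replaces `sys.excepthook` so uncaught exceptions are reported before the interpreter's default handling runs. `install_crash_reporter` and the `report` callback are hypothetical stand-ins for a monitoring client.

```python
import sys
import traceback
from types import TracebackType
from typing import Callable, Dict, Optional, Type

Report = Dict[str, str]


def install_crash_reporter(report: Callable[[Report], None]) -> None:
    """Send uncaught exceptions to `report`, then defer to the previous hook.
    `report` stands in for a hypothetical monitoring-tool client."""
    previous = sys.excepthook

    def hook(exc_type: Type[BaseException], exc: BaseException,
             tb: Optional[TracebackType]) -> None:
        if not issubclass(exc_type, KeyboardInterrupt):  # Ctrl+C is not a crash
            try:
                report({
                    "type": exc_type.__name__,
                    "message": str(exc),
                    "stack": "".join(
                        traceback.format_exception(exc_type, exc, tb)),
                })
            except Exception:
                pass  # a failing reporter must not hide the original error
        previous(exc_type, exc, tb)

    sys.excepthook = hook
```

`sys.excepthook` only sees uncaught exceptions in the main thread. Python 3.8 and later also offer `threading.excepthook` for other threads, which a fuller sketch would hook the same way.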
3. Make Logging Descriptive and Contextual
When developers write log messages for your applications, this data should be descriptive and provide context to help with troubleshooting. You want to ensure that your log data includes information such as timestamps, session IDs, unique user IDs and resource-usage information. Having all of this data in your logs will help with faster troubleshooting. This information also helps you understand what was happening across your infrastructure before an error occurred. The error monitoring tool will be able to correlate this data and tie it to a particular user or session.
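In Python, one way to attach session and user IDs to every message is a `logging.LoggerAdapter`, which merges a fixed dictionary into each record so the formatter can print those fields next to the timestamp. This is a minimal sketch; the field names `session_id` and `user_id` and the format string are assumptions, and resource-usage fields could be added the same way.

```python
import logging
from typing import Optional, TextIO

# Timestamp, level and the contextual IDs appear on every line.
FORMAT = ("%(asctime)s %(levelname)s session=%(session_id)s "
          "user=%(user_id)s %(message)s")


def contextual_logger(name: str, session_id: str, user_id: str,
                      stream: Optional[TextIO] = None) -> logging.LoggerAdapter:
    """Return an adapter that adds session and user IDs to every record.
    With stream=None the handler writes to sys.stderr."""
    base = logging.getLogger(name)
    if not base.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter(FORMAT))
        base.addHandler(handler)
    base.setLevel(logging.INFO)
    base.propagate = False
    # The adapter injects these fields as `extra` on each call.
    return logging.LoggerAdapter(base, {"session_id": session_id,
                                        "user_id": user_id})
```

For example, `contextual_logger("checkout", "s-42", "u-7").info("cart submitted")` writes a line containing `INFO session=s-42 user=u-7 cart submitted` after the timestamp, which a monitoring tool can then tie to that user and session.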
4. Make Log Data Structured
A common best practice for logging is to ensure that the data is structured. Unstructured log data is much harder to store, index and search when you’re troubleshooting. Structuring the data means recording not just that something bad happened within your application, but also, for example, which customer ID it happened to. When this data is sent to your monitoring tool, you can see that this customer was impacted by that issue, as well as what other issues may be impacting that customer.
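A minimal structured-logging sketch in Python: a custom `logging.Formatter` that emits one JSON object per line and copies any fields passed through the standard `extra` argument, such as a customer ID. The `JsonFormatter` class and the field names are illustrative assumptions; logging libraries and monitoring SDKs provide ready-made equivalents.

```python
import json
import logging

# Attribute names every LogRecord has; anything else came from `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Format each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy structured fields such as customer_id passed via `extra`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                entry[key] = value
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (dates, objects) from raising.
        return json.dumps(entry, default=str)
```

Calling `logger.error("payment declined", extra={"customer_id": "C-1001"})` on a logger using this formatter produces a line containing `"customer_id": "C-1001"`, a field a monitoring tool can index and filter on rather than a sentence it has to parse.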
5. Use APIs or Plugins
You want to minimize the changes you need to make to maintain your logging and monitoring integration. Using the APIs, webhooks or plugins that your error monitoring tool provides will help accomplish this. Because APIs tend to stay relatively stable in order to support integrations with many systems, using them to send or receive log data is recommended. They can also enable alerts and notifications about issues uncovered in your logs, delivered via email or ChatOps systems.
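As a minimal sketch of the alerting side, the Python handler below posts a message to a chat webhook when too many errors arrive within a time window. The webhook URL, the `{"text": ...}` payload shape, and the threshold and window values are assumptions for illustration; check your chat or monitoring tool's webhook documentation for the real format. `notify` and `clock` are injectable so the logic can be tested without a network or real time.

```python
import json
import logging
import time
import urllib.request
from collections import deque
from typing import Callable, Deque, Optional


def post_to_webhook(url: str, text: str) -> None:
    """POST {"text": ...} to a chat webhook (URL and payload shape assumed)."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class ErrorRateAlertHandler(logging.Handler):
    """Notify once `threshold` ERROR records arrive within `window` seconds."""

    def __init__(self, url: str, threshold: int = 5, window: float = 60.0,
                 notify: Optional[Callable[[str, str], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        super().__init__(logging.ERROR)
        self.url = url
        self.threshold = threshold
        self.window = window
        self.notify = notify or post_to_webhook
        self.clock = clock
        self.recent: Deque[float] = deque()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            now = self.clock()
            self.recent.append(now)
            # Forget errors that have slid out of the time window.
            while now - self.recent[0] > self.window:
                self.recent.popleft()
            if len(self.recent) >= self.threshold:
                self.notify(self.url,
                            f"{len(self.recent)} errors in {self.window:.0f}s; "
                            f"latest: {record.getMessage()}")
                self.recent.clear()  # start counting again after alerting
        except Exception:
            self.handleError(record)
```

Clearing the window after an alert is a deliberately simple way to avoid notifying someone on every subsequent error; production alerting usually adds deduplication and cooldown periods.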
Let’s ‘Log and Roll’
Log data contains some critical pieces of information about your applications and infrastructure, including databases. Logging tools are very beneficial for collecting, aggregating and viewing this data. But they have shortcomings and weaknesses that can be overcome by complementing your logging with error monitoring.
The ultimate goal is to keep your end users happy and have good customer stories. To get there, use logging and monitoring together, in an integrated fashion. With the right logging tools or methodology and error monitoring tools, your developers and operations teams will be able to plan for and troubleshoot application issues much faster.
And that’s the benefit of having complementary tools: things just get done faster.