
Pros and Cons of CloudWatch for Error Monitoring

15 Oct 2020 12:00pm, by

Sentry sponsored this post.

Anupam ‘AJ’ Jindal
AJ enjoys hacking new market technologies and building revenue-generating product solutions. These days, he is obsessed with serverless and Spring Boot platforms. AJ is driving growth initiatives as Head of Growth at Sentry.

As a developer, I love Lambda functions. They allow me to focus on the purpose of the functionality and save tons of time writing and deploying code. At the same time, one of the biggest challenges of using Lambda functions in production has been the troubleshooting of issues. This stems from a visibility gap between the code and how the user experiences the application, and a lack of monitoring tools that specifically address this key problem in serverless environments.

Certainly, Amazon’s monitoring tool CloudWatch provides a way to track function metrics and dive deep into the logs for debugging. However, combing through logs is not how I want to debug my issues, because it takes hours.

I took a look at a number of tools that help you set up log forwarding, to monitor errors and exceptions. Here’s how they typically work:

  1. Use the pre-configured CloudFormation stack to set up cloud resources and permissions in your environment.
  2. Use CloudWatch APIs to stream (usually using Kinesis+Firehose) filtered logs into their own tool.
  3. Apply formatting on ingested logs to present the errors and exceptions in a more consumable way.
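
To make steps 2 and 3 more concrete, here is a minimal sketch of what the receiving side of log forwarding might do. It assumes the logs arrive as a CloudWatch Logs subscription payload (base64-encoded, gzip-compressed JSON with `messageType`, `logGroup`, `logStream` and `logEvents` fields). The error pattern, function names and sample log group are hypothetical and are not taken from any particular vendor's tool.

```javascript
const zlib = require("zlib");

// Decode one CloudWatch Logs subscription payload (base64 + gzip + JSON).
function decodeSubscriptionPayload(base64Data) {
  const compressed = Buffer.from(base64Data, "base64");
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
}

// Hypothetical filter: keep only log events that look like errors.
const ERROR_PATTERN = /\b(ERROR|Exception|Task timed out)\b/;

function extractErrorEvents(payload) {
  // Control messages carry no log events worth formatting.
  if (payload.messageType !== "DATA_MESSAGE") return [];
  return payload.logEvents
    .filter((logEvent) => ERROR_PATTERN.test(logEvent.message))
    .map((logEvent) => ({
      logGroup: payload.logGroup,
      logStream: payload.logStream,
      timestamp: logEvent.timestamp,
      message: logEvent.message.trim(),
    }));
}

// Usage with a hand-built sample payload (all values are made up).
const samplePayload = {
  messageType: "DATA_MESSAGE",
  logGroup: "/aws/lambda/my-function",
  logStream: "2020/10/15/[$LATEST]abcdef",
  logEvents: [
    { id: "1", timestamp: 1602763200000, message: "START RequestId: 1234\n" },
    {
      id: "2",
      timestamp: 1602763200100,
      message: "2020-10-15T12:00:00.100Z\t1234\tERROR\tTypeError: x is not a function\n",
    },
  ],
};
const encodedPayload = zlib.gzipSync(JSON.stringify(samplePayload)).toString("base64");
const errorEvents = extractErrorEvents(decodeSubscriptionPayload(encodedPayload));
console.log(errorEvents);
```

Note that everything the tool learns about an error here comes from the text of the log line, which is why the result is only as readable as the original log output.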

This process works great. I was able to set up error monitoring in a few minutes without changing my code. In addition, I now had stack traces and interesting function details like:

  1. Function memory usage
  2. Function invocation time
  3. Cost of executing my Lambda functions
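
These details can be recovered from logs because Lambda writes a `REPORT` line at the end of every invocation. As a rough sketch, the snippet below parses such a line and estimates the cost of the invocation. The two price constants are assumptions (they vary by region and architecture and change over time), and the function name and request ID are made up for illustration.

```javascript
// Matches the REPORT line Lambda writes at the end of each invocation.
const REPORT_PATTERN =
  /REPORT RequestId: (\S+)\s+Duration: ([\d.]+) ms\s+Billed Duration: ([\d.]+) ms\s+Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB/;

// Assumed prices in USD; check current AWS pricing for your region.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.2 / 1e6;

function parseReportLine(line) {
  const match = REPORT_PATTERN.exec(line);
  if (!match) return null; // not a REPORT line
  const [, requestId, duration, billed, memorySize, maxUsed] = match;
  const billedDurationMs = Number(billed);
  const memorySizeMb = Number(memorySize);
  // Compute cost is billed seconds x configured GB, plus a flat per-request fee.
  const estimatedCostUsd =
    (billedDurationMs / 1000) * (memorySizeMb / 1024) * PRICE_PER_GB_SECOND +
    PRICE_PER_REQUEST;
  return {
    requestId,
    durationMs: Number(duration),
    billedDurationMs,
    memorySizeMb,
    maxMemoryUsedMb: Number(maxUsed),
    estimatedCostUsd,
  };
}

// Usage with a made-up REPORT line (fields are tab-separated in real logs).
const reportLine =
  "REPORT RequestId: 11111111-2222-3333-4444-555555555555\t" +
  "Duration: 102.25 ms\tBilled Duration: 200 ms\t" +
  "Memory Size: 128 MB\tMax Memory Used: 56 MB\t";
const report = parseReportLine(reportLine);
console.log(report);
```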

Challenges

Now, here are the challenges with the approach:

  1. I don’t like that a tool has control over my AWS account, because it is using assumeRole to access my account info.
  2. The stack traces are still quite hard to read.
  3. A lot of run-time context is lost, for example other threads and the ability to configure additional parameters.
  4. I can’t see traces or transactions, so I can’t debug issues across my application or correlate front-end and back-end behavior.

So, while log forwarding from CloudWatch is better than using CloudWatch itself, it is not without flaws.

Error Reporting Through Code Instrumentation

Next, I tried error reporting through code instrumentation. For my trial, I used our open source tool, Sentry. The following are my setup steps for a Node function (although note that Sentry supports Python environments as well).

I followed Sentry’s docs for Node Lambda integration. The instrumentation works in the following way:

  1. Initialize Sentry with a DSN (I got that by creating an account).
  2. Wrap my Lambda handler in Sentry’s wrapper.

And that’s it. I was now reporting errors into Sentry.
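
To show what "wrapping the handler" means in general terms, here is a minimal, self-contained sketch of the pattern. It is not Sentry's SDK or its implementation: `createReporter`, `wrapHandler`, the placeholder DSN and the sample handler are all hypothetical, and the reporter only collects events in memory where a real SDK would send them to its backend. Sentry's docs describe the actual calls.

```javascript
// Hypothetical reporter: stores events in memory instead of sending them.
function createReporter(dsn) {
  const sent = [];
  return {
    sent,
    captureException(error, extra) {
      sent.push({
        dsn,
        name: error.name,
        message: error.message,
        stack: error.stack,
        extra,
      });
    },
  };
}

// Wrap a Lambda handler so any thrown error is reported with run-time context,
// then rethrown so Lambda still marks the invocation as failed.
function wrapHandler(handler, reporter) {
  return async function wrappedHandler(event, context) {
    const startedAt = Date.now();
    try {
      return await handler(event, context);
    } catch (error) {
      reporter.captureException(error, {
        aws_request_id: context.awsRequestId,
        function_name: context.functionName,
        execution_duration_in_millis: Date.now() - startedAt,
      });
      throw error;
    }
  };
}

// Step 1: initialize with a DSN (placeholder value).
const reporter = createReporter("https://examplePublicKey@example.invalid/0");

// Step 2: wrap the handler. In a deployment this would be exported as the
// function's handler.
const handler = wrapHandler(async (event) => {
  if (!event.orderId) throw new Error("orderId is required");
  return { statusCode: 200 };
}, reporter);
```

Because the wrapper runs inside the function, it sees the error object and the invocation context directly, rather than reconstructing them from log text.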

This seemed like a better approach, because:

  1. Sentry is using run-time instrumentation to highlight exceptions, versus using assumeRole on my account to set up log forwarding. This eliminates the concern with providing a third-party tool with access to my AWS account. Side benefit: I don’t have to pay AWS for any additional resources that are required for log forwarding.
  2. I had all the function context, including request_id and execution time. I can use parameters like aws_request_id to understand the downstream impact of the error.
  3. I got deep links to CloudWatch logs, which save time in searching for the right log stream and time window.
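
As an illustration of where this context comes from, the sketch below reads fields from the Node Lambda context object (`awsRequestId`, `functionName`, `logGroupName`, `logStreamName`) and builds a console deep link for the invocation's log stream. The URL fragment layout is an assumption about the AWS console rather than a documented API, and the function names and sample values are hypothetical.

```javascript
// Build a CloudWatch Logs console link for one invocation's log stream.
// The fragment layout is an assumption about the console, not a stable API.
function cloudWatchLogsUrl({ region, logGroupName, logStreamName, startMs, endMs }) {
  const iso = (ms) => new Date(ms).toISOString();
  return (
    `https://console.aws.amazon.com/cloudwatch/home?region=${region}` +
    `#logEventViewer:group=${logGroupName};stream=${logStreamName}` +
    `;start=${iso(startMs)};end=${iso(endMs)}`
  );
}

// Collect the tags an error report could carry for one invocation.
function invocationTags(context, region, startMs, endMs) {
  return {
    aws_request_id: context.awsRequestId,
    function_name: context.functionName,
    cloudwatch_logs_url: cloudWatchLogsUrl({
      region,
      logGroupName: context.logGroupName,
      logStreamName: context.logStreamName,
      startMs,
      endMs,
    }),
  };
}

// Usage with a made-up context object shaped like the one Lambda passes in.
const sampleContext = {
  awsRequestId: "11111111-2222-3333-4444-555555555555",
  functionName: "my-function",
  logGroupName: "/aws/lambda/my-function",
  logStreamName: "2020/10/15/[$LATEST]abcdef",
};
const tags = invocationTags(sampleContext, "us-east-1", 1602763200000, 1602763202000);
console.log(tags.cloudwatch_logs_url);
```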

The cons are that I can’t get memory usage or function invocation time using this approach, but I can’t move those numbers anyway.

Another benefit of run-time instrumentation, looking ahead, is that it lets me use distributed tracing to identify which specific pieces are slowing down my function execution. As a result, I’m able to ensure a better user experience and save on AWS costs.
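
A simplified sketch of the idea: time each named step inside the handler and see which one dominates. This is only in-process span timing. Real distributed tracing also propagates a trace identifier across services, which this example does not attempt. The `createTracer` helper, the span names and the simulated delays are all hypothetical.

```javascript
// Hypothetical helper: records how long each named step of a handler takes.
function createTracer() {
  const spans = [];
  return {
    spans,
    async withSpan(name, fn) {
      const start = process.hrtime.bigint();
      try {
        return await fn();
      } finally {
        // Record the span even if the step throws.
        const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
        spans.push({ name, durationMs });
      }
    },
    slowest() {
      return spans.reduce(
        (worst, span) => (span.durationMs > worst.durationMs ? span : worst),
        spans[0]
      );
    },
  };
}

// Usage: two simulated steps with different delays.
const tracer = createTracer();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleRequest() {
  await tracer.withSpan("load-config", () => sleep(5));
  await tracer.withSpan("query-database", () => sleep(40));
  return tracer.slowest();
}

const slowestSpan = handleRequest();
slowestSpan.then((span) => console.log(span.name));
```

Since Lambda bills by execution time, shortening the slowest span is where the cost savings would come from.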

Serverless promises fewer management burdens for development teams, but limitations with troubleshooting can negate the time saved. It is important to consider monitoring tools that provide function context without increasing risk and cost. The comparison between CloudWatch and Sentry highlights these factors and their importance.

Amazon Web Services is a sponsor of The New Stack.

