Dashbird: Turning Serverless Monitoring Instruments into Debugging Tools

Serverless technology is coming a long way quickly, but for more mainstream adoption, developers will need a suite of monitoring tools to track what is actually going on with their applications and workflows. Serverless requires a different type of monitoring (focused on individual function performance and on the robustness of complex workflows, for example), and a wave of new tools is keen to provide the solutions: Stackery, IOpipe, and now Dashbird have all entered the arena.
Dashbird co-founders Taavi Rehemägi and Mikk Kirstein believe their monitoring approach is the most insightful and accurate, without degrading performance, as monitoring data comes directly from CloudWatch logs rather than from new code injected to run between functions. But almost as a byproduct, Dashbird is finding that its monitoring solution has a new use: as a debugging tool that helps developers get feedback on their serverless workflows as they execute them.
For now, Dashbird is focused on monitoring in AWS, as 95 percent of the market is on AWS Lambda, said Rehemägi, CEO and co-founder. “We hook up to an AWS account, and look at CloudWatch logs. Our service definitely doesn’t have any effect on the code execution speed or cause increased package size or any other overhead. Our immediate benefits are the setup speed and the account-wide visibility,” he said.
Rehemägi said that traditional application performance monitoring approaches usually send data via a remote API to collect logs, which are then used to create dashboards and alerts based on the ingested data. That’s not an approach that works for serverless, he explained: “You have a lot of functions going off and everything happens in split seconds, so collecting engine caches makes more sense. Also, timeouts might not get reported because the invocation is stopped there. As a way around that, we have a flag for function retries to show when it was a retry or when it was a first invocation.”
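To make the log-based approach concrete, here is a minimal sketch of reading per-invocation metrics from the REPORT line that Lambda writes to CloudWatch at the end of each invocation. It is not Dashbird's implementation; the names `parse_report` and `InvocationReport` are hypothetical, and the sample line is illustrative rather than taken from a real account.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Shape of the REPORT line Lambda emits when an invocation ends.
# Fields are separated by tabs (sometimes with stray spaces), hence \s+.
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB"
)


@dataclass
class InvocationReport:
    """Metrics for one invocation (hypothetical container type)."""
    request_id: str
    duration_ms: float
    billed_ms: float
    memory_mb: int
    used_mb: int


def parse_report(line: str) -> Optional[InvocationReport]:
    """Return the metrics in a REPORT line, or None for any other line."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    return InvocationReport(
        request_id=match["request_id"],
        duration_ms=float(match["duration_ms"]),
        billed_ms=float(match["billed_ms"]),
        memory_mb=int(match["memory_mb"]),
        used_mb=int(match["used_mb"]),
    )


# Illustrative sample line, not real data.
sample = (
    "REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t"
    "Duration: 12.34 ms\tBilled Duration: 100 ms \t"
    "Memory Size: 128 MB\tMax Memory Used: 18 MB"
)
report = parse_report(sample)
```

Because the metrics are read from logs after the fact, nothing is added to the function package and nothing runs inside the invocation itself, which is the trade-off Rehemägi describes.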
Rehemägi said the main types of errors he is seeing amongst users include:
- Timeouts: These are the major type of error, especially in serverless workflows where one function may take longer than six seconds.
- Silent failures: Rehemägi said these are a big problem, as users are not attaching reporting to some functions that they believe can’t fail. “Some 10 or 20 functions out of 500 might have a problem that you don’t think will be there. That was a problem you didn’t have with Docker or traditional programming approaches.”
- Configuration errors: These occur when a user forgets to attach a dependency or library, which means the event handler isn’t triggered, so the problem doesn’t even get reported.
All three types of errors can be analyzed and addressed through Dashbird, which currently offers dashboard views of serverless architecture performance and can integrate with Slack to provide key reporting as needed. The next stage for the team is to begin analyzing memory usage over time so that users can get proactive alerts when serverless architectures reach a certain threshold or are at risk of running into memory limits.
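As a rough illustration of how timeouts and memory pressure can be spotted from logs alone, the sketch below scans log lines for Lambda's "Task timed out" message and for REPORT lines where memory use is close to the configured limit. The function name `find_issues` and the 90 percent threshold are assumptions made for this example, not Dashbird features, and the sample lines are illustrative. Silent failures and configuration errors need other signals and are not covered here.

```python
import re
from typing import Iterable, List

# Message Lambda logs when an invocation is killed at its timeout.
TIMEOUT_RE = re.compile(r"Task timed out after (?P<seconds>[\d.]+) seconds")
# Memory fields from the end-of-invocation REPORT line.
MEMORY_RE = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)


def find_issues(lines: Iterable[str], memory_threshold: float = 0.9) -> List[str]:
    """Return a short description for each timeout or near-limit memory line.

    memory_threshold is the fraction of configured memory that triggers a
    warning; 0.9 is an assumed value chosen for this example.
    """
    issues = []
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            issues.append(f"timeout after {float(timeout['seconds']):g}s")
            continue
        memory = MEMORY_RE.search(line)
        if memory and int(memory["used"]) >= memory_threshold * int(memory["size"]):
            issues.append(f"memory {memory['used']}/{memory['size']} MB")
    return issues


# Illustrative log lines, not real data.
logs = [
    "2017-10-01T12:00:06.000Z 3604209a Task timed out after 6.00 seconds",
    "REPORT RequestId: a1\tDuration: 50.00 ms\tBilled Duration: 100 ms \t"
    "Memory Size: 128 MB\tMax Memory Used: 120 MB",
    "REPORT RequestId: b2\tDuration: 12.34 ms\tBilled Duration: 100 ms \t"
    "Memory Size: 128 MB\tMax Memory Used: 18 MB",
]
issues = find_issues(logs)
# issues == ["timeout after 6s", "memory 120/128 MB"]
```

A list of descriptions like this could feed a chat notification or a dashboard; how alerts are actually delivered is left out of the sketch.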
Using Monitoring Tools for Debugging
But perhaps one of the unexpected use cases for Dashbird has been its use as a debugging tool. One of the challenges of serverless is that there is no development staging, so developers must execute their functions to see how they run in the real world and then figure out what is going on if they do not perform as expected. The feedback loop on what is happening with code is much longer: adopters need to build and deploy their serverless workflow before they can see what is happening.
The open source Serverless Offline plugin for use with the Serverless Framework can emulate Lambda operations and an API gateway to allow developers to create a development staging environment. With Serverless Offline, “at least now I can test all my code locally before pushing it to AWS. That’s a relief,” wrote Adnan Rahic, co-founder and developer at Croatian online learning platform, Bookvar.
But for those who do not need a completely separate environment, Dashbird’s monitoring may be able to act as a debugging tool in itself. “Since we separate all the invocations and the logs, you can use it as you develop so you get the context as it happens,” said Rehemägi. “So you see the code as it executes. It is not only about catching errors but about building systems as well.”
Rehemägi said he is seeing a number of Dashbird users stay in their monitoring tool for a few hours each day, indicating its use for debugging. While he believes that Dashbird does shorten that feedback loop, he is hesitant to call it real-time (“the delay is about a minute”), but he is hopeful that the example will encourage other serverless product creators to start building solutions.
Meanwhile, Dashbird is now generally available and has already doubled its usage since its private beta stage. As other serverless toolmakers are finding, early adopters often come from marketing and online ad companies, but even within that segment there is wide variability. Amongst those companies using Dashbird to date, about 10 percent are hitting 100 million invocations each month. Medium-sized companies are beginning to adopt serverless for one specific use case, and the rest are experimenting with what is possible. “But half are serious users of Lambda,” Rehemägi estimated.