When a performance issue starts to affect your customers or users, you want to be able to investigate, pinpoint the source of the problem, and correct it quickly. To defend against similar issues in the future, you also need to identify warning signs, set up automated alerts, and fully document your incident response in a postmortem, along with any further actions that should be taken.
Datadog, the infrastructure and application monitoring service, is making it easier to create rich, shareable postmortems with its new Notebooks feature. Much like the web-based notebooks that have become standard tools for data exploration in scientific fields, Datadog Notebooks allow you to contextualize live data from your applications and infrastructure with detailed explanations and analysis.
Each of the cells in a Notebook can contain a Datadog metric graph or rich, Markdown-formatted text. In addition to creating detailed postmortems, you can use Notebooks to build runbooks that help engineers address recurring issues, or simply use them as scratch pads for open-ended exploration of your metrics.
With Datadog Notebooks, you can build detailed postmortems around historical metric graphs from the incident and share them with your entire team. Explanatory text in any Notebook is formatted with Markdown, so you can add headings, subheadings, links, lists, and code blocks.
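To make the Markdown structure concrete, here is a minimal sketch of a postmortem skeleton generated as plain Markdown text, ready to paste into a text cell. The function name, section titles, and sample content are all hypothetical and are not part of Datadog's product or API.

```python
def postmortem_skeleton(title, sections):
    """Build a Markdown postmortem outline.

    `sections` maps a section heading to a list of bullet points.
    Both the helper and its layout are illustrative assumptions.
    """
    parts = [f"# {title}", ""]
    for heading, bullets in sections.items():
        parts.append(f"## {heading}")
        parts.extend(f"- {bullet}" for bullet in bullets)
        parts.append("")  # blank line between sections
    return "\n".join(parts).rstrip() + "\n"


# Hypothetical incident details, for illustration only.
outline = postmortem_skeleton(
    "Postmortem: elevated API latency",
    {
        "Summary": ["What happened and who was affected"],
        "Warning signs": ["Metrics that moved before the incident"],
        "Follow-up actions": ["Alerts to add", "Runbooks to write"],
    },
)
print(outline)
```

Each heading can then be followed by a graph cell showing the metric that the text discusses.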
Every graph in a Notebook can either follow an adjustable “Global Time” or be locked to its own specific timeframe. That lets you show graphs that depict system behavior at the time of the incident alongside any warning signs or upstream issues that you identify. Pinpointing system behavior leading up to the incident gives you the information you need to create alerts that can help you get ahead of the issue next time.
When an alert is triggered, having a runbook available can make all the difference in response time. Notebooks allow you to easily create, distribute, and update runbooks so you can ensure whoever is on-call has access to step-by-step instructions for dealing with known issues. Including a link to a runbook in an alert makes that alert much more actionable.
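As a rough illustration of an actionable alert, the sketch below assembles an alert message that ends with a Markdown link to a runbook. The helper name, the message layout, and the URL are hypothetical; how the message is attached to an alert depends on your own alerting setup.

```python
def build_alert_message(summary, runbook_url):
    """Return alert text that points the on-call engineer to a runbook.

    The layout here is an assumption for illustration, not a format
    required by any particular alerting tool.
    """
    return "\n".join(
        [
            summary,
            "",
            f"Runbook: [step-by-step instructions]({runbook_url})",
        ]
    )


# Placeholder URL; substitute the link to your own runbook Notebook.
message = build_alert_message(
    "p95 latency on the checkout service is above threshold.",
    "https://example.com/notebooks/checkout-latency-runbook",
)
print(message)
```

Keeping the link on its own line makes it easy to spot in a notification, which is where response time is won or lost.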
With Notebooks, you can quickly visualize any of your infrastructure or application metrics as time series, heat maps, or distributions. For instance, you can break out a graph of a globally aggregated metric into individual graphs for each availability zone.
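The per-zone breakdown can be sketched as one query string per availability zone, each scoped by a tag filter. This is only a sketch: the metric name, the zone names, and the `availability-zone` tag key are assumptions, and the `aggregator:metric{tag:value}` query shape should be checked against the tags your own hosts actually report.

```python
def per_zone_queries(metric, zones, aggregator="avg", tag_key="availability-zone"):
    """Return a dict mapping each zone to a query scoped to that zone.

    Doubled braces in the f-string produce the literal { and } that
    wrap the tag filter.
    """
    return {
        zone: f"{aggregator}:{metric}{{{tag_key}:{zone}}}"
        for zone in zones
    }


# Hypothetical metric and zones, for illustration only.
queries = per_zone_queries("system.cpu.user", ["us-east-1a", "us-east-1b"])
for zone, query in queries.items():
    print(zone, "->", query)
```

Each resulting query would back its own graph cell, so a problem confined to one zone stands out instead of being averaged away.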
New Notebooks are private and ephemeral by default, so you can treat them like a scratch pad for exploration or investigation. If you discover something worth saving or sharing, however, you can save your work with the “Save Notebook” button.
Go Forth and Explore!
Rich, up-to-date, accessible internal documentation provides much-needed context to engineering teams. Datadog’s Notebooks feature makes it easy to create and maintain postmortems and runbooks, while also allowing you to explore your infrastructure metrics freely.
If you’re already a Datadog customer, you can access Notebooks by clicking on the “Notebooks” button in your sidebar. Otherwise, Datadog has a free 14-day trial so you can give Notebooks a spin.
Datadog sponsored this story.