Infrastructure monitoring service provider Datadog recently unveiled a new integration with Atlassian’s Jira, a popular tool for managing the software development process. We reached out to Datadog Product Manager Daniel Langer to learn a little more about this integration and the future of Datadog’s services.
For those who might not yet know about Datadog, how would you describe it?
Say you’re on one public cloud, multiple public clouds, or on a hybrid solution. We enable you to gather metrics and event data from your various products and store it in Datadog. We do this primarily in two ways. One is via a Python program that sits on your virtual machines and sends performance-level data back to Datadog. We also have API-level integrations so you can integrate with AWS CloudWatch, Azure, and Google.
You get API-level metric data from services and platforms. That matters with things like Lambda and Azure Functions, where you don’t really have a virtual machine exposed but you still want to monitor the performance. We also enable you to send custom metrics directly from your code to Datadog, using StatsD or a fork of StatsD.
We are time-series-based collection software, so we synchronize your data across all these different machines and services in time, so that you can overlay metrics on top of each other in the same graph and in the same time period. You can overlay events on top as well, such as deployments or a GitHub push. You can see how events correlate with metrics to understand what can cause an issue.
In addition to those integrations, we have integrations with third-party applications like Jira, Slack, and GitHub. They are less about getting metrics and more about integrating with tooling.
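As a rough illustration of the custom-metrics path described in this answer, here is a minimal Python sketch of what a StatsD-style datagram looks like on the wire. It assumes a StatsD-compatible agent listening on UDP port 8125 on the local machine; the metric name, the tag, and the helper names `format_metric` and `send_metric` are made up for the example and are not taken from the interview. The `|#` tag suffix is an extension used by Datadog's StatsD fork rather than part of plain StatsD.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build a StatsD datagram: <name>:<value>|<type>, with optional |#tag,tag."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tag suffix is an extension to plain StatsD (assumed agent supports it).
        line += "|#" + ",".join(tags)
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send the datagram over UDP. This is fire-and-forget: no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical gauge: current depth of a checkout queue in staging.
    send_metric(format_metric("checkout.queue_depth", 42, "g", ["env:staging"]))
```

Because the transport is UDP, the application does not block or fail if the agent is down, which is one reason this style of instrumentation is cheap to leave in production code.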
What differentiates Datadog as a monitoring solution?
Working with dynamic infrastructure is a big differentiator for us. A lot of the monitoring solutions were built for static environments.
The monitoring solutions that have been around forever are now trying to claim that they can do cloud and handle the dynamic environments of these kinds of modern infrastructures, but when you get down to it, they don’t monitor these highly dynamic services very efficiently. We have a lot of tagging systems in place to handle these things.
Tell us about your new Jira integration.
One of the core functionalities within Datadog is our alerting system. You can set up alerts to be triggered on pretty much anything in your infrastructure: on your metrics, on certain machines, on an event occurring, on a combination of metrics, or on a set of machines grouped with tagging.
Within these alerts, you can use integrations like Slack to get notified when an alert is triggered. Now we have a Jira integration that allows you to automatically create Jira tickets from these Datadog alerts. The reason we are really excited about it is that it opens the door to a new way of using Datadog with Jira.
Traditionally Jira is a strong tool for software teams, for developing their agile process, for tracking stories, and for seeing burndowns. Up until now, there wasn’t a great solution for operations: maybe a manual process or a copy and paste of alerts from Datadog into Jira, if there was a process at all. But now we have automated this process of tracking bigger types of issues in Jira by letting you set up your tickets and have them automatically created with info from Datadog.
You can see your infrastructure issues in detail, right next to more traditional software issues. That can help with better collaboration and with better understanding the relationship between your infrastructure and your software issues, with the ultimate goal of reducing downtime and preventing recurring issues.
As an engineer who has been responsible for the uptime of production services, I’ve noticed that when downtime occurs, it can be difficult to create proper tickets to make sure the issue is being tracked properly. It would have been great to have something that could automatically kick off and manage that ticketing workflow for me.
Yes, that’s right. The automation is a big part of this because you’ve already predetermined what looks like a problem by setting thresholds, so when the alert is triggered it automatically creates the Jira card for you and populates it with the information you have already determined is relevant. It’s all there in the same tool that the developers are using.
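To make the idea of a threshold-triggered, pre-populated ticket concrete, here is a small Python sketch. It is not how the Datadog integration is implemented; the function `build_ticket`, the field names, and the sample metric and host are all assumptions chosen for illustration. It only shows the general pattern: a threshold decides whether a ticket exists at all, and a template decided in advance fills in the details.

```python
from string import Template


def build_ticket(metric, host, value, threshold, summary_template):
    """Return ticket fields when value exceeds threshold, otherwise None."""
    if value <= threshold:
        return None  # No breach, so no ticket is created.
    context = {"metric": metric, "host": host, "value": value, "threshold": threshold}
    return {
        # The summary template is written once, ahead of time, by whoever sets up the alert.
        "summary": Template(summary_template).substitute(context),
        "description": f"{metric} on {host} was {value}, above the threshold of {threshold}.",
        "labels": ["datadog-alert"],  # Hypothetical label for filtering in the tracker.
    }


# Hypothetical alert: disk usage on one web host crossed 90 percent.
ticket = build_ticket("system.disk.in_use", "web-01", 0.93, 0.90, "$metric high on $host")
```

The point of the design is that the person on call makes no decisions at alert time; everything that goes into the ticket was agreed on when the alert was configured.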
We think the idea of developers and operations folk working closely together is inevitably a better way to resolve issues in the cloud faster. You can contextualize what happens on the infrastructure side and how it correlates to the software. It doesn’t have to be two worlds operating separately. Maybe an ops person is putting out the fire and has to shoot off an email to development, and maybe development fixes the bug, but the ops person doesn’t get notified in a timely way that it was fixed. This is a way to really clean up all that broken communication between development and ops and track these issues more efficiently.
Can I programmatically specify who receives the ticket, such as the current engineer on call? Or if Engineer A is responsible for this piece of infrastructure, all tickets related to it should be assigned to that engineer?
We did have this for Slack. If there is an issue in a certain environment, it gets pinged to your Slack team, which is good for some things but not for tracking, which is why this Jira announcement is exciting. We let you dynamically populate all fields, and one of those can be the assignee field. You can fill it with a particular person, or it can be generated when the alert triggers with the name of the person the server belongs to.
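The assignee logic described in this answer could look something like the following Python sketch. The ownership table, the tag format, and the names `resolve_assignee` and `with_assignee` are hypothetical; the interview does not describe how Datadog stores ownership, so this only illustrates the general idea of deriving the assignee from an alert's tags, with a fallback such as the current on-call engineer.

```python
# Hypothetical ownership table: which engineer owns which tagged service.
OWNERS = {
    "service:payments": "engineer_a",
    "service:search": "engineer_b",
}


def resolve_assignee(alert_tags, owners, fallback):
    """Return the owner of the first tag that has one, otherwise the fallback."""
    for tag in alert_tags:
        if tag in owners:
            return owners[tag]
    return fallback


def with_assignee(ticket_fields, alert_tags, owners=OWNERS, fallback="on_call"):
    """Return a copy of the ticket fields with the assignee filled in."""
    filled = dict(ticket_fields)  # Copy so the caller's dict is not mutated.
    filled["assignee"] = resolve_assignee(alert_tags, owners, fallback)
    return filled


# An alert tagged with the payments service goes to its owner.
routed = with_assignee({"summary": "Latency high"}, ["env:prod", "service:payments"])
```

Alerts whose tags match nothing in the table fall through to the fallback value, so no ticket is left unassigned.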
For those of you who are interested in learning more about integrating Jira with Datadog, there is an article on the Datadog HQ blog that serves as a tutorial on implementing the integration in your project.
Feature image via Pixabay.