
Dynatrace Confronts the ‘Unknown Unknowns’ of Observability

18 Feb 2021 9:35am

Dynatrace offered details about a number of updates to its monitoring platform last week during its Perform virtual conference, in support of its ambition to offer “full observability” and to extend DevOps into all facets of BizDevSecOps, bringing business and security teams into application lifecycle management.

“DevOps will always be a huge focus of the Dynatrace Platform, but we’re here to both arm these teams with more precise automation and enable them to take the next steps towards BizDevSecOps,” Bernd Greifeneder, Dynatrace founder and chief technology officer, told The New Stack.

Dynatrace’s plans to extend its platform and tools beyond observability, to help solve emerging DevOps challenges, also support improvements to analytics, artificial intelligence (AI) and automation for “digital services in a broader sense,” Greifeneder told The New Stack.

There was a lot to unpack at this virtual conference.

Observability Context

Confusion still exists about what observability is and how DevOps can make use of it. Some vendors have been criticized for purporting to offer observability tools that are really, instead, application-monitoring platforms, while some have claimed — incorrectly, in this writer’s opinion — that monitoring and alerts have nothing to do with observability.

In his talk “Unlock and Scale OpenTelemetry with Dynatrace AI and Automation,” Daniel Khan, Dynatrace’s director of technology strategy, discussed how the “unknown unknowns” come into play. With the unknown unknowns, “very easily put, you might monitor or measure the request time of your application and you see it’s getting slow,” Khan said during his talk, given with Daniela Rabiser, Dynatrace technical product manager, and Nizar Tyrewalla, AWS principal product manager. “But if your system is observable, you will not only know that it’s slow but because you can drill into that, you can find out that it’s only slow when your shop is making a currency conversion to, let’s say British pounds. So this is knowing, or being able to look deeper into, data.”
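Khan’s currency-conversion example can be sketched in plain Python (no Dynatrace or OpenTelemetry dependencies; the metric name and sample values below are hypothetical): recording request durations with dimensional attributes lets a team see the aggregate slowdown that monitoring reports, then drill into the dimension to find that only one currency is the culprit.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request-duration samples (ms), each tagged with a "currency" dimension.
samples = [
    {"duration_ms": 45, "currency": "USD"},
    {"duration_ms": 52, "currency": "EUR"},
    {"duration_ms": 48, "currency": "USD"},
    {"duration_ms": 950, "currency": "GBP"},   # the hidden culprit
    {"duration_ms": 1020, "currency": "GBP"},
]

# Monitoring answers the "known" question: is the app slow overall?
overall_avg = mean(s["duration_ms"] for s in samples)

# Observability lets you drill into the dimension -- the "unknown unknown":
by_currency = defaultdict(list)
for s in samples:
    by_currency[s["currency"]].append(s["duration_ms"])

per_currency_avg = {cur: mean(vals) for cur, vals in by_currency.items()}
slowest = max(per_currency_avg, key=per_currency_avg.get)

print(f"overall average: {overall_avg} ms")
print(f"slowest dimension: {slowest} at {per_currency_avg[slowest]} ms")
```

Here the overall average (423 ms) only says the shop is slow; grouping by the `currency` attribute reveals that GBP conversions, averaging 985 ms, are the actual problem.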

During the talk, Dynatrace’s future projects were also discussed, as well as the company’s participation in the Cloud Native Computing Foundation’s (CNCF) OpenTelemetry project. Created to help DevOps teams obtain high-quality telemetry without having to build instrumentation themselves, and to avoid vendor lock-in with proprietary agents, the CNCF OpenTelemetry project offers vendor-neutral integration points. These help organizations obtain the raw materials — the “telemetry” — that fuel modern observability tools, with minimal effort at integration time.

Rabiser described how Dynatrace offers “a vendor-agnostic, highly stable and performant” OpenTelemetry metrics collector that customers can use to unlock intelligent observability for custom OpenTelemetry metrics from AWS.

“Dynatrace does not only support OpenTelemetry metrics, it also automates data collection and data enrichment and we also make sure to put the collected data into context for OpenTelemetry, as well as for open tracing,” Rabiser said. “We also cover both custom instrumentation, as well as pre-instrumentation technologies, all across heterogeneous environments, and the captured spans are then integrated into Dynatrace.”
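The “vendor-neutral integration point” idea can be sketched in plain Python (the class and method names below are hypothetical illustrations, not OpenTelemetry’s actual API): application code records telemetry against one abstract interface, and any vendor backend plugs in behind it without the instrumentation changing.

```python
from abc import ABC, abstractmethod

class MetricExporter(ABC):
    """Hypothetical vendor-neutral integration point: any backend implements this."""
    @abstractmethod
    def export(self, name: str, value: float, attributes: dict) -> None: ...

class InMemoryExporter(MetricExporter):
    """Trivial stand-in for a vendor backend; a real one would ship data over the wire."""
    def __init__(self):
        self.exported = []
    def export(self, name, value, attributes):
        self.exported.append((name, value, attributes))

class Meter:
    """Application code talks only to the neutral Meter, never to a vendor directly."""
    def __init__(self, exporter: MetricExporter):
        self._exporter = exporter
    def record(self, name, value, **attributes):
        self._exporter.export(name, value, attributes)

# Swapping vendors means swapping the exporter -- the instrumentation stays identical.
exporter = InMemoryExporter()
meter = Meter(exporter)
meter.record("checkout.duration_ms", 48.0, currency="USD")
```

Because the app depends only on the abstract `MetricExporter`, replacing `InMemoryExporter` with a Dynatrace-backed (or any other vendor’s) exporter requires no change to the instrumented code, which is the lock-in avoidance OpenTelemetry aims for.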

Theory to Instrumentation

Dynatrace improved its platform’s infrastructure-monitoring capabilities by expanding its observability reach with native log support for Kubernetes and multicloud environments. Among other enhancements:

  • A Cloud Automation Module designed to further automate operations and continuous delivery (CD).  “Automation, security, service level and user experience all start with developers — think shift left and right — and therefore span the full software lifecycle, from pre-production to production and back,” Greifeneder said. “The Dynatrace Cloud Automation Module utilizes quality gates and auto-remediation best practices, allowing DevOps and SRE teams to be more agile, while also increasing production reliability.”
  • Broader application-security coverage to help DevSecOps teams make “more informed decisions,” and to help secure their cloud native environments. Dynatrace Application Security automates security across the entire application production pipeline and deployment lifecycle, with “drastically reduced false positives in pre-production, as well as always-on protection for operations,” Greifeneder said.
  • The extension of Session Replay and business analytics to native-mobile applications, to better support more highly regulated industries, such as financial services, health care and government. “With more Session Replay capabilities for native-mobile environments, we bring the real-world experience from customers back to developers, so that teams make the right decisions from sprint to sprint based on feature adoption and customer journeys,” Greifeneder said.
  • The new Dynatrace Software Intelligence Hub, which allows Dynatrace customers to access over 500 tools and technologies from cloud providers, including Amazon Web Services (AWS) and Google. “We launched the Dynatrace Software Intelligence Hub to allow customers to easily extend Dynatrace’s automation and AI-assistance across their multicloud environments and to more use cases,” Greifeneder said.

AI Eats the World

In addition to serving as a major component of the announcements described above, automation, and especially AI, were commonly discussed during Perform. For observability and monitoring that span multicloud and legacy environments, Wolfgang Beer, principal product manager at Dynatrace, used the now-iconic data stream from the science-fiction classic “The Matrix” to illustrate the scope and scale of data that must be monitored in an IT environment consisting of tens of thousands of hosts.

“So, think about, and imagine, all of that information streaming in your monitoring environment: it could look like a Matrix-style of streaming information. So, it’s a lot of information streaming into your monitoring environment, split by individual dimensions, such as hosts, processes and services,” Beer said during his talk, “Automating Cloud Operations with Davis AI at the Core of the Dynatrace Platform,” given with Abigail Wilson, reliability architect at CFA Institute. “With all kinds of dimensionality in place, without the proper AI on top, you have no idea what it’s all about. So, there is no meaning behind all that data, and a human being is likely overwhelmed by that stream of data.”

Wilson described how, previously, trying to track down a root cause without AI could result in hundreds of different job tickets, for example. “But once we moved to Dynatrace and Davis, that helped us get a lot of clarity on those issues, and our incident count is now in the low double digits, which feels really good for all of our team,” Wilson said, adding that her team could also use the Dynatrace API to monitor custom events and alerts related to legacy architecture, as well as to cloud environments.

“With all these things together, we really have an integrated view into our entire platform, and we’re able to make decisions based on the information that helped, not just respond to an incident, but also helped us to move forward,” Wilson said.

Amazon Web Services, the Cloud Native Computing Foundation and Dynatrace are sponsors of The New Stack.
