New Relic Takes on Distributed Tracing’s Heavy Lifting 

5 May 2020 11:11am

Observability platform provider New Relic has added what it calls “Infinite Tracing” to its distributed tracing package. With the capability, New Relic asserts, it removes the need for organizations to create, and invest in, the underlying infrastructure required for monitoring and debugging highly distributed environments, such as infrastructures deployed across multicloud environments. Infinite Tracing takes over tasks such as the storage, deployment and management of the data collected from the often numerous environments an application performance monitoring (APM) provider might trace.

While distributed tracing systems from New Relic and others have existed for some time, none could previously process a “multitenant pool of customers’ firehose data at scale,” Andrew Tunall, senior director, product management for New Relic, told The New Stack. “Your DevOps can just begin to send us your data for distributed tracing and be able to see it,” Tunall said. “New Relic didn’t previously have this fully managed offering, nor did anyone else.”

The core idea is that Infinite Tracing automates many of the tasks associated with tail-based sampling, in which the decision to keep a trace is made after its spans have been observed, so that traces containing errors can be retained. With traditional tail-based sampling systems, organizations must install and manage the components that observe spans for error detection. They must also manage the storage of the data from these observations in the form of logs, metrics and traces for events.
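To make the idea of tail-based sampling concrete, here is a minimal sketch in Python. It is not New Relic's implementation; the `Span` class, its fields and the `tail_sample` function are hypothetical names chosen for illustration, and a real system would also have to decide when a trace is complete and how long to buffer it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span: real spans carry timing, parent IDs, attributes.
    trace_id: str
    name: str
    error: bool = False


def tail_sample(spans):
    """Keep every span of any trace that contains at least one errored span."""
    traces = defaultdict(list)
    for span in spans:
        # Buffer spans per trace; the keep/drop decision is deferred until
        # all spans of the trace have been seen (the "tail" of the trace).
        traces[span.trace_id].append(span)
    return {
        trace_id: trace_spans
        for trace_id, trace_spans in traces.items()
        if any(s.error for s in trace_spans)
    }


spans = [
    Span("t1", "checkout"),
    Span("t1", "payment", error=True),
    Span("t2", "checkout"),
    Span("t2", "payment"),
]
kept = tail_sample(spans)
print(sorted(kept))  # ['t1']
```

The buffering step is where the operational cost comes from: every span has to be received and held somewhere before the decision can be made, which is the infrastructure a managed service takes on.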

An organization might, for example, have to delegate to its operations or site reliability engineering (SRE) teams the management of massive amounts of data egress and storage, since tracing can generate millions of data points. With Infinite Tracing, New Relic says it automates this process.

For an organization that “has already decided that they don’t want to go the pure open source route,” Nancy Gohring, an analyst for 451 Research, said the key factors to consider when selecting a distributed tracing tool include:

  • whether the system allows access to the traces required for troubleshooting, “without breaking the bank.”
  • whether the system intelligently analyzes traces in a way that is helpful to users of the tool.
  • whether the system is or can be well-integrated with adjacent monitoring tools that collect other types of data, such as metrics and logs.

While noting there are several vendors in the market that are competitive in these three areas, Gohring said: “Like most any commercial distributed tracing offering, New Relic’s service takes away a lot of heavy lifting as compared to running pure open source distributed tracing software,” such as those Dapper and Zipkin offer.

New Relic offers the example of an organization that might have an average span load of 3 million spans per minute, which can jump to 300 million spans per minute, such as when a new application is released.
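As a rough back-of-the-envelope check of what that jump means for capacity, the figures quoted above can be converted into a burst factor and per-second rates. The variable names are illustrative only.

```python
# Span rates quoted in New Relic's example, in spans per minute.
BASELINE_SPANS_PER_MIN = 3_000_000
PEAK_SPANS_PER_MIN = 300_000_000

# How many times larger the peak is than the baseline.
burst_factor = PEAK_SPANS_PER_MIN // BASELINE_SPANS_PER_MIN

# The same rates expressed per second.
baseline_per_sec = BASELINE_SPANS_PER_MIN // 60
peak_per_sec = PEAK_SPANS_PER_MIN // 60

print(burst_factor)      # 100
print(baseline_per_sec)  # 50000
print(peak_per_sec)      # 5000000
```

In other words, a self-managed pipeline sized for the everyday load of 50,000 spans per second would need to absorb a hundredfold increase during such a release.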

With such massive amounts of microservices data to manage, tracing is “one of those key signals you’re trying to understand” in order to troubleshoot or optimize a distributed system made up of hundreds or thousands of microservices, Tunall said. “And so the result is that metrics tell us something is going wrong, maybe infrequently, maybe in a very odd percentile, but rarely do we actually capture the data that tells us very specifically what’s going on, which is really what the trace signal is intended for,” Tunall said.

With Infinite Tracing, distributed tracing is “fully managed for our customers,” Tunall said. According to Tunall, for example, executing a GraphQL query is enough to create and provision a trace observer, a publicly accessible endpoint for the workload, in just a few seconds.

“So if you think about a new DevOps team, they spin up a new app, they already have continuous deployment and rapid iteration tools,” Tunall said. “And the last thing that they want to do now is to add the toil of managing a system to gather and forward their data to a telemetry provider.”
