How the OpenTelemetry Collector Scales Observability
You can manage without the OpenTelemetry Collector, a component of the open source OpenTelemetry observability framework, but you probably won’t want to, especially if you’re deploying and monitoring applications at scale.
You’ll probably also want to use the OpenTelemetry Collector whenever multiple applications or microservices are involved, particularly for security reasons.
This becomes apparent as OpenTelemetry expands its scope and gains acceptance as a way to use your favorite observability tools through a unified interface or component, one that stays compatible as vendors work to meet the OpenTelemetry standard.
The OpenTelemetry Collector is observability pipeline middleware that can receive, process and export data at scale, Evan Bradley, a senior software engineer at Dynatrace, explained during the talk “OTTL Me Why Transforming Telemetry in the OpenTelemetry Collector,” which he gave with Tyler Helmuth, senior software engineer, open source, at Honeycomb.io, at KubeCon + CloudNativeCon last month.
“So why might you want to use the collector? Well, there are numerous reasons but the first significant one is that you can process at the edge — processing at the edge allows you to split this work across multiple machines, which can help increase data throughput in your pipeline,” Bradley said. “You can run the collector at the edge or anywhere else in your pipeline because it can be deployed anywhere and it can be deployed in containerized, virtualized, or even functions as a service environment. Moreover, you can process data close to its origins or further away, such as at critical points of your pipeline, like at the point of ingress at the boundary of a secure network.”
The collector is designed to adapt well to different use cases since it’s fast and versatile, Bradley said. “It has been designed with high throughput and low latency in mind, so it won’t slow down your pipeline. Additionally, it has low CPU, memory, and disk space requirements,” Bradley said.
What Does an OpenTelemetry Collector Do?
An OpenTelemetry collector serves to collect data sent to it from one or many sources. In addition to receiving data, it sends data to an endpoint, such as for visualization with a Grafana panel.
It can be configured to collect specific types of logs, traces and metrics for observability.
Initially, you can opt not to use it, especially when a monitoring application collects metrics, logs and traces and transfers them directly to the observability platform, with or without OpenTelemetry.
However, this approach becomes challenging when monitoring multiple applications or microservices. Without the OpenTelemetry Collector, you’d need to configure backend or user monitoring separately for each one, which can be cumbersome.
By contrast, the OpenTelemetry Collector serves as a single endpoint for all microservices, so telemetry from every application and microservice flows through one unified point.
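To make the difference concrete, here is a rough count in Python. It assumes, as a simplification, that every link between a service and a destination needs its own configuration. The function names and the numbers are hypothetical and are not taken from the talk.

```python
def direct_links(services: int, backends: int) -> int:
    """Without a collector: every service is configured for every backend."""
    return services * backends


def collector_links(services: int, backends: int) -> int:
    """With a collector: each service points at the collector,
    and the collector points at each backend."""
    return services + backends


if __name__ == "__main__":
    # Illustrative numbers: 12 microservices reporting to 3 backends.
    print(direct_links(12, 3))     # 36
    print(collector_links(12, 3))  # 15
```

Under that assumption, adding a backend means one new configuration on the collector rather than one per service.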
With the collector, you can view and manage these services together in a consolidated view on a platform like Grafana. While Grafana provides certain alternatives that work without an OpenTelemetry Collector, the collector significantly simplifies this process.
A custom collector can also be tailored to fit the situation at hand by selecting only the components you need, Bradley said. For cases where existing options are unavailable, all collector components are written using the same core APIs, so you can use those APIs to add your own code to accomplish a task, he said.
Data flows through the collector in pipelines, which are composed of individual components that each handle a specific task, Bradley said. The collector has five classes of components, but his talk covered three: receivers, processors and exporters. The diagram above illustrates an example pipeline, where data enters the collector at one of the points on the left, proceeds through the pipeline and is emitted on the right.
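The receiver, processor and exporter flow can be modeled in a few lines of Python. This is a conceptual sketch only, not the collector's actual API (the collector is written in Go and configured declaratively). The `Pipeline` class, the record fields and the two example processors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Record = dict  # one telemetry item, simplified to a plain dict
# A processor returns a (possibly modified) record, or None to drop it.
Processor = Callable[[Record], Optional[Record]]
Exporter = Callable[[List[Record]], None]


@dataclass
class Pipeline:
    processors: List[Processor] = field(default_factory=list)
    exporters: List[Exporter] = field(default_factory=list)

    def consume(self, received: Iterable[Record]) -> List[Record]:
        """Run received records through each processor in order, then export."""
        kept: List[Record] = []
        for record in received:
            current: Optional[Record] = record
            for process in self.processors:
                current = process(current)
                if current is None:  # a processor dropped the record
                    break
            if current is not None:
                kept.append(current)
        for export in self.exporters:
            export(list(kept))  # each exporter gets its own copy of the list
        return kept


def drop_debug(record: Record) -> Optional[Record]:
    """Hypothetical processor: drop records with DEBUG severity."""
    return None if record.get("severity") == "DEBUG" else record


def add_env(record: Record) -> Optional[Record]:
    """Hypothetical processor: attach an environment attribute."""
    return {**record, "env": "prod"}


if __name__ == "__main__":
    sink: List[Record] = []  # stands in for a backend
    pipeline = Pipeline(processors=[drop_debug, add_env], exporters=[sink.extend])
    pipeline.consume([
        {"severity": "INFO", "body": "started"},
        {"severity": "DEBUG", "body": "tick"},
    ])
    print(sink)  # [{'severity': 'INFO', 'body': 'started', 'env': 'prod'}]
```

The ordering of processors matters in this model: dropping records before enriching them avoids spending work on data that will never be exported.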
With the OpenTelemetry Transformation Language (OTTL), the OpenTelemetry Collector’s filtering and processing functions can be used to control which kinds of telemetry data it receives and sends. Helmuth showed how OTTL supports the filter functionality.
During his presentation, Helmuth showed a case where it makes sense to reduce ingest volume by dropping events categorized as completed, since they are deemed unnecessary.
In the image above, the filter processor was used to decide which data to drop, based on an OTTL condition. These conditions inspect the underlying telemetry without altering it. When a condition is satisfied, the processor removes the data, Helmuth said.
The Kubernetes objects receiver emits Kubernetes events in the form of logs, with the events stored as nested maps within the log body.
Any body not structured as anticipated (that is, not resembling a Kubernetes event) was to be discarded, Helmuth said. In the top box of the image above, the body was a map containing a nested map under the object key, so the condition was not met and the data was retained. Conversely, in the second box, the body was a string, which did not match the expected map structure, so the condition was met and the data was dropped.
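The drop decision described above can be expressed as a small predicate. The sketch below is plain Python, not OTTL syntax, and only models the logic: the condition is true (drop) when the body is not a map with a nested map under the `object` key. The function names and the sample event fields other than `object` are hypothetical.

```python
from typing import Any, List


def should_drop(body: Any) -> bool:
    """Return True when the body does not look like a Kubernetes event,
    i.e. it is not a map holding a nested map under the "object" key."""
    if not isinstance(body, dict):
        return True
    return not isinstance(body.get("object"), dict)


def filter_logs(bodies: List[Any]) -> List[Any]:
    """Keep only bodies for which the drop condition is not satisfied.
    The bodies themselves are left unmodified."""
    return [body for body in bodies if not should_drop(body)]


if __name__ == "__main__":
    event_body = {"object": {"kind": "Event", "reason": "Started"}}
    string_body = "plain text log line"
    print(filter_logs([event_body, string_body]))
    # [{'object': {'kind': 'Event', 'reason': 'Started'}}]
```

As in the talk's example, the condition only reads the data. A satisfied condition removes the record, and an unsatisfied one passes it through untouched.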
Alternatives exist for telemetry collection. The OpenTelemetry Collector falls under the category of observability agents. Such agents, which also include Fluent Bit, Vector and others, “exhibit high robustness and perform various tasks to achieve their remarkable outcomes,” Braydon Kains, a software developer at Google, said during his own KubeCon + CloudNativeCon talk, “How Much Overhead How to Evaluate Observability Agent Performance.”
At the end of the talk, the question came up of which collector is best. Kains described how the Google Cloud Ops Agent is a fusion of two agents. Behind the scenes, it combines Fluent Bit for log collection and OpenTelemetry for gathering metrics and traces, he said.
The team maintains a central configuration layer that generates configurations for both the underlying OpenTelemetry and Fluent Bit agents. These configurations include recommended optimizations tailored for users who primarily run on plain virtual machines (VMs), so they can efficiently collect metrics using OpenTelemetry, he said.
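The idea of a central configuration layer can be sketched as a function that takes one user-facing configuration and produces a separate configuration for each underlying agent. Everything in this sketch is an assumption for illustration: the keys, the default interval and the output shape are invented and do not reflect the Ops Agent's real schema or its actual recommended settings.

```python
from typing import Any, Dict

# Made-up default standing in for a "recommended optimization".
DEFAULT_METRICS_INTERVAL_S = 60


def generate_agent_configs(user_config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split one user-facing config into a config per underlying agent."""
    otel = {
        "signals": ["metrics", "traces"],
        # Fall back to the default when the user has not set the knob.
        "collection_interval_s": user_config.get(
            "metrics_interval_s", DEFAULT_METRICS_INTERVAL_S
        ),
    }
    fluent_bit = {
        "signals": ["logs"],
        "paths": list(user_config.get("log_paths", [])),
    }
    return {"otel": otel, "fluent_bit": fluent_bit}


if __name__ == "__main__":
    configs = generate_agent_configs({"log_paths": ["/var/log/app.log"]})
    print(configs["otel"]["collection_interval_s"])  # 60
    print(configs["fluent_bit"]["paths"])            # ['/var/log/app.log']
```

The point of the design is that the user sets only the knobs they care about, and the layer fills in the rest for both agents.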
“There are a lot of knobs to keep track of and it can be hard for a customer who’s new to this to keep track of them all,” Kains said. “We take on the responsibility of keeping track of those knobs, and try to come up with the settings that are going to be optimal in the most general cases.”