Introducing the Telegraf Operator: Kubernetes Sidecars Made Simple

Both DaemonSet and Sidecar are important for monitoring in Kubernetes. The Telegraf Operator allows you to define a common output destination for metrics.
Jun 10th, 2020 11:16am by
Feature image via Pixabay.

Giacomo Tirabassi
Giacomo is a Site Reliability Engineer at InfluxData.

Kubernetes is great. It takes care of deployment, scaling and upgrading/updating of containerized application clusters in a declarative and automated manner. But although automation reduces the operational burden of managing applications in production, it also makes monitoring even more necessary. Full-stack observation of metrics and events — including container and Kubernetes orchestration layers — must be in place in order to maintain functional and performant applications. Left unwatched, automation can hide issues, until it’s too late and you’ve hit a breaking point.

Monitoring is considered a first-class property of any modern system, and Kubernetes is no different. There are two main mechanisms for deploying monitoring agents:

  • DaemonSet deployment that ensures that all nodes run a copy of a monitoring agent pod.
  • Sidecar deployment where the monitoring agent container shares the pod with the application container.

In a DaemonSet type of monitoring, the node agent will collect data from all pods running on the node. Usually, it is used to observe the framework infrastructure, such as kubelet (node, container and pod) metrics, network metrics, logs, tracing, and error reporting. However, when it comes to collecting metrics from specific workloads/applications running on the containers, a Sidecar deployment is typically a better alternative.

That is because, with a Sidecar monitoring agent, custom metrics and monitoring of that specific application can be defined without impacting the overall monitoring framework shared by other workloads. Over time, a growing number of Prometheus endpoint metrics exposed by application developers can lead to scalability issues in a DaemonSet type of deployment. See this blog post for more information about using a Sidecar deployment to scale application monitoring on Kubernetes, enabling IT Ops to give developers the ability to monitor their own applications.
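To make the Sidecar pattern concrete, here is a minimal sketch of a pod in which a Telegraf container shares the pod with the application container. Everything named here is an assumption for illustration: the application name, its image and port, the Telegraf image tag, and the ConfigMap that would hold telegraf.conf.

```yaml
# Sketch only: names, images and the port below are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: my-application
  labels:
    app: my-application
spec:
  containers:
    - name: my-application              # the workload being monitored
      image: example.com/my-application:1.0.0
      ports:
        - containerPort: 8080           # assumed to serve Prometheus metrics
    - name: telegraf                    # monitoring agent running as a Sidecar
      image: telegraf:1.14              # assumed image tag
      volumeMounts:
        - name: telegraf-config
          mountPath: /etc/telegraf      # Telegraf reads telegraf.conf from here
  volumes:
    - name: telegraf-config
      configMap:
        name: my-application-telegraf   # hypothetical ConfigMap with a telegraf.conf key
```

Because both containers share the pod's network namespace, the Sidecar can reach the application on localhost, and its configuration can change without touching the node-level agent.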

Telegraf Kubernetes Operator for DaemonSet and Sidecar Deployments

Both DaemonSet and Sidecar are important for monitoring in Kubernetes. We at InfluxData learned that lesson from our own work monitoring InfluxDB Cloud running on Kubernetes. We serve our InfluxDB Cloud customers globally from multiple clouds and regions.

As our developers instrumented the microservices that compose our platform to expose metrics for performance monitoring and diagnosis, we quickly found that a DaemonSet implementation for scraping application metrics ran into practical limits as the number of metrics exposed by some microservices grew. So we adopted:

  • DaemonSet for node, pod and container metrics
  • Sidecar monitoring for microservices that expose large amounts of metrics

By doing both, we keep application monitoring from becoming a point of contention between infrastructure teams and development teams.

How Does Telegraf Operator Help?

In short, the Telegraf Operator allows you to define a common output destination for all your metrics (we think the InfluxDB output is a good choice, but you can use any Telegraf output), and configure Sidecar monitoring on your application pods using annotations.

A good Kubernetes monitoring solution also has to be painless, and that includes scaling it. With that vision in mind, InfluxData set out to take the pain out of adding DaemonSet and Sidecar monitoring deployments on Kubernetes by building an open source project on GitHub called telegraf-operator.

Telegraf is InfluxData’s lightweight plugin-based agent that you can use to collect Prometheus metrics, custom application metrics, logs, network performance data, system metrics and more. There are more than 200 plugins for the various applications, tools, protocols and virtualization frameworks in use today. If you see something missing, we invite you to help contribute by adding a custom plugin of your own.

Now, let’s explain what a Kubernetes operator is. A Kubernetes operator is a method of packaging, deploying and managing an application using Kubernetes constructs. Basically, it extends Kubernetes to support workflows and functionality specific to an application. So, the telegraf-operator packages the operational aspects of deploying a Telegraf agent on Kubernetes as an application Sidecar, and configures it to scrape the exposed application metrics. All of this is defined declaratively in a YAML file. That’s it!

Let’s examine this in practice.

Installing the Telegraf Operator in Kubernetes

The telegraf-operator starts a Pod in the cluster in its own namespace. Installing the telegraf-operator is very simple and can be done via kubectl, as shown below:

kubectl apply -f telegraf-operator.yml

(An example of the yml file can be found in the deploy directory of the telegraf-operator GitHub repository.)

It can also be installed using other tools, such as Helm or Jsonnet.

helm repo add influxdata https://helm.influxdata.com/
helm upgrade --install my-release influxdata/telegraf-operator

The telegraf-operator will start watching for pods being deployed with a specific set of pod annotations. It will then take care of installing Telegraf Sidecars with the respective input plugin configuration to those pods automatically, and sending the metrics data to the output you’ve set up. Your users deploying applications never need to worry about configuring a metrics destination. It is set once by you for the entire cluster.
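As a rough sketch of what "set once for the entire cluster" can look like, the fragment below defines a named output configuration (a "class") that injected Sidecars would use. The Secret name, namespace, class name and InfluxDB URL are assumptions based on the project's example manifests, so check them against the release you install.

```yaml
# Sketch only: Secret name, namespace, class name and URL are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: telegraf-operator-classes
  namespace: telegraf-operator
stringData:
  # Each key is a class name; the value is Telegraf TOML for that class.
  app: |
    [[outputs.influxdb]]
      urls = ["http://influxdb.influxdb:8086"]   # hypothetical in-cluster InfluxDB
    [global_tags]
      hostname = "$HOSTNAME"
      type = "app"
```

Keeping the output in one place means application teams only describe what to scrape, not where the data goes.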

Start Scraping Metrics

For instance, if you would like Telegraf to scrape your application's metrics endpoint, you just need to annotate the pod that runs the application container.

See below an example of a DaemonSet deployment yaml file with Telegraf configuration data:
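A minimal sketch of such a manifest follows. The workload name, image, port and metrics path are hypothetical, and the annotation keys are written as the telegraf-operator README documents them, so verify them against the release you deploy.

```yaml
# Sketch only: application name, image, port and path are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-application
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
      annotations:
        # Annotation values must be strings, hence the quoted port.
        telegraf.influxdata.com/port: "8080"         # port exposing metrics
        telegraf.influxdata.com/path: /v1/metrics    # metrics path on that port
        telegraf.influxdata.com/interval: 5s         # scrape interval
    spec:
      containers:
        - name: my-application
          image: example.com/my-application:1.0.0
          ports:
            - containerPort: 8080
```

Note that the annotations sit on the pod template, not on the DaemonSet itself, because the operator watches pods as they are created.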

And a sample YAML file for a Redis StatefulSet with Telegraf configuration data:
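A minimal sketch, assuming a single-replica Redis and a class named app defined in the operator's configuration. The image tag, the class name and the headless Service referenced by serviceName (not shown) are assumptions. The inputs annotation carries raw Telegraf TOML, here for the Redis input plugin.

```yaml
# Sketch only: image tag, class name and Service name are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: default
spec:
  serviceName: redis          # headless Service assumed to exist
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
      annotations:
        # Raw Telegraf input configuration passed to the Sidecar.
        telegraf.influxdata.com/inputs: |
          [[inputs.redis]]
            servers = ["tcp://localhost:6379"]
        # Must match a class defined in the operator's configuration.
        telegraf.influxdata.com/class: app
    spec:
      containers:
        - name: redis
          image: redis:alpine
          ports:
            - containerPort: 6379
```

The Sidecar reaches Redis on localhost because both containers share the pod's network namespace.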

Telegraf-Operator with Helm

Most Helm charts offer the possibility to add custom annotations to pods, which is all that is needed to use telegraf-operator with Helm. For instance, here is an example of what it would take to add Telegraf monitoring to an Elasticsearch deployment using Helm.

You just need to substitute podAnnotations: {} in the values.yml with:
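A sketch of what that replacement might look like, assuming a class named default exists in the operator's configuration and that Elasticsearch listens on its usual HTTP port inside the pod:

```yaml
# Sketch only: the class name and server URL are assumptions.
podAnnotations:
  telegraf.influxdata.com/class: default
  # Raw Telegraf TOML for the Elasticsearch input plugin.
  telegraf.influxdata.com/inputs: |
    [[inputs.elasticsearch]]
      servers = ["http://localhost:9200"]
```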

Next, install Elasticsearch with the newly changed values file by passing it to helm install with the -f flag.

It’s that simple!

Enhancing Customer and Developer Community Experiences

Extending the power of Kubernetes constructs to automate and ease the management of applications is the great value of operators. So we leveraged the Kubernetes operator framework and developed telegraf-operator to ease deployment and configuration of the Telegraf agent in a Sidecar deployment mode. By doing so, we made monitoring Kubernetes and the workloads running on it scalable from both technical and operational perspectives. We at InfluxData are committed to continually enhancing our platform with modern engineering practices and technologies to deliver the best experience not only to our customers, but also to our developer community.

We’d love to hear from you in our Slack channels or GitHub repos. Let us know what you think!
