How Container Lifespan Affects Observability
LogDNA sponsored this post.
Containers have fundamentally changed the way we run applications. Many organizations now run applications as short-lived processes rather than as long-running services. The speed at which containers can be provisioned has allowed organizations to scale, optimize resource consumption, and update applications faster than ever before. A typical Kubernetes container lasts for just one day, while a typical AWS Lambda container lasts for only around one hour.
The dynamic and ephemeral nature of containers has also impacted observability. Observability is a crucial part of the DevOps process, but containers add unique challenges when compared to more traditional monolithic applications.
In this post, I’ll explain what these challenges are and what DevOps teams can do to ease the transition to a container-based architecture.
The Complexities of Container Observability
Before we dive into container lifespans, I’ll explain the challenges in collecting observability data from containers. There are several ways to collect this data, some of which are built into container runtimes and orchestration tools like Docker and Kubernetes:
- Deploying a dedicated monitoring agent as a host application or container.
- Deploying a log router to automatically collect logs generated by containers.
- Using a Docker logging driver to store container logs on the host.
- Collecting metrics via docker stats, the Kubernetes metrics pipeline, or a similar API.
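As a rough illustration of the last method, here is a minimal Python sketch that parses the line-delimited JSON produced by `docker stats --no-stream --format '{{json .}}'`. The function names and the sample data are made up for this example, and it reads only three of the fields the command prints.

```python
import json
import subprocess


def parse_docker_stats(output: str) -> list[dict]:
    """Parse the line-delimited JSON printed by `docker stats --format '{{json .}}'`."""
    samples = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        samples.append({
            "container": row["Name"],
            # CPUPerc arrives as a string such as "1.25%".
            "cpu_percent": float(row["CPUPerc"].rstrip("%")),
            "memory_usage": row["MemUsage"],
        })
    return samples


def read_docker_stats() -> list[dict]:
    """Take one snapshot from the local Docker daemon (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_stats(result.stdout)


# Made-up sample output, for illustration only.
SAMPLE = """
{"Name": "web-1", "CPUPerc": "1.25%", "MemUsage": "20MiB / 1GiB"}
{"Name": "api-1", "CPUPerc": "0.50%", "MemUsage": "35MiB / 1GiB"}
"""

if __name__ == "__main__":
    for sample in parse_docker_stats(SAMPLE):
        print(sample)
```

A snapshot like this describes only the containers running on one host at one moment, which is part of why the approaches above carry the risks described next.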
However, there are risks to these approaches. Enterprise microservice deployments may involve hundreds, thousands, or even tens of thousands of hosts spanning many cloud platforms. Not only do applications no longer run continuously, but teams also can’t know with certainty which host will run an application until it’s deployed. A single application may also have multiple instances running simultaneously, making it difficult to know where to look when problems arise.
Since containers are ephemeral, any data written to the container filesystem is deleted along with the container unless it is first transferred to the host. This also means engineers are no longer guaranteed the ability to interactively troubleshoot a running container, since it may be deleted before an engineer can open a session.
In addition, hosts themselves can be ephemeral. Tools like the Kubernetes Cluster Autoscaler, Google Kubernetes Engine, and Amazon Elastic Kubernetes Service can automatically add or remove hosts to meet changes in demand, which can lead to the loss of any files written to the host filesystem.
Compare this to a more traditional monolithic application, where the deployment environment remains relatively consistent throughout the application’s lifecycle. In a traditional environment, DevOps teams can install a monitoring agent on the host, record data directly from the application to the host’s filesystem, or even log in to the host environment to gather information or troubleshoot a problem. With containers, these techniques become nearly impossible to rely on. This is why approaching container observability with a monolithic mindset is not only self-defeating but dangerous.
When implementing observability in a container-based architecture, teams need to focus on two key goals:
- Collecting and centralizing observability data from containers and hosts.
- Measuring observability data in the context of the entire application, rather than individual containers.
Next, I’ll explain what these goals mean and how to implement them.
Collecting and Centralizing Observability Data
As mentioned previously, containers and hosts are ephemeral: they can start, stop, and migrate at any moment. As such, any observability data generated by containers and hosts should be sent to a persistent collection and storage service.
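One common way to make that possible is to have the application write structured logs to standard output, where the container runtime captures them, instead of to a file inside the container. The following is a minimal sketch using Python’s standard `logging` module. The `JsonFormatter` and `build_logger` names and the field layout are assumptions made for this example, not a format required by any particular collector.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout instead of a local file."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace any handlers configured earlier
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    build_logger("checkout").info("order placed")
```

Because nothing is written to the container filesystem, a log router or agent running on the host can forward each line to central storage before the container disappears.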
Third-party services like LogDNA are an ideal solution in this scenario because they provide a dedicated, fast, and reliable platform designed to ingest large amounts of data. For example, LogDNA aggregates and parses incoming log data so that engineers can immediately search, filter, and analyze it. This eliminates the need for engineers to log in to the application environment remotely and ensures that observability data is always available, even if the original container or host is destroyed.
Contextualization Through Metadata
Unlike a monolith, a container is just one small part of a larger application. While collecting observability data from individual containers is important, this data becomes truly useful when viewed in the context of the entire application.
For example, imagine that you have a three-tiered web application, with each tier running as a separate container. Now imagine that your backend tier suddenly starts generating errors and containers are crashing as a result. Pulling logs and metrics from individual containers will help with root cause analysis, but that won’t help you see the error in the context of the entire application. The problem may be container-specific, or it may be indicative of a broader, application-wide issue.
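To make that distinction concrete, here is a small Python sketch that groups error events by service and counts how many distinct containers are affected. The event fields (`service`, `container`, `level`) and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def summarize_errors(events: list[dict]) -> dict[str, dict]:
    """Group error events by service and count the distinct containers affected."""
    containers_by_service: dict[str, set] = defaultdict(set)
    errors_by_service: dict[str, int] = defaultdict(int)
    for event in events:
        if event["level"] != "error":
            continue
        containers_by_service[event["service"]].add(event["container"])
        errors_by_service[event["service"]] += 1
    return {
        service: {
            "errors": errors_by_service[service],
            "containers_affected": len(containers),
        }
        for service, containers in containers_by_service.items()
    }


# Made-up events, for illustration only.
EVENTS = [
    {"service": "backend", "container": "backend-a1", "level": "error"},
    {"service": "backend", "container": "backend-b2", "level": "error"},
    {"service": "backend", "container": "backend-b2", "level": "error"},
    {"service": "frontend", "container": "frontend-c3", "level": "info"},
]

if __name__ == "__main__":
    print(summarize_errors(EVENTS))
```

If every error comes from a single container, the problem is more likely container-specific. Errors spread across several containers of the same service point toward a broader issue.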
How does this relate to ephemerality? Because individual containers come and go, a container’s own identity is not a stable reference point. The data you collect should therefore identify the service provided by the container, not just the container itself. For example, the LogDNA agent automatically tags each container log with the container name and image name. On Kubernetes, the agent extends this by including the Pod name, node, and Kubernetes namespace, along with other useful metadata. This allows you to search and filter logs across all instances of a container, Kubernetes service, or deployment. Having this level of context is especially important for distributed tracing, but it applies to all forms of observability data.
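As a simplified picture of what this kind of metadata enables, the Python sketch below filters log records by label rather than by container instance. The record layout and key names are illustrative assumptions, not LogDNA’s actual schema or query API.

```python
def search_logs(records: list[dict], **labels: str) -> list[dict]:
    """Return records whose metadata matches every given label."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in labels.items())
    ]


# Made-up records; the metadata keys are illustrative, not LogDNA's schema.
LOGS = [
    {"container": "api", "pod": "api-7d9f-abcde", "namespace": "shop", "line": "timeout"},
    {"container": "api", "pod": "api-7d9f-fghij", "namespace": "shop", "line": "timeout"},
    {"container": "web", "pod": "web-5c2b-klmno", "namespace": "shop", "line": "GET /"},
]

if __name__ == "__main__":
    # Matches both api Pods, regardless of which instance emitted the line.
    for record in search_logs(LOGS, container="api", namespace="shop"):
        print(record["pod"], record["line"])
```

The query names the service, so it keeps working after the Pods behind it are replaced.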
How to Effectively Handle Ephemerality
Managed monitoring services like LogDNA are among the easiest and most effective solutions for collecting, contextualizing, and accessing observability data. With LogDNA, you can deploy an agent to collect logs from an entire Kubernetes cluster with just two commands. The LogDNA agent collects host logs in addition to your container logs, giving you a holistic view of your applications and infrastructure. And, because the agent is deployed as a DaemonSet, it automatically scales with your infrastructure.
If you want to learn more about how to log microservices and ephemeral workloads, read our blog post on Logging in an Era of Containers, as well as our post on the Challenges in Logging Serverless Applications.
Feature image via Pixabay.