Tips for Automating Distributed Logging on Production Kubernetes

Any Kubernetes production environment will rely heavily on logs. Using built-in Kubernetes capabilities along with some additional data collection tools, you can easily automate log collection and aggregation for ongoing analysis of your Kubernetes clusters.
At Kenzan, we typically try to separate platform logging from application logging. This may be done with entirely different tooling, or simply by filtering and tagging within the logs themselves.
As with any distributed system, logs provide the vital evidence for accurately tracing a specific call as it crosses microservices, so that the root cause of an issue can be identified.
Here are our suggestions for logging within a distributed Kubernetes environment:
- Use a single, highly available log aggregator, and capture data from across the entire environment in a single place.
- Create a single, common transaction ID for each client's end-to-end call, and propagate it across every service involved. This makes it far easier to trace a request all the way through the stack (see the sketch after this list).
- Ensure that service names and applications are being logged.
- Standardize the logging levels within the entire stack.
- Ensure that no data intended to be secure is being logged in the clear.
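As a minimal sketch of the first four suggestions, the snippet below (Python, standard library only) emits JSON log lines to stdout with a service name, a standardized level, and a transaction ID propagated from an incoming request header. The header name, field names, and service name are illustrative assumptions, not a prescribed schema.

```python
import json
import logging
import sys
import uuid

SERVICE_NAME = "checkout-service"  # illustrative service name

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line so a log shipper can parse it."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,            # standardized level across the stack
            "service": SERVICE_NAME,              # always log which service emitted the line
            "transaction_id": getattr(record, "transaction_id", None),
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)       # log to stdout; the platform ships it
handler.setFormatter(JsonFormatter())
logger = logging.getLogger(SERVICE_NAME)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(headers):
    # Reuse the caller's transaction ID if present, otherwise start a new one,
    # and pass it along in the headers of any downstream service calls.
    txn_id = headers.get("X-Transaction-Id", str(uuid.uuid4()))
    logger.info("order received", extra={"transaction_id": txn_id})
    return txn_id

if __name__ == "__main__":
    handle_request({"X-Transaction-Id": "abc-123"})
```

Because the application only writes structured lines to stdout, the platform-level shippers described below can pick them up without any application changes.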
Logging in Kubernetes
Besides this high-level approach to logging, you should understand how Kubernetes handles its own logging and events.
Kubernetes nodes generally run on Linux, often as virtual machine instances. Components like the kubelet and the Docker runtime run natively on Linux and log to the local system. Linux logging is spread across several locations, including the ubiquitous /var/log directory.
The first thing an administrator should do is validate log rotations for these log files, as well as all the other miscellaneous Linux logs. Kubernetes’ documentation provides good recommendations for files to rotate. The logging configuration should be inspected even if you intend to replace the local logging mechanism with an alternative.
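As a rough illustration of that sanity check, the sketch below walks a node's logrotate configuration and reports which Kubernetes-related log files under /var/log have no rotation rule. The file list is an assumption you would adjust to match your distribution and node layout.

```python
import glob
import os

# Log files commonly suggested for rotation on Kubernetes nodes; adjust to your layout.
EXPECTED_LOGS = [
    "/var/log/kube-apiserver.log",
    "/var/log/kube-scheduler.log",
    "/var/log/kube-controller-manager.log",
    "/var/log/kubelet.log",
    "/var/log/kube-proxy.log",
]

def rotated_paths():
    """Collect every path mentioned in the node's logrotate configuration."""
    paths = set()
    for conf in glob.glob("/etc/logrotate.d/*") + ["/etc/logrotate.conf"]:
        try:
            with open(conf) as f:
                for line in f:
                    line = line.strip()
                    if line.startswith("/"):
                        # Lines like "/var/log/kubelet.log {" name the file being rotated.
                        paths.add(line.split()[0].rstrip("{"))
        except OSError:
            continue
    return paths

if __name__ == "__main__":
    covered = rotated_paths()
    for log in EXPECTED_LOGS:
        present = "present" if os.path.exists(log) else "absent"
        status = "rotated" if log in covered else "NO ROTATION RULE"
        print(f"{log}: {present}, {status}")
```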
We don’t recommend that you keep logs for virtual compute instances inside ephemeral cloud computing environments. Such instances can disappear without notice. Modern logging and analytics tools provide enough context and visual aids to help operators determine what actually transpires inside large Kubernetes cluster deployments. You should use a log aggregation service to ship your logs away from the Kubernetes environment, for later review and analysis.
Capturing Kubernetes Logs
There are a few reliable methods for capturing Kubernetes-native Linux logs, Kubernetes container-based component logs, and all application container log data:
- Simply extend Kubernetes’ existing logging capability. As logs accumulate and rotate on the nodes, you can ship them elsewhere. One popular way to do that is with a logging container whose entire purpose is to send logs to another system.
- Alternately, you can use the Fluentd data collector to transport logs to an ELK stack (Elasticsearch, Logstash and Kibana), or to some other log aggregation and analytics system. The log files this method reads are the same container logs that the kubectl logs command surfaces. You could run a logging pod with a Fluentd container on every node (Kubernetes makes this easy with the concept of DaemonSets; see the sketch after this list). Fluentd is configured (via a ConfigMap) to read all the log locations on every node and essentially aggregate them into a single searchable location.
- There are variations on this approach in which each application container is paired with a logging container in the same pod (the sidecar approach), separating application logging from system logging.
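To keep the DaemonSet approach concrete, here is a minimal sketch using the official Kubernetes Python client: one Fluentd collector pod per node, mounting the node's log directories so it can read every container's logs. The namespace, labels, image tag, and host paths are placeholders you would replace to match your cluster.

```python
from kubernetes import client, config

# A Fluentd DaemonSet skeleton: one collector pod per node, reading the node's
# log directories. Image and tag are placeholders; pin to a specific release.
FLUENTD_DAEMONSET = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "fluentd", "namespace": "kube-system"},
    "spec": {
        "selector": {"matchLabels": {"app": "fluentd"}},
        "template": {
            "metadata": {"labels": {"app": "fluentd"}},
            "spec": {
                "containers": [{
                    "name": "fluentd",
                    "image": "fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch",
                    "volumeMounts": [
                        {"name": "varlog", "mountPath": "/var/log"},
                        {"name": "containers",
                         "mountPath": "/var/lib/docker/containers",
                         "readOnly": True},
                    ],
                }],
                "volumes": [
                    {"name": "varlog", "hostPath": {"path": "/var/log"}},
                    {"name": "containers",
                     "hostPath": {"path": "/var/lib/docker/containers"}},
                ],
            },
        },
    },
}

if __name__ == "__main__":
    config.load_kube_config()   # or config.load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    apps.create_namespaced_daemon_set(namespace="kube-system", body=FLUENTD_DAEMONSET)
    print("fluentd DaemonSet created")
```

In practice you would pair this with a ConfigMap holding the Fluentd configuration that points at Elasticsearch (or whatever aggregator you use); the sketch omits it for brevity.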
Whatever method you choose, the logs end up residing on the node at some point, and they have to be shipped somewhere else. All of the methods above can work well, but they share a common thread that we find very important: they all decouple log aggregation from the application code. This matters both for separation of concerns and for performance.