Enabling Collaborative K8s Troubleshooting with ChatOps

ChatOps lets you chat directly with your Kubernetes cluster to monitor, debug and optimize it.
Nov 18th, 2022 6:28am by

With all the modern-day tools that harangue us during (and, unfortunately, beyond) our work days, chat has always occupied an important role in the way we communicate online. It has swung in and out of fashion, much like the latest JavaScript framework, and has found the most success in niche communities or applications.

One of those applications was in helping people collaborate on developing software. Instead of multiple people working on their isolated machines, relying on screen shares or copy-pasting code samples into Jira, chat created an opportunity to have ongoing conversations about software — and even automate CI/CD pipelines, deploy to production and respond to outages.

In 2013, GitHub coined the term ChatOps with the introduction of Hubot, an open source chatbot that allowed developers to deploy, monitor, provision and more from its Campfire chat rooms. The idea was to bring the processes developers needed to do their jobs into the tools they used to organize and collaborate as a team — chat rooms. If they could chat and take action from the same prompt, they wouldn’t have to context-switch between many tools, and they could preserve their actions in a transparent, teamwide space.

Between 2014 and 2016, ChatOps was all the rage in the industry. Conversation-driven development felt like a breath of fresh air. With remote work and cloud native growing fast, managers eagerly sought new models, like collaborative management, for dealing with distributed employees and infrastructure. But those early versions of ChatOps disappeared just as quickly, and DevOps teams relegated chat to “just” an alternative to the hassles of internal emails.

Think: no more “reply all” chains.

But 2016 was also just the beginning of cloud native. A lot has changed since then. As I said before, chat has always found the most value in smaller communities driving toward specific goals, which is exactly why ChatOps for Kubernetes is leading the conversation-driven charge once again.

What Is ChatOps for Kubernetes?

In short, ChatOps lets you chat directly with your Kubernetes cluster to monitor, debug and optimize it.

But to illustrate exactly how that works, let’s walk through a scenario where ChatOps for Kubernetes can help you resolve a user-facing issue before it starts affecting their experience.

It’s 10 p.m. on a Saturday, and the Kubernetes cluster responsible for running your organization’s web app just experienced an unexpected anomaly. Fortunately, you have a ChatOps app that’s watching the resources of your Kubernetes cluster and listening to the events they emit, including this anomaly. Your ChatOps app translates that event into an alert, which it sends to your organization’s chat app of choice — such as Slack, Microsoft Teams, Discord or Mattermost — for all your DevOps engineers and admins to see.
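To make the "event becomes alert" step concrete, here is a minimal sketch in Python. It assumes the event arrives as a plain dictionary shaped like a Kubernetes Event object (with `type`, `reason`, `message` and `involvedObject` fields). The function name `event_to_alert`, the sample pod and the message format are hypothetical illustrations, not how any particular ChatOps app does it.

```python
from typing import Optional


def event_to_alert(event: dict) -> Optional[str]:
    """Turn a Kubernetes Event (as a plain dict) into a one-line chat message.

    Returns None for anything that is not a "Warning" event, so routine
    events never reach the channel in this sketch.
    """
    if event.get("type") != "Warning":
        return None

    obj = event.get("involvedObject", {})
    target = f"{obj.get('kind', 'Unknown')}/{obj.get('name', 'unknown')}"
    namespace = obj.get("namespace", "default")

    return (
        f":warning: {event.get('reason', 'Unknown')} on {target} "
        f"in namespace {namespace}: {event.get('message', '')}"
    )


# Hypothetical sample event, for illustration only.
sample = {
    "type": "Warning",
    "reason": "Unhealthy",
    "message": "Readiness probe failed: HTTP probe failed with statuscode: 503",
    "involvedObject": {"kind": "Pod", "name": "web-7d9f", "namespace": "prod"},
}

if __name__ == "__main__":
    print(event_to_alert(sample))
```

A real integration would then post the returned string to the chat platform's API. That part is left out here because it differs for every platform.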

And because it’s late on a Saturday night, the engineer on call gets a ping on that chat app telling them to drop what they’re doing and start investigating. In normal circumstances, being without a laptop and a steady WiFi connection would be a serious hindrance, but ChatOps lets your engineer use only the Slack app on their iPhone to monitor, investigate and debug the source of the error event.

ChatOps enriches any chat app into an always-on, context-aware, transparent terminal with direct access to your Kubernetes cluster via all the kubectl syntax your engineer needs to get crucial information about what’s gone wrong and how to fix it.

And as others join the fray, they can start chatting about and collaborating around the issue in parallel to the debugging work. Your chat app becomes not just a historical log of the actions already taken, but also a robust audit log that supports new best practices, improved internal documentation and the eventual postmortem.

Enter Collaborative Management for Kubernetes

When collaborative management in DevOps works well, it encourages transparency and behaviors that lift an entire team:

  • Giving constructive feedback and accepting feedback from others in a productive manner, especially to and from other teams, like development and QA.
  • Group brainstorming, where every member of the team, regardless of their role or seniority, is expected to actively participate in making decisions.
  • High degrees of honesty between managers, supervisors and employees to build trust and promote a collaborative spirit.
  • Working toward a common goal rather than individual ones, acting as a coach and coworker to many different people.

But cloud native environments tend to throw a few curveballs into an already-challenging effort. You likely have an enormously complex microservice-based architecture with multiple clusters running on parallel data centers, and there’s a good chance most of your team is remote, to boot. This combination of DevOps culture, distributed technology and the need for constant coordination makes implementing collaborative management a greater technical challenge, but not an impossible one.

ChatOps helps bridge that gap between people and technology, getting them to work in parallel to make collaborative management shine in a few common DevOps scenarios:

  • Single on-call engineers: We outlined this scenario in the previous section, but with ChatOps for Kubernetes, a single on-call DevOps engineer has far more flexibility in how, when and where they respond. Instead of being forced back to their work machine, where they have their environment and keys/secrets set up to properly query the Kubernetes cluster, they can start debugging wherever your organization’s chat app goes. They get a head start on the situation, and when it’s time for another engineer to take over, the centralization and transparency of ChatOps simplify the transition.
  • Trading monitoring and debugging tasks: As the second engineer comes online, they join the channel or room where debugging has already started. Since ChatOps enforces transparency and working from a common terminal over individual siloed systems, the second engineer can view the entire audit log of debugging work that’s already been done. If they’re replacing the first engineer, they won’t need to waste time re-running commands, as they can see every kubectl execution and the results in the chat log. They can debug deployments, services or other resources from multiple clusters in a way that encourages honesty, openness and continuous improvement across cycles of incident and resolution.
  • Live pairing: Or, if the two engineers are debugging and troubleshooting as a pair, ChatOps helps them treat the work like they’re in the same mission-control room, even if they happen to be hundreds or thousands of miles apart in separate remote — or home — offices. They don’t need to rely on awkward and unproductive screen shares, where the viewer can only observe, take notes and make suggestions — or even worse, copy-pasting their kubectl commands into the chat app. With ChatOps supporting a spirit of collaborative management, your engineers work their magic in parallel. They have the freedom to experiment, debug and query to their heart’s content, all with the transparency and ongoing conversation that good collaboration demands. It’s a single environment for talking to each other and your clusters.

But not all ChatOps platforms are made the same, especially when Kubernetes is involved.

Botkube for Monitoring and Debugging Kubernetes Clusters

Botkube is a new generation of ChatOps for Kubernetes. With open source Botkube, you can monitor multiple clusters, debug your deployments in real time and check the state of your clusters for recommendations on where your team could continuously improve.

We support DevOps teams’ most popular messaging platforms, like Slack, Microsoft Teams, Discord and Mattermost. We even support sending messages to Elasticsearch for archival purposes or to an outgoing webhook to build a custom messaging response pipeline.

We just released v0.14.0 in October 2022 with more user-friendly ways to dynamically change your Botkube notification settings. By default, Botkube monitors and displays all Kubernetes events in the channel you configure in your YAML files or during a Helm install/upgrade, but now you can change those settings on the fly with a simple @Botkube edit SourceBindings message in your chat app of choice. We’ve also released a new Slack app that’s more robust and has richer security options.

Plus, a major win for organizations that love using open source across the board, not just in their infrastructure: Botkube now supports Mattermost v7.x!

Botkube v0.13.0 arrived only a month earlier with our most requested feature: multichannel support. With a single Botkube installation in your Kubernetes cluster, you can group and send events to different channels and messaging platforms. For example, you can send high-severity events (like Resource deleted, Failed to pull image, or Readiness probe failed errors) to Slack for an immediate response from your team and archive them in Elasticsearch for transparent, auditable logs of your incident response actions.
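The routing idea in that example can be sketched in a few lines of Python. This is only an illustration of grouping events by severity and fanning them out to several destinations. It is not Botkube's configuration format or code. The destination labels, the catch-all channel and the `destinations` function are all assumptions made up for this sketch.

```python
from typing import Dict, FrozenSet, List, Optional

# Event titles treated as high severity, taken from the example above.
HIGH_SEVERITY: FrozenSet[str] = frozenset(
    {"Resource deleted", "Failed to pull image", "Readiness probe failed"}
)

# Hypothetical destination labels mapped to the event titles they accept.
# None means "accept every event"; that catch-all channel is an assumption.
ROUTES: Dict[str, Optional[FrozenSet[str]]] = {
    "slack:#incidents": HIGH_SEVERITY,
    "elasticsearch:incident-archive": HIGH_SEVERITY,
    "slack:#k8s-events": None,
}


def destinations(event_title: str) -> List[str]:
    """Return every destination that should receive this event, in route order."""
    return [
        dest
        for dest, titles in ROUTES.items()
        if titles is None or event_title in titles
    ]


if __name__ == "__main__":
    for title in ("Failed to pull image", "Pod created"):
        print(title, "->", destinations(title))
```

Keeping the routing table as data rather than branching logic means a new channel or a new severity group is a one-line change, which is roughly the appeal of configuring this in one place per installation.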

Our community already loves Botkube for a lot of other reasons, too:

  • No new syntax or workflows to learn — Botkube uses the familiar kubectl syntax, just with a new interface.
  • Botkube runs inside your cluster as a read-only service account by default, ensuring you have access control and administrative privileges set up according to your organization’s requirements.
  • Support for monitoring custom resources to receive alerts on certificate expiration or backup failure (Velero and Kanister).
  • We’re completely open source! We have 1.5k stars on GitHub, nearly 100 contributors and a fast-growing community.
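To illustrate what a read-only default means for chat-issued commands, here is a small Python sketch of a guard that only lets read-style kubectl verbs through. This is a hypothetical chat-side check, not Botkube's implementation. In a real cluster the enforcement comes from the permissions granted to the service account, and a check like this would at most complement it. The verb list and the `is_read_only` function are assumptions for illustration.

```python
import shlex

# kubectl verbs that only read cluster state. The list is deliberately short
# and is an assumption of this sketch, not an exhaustive or official set.
READ_ONLY_VERBS = frozenset(
    {"get", "describe", "logs", "top", "explain", "api-resources", "version"}
)


def is_read_only(command: str) -> bool:
    """Return True if a chat-issued kubectl command starts with a read-only verb.

    The leading "kubectl" is optional. Commands that put flags before the verb
    (for example "kubectl -n prod get pods") are rejected, which errs on the
    side of caution.
    """
    tokens = shlex.split(command)
    if tokens and tokens[0] == "kubectl":
        tokens = tokens[1:]
    return bool(tokens) and tokens[0] in READ_ONLY_VERBS


if __name__ == "__main__":
    for cmd in ("kubectl get pods -n prod", "kubectl delete pod web-7d9f"):
        print(cmd, "->", "allowed" if is_read_only(cmd) else "blocked")
```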

The Future of ChatOps for Kubernetes

To get started with Botkube, you need to install two components:

  • The Botkube integration in your chat app of choice.
  • The Botkube backend for that app in your Kubernetes cluster.

Because the setup process varies a bit for each app (they require different variables and secrets, not to mention unique setup processes for creating and authenticating the chatbot), you should check out the doc that’s tailored for your organization’s app(s) of choice.

And as an open source, community-driven project, we want to hear your feedback and ideas. That’s how we optimize the roadmap and implement the most in-demand features to make Botkube as valuable as possible as quickly as possible.

Send us your thoughts in GitHub issues for Botkube or keep up the spirit of chat apps by joining our Slack community. Or, if you prefer the directness of an email, you can email me at blair@kubeshop.io.
