What We Learned from Building a Chatbot

It’s clear now that AI chatbots built on ChatGPT can add value for developers. From generating and debugging code to looking up how to accomplish a task, there are many ways to use ChatGPT effectively. Its outputs might not always be perfect, but developers can use them as a starting point for writing code or learning something complex.
At the organizational level, systems and workflows are complicated, especially in DevOps. One of our big predictions for generative AI is that it will be customized for DevOps workflows, from code deployment and infrastructure provisioning to monitoring and incident response. If a chatbot knows your organizational context, recognizes your questions and can either resolve them or connect you to the right resources, the added value for DevOps teams is huge. That’s why we built a conversational bot that tackles the collaboration, communication and skills-gap problems in DevOps.
Here are five lessons we learned from building our DevOps-focused chatbot, PromptOps. Apply these lessons to build a bot that tightens feedback loops and minimizes cognitive load while giving your users confidence in the results.
Lesson 1: Let the Chatbot Be the Guide
Before the user types their first word, they must have some idea of what the tool can do. Can it get logs? Can it get metrics? Can it connect to Amazon Web Services (AWS), Lambda and Kubernetes? Can it take action on a task? Often, users don’t even know how to ask the right question.
You should aim to create a self-serve experience that clearly communicates the chatbot’s capabilities and equips the bot with the right information, so users don’t drop out or have to reach out to you. For example, we designed our Slackbot PromptOps to let users know that:
- It can help with AWS services and configurations.
- It can help with Kubernetes operations and monitoring.
- It combines siloed knowledge across document stores.
- It can provide best practices for setting up CI/CD pipelines.
Instead of waiting for the user to figure out what questions to ask, let the chatbot guide the learning and discovery journey, starting with an explicit capabilities message like the sketch below.
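To make that concrete, here’s a minimal sketch of such an intro message using Slack’s Python SDK. The capability list, token variable and function name are illustrative placeholders, not PromptOps internals:

```python
# Illustrative only: a capability intro message for a Slack bot via slack_sdk.
import os
from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

CAPABILITIES = [
    "Query AWS services and configurations",
    "Run Kubernetes operations and monitoring",
    "Search knowledge siloed across document stores",
    "Suggest best practices for CI/CD pipelines",
]

def send_intro(channel: str) -> None:
    """Greet the user with an explicit list of what the bot can do."""
    blocks = [
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": "*Hi! Here's what I can help with:*"}},
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": "\n".join(f"• {c}" for c in CAPABILITIES)}},
    ]
    client.chat_postMessage(channel=channel, blocks=blocks,
                            text="Here's what I can help with")
```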
Lesson 2: Know the Limitations of the Slack UX
Building a DevOps chatbot within Slack has its limitations. Because Slack’s Block Kit UX elements are limited compared to web applications, it’s challenging to tame the model’s output to fit the Slack user interface so that it looks clean while remaining intuitive to use. For example, a Slackbot is good for pulling together all the information you need for root cause analysis. But when you need to drill into a metric chart and look at specific numbers, the better experience is to let the user click into the chart itself, and you can’t do that in Slack.
A good conversational agent doesn’t need to be too chatty. A lot of what we do in DevOps is visual: we look at charts to spot trends and patterns. We wanted to show metrics and logs within Slack but were limited to embedding links to existing tools like Grafana or the AWS console. The problem is that those tools lack chat interfaces, so the UX breaks at that point. To address this, we’ve been building a web app for PromptOps within CtrlStack that lets users jump from an embedded link in Slack into an observability platform and continue the conversation there.
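Here’s a rough sketch of that hand-off pattern: render a static chart snapshot in Slack and attach a button that deep-links into the interactive web UI. The block shapes are standard Block Kit; the snapshot URL and deep link are hypothetical:

```python
# Illustrative Block Kit hand-off: static chart snapshot in Slack,
# plus a button that deep-links into the interactive web app.
def chart_handoff_blocks(title: str, snapshot_url: str, deep_link: str) -> list:
    """Show a pre-rendered chart image and an 'open in web app' button."""
    return [
        {"type": "image",
         "image_url": snapshot_url,   # static PNG of the metric chart
         "alt_text": title},
        {"type": "actions",
         "elements": [{
             "type": "button",
             "action_id": "open_chart",
             "text": {"type": "plain_text", "text": "Open interactive chart"},
             "url": deep_link,         # hypothetical link into the web app
         }]},
    ]
```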
It’s important to have a strong backend observability platform like CtrlStack that provides the UI elements users click out to, letting them view the charts and interact with the data directly. This backend system also provides the organizational context needed to ground the chatbot in your organization’s data and avoid hallucinations. I’ll dive deeper into that in Lesson 4.
Lesson 3: Expect Out-of-Scope Input
Implement gates to ensure user actions fall within the scope and capabilities of the model, and train your model to be biased toward its capabilities and your organization’s best practices. For example, our model knows we support AWS. A user might ask a question that could apply to either Google Cloud Platform (GCP) or AWS: “How do I create a new compute instance?” Because our model knows we’re running on AWS, it will produce a script that creates a compute instance for AWS, not GCP. If the user prods the chatbot for a GCP script, our model will generate one, but we won’t offer the run-button functionality.
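A hedged sketch of what that bias can look like in code. The provider detection below is deliberately naive and purely illustrative; the point is that the run button is only attached when the script targets a provider you actually support:

```python
SUPPORTED_PROVIDERS = {"aws"}  # what this environment is credentialed for

def detect_provider(script: str) -> str:
    """Deliberately naive provider detection, for illustration only."""
    return "gcp" if ("gcloud" in script or "google.cloud" in script) else "aws"

def build_response(script: str) -> dict:
    """Always return the script, but only attach the run button
    when the script targets a provider we actually support."""
    return {
        "script": script,
        "offer_run_button": detect_provider(script) in SUPPORTED_PROVIDERS,
    }
```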
Gate user actions based on their cloud credentials or role-based access control (RBAC) permissions. Whenever a user executes code, ensure it runs in a protected environment. The code should not have open access to everything; it should run with access to only an approved set of libraries. This ensures the code doesn’t take destructive actions without permission. We’ve found that letting the user review the code before they run it also helps build trust in the system.
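As a sketch, assuming a simple role model and Python scripts as the execution unit, the gate might look like this. The role names, read-only verbs and sandboxing details are illustrative, not a hardened design:

```python
import subprocess
import sys

READ_ONLY_VERBS = {"describe", "get", "list"}  # illustrative role model

def user_may_run(user_roles: set, action_verb: str) -> bool:
    """RBAC gate: operators can run anything; everyone else is
    limited to non-mutating actions."""
    return "operator" in user_roles or action_verb in READ_ONLY_VERBS

def run_in_sandbox(script_path: str) -> str:
    """Run the user-reviewed script with a stripped environment and a
    hard timeout, so it can't pick up credentials it wasn't granted."""
    result = subprocess.run(
        [sys.executable, script_path],
        env={"PATH": "/usr/bin"},  # no inherited cloud credentials/tokens
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout
```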
Besides gating user actions, you should set guardrails to avoid cloud misconfigurations and enforce consistent resource management. Build your chatbot to use your organizational context, whether it’s stored in Slack, Notion or Confluence, to get the information it needs.
Let’s say a user in an organization asks, “What’s the best practice for creating a Lambda function?” The answer should be specific to that organization, based on its organizational context. If your organization always creates Lambda functions with a certain Identity and Access Management (IAM) group attached, then the chatbot should answer, in text or as a script, according to that preferred practice.
PromptOps addresses this by spanning the knowledge base across wikis, Slack and other tools. This ensures that runbooks are never forgotten and best practices remain sticky.
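One plausible shape for this grounding step, with the document search and model call left as placeholder callables, since every stack wires these differently:

```python
def answer_with_org_context(question: str, search, llm) -> str:
    """Ground the model in internal docs before it answers.
    `search` queries your wiki/Slack/Confluence index and `llm` wraps
    the model API; both are placeholders for your own wiring."""
    docs = search(question, top_k=3)  # e.g. runbooks, wiki pages
    context = "\n\n".join(d["text"] for d in docs)
    prompt = (
        "Answer using ONLY the organizational context below. "
        "If the context doesn't cover the question, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```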
Lesson 4: Provide Ways to Prevent Hallucinations
Everyone knows that large language models hallucinate when they don’t know something. How do you work around that? We found that requiring references is one mitigation: if the chatbot can’t point to the right source for its answer, that’s a good signal the bot hallucinated.
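A minimal sketch of that validation, assuming answers cite sources with a `[doc:ID]` convention (the citation format is an assumption for illustration, not a standard):

```python
import re

def references_check_out(answer: str, retrieved_ids: set) -> bool:
    """Cheap hallucination check: every [doc:ID] citation in the answer
    must match a document that was actually retrieved, and there must
    be at least one citation."""
    cited = set(re.findall(r"\[doc:(\w+)\]", answer))
    return bool(cited) and cited <= retrieved_ids
```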
Another way to stem hallucinations is to use the DevOps graph to ground the chatbot in the organization’s data. The DevOps graph in CtrlStack models the different components of the DevOps pipeline as nodes in a graph to provide users with a better understanding of how different components interact and how changes in one area can affect the entire system.
If you tell the chatbot very clearly what it knows, and that this is all it knows, that will stem hallucination. For example, if you provide the DevOps graph and it contains no information about Amazon Elastic Container Registry (ECR) images in a Lambda function, the chatbot will know that no ECR images are being pushed into the Lambda function. This prevents the bot from filling the gap with made-up, generic information.
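Here’s a sketch of that grounding step with a toy graph schema (not CtrlStack’s actual model): serialize the nodes and edges into the prompt and tell the model its knowledge stops there:

```python
def graph_to_context(nodes: dict, edges: list) -> str:
    """Flatten a toy DevOps graph into prompt text.
    nodes: {id: {"name": ...}}; edges: [(src_id, relation, dst_id)]."""
    facts = [f"{nodes[s]['name']} --{rel}--> {nodes[d]['name']}"
             for s, rel, d in edges]
    return ("You know ONLY the following infrastructure facts. "
            "If something is not listed here, state that you have "
            "no record of it:\n" + "\n".join(facts))
```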
Lastly, if you’re building your chatbot to support user reactions, such as Slack emoji feedback, respect those reactions and learn from them. They can help stem hallucinations.
Lesson 5: Correctly Handle “Not Understood” Scenarios
When a bot doesn’t understand your input, it will most likely try to generate something anyway, and that’s a problem. To help your chatbot admit it doesn’t know something, you first need to detect that it’s struggling, then implement methods to find the answer or the right people to provide it. The simplest method is keeping a human in the loop: when someone finds the answer to a question the chatbot couldn’t handle, feed that answer back to the chatbot. The chatbot should remember it and answer the question correctly in subsequent conversations.
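A minimal sketch of that feedback loop, using a JSON file as a stand-in for whatever store you would actually use; exact-match lookup keeps the example simple:

```python
import json
import pathlib

MEMORY = pathlib.Path("learned_answers.json")  # stand-in for a real store

def record_human_answer(question: str, answer: str) -> None:
    """Persist an answer a teammate supplied for a question the bot
    couldn't handle, so the bot gets it right next time."""
    data = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    data[question.strip().lower()] = answer
    MEMORY.write_text(json.dumps(data, indent=2))

def lookup_learned(question: str):
    """Check learned answers before falling back to generation."""
    data = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    return data.get(question.strip().lower())
```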
In the coming months, new methods may emerge that allow better flows. For example, if someone asks something in scope and the bot doesn’t know the answer, it needs to recognize that and offer an alternative. The bot can look at the history of conversations about the topic across channels and direct the user to the people who can help resolve the problem. By helping the user find a solution, the bot adds value and creates a positive experience.
Final Thoughts
When you build a chatbot that can take action on your infrastructure, you need a human in the loop until you build trust. If you apply these lessons, the chatbot will not feel like a black box, but rather like a helper that explains how it solves the problem for you: one that doesn’t hallucinate, but puts the team at the forefront by surfacing the information and knowledge scattered across tools, systems and teammates.
A good conversational bot is built on organizational context. The more information the bot can ingest about the organization and its processes, the more accurate the solutions it finds. By parsing all that information, such as your DevOps graph, document stores and Slack context, the bot can make assumptions about the user and their organization, so the user doesn’t need to put much work into crafting a prompt. A good prompt interface will eventually be just a single word.