How Monitoring Can Keep DevOps in the Feedback Loop
Raygun sponsored this article.
When you go shopping for a car, do you refuse to take it on a test drive? Likely not. No matter how much a salesperson hypes up a car, you have to feel for yourself how smoothly it drives and how easily it brakes, for example. You also need to drive the car on a real road and in real conditions.
Software is the same way. Just deploying code into production and doing a health check is no longer good enough. We’re now looking beyond what we used to consider “done” and instead continuing to monitor how the software runs. Getting feedback about what happens after the software is deployed is essential to staying competitive and making our systems better.
And we can’t get this through simulation — we need the feel of our software running on the open road that is production.
When looking beyond our “done” column, we have two types of feedback: operational feedback (did we build the thing right?) and business feedback (did we build the right thing?). In this post, we look at how these two types of feedback loops can help us continually drive our software toward a smoother ride.
In your car, you have all sorts of gauges to let you know what’s up. They’ll tell you what speed you’re going, how much gas you have left, what your engine temperature is and more. You even have certain warning lights that come on if something’s amiss. These are your car’s operational feedback loops. At any given time, they help you know that it’s running right.
It’s the same with software. With a good monitoring tool, you’ll know how well your software is running. I’ll cover a few here.
Latency is a fantastic metric to let you know how well your app is running. Dips and spikes in latency can indicate all sorts of ripple effects that new features may have in your system. In addition, you can see which endpoints are doing their job well and which endpoints are lagging behind. When abnormalities occur in your system, latency across your service is likely one of the first things to noticeably change.
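To make this concrete, here is a minimal sketch of what summarizing latency looks like. The function name, data, and percentile choices are illustrative assumptions, not part of any particular monitoring tool:

```python
# Hypothetical sketch: summarizing request latencies for one endpoint.
# A single slow outlier (the 200 ms request) barely moves the median but
# shows up clearly in the tail percentiles -- which is why p95/p99 are
# the numbers worth watching after a deploy.
from statistics import quantiles

def latency_summary(durations_ms):
    """Return median and tail percentiles for request durations in ms."""
    if len(durations_ms) < 2:
        return None
    # quantiles(..., n=100) yields the 1st..99th percentile cut points
    qs = quantiles(durations_ms, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

summary = latency_summary([12, 15, 14, 200, 18, 16, 13, 17, 15, 14])
```

In practice a monitoring tool computes these per endpoint over a rolling window, so a regression shows up as the tail percentiles drifting upward release over release.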
We usually want as few errors as possible in our app. Too many errors make us look bad and provide a suboptimal experience for our users. I’d often get very frustrated with my car’s old infotainment system because the Bluetooth would frequently cut out. By tracking the error count, I can get an idea of whether my endpoints are doing their job. If I deploy a feature and see that 30 percent of my responses are 500 errors, that means I may have missed something. That is invaluable feedback.
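The “30 percent of responses are 500 errors” check boils down to a simple ratio. This is a hypothetical sketch with made-up status codes and an illustrative alert threshold:

```python
def error_rate(status_codes):
    """Fraction of responses that are HTTP server errors (5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for s in status_codes if 500 <= s < 600)
    return errors / len(status_codes)

# Sample responses observed after a deploy: 3 of 10 are 500s.
codes = [200, 200, 500, 200, 500, 200, 200, 500, 200, 200]
rate = error_rate(codes)        # 0.3, i.e. 30 percent failed
alert = rate >= 0.30            # illustrative threshold for "look into this"
```

A monitoring tool does the same math continuously and compares the post-deploy rate against your baseline, rather than against a fixed number.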
Raygun’s error tracking makes such things very visible, as you can see below:
Back to our car example: You have a windshield, a rearview mirror and camera and side windows. These let you see where you’re going at any given time. You may even have navigation set up so that you can see how close you are to your destination. They let you know that you’re moving in the right direction.
With software, certain metrics will let you know whether your features are accomplishing the goals you built them for. I’ll cover a few here after we talk about business events.
One thing you may see in monitoring that gives invaluable business feedback is sessions. In our car analogy, think of sessions as the history of routes we’ve taken. Imagine if you could look back on all your past routes and see what’s most interesting to you and where you usually go. Now imagine you can do this for every driver of your model of vehicle. From that, you could extract all sorts of data. For example, maybe partnering with that Sonic everyone stops by would be a great opportunity.
For software, instead of driving history, sessions are a history of what a single user did inside your application. You can slice this down or aggregate it up as you see fit to learn trends about your customers. Here’s what sessions look like in Raygun:
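Tooling aside, the slicing-and-aggregating idea can be sketched in a few lines. The session records and step names below are hypothetical, purely to show the shape of the analysis:

```python
from collections import Counter

# Hypothetical session records: each is the ordered list of pages a
# single user visited in one visit to the app.
sessions = [
    ["home", "catalog", "product:42", "checkout"],
    ["home", "catalog", "product:42"],
    ["home", "search", "product:7"],
]

# Aggregate across all sessions to see which steps users hit most often --
# the software equivalent of "where does everyone stop on their route?"
step_counts = Counter(step for session in sessions for step in session)
popular = step_counts.most_common(3)
```

Slicing the other way, filtering to a single user’s sessions, gives you the individual “driving history” instead of the fleet-wide trend.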
Errors can serve as a business metric, too. Beyond telling you whether your system is stable, they can reveal interesting scenarios your customers may be attempting.
For example, let’s say my customers keep looking up a product in my product catalog that does not exist and get a 404 error. I also see this 404 error spike up on a recent release. Upon further investigation, I see that we meant to publish a marketing landing page for a new product line, but we accidentally published a landing page pointing to an old, out-of-commission product. So, when customers click on the product from the landing page, they get a 404.
This is just the tip of the error iceberg. Categorizing and breaking down your errors along business boundaries can give critical feedback.
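The landing-page investigation above amounts to grouping 404s by request path. A minimal sketch, with hypothetical log entries and paths:

```python
from collections import Counter

# Hypothetical error log entries as (status, request path). Grouping the
# 404s by path makes a single broken marketing link stand out immediately.
errors = [
    (404, "/products/old-widget"),
    (404, "/products/old-widget"),
    (500, "/checkout"),
    (404, "/products/old-widget"),
    (404, "/help/missing-page"),
]

not_found_by_path = Counter(path for status, path in errors if status == 404)
top_path, count = not_found_by_path.most_common(1)[0]
# top_path is "/products/old-widget" with a count of 3 -- the spike to chase
```

The same grouping works along any business boundary: by product line, by customer tier, or by the release that introduced the error.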
Out with the Old, in with the New
As mentioned above, the old guard of viewing code in production as “done” is fading away. It’s time for a new guard: one where the monitoring of the software we delivered provides us with valuable feedback and learning.
In this new era, we use metrics like request count and error rates to learn what our customers are experiencing. We look at our latency so that we can prioritize the right technical debt to pay down. We’re not only agile in our design but also in our operations. By taking in feedback well after we’ve deployed new code, we’ll maintain a hefty advantage over our competitors.
Feature image via Pixabay.