Monitoring Methodologies: RED and USE

Implementing monitoring for a new system or for the first time can be a daunting task. Every new system or service brings new data to parse through and make sense of, so where do you start? It’s important to first consider the basic questions you would like your data to answer. Is the system performing well? Has anything in the environment changed? Is there a problem? What is the user experiencing?
Answers to each of these questions can provide valuable insight into your overall system health and performance, as well as highlight issues and areas for improvement. Two common monitoring methodologies that identify metrics to help answer these questions are RED and USE.
RED — Rate, Errors, Duration
The RED methodology focuses on three metrics, primarily aimed at request-driven systems, like modern web applications: rate, errors and duration. Individually, these metrics can act as application performance indicators, but together they provide meaningful insight into how users are interacting with your systems and the overall user experience.
Rate
Rate is the number of requests your system receives per unit of time, typically measured in requests per second. It tells you how much demand your services are handling. It also gives context to your other metrics, which can guide troubleshooting or point to areas for future improvement. Is one service receiving a higher rate of traffic than the others? How does performance change as traffic increases?
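As a rough illustration, rate can be computed by counting request arrivals inside a trailing time window. The sketch below is a minimal Python example, not a production collector. The `RequestRateTracker` class, its `window_seconds` parameter and the explicit timestamps are assumptions made for illustration, and the sketch also assumes requests are recorded in arrival order.

```python
import time
from collections import deque


class RequestRateTracker:
    """Hypothetical helper: counts requests seen within a trailing time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # arrival times, oldest first

    def record(self, timestamp=None):
        # Assumes calls arrive in time order; defaults to the monotonic clock.
        self._timestamps.append(time.monotonic() if timestamp is None else timestamp)

    def rate(self, now=None):
        """Return requests per second over the trailing window."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Evict arrivals that fell out of the window before counting.
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds


tracker = RequestRateTracker(window_seconds=10.0)
for i in range(20):
    tracker.record(timestamp=i * 0.5)  # arrivals at 0.0, 0.5, ..., 9.5 s
print(tracker.rate(now=10.0))  # 2.0
print(tracker.rate(now=15.0))  # 1.0
```

Five seconds later, half of the arrivals have aged out of the window, so the rate drops from 2.0 to 1.0 requests per second. Monitoring systems usually keep running counters and derive rates at query time instead of storing every timestamp, which scales better under heavy traffic.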
Errors
Simply put, how many requests are ending in or encountering errors in your system? Is a specific call failing 100% of the time? Do errors increase as the rate of traffic increases? Is downstream latency causing calls to time out? Errors are your system asking for help, and monitoring them is crucial to identifying, prioritizing and fixing system issues.
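One way to answer these questions is to compute an error ratio per endpoint rather than a single global count. This way, a call that fails every time stands out even when overall traffic looks healthy. In this sketch, `error_rates` is a hypothetical helper and requests arrive as `(endpoint, status_code)` pairs. Treating only HTTP 5xx responses as errors is an assumption. Some teams also count timeouts or specific 4xx codes.

```python
from collections import defaultdict


def error_rates(requests):
    """Return the fraction of failed requests per endpoint.

    `requests` is an iterable of (endpoint, status_code) pairs. Counting only
    HTTP 5xx responses as errors is an assumption of this sketch.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for endpoint, status in requests:
        totals[endpoint] += 1
        if status >= 500:
            failures[endpoint] += 1
    return {endpoint: failures[endpoint] / count for endpoint, count in totals.items()}


log = [
    ("/checkout", 503), ("/checkout", 503),
    ("/search", 200), ("/search", 500), ("/search", 200), ("/search", 200),
]
rates = error_rates(log)
print(rates)  # {'/checkout': 1.0, '/search': 0.25}
# Endpoints failing 100% of the time:
print([endpoint for endpoint, ratio in rates.items() if ratio == 1.0])  # ['/checkout']
```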
Duration
Duration refers to the length of time each request to your system takes. Increasingly complex systems rely on numerous calls for each individual request. The request duration is critical to determining end-user experience and monitoring overall performance. System latency and long-running calls can lead to disgruntled users and be indicative of larger application issues. After all, “slow is the new down,” according to Google. As page load time increases from 1 to 3 seconds, the likelihood of a user leaving increases by 32%.
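Because a handful of slow requests can dominate the user experience, duration is usually tracked as a distribution (the median plus high percentiles) rather than as an average. The sketch below uses the nearest-rank percentile method. The `percentile` function and the sample durations are illustrative assumptions, not measurements.

```python
import math


def percentile(durations_ms, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of durations."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Smallest sample with at least p% of all samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


durations_ms = [120, 95, 110, 3000, 105, 130, 98, 101, 115, 2500]
print(percentile(durations_ms, 50))  # 110
print(percentile(durations_ms, 99))  # 3000
```

Here the median is 110 ms, but the mean is roughly 637 ms and the 99th percentile is 3000 ms. The average describes almost no real request. The tail shows the slow calls that users actually notice.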
USE — Utilization, Saturation, Errors
Whereas RED leans more toward application metrics, the USE methodology focuses on system resources. Visibility into the health of your infrastructure is just as important as visibility into the health of your application.
Utilization
Utilization is the proportion of a resource's capacity that is in use while processing work, often expressed as a percentage (for example, the share of time a CPU was busy). Resources can be hardware, such as CPU, memory and network bandwidth, or software, such as process capacity and thread pools. Visibility into your system architecture is important when monitoring utilization. Each step in a service flow requires system resources, and if those resources are unavailable, your applications could stop running or encounter problems.
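Utilization is usually derived from cumulative counters sampled at two points in time. An example is the busy and idle CPU time that Linux exposes in `/proc/stat`. The sketch below assumes a simplified, hypothetical `CpuSample` with just two counters. Real counters are split into more categories, such as user, system and iowait, and you would first sum those into busy and idle totals.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """Hypothetical cumulative CPU time counters, in seconds."""
    busy_seconds: float
    idle_seconds: float


def utilization(earlier: CpuSample, later: CpuSample) -> float:
    """Fraction of CPU time spent busy between two samples (0.0 to 1.0)."""
    busy = later.busy_seconds - earlier.busy_seconds
    idle = later.idle_seconds - earlier.idle_seconds
    total = busy + idle
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return busy / total


before = CpuSample(busy_seconds=100.0, idle_seconds=300.0)
after = CpuSample(busy_seconds=145.0, idle_seconds=315.0)
print(f"{utilization(before, after):.0%}")  # 75%
```

Working from deltas instead of the raw totals means the result reflects only the interval being observed, not everything since boot.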
Saturation
Saturation is the amount of work a resource cannot service immediately because it is already at capacity, similar to a backlog. Saturation often shows up as queuing or added latency, and it can eventually cause work to error out. Ideally, your system runs at high utilization with low saturation, so new work can still be accepted and processed.
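A small simulation shows why utilization alone is not enough. The sketch below models a resource with a fixed capacity per time step. Any demand above that capacity waits in a backlog, and that backlog is the saturation signal. The `simulate_backlog` function, the discrete time steps and the assumption that queued work is never dropped are simplifications for illustration.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return one (utilization, backlog) tuple per tick for a fixed-capacity resource."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_tick:
        demand = backlog + arrivals
        processed = min(demand, capacity_per_tick)
        backlog = demand - processed  # work left waiting: the saturation signal
        history.append((processed / capacity_per_tick, backlog))
    return history


ticks = simulate_backlog([2, 4, 6, 6, 1], capacity_per_tick=4)
for tick, (util, waiting) in enumerate(ticks, start=1):
    print(f"tick {tick}: utilization={util:.0%} backlog={waiting}")
```

From the second tick on, utilization reads 100% every time, yet the backlog moves from 0 to 4 and back down to 1. Only the saturation signal shows how far behind the resource has fallen.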
Errors
Just as errors can signal issues with your application, they can signal issues with your resources. Does your system have enough resources allocated for the current rate of traffic? Does the server have enough memory to run your service? Is the disk full so that transactions can’t be persisted? Errors are your system letting you know there is a problem that needs to be addressed.
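Resource errors often surface as operating-system error codes. As a sketch, assuming a Python service, you could count `OSError` exceptions by the resource their `errno` points to. The `RESOURCE_BY_ERRNO` mapping and the `record_resource_error` helper are hypothetical, while the `errno` constants come from Python's standard library. Python reports its own allocation failures as `MemoryError`, which is not an `OSError`, so the `ENOMEM` entry only covers failures reported by OS calls.

```python
import errno
from collections import Counter

# Hypothetical mapping from OS error codes to the resource they implicate.
RESOURCE_BY_ERRNO = {
    errno.ENOSPC: "disk",              # No space left on device
    errno.ENOMEM: "memory",            # Cannot allocate memory
    errno.EMFILE: "file_descriptors",  # Too many open files in this process
}


def record_resource_error(counts: Counter, exc: OSError) -> None:
    """Increment the error count for the resource an OSError implicates."""
    counts[RESOURCE_BY_ERRNO.get(exc.errno, "other")] += 1


resource_errors = Counter()
try:
    # Simulate a write failing because the disk is full.
    raise OSError(errno.ENOSPC, "No space left on device")
except OSError as exc:
    record_resource_error(resource_errors, exc)

print(resource_errors)  # Counter({'disk': 1})
```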
Conclusion
Using standardized methodologies as the basis for your monitoring gives teams key performance indicators that can be tracked throughout the development and deployment life cycle to guide innovation and prioritize future development work.
Increasingly complex systems require monitoring that takes into account the application, the infrastructure supporting it and the user experience. Together, RED and USE metrics offer a starting point for monitoring new systems or services and a foundation for establishing end-to-end system visibility.