
Rethinking Observability

Two best practices to better align observability practices with the goal of delivering exceptional user experiences.
Jan 3rd, 2024 10:00am by
Feature image via Pixabay.

Organizations need a way to understand what is happening in highly distributed systems. Today, observability is the approach of choice, and suddenly observability projects are everywhere.

But observability has not delivered on its promise. Many organizations have tried it in environments large and small. In many cases, observability projects produced a considerable amount of data and cognitive overload without bringing any visible change to the reliability of the system.

In addition, implementing observability requires a massive integration effort: Developers have to instrument their code to emit the right traces, metrics and logs to make the system observable. Instrumentation is still very much an art today. Little is known of the most efficient way to instrument code, resulting in many trial-and-error efforts and friction everywhere.

But perhaps more importantly, observability teaches you to focus on operations-centric, myopic metrics rather than thinking about the service like a user: what the user wants to achieve with the service, how they want to achieve it, and so on. These are levels of insight not readily available through low-level metrics.

The result? Reliability engineering teams are overwhelmed with an explosion of data, but still lack the insight or tools to drive meaningful outcomes in system reliability or user experience.

Critical User Journeys

We argue that instead of observability, you need to focus on critical user journeys (CUJs) and on the mechanisms that deliver and preserve them.

A CUJ is a sequence of user interactions vital for the successful operation of a service. It directly affects the user’s satisfaction and engagement with the service. A CUJ can be anything from checking out a shopping cart to retrieving an account balance or submitting a form response.

By focusing on critical user journeys, we can discard useless details about the internal behavior of services. Consequently, we can direct our attention and resources toward what truly matters to the user, for example by moving away from “service_db_be is alerting” to “half of the login CUJ is broken.” A critical benefit of the CUJ approach is that you start to view the service through the lens of the user, a mindset mostly missing from current observability approaches.

Furthermore, magic happens when you combine critical user journeys with service-level objectives (SLOs).

An SLO defines specific, measurable goals that the service aims to achieve. When you apply SLOs to a specific user journey, you have a measure of true user experience as well as a mechanism for predicting and managing that experience.
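To make this concrete, here is a minimal sketch of a journey-level measurement. It assumes a journey attempt counts as successful only if every step in it succeeded. The names `JourneyAttempt`, `journey_sli` and `meets_slo` are hypothetical and exist only for illustration. They do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class JourneyAttempt:
    """One user's attempt at a journey (hypothetical record shape)."""
    journey: str                  # e.g. "checkout" or "login"
    steps_ok: Tuple[bool, ...]    # outcome of each step, in order

    @property
    def succeeded(self) -> bool:
        # Assumption: the journey succeeds only if every step succeeded.
        return all(self.steps_ok)


def journey_sli(attempts: Iterable[JourneyAttempt], journey: str) -> Optional[float]:
    """Fraction of attempts at `journey` that succeeded, or None if there were none."""
    relevant = [a for a in attempts if a.journey == journey]
    if not relevant:
        return None
    return sum(a.succeeded for a in relevant) / len(relevant)


def meets_slo(sli: Optional[float], target: float) -> bool:
    """True when the measured success ratio is at or above the SLO target."""
    return sli is not None and sli >= target
```

Measured this way, a single failing backend call shows up as a drop in the success ratio of the journeys that depend on it, which is the number the user actually experiences.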

Monitoring a critical user journey against a defined service-level objective can deliver proactive signals that reliability thresholds are in danger of being violated. For example, a Taylor Swift ticket overload incident can happen if you have no way to maintain separate service-level objectives for different user journeys. Under extreme bot activity and high demand from real human users, your ticket-serving services can melt under pressure if you cannot divert system resources away from nonessential traffic to preserve the service-level objectives of ticket purchasing journeys.
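One common way to turn a journey SLO into a proactive signal is a burn rate: the observed failure rate divided by the failure rate the SLO allows. The sketch below is our own illustration, not a method prescribed here. The function names and the warning threshold of 2.0 are assumptions chosen for the example.

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed failure rate divided by the failure rate the SLO allows.

    A value of 1.0 means failures arrive exactly as fast as the SLO permits.
    Values above 1.0 mean the journey is on course to violate its SLO.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    observed_error_rate = failed / total
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate


def should_warn(failed: int, total: int, slo_target: float,
                threshold: float = 2.0) -> bool:
    """Warn before the SLO is breached (threshold is an assumed example value)."""
    return burn_rate(failed, total, slo_target) >= threshold
```

With a 99% target for the ticket purchasing journey, 5 failures in 100 attempts burns the allowance about five times faster than the SLO permits, which is a signal to act before the objective is actually missed.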

Journey-Specific SLOs

Like tracing, CUJ monitoring observes data across services, but it additionally aggregates signals across transactions to identify patterns and trends that traditional tools might miss. By looking at critical user journeys as a whole rather than at system performance in isolation, operations teams and business decision-makers can see where and when to apply effort to build more robust and reliable user experiences.

In practice, one way to combine CUJs with SLOs is smart traffic management. In sophisticated environments, operations teams increasingly use traffic shaping as a strategic tool to deliver desired business outcomes and to improve both overall user experience and service reliability.

More specifically, traffic shaping allows you to:

  • Prioritize critical user journeys and maintain SLOs: Traffic shaping allows you to redirect network and system resources to focus on critical user journeys. By guarding the paths that are most important for user satisfaction and business outcomes, you can manage traffic so that critical user journeys receive the bandwidth and speed they require. During peak load times, traffic shaping can deprioritize less critical traffic and apply graceful degradation to critical user experiences, ensuring that the performance of high-priority journeys remains within SLO thresholds.
  • Enhance user experience: Predictive and adaptive traffic shaping can significantly improve the end-user experience. Advanced traffic shaping tools can predictively adjust call patterns based on user behavior, time of day or other factors. This proactive approach, in contrast to reactive traffic shaping, helps maintain user journey SLOs consistently and deliver a seamless and engaging user experience.
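The first point can be sketched as priority-based load shedding, which is one simple form of traffic shaping. This is a minimal illustration under stated assumptions, not a production design. The journey names, the priority table and the `admit` function are hypothetical, and capacity is simplified to a fixed number of requests per scheduling window.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Request:
    request_id: str
    journey: str  # the user journey this request belongs to


# Hypothetical priorities: a lower number means a more critical journey.
JOURNEY_PRIORITY: Dict[str, int] = {
    "ticket_purchase": 0,
    "browse_events": 1,
    "recommendations": 2,
}
DEFAULT_PRIORITY = 3  # journeys not in the table are treated as least critical


def admit(requests: Sequence[Request], capacity: int) -> Tuple[List[Request], List[Request]]:
    """Split requests into (admitted, shed) for one scheduling window.

    Requests are ranked by journey priority. Python's sort is stable, so
    arrival order is preserved within the same priority. Only the first
    `capacity` requests are admitted; the rest are shed.
    """
    capacity = max(capacity, 0)
    ranked = sorted(
        requests,
        key=lambda r: JOURNEY_PRIORITY.get(r.journey, DEFAULT_PRIORITY),
    )
    return ranked[:capacity], ranked[capacity:]
```

When capacity is scarce, ticket purchases are admitted first and recommendation traffic is shed first. A real system would typically degrade or delay the shed requests rather than drop them outright.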

An astute reader will note that all this is just a different way of delivering observability. But we feel strongly that the future of observability lies in offering a more comprehensive and accurate measure of user experiences. CUJs and journey-specific SLOs represent a significant stride in moving beyond the confines of system-centric metrics and toward a more user-centric approach.

By embracing concepts like critical user journeys and journey-specific SLOs, we can better align observability practices with the ultimate goal of delivering exceptional user experiences. This is not just about keeping pace with technological advances; this is about rethinking how we measure reliability from a user-centric perspective.
