Observability Is Shifting Left, Following Security and Ops
A major trend sweeping our industry is to make common practices more “developer-friendly,” mostly by trying to meet developers where they are and integrating those practices into existing developer tools.
This trend is often referred to as “shifting left”: “pushing” steps from the right side of the software development life cycle (SDLC) to the left side.
In the past few years we’ve seen at least a couple of players take deeply ingrained industry best practices and shift them to the left:
- Shifting Left Security — In the world of application security, Snyk has drastically changed the way the industry approaches supply chain security by continuously analyzing dependencies, detecting vulnerabilities in third-party libraries and offering automatic remediations. By integrating security checks earlier in the process — and involving developers in that process inside the tools they already work with (notably Snyk Open Source and Snyk Code) — developers are empowered to take part in the security process instead of just bumping dependencies in response to CISO email blasts.
- Shifting Left Infrastructure — In the world of application infrastructure, HashiCorp has revolutionized the way DevOps engineers manage their resources by offering declarative, codified infrastructure definitions at scale (most notably with Terraform), allowing for easier planning, versioning and application of cloud infrastructure deployments.
Since it’s evident that giving developers more power in various parts of the SDLC yields tremendous benefits for both developers and the companies they work for, it’s about time we take a look at why shifting left observability makes sense:
- With DevOps becoming a major part of engineering culture, developers are expected to own reliability. There is, however, a major gap in the information available from running systems: today’s observability tools mostly focus on static data generated from logs and metrics that developers instrumented back at development time. In practice, many of the application problems that arise in production are not trivial, and require digging through endless dashboards to connect the dots. There is currently no easy way to query a running system for new information — no way that doesn’t depend on the logging practices of the engineering org, or on a random developer’s passion for good logs. Given this lack of information, it’s difficult for developers to own reliability. How can you expect someone to own reliability without being able to actively ask questions and get immediate answers from the very applications whose reliability they’re entrusted with?
- With applications going fully cloud native and production environments becoming increasingly distributed and complex, the gap between dev and prod widens significantly. Predicting issues ahead of time is exponentially more difficult than it was in the good ol’ days, and reproducing those issues locally is significantly harder due to the sheer number of pieces involved. It’s also why you see more and more teams turning toward a once-frowned-upon methodology — testing in production — to understand how their applications actually behave when something goes wrong, since it’s very difficult to predict all the unknown unknowns in advance.
- Existing observability (o11y) tools focus mainly on operators, not developers, and with only a fraction of the approximately 27 million developers in the world today practicing observability day to day, that’s becoming a real issue. For a seasoned operator, looking at the configuration of a load balancer or inspecting spikes in CPU usage is second nature — the patterns “pop” off the screen in the same way that a badly designed if statement or a poorly written data structure might alert a veteran developer. But the dashboards and YAML files that might indicate what is wrong with a specific deployment are often foreign to developers, leaving them in the dark when trying to resolve a production incident that involves their own code. There’s a large chasm between the code they know and what they understand about it in its current, deployed state.
In the next article, we’re going to cover how to shift left observability in practice, and specifically how to apply it across the entire SDLC.
Read the third article in this series.