Shifting Left Observability in Practice — An Overview
As an industry, we’ve been arguing for years about what should be the guiding light in the development process. We started off with long design documents and waterfall methodologies, sprinted through agile, considered TDD (test-driven development) and BDD (behavior-driven development) in some cases and generally ended up all over the place.
We believe observability can be a great lighthouse, and we should make it an integral part of the SDLC (software development life cycle). When you treat it as a crucial operational process across dev, staging, QA, the full CI/CD cycle and production (including progressive rollouts), it creates a world in which developers are truly connected to their live applications. That way, the outputs of the system are fed directly to the people writing the inputs of the system, closing the loop on an ever-expanding SDLC.
To put it another way: Observability used to be about slashing the mean time to resolve (MTTR). Today, by inserting it into the development process, we believe it can become a power tool for improving time to market (TTM) and, more generally, the productivity of your developers.
But, buzzwords aside, let’s talk practicalities:
The first problem is not about shifting observability anywhere, but about including it in the development life cycle at all.
Currently, most software development processes revolve around iterations over local information: You write something, you compile it, you check that it works, you push to CI to test it, you commit it when tests pass.
Nowhere in this cycle is the developer observing the real information streaming in from production to better understand how what they wrote behaves and adapt the code on the fly according to this behavior.
In practice, this means extending observability further to the left. It should not be relegated to production maintenance and troubleshooting; it should span the entire development life cycle. This is what shift left observability is all about.
By making observability part of the day-to-day developer life cycle, and specifically of pre-prod environments, we'll shift it from being an MTTR crusher to a more holistic practice, one that improves developer productivity, time to market and the quality of the eventual product by finding issues earlier in the SDLC.
Introducing a New, Developer-Native Observability
We can shift left observability by creating a process that allows developers to connect to their live applications from their existing tools without forcing them to change their existing habits to apply it.
This new process, in our view, should be:
- Real time — Any delay in getting answers back from the live application will cost a developer precious time and cause inevitable context switches. Like Snyk’s IDE integration, which alerts you to the existence of various security vulnerabilities in your application as you code, real-time observability should allow you to ask questions and get answers in real time.
- Ops free — Operational tools and processes are abstracted away from developers in most enterprise settings. A developer who needs to understand the state of the application should be able to just query for the information in the dashboarding system, without needing the intervention of ops or having to understand in depth the underlying persistence layer. In other words, extracting information from live applications should work without any networking tricks or configuration prerequisites.
- Developer native — Complex dashboards often offer a breadth of information in a location far away from the consumer. If a developer working on a specific piece of logic wants to see how it behaves in practice, they should be able to get that information next to the lines of code they're working on. Looking at the flow of a user through the application's code should happen right next to the conditionals that dictate that route. Context switches are the enemy of productivity, and in a world where developer time and attention are the organization's most valuable resources, it's important to save as much of them as possible. In other words, we should not need to introduce developers to the deep nuances of our operational processes; instead, we should bring granular data and conclusions from these operational processes into the developers' natural workflow.
- Cost-effective — A recession is already in full swing, and costs add up. Logging costs in particular are incurred on more than one front: ingesting logs costs money, storing them costs money and, of course, analyzing them costs money. We need to be able to log exactly what we need, when we need it, instead of relying on static, hardcoded logs and metrics written during development. We should shift focus to adaptive, dynamic, context- and state-based observability that works for the developer instead of making the developer work for it. That saves the cost of ingesting, storing and analyzing endless bales of hay only to find a few needles.
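To make the "log exactly what we need, when we need it" idea more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular product works: the `DynamicLogPoints` class, its `enable`, `disable` and `hit` methods, and the `apply_discount` function are all hypothetical names invented for this example. Developer-native tools typically attach such log points to a running process without touching the code; this sketch approximates the effect with an explicit call that stays silent until someone switches it on.

```python
import logging
from typing import Any, Callable, Dict

Condition = Callable[[Dict[str, Any]], bool]


class DynamicLogPoints:
    """Hypothetical registry of log points that are toggled at runtime."""

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger
        self._conditions: Dict[str, Condition] = {}

    def enable(self, name: str, condition: Condition = lambda state: True) -> None:
        # Turn a log point on, optionally only for matching application state.
        self._conditions[name] = condition

    def disable(self, name: str) -> None:
        # Turn a log point off again; unknown names are ignored.
        self._conditions.pop(name, None)

    def hit(self, name: str, **state: Any) -> bool:
        # Emit a log line only if the point is enabled and its condition holds.
        condition = self._conditions.get(name)
        if condition is None or not condition(state):
            return False
        self._logger.info("%s %s", name, state)
        return True


logging.basicConfig(level=logging.INFO)
points = DynamicLogPoints(logging.getLogger("checkout"))


def apply_discount(user_id: str, total: float) -> float:
    discounted = total * 0.9
    # Costs nothing in log volume until the point is enabled.
    points.hit("apply_discount", user_id=user_id, total=total, discounted=discounted)
    return discounted


apply_discount("u1", 50.0)   # point disabled: nothing is logged
points.enable("apply_discount", lambda s: s["total"] > 100)
apply_discount("u2", 150.0)  # enabled and condition holds: one line is logged
apply_discount("u3", 20.0)   # enabled but condition fails: nothing is logged
points.disable("apply_discount")
```

The design choice worth noting is the condition function: because it sees the captured state, only the interesting invocations are ingested and stored, which is where the cost saving comes from.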
The Day After — Observability-Driven Development
When observability becomes a habit, the activities developers do every day take a drastic turn. The following is a brief (and partial) list of some of the changes a normal development team might encounter after adopting the observability mindset:
- Writing code is an act of asking questions and getting answers, rather than hypothesizing about the state of the system in production. If you’re writing an API endpoint and are not sure about the possible edge cases, just check how real users interact with that endpoint to get a sense of possible pitfalls.
- Testing is also a breeze. You can just copy real-life scenarios and make tests out of them. Look at what your users are doing in production and use the interactions as test cases.
- Debugging is no longer about reaching in the dark or filtering through endless logs. If you want to add a new log or capture the state of the application in production, you can do so right from your IDE.
- Performance is understood by adding metrics and consuming them immediately. If you want to measure how much time a certain section of your code takes to run, just add a metric.
- Security is about identifying and assessing the breadth of the vulnerabilities in front of you. A major part of this is the ability to understand, in real time, whether a specific vulnerability is indeed part of the execution path of the application. This information can help to prioritize remediation efforts and also can help to distinguish between signal and noise.
- Incident management becomes faster and easier: ops people spend time on ops problems with ops-oriented tools, and developers spend time on dev problems with developer-oriented tools. Mean time to resolve is reduced significantly.
When we shift observability left and incorporate it earlier in the pipeline as part of the development cycle, we optimize not only MTTR, which is normally considered an operational metric, but also time to market, developer productivity as a whole and the quality of the software we deliver, all of which are clearly software development metrics.
In practice, deciding to incorporate real-time, ops-free and developer-native processes for observability will allow your development team to enjoy the benefits of observability without succumbing to endless context switches and without changing their existing habits too drastically.