The Road to Service Ownership Nirvana
Customer experiences, good or bad, can make or break a brand. And increasingly, these experiences are happening online. That’s why CIOs are doubling down on digital transformation, not only to become more efficient but to build closer bonds with their customers.
The challenge is that as more companies invest in multi- and hybrid-cloud environments and digital infrastructure, the complexity of backend tech environments increases, leading to rising incident levels. This transformation also means that teams begin to fragment into separate lines of business, each with its own toolchains and workflows. As a result, when an incident occurs, it can be challenging to manage the response in a centralized way. Siloed teams, as well as manual tools and processes, combine to slow response times and worsen the customer experience.
Full-service ownership is not a panacea. But by encouraging developers and engineers to “own their code,” organizations can reduce handoffs and uncertainty, and ultimately reduce mean time to resolution (MTTR).
Why Full-Service Ownership?
Unfortunately, outages and service slowdowns are inevitable given the complexity of today’s digital infrastructure. But they needn’t imperil corporate reputation and profits. Fast, efficient incident response is the name of the game. It starts with visibility and triaging, and demands intelligent routing to the right teams, with sufficient context for them to hit the ground running. Throughout, focused orchestration is required to accelerate the response and contain any fallout.
Full-service ownership can help to support these goals by ensuring the people closest to a service are responsible for it throughout its life cycle. This empowers developers and engineers to take responsibility for their code in production, rather than hand it over to teams less knowledgeable about the service.
This “you code it, you own it” mindset connects developers more closely with the purpose and impact of their work, encouraging them to build even better products. Clear ownership means fewer people need to be drafted in to troubleshoot, and because the service owner acts as first responder, issues get fixed by the best person for the job each time. This can help to drive down MTTR.
Nine Steps to Activate Service Ownership
There may be initial resistance, although such concerns often stem from fear of change. These nine steps can help organizations push ahead with full-service ownership:
1. Be Agile
Agility is critical in all parts of the business. But at the start of service ownership projects, work can be unpredictable, so greater flexibility helps teams avoid being distracted from longer-term goals. Agile workflows can help identify what’s working and what is potentially blocking progress. They’re a common part of digital transformation initiatives, so adoption should not require a huge leap.
2. Start Small
Executive buy-in is key for any project, and the journey to full-service ownership is no different. Organizations should start small in order to demonstrate value without ramping up business risk. Capture a baseline of performance for non-critical production systems and give teams the time they need to master the details. As they gradually progress, business benefits should start becoming apparent. One of the first wins should be faster incident acknowledgment and resolution.
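As a rough illustration of capturing that baseline, the sketch below computes mean time to acknowledge and mean time to resolve from a handful of incident records. The timestamps, the record layout and the function names are hypothetical, invented for this example; in practice the data would come from whatever incident management tool the team already uses.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (triggered, acknowledged, resolved) timestamps.
INCIDENTS = [
    ("2024-01-05T10:00", "2024-01-05T10:04", "2024-01-05T10:40"),
    ("2024-01-12T22:15", "2024-01-12T22:27", "2024-01-12T23:45"),
    ("2024-01-20T03:30", "2024-01-20T03:38", "2024-01-20T03:50"),
]


def minutes_between(start, end):
    """Return the elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def baseline(incidents):
    """Return mean time to acknowledge and mean time to resolve, in minutes."""
    mtta = mean(minutes_between(t, a) for t, a, _ in incidents)
    mttr = mean(minutes_between(t, r) for t, _, r in incidents)
    return {"mtta_minutes": mtta, "mttr_minutes": mttr}


print(baseline(INCIDENTS))
```

Recomputing the same figures after a few months of service ownership gives a like-for-like comparison to share with executive sponsors.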
3. Be Realistic
The ideal approach would be to start from scratch with greenfield projects unbound by technical and organizational debt. But if your goal is to prove organizational value, it may better serve your long-term transformation initiative to select an achievable brownfield project. This enables teams to demonstrate more convincingly how much value the model could generate for the business. This is especially true in organizations where the corporate culture can be resistant to change.
4. Create a Culture of Empowerment
It’s critical that developers and engineers tasked with owning their code are not punished or belittled for mistakes. Human error is inevitable, especially when an organization is transitioning to a new operational model and adapting it to its own needs. That’s why those at the heart of full-service ownership should be empowered to experiment without fear of retribution.
5. Practice as Often as Possible
Practice is key to building up confidence, but only as long as it happens in low-risk settings. Developer and engineer teams new to owning production services could kick things off with incident simulations. These would help them get used to managing incidents without exposing the business to risk. Chaos engineering, which deliberately introduces failures to test how well systems withstand unexpected disruptions, can be a useful way to build resilient workflows.
6. Ensure Teams Are the Right Size
Smaller teams arguably get things done faster. That’s the logic behind Jeff Bezos’ now-famous rule that a team should be no larger than two pizzas can feed. Keeping teams small should make collaboration more effective, with less time spent providing status updates and more on getting work done. Team members will also have more time to devote to continuous improvement.
7. Clearly Define Services
Services should be defined in enough detail so first responders can quickly identify the source of any issues. During this process, if microservices behave in a similar manner and fixing a problem on one means fixing it on another, it might make sense to combine them. Documenting dependencies will also help to define roles and responsibilities among team members, so incidents are always referred to the right person.
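One way to picture documented dependencies is a small service catalog that records each service’s owner and what it depends on. The sketch below is illustrative only: the service names, team names and the dependents_of helper are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    owner: str
    depends_on: list = field(default_factory=list)


# Hypothetical catalog; every name and owner here is made up for illustration.
CATALOG = {
    s.name: s
    for s in [
        Service("web-frontend", "storefront-team", ["cart-api", "search-api"]),
        Service("cart-api", "payments-team", ["orders-db"]),
        Service("search-api", "discovery-team"),
        Service("orders-db", "data-platform-team"),
    ]
}


def dependents_of(name, catalog=CATALOG):
    """Return names of services that depend on `name`, directly or transitively."""
    found = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for svc in catalog.values():
            if current in svc.depends_on and svc.name not in found:
                found.add(svc.name)
                frontier.append(svc.name)
    return sorted(found)


# Which services, and therefore which owners, are affected if orders-db fails?
for svc_name in dependents_of("orders-db"):
    print(svc_name, "->", CATALOG[svc_name].owner)
```

With a catalog like this, a first responder can see at a glance which downstream owners may need to be informed when a shared dependency degrades.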
8. Manage Your Monoliths
A complex monolithic technology stack tends to be the root cause of many incidents, which makes it crucial for organizations to devise a strategy for managing on-call responsibilities. What usually happens is that a team responsible for handling a monolith doesn’t own any other services unless the workload for that monolith is quite light. However, if multiple teams share responsibility for the monolith, it may be worth identifying the distinct areas of functionality within it. Then, route the relevant alerts to teams with the right context. Each logical area of functionality can be considered its own service. Runbooks, wikis and on-call ownership lists should then be updated to reflect the new ownership.
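The routing idea can be sketched in a few lines, assuming each alert from the monolith carries a tag naming the functional area it came from. The "component" tag, the team names and the fallback team below are assumptions made for this example rather than features of any specific alerting product.

```python
# Hypothetical mapping from a functional area of the monolith to its on-call team.
OWNERS = {
    "checkout": "payments-team",
    "search": "discovery-team",
    "accounts": "identity-team",
}

# Hypothetical team that handles alerts with no recognized functional area.
FALLBACK_TEAM = "monolith-core-team"


def route_alert(alert):
    """Pick the team to page based on the alert's 'component' tag."""
    return OWNERS.get(alert.get("component"), FALLBACK_TEAM)


print(route_alert({"component": "checkout", "summary": "Payment errors rising"}))
print(route_alert({"summary": "High memory usage on app server"}))
```

The fallback matters: alerts that cannot be attributed to a functional area still need an owner, or they risk going unanswered.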
9. Write It, Then Document It
Full-service ownership doesn’t just mean writing code and being responsible for managing any incidents traced back to it. Every engineer who contributes to the code base should also take responsibility for service documentation. This should explain what the code does and define a contract that other services can rely on if they need to interact with it.
Shifting the Culture
Full-service ownership isn’t as simple as getting engineers to take responsibility for their code. Often both a cultural shift and executive buy-in are necessary. This will take time. But by starting small and nurturing a blame-free culture for developers, success could be just around the corner.