Cloud Native Day 2 Operations: Why This Begins on Day 0
Day 2 operations (Day 2 Ops) are important for cloud native developers to understand, regardless of what level of lifecycle responsibility they take on in their organization.
Cloud native experts tend to agree that understanding every aspect of running an app in production is, if not critical, extremely valuable for developers and the site reliability engineers and DevOps teams they work with.
We talk a lot about the “shift left” and “full life cycle” developer responsibility as part of the new developer experience. Day 2 comes after Day 0, which consists of spinning up needed resources, and Day 1, which focuses on building out the design, the infrastructure and continuous integration/continuous delivery around that. Day 2 Ops extend to everything that happens post-deployment.
Day 2 Ops encompass all the activities that must take place once a product is shipped: observing and monitoring the software, analyzing what is observed, and then troubleshooting, maintaining and optimizing accordingly. In traditional software development, each step of the software release was separate. Development in the cloud has made these efforts continuous, and Day 2 activities are no longer assigned exclusively to nondevelopment departments.
When we spoke to cloud leaders during an Ambassador panel discussion, these ideas about cloud native Day 2 Ops were echoed. Lunar’s Kasper Nissen explained, “Day 2 is about the shift left. Giving developers ownership to run things in production. For me as a platform engineer and architect, it is about creating abstractions and making it easy for our developers to run software. Give them what they need to monitor, log, trace and all the things that are required for managing a service in production. We try to build a platform that supports developers in this journey.”
Design for Day 2 Ops as Early as Days 0 and 1
In the same panel discussion, the panelists agreed that Day 2 operations, such as security, reliability and observability, should be considered and included early in the design process.
Likewise, it’s a balance among the developer responsibilities, the developer experience and how architects can create easy and convenient end-user experiences for these developers. Nissen explained, “A new engineer at Lunar would be told that these aspects of running software need to be baked into the code they create.”
Damian Márquez, senior solutions architect at Kubermatic and another panelist at the Ambassador event, echoed these ideas: “When designing the architecture to be easy for engineers and developers to work, you need to understand what they need to be seeing. What visibility is important for them. It’s a symbiotic learning process between engineers and architects. The earlier in the planning and design we start with that process, the more education everybody gets, especially for the developer experience. They need to understand the full life cycle of the application, and that gives us the ability to shift the paradigm in the order of things we are building.”
Adding to the consideration for the longer-term outcome of what is put into the code, another Ambassador podcast guest, cloud luminary Kelsey Hightower, discussed the importance of focusing on this responsibility. Hightower described developer responsibility as being accountable for, and able to justify, the “ingredients they add” to the code. He stressed the importance of being very clear about what you can do at every step, even at an informal level, before you adopt a software bill of materials (SBOM) or SLSA: “If you’re the developer, you will have some responsibility for the ‘ingredients’ you add to the mix. You will be asked to understand and answer for some of the choices you make. And beyond the developer, everyone needs to be aware of their responsibility in that pipeline.”
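As a rough illustration of what an informal “ingredients” list might look like before any SBOM tooling is adopted, the sketch below uses Python’s standard importlib.metadata module to list the packages installed in the current environment. The function name list_ingredients is hypothetical, and the output is not an SBOM or a substitute for one; it is only a starting point for knowing what is in the mix.

```python
from importlib import metadata


def list_ingredients():
    """Return sorted (name, version) pairs for installed distributions.

    Hypothetical helper: an informal inventory, not a formal SBOM.
    """
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.version
        # Skip entries whose metadata is missing or broken.
        if name and version:
            found.add((name, version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in list_ingredients():
        print(f"{name}=={version}")
```

Checking a list like this into review, even by hand, gives a developer something concrete to answer for when asked why a dependency is there.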
Build a Developer Platform to Ease the Day 2 Developer Journey
Making Day 2 operations more accessible for developers involves, as Nissen stated, abstracting the right things and empowering developers with defaults and automation when they create a service, which will let them follow it through its life cycle. From an architecture standpoint, working with engineers in this capacity has been eased by introducing a developer platform where developers can get everything they need, from visibility to observability to performance metrics, tailored for the developer experience.
Whether the developer is new and using the platform as an onboarding tool or is an experienced developer looking to get a service up and running and into production in five steps with the click of a button, the developer platform caters to easing the developer journey.
Nissen added, “From a developer perspective, the platform makes it easy to get started and get all the benefits of a dashboard and default metrics that make sense, libraries that are instrumented, automatically. Platform architects should provide developers with best practices and defaults they can work with.”
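To make the idea of “libraries that are instrumented, automatically” more concrete, here is a minimal sketch of a default a platform team might ship. Everything in it is an assumption for illustration: the instrumented decorator, the in-memory METRICS store (a stand-in for a real metrics backend) and the get_balance function are hypothetical and are not taken from Lunar’s platform.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-in for a real metrics backend (assumption for this sketch).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call count, error count and cumulative latency for func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # metrics are recorded, but the error is not swallowed
        finally:
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def get_balance(account_id: str) -> int:
    # Hypothetical service function; the developer writes only business logic.
    if not account_id:
        raise ValueError("account_id is required")
    return 100


get_balance("acc-1")
try:
    get_balance("")
except ValueError:
    pass

print(dict(METRICS["get_balance"]))
```

The point of the design is that the developer adds one line and gets sensible default metrics, while the platform team remains free to change what is collected behind the decorator.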
At Lunar, Nissen explained, they have built their internal developer platform using Backstage, the open source project created at Spotify and now hosted by the CNCF.
Summary: Developer Responsibility on Day 2
If the cloud native promise of shipping software faster is to materialize fully, the developer experience needs to be loosely shaped to reduce friction and enable clear visibility into code, its dependencies, source control, service ownership and everything that is required to code, ship and run software.
In traditional software development, the post-deployment life of the software has not been part of the developer’s responsibility and, even now, is not always in the developer’s hands. The promise of cloud native is to give developers the ability to code and ship faster — and to do that effectively, they need to consider the implications and dependencies of their code on “run” processes, regardless of how hands-on they are in the running of apps.
Márquez offered a piece of parting advice: “Code defensively, degrade gracefully.” In today’s rapidly shifting cloud native environment, coding defensively is one way to help Day 2 operations run smoothly. And if an incident does arise, the developer who has taken on full responsibility for the ingredients of the code has a clearer understanding of the issue and a faster path to fixing it.
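One common reading of “degrade gracefully” is that a service should return a reduced but usable result when a dependency fails, and leave a trace for whoever is on call. The sketch below shows that pattern under stated assumptions: with_fallback and fetch_recommendations are hypothetical names, and the list used as a log stands in for real logging or alerting.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: T, log: List[str]) -> T:
    """Return primary() if it succeeds; otherwise record the failure and return fallback."""
    try:
        return primary()
    except Exception as exc:
        # Leave a trace so the degradation is visible to operators.
        log.append(f"degraded: {exc!r}")
        return fallback


def fetch_recommendations() -> List[str]:
    # Hypothetical downstream call that is currently failing.
    raise TimeoutError("recommendation service timed out")


events: List[str] = []
items = with_fallback(fetch_recommendations, fallback=["bestsellers"], log=events)
print(items)
print(events)
```

Here the caller still gets a generic list instead of an error page, and the recorded event tells operators that the response was degraded and why.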