
Readiness to Learn: Safely and Reliably Deploy to the Cloud

28 Sep 2021 8:00am, by

Laura Maguire
Laura is a researcher at Jeli.io. Her work studies how software engineers can optimize performance through evidence-based practice for incident management, learning from incidents and organizational change management. She holds a master’s degree in human factors and systems safety from Lund University and a Ph.D. in integrated systems engineering from Ohio State University.

“Cloud native technologies … and techniques enable loosely coupled systems that are resilient, manageable and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”
Cloud Native Computing Foundation

Production readiness and resiliency guidance in cloud native environments often recommends several categories of preparation to ensure that an application is ready for “prime time” use. These typically include operational, security, deployment, resiliency, disaster recovery, testing and governance, and risk and compliance considerations. Most of the guidance emphasizes the readiness of the technological aspects of modern reliability. But even well-prepared companies often overlook a critical element of their capacity to deploy safely and reliably, one that is essential for business performance: readiness to learn.

The Changing Nature of Knowledge

The pace of modern work has driven many fields to recognize the need for continuous learning to keep up with changing technologies, practices and theories. Companies that are slow to develop the capacity to adjust to changing demands find themselves falling behind in competitive markets. As Josh Bersin notes, “the learning curve is the earning curve.” But there are several core challenges surrounding learning in continuously changing environments:

  1. Knowing what knowledge is going to be needed, to what depth and at what points in time.
  2. Building the internal capacity to develop and train for that knowledge.
  3. Identifying and correcting any misunderstandings.
  4. Fine-tuning partial understandings.
  5. Deepening the understanding.

Further complicating these challenges, many organizations do not have a dedicated learning or development team to help manage knowledge development, and the teams that do exist are often not technical enough to generate content that is deep and specific enough to address the knowledge gaps most relevant to organizational performance. Therefore, rapid knowledge updating must be emergent and come from those closest to the technical work to ensure currency and technical depth.

The Importance of Current Knowledge in Software Engineering

For the software industry, continuous integration/continuous deployment practices further amplify the need for individual contributors to be able to rapidly update their knowledge to help sustain ongoing organizational learning. In the highly dynamic CI/CD environment, engineers with stale or outdated knowledge of the system are less able to detect, diagnose or repair anomalous behavior in their systems, and less likely to quickly identify opportunities for innovation. This can lead to outages that are longer or more impactful to customers, and to an opportunity cost in pushing new features. Because of this, both employees and employers in IT increasingly expect continuous learning to be a core function of day-to-day activities.

However, while an organization can recognize the importance of learning, finding the time to learn in between code deployments can be a challenge. Here we discuss three strategies for leveling up learning. Each strategy is focused on getting a greater return on investment from the practices you may already have in place.

Continuous Learning Opportunities

Just as modern software engineering practices do not take the system offline to push new code, modern professional development in IT typically takes place without taking software engineers “offline,” or with as little time away from the codebase as possible. To level up your continuous learning practices, look for and exploit micro-learning opportunities for your engineers to update their understanding of the system.

Encourage pair programming for both new feature development and ongoing operations or maintenance. This allows engineers to surface assumptions about how different aspects of the system function together and what the resulting risks are. A more structured way of facilitating these exchanges is to have one engineer create the change request and another deploy it, which generates explicit questions. These conversations help update each person’s mental models through a fluid exchange about both the programming activity itself and the background knowledge needed to execute it. Using evidence-based mentorship practices can help improve the learning outcomes for both participants.

Adding background context to project briefings or sprint updates can be a high-value micro-learning opportunity. Best practice for agile or DevOps-influenced team structures dictates that the members of a project team represent diverse collections of skills, knowledge and experience. This diversity also means that there will be high variability of knowledge about any one aspect of the project. Research has shown that even within an engineering team, there are key differences in understanding about how the system works. Therefore, it’s likely that in any given meeting, there are varying levels of knowledge and assumptions about the technology involved. So cultivating the practice of adding a short background context section to standups or meetings can provide an efficient, ongoing way to improve knowledge transfer. Rotating the responsibility for giving the background among different roles on the project team can help vary the content and depth of these micro-learning opportunities to better represent the differing levels of the team members. Establishing these background sessions as a time to recalibrate implicit, and often hidden, assumptions by sharing knowledge can encourage questions and clarifications, maximizing the learning experience.

Investing in incident reviews is perhaps the highest-value micro-learning opportunity, since the training content and trainers are directly relevant to day-to-day work. Plus, incidents can draw a lot of organizational attention, so there is already interest and motivation to understand the event topics themselves. Well-run incident reviews directly address the five challenges because they are immediate, relevant learning opportunities based on your team’s actual actions and the decisions made in real time. In fact, incident reviews often aren’t about the incident itself, but rather provide clear and direct pointers to how the organization itself functions. In this way, incident reviews provide the highest return on investment in learning that a modern software organization can make. Additionally, an incident review can couple pairing and background with the event itself, further amplifying the benefit. How so? Every incident review should include some background context (how the software involved was first introduced, how its use evolved, etc.) to understand the current state, and should also encourage participants to shift perspective and take the view of the engineers involved.

The Benefits of Organizational Readiness to Learn

Many of the strategies discussed so far focus on learning at the individual and project team level. However, hidden interdependencies between roles and departments across the organization mean the organization as a whole can benefit from greater shared knowledge transfer.

A common response to this advice is to point out how time-consuming it can be to attend incident reviews, particularly if one thinks they might not learn anything new. However, the power of harnessing continuous learning lies in an organization being able to strengthen its ability to see relationships, think proactively and globally across the organization, and act accordingly. In this way, valuable learning is everywhere — it just might not be apparent until it is needed. To make the best use of these cross-functional interactions, start by thinking critically about where gaps in knowledge and understanding can affect effective incident response and hinder innovation.

Research on effective coordination across large-scale distributed software organizations has shown that high-performing teams have up-to-date and layered knowledge about not only the nature of the work conducted by other teams, but also what those teams are working on, what challenges they might be acutely or chronically facing and what their skills and abilities are. Progress on the first of the five challenges (knowing what knowledge is needed, to what depth and at what points in time) can be achieved by increasing learning-focused interactions among roles and levels of the organization.

The single most efficient and effective way to do this? Encourage cross-functional participation in incident reviews, where other roles can listen in on the inner workings of how the incident was handled, ask clarifying questions and add context about how that event affected their work to increase understanding of the dependencies between functions. The relationships established between different aspects of the business, the shared basis of knowledge and the trust developed from learning together — these all have benefits that extend well beyond the kinds of incidents discussed. Take an organization-wide approach to readiness to learn.

Summary

Cloud native companies that are well-positioned to meet future market demands know that continuous learning is a competitive advantage and work to mitigate risks associated with unplanned outages. While the five challenges of learning in continuously changing environments can be difficult to overcome, strategies to cultivate continuous learning can help. By encouraging evidence-based mentorship through pair programming, continually sharing important background context and conducting learning-focused incident reviews, software companies can level up their learning. Further benefit is realized when learnings are distributed across the organization, particularly through incident reviews, to build up greater shared knowledge about how the system works under real-world conditions.

To learn more about Kubernetes and other cloud native technologies, consider coming to KubeCon+CloudNativeCon North America 2021 on Oct. 11-15.

