DevOps Institute Checks the Pulse of SRE
The DevOps Institute this week released its first-ever report on the state of Site Reliability Engineering (SRE) — the Global SRE Pulse 2022 Report.
The push toward digital services and digital transformation has driven IT organizations to adopt new operating models such as SRE, which Google defines as “an engineering discipline devoted to helping an organization sustainably achieve the ‘appropriate level of reliability’ in their systems, services and products.”
Indeed, SRE has become a must-have engineering practice for enterprises trying to accelerate digital transformations, the report said.
“As enterprises are implementing SRE in their respective teams by developing and adjusting the best practices introduced by Google, the operating model continuously gains attention from decision makers within IT and the business,” the report said.
The SRE Pulse provides insights into the state, practices, health activities, automation and adoption of SRE around the world. The report includes input from more than 460 SRE leaders and practitioners from companies of all sizes.
SRE has begun to take root in many organizations and its adoption will continue, the report said.
“It is encouraging to see that overall adoption is solid. However, companies are at a variety of states of SRE — from the entire organization leveraging SRE (19%) to specific teams, products and services (55%), to piloting SRE (23%),” said Eveline Oehrlich, the DevOps Institute’s chief research officer, who led research and data analysis for the Global SRE Pulse with support from sponsors Sumo Logic, AWS, StackState and Sedai.
“The results demonstrate that SRE enhances development and operations collaboration, increases IT value from the business perspective and is an essential engineering function for digital transformation,” Oehrlich noted. “As we look ahead, SRE will continue to play an important role in driving value for modern and complex software environments — especially in teams’ efforts for continuous improvement.”
Meanwhile, SRE is useful both for running businesses and serving customers, the report indicated. When asked where SRE is applied today — the software a company builds or the set of services SRE teams interact with — 56% of survey respondents said they use SRE for operating their Systems of Engagement and 42% for their Systems of Record.
“The need for SREs is not a hype cycle or a fad, it is here to stay,” said Bruno Kurtic, founding vice president of Product and Strategy at Sumo Logic. “The SRE function is critical to ensure application reliability for digital business. DevOps Institute has its finger on the pulse of this important field. This research further proves the growing prominence of SRE to thwart competition and drive business innovation.”
According to the survey, the biggest challenge to SRE adoption and success is finding staff with the right skills. Eighty-five percent of survey respondents cited the lack of staff with the necessary skills as their biggest challenge when implementing SRE. Additional challenges cited in the report include “value of SRE is not understood” (71%), “lack of tools in place” (55%), “don’t have time to implement SRE” (53%), and “lack of management support” (44%). The challenges did not differ significantly across company sizes, the survey showed.
“Some organizations are opting for a single, central team of SREs, sometimes replacing their traditional IT Operations team and updating their ITSM practices with SRE approaches,” said Helen Beal, chief ambassador for the DevOps Institute, in a statement. “Others are embedding SRE in their multifunctional, autonomous product-oriented teams. The organizational design doesn’t matter — as long as the SRE outcomes for availability and customer experience are attained.”
Moreover, the report indicated that the SRE role is balanced between Dev and Ops work. The daily task for an SRE is to continuously improve the reliability of systems and help troubleshoot tactical problems. For instance, SREs spend time on IT infrastructure and operations-related work such as performing retrospectives, spinning up new hosts/instances and performing release management activities. In their operations work, they address customer issues and are on call, the report said.
And because the applications that SREs oversee are expected to be highly automated and self-healing, the engineers have time to experiment or develop processes and best practices.
Where Do SREs Report?
The largest share of SRE teams report to IT operations. When asked, “Where does your SRE team report into?” 30% of respondents said they report into IT operations, 22% said they are part of a separate team, 18% report into application design and development, 18% report into IT infrastructure and 3% said they report into IT security.
“I believe both Reliability and Security are important facets of ‘product quality,’ so I’m not surprised to see security concerns make their way into this survey,” said Sam Fell, vice president of Observability Product Marketing at Sumo Logic. “However, I am surprised that the number is only 3% and would expect to see that number rise in the future. In my experience, companies that encourage more collaboration between Dev/Sec/Ops teams are eliminating data and process silos that create better outcomes for their employees and their customers.”
Tools in Use
For SREs, the most widely adopted automation tools are observability and monitoring platforms, the report indicated.
SRE teams use automation to increase the rate at which changes are absorbed. For multiple teams with varying topologies all implementing SRE across different ecosystems, intelligent automation ensures reliability, health and continuous operation of systems, applications and services, the report said.
“The need for Site Reliability Engineers opens new opportunities for DevOps humans to advance their careers and take on an engineering role that provides immense value to organizations around the globe,” said Jayne Groll, CEO of DevOps Institute, in a statement. “The Global SRE Pulse examines the human and automation aspects of SRE and offers deep insights into SRE adoption, best practices, challenges and outcomes. With adequate development to address the skills gaps, opportunities for SRE are vast for IT professionals — and in high demand at companies of all sizes.”