Site reliability engineers (SREs) are tightly woven into DevOps today. They also play an evolving and critical role in cloud native and microservices deployments. But as a job description and function, the SRE role is often described inaccurately, and its definition can mean many different things depending on whom you talk to.
And yet, SREs do share some common responsibilities in DevOps, especially in cloud native environments, where the roles of development and operations often differ sharply from those in traditional monolithic and on-premises infrastructures.
In this episode of The New Stack Makers podcast, Steve Herrod, managing director at General Catalyst, expressed this truism: “One thing I would characterize most SREs by are that they are great with writing automation and scripts and using the tools themselves to do something custom in their environment.”
Herrod should know, as many of the organizations his venture capital firm works with rely on SREs as an integral part of their DevOps model. About 10 of the 13 companies on whose boards he sits have them.
“It’s been fun to watch,” Herrod said. “It’s really tracked forward with the DevOps movement that everyone has really been talking about for quite a while.”
The concept of the SRE, however, dates back much further: Google employed it long ago, when the search engine giant began its momentous shift to the cloud.
“Most people may have heard site reliability engineering really started at Google as they were building up these massive scalable applications running on top of the cloud infrastructure there. So, originally, they were in charge of making these things run well, and track them and interact with their coding teams and there’s more and more companies have started delivering SaaS applications and highly available ones,” Herrod said. “They’ve taken a lot of the original tricks and tools used by Google and certainly evolved it a fair amount to where we are today. And where we are today, is probably if you ask ten people, you get ten different definitions of the SRE role but it definitely comes down to a core ability to automate things and to make sure that applications are highly available.”
SREs also play a key role in bridging developers and operations teams. “What you typically hear about is an SRE is actually kind of perfect for DevOps. Usually, about half their time is around Ops issues, whether that’s answering a crashed application or debugging something that’s around the live production,” Herrod said. “And then the other half is around what are the tools and the code that I can write to basically not have problems happen in the future. And so, I think it’s just perfect blend of Dev and Ops and in fact often times you see them as being a coder who has done a bunch of system administration or a system administrator that knows coding.”
The capacity to empathize with team members and, ultimately, with the end customer’s wants and needs is also key, as is understanding how automation factors into the equation.
“So they have to have appreciation and the empathy for that but I don’t think you would ever call someone an SRE if they didn’t have more of a coding and automation mindset. And in many cases, when you talk to anyone at SREs, the mantra over and over is automate, automate, automate,” Herrod said. “And that might be automating the testing you do before you deploy something. It certainly is, how do you automate troubleshooting or automate restarting things when things going right.”
At the end of the day, an SRE must also be able to successfully meld together different skill sets.
“I think, the real key, the perfect profile typically, is someone who has done system administration and operations at scale, they understand the implications of poor performance or of things going down,” Herrod said. “They understand how hard it can be to capture data when things are running live or changing things while things are running live. It’s sort of that changing wheels on a race car while it’s still going.”
In this Edition:
0:31: Where are we with the SRE position and role today?
5:26: How are things changing as we move towards cloud-native platforms and serverless?
12:23: SREs and cloud native environments.
14:15: The possibility of the lifestyle component and the middle-of-the-night alert.
20:33: The SRE instigating DevOps in an organization.
27:29: To what extent is knowing certain programming languages critical for an SRE?
Feature image via Pixabay.