
Grow Your Skills in Site Reliability Engineering

3 Jun 2021 6:36am, by Ingo Averdunk and Gene Brown

Ingo Averdunk
Ingo is an IBM Distinguished Engineer specializing in site reliability engineering and service management. He is the worldwide SRE profession co-leader for IBM.

Gene Brown
Gene is an IBM Distinguished Engineer working in the Hybrid Multicloud Delivery Guild for IBM Global Technology Services.

Site reliability engineering (SRE) is the most exciting new career path created by the shift to cloud computing. The profession operates at the intersection of cloud services and business applications, both of which can change rapidly. By combining software engineering and IT operations skills, site reliability engineers strike the right balance between velocity and stability.

SREs collaborate across traditional IT boundaries to modernize IT service management, bringing agile methodology, DevOps and site operations functions closer together.

Site reliability engineering is a relatively new profession and requires a different mindset from most IT careers. If this sounds interesting, we encourage you to grow your SRE skills.

Who Should Grow Site Reliability Engineer Skills?

There are two answers to this question:

  1. Organizations
  2. IT professionals

Organizations that use cloud have a strong business interest in growing their internal SRE skills. In our experience with cloud projects, teams with an SRE mindset deliver higher reliability and mitigate risks better than teams without SREs. Where there are SRE skill gaps, application teams can get bogged down in complex site operations issues and in operational tasks that differ from one cloud provider to the next.

Large enterprises in particular can benefit from formal SRE training programs, which can be a great way to mobilize traditional IT workforces toward new ways of working.

IT professionals who are working on cloud projects or aspiring to move to cloud technology can learn how large-scale cloud computing is done. SRE training establishes a common foundation, consolidates the key points and encourages learners to think more holistically about their environments.

IBM new-hire classes for SREs are typically attended by former developers, DevOps specialists, systems engineers and technical specialists. There are no hard prerequisites, but solid IT experience is needed to transition into the SRE role successfully.

How to Grow Site Reliability Engineer Skills

At first, SREs were self-taught, but most SREs would no longer recommend that path. Today, there are many texts and courses to choose from.

The IBM new-hire curriculum initially used whitepapers, product documentation and live lectures in a five- to six-day boot camp format. It was effective but not very efficient: it was hard to schedule learners and to keep the content current.

Now, we rely more on self-paced courses. They’re better for learners and easier on us.

SRE Learning Paths and Certifications for IBM Cloud

We were part of the team that developed site reliability engineer learning paths and certifications for IBM. We think they’re a good example of what’s needed for larger organizations.

We defined three levels of SRE training and certification: associate, professional and advanced. Associate and professional learning paths are available now, as shown in figures 1 and 2.

Free study guides and sample exams are available for experienced professionals who are ready to take the IBM Associate SRE and Professional SRE certification exams.

IBM SRE learning paths are available as part of IBM Cloud role-based learning subscriptions, which provide access to the learning paths for all job roles, levels and specialties in the catalog for the duration of the subscription.

Learning subscriptions can be purchased from IBM business partners, IBM sellers and the IBM training marketplace. Certification and specialty exams are administered by a third party, Pearson VUE, and are not included in the learning subscription price.

SRE Textbooks We Recommend

  1. Site Reliability Engineering by Betsy Beyer, Chris Jones, Niall Richard Murphy, Jennifer Petoff, April 2016, O’Reilly Media, Inc., ISBN-13: 978-1491929124
  2. The Site Reliability Workbook by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne, July 2018, O’Reilly Media, Inc., ISBN-13: 978-1492029502
  3. Seeking SRE by David N. Blank-Edelman, September 2018, O’Reilly Media, Inc., ISBN-13: 978-1491978863
  4. The Cloud Adoption Playbook: Proven Strategies for Transforming Your Organization with the Cloud by Moe Abdula, Ingo Averdunk, Roland Barcia, Kyle Brown, Ndu Emuchay, April 2018, Wiley, ISBN-13: 978-1119491811

Learn More About the SRE Profession

Lead image via Pixabay.
