
Making a Difference at Uber and Airbnb, the Story of a Reliability Engineer

5 Jan 2021 1:56pm, by

Welcome to The New Stack Makers: Scaling New Heights, a series of interviews, conducted by Scalyr CEO Christine Heckart, that cover the challenges engineering managers have faced when scaling architectures to support the demands of the business.

Uber. Recall the company in 2017: the management, the scale, and the blog post by then-Uber engineer Susan Fowler detailing her experience of the workplace sexism that plagued the company at the time. Uber had scalability issues to confront, but, less known at the time, it was also crippled by serious HR problems, ones that left Fowler, an invaluable microservices expert, completely alienated from the company.

Site Reliability Engineering was new to Uber at the time, as the company’s engineers were still breaking microservices off from what Fowler describes as its “monolithic” API. It was an exciting time to be at Uber, and she chose a team that would play to her expertise. Then the troubles started. Her manager said he was looking for women to have sex with. From that point, it only got worse. Fowler captured her experience in her subsequent book “Whistleblower.”


Scaling New Heights EP #8 – Making a Difference at Airbnb, the Story of a Reliability Engineer


In this interview with Heckart, Donald Sumbry, who was at Uber at the time and now heads reliability engineering at Airbnb, says he was unaware of the internal issues at Uber back then because he was so absorbed in the work and the many technical problems that needed solving.

“In early 2017, we had the Susan Fowler blog post, and one of the things I remember the most was that some of what had happened was actually a surprise to me,” Sumbry said. “And I realized that I was so knee-deep in the work that I was doing, that there were so many problems to solve. And we attracted the type of people that just jumped into a problem and, you know, spend all your time trying to solve it the right way, that it was sort of easy to miss some of the things that were happening around you.”

“And if there’s one big lesson I’ve learned from that, it’s that you’ve got to pay attention to everything; you can’t be super focused on, you know, one specific problem,” Sumbry said.

“But that said, you know, there’s a lot of people like me that spent a lot of time over, you know, the next few years, really working hard to improve the culture at Uber and to help implement positive changes,” Sumbry said. “I’m really, really proud of a lot of stuff that we did there over the years. You know, I feel like I’m one of those people that left it in a much better situation after I left than before I started there. You know, I’m definitely really proud of that.”

As a reliability engineer, Sumbry faced the challenge of reporting a single, global availability metric across the thousands of microservices Uber was running. Measuring and reporting on the availability of the infrastructure proved to be a lot of work; it had been attempted three times before without success.

It took a year, but Sumbry and his team succeeded: they built a clean way to show Uber’s global availability.

“Now I see that the industry, in general, is moving towards measuring availability in the same way,” Sumbry said. “And so I think to an extent we were a little bit ahead of everyone else. And now, you know, everyone else is kind of catching up. So that was awesome.”

Today, the contrast between Uber and Airbnb lies partly in the requirements of scale-out architecture, but the deeper contrasts are in their business dynamics. Uber is a transactional business: the customer experience often lasts only about 15 minutes. For an Airbnb customer, it is a relationship, a brand experience that often costs the customer several hundred dollars.

When he joined Airbnb, Sumbry brought along what he had learned at Uber about looking at the big picture. He had also learned to avoid the savior complex: every company is different, no matter how much an engineer may feel they have seen it all and can solve every problem.

“So you know, as you’re taking everything in on that big meta-level, you have to understand what’s different about the company that you’re at, take the time to actually learn that, and then apply your own unique knowledge and experience on top of that, as well,” Sumbry said. “And so I’m definitely keeping that in mind today. If I could have helped do that at Uber, like back in the day as well, you know, it would be one of those things I’d probably just tweak a little bit or change.”