One SRE’s Journey When Cloud Infrastructures Became What They Are Today

1 May 2019 4:00pm, by

Also available on Apple Podcasts, Google Podcasts, Overcast, PlayerFM, Pocket Casts, Spotify, Stitcher, TuneIn

When Gianluca Arbezzano, now a site reliability engineer (SRE) for InfluxData, first approached Chris Churilo, InfluxData’s director of product marketing, over two years ago, what an SRE was and did remained a largely nebulous concept to him.

“I was looking at myself more as a DevOps person, but I was never a systems administrator,” Arbezzano said. “I always wrote code and automated [applications and deployments].”

Initially, when Arbezzano reached out to InfluxData, “he said he was a fan of InfluxDB and we just hit it off and I said ‘we’ve got to do something together,’” Churilo said. Arbezzano grew into the role from there and has since been instrumental to InfluxData in a number of ways, including automating InfluxCloud processes and making sure customer deployments remain on track.

In this latest episode of The New Stack Makers podcast, Arbezzano and Churilo discussed what it is like to be an SRE today and how it all fits into InfluxData’s quest to help customers improve observability, analytics and automation with its time series platform.

Arbezzano’s interest in coding can be traced back to his online gaming days, when he studied the underlying code. Professionally, once he began using APIs to configure and manage servers, he quickly “realized that was what I was happy doing more than writing product code,” Arbezzano said. “I really enjoyed writing automation for cloud and infrastructures and was really close to the infrastructure side, dealing with provisioning of servers and with infrastructure failures,” he said. “I discovered that I like this complexity and enjoyed it more than the product side.”

Over the course of his career, working mainly as a software engineer at Italy-based software firms, Arbezzano often helped organizations shift their operations from traditional data centers to cloud infrastructure. “This definitely triggered my passion for infrastructure,” Arbezzano said. “On the cloud, you get good reliability with less investment so you can stay focused on what is core to your business. I really like to keep my team happy and confident about what they are doing, and people are happy to code and deploy their code on something that works and to have visibility into what they are doing.”

Soon after Arbezzano first approached InfluxData, “an opportunity showed up,” Churilo said. “I needed to [know what] you could do with InfluxDB and Docker containers, so Gianluca and I brainstormed a little bit, and he came up with the idea of building this open source project called Orbiter that allows you to spin [containers] up and down based on the resources the application inside the Docker container needs.”

The collaboration served as a way to show the rest of the team members at InfluxData that “‘hey, this guy is pretty special — look at how cleverly he has approached this idea,’” Churilo said. “My hunch was ‘this is probably going to be useful in managing our cloud application.’”

Arbezzano was able to demonstrate skills that would later fall under the SRE category. “You always want to make sure that you have people that can look at something and try to take a different perspective, because we definitely needed at the time [to do things in a more automated way],” Churilo said. “Truth be told, back then, the way that we would actually try to allocate more resources to the cloud instances [we] were selling to our customers was pretty manual and arcane. But having somebody like Gianluca onboard [changed that]: looking at it, knowing that he could actually write code while understanding the environment and the resources that are being consumed by that individual customer, and then figuring out how to give this person more resources.”

Today, Arbezzano said, his role as an SRE covers more than just developing code for automation and the final software product. “I’m also able to understand how the code is working, in order to make it better. I am also much closer to the code than I am to managing the infrastructure itself,” Arbezzano said. “My background is more as a developer than as a system administrator, so as an SRE, the challenge of my job is not just to scale [deployments] from an infrastructure perspective on the cloud, as a DevOps role does by automating and interacting with the cloud provider. As an SRE, you also need to [dig] into the code in order to understand how it scales and how to develop a solution that helps the code to scale.”

Taking a step back, Arbezzano’s work, like that of all of InfluxData’s makers, involves taking much of the heavy lifting otherwise associated with analysis and observability, especially when things go wrong, off customers’ shoulders.

“You may not know all of the intricacies behind [applications], [and that’s fine,] because when you see, as Gianluca said, that something didn’t really go right in a particular time frame, that is your chance to really dig in very deeply. Instead of boiling the ocean and thinking about everything about [an application or a deployment], about which you will never have a full understanding, you can take a step back and relax,” Churilo said. “You can look at that point in time and say, ‘At least I know something went funny here, and at least I know it is these two services that are the culprit, so let’s dig a little bit more,’ and you should get a pretty good sense of what to do next.”

In this Edition:

0:52: Discussing Arbezzano’s career history and how he became a part of the SRE movement
8:06: What would you say to those kinds of concerns?
12:40: How Churilo and Arbezzano began working together
18:13: When you’re speaking with organizations and they’re having observability problems or monitoring problems, I’m assuming that you say, “Hey, this is what we did, and it really works,” can you give me an example of that, if it does occur?
23:29: Going back to the Kubernetes logging function, how did that work? What did you learn?
33:12: How does that feedback usually work?

InfluxData is a sponsor of The New Stack.

Feature image via Pixabay.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.