Google Reveals the Secrets of DevOps
At the JFrog SwampUp conference in Napa earlier this year, Google Senior Director of Engineering Melody Meckfessel and former Google vice president Sam Ramji (now with Autodesk) showed off how Google makes its sausage. Unlike other, less well-curated development processes, Google’s is worth examining, and a look inside shouldn’t leave anyone offended or covered in sausage leavings.
For starters, the scale of Google’s internal development processes and practices is immense. According to the numbers Meckfessel revealed at the conference, more than 500 million tests run per day inside Google’s systems, to accommodate more than 4 million builds daily.
Why so many builds? Because Google’s Bazel build system makes builds nearly instantaneous, so developers quickly get the feedback they need on their code.
“You want your build cycle happening almost instantly, so your developers aren’t context switching. The insight and observability we have into the monitoring process might allow us to be able to say we could speed up a release cycle. Or I proactively get an alert that our error rates have spiked. It’s about the ability to have that data at your fingertips to make improvements. How are we feeding that data back into making the code better? That’s going to feed back into what I want as an operator: a stable and reliable production stack,” said Meckfessel.
The pair mentioned a number of tools used by Google, the newest being Asylo, a framework for running encrypted computation on untrusted infrastructure. “Asylo lets you take advantage of the trusted compute environments from AMD and Intel. What if you don’t trust the host with your IP, but you still need the economies of scale? Asylo will give you encrypted runtimes,” said Ramji.
One of the more compelling aspects of Google’s development philosophy is the blameless post-mortem. When a problem occurs and is resolved, the chief goal of Google’s development teams is to fix the underlying cause with tools or processes rather than to assign blame. Ramji cited Toyota’s techniques as the inspiration: an error is the fault of the process, not necessarily of the person who made it.
“I think it’s important to ground everything we’re doing in the human as practitioners, and the community we’re collaborating with. Partners, members of our teams and others can have a very difficult time to transition to be a developer and operator. We can bring insight and empathy to the challenges that exist in DevOps,” said Meckfessel.
We sat down with Melody and Sam to ask about the tools they use internally at Google to handle development at such scale. Check it out!
In this Edition:
0:54: Should developers use Bazel even as a smaller team?
3:42: What about using Spinnaker?
7:13: The side effect of using developer tools as a cultural transmission mechanism.
10:59: Active improvement with TPUs and the DevOps cycle.
12:17: Exploring the battery of tests run at Google.
14:50: Making builds repeatable.