LONDON – In 2014, Facebook moved from CEO Mark Zuckerberg’s favorite motto, “Move fast and break things,” to “Move fast with stable infra.” The new slogan hasn’t caught the tech bro wave in the same way, and it still misses the mark quite a bit, neglecting the most important factor in engineering success — the teams.
In our rush to move “harder, faster, stronger,” Jessica Cregg, a developer advocate at LaunchDarkly, said in her talk on Wednesday, we focus too much on the tech and not enough on the people and processes. Without them, we’re simply not setting teams up for success.
“Speed without direction is running without actually knowing if you’ll get there,” Cregg told the audience. And “moving fast, breaking things, and taking forever to resolve the problems does not result in happy customers and employees.” Yet, you are still expected to deliver features faster than ever. How can you move fast and not break things?
Moving Too Fast, You’ll Miss Things
Cregg quoted several founders who have learned the painful lesson that it’s just impossible to keep up with hype cycles. Eventually, the humans on your team can’t keep up the pace, another reason why two-thirds of the tech industry has known burnout.
“If you’re using acceleration to keep up, you have no choice but to keep up that rate,” she said. But without processes, things will be rapidly broken and often missed.
Cregg, whose role now has her writing and speaking about the human implications of engineering decisions, used to work for a “spaghetti on the wall” startup — instead of strategy, she said, they were just throwing things to see what sticks.
Of course, developers want to move that fast, she acknowledged. After all, this is the industry that monetized the attention span, right down to the first five seconds, which have the highest impact on impression, conversion and engagement. With each additional second, the conversion rate drops by 4%.
But moving fast and breaking things becomes a vicious cycle where companies push employees, employees push customers, who then in turn push companies to give more and more. It’s especially hard on DevOps and site reliability engineering (SRE) teams, she said, who end up unnecessarily on call.
There are other consequences to moving fast, she noted, ranging from releasing features before they are actually ready to employee turnover to a loss of customer confidence.
The ability to move fast safely is what separates the best from the rest. Cregg’s talk was probably the first on this fall’s conference circuit to heavily cite the latest State of DevOps report by Google Cloud’s DORA.
This year’s report showed more than ever how the organizations its researchers have dubbed “elite performers” stand out against low performers in moving fast, breaking things and recovering quickly. The gap between the two groups continues to grow with astounding numbers.
These elite DevOps performers:
- Make 973 times more frequent deployments.
- Are a third less likely to fail.
- Have a 6,570 times faster lead time from commit to deploy.
- And also are 6,570 times faster to recover from incidents.
Elite performers also all averaged less than an hour to restore service, while their counterparts took more than six months. These elite teams also boast significantly lower change failure rates. In order to successfully move fast and break things, Cregg said, you need processes, especially for:
- Releasing a new feature into production. This process answers how and when code is deployed, as well as how it will be progressively delivered.
- Incident alerts and response. This process identifies monitoring mechanisms, clarifies who is on call when, and outlines the escalation procedure.
- What to do when things do break. This process outlines the steps to safely disable or roll back features.
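As a rough illustration of the third process, a kill switch can be sketched as a flag check wrapped around the new code path, with the old path kept as a fallback. Everything here is hypothetical: the `KillSwitches` class, the `run_guarded` helper and the `new-checkout` flag name are invented for this example and are not LaunchDarkly’s API. Disabling the flag automatically on the first error is one possible policy, assumed here for brevity.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class KillSwitches:
    """Hypothetical in-memory flag store; a real system would query a flag service."""

    def __init__(self) -> None:
        self._enabled: Dict[str, bool] = {}

    def set(self, flag: str, enabled: bool) -> None:
        self._enabled[flag] = enabled

    def is_on(self, flag: str) -> bool:
        # Unknown flags default to off, so new code stays dark until enabled.
        return self._enabled.get(flag, False)


def run_guarded(
    switches: KillSwitches,
    flag: str,
    new_path: Callable[[], T],
    old_path: Callable[[], T],
) -> T:
    """Run new_path only while the flag is on; fall back to old_path otherwise."""
    if not switches.is_on(flag):
        return old_path()
    try:
        return new_path()
    except Exception:
        # Assumed policy: trip the switch on failure and serve the old behavior.
        switches.set(flag, False)
        return old_path()


if __name__ == "__main__":
    switches = KillSwitches()
    switches.set("new-checkout", True)
    print(run_guarded(switches, "new-checkout", lambda: "new flow", lambda: "old flow"))
```

In practice, a team might prefer that a human or an alerting rule flips the switch rather than the first exception, which is the kind of decision the written process is meant to settle in advance.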
In fact, processes were a key theme in this year’s DevOps report. From documentation to security, the elite performers have all shifted left and integrated these essential steps into the full software engineering process.
“Elite performers that met or exceeded their reliability targets were twice as likely to have security integrated in their software development process,” the report found.
The industry overall is successfully accelerating, according to the report. Over the past three years, it’s gone from 7% to 26% of respondents meeting the elite threshold, while the lowest group has gone down from 15% to a mere 7%. “Yes, they’re moving quickly, but they’re also failing and learning from their failures faster,” Cregg said.
Progressive Delivery Minimizes Blast Radius
Of course, some technical processes allow you to build better, faster. This includes testing in production. It also includes progressive delivery, the umbrella term for any time you release features to a subset of your customers, which effectively reduces your blast radius. Cregg cited targeted rollouts, feature flags and canary launches as practices that help these elite teams release and roll back fast.
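One common way to release a feature to a subset of customers is a percentage rollout, in which each user is hashed into a stable bucket and the feature is shown only to buckets below a threshold. The sketch below is a minimal illustration of that idea under assumed names (`bucket`, `is_enabled` and the `new-pagination` flag key are hypothetical); it is not how any particular vendor implements rollouts.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0 to 99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the user's bucket falls inside the rollout percentage."""
    return bucket(flag_key, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    exposed = [u for u in users if is_enabled("new-pagination", u, 10)]
    print(f"{len(exposed)} of {len(users)} users see the feature")
```

Because the bucket depends only on the flag key and the user ID, a user who sees the feature at 10% still sees it at 50%, and setting the percentage to 0 turns the feature off for everyone.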
“In terms of operations, things are going to break, which is why you need safety valves and kill switches [and to] look at service metrics and how features are performing,” Cregg said.
She shared an experiment that LaunchDarkly, a feature flag and toggle manager, ran around pagination to test whether it would make the tool harder to use for new customers. The team began with a small test mix of newbies and regulars. This progressive delivery experiment helped show that the changes to the tool had a positive impact on those who had a lot of feature flags, while having no impact on those who had few.
“Fostering a culture of inclusion and openness can generally create better processes. Only you can prevent dumpster fires.”
— Jessica Cregg, developer advocate, LaunchDarkly
The Developer Happiness Index found that the more a team employs SRE best practices (like SLI and SLO frameworks), the less likely its members are to experience burnout and the more likely they are to optimize resources and spend time writing code.
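For readers unfamiliar with the terms, a service level indicator (SLI) is a measured ratio such as successful requests over total requests, and a service level objective (SLO) is the target set for it. The sketch below shows one simple way to turn the two into an error budget. The function names and the request counts are made up for illustration, and real SRE tooling typically computes this over rolling time windows.

```python
def availability_sli(good: int, total: int) -> float:
    """SLI: fraction of requests that succeeded (1.0 when there was no traffic)."""
    if total == 0:
        return 1.0
    return good / total


def error_budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0
    allowed_failures = (1 - slo_target) * total
    actual_failures = total - good
    return 1 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 requests, 50 failures, 99.9% availability SLO.
    good, total, target = 99_950, 100_000, 0.999
    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Budget remaining: {error_budget_remaining(good, total, target):.1%}")
```

A team that still has budget left can keep shipping, while a team that has overspent it has a shared, numeric reason to slow down and stabilize.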
Everything must be done to evaluate, measure and improve processes, to remove friction and create an environment that supports innovation. After all, team productivity is what fuels business success. In fact, McKinsey has found that these teams are capable of four to five times faster revenue growth.
Psychological Safety Still Matters
This level of elite success, unsurprisingly, comes down to psychological safety and specifically reframing failures, which Cregg said are a natural part of knowledge acquisition.
Review processes, retrospectives and postmortems are well-structured for learning. These teams also revise their strategy as they go and can change course on the fly. It’s a mistake, she said, to see incidents as failures. Instead, Netflix, that elite of elites, reframes incidents as surprises and chaos events.
“Experiments are learning opportunities and give you the ability to fail in a measured manner,” Cregg said. “Experimentation in software teaches us how to gather feedback.”
This also necessitates people feeling safe to speak up, not only when things go wrong, but when the process does. It’s also important to nurture a blameless culture, where you separate the attributes of incidents from human behavior.
One way to develop a relationship with psychological safety, Cregg told the Blueprint audience, is to change the meaning of “why.” Instead, get more people involved in the discovery process with “how” and “what” questions.
She again cited the Developer Happiness Index, which found that developers want to be part of decision-making processes, particularly in choosing their own tooling.
“Fostering a culture of inclusion and openness can generally create better processes,” she said. “Only you can prevent dumpster fires.”
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: LaunchDarkly.
LaunchDarkly is a sponsor of The New Stack.
Featured image of Jessica Cregg by Jennifer Riggins.