
Measuring Engineering Velocity: Why Mainline Branch Stability Matters

20 Mar 2018 10:51am, by

This is the first article in a three-part series about Measuring Engineering Velocity sponsored by CircleCI. Read more about deploy frequency in part 2 and deploy time in part 3.

Can DevOps metrics indicate your business’s potential for growth? If so, how can you improve those metrics so that DevOps becomes your gold standard for improving code? We sought to answer these questions in our new CircleCI report on engineering velocity, a deep dive into the impact of DevOps on digital transformation. By examining a sample of GitHub and Bitbucket organizations built on CircleCI’s cloud platform in mid-2017, we studied the three metrics most likely to affect engineering velocity: mainline branch stability, deploy time and deploy frequency. In this post, we focus on our results for mainline branch stability. (Deploy time and deploy frequency are covered in the upcoming articles in this series.)

Mainline Branch Stability: Vital to Staying Deployable

Jim Rose, CircleCI
Jim Rose, CEO, joined CircleCI in 2014 through the acquisition of Distiller, an iOS-only continuous integration service he co-founded and led as CEO. Before Distiller, Jim co-founded and served as CEO of several companies: Copious, a social marketplace backed by Foundation Capital and Google Ventures, among others; Vamoose, a vertical search engine in the travel space acquired by Internet Brands; and MobShop, which invented and patented the idea of online group buying in 2000, raised over $49 million in funding, and whose IP was acquired by Groupon.

The mainline branch is your application’s source of truth: the master mold for every feature branch your developers create. If the mainline branch is broken, your team can’t start building new features and can’t respond swiftly to major incidents.

In our study, we measured stability by the percentage of wall-clock time a project’s default branch spent in a failed state: a branch that spends 2 percent of its time failing is 98 percent stable. Wall-clock time is real-world elapsed time, as opposed to the total time consumed by all of a project’s containers (the isolated environments in which its software is tested). The default branch is the branch a project has designated as its “master” branch.
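To make the metric concrete, here is a minimal sketch of how time in a failed state could be computed from a default branch’s build history. The Build record, the function name and the sample timestamps are hypothetical, and the sketch assumes each build’s result stands until the next build finishes; CircleCI’s actual calculation may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    """One CI result on the default branch (hypothetical record shape)."""
    finished_at: datetime
    passed: bool


def percent_time_failed(builds: list[Build], window_end: datetime) -> float:
    """Share of wall-clock time, in percent, the default branch spent red.

    Assumes a build's status holds until the next build finishes, and that the
    measurement window runs from the first build to `window_end`.
    """
    if not builds:
        raise ValueError("need at least one build")
    ordered = sorted(builds, key=lambda b: b.finished_at)
    total = window_end - ordered[0].finished_at
    if total <= timedelta(0):
        raise ValueError("window_end must be after the first build")

    # Each status lasts until the next build finishes (or the window closes).
    ends = [b.finished_at for b in ordered[1:]] + [window_end]
    red = timedelta(0)
    for build, end in zip(ordered, ends):
        if not build.passed:
            red += end - build.finished_at
    return 100.0 * (red / total)


t0 = datetime(2017, 6, 1)
history = [
    Build(t0, passed=True),
    Build(t0 + timedelta(hours=10), passed=False),  # branch goes red
    Build(t0 + timedelta(hours=12), passed=True),   # fixed two hours later
]
failed = percent_time_failed(history, window_end=t0 + timedelta(hours=100))
print(f"failed {failed:.1f}%, stability {100 - failed:.1f}%")
# prints "failed 2.0%, stability 98.0%"
```

Measuring wall-clock time rather than summed container time means that parallel builds do not inflate the figure: what counts is how long the branch was actually unusable.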

Findings: Our study confirms that stability is a key metric: 80 percent of organizations in our sample keep their mainline branch deployable over 90 percent of the time. Median stability is 98.7 percent, while the mainline branches of organizations in the bottom 5th percentile spend 47 percent of their time in a failed state.
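Summary figures like these come from a list of per-organization stability percentages. The sketch below shows one way to derive them with Python’s statistics module; the percentile method (inclusive interpolation) is an assumption, since the report does not say how its percentiles were computed, and the sample values are made up for illustration, not taken from the study.

```python
import statistics


def summarize(stability_by_org: dict[str, float]) -> dict[str, float]:
    """Summary statistics of per-organization stability percentages.

    The inclusive percentile method is an assumption, not the report's method.
    """
    values = sorted(stability_by_org.values())
    if len(values) < 2:
        raise ValueError("need at least two organizations")
    # quantiles(n=20) returns 19 cut points; the first is the 5th percentile.
    p5 = statistics.quantiles(values, n=20, method="inclusive")[0]
    share_over_90 = 100.0 * sum(v > 90.0 for v in values) / len(values)
    return {
        "median": statistics.median(values),
        "p5": p5,
        "share_over_90": share_over_90,
    }


# Made-up values for illustration only; these are not the study's data.
sample = {"org-a": 50.0, "org-b": 95.0, "org-c": 99.0, "org-d": 99.5, "org-e": 100.0}
print(summarize(sample))
# {'median': 99.0, 'p5': 59.0, 'share_over_90': 80.0}
```

Note that a high median can coexist with a long tail of unstable branches, which is why the report quotes both the median and the bottom percentile.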

Among the top 10th percentile of Alexa Internet-ranked organizations, the findings are similar: the bottom 5th percentile of organizations spend 46 percent of their time in a failed state, median stability is 98.5 percent, and the top percentile reaches 99.9 percent. As in the full sample, 80 percent of all Alexa-ranked organizations keep their master branch stable over 90 percent of the time.

Whether they’re part of the Fortune 100 or three-person startups, most organizations understand the value of high mainline branch stability. But reaching and maintaining that stability requires rethinking team structure and traditional processes.

Best Practices for Maintaining Mainline Branch Stability

Robust suite of tests: If a mainline branch fails, production grinds to a halt. Smart companies know this and invest in robust automated testing. By testing every change, engineers verify that their code behaves as expected. Test suites act as a kind of preventive healthcare for software: it’s far easier to catch bugs through routine checkups, and the more routine, the better.
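The simplest way to put this into practice is a gate that runs the test suite on each proposed change and blocks the merge unless it passes. Here is a minimal sketch; the function name is hypothetical, and the test command is an assumption you would replace with your own runner. CI services implement this gate for you, but the logic is the same.

```python
import subprocess
import sys


def change_may_merge(test_command: list[str]) -> bool:
    """Run the project's test suite and allow a merge only if it passes.

    `test_command` is an assumption: substitute whatever runs your suite,
    for example [sys.executable, "-m", "pytest"] if the project uses pytest.
    By convention, test runners exit with status 0 only when every test passed.
    """
    result = subprocess.run(test_command, check=False)
    return result.returncode == 0


# Stand-in "suites" so the sketch runs anywhere: one exits 0, one exits 1.
passing_suite = [sys.executable, "-c", "raise SystemExit(0)"]
failing_suite = [sys.executable, "-c", "raise SystemExit(1)"]
print(change_may_merge(passing_suite))  # True
print(change_may_merge(failing_suite))  # False
```

Relying on the exit status rather than parsing test output keeps the gate independent of any particular test framework.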

Ben Sheldon, senior software engineer at Code for America, believes good tests reduce cognitive load. “In most of our code reviews, we have high confidence that, if the tests pass, the risk of actually breaking something is low,” Sheldon says. “Our code review process is less around preventing problems and more around understanding changes that are happening. Ultimately, we want engineers to have the opportunity to talk at a high level about architecture, or where the codebase is going in the long term.”

Feature flags: Even the most sophisticated test suite can’t capture the full complexity of a real application running in production. Rather than depending on tests to catch every potential issue, teams can use feature flags to make releases safer.

“Feature flags decouple the release of functionality from the deployment,” explains Tim Wong, principal technical account manager of LaunchDarkly. “With feature flags, you get to decide when to deploy the code, and when to release your code. You test in the real world, then you can decide whether you’re prepared to give that to your users.”

A bonus effect of feature flags is that they can be used defensively during incidents. If there’s a problem in production, developers can simply turn off the offending feature, quarantining the problematic code until a fix is produced. This is usually faster than rolling back an entire change.
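Here is a minimal sketch of both ideas, decoupling release from deployment and using a flag as a kill switch. The FlagStore class, the flag name and the pricing logic are all hypothetical; this is not LaunchDarkly’s SDK or any other vendor’s API, which typically evaluate flags remotely and per user.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store, not any vendor's SDK.

    Hosted services evaluate flags remotely, but the control flow is the same:
    code ships dark, and a flag decides whether users see it.
    """
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        return self.flags.get(name, default)

    def set(self, name: str, enabled: bool) -> None:
        self.flags[name] = enabled


def checkout_total_cents(cart_cents: list[int], flags: FlagStore) -> int:
    subtotal = sum(cart_cents)
    # The new pricing path is deployed with everything else, but it is only
    # released to users while the (hypothetical) flag is on.
    if flags.is_enabled("new-discount-engine"):
        return subtotal * 90 // 100  # 10 percent off, in whole cents
    return subtotal


flags = FlagStore()
cart = [1000, 2000]
print(checkout_total_cents(cart, flags))  # 3000: deployed, not released
flags.set("new-discount-engine", True)
print(checkout_total_cents(cart, flags))  # 2700: released
flags.set("new-discount-engine", False)   # incident: kill switch, no rollback
print(checkout_total_cents(cart, flags))  # 3000
```

Defaulting unknown flags to off means new code stays dark until someone deliberately releases it, which is the safe failure mode for mainline.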

Efficient recovery from failure: Even with the best test suites, bugs will inevitably find their way into production. When that happens, it’s important to have a fast, reliable recovery process in place.

Increment, the software magazine published by Stripe, recently released a study of incident response at leading companies such as Amazon, Facebook and Google. The study found that these organizations had well-defined processes for handling production outages, from paging on-call engineers to runbooks for consistently triaging issues, all of which are essential for keeping the mainline branch healthy.

One finding in the Increment report that’s relevant to our study is that leading companies mitigate before they resolve. This often means rolling back a change instead of fixing the root cause first. Debugging an error is time-consuming and keeps the branch in the red longer, whereas reverting a change can quickly staunch the flow.
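Mitigating first usually starts with identifying which change turned the branch red. Below is a minimal sketch that finds the first failing build in the current red streak, the natural candidate to revert (for example with `git revert <commit>`) before anyone debugs the root cause. The BuildResult record, the function name and the commit IDs are hypothetical, and it assumes one build per commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildResult:
    commit: str  # commit ID on the default branch (hypothetical values below)
    passed: bool


def revert_candidate(history: list[BuildResult]) -> Optional[str]:
    """Return the commit that turned mainline red, or None if it is green.

    `history` is oldest-first with one build per commit (an assumption; with
    batched builds the culprit could be any commit in the batch). If the
    failing streak reaches the start of `history`, the real culprit may be older.
    """
    if not history or history[-1].passed:
        return None  # mainline is green; nothing to mitigate
    i = len(history) - 1
    # Walk back to the first build of the current failing streak.
    while i > 0 and not history[i - 1].passed:
        i -= 1
    return history[i].commit


history = [
    BuildResult("a1f3", passed=True),
    BuildResult("b7c2", passed=True),
    BuildResult("c9e4", passed=False),  # first red build
    BuildResult("d0a8", passed=False),  # later commits land on a red branch
]
print(revert_candidate(history))  # c9e4
```

Every commit that lands after the first red build extends the streak, which is why reverting quickly tends to beat diagnosing in place.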

Communicating the cost of failure: Bugs aren’t just errors in the codebase; they have real consequences. Identifying and understanding the impact of these consequences helps prioritize work and increases motivation to resolve technical issues.

At GetCalFresh, a Code for America service that helps people apply for food stamps online, engineers block out an hour each day to observe how people fill out food stamp applications. If anything breaks in that hour, engineers witness it firsthand. This makes abstract “errors” concrete and increases motivation to fix problems. Engineers see not only the impact of their work but also how problems affect people’s lives.

Communicating potential conflicts: While tests are the most obvious way to prevent failure, other factors also contribute to it. The most insidious is a lack of awareness and communication around potential code conflicts. It might seem logical to organize teams around parts of the tech stack, such as backend and frontend, but these structures have weaknesses. Teams organized as horizontal slices have more blind spots and less insight into how their code affects others.

At, which provides computer science training to teachers and students, engineers are organized into functional, vertical “cabals” of three to seven engineers. Each cabal holds a daily standup where engineers share unplanned work and highlight potential conflicts. These standups work because each cabal has just enough engineers to be productive, but not so many that people lose track of what others are doing.

Keeping a close eye on mainline branch stability is one of the ways your organization can adopt a nimbler development practice. It’s also a good way to ensure that your DevOps culture is one that embraces change and adapts to it.

CircleCI sponsored this story.

Feature image by CircleCI.
