On-Call Rotations: How Best to Wake Devs Up in the Middle of the Night
There’s something glamorous about a surgeon being on call, at least in television dramas such as “Grey’s Anatomy.” A doctor’s significant other may well be annoyed by it, but the doctor is saving lives, right? When you’re on call for software development, it’s a harder sell to your husband or wife.
But being on call in software development is important. It’s all about sustainability at the organizational level. “On-call is your sharpest tool in your toolbox,” said Charity Majors, founder of Honeycomb, an observability platform.
She was speaking at this year’s Monki Gras in London, a conference centered on sustaining craft. Majors argues that being on call supports sustainability because, while it rarely has the thrill of greenfield development, it gets right to customer needs and to maintaining your software.
On Call Should Be an Honor and a Privilege
“I am sick and tired of people talking about how on call ruins their life,” she said, addressing managers, who she believes shouldn’t feel they are inflicting a necessary evil on their teams. By her own estimate, Majors has been on call for about half her life.
“It is an honor and a privilege to be a fully paid engineer to make software that you believe in.” — Charity Majors
She argues that the number one way to align infrastructure and put DevOps into practice is to put your engineers on call. Part of that is reframing the on-call engineer as a kind of superhero and reminding them: “You build things that matter.”
It all comes down to remembering why you are on call. Many audience members answered that question and also explained why their partners think they are on call, because the on-call experience is undoubtedly a family affair. The answers ranged from “In case one of our clients’ software stops working” to “Stop writing shit software.”
“On call rotations are a necessary and critical piece of every software development ecosystem — provided you have more than one person working on a product and you care about quality at all,” Majors said.
“Software engineering is about maintainability: fixing, iterating, listening to feedback, fixing bugs,” Majors said. The on-call rotation, she explained, gets this work done because the responsibility falls on everyone rather than on one individual.
“It is your responsibility as a leader of a company to make sure you have a team that can work on the entire codebase,” she said. You accomplish this, she said, by balancing the load and distributing knowledge and experience sustainably across the team. She calls on-call the relentlessly human-scale process that makes this happen.
How to Make On-Call Actually Feel Like an Honor and a Privilege
Like most aspects of company culture, organizing a motivated on-call rotation takes a mix of good leadership and strong team buy-in. Majors offered several ways to earn the enthusiastic commitment it requires.
On-Call Greatness Tip #1: Every Rotation Needs an Owner
Ownership of the rotation usually falls to a manager, she said, but it is also a good fit for somebody who wants to move onto the management track, because it gives them a real understanding of both the tedious tasks and the team culture. When someone is woken up on call, the owner should know about it the next day, so that the effort is acknowledged with gratitude and the incident can feed into planning future software improvements, ensuring nobody is woken up again by the same problem.
Majors also says you should actually spend your service level agreement (SLA) budget, the downtime your SLA allows. On-call time is essential to meeting your SLA, and in her view that budget exists to be used.
“If you haven’t used your downtime, you aren’t taking enough risks.”
“Both customer good will and team good will can get burned through — which in my experience team burnout is faster.” — Charity Majors
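Majors doesn’t spell out the arithmetic, but a common way to make “use your downtime” concrete is to multiply the part of an availability target that is not promised by the measurement window. The sketch below assumes a hypothetical 99.9% target over a 30-day window; the function names are illustrative and not taken from any particular tool.

```python
from datetime import timedelta


def downtime_budget(availability_target: float, window: timedelta) -> timedelta:
    """Total downtime an availability target allows over a measurement window."""
    if not 0.0 < availability_target < 1.0:
        raise ValueError("availability_target must be between 0 and 1, e.g. 0.999")
    # The budget is the fraction of the window the target does NOT promise.
    return window * (1.0 - availability_target)


def budget_remaining(availability_target: float, window: timedelta,
                     downtime_so_far: timedelta) -> timedelta:
    """Budget left to spend; a negative result means the target was missed."""
    return downtime_budget(availability_target, window) - downtime_so_far


if __name__ == "__main__":
    month = timedelta(days=30)
    # Hypothetical example: a 99.9% target over 30 days.
    print(downtime_budget(0.999, month))                          # 0:43:12
    print(budget_remaining(0.999, month, timedelta(minutes=20)))  # 0:23:12
```

Under these assumptions the team has about 43 minutes a month to spend on risky changes; a budget that is still untouched at the end of the window is, in Majors’ framing, a sign the team could have taken more risks.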
On-Call Greatness Tip #2: Code Is Liability
We are all struggling with technical debt, so it’s important to recognize that adding to the codebase is not usually the best way to deal with a problem.
“The best senior engineers that I’ve worked with are the ones that worked the hardest to not have to write more code,” Majors said.
A successful on-call rotation encourages engineers to get creative about fixing things within the current codebase rather than adding to it. Often that means fixing things that have been broken for a while, because so much of software development is focused on building more and more features.
On-Call Greatness Tip #3: Services Need Owners, Not Operators
Majors talked about the problem DevOps is supposed to solve, in which developers are “the absentee fathers of software development” who just toss things over a wall. Putting developers on call gives them a stronger sense of ownership of, and care for, that work.
She says that software is a long-term commitment and that everyone has to be involved in maintaining it, because otherwise it is simply too much time-consuming work to accomplish successfully.
On-Call Greatness Tip #4: Take Advantage of That Time to Fix Things
As a developer or on-call manager already woken up by your pager, there are small things you can do to make the software better, and they are usually the things you just don’t have time for during the regular daily grind.
You can answer support tickets and reply to shared inboxes. You can fix staging environments and continuous testing. You can fix unreliable tests and other test failures in shared systems. You can track down security bug bounty reports.
Focusing on the code during off hours lets you care more about less, including resiliency, which of course is also a matter of sustainability.
On-Call Greatness Tip #5: Signal That You Value This Work (and Actually Value This Work)
Appreciation is one of the most important motivators in any work, if not the most important. If you are asking people to wake up in the middle of the night to fix broken software, you had better make sure they know not only that their work is appreciated but also that what they are doing matters: to you, to the project and to the clients.
As an on-call manager, you must always remember that you are responsible for waking people up in the middle of the night. That’s why Majors argues the manager should take herself out of the rotation, not because she is above it, but so that she is ready and able to jump in and help when needed. When a developer has been up all night dealing with broken software, the manager should be the one to say “Stay home,” and then cover for that person.
Finally, Majors reminded the audience that a lot of these things don’t require paging anyone in the middle of the night, and that you should avoid doing so as much as possible.
In an on-call rotation, “You should not get paged unless the world is on fire.” — Charity Majors
She said managing an on-call rotation is all about modeling a compassionate company culture. Sometimes that just means applying common sense, like never paging new parents: “You should not have more than one thing waking you up in the middle of the night.”
Decide what you will support and what you won’t, so pagers go off only when it is truly necessary.
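One way to write that decision down is as an explicit routing rule that pages a human only for urgent, customer-facing problems in supported services and sends everything else to a queue for business hours. This is a hypothetical policy sketch, not one Majors described; the `Alert` fields, the `route` function and the three outcomes are assumptions for illustration, not the API of any real alerting tool.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PAGE = "page the on-call engineer now"
    TICKET = "queue for business hours"
    IGNORE = "unsupported: log only"


@dataclass(frozen=True)
class Alert:
    # Hypothetical fields; real alerting tools model alerts differently.
    name: str
    customer_facing: bool   # are users actually affected right now?
    urgent: bool            # will waiting until morning make it materially worse?
    supported: bool = True  # has the team agreed to support this service?


def route(alert: Alert) -> Route:
    """Page only when the world is on fire; everything else can wait."""
    if not alert.supported:
        return Route.IGNORE
    if alert.customer_facing and alert.urgent:
        return Route.PAGE
    return Route.TICKET


if __name__ == "__main__":
    alerts = [
        Alert("checkout error rate above target", customer_facing=True, urgent=True),
        Alert("disk 70% full on a replica", customer_facing=False, urgent=False),
        Alert("legacy report job failed", customer_facing=True, urgent=False,
              supported=False),
    ]
    for a in alerts:
        print(f"{a.name}: {route(a).value}")
```

Keeping the paging branch this narrow is a deliberate choice in the sketch: every extra condition that leads to `Route.PAGE` is another way to wake someone up.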
As the manager of an on-call team, get creative and try small incentives, positive social pressure and ways to build team cohesion.
And believe in your team.
“Don’t be afraid to ask for hard things — people want to be asked to step up and do amazing things with you,” Majors said. “This way, your on call rotation, instead of being the sentence of death, it’s something you can look forward to.”