5 Steps to Debug Development and Operations Teams
I recently conducted a series of free online mentorship calls, and a question that came up several times was, “What do you do as a CTO when you believe a team is underperforming?”
We all know that measuring developer productivity is difficult, but it is not unusual to hear from colleagues that work from one particular team is regularly late, unreliable or not very good.
And these productivity issues don’t just apply to developers. Indeed Chris Swan, an engineer at Atsign and former delivery CTO at DXC Technology, told The New Stack, “We saw it so often with operations teams, that we built a repeatable process for dealing with it.”
Strictly speaking, poor team performance is the responsibility of the team manager. As a manager of managers, your role is to alert the relevant team lead that there may be a performance issue and have them resolve it.
That said, this is also a good opportunity to teach the manager and help them grow, and I think it is fine to jump in if you see that a manager is struggling, or if they actively ask for help.
Swan agreed. “This isn’t something we can reasonably expect line managers to be experienced in,” he told us. “So it is entirely reasonable to expect them to ask for help, and for a CTO or VP of engineering to get stuck into providing guidance to managers so that they have a toolkit to help them deal with these situations.”
It is worth adding that managers hear so many complaints it can be difficult to know which to pay attention to and which to ignore. A side effect of this is that you’ll likely discover that employees working at the coal face will know what the issues are, and have raised them before, so part of your job is to give them and their manager sufficient backing and resources to make the necessary changes.
“A lot of the time we were providing support for the frontline workers to get what they needed from management,” Swan told us.
Here are five practical steps for figuring out why a team is struggling.
1. Check the Data for Clues
I use the term “team debugging” to refer to the process of investigating an underperforming team, and was delighted when reading Camille Fournier’s excellent book, “The Manager’s Path,” to find she used the same analogy.
Fournier recommended starting with a hypothesis as you might when debugging a system. “Do this in as minimally invasive a way as possible,” she stated, “to prevent your meddling from obscuring the problems.”
My own approach, though similar to that described in Fournier’s book, is to start by looking for signals in data and use them to guide an initial hypothesis. I also try to keep in mind the adage, often attributed to the poet Walt Whitman, “Be curious, not judgmental.”
Good things to check include turnover — do a lot of people leave this particular team relative to the rest of IT? Can you get sick day stats? If so, are people out sick more often than in the rest of the department?
These aren’t necessarily smoking guns, but they can give you an idea of what might be wrong. Keep in mind that, even at this early stage, word of your investigation can leak surprisingly quickly. Try to find ways to frame your questions so that it’s not too obvious what you are looking for.
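As a rough illustration of the comparison described above, the following Python sketch expresses a team's turnover and sick days relative to the wider department. The `GroupStats` fields and the figures are hypothetical, and the sketch assumes both groups are measured over the same period and that the department has at least one leaver and one sick day.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Figures for one group over a single period (hypothetical fields)."""
    average_headcount: float
    leavers: int
    sick_days: float


def turnover_rate(stats: GroupStats) -> float:
    """Leavers as a fraction of average headcount."""
    return stats.leavers / stats.average_headcount


def sick_days_per_person(stats: GroupStats) -> float:
    """Average sick days taken per person."""
    return stats.sick_days / stats.average_headcount


def relative_to_department(team: GroupStats, department: GroupStats) -> dict[str, float]:
    """Ratio of team to department; a value above 1.0 means the team is higher."""
    return {
        "turnover_ratio": turnover_rate(team) / turnover_rate(department),
        "sick_day_ratio": sick_days_per_person(team) / sick_days_per_person(department),
    }


# Made-up numbers purely for illustration.
team = GroupStats(average_headcount=8, leavers=3, sick_days=64)
department = GroupStats(average_headcount=120, leavers=15, sick_days=480)
signals = relative_to_department(team, department)
```

With small teams a single departure moves the ratio a lot, so a high value is a prompt for a question rather than a conclusion.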
The next thing I tend to do is look at calendars. Is the manager holding regular one-on-one meetings? Is the team spending too many hours per week in meetings?
Then, something that Fournier also suggested, look at the other systems of record — chat logs, emails, tickets, code reviews and check-ins, and see what they tell you.
Are the team members “bickering over coding style in their code review comments?” Fournier asked. “Are the tickets that are being written vague, too big, too small? Does the team seem upbeat in their communication style, sharing fun things as well as important work in chat, or are they purely business?”
2. Identify Why the Team Doesn’t Feel Empowered
For an operations team, a lot of the focus will be on the service management logs. Drawing on the late business management expert Eliyahu M. Goldratt’s theory of constraints from “The Goal: A Process of Ongoing Improvement,” the aforementioned DXC process involves getting hold of all of the service logs and then applying data science to identify constraints.
Working this way “allowed us to focus on the one thing that mattered most at that point,” Swan told us. Addressing the constraints ends up being an iterative process where you fix the top-level issue, and then re-measure.
“We took care to re-measure each time because we’d changed the operating characteristics of a dynamic, adaptive system,” Swan said.
According to him, certain constraints would come up regularly: “One we saw often was ticket ping-pong, where nobody would take responsibility for fixing a problem but would instead pass it on to somebody else.”
In this particular case, what you’re looking for is some means by which people will take accountability for fixing a problem, and be empowered to do so.
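One way to look for ticket ping-pong in service logs is to count how many times each ticket changes hands. The sketch below is a minimal Python example, not the DXC process itself: it assumes the logs can be reduced to chronological `(ticket_id, assigned_group)` pairs, and the ticket IDs, group names and `max_handoffs` threshold are all hypothetical.

```python
from collections import defaultdict


def count_handoffs(events):
    """Count ownership changes per ticket.

    `events` is an iterable of (ticket_id, assigned_group) pairs in
    chronological order (an assumed, simplified log format).
    """
    history = defaultdict(list)
    for ticket_id, group in events:
        # Record only a change of owner, not repeated updates by the same group.
        if not history[ticket_id] or history[ticket_id][-1] != group:
            history[ticket_id].append(group)
    return {ticket_id: len(groups) - 1 for ticket_id, groups in history.items()}


def ping_pong_tickets(events, max_handoffs=2):
    """Return the IDs of tickets reassigned more than `max_handoffs` times."""
    handoffs = count_handoffs(events)
    return sorted(tid for tid, n in handoffs.items() if n > max_handoffs)


# Illustrative data only.
events = [
    ("INC-1", "servicedesk"),
    ("INC-1", "network"),
    ("INC-1", "sysadmin"),
    ("INC-1", "network"),
    ("INC-2", "servicedesk"),
    ("INC-2", "sysadmin"),
    ("INC-2", "sysadmin"),
]
flagged = ping_pong_tickets(events)
```

Here `INC-1` is flagged because it changed hands three times, while `INC-2` moved once and stayed put.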
The question of empowerment comes back to the managers. “We’d find that the front-line people felt frustrated because they could see what needed doing but did not think they were allowed to do it,” Swan said. “Sometimes this requires you to look at soft aspects like role definitions and job specifications, but it might also get you into hard aspects such as identity management and authorization.”
A particular issue that Swan highlighted was around cybersecurity; in essence, access had been restricted to prevent bad things from happening, but as a direct consequence service agents were unable to fix problems.
Resolving this requires “a very thoughtful approach to risk management, which can be supported by better systems,” he said. “So, for example, if you can give people a ‘break glass’ route to privileged access management, then you’re not in the situation where staff constantly have access to ‘god-like’ credentials, but they can get them when they need them.”
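To make the “break glass” idea concrete, here is a minimal Python sketch of time-limited privileged access with an audit trail. It is an illustration under stated assumptions rather than a description of any real privileged access management product: the `BreakGlassLog` class, its one-hour default and the example user and reason are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BreakGlassGrant:
    """One temporary elevation of privileges (hypothetical record)."""
    user: str
    reason: str
    granted_at: datetime
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return self.granted_at <= now < self.expires_at


class BreakGlassLog:
    """Grants short-lived access and keeps every request for later review."""

    def __init__(self, max_duration: timedelta = timedelta(hours=1)):
        self.max_duration = max_duration
        self.grants: list[BreakGlassGrant] = []

    def request(self, user: str, reason: str, now: datetime) -> BreakGlassGrant:
        # Require a reason so that each use can be audited afterwards.
        if not reason.strip():
            raise ValueError("a break-glass request needs a reason for the audit trail")
        grant = BreakGlassGrant(user, reason, now, now + self.max_duration)
        self.grants.append(grant)
        return grant

    def has_privileged_access(self, user: str, now: datetime) -> bool:
        return any(g.user == user and g.is_active(now) for g in self.grants)


# Illustrative usage with made-up values.
log = BreakGlassLog()
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
log.request("agent-1", "INC-1: restart failed database node", start)
during = log.has_privileged_access("agent-1", start + timedelta(minutes=30))
after = log.has_privileged_access("agent-1", start + timedelta(hours=2))
```

The design choice is that nobody holds elevated credentials by default; access exists only for a bounded window and always leaves a record.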
Another common source of problems is handovers. It is always good to minimize handovers as much as possible and ensure that they are complete when you have them.
This often crops up in the context of outsourcing. So, for example, if the sys admin team is in-house but the networking team has been outsourced, “The sys admin team will say, ‘We raise the tickets and nothing happens,’ and the network team will say, ‘Every ticket we receive is incomplete and we are therefore unable to action it,’” Swan said.
“Here somebody needs to step in and clarify the interface between the two organizations so that high-quality information can cross that boundary.”
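One simple way to clarify such an interface is to agree on the fields a ticket must contain before it crosses the boundary, and to check for them when the ticket is raised. The Python sketch below assumes a hypothetical list of required fields; the real list would be whatever the two teams agree on.

```python
# Hypothetical fields agreed between the two teams.
REQUIRED_FIELDS = ("affected_system", "symptoms", "steps_already_tried", "contact")


def missing_fields(ticket: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank in a ticket."""
    return [field for field in REQUIRED_FIELDS if not ticket.get(field, "").strip()]


# Illustrative ticket with one blank and one absent field.
ticket = {
    "affected_system": "vpn-gateway",
    "symptoms": " ",
    "contact": "sysadmin-oncall",
}
gaps = missing_fields(ticket)
```

Running the check at submission time means the sender learns what is missing immediately, rather than after the ticket has sat unactioned in the other team's queue.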
3. Walk, Talk and Skip the ‘Praise Sandwich’
Sometimes the data doesn’t seem to point to anything, which means you have to use another approach. A struggling team is rarely the result of a single point of failure (though I have seen a really bad manager destroy a really good team). So the next thing to look at is team dynamics.
If you haven’t done so already, and the team lead hasn’t approached you, talk to them. If the two of you are co-located, maybe grab a coffee together.
I’ve regularly held these types of meetings while out walking (including remotely, with both of us walking and talking on the phone) and have found that the change of environment can be helpful. Evidence suggests that walking unlocks creativity, but I’ve also found it helpful in situations where a discussion could be stressful.
Be straightforward and direct at this point, and please avoid the “praise sandwich,” sometimes referred to in rather ruder terms. (“Your hair looks nice. Your team is underperforming and I think it might be your fault. Nice shoes.”) It is not going to help anyone.
Make it clear that you’ve heard suggestions that the team might be underperforming and that if it is, you are here to help. What do the managers themselves think? And what can you do to support them?
4. Use 1:1 Meetings Wisely
If you hold regular skip-level one-on-one meetings (and if you don’t, I strongly recommend that you start the practice), this could be a good moment to have one with a member of the team. Try and probe a bit more; is there something amiss?
A common anti-pattern I’ve seen with managers is that they don’t turn up to one-on-one meetings — perhaps sending a direct message at the last minute to cancel. Or the manager doesn’t respond to messages on Slack.
Another common anti-pattern is someone who uses one-on-one meetings solely for boring status updates and doesn’t spend time talking to their reports about things that actually matter to them, such as career plans, ambitions or anything more interesting. All of these can be demotivating.
Another tactic is to attend a team meeting, but keep in mind that, by dint of you being there, the team dynamics will change. Good things to think about here include: What is the energy like in the room? Do people seem engaged or apathetic? Who is speaking the majority of the time? Is the staff bored? Are you bored?
If a team isn’t engaged, that is usually symptomatic of a deeper problem. From looking at the calendars you might already have one idea, which is that there are just too many meetings.
Another common problem is that the team doesn’t feel able to influence their work or set their own direction — the opposite of what Daniel Pink means in his management book “Drive” when he writes about “autonomy, mastery and purpose” as a framework for fostering motivation in people.
A classic symptom of this is when the team lead is doing all the talking.
Fournier also suggested in her book that you ask the team what their goals are. “Can they tell you? Do they understand why those are the goals? If they don’t understand the goals of their work, their leaders (manager, tech lead, product manager) aren’t doing a good job engaging the team in the purpose of the work.”
5. Find Out Whether Team Members Feel Safe
A final thing to consider is team safety. “One of the main questions for me when looking at a struggling team is, ‘Are these people working in a safe environment?’” Swan said. “There is a psychological safety aspect to this—if they make a mistake, is that a learning opportunity or are they to be blamed?”
There is also a more practical aspect, said Swan: whether the tooling allows people to make mistakes that have damaging consequences but could have been prevented. As an example, he talked about using trunk-based development without branch protection.
“Branch protection is a fundamental piece of safety but it’s not the default,” he said. “Without it, the clock is ticking until someone force pushes to the trunk when they shouldn’t.”
And a team that doesn’t understand this might also be too inexperienced to know how to resolve it. This is where an experienced manager can make all the difference.