Monitoring Developer Metrics: Team Approach Is Best
Mason McLead, chief technology officer of Software.com, knows what it’s like to have a development team revolt against monitoring software.
McLead deployed a monitoring solution at a previous company when he had an employee who, on some level, he already knew wasn’t working out.
“It told me what I already knew, like it was not necessary. I was just feeling personally insecure about it,” McLead acknowledged. “When I finally had a good chat with this person, they were relieved that we finally had this chat, and all of that data that I gathered was useless.”
What it did do was irritate the well-performing members of his team, he added.
“I told my team that I had this tool, and I was trying it out; and a month later, they were like, ‘No, you need to get rid of it. We’re all working super hard. It’s ridiculous that you would be looking at this data and tracking us in this way,’” he said. “So we ejected it in a month.”
That’s why he’s very clear that Software.com’s DevOps metrics tool is not used to report on individual developers, but rather to aggregate data about a team’s work.
“It’s definitely a big point that we have that we are not spyware in any way,” he said. “I still code. I’m a developer, our product people are developers. We take that to heart.”
Instead, he recommends that CTOs and other DevOps leaders provide those productivity metrics to the developers themselves, both as teams and as individuals.
For individual developers, Software.com provides a “flow mode,” which uses individual metrics to determine when developers are in the “flow state” — that holy grail of work where the developer is fully engaged and deeply involved in coding — not to be confused with development flow. The flow mode tools install in the developer’s code editor, such as VS Code or Sublime Text, to track data specifically for that developer — and only the individual developer has access to it. The tool uses machine learning to detect when the developer is in that precious flow state, and can then automatically shut off disturbances to help the developer stay in that state for longer.
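The article says the real detection uses machine learning; a much simpler rule-based stand-in illustrates the underlying idea of spotting a dense, sustained run of editor activity. The function name and thresholds below are hypothetical, a sketch rather than Software.com’s implementation:

```python
def in_flow(event_times, now, window=600, max_gap=120, min_events=20):
    """Heuristic stand-in for flow detection: return True when the last
    `window` seconds contain a dense run of editor events (keystrokes,
    saves) with no silent gap longer than `max_gap` seconds."""
    recent = sorted(t for t in event_times if now - t <= window)
    if len(recent) < min_events:
        return False  # too little activity to call it flow
    # any long pause inside the window breaks the flow run
    gaps = [b - a for a, b in zip(recent, recent[1:])]
    return all(g <= max_gap for g in gaps) and now - recent[-1] <= max_gap
```

A real detector would learn these thresholds per developer instead of hard-coding them, but the decision it feeds — mute notifications or not — is the same.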
“We have an automation platform that I use to turn off Slack notifications, it sets my status to away and puts a little purple dot to let people know that I’m in flow and books time in my calendar, so no one else can,” McLead said. “It puts on my favorite Spotify playlist.”
Metrics for the Greater Good
Those individual metrics aren’t sent to management, however.
Instead, Software.com draws event-level metrics from the company’s CI/CD pipeline and from pull request reviews and deployments on GitHub. The data is anonymized at the individual level and focuses on the team’s efforts as a whole. The event-level data is validated against a JSON schema, he added.
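That schema check can be sketched in a few lines of standard-library Python. The field names and types here are hypothetical, not Software.com’s actual schema:

```python
import json

# Hypothetical event schema: field name -> required Python type
EVENT_SCHEMA = {
    "event_type": str,   # e.g. "pr_review", "deployment"
    "repo": str,
    "team_id": str,      # aggregated per team, never per person
    "timestamp": float,
}

def validate_event(raw: str) -> dict:
    """Parse a raw JSON event and check it against EVENT_SCHEMA.
    Raises ValueError on missing fields or wrong types, so malformed
    events are rejected before they enter the pipeline."""
    event = json.loads(raw)
    for field, ftype in EVENT_SCHEMA.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], ftype):
            raise ValueError(f"bad type for {field}: {type(event[field]).__name__}")
    return event
```

In practice a dedicated JSON Schema validator would replace this hand-rolled check, but the gatekeeping role is the same: bad events never reach the downstream metrics.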
“We actually have authentication built into our data pipeline, so you can’t send in a bunch of fake events and expect that to mess up metrics downstream,” he said.
The data flows in using a tool called Snowplow, then goes into an Amazon Kinesis data stream. Events are pulled off the stream as they arrive and pushed into files in Amazon S3. That data is then loaded into a Snowflake data warehouse.
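The Kinesis-to-S3 step amounts to buffering events off the stream and flushing them as batch files. A rough sketch of that pattern, with an in-memory dict standing in for real S3 calls (the actual boto3 plumbing is omitted and the key layout is an assumption):

```python
import json

class EventBatcher:
    """Buffer events pulled off a stream and flush them into S3-style
    objects once a batch fills up. `bucket` is a plain dict standing in
    for an S3 bucket; a real consumer would call put_object instead."""
    def __init__(self, bucket: dict, batch_size: int = 500):
        self.bucket = bucket
        self.batch_size = batch_size
        self.buffer = []
        self.file_count = 0

    def add(self, event: dict):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        key = f"events/batch-{self.file_count:06d}.jsonl"
        # one JSON object per line -- a common load format for warehouses
        self.bucket[key] = "\n".join(json.dumps(e) for e in self.buffer)
        self.file_count += 1
        self.buffer = []
```

Keeping batches small is what makes the roughly 20-second event-to-analysis latency the article cites plausible: files land in S3 quickly instead of waiting for large accumulations.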
“We’re able to go from an event occurring to us being able to analyze it in about 20 seconds on average,” he said. “So that’s how fast we’ve gotten our response time to events in our pipeline.”
Bigger batch jobs run on dbt — the data transformation tool — to tell Snowflake what to do, he explained, adding that all of the SQL code is in GitHub and Software.com uses pull requests to manage it. That data is pushed out into a series of report tables, which are cached and then served to the frontend via an API for charting, custom dashboards and export.
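Serving cached report tables behind an API can be sketched as a simple time-to-live cache. The names, TTL value and injectable clock below are illustrative assumptions, not details from the article:

```python
import time

class ReportCache:
    """Serve precomputed report tables to the frontend, recomputing a
    table only when the cached copy is older than `ttl` seconds."""
    def __init__(self, compute, ttl: float = 300.0, clock=time.monotonic):
        self.compute = compute   # function: report_name -> rows
        self.ttl = ttl
        self.clock = clock       # injectable for testing
        self._store = {}         # report_name -> (timestamp, rows)

    def get(self, report_name: str):
        entry = self._store.get(report_name)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]               # cache hit: serve stored rows
        rows = self.compute(report_name)  # cache miss: rebuild the table
        self._store[report_name] = (now, rows)
        return rows
```

The design choice mirrors the architecture described: the expensive work happens once in the warehouse batch job, and the API layer only decides when a cached result is stale enough to refetch.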
“What we’re going after is the parts where we’re able to monitor what’s flowing through your actual delivery pipeline, and show you where that’s working, and where that’s not. And also highlight what’s important right now, and what’s not,” he said.
Software.com has used its own solution to speed up its data team. It had an engineer go through the data and find places to automate and remove manual processes. The vendor was able to halve its lead time for getting new data work to production, from six days to three on average.
“I didn’t, as the CTO, say, ‘You need to go and do that,’” McLead said. “They were able to see the data, they knew the pain internally, so they went and fixed it.”