How to Boost Developer Productivity with Generative AI

This year continues to be about pursuing developer productivity gains, mostly through platform engineering and AI. The two are tied together, as the secret to unlocking developer productivity lies in reducing cognitive load and context switching so your engineering teams can reach that coveted developer flow state.
But are generative AI and the large language models (LLMs) backing them actually boosting developer productivity already? And how?
Ben Wilcock, who works in technical marketing at VMware, took the BackstageCon stage in Chicago this week to make the case for a strategic approach to AI, including the open source chatbot he’s built right into a local version of the Backstage developer portal.
AI Productivity Gains Are Already Measurable
Wilcock kicked off his talk by sharing the measurable developer productivity gains that teams are already achieving and reporting.
Back in September, Harvard Business School and Boston Consulting Group together published a large-scale study examining the effect of AI on the productivity of 758 knowledge workers from BCG.
“Every single knowledge worker that took part saw a benefit from using AI in that particular test,” Wilcock said. Indeed, high performers among staff saw a 17% improvement, while lower performers saw a 43% improvement. These younger, less experienced folks saw the most gains.
“They also found that when they used AI for tasks that it’s really good at — things like writing and analyzing data and creativity and that sort of thing — the workers who used the AI saw significant benefits in their productivity and the quality and the speed,” he explained.
“These sorts of productivity improvements don’t come very often, perhaps once in a generation.”
Overall, those in the trial completed 12% more tasks and finished them 25% faster, which, he pointed out, means getting the work done a quarter sooner.
Now, software developers are knowledge workers, but the participants in this study skewed toward the consulting side.
During the first half of this year, McKinsey ran an AI developer productivity study, which was smaller but controlled. It split 40 developers of varying experience in half:
- Group A had access to a general chatbot AI (like ChatGPT or Bard) and a coding AI (like GitHub Copilot).
- Group B had no access to AI tools and worked as usual.
The experiment focused on code generation, refactoring and documentation. Developers with access to the AI tools saw a 20 to 50% speed increase and were 25 to 30% more likely to finish a task on time.
The McKinsey paper cited “tremendous gains” in four areas:
- Expediting manual and repetitive work.
- Jump-starting the first draft of new code.
- Accelerating updates to existing code, including app modernization.
- Increasing developers’ ability to tackle new challenges.
It’s clear to Wilcock that AI makes us more productive, especially with knowledge-heavy tasks like coding. He therefore thinks developers should consider AI, likely following the McKinsey mix: an integrated development environment (IDE)-based AI to support coding tasks alongside a more general AI chatbot.
“It’s also really useful to have something that helps you with all the times when you’re not inside your IDE,” he said. “Sometimes having access to chatbots, like a ChatGPT or something like that, is really really useful with helping with all those ancillary tasks that perhaps you don’t use your IDE for.”
AI for Full SDLC Productivity
AI can benefit the whole software development lifecycle, Wilcock contends, improving DORA metrics, testing and documentation, and the overall developer experience.
“Using AI in this way, by helping and supporting developers in their knowledge tasks, you can really reduce stress levels,” he said. “You can limit burnout. You can increase knowledge, share it more widely amongst team members. Experienced and inexperienced alike, they can all benefit.”
Indeed, the McKinsey report found that developers were twice as happy and were able to enter those flow states faster.
The aforementioned study out of Harvard and BCG also found that AI can help diagnose and fix issues faster, as well as reduce technical debt and improve code quality. In fact, it found AI-assisted work was 40% higher in quality.
AI is already widely used in test generation, which, Wilcock pointed out, leads to higher code test coverage, which in turn should translate to fewer defects and regressions. We’ve already seen with products like Joggr that documentation generation is an important use case: year after year, developers demand more docs and the context they give, yet remain resistant to making docs part of the Definition of Done. We know that better documentation, especially when integrated with the code, reduces overall cognitive load and burnout, increasing team efficiency by increasing developer flow.
AI also has broad use cases in data analytics and decision-making, Wilcock reminded the BackstageCon audience, especially around resource allocation. We’ve already seen this in FinOps, where Cast AI can potentially cut your cloud cost in half with the help of AI.
“Or perhaps it’s embedding AI features in products to help customers use them more easily,” he continued, “or to get more from them than you currently do.”
The Risks of Generative AI
But all these productivity benefits come with caveats. For any technology developing as fast as AI (and frankly, none so far has developed faster), there are red flags and risks, especially when you are using online or public AI. Each organization needs to quickly develop and communicate a generative AI policy.
“The use of online large language model (LLM) offerings — such as ChatGPT and Google Bard — requires a number of trade-offs that many enterprises will find unacceptable,” reads one Gartner report on understanding ChatGPT and other LLMs. Researchers out of the University of North Carolina at Chapel Hill explored the question “Can sensitive information be deleted from LLMs?” They found that, even with state-of-the-art editing methods, the information could not be fully deleted.
Companies must remember that OpenAI is, at heart, a research company, and free things often come at a price. That cost right now could be your reputation when a data privacy breach occurs, like when a Samsung employee accidentally gave ChatGPT their chip manufacturing trade secrets.
“ChatGPT does not forget the prompts that you give it, and it can reuse those prompts in its memory and start to share those more widely,” Wilcock said. In this case, Samsung placed a full ban on internal use of ChatGPT, but a full ban also blocks the possibility of those dramatic productivity gains.
Another big concern is the impact AI has on society and the planet. AI demands more chips, which are already in short supply, and more data centers, along with the energy mix that powers them. We already know that data centers are more likely to be sited near poorer communities, which disproportionately bear the consequences: blackouts when cooling draws too much power, clean water diverted for cooling, and the pollution that comes with living nearby. And AI continues to grow at a scale previously unseen, so the impact will only grow with it. Wilcock gave the example of AI tools fueling a 34% spike in Microsoft’s water consumption, potentially putting residential water supplies at risk.
We also know that, while AI comes off as very persuasive, it can be inaccurate. In fact, an academic study (published on Cornell University’s arXiv) found that 52% of ChatGPT’s answers to software engineering questions were wrong, and that these hallucinations are usually persuasive. Yet, when compared with Stack Overflow (which the LLM has surely trained on), 39% of users preferred ChatGPT because of its “comprehensiveness and articulate language style.”
Currently, generative AI is trained to give an answer, whether or not it actually knows one. Not every team knows to ask a chatbot whether it is certain it’s right, or to back up its answers with citations.
According to the McKinsey State of AI report, inaccuracy is the top concern of IT leaders, above cybersecurity and regulatory compliance. However, just 32% of respondents said they are working to mitigate inaccuracy. Overall, the report found that most respondents said their organizations are not addressing AI-related risks yet.
Finally, the issue of copyright is very murky with AI. LLMs are likely trained at least partly on copyrighted material (“you don’t know exactly where it came from”) and then, at least in the U.S., you cannot copyright something generated by AI.
Building an Open Source Generative AI Chatbot for Backstage
“I wanted to show you what I’ve been doing to experiment with AI safety in Backstage,” Wilcock said, which is “adding a private, secure, local AI to Backstage.”
The private, secure and local aspects are important because they overcome many of generative AI’s privacy and security challenges while still being based on open source. The approach also enables retraining on your company data, which can give the model domain- and company-specific context.
Backstage is a service catalog and platform for building internal developer portals (IDPs), open sourced by Spotify, that aims to create an end-to-end, streamlined development environment, including infrastructure tooling, services and documentation. Wilcock’s proof of concept is called the Backchat plugin. It is meant to give the user a feel for what generative AI features inside Backstage might be like.
“So developers didn’t have to leave Backstage in order to be more productive,” he explained. The proof of concept combines open source tools that can be introduced into the Backstage graphical user interface (GUI):
- Open source tools and libraries providing Backchat’s backend: LocalAI, Chatbot UI, or Text Generation WebUI.
- An open source large language model, like one from Mistral AI or VMware’s Open Instruct.
These smaller open source LLMs, Wilcock pointed out, can be run with or without a GPU, on a laptop, desktop, workstation or virtual machine, and have modest RAM requirements. They shut down when not in use, and he says they are getting faster all the time. They also require no account, credit card or API keys, which further reduces risk and makes it easier to get up and running.
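To make that concrete, here is a minimal sketch, not Backchat’s actual code (which the talk didn’t show), of how a Backstage plugin might call a LocalAI server. LocalAI exposes an OpenAI-compatible REST API on port 8080 by default, so the endpoint and request shape below follow that convention; the model name is a placeholder for whichever local model you have loaded.

```typescript
// Minimal sketch: querying a local, OpenAI-compatible LLM server
// (such as LocalAI on its default port, 8080) from a Backstage plugin.
// No account, credit card or API key is needed for a local server.

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

async function askLocalLlm(messages: ChatMessage[]): Promise<string> {
  const response = await fetch('http://localhost:8080/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'mistral-7b-instruct', // placeholder: whatever local model you loaded
      messages,
    }),
  });
  if (!response.ok) {
    throw new Error(`LLM request failed: ${response.status}`);
  }
  const data = await response.json();
  // OpenAI-compatible servers return the reply in choices[0].message.content.
  return data.choices[0].message.content;
}
```

Because the server speaks the same protocol as OpenAI’s API, swapping between a local model and a hosted one is largely a matter of changing the URL.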
Generative AI Use Cases
“It’s not just for developers, remember, it’s for anybody who’s got access to Backstage. If that’s your product managers, if that’s your database people, if that’s your infrastructure people, they can all come together, they can all use it in the same spot,” Wilcock explained. Examples he gave include:
- Databases: designing databases, creating data, extracting data, transforming data, and answering questions about graph and SQL databases.
- Operations: managing platforms, maintaining security and availability, and handling DevOps tasks, like help with Kubernetes.
- Testing: writing tests, improving test coverage, and generating test data.
- Documentation: writing documentation, reformatting documentation, and accessing and improving documentation.
- Ideation: coming up with and evaluating ideas, suggesting alternatives, simplifying concepts, and translating text.
- Research: asking your GenAI to pretend to be a particular user persona and posing it questions, learning new concepts, and evaluating alternatives.
- Teamwork: optimizing 360-degree feedback (like making feedback less blunt), and writing emails and proposals.
- Unfamiliar code: explaining code, troubleshooting frameworks, and learning new techniques.
Speaking of Backchat, “It’s not how I want it to be. I’ve got some ideas about what the roadmap should be for this tool, but I need help,” Wilcock commented. “So if you’re a developer out there and you want to help bring generative AI to Backstage, then you should definitely get in touch because I could use your support.”
Generative AI Prompting Tips for Developers
As part of your organization’s generative AI policy, provide advice for developers, who are all on their way to becoming prompt engineers and AI engineers.
Wilcock kicked off this section by offering a simple AI prompt structure:
- Role – Tell the AI the role that it’s performing.
- Situation – Tell the AI the situation in which it’s being used.
- Task – Give it a very clear directive in terms of a task.
He gave the example:
- You are a database designer with expert knowledge of SQL.
- Your company manages a fleet of vehicles.
- Create a database that holds information about vehicles and their owners.
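As a hypothetical illustration (the helper and its field names are mine, not from the talk), that Role/Situation/Task structure maps naturally to a small template function:

```typescript
// Hypothetical helper assembling Wilcock's Role/Situation/Task structure
// into a single prompt. Names are illustrative, not a Backchat API.

interface PromptSpec {
  role: string;      // the role the AI is performing
  situation: string; // the situation in which it's being used
  task: string;      // a very clear directive
}

const buildPrompt = ({ role, situation, task }: PromptSpec): string =>
  `You are ${role}. ${situation} ${task}`;

// Wilcock's example, expressed through the template:
const prompt = buildPrompt({
  role: 'a database designer with expert knowledge of SQL',
  situation: 'Your company manages a fleet of vehicles.',
  task: 'Create a database that holds information about vehicles and their owners.',
});
```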
“Next, think about chain of thought prompting,” Wilcock continued. “This is when you encourage the AI to take a breath, think step by step, work through the problem slowly,” as a way to prevent the AI from jumping to conclusions by making it show its work.
You can tackle this by adding suggestions, like:
- “Let’s think about this step by step.”
- “Check your answer.”
- “Think carefully.”
He gave the example: “Create the JavaDoc required to fully describe what the following method does. {{Fibonacci method}} Your JavaDoc should be descriptive and verbose. Include details the reader will find fascinating. Think carefully and check your answer.”
“Using chain of thought prompting can help it become more accurate,” he explained. “I’m hoping this is going to get the AI to sort of look at the code that’s in the method and sort of actually try and document it properly.”
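One hedged sketch of applying those nudges programmatically, using Wilcock’s suggested wording and his JavaDoc example (the helper itself is illustrative):

```typescript
// Sketch: appending chain-of-thought nudges to any prompt so the model
// works through the problem step by step before answering.

const withChainOfThought = (prompt: string): string =>
  `${prompt}\n\nLet's think about this step by step. Think carefully and check your answer.`;

// Applied to Wilcock's JavaDoc example; {{Fibonacci method}} stands in
// for the real method body, as it did on his slide.
const javadocPrompt = withChainOfThought(
  'Create the JavaDoc required to fully describe what the following method does. ' +
  '{{Fibonacci method}} Your JavaDoc should be descriptive and verbose. ' +
  'Include details the reader will find fascinating.'
);
```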
Wilcock offered other AI prompting styles:
- Show Me (What You Got) Prompts: show the AI what it needs to do by providing an example of the desired result, without giving overly detailed instructions.
- Tell Me (How to Succeed) Prompts: tell the AI what a good answer will look like, providing explicit instructions, clear steps and success criteria for the response.
- Few-Shot Prompts: for repeatable tasks, provide the AI with a few examples to demonstrate the output format, style, and key elements it should copy (see the sketch below).
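Here is a hypothetical few-shot sketch in the OpenAI-compatible chat format that local servers like LocalAI also accept; the commit-message task is invented purely for illustration:

```typescript
// Few-shot sketch: two worked examples teach the model the output
// pattern; the final user turn is the real task.

type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

const fewShot: Msg[] = [
  { role: 'system', content: 'Rewrite each commit message in past tense with proper punctuation.' },
  { role: 'user', content: 'fix login redirect' },
  { role: 'assistant', content: 'Fixed the login redirect.' },
  { role: 'user', content: 'add retry logic to fetch' },
  { role: 'assistant', content: 'Added retry logic to fetch.' },
  { role: 'user', content: 'update docs for v2 api' }, // the real task
];
```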
Whatever path you take, AI has great potential for developer productivity — and productivity across your organization — so it’s important to consider how you will integrate it within your people, processes and products in a way that will maximize benefits while minimizing risks.
“Think about what are you going to do when AI is everywhere, like when AI is in your pocket or in your car or in your fridge?” Wilcock concluded. “What are you going to do when it becomes an AI-centric world?”