How Generative AI Can Increase Developer Productivity Now
You can’t walk into a developer Discord without a chat about generative AI. After all, software development teams are trying to do more with less, while the tech stack is increasingly complex — making the search for developer productivity feel more urgent than ever. Add to this the 27 million AI engineer talent gap, and your organization has to figure out how to embrace GenAI from the inside.
But data breaches are also way up — you don’t want to have your engineering teams feeding sensitive information into the massive research project that is ChatGPT. Your company needed a generative AI policy yesterday.
And you should be asking your developers how they are already using GenAI. You might be surprised. You likely have productivity wins emerging, just locked behind team silos. With the majority of organizations surveyed for the 2023 State of DevOps Report already incorporating AI into at least some of their tasks, your software development team is at risk of being left behind.
That’s why The New Stack sat down with four early adopters of generative AI for engineering to learn how they are already using GenAI to increase developer productivity — and when they are avoiding it.
Developer Productivity Comes from Internal Context
Time and again we hear that you want an internal generative AI policy in place, and not just to reduce data privacy risk, though that is important. Early adopters of GenAI accelerate time to value when the large language model is trained on their internal documentation, policies and more. Context is driving early generative AI productivity wins.
A conversational interface, when trained and applied strategically to your developer experience, can help deliver the self-service promises of platform engineering. GenAI that answers natural language questions (How do I do X here? What security checks do I need to achieve Y?) can help new developers onboard faster. Or an advanced individual contributor can unlock shortcuts that automate repetitive, boring work by command, like typing “I want to create a new parameter for AWS SSM” into the tool where they’re already working.
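To make the automate-by-command idea concrete, here is a minimal sketch of how a typed request might be turned into an AWS CLI call. It is an illustration under stated assumptions, not how any of the tools in this article work: the regex, the `build_ssm_command` helper, and the sample parameter name and value are all hypothetical, and a real assistant would likely use an LLM rather than a single pattern. The only real interface it relies on is the `aws ssm put-parameter` command with its `--name`, `--value` and `--type` options.

```python
import re
import shlex
from typing import List, Optional

# Hypothetical pattern for one chat-style request; a real assistant would
# rely on an LLM or a richer intent parser rather than a single regex.
REQUEST_PATTERN = re.compile(
    r"create a new parameter for AWS SSM named (?P<name>\S+) with value (?P<value>\S+)",
    re.IGNORECASE,
)


def build_ssm_command(request: str) -> Optional[List[str]]:
    """Translate a recognized request into AWS CLI arguments, or return None."""
    match = REQUEST_PATTERN.search(request)
    if match is None:
        return None
    return [
        "aws", "ssm", "put-parameter",
        "--name", match["name"],
        "--value", match["value"],
        "--type", "String",
    ]


# Illustrative request; the parameter name and value are made up.
request = (
    "I want to create a new parameter for AWS SSM "
    "named /demo/feature-flag with value enabled"
)
command = build_ssm_command(request)

# Print rather than execute, so a human can review the command first.
print(shlex.join(command))
```

The sketch prints the command instead of running it, which keeps a human in the loop before anything touches a cloud account.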
As a natural language processing or NLP platform for data labelers and annotators, Datasaur is a logical early adopter of generative AI. CEO and Founder Ivan Lee told The New Stack that they kicked off usage by having a small team of engineers run a GenAI test pilot for two months.
They soon realized that “there are specific things that you want to use it for, and things that you should stay away from,” he said. By the end of the trial, GenAI had already increased developer productivity by about 10%. They then quickly held an internal generative AI training for their whole 55-person team.
They now use OpenAI’s ChatGPT and GitHub’s Copilot, both of which have their strengths and weaknesses, Lee observed. “We actually prefer Copilot. ChatGPT can perform better, but that performance increase actually doesn’t overcome the hassle of having to switch to another window or tab, so having Copilot directly where we’re already doing coding is the preferred practice for us.”
Karol Danutama, Datasaur’s vice president of engineering, uses Copilot to run internal code reviews and proofs of concept. He told The New Stack that he has also cut manual work by a whopping 70% to 80% by leveraging ChatGPT for internal communication tasks such as explaining concepts and generating software diagrams and documentation.
As with all things that aim to increase developer productivity, you want to avoid context switching that takes you out of your flow state. This is where Copilot is the early frontrunner among LLM-based tools serving the developer community, as its workflow and code auto-generator is built right into your code repository, allowing it to better understand your technical context.
The team at Joggr uses an internal version of Copilot to generate code that would otherwise slow them down if they wrote it manually.
“Copilot at its core is basically a really fancy autocomplete for coding… It’s learning from the code base. It knows all the frameworks that exist to a certain degree and that we’re using. So it just simplifies some of the tasks that will be more mundane or manual,” Joggr CEO Zac Rosenbauer said. This includes schema validations in their server library. “They’re these huge JSON blobs that are really boring to write… those things are very painful to write. And it’s auto-completed so easily by Copilot.”
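For readers who have not written one, the kind of schema validation Rosenbauer describes might look something like the sketch below. It is not Joggr’s code: the `CREATE_DOC_SCHEMA` fields and the `validate` helper are hypothetical, and the check is a deliberately small stand-in for a full JSON Schema library. It does show why these definitions are tedious to type by hand and easy for an autocomplete tool to finish.

```python
import json

# Hypothetical schema for a "create document" request; field names are
# illustrative and not taken from Joggr's code base.
CREATE_DOC_SCHEMA = {
    "title": str,
    "body": str,
    "tags": list,
    "published": bool,
}


def validate(payload: dict, schema: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    # Check every field the schema expects.
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Flag anything the schema does not know about.
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# Sample payload with a wrong type, a missing field and an extra field.
raw = '{"title": "Onboarding", "body": "Welcome", "tags": "docs", "extra": 1}'
for problem in validate(json.loads(raw), CREATE_DOC_SCHEMA):
    print(problem)
```

Real schemas run to dozens of nested fields, which is where the repetition, and the appeal of autocomplete, comes in.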
They would’ve built Joggr without generative AI, but, he said, it enabled them to build it faster.
“There’s a market perception that we’ve heard from a lot of people that you can just slap ChatGPT on an app and it’s amazing. It’s actually a giant pain in the ass. You have to verify all of the content. You have to make sure that users aren’t doing terrible prompts — people are bad at prompting these things,” Rosenbauer said.
The team is not wasting millions trying to build their own LLM either. Joggr itself is built on top of fine-tuned models from OpenAI, Vertex AI and Anthropic, because, as Rosenbauer says, unless you are an LLM company, you’re wasting your time creating an LLM.
Navigating Your Productivity Relationship with GenAI
“We’ve been advocating automation for a long time in DevOps,” Patrick Debois, vice president of engineering at Showpad and “godfather of DevOps,” told The New Stack. So it’s not surprising that his recent obsession has been using AI and machine learning to remove developer friction. For a while, he continued, the industry has adopted “early MLOps” for anomaly detection and predictions.
But the real breakthrough was Copilot, Debois said, when developers started to realize, “I can improve and get faster at coding some of the issues. That evolved into summarizing PRs, and it was also detecting vulnerabilities.”
Now, he argued, it has evolved into helping junior engineers become productive faster. When they don’t know a certain subject, they can ask for broad examples. However, “when you are already versed in a certain subject, I see them getting less value,” he explained, like when he was coding with Selenium and the model suggested a call that had been right two versions earlier. “The model is still going to suggest [to] me that bad call from a while ago because that was what it was trained on, but I’m more proficient.”
When you know a language or technology well, you tend to ask harder and more niche questions that these code completion tools aren’t prepared for yet.
You must always remind generative AI to only provide an answer if it knows it’s right because, Debois reminded us, it is trained first and foremost to respond, not necessarily to admit its own fallibility. Indeed, code written by ChatGPT appears accurate but, it turns out, is wrong 52% of the time. Of course, he pointed out that ChatGPT is a model not trained specifically to write better code, unless you are using the OpenAI code interpreter plug-ins.
“Like any tool, you need to understand the limitations and the kind of things that you use it for,” but, he continued, “if you’re not that well-versed, it’s also very hard to see whether this is good code or bad code.”
Linters are still a necessary support. And, he said, we always need to make it safer, especially for newcomers, to make mistakes and to fail gracefully, with progressive delivery and rollbacks in place to lessen the impact.
Code generated by AI does not seem ready for production yet, but there are other ways GenAI can boost developer productivity. One solid use case is in creating test cases and synthetic data to test with. Another, he said, is to make the AI behave like a linter, where newbies can learn at their own pace, with the LLM flagging human-made errors as you go.
“It’s almost becoming in-context learning that the machine is helping you while you’re on the learning journey,” Debois said. Just keep all personally identifiable data out of that trip.
Next is the rise of prompt engineering, he predicts, where the questions become the code: “If you ask the right questions, the model actually doesn’t have to understand the intermediate code, it will just give the answer,” like feeding the LLM the OpenAPI specification of a service and having it spin back the call.
“I’m not saying this is perfect all the time, but the models are getting better. And we can think about it like, if it’s good enough that it doesn’t hurt anything, then it’s gonna save us time.” He emphasized that there are still tradeoffs. “I would not bet my life on this but in certain areas that can be corrected where the cost of impact is low, we can gain a lot more value from using those kinds of automation there.”
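As a rough illustration of the specification-as-prompt idea Debois describes, the sketch below assembles a prompt from a service description and a plain-English question. Everything in it is assumed for the example: the trimmed `SPEC` describes an imaginary Orders API, `build_prompt` is a hypothetical helper, and no model is actually called. The instruction not to guess echoes his advice about reminding the model to answer only when it can.

```python
import json

# Hypothetical, heavily trimmed OpenAPI document for an imaginary service.
SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Orders API", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "summary": "Fetch one order",
                "parameters": [
                    {
                        "name": "orderId",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
            }
        }
    },
}


def build_prompt(spec: dict, question: str) -> str:
    """Combine the spec and a natural-language question into one prompt."""
    return (
        "You are given an OpenAPI specification as JSON.\n"
        "Reply with a single HTTP request that answers the question.\n"
        "If the specification does not cover the question, "
        "say so instead of guessing.\n\n"
        f"Specification:\n{json.dumps(spec, indent=2)}\n\n"
        f"Question: {question}\n"
    )


# The resulting text would be sent to whichever LLM the team has approved.
prompt = build_prompt(SPEC, "How do I fetch order 42?")
print(prompt)
```

Whatever call comes back still needs the same review as any other generated code, ideally in one of the low-impact areas Debois mentions.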