
How to Use LLMs for Dynamic Documentation

Some explanations should be written by code authors. Others may best be generated on the fly by LLM-assisted code readers.
Oct 12th, 2023 5:00am by

The other day I revisited a SQL query I’d written a while ago, in order to adapt it to the latest version of the GitHub plugin. The minor adjustments took longer than they should have. Why? I’d forgotten how the query works! And of course, I hadn’t written any comments; it seemed simple at the time, right?

Here’s the original query. It reports recent commits for a set of Steampipe plugins.
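A sketch of its shape follows. The table names and query qualifiers come from Steampipe's GitHub plugin; the aliases and the selected commit columns (repository_full_name, sha) are illustrative reconstructions, not necessarily the exact columns of the original.

```sql
with repos as (
  select
    g.name_with_owner,
    -- build a commit-search string for each repository found
    'author-date:>' || $2 || ' repo:' || g.name_with_owner as query
  from
    github_search_repository g
  where
    -- $1 is a repository name fragment, e.g. turbot/steampipe
    g.query = $1 || ' in:name'
)
select
  r.repository_full_name,
  r.sha
from
  repos c
join
  github_search_commit r
on
  -- the commit table's required qualifier arrives via the join
  r.query = c.query
```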

Here’s how it works. If you fill in the $1 and $2 parameters with the values turbot/steampipe and 2023-10-04, the with repos CTE (common table expression) runs this query.
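With those substitutions, the CTE amounts to something like this sketch (the selected columns are illustrative):

```sql
select
  name_with_owner,
  -- one constructed commit-search string per matching repository
  'author-date:>2023-10-04 repo:' || name_with_owner as query
from
  github_search_repository
where
  -- GitHub repository-search syntax, passed through the query qualifier
  query = 'turbot/steampipe in:name'
```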

github_search_repository is a table, provided by the GitHub plugin, that encapsulates GitHub’s syntax for searching repositories. When you write a SQL query that uses the table, you’re required to provide a WHERE clause with a qualifier called query and a value that is a piece of query syntax for searching GitHub repositories.

At the moment, this query returns 208 rows, each pairing a repository's name with a constructed commit-search string.

The query column contains another piece of GitHub search syntax. In this case, it’s the required qualifier for a different Steampipe table, github_search_commit, which encapsulates the GitHub syntax for searching commits. Here’s a standalone version of a query against this table.
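A sketch of such a standalone query, with illustrative output columns:

```sql
select
  repository_full_name,
  sha
from
  github_search_commit
where
  -- GitHub commit-search syntax: commits after a date, in one repo
  query = 'author-date:>2023-10-04 repo:turbot/steampipe'
```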

So we’ve got two different query qualifiers in play.

used in | table | example qualifier
the with repos CTE | github_search_repository | where query = 'turbot/steampipe in:name'
the body of the main query | github_search_commit | where query = 'author-date:>2023-10-04 repo:turbot/steampipe'

Now let’s focus on the body of the main query.

There are no WHERE clauses in play here, but the same qualifiers are still required; they can also be provided by way of a JOIN.
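In sketch form, the body looks like this: the qualifier for github_search_commit arrives through the join condition rather than a WHERE clause (columns illustrative):

```sql
select
  r.repository_full_name,
  r.sha
from
  repos c
join
  github_search_commit r
on
  -- each constructed string becomes the commit table's query qualifier
  r.query = c.query
```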

So there are two levels at which GitHub search syntax is dynamically built.

  1. to find GitHub repositories matching steampipe-plugin
  2. for each of the 200+ repositories thus found, to find commits after a given date

Now that I’ve written out the explanation, it hardly seems simple at all. If you could only write a single comment to help clarify things, what would it be?

The Most Important Single Comment

When I first showed the query to ChatGPT it wasn’t to ask for documentation, but rather to resolve a name conflict I’d introduced when adapting to some changed column names in the new version of the GitHub plugin. Nevertheless, the LLM provided two critical improvements — unrelated to those column changes — that I hadn’t even asked for.

- 'author-date:>' || $2 || ' repo:' || g.name_with_owner as query
+ 'author-date:>' || $2 || ' repo:' || g.name_with_owner as commit_query
- and r.query = c.query
+ and r.query = c.commit_query -- use constructed commit_query

That was exactly right. The query column produced by the CTE shouldn’t have shared the same name as the query qualifier used in the body of the query. This wasn’t a syntactic problem but it was a conceptual disaster. Renaming the CTE’s query column to commit_query signals that it’s distinct from the query qualifier: a different kind of thing that powers the github_search_commit query.
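Applied in context, the rename reads like this (again a sketch, with illustrative columns and aliases):

```sql
with repos as (
  select
    g.name_with_owner,
    -- renamed: clearly a constructed string, not the query qualifier
    'author-date:>' || $2 || ' repo:' || g.name_with_owner as commit_query
  from
    github_search_repository g
  where
    g.query = $1 || ' in:name'
)
select
  r.repository_full_name,
  r.sha
from
  repos c
join
  github_search_commit r
on
  -- r.query is the qualifier; c.commit_query is the constructed string
  r.query = c.commit_query
```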

This is the kind of tacit knowledge transfer discussed in Learning While Coding: How LLMs Teach You Implicitly. It was a bonus outcome that I didn’t even ask for.

I think this is a powerful prompt:

If you could add just one comment to clarify this uncommented code, what would it be? And why?

I put the question to both Copilot Chat and Sourcegraph Cody.


Copilot Chat: “If I were to add just one comment to the query to clarify how it works, I would add a comment above the WITH clause to explain what it does.”

Bzzzt. Wrong answer!


Cody: “If I were to add one explanatory comment to this query, it would be here:

The reason I would add a comment here is because this line is constructing a custom search query for commits for each repository.

This is a key part of the query — it allows searching commits within the date range specifically for each repo.

Without this dynamic construction, we wouldn’t be able to filter commits effectively per repo.

So adding a quick comment explaining that this line is creating a custom commit search per repo helps clarify why it is needed and how it works with the rest of the query to connect repos to matching commits.”

Exactly right. Well done Cody!

Dynamic Documentation

I understand why programmers often resist writing comments. Laziness? Sure, but sometimes laziness is a virtue. The computer doesn’t care about comments; it only cares about code. So if you can write code in a way that works for the computer while also helping the programmer understand the code, that’s a great start. The code itself is already a maintenance burden; comments that can easily drift from the source of truth become an additional burden.

Still, I’ve always been sympathetic to the idea of literate programming. A year ago, in Literate Programming in Go, I described a project to improve the documentation of the Steampipe plugin SDK. It was interesting to learn that Go source code can embed wiki-like documentation, and we made some headway toward enhancing the otherwise uninspiring docs generated from Go source. But we haven’t made further progress, and I’m keen to take another run at it in the LLM era.

For starters, I’d revisit the overview docs that we wrote to explain the high-level components and concepts. For that, I think Cody will serve us well once it has indexed the repo. For these overview docs, it will be worthwhile to iterate with LLMs to create documentation that will become a permanent part of the code and will be maintained accordingly.

But for function- and line-level comments, I’m now wondering whether a dynamic approach may sometimes (or often!) turn out to be best. I have never found mechanically-generated function-level comments to be particularly useful. But we have new machines to partner with now. Will the comments they can write dynamically be useful enough to avoid baking in function- and line-level documentation that’s liable to drift from the source of truth?

This isn’t an either/or question. We’re entering a phase in which both methods will coexist. But my hunch is that we’re about to see a fascinating new twist on the old idea of literate programming. Some explanations can, will, and should be written by code authors alone, or by those authors in partnership with LLMs. Others can, will, and should be conjured dynamically by code readers who ask LLMs for explanations on the fly.

TNS owner Insight Partners is an investor in: Sourcegraph.