
7 Guiding Principles for Working with LLMs

Generative AI has revolutionized programming. Based on his own experience, Jon Udell codifies how to partner effectively with LLM assistants.
Jan 10th, 2024 9:03am by Jon Udell
Photo by DESIGNECOLOGIST on Unsplash.

In “Seven ways to think like the web,” I codified a set of patterns that emerged from my experience as a web-first writer and software developer. The idea was (and is) that the web’s architecture, described by Roy Fielding as an internet-scale distributed hypermedia system, exhibits a grain that you want to align with. You don’t need a low-level understanding of TCP/IP or HTTP to work with the grain of the web, but you do need to develop intuitions about higher-order constructs: hyperlinks, structured and unstructured data, reusable components and services, and the publish-and-subscribe communication pattern.

Likewise, you don’t need a low-level understanding of the neural networks at the core of large language models in order to align with the grain of that architecture. Although I can’t explain how LLMs work — arguably nobody can — I’m able to use them effectively, and I’ve begun to codify a set of guiding principles. Here’s my list:

  1. Think out loud
  2. Never trust, always verify
  3. Recruit a team of assistants
  4. Ask for choral explanations
  5. Exploit pattern recognition
  6. Automate transformations
  7. Learn by doing

1. Think Out Loud

In “When the rubber duck talks back,” I described one of the best ways to use LLMs: just talk to them! The term “rubber duck” comes from software engineering:

Rubber duck debugging (or rubberducking) is a method of debugging code by articulating a problem in spoken or written natural language. The name is a reference to a story in the book The Pragmatic Programmer in which a programmer would carry around a rubber duck and debug their code by forcing themselves to explain it, line by line, to the duck. — Wikipedia

When programming, I talk to LLMs about code idioms, libraries, and configuration strategies. Today, for example, I was helping someone who was struggling to run Steampipe in a container. This turned into an opportunity for me to learn more about bind mounts and volume mounts in Docker, a topic I’d been unclear on. After a bit of research, I gathered that it makes sense to use bind mounts for config files that I might want to change while the container is running, and volume mounts for the database files (because that’s more efficient for files you won’t be editing).

In the pre-LLM era that would have remained an internal train of thought because, while it’s always valuable to narrate your work, talking to yourself feels awkward. Why not just talk to colleagues? Of course you can and should, but you need to be mindful of when and how often to interrupt their flow. Now I can bounce ideas off my team of always-available LLM assistants. That would be useful even if they were simply mute rubber ducks, but they aren’t mute: they respond in ways that help me validate, refute, or clarify an idea.

This principle applies beyond technical topics. When I’m writing prose, I now often talk to LLMs. For example, as I wrote the lead for this article I struggled with the “grain” metaphor. It felt right but also perhaps a bit clichéd. Instead of just thinking that, I said it to both ChatGPT and Claude. Neither found it problematic, and neither came up with a compelling alternative, so I decided to stick with the metaphor. Even when the LLMs don’t say anything useful, they encourage you to think out loud.

2. Never Trust, Always Verify

This rule is easiest to apply in technical domains. Nowadays I use LLMs to write lots of little convenience scripts that simplify or automate chores. Recently, for example, I was making a screencast and needed a bit of JavaScript that would scroll smoothly through a long list of items on a web page I was demonstrating. In the Before Time that would have entailed a tradeoff: was the benefit worth the effort to write the code? Now I just ask for the code. It often works straightaway or with minor tweaks, and the outcome is easy to verify: either it works or it doesn’t.

Of course, there is a more rigorous way to verify software: write tests that prove it does what you expect. I’m surely not the only one who’s skimped on tests in the past. Now I’m more likely to use tests as a way to verify LLM-written code — it’s a great incentive! In some cases, you can even put an LLM into an autonomous self-directed loop and watch it iterate toward a solution that passes your tests, which is both mind-boggling and wildly entertaining.
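As a minimal sketch of using tests as the gatekeeper, the Python below accepts the first candidate implementation that passes a set of checks. Everything in it is an assumption made for illustration: the slug-making task, the test cases, and the two hand-written functions standing in for successive LLM attempts. No LLM is actually called.

```python
import re
from typing import Callable, Iterable, Optional

Candidate = Callable[[str], str]

def passes_checks(slugify: Candidate) -> bool:
    """Return True only if the candidate meets every expectation."""
    cases = {
        "Hello World": "hello-world",
        "  Think out loud  ": "think-out-loud",
        "Never trust, always verify!": "never-trust-always-verify",
    }
    try:
        return all(slugify(text) == want for text, want in cases.items())
    except Exception:
        # A candidate that crashes has failed verification.
        return False

def first_passing(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the first candidate that passes, or None if none do."""
    for candidate in candidates:
        if passes_checks(candidate):
            return candidate
    return None

# Hypothetical stand-ins for two successive LLM answers.
def attempt_1(text: str) -> str:
    return text.lower().replace(" ", "-")

def attempt_2(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

accepted = first_passing([attempt_1, attempt_2])
print(accepted.__name__ if accepted else "no candidate passed")
```

The point of the design is that the tests, not the plausibility of the code, decide what gets accepted. In an autonomous loop, the list of candidates would be replaced by repeated requests to the assistant.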

In other domains, it’s harder to formalize the verification of LLM output. When writing an article I’ll often ask my AI assistants to propose variations on my headline and subhead, but there’s no right or wrong answer — it’s a subjective call about what’s better or worse.

I have never relied on an LLM for factual information, but if you do, obviously you should check your facts. Some may be right or wrong; others are open to interpretation. Either way, there’s no substitute for human judgment. And it never hurts to push LLMs to cite their sources. Historically they couldn’t, but as they gain the ability to augment their training data with live web search it becomes more feasible to ground their responses in sources you can check.

3. Recruit a Team of Assistants

I regularly use both ChatGPT and Claude, as well as coding assistants that rely on one or another of those engines. Which is better? It depends! In a given situation, any of my assistants may turn out to be the one that solves a technical problem or provokes a valuable insight. And it’s often useful to compare results. When I wanted to evaluate the tradeoffs between two alternate solutions to a technical problem, I invited my whole team of assistants to weigh in on the pros and cons. The consensus that emerged wasn’t binding, and I ultimately decided to override it, but the fact that there was a consensus — backed by several complementary rationales — helped me sharpen my thinking and justify my decision.

On the non-technical front, I recently gave my assistants a long list of books that I’ve enjoyed and asked for recommendations. Here was the prompt.

  1. Don’t include books by authors already listed, I’ve likely either read them or know about them and decided not to include them.
  2. Don’t apply the “more like this” rule, I am looking for books, or genres, or topics, that will interest me but are not obviously related to books on this list.
  3. Do surprise me with delightful and thoughtful books that I will enjoy.

I appreciated the diversity of responses. There was consensus here too: both ChatGPT and Claude suggested Sy Montgomery’s The Soul of an Octopus. As it happens I’ve already read the book (though I’m now inclined to reread it), and it shouldn’t have been a suggestion because my list included another Sy Montgomery book. See rule 2: never trust, always verify!

4. Ask for Choral Explanations

In “Choral Explanations,” Mike Caulfield describes how the process of question-answering on sites like StackExchange and Quora delivers a range of answers from which readers can synthesize an understanding. If you follow rule 3 (Recruit a team of assistants) you can ask for a chorus of explanations on any topic. In this example, I wanted to know more about the HTTP headers returned by a web server. When I presented the headers to Sourcegraph Cody, GitHub Copilot Chat, and ChatGPT, and asked for a summary with explanations, each answered in a slightly different way. The differences (with respect to which items they chose to summarize, and how they explained them) were instructive, and gave me a better grasp of terms, concepts, and relationships than any single explanation could have.

You can also ask an individual LLM for a chorus of explanations by using Wired’s 5 levels formula.

In 5 Levels, an expert scientist explains a high-level subject in five different layers of complexity — first to a child, then a teenager, then an undergrad majoring in the same subject, a grad student and, finally, a colleague.

This works well for technical topics, but also more generally. For example:

My financial planner is proposing that I move some funds into annuities. Please explain the pros and cons as if to:

  • a ninth-grader
  • a college senior
  • a mid-career professional
  • a late-career professional
  • a financial planner

It’s long been possible to cobble together explanations from a variety of sources, but never so easily or with such granular control over the level of explanation. Subject to rule 2 (Never trust, always verify), you can gain a lot by asking one or several LLMs to provide a chorus of explanations.
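The multi-level prompt above is easy to template. This is a minimal sketch, assuming a hypothetical helper named choral_prompt. The audience list is the one from the annuities example, and the resulting text can be pasted into any assistant.

```python
# The audiences from the annuities example, simplest to most expert.
AUDIENCES = [
    "a ninth-grader",
    "a college senior",
    "a mid-career professional",
    "a late-career professional",
    "a financial planner",
]

def choral_prompt(situation: str, audiences: list[str] = AUDIENCES) -> str:
    """Build one prompt that requests an explanation per audience."""
    bullets = "\n".join(f"  - {audience}" for audience in audiences)
    return f"{situation} Please explain the pros and cons as if to:\n\n{bullets}"

print(choral_prompt(
    "My financial planner is proposing that I move some funds into annuities."
))
```

Swapping in a different audience list, such as the five levels Wired uses, only requires passing a second argument.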

5. Exploit Pattern Recognition

Both humans and LLMs are powerful pattern recognizers, in complementary ways. We easily detect patterns that underlie narrative arcs, abstract analogies, and emotional intelligence. But we struggle to apprehend patterns in data that LLMs can easily spot. Partnering with LLMs is a powerful way to mitigate that human weakness.

In one case, a colleague and I were stumped when we couldn’t load a CSV file into a Steampipe table. None of the CSV validators found any syntactic problem with the data, but ChatGPT noticed an anomaly: there were two columns with the same name. Excel doesn’t mind that, but Postgres does. We’d eventually have figured it out, but the duplicate-column pattern that wasn’t obvious to us was obvious to the LLM.
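Once the pattern is known, the check is simple to automate. The sketch below reads only the header row of a CSV and reports names that occur more than once. The sample data is invented, and comparing names case-insensitively is an assumption: it is the cautious choice if the loader does not quote column names, since Postgres folds unquoted identifiers to lowercase.

```python
import csv
import io
from collections import Counter

def duplicate_columns(csv_text: str) -> list[str]:
    """Return header names that appear more than once, in first-seen order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Assumption: names are compared case-insensitively and without
    # surrounding whitespace.
    counts = Counter(name.strip().lower() for name in header)
    return [name for name, seen in counts.items() if seen > 1]

# Invented sample: syntactically valid CSV with a repeated column name.
sample = "id,name,region,Name\n1,alpha,us-east-1,Alpha\n"
print(duplicate_columns(sample))  # ['name']
```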

In another case, a community member was having trouble running Steampipe in a container. The problem turned out to be misuse of the --mount argument to the docker run command. There are two flavors: bind mount (which uses a host path) and volume mount (which uses a logical name). Because I was unfamiliar with those options, I didn’t immediately see that the failure was due to a mixup between host paths and logical names. But ChatGPT saw it right away.
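The distinction can be made explicit in a few lines. This sketch builds the value for docker run's --mount flag and rejects the mixup described above. The host path, volume name, container paths, and image name are all hypothetical, and the volume-name pattern only approximates what Docker accepts.

```python
import re
from pathlib import PurePosixPath

# Approximation of a legal volume name: no slashes, so never a path.
VOLUME_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]*")

def mount_arg(kind: str, source: str, target: str) -> str:
    """Build a --mount value, checking that source matches the mount type."""
    if kind == "bind":
        # A bind mount's source is a path on the host.
        if not PurePosixPath(source).is_absolute():
            raise ValueError(f"bind mount needs an absolute host path: {source!r}")
    elif kind == "volume":
        # A volume mount's source is a logical name managed by Docker.
        if not VOLUME_NAME.fullmatch(source):
            raise ValueError(f"volume mount needs a logical name: {source!r}")
    else:
        raise ValueError(f"unsupported mount type: {kind!r}")
    return f"type={kind},source={source},target={target}"

# Hypothetical paths and image: editable config on the host, database
# files in a named volume.
command = [
    "docker", "run",
    "--mount", mount_arg("bind", "/home/me/steampipe-config", "/config"),
    "--mount", mount_arg("volume", "steampipe-db", "/data"),
    "example/steampipe-image",
]
print(" ".join(command))
```

Passing a host path where a volume name belongs, or the reverse, raises a ValueError before Docker is ever invoked.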

And here’s a non-technical example. I gave each of my assistants the list of books that I’d used to prompt them for recommendations and asked them to group the books by category. I then asked them to recommend books in each category. This task didn’t require an inhuman ability to notice low-level details in data, but did benefit from an inhuman ability to quickly and comprehensively detect patterns that I could have discerned myself with a lot more effort.

Our information diet includes many kinds of structured, semi-structured, and unstructured data. It’s still our job to make sense of that data. Doing so requires recognizing various kinds of patterns in the data, and for that LLMs are powerful allies.

6. Automate Transformations

A distressing amount of what we call knowledge work entails rote transformation of stuff from one format to another: an HTML document needs to become Markdown (or vice versa), and JSON data needs to be converted to CSV (or vice versa). It’s death by a thousand cuts when people spend hours pushing characters around in editors in order to effect these transformations. The LLM pattern recognition superpower extends to pattern-based transformation and can enable us to spend those hours more productively on the higher-order intellectual tasks for which these transformations are just entry-level requirements.
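For a sense of what such a transformation looks like when written out by hand, or when you ask an assistant for a reusable script, here is a minimal sketch of the JSON-to-CSV direction. It assumes the input is a flat JSON array of objects. Nested structures would need more work, and the sample records are invented.

```python
import csv
import io
import json

def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects to CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order across all records.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    # restval fills in keys that a record lacks.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

sample = '[{"title": "A", "year": 2019}, {"title": "B", "author": "X"}]'
print(json_to_csv(sample))
```

A script like this is also easy to verify: convert a small sample and compare the output with what you expect.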

In one case I needed to transform a complex table provided in a Google Doc into the corresponding JSON structure required to render the table on a web page. In the Before Time that would have required a lot of tedious manual editing. ChatGPT couldn’t do the whole transformation in a single pass but was successful when I gave it individual sections of the table, each exhibiting a distinct pattern. As always, I applied rule 2 (Never trust, always verify) to check the results and then make minor corrections. But that was a trivial amount of effort compared to what a manual transformation would have required.

In another case, I used an LLM to transform the raw logs of a test script into a summary that deduplicated and categorized the events in the log. It would have been onerous to fully verify that result, but that wasn’t really necessary for my purpose — I only needed a rough sense of the categories and number of events in each.
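A deterministic version of that kind of summary can serve as a rough cross-check on what the LLM reports. This sketch assumes a hypothetical log format of timestamp, level, and message. It treats messages that differ only in their numbers as the same event and counts each category.

```python
import re
from collections import Counter

# Hypothetical line format: "<timestamp> <LEVEL> <message>"
LINE = re.compile(r"\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)")

def summarize(log_text: str) -> Counter:
    """Count events by (level, message template)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = LINE.fullmatch(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # Mask digits so messages differing only by ids or counts collapse.
        template = re.sub(r"\d+", "N", match["message"])
        counts[(match["level"], template)] += 1
    return counts

sample_log = """\
2024-01-10T09:03:00 INFO test 1 passed
2024-01-10T09:03:01 INFO test 2 passed
2024-01-10T09:03:02 ERROR timeout after 30 s
2024-01-10T09:03:03 INFO test 3 passed
"""
for (level, template), n in summarize(sample_log).most_common():
    print(f"{n:3d}  {level:5s}  {template}")
```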

These are mundane uses of LLMs, and that’s the point. We spend far too much time and effort on these mundane tasks. Let’s outsource the grunt work and put that time and effort to better use.

7. Learn by Doing

A friend recently said: “ChatGPT is the best teacher I think I’ve ever had. I have a personal project that involves ordinary differential equations. Those were terrifying in college. Never took that class. Now I can ask questions, iterate on answers, explore associations. No judgment, no Fs, no grades.”

LLM feedback is a great way to learn on-demand as you work on projects. Because you acquire knowledge in task-oriented teachable moments, learning isn’t prospective; it’s immediate and tangible. In an earlier column, I showed how LLMs enabled me to learn just enough about a JavaScript framework to make headway on a project.

As you apply explicit learning you’re also likely to acquire related knowledge tacitly. In another column, I cited examples of such tacit learning: LLMs teaching me about tools and techniques that I didn’t know I needed to know. That happens when we learn from other people who unconsciously deliver tacit as well as explicit instruction, and it’s an ideal way to learn. But we need to regulate the demands we make on others’ time and attention. With LLM copilots always available to monitor and react to our ongoing knowledge work, we can learn effectively while doing the work. And knowledge acquired just in time, in task context, is knowledge that’s likely to stick.

Rules for Robot Partners

Barely a year into the LLM era we’re all discovering what AI assistants can do, and how to make best use of their talents. As features and capabilities continue to emerge, we need to develop general principles to guide us through this epochal shift. These seven guidelines won’t be the last word on the subject, but they’ve served me well so far and may help you navigate a world in which we routinely partner with AI assistants.

TNS owner Insight Partners is an investor in: Docker, Sourcegraph.