How Large Language Models Assisted a Website Makeover
A major project for me in these past few weeks has been a website makeover that combines two pre-existing sites and adds new marketing literature. I’ve used my team of Large Language Model (LLM) virtual assistants to help with both coding and writing tasks. I’ll discuss coding here and, next time, how the assistants helped with writing prose.
One of the legacy sites used title-case: capitalize most words, excluding short words like “a,” “an,” “the,” and “but.” The other site used sentence-case: capitalize only the first word of the title, plus proper nouns. In reality, neither rule was followed with 100% consistency.
We opted for sentence-case. With 250 titles to consider, this presented a common challenge: would it be faster to just make the changes manually, or to write a script to automate the transformation? Historically I’ve bet that the script would be a net time-saver and, to be honest, have sometimes lost that bet. Building automation is interesting and fun, while manual editing is tedious drudge work, so there’s a counterproductive bias at play.
Nowadays, though, my toolkit includes LLM assistants that I figured could improve my odds. I began with a woefully underspecified prompt along the lines of: “Here are titles, please sentence-case them.” The LLMs aim to please, so they immediately began writing scripts that were easy to validate by eyeballing the mappings they produced. Easy validation has emerged as a guiding principle: you have to check results, and if that’s a slow or difficult task, you’ll lose your bet.
After detours through various Python libraries, including spaCy — in an abandoned effort to recognize named entities — we fumbled our way to a 90% solution. Then, sensing diminishing returns, I finished the job by hand. Although this wasn’t the fastest route to the solution, I don’t think an unassisted effort would have gotten there any quicker. And had I done it that way, I wouldn’t have taken a rapid guided tour through some libraries that might come in handy another time.
With the mapping in hand, I just needed a script to march through the files and apply the transformations. LLMs shine when it comes to writing simple scripts that, sure, I can write myself, but at the cost of time and attention I’d rather invest in higher-order tasks. We’ve always used throwaway scripts to knit solutions together, and I don’t think that will (or should) change. If anything, I’m hoping LLMs will help democratize scripting — again, subject to the constraint that the scripter can verify results easily, quickly, and confidently.
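A minimal sketch of that kind of throwaway script, not the one actually used. It assumes each file carries its title on a line beginning with `title:` (as the titles.txt format suggests) and that the mapping is a plain dict from filename to new title; the function name `apply_titles` is hypothetical.

```python
from pathlib import Path


def apply_titles(root: Path, new_titles: dict[str, str]) -> list[str]:
    """Rewrite the first `title:` line of each mapped file; return the names changed."""
    changed = []
    for name, title in new_titles.items():
        path = root / name
        if not path.is_file():
            continue  # skip names in the mapping that have no file on disk
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        for i, line in enumerate(lines):
            if line.startswith("title:"):
                newline = "\n" if line.endswith("\n") else ""
                replacement = f'title: "{title}"{newline}'
                if replacement != line:
                    lines[i] = replacement
                    path.write_text("".join(lines), encoding="utf-8")
                    changed.append(name)
                break  # only the first title line is touched
    return changed
```

Returning the list of changed files keeps the result easy to check: a second run over the same tree should report no changes.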
Now let’s fast-forward to a much more interesting prompt that I wrote after completing the exercise.
Here is a sample of document names and titles, stored in a file called titles.txt. The format is:
2015-04-kms-integration.md:title: “Enterprise guardrails for AWS Key Management Service”
Write a script to convert the titles to sentence-case:
– initial cap
– preserve all-uppercase acronyms
– preserve an enumerated set of capitalized phrases
Here are phrases to preserve:
Key Management Service
Here are tests it should pass.
(‘2015-03-turbot-initialized.md’, ‘Turbot Initialized’): ‘Turbot initialized’
(‘2015-04-kms-integration.md’, ‘Enterprise guardrails for AWS Key Management Service’): ‘Enterprise guardrails for AWS Key Management Service’
(‘2022-06-guardrails-hipaa-compliance-controls.md’, ‘HIPAA Compliance Controls’): ‘HIPAA compliance controls’
(‘2022-08-guardrails-quick-actions.md’, ‘Turbot Guardrails Quick Actions’): ‘Turbot Guardrails Quick Actions’
(‘2016-07-nist-800-53-controls.md’, ‘NIST 800-53 Controls’): ‘NIST 800-53 controls’
(‘2022-02-pipes-for-audit-readiness.md’, ‘Pipes for Audit Readiness’): ‘Pipes for audit readiness’
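The three rules in the prompt above can be sketched in a few lines of Python. This is one plausible reading of the rules, not the code any of the assistants produced; the names `parse_line`, `sentence_case`, `PRESERVE`, and `CASES` are hypothetical, and "acronym" is assumed to mean any word of two or more characters whose letters are all uppercase.

```python
import re


def parse_line(line: str) -> tuple[str, str]:
    """Split 'name.md:title: "Some Title"' into (name, title)."""
    name, _, title = line.partition(":title:")
    # Strip surrounding whitespace, then straight or curly double quotes.
    return name.strip(), title.strip().strip('"\u201c\u201d')


def sentence_case(title: str, preserve: list[str]) -> str:
    """Initial cap, keep all-uppercase acronyms, restore preserved phrases."""
    words = []
    for i, word in enumerate(title.split()):
        if len(word) > 1 and word.isupper():
            words.append(word)  # acronym such as AWS or NIST
        elif i == 0:
            words.append(word[:1].upper() + word[1:].lower())
        else:
            words.append(word.lower())
    result = " ".join(words)
    # Restore each preserved phrase in its canonical capitalization.
    for phrase in preserve:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, lambda m, p=phrase: p, result, flags=re.IGNORECASE)
    return result


PRESERVE = ["Key Management Service"]  # the one phrase listed in the prompt

CASES = {
    "Turbot Initialized": "Turbot initialized",
    "Enterprise guardrails for AWS Key Management Service":
        "Enterprise guardrails for AWS Key Management Service",
    "HIPAA Compliance Controls": "HIPAA compliance controls",
    "NIST 800-53 Controls": "NIST 800-53 controls",
    "Pipes for Audit Readiness": "Pipes for audit readiness",
}

# Print the mapping so it can be validated by eye.
for original, expected in CASES.items():
    actual = sentence_case(original, PRESERVE)
    print(f"{'ok  ' if actual == expected else 'FAIL'} {original!r} -> {actual!r}")
```

With only the single phrase shown in the prompt, this sketch turns "Turbot Guardrails Quick Actions" into "Turbot guardrails quick actions". That test passes only if the phrase is added to the preserve list, which is an assumption about how the full list looked.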
Recruiting GPT-4 Code Interpreter
This made for a great first trial of the GPT-4 code interpreter model, which runs the code that it writes and iterates autonomously toward a solution. My experience matched what AI expert Simon Willison describes in this podcast:
[01:32:42] And in fact, when it wrote the code, I watched it make the exact same mistakes I would’ve made, like getting off one-off by one errors and all that kind of thing. And then it output the results and was like, oh, I made a mistake. I should fix that. So it pretty much did, it did. Wrote the code the exact way I would’ve written the code, except that it churns through it really quickly and I just got to sit back and watch it do its job.
Here’s an intermediate iteration of the sentence-case function that GPT-4 wrote.
We see the LLM notice the very kinds of errors that I’d fumbled through initially. And it’s using tests that it built — from the test data I supplied — to spot the errors. I’ve tried before to get that to happen, by feeding test outputs back into the loop, but never got good results. Even with this massively better prompt, Cody and Copilot struggled to write code that could pass the tests.
The GPT-4 code interpreter model still needed some prodding, but it did get there. On a toy problem, admittedly, but there are a lot like it that chew up time and attention. If we can solve them quickly and reliably, we can focus on bigger problems where, I hope, we’ll also benefit from automation of the generate/test cycle.
A Chorus of Stochastic Parrots
Although we were already using a link-checking tool, I wanted to double-check and was curious how quickly and easily I could put together a simple checker with the help of my team. The tool came together nicely and, while using it, I wondered about the headers returned by the server. When I asked my team to explain them, they provided an interesting variety of explanations.
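A simple checker of this kind might look like the sketch below, which uses only the Python standard library. It is an illustration under assumptions, not the tool described above: the names `extract_links` and `check_link` are hypothetical, links are assumed to be absolute URLs, and the `opener` parameter exists only so the function can be exercised without touching the network.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return every <a href> value found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def check_link(url: str, timeout: float = 10.0, opener=urllib.request.urlopen):
    """Return (status, headers); status is None if the server was unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, dict(err.headers.items())
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None, {}
```

Returning the headers alongside the status is what makes it easy to get curious about them while checking links.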
In Choral Explanations, Mike Caulfield describes how the process of question-answering on sites like StackExchange and Quora delivers a range of answers from which readers can synthesize an understanding.
These “choral explanations”
1. combine to push me to a deep understanding no single explanation can, and
2. give me multiple routes into the content
My team of stochastic parrots can produce that effect. If Copilot says “any origin is allowed to access the resource,” I might wonder how “origin” is defined. When Cody adds “cross-origin requests from any domain,” I can relate “origin” to “domain.” And GPT-4 connects these concepts to CORS. You don’t always want this effect; often you’re looking for a single best answer. But when you’re trying to learn about a topic, a chorus of explanations can be very helpful.
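To make the link between “origin” and “domain” concrete, here is a small sketch. It assumes the header in question was `Access-Control-Allow-Origin` (the text above does not name it) and treats an origin as the scheme, host, and port of a URL. The function names are hypothetical, and the check is a deliberate simplification of what browsers actually do.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def allowed_by(allow_origin_header: Optional[str], requesting_page: str) -> bool:
    """Simplified check: '*' admits any origin, otherwise origins must match."""
    if allow_origin_header is None:
        return False
    value = allow_origin_header.strip()
    if value == "*":
        return True
    return origin(value) == origin(requesting_page)
```

Under this definition `https://example.com` and `https://www.example.com` are different origins even though they share a domain, so “origin” is somewhat narrower than “domain.”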
When to Silence the Chorus
One final task was to find a set of small images that had to be restyled. My team of assistants helped me put together a basic script to scan the source tree for images, then quickly iterate through a few different approaches to extracting image dimensions. But the transformations needed to produce links to the pages that contain those images proved fiddly, and in this case the chorus was more like a cacophony.
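The scanning half of that task can be sketched with the standard library alone. This is one of several possible approaches to reading dimensions, not necessarily one the assistants proposed: it handles only PNG files, reads the width and height straight from the file header, and uses a hypothetical `max_width` threshold of 200 pixels to decide what counts as “small.”

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path):
    """Return (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    with path.open("rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def small_images(root: Path, max_width: int = 200):
    """Yield (path, (width, height)) for PNGs under root no wider than max_width."""
    for path in sorted(root.rglob("*.png")):
        size = png_size(path)
        if size and size[0] <= max_width:
            yield path, size
```

Mapping each image back to the pages that embed it, the part that proved fiddly, is deliberately left out of this sketch.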
In the end, after spending too much time trying different and unsatisfactory approaches, I benched the team and completed the task myself. As with all technologies that augment human intelligence, there’s a real risk of atrophy. Wayfinding without GPS is becoming a lost art, and coding without LLMs is heading in the same direction.
Ideally, our assistants will free us from low-level details so we can focus on higher-order reasoning, and often that’s what happens. But just as it’s sometimes useful to turn off your phone and navigate by dead reckoning, it’s also important to know when to silence the chorus of coding assistants.