
Let’s Talk: Conversational Software Development

Asking LLMs to write code is a life-changer, but so is talking to them about the process. Jon Udell continues to explore LLMs for coders.
Oct 27th, 2023 10:11am by Jon Udell

Here’s the prompt that kicked off what turned into a very successful interaction with ChatGPT.

PROMPT:

I’ll give you a test script and the output it is testing, and ask about strategies to improve the test, OK? Here’s the script.

A Passing Observation

The prompt included an intentionally naive version of the bash test script I envisioned. I had intended to provide sample output in the initial prompt too, but forgot. In hindsight, withholding the output was probably for the best: it let ChatGPT focus first on the script. Among its observations:

RESPONSE:

While you’ve used set -eu to exit on any error or unset variable usage, consider using set -euo pipefail to also capture failures in any part of a pipeline.

This was a nice example of “Learning While Coding: How LLMs Teach You Implicitly.” I’d had some understanding of what I think of as strict mode in bash, but it was incomplete. So we paused to explore the topic for a while; I solidified my knowledge and made a note to consider other set options, notably -x (xtrace) and -n (noexec), for debugging. We had barely begun the actual task at hand, but this small detour was intrinsically valuable.
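For reference, here’s what that strict-mode preamble might look like (a minimal sketch; the set options are standard bash, the comments are mine):

#!/usr/bin/env bash
set -euo pipefail  # -e: exit on error; -u: error on unset variables;
                   # -o pipefail: a pipeline fails if any stage fails

# Handy while debugging (not for normal runs):
# set -x  # xtrace: print each command before executing it
# set -n  # noexec: parse the script without running it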

Let’s Talk About the JSON

I now circled back to provide the sample output:

PROMPT:

"tickets_list": "[{\"associates_rdb\":null,\"association_type\":null,\"cc_email\":{\"cc_emails\":[],\"fwd_emails\":[],\"reply_cc\":[],\"tkt_cc\":[]},\"created_at\":\"2023-10-12T12:15:00+01:00\",\"custom_field\":{\"cf_reference_number_2939992\":null},\"datetime_tc01\":null,\"datetime_tc02\":null,\"datetime_tc03\":null,\"deleted\":false,\"delta\":true,\"description\":\"world\",\"description_html\":\"\\u003cdiv\\u003eworld\\u003c/div\\u003e\",\"dirty\":0,\"display_id\":17,\"due_by\":\"2023-10-16T22:00:00+01:00\",\"email_config_id\":153000020300,\"frDueBy\":\"2023-10-12T22:00:00+01:00\",\"fr_escalated\":true,\"group_id\":null,\"id\":153002218563,\"int_tc01\":null,\"int_tc02\":null,\"int_tc03\":null,\"int_tc04\":null,\"int_tc05\":null,\"internal_agent_id\":null,\"internal_group_id\":null,\"isescalated\":false,\"json_tc01\":null,\"long_tc01\":null,\"long_tc02\":null,\"long_tc03\":null,\"long_tc04\":null,\"long_tc05\":null,\"nr_due_by\":null,\"nr_escalated\":false,\"nr_reminded\":false,\"owner_id\":null,\"parent_ticket_id\":null,\"priority\":1,\"priority_name\":\"Low\",\"product_id\":153000018092,\"requester_id\":153002362554,\"requester_name\":\"Jon Udell\",\"requester_status_name\":\"Open\",\"responder_id\":null,\"responder_name\":\"No Agent\",\"sl_escalation_level\":null,\"sl_manual_dueby\":null,\"sl_merge_parent_ticket\":null,\"sl_product_id\":null,\"sl_skill_id\":null,\"sl_sla_policy_id\":null,\"sla_state\":0,\"source\":2,\"source_name\":\"Portal\",\"spam\":false,\"st_survey_rating\":null,\"status\":2,\"status_name\":\"Open\",\"subject\":\"hello\",\"ticket_type\":null,\"to_email\":null,\"to_emails\":null,\"trained\":false,\"updated_at\":\"2023-10-12T22:05:04+01:00\",\"urgent\":false},{\"associates_rdb\":null,\"association_type\":null,\"cc_email\":{\"bcc_emails\":[],\"cc_emails\":[],\"fwd_emails\":[],\"reply_cc\":[],\"tkt_cc\":[]},\"created_at\":\"2023-10-12T04:41:12+01:00\",\"custom_field\":{\"cf_reference_number_2939992\":null},\"datetime_tc01\":null,\"datetime_tc02\":null,\"datetime_tc03\":null,\"deleted\":false,\"delta\":true,\"description\":\"Hello there, Our Report metrics over the last week is at zero and can't be correct? Are you facing any issues?\",\"description_html\":\"Hello there, Our Report metrics over the last week is at zero and can't be correct?
Are you facing any issues?\",\"dirty\":0,\"display_id\":6,\"due_by\":\"2023-10-16T22:00:00+01:00\",\"email_config_id\":153000020300,\"frDueBy\":\"2023-10-12T22:00:00+01:00\",\"fr_escalated\":false,\"group_id\":153000077019,\"id\":153002214584,\"int_tc01\":null,\"int_tc02\":null,\"int_tc03\":null,\"int_tc04\":null,\"int_tc05\":null,\"internal_agent_id\":null,\"internal_group_id\":null,\"isescalated\":false,\"json_tc01\":null,\"long_tc01\":null,\"long_tc02\":null,\"long_tc03\":null,\"long_tc04\":null,\"long_tc05\":null,\"nr_due_by\":null,\"nr_escalated\":false,\"nr_reminded\":false,\"owner_id\":null,\"parent_ticket_id\":null,\"priority\":1,\"priority_name\":\"Low\",\"product_id\":153000018092,\"requester_id\":153002465401,\"requester_name\":\"Adam Doe\",\"requester_status_name\":\"Open\",\"responder_id\":null,\"responder_name\":\"No Agent\",\"sl_escalation_level\":null,\"sl_manual_dueby\":null,\"sl_merge_parent_ticket\":null,\"sl_product_id\":null,\"sl_skill_id\":null,\"sl_sla_policy_id\":null,\"sla_state\":0,\"source\":1,\"source_name\":\"Email\",\"spam\":false,\"st_survey_rating\":null,\"status\":2,\"status_name\":\"Open\",\"subject\":\"Issues with reports\",\"ticket_type\":\"Bug\",\"to_email\":null,\"to_emails\":null,\"trained\":false,\"updated_at\":\"2023-10-12T04:41:13+01:00\",\"urgent\":false}]"
}

In another context, I might have taken the time to pretty-print this JSON output. But LLMs, like compilers and interpreters, don’t care about that. Even better, they tolerate syntax that chokes conventional JSON formatters. If you paste the above example into one, you’ll run into two problems. First, it isn’t valid JSON until you wrap it in curly braces. Second, the value of tickets_list isn’t a JSON object but a string: a stringified representation of JSON. LLMs, quite wonderfully, “know” your intent and do the right thing:

RESPONSE:

[ChatGPT’s reply pretty-printed the parsed tickets_list as proper JSON.]
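Outside a chat window, jq can pull off the same parsing trick. A minimal sketch, assuming you’ve wrapped the fragment in curly braces and saved it as output.json (the file name is mine):

# fromjson parses the stringified value of tickets_list into a real array
jq -r '.tickets_list | fromjson | .[].subject' output.json
# prints:
#   hello
#   Issues with reports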

This is trivial stuff, but it’s the kind of thing that can kill your flow and waste cognitive resources best reserved for higher-order functions. At this point I also conducted what I’ll call an explainability test: Did the LLM explain that 1) the fragment needs to be wrapped in curlies, and 2) the nested value needs to be parsed? Here are the results.

tool           llm provider   explains fragment needed {}?   explains tickets_list needed parsing?
ChatGPT        OpenAI         no                             yes
Copilot Chat   OpenAI         no                             yes
Claude         Anthropic      no                             no
Cody           Anthropic      no                             no

This interests me because, although I know that Copilot Chat uses OpenAI and Cody uses Anthropic, that provenance isn’t always evident. The chat wrappers augment prompts in ways that can make responses differ from those of the underlying LLMs. In this example, though, each tool aligned with the behavior of the model it wraps.

Let’s Talk About the Test Strategy

ChatGPT’s observation about set -euo pipefail was one of a half-dozen avenues we could have explored, but I steered it back to the task at hand: refining the test script I included in the initial prompt. The output it tests is from the Freshdesk ticketing system, and the naive test just checked to see that the tool we were testing could call Freshdesk’s API and return more than zero tickets. What would be a better test? ChatGPT’s recommendations included checking to see that priorities and statuses match expected values, that null or long lists of tickets are handled gracefully, and that all date fields exist and contain valid dates.

I opted for the last approach, and ChatGPT revised the test accordingly. This entailed writing a regex to match dates, then looping through the tickets and applying the regex to each of a set of date fields in each one. That felt too complex, so I asked to limit the scope to a single date field: created_at. That still felt too complex, so we ditched the loop and whittled the test down to a check for a valid date in a single ticket (see the sketch below). Then we moved on to the next topic of conversation.
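Here’s roughly where that whittled-down test landed (a reconstruction, not the exact script; FRESHDESK_OUTPUT stands in for whatever the tool returned):

# check that the first ticket's created_at looks like an ISO 8601 timestamp
created_at=$(echo "$FRESHDESK_OUTPUT" | jq -r '.tickets_list | fromjson | .[0].created_at')
if [[ "$created_at" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\+[0-9]{2}:[0-9]{2}$ ]]; then
  echo "PASS: created_at is a valid date"
else
  echo "FAIL: created_at is not a valid date: $created_at"
  exit 1
fi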

Let’s Talk About How to Generalize to a Suite of Tests

These were the cases to test:

  • List Freshdesk tickets.
  • Create a new Freshdesk ticket.
  • Delete a Freshdesk ticket.
  • Retrieve details of a specific Freshdesk ticket.
  • Update a Freshdesk ticket.
  • Add a note to a Freshdesk ticket.

I’ve rarely used bash functions, but I figured they might be useful here, so I was pleased by a response that structured the bash script as a set of six functions corresponding to the six test cases. “Looks good,” I said, “please give me a complete version that is structured this way and includes the first test.”
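The skeleton looked something like this (a sketch of the shape rather than the real code; the bodies are placeholders):

#!/usr/bin/env bash
set -euo pipefail

test_list_tickets()  { echo "TODO: call the tool, assert more than zero tickets"; }
test_create_ticket() { echo "TODO: create a ticket, remember its id"; }
test_get_ticket()    { echo "TODO: fetch the remembered ticket"; }
test_update_ticket() { echo "TODO: update it"; }
test_add_note()      { echo "TODO: add a note to it"; }
test_delete_ticket() { echo "TODO: delete it"; }

test_list_tickets
test_create_ticket
test_get_ticket
test_update_ticket
test_add_note
test_delete_ticket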

At this point a coworker, looking over my shoulder, wondered if using “please” and “thank you” with LLMs yields better results. Beats me! I just do it because it feels good. Is that a dangerous bit of anthropomorphism? Maybe. But I can’t see any upside to bullying my assistants even when they are nonhuman.

The same coworker then asked about an alternative strategy that would drive the tests from a table. We asked:

PROMPT:

We are wondering if it would be better to have a data structure for these tests instead of a set of loose functions, what do you think are the pros and cons?

The response mentioned pros (maintainability, reusability, parallel execution, consistent logging, clarity) and cons (overhead, complexity, migration time, learning curve). It concluded with:

RESPONSE:

Given that your test suite might grow or change over time, adopting a structured approach earlier can be beneficial. However, it’s essential to weigh the initial time investment against the long-term benefits. If you think your test suite will remain relatively small and static, the current loose function setup might suffice. But if you anticipate growth or frequent changes, the structured approach will pay off in the long run.

As we were in a phase of rapid prototyping, and not yet prepared to make a long-term investment in this test strategy, we opted to stick with loose functions. But we both felt it had been valuable to spend a few minutes considering the alternative approach.
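For the record, a table-driven version might have looked something like this (my sketch, not ChatGPT’s; it assumes the six functions above are defined):

# the "table": an array of test-function names
tests=(
  test_list_tickets
  test_create_ticket
  test_get_ticket
  test_update_ticket
  test_add_note
  test_delete_ticket
)

for t in "${tests[@]}"; do
  echo "running $t"
  "$t"  # invoke the function whose name is stored in $t
done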

Let’s Talk About Persisting the Ticket ID

The next test to write was for creating a ticket. ChatGPT proposed storing the ticket’s id for use in other tests of the ability to read, update, and add a note to a ticket. And it wrote a function that persisted the id to a file. I wondered about that, and asked for an assessment of the tradeoffs between saving it in a file versus a variable. Here’s a summary of the response, revised (courtesy of ChatGPT) into tabular form.

saving in a file       pros: persistence; inter-process communication; handles large data volumes
                       cons: speed; complexity; cleanup
saving in a variable   pros: speed; simplicity; no filesystem dependency
                       cons: volatility; limited to one script/process; memory constraints

And it added this conclusion:

RESPONSE:

If you’re writing a single script, and the data doesn’t need to persist beyond the script’s execution, a variable might be simpler and faster.

The alternative file-based approach hadn’t even occurred to me, and ChatGPT’s choice prompted me to consider, and then discuss, the tradeoffs. Since I wound up doing exactly what I would have done pre-LLM, was this detour a pointless distraction? I don’t think so. It’s always worth considering alternative approaches, and this one cost very little time even though it didn’t change what I ultimately did.
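In code, the two strategies differ by only a line or two (a sketch; the file path is mine):

# file-based: the id outlives this process, so a later run or a
# separate script can pick it up
echo "$TICKET_ID" > /tmp/freshdesk_ticket_id
TICKET_ID=$(cat /tmp/freshdesk_ticket_id)  # reader side, possibly elsewhere

# variable-based: faster and simpler, but gone when the script exits
TICKET_ID="153002218563"  # a global, visible to every function in the script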

We had a similar interaction when writing the test for the get_ticket function. Would it be sufficient to check that the ID of a retrieved ticket matched the one we stored? ChatGPT drew a distinction between efficiency and thoroughness, laid out options for a more thorough check, and suggested that for basic smoke-testing it makes sense to prioritize efficiency. Again, I’d have done the simpler thing anyway, so this didn’t change anything. But even though I knew we were in smoke-testing mode, saying it out loud seemed helpful.
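The efficient version of that check is nearly a one-liner (a sketch; get_ticket stands in for however the tool fetches a ticket):

retrieved_id=$(get_ticket "$TICKET_ID" | jq -r '.id')
if [[ "$retrieved_id" == "$TICKET_ID" ]]; then
  echo "PASS: retrieved ticket matches stored id"
else
  echo "FAIL: expected $TICKET_ID, got $retrieved_id"
  exit 1
fi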

Let’s Talk About Bash

Here’s an interaction that very much did change the outcome. We added a third test to the suite, but only the first two ran. What could be wrong? The debugging proceeded as it would have pre-LLM, but more quickly, because ChatGPT can sprinkle print statements into code far faster than I can. After a few iterations of this brute-force approach, we still couldn’t get past the second test.

It then occurred to me to ask: “Would it make sense to turn off set -euo pipefail?” When I did, the script ran to completion, meaning all six tests ran, but the second test failed when it should have succeeded. Then the penny dropped. I asked: “Should we be returning $TICKET_ID from test_create_ticket?” ChatGPT reminded me of a behavior I’d read about but never encountered:

RESPONSE:

In the context of bash scripts, when you return a number from a function, it’s analogous to an exit code.

Facepalm! Of course! We needed the function to set the global variable to the new ticket’s id, but return zero, so that strict mode could stay in effect without triggering a premature exit. That was the final breakthrough; after that, it was smooth sailing.
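Here’s the shape of the bug and the fix (reconstructed; create_ticket is a hypothetical helper that echoes the new ticket’s id). A bash return value is an exit code in the range 0-255, so returning a large ticket id both truncates it and, being nonzero, reads as failure:

# before: under set -euo pipefail, the nonzero "return value" aborts the script
test_create_ticket() {
  TICKET_ID=$(create_ticket)
  return $TICKET_ID  # wrong: return sets an exit code, not a value
}

# after: communicate through the global, signal success explicitly
test_create_ticket() {
  TICKET_ID=$(create_ticket)
  return 0
}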

When the Rubber Duck Talks Back

I keep coming back to the theme of the first article in this series: When the rubber duck talks back. Thinking out loud always helps. Ideally, you get to do that with a human partner. A rubber duck, though a poor substitute, is far better than nothing.

Conversing with LLMs isn’t like either of those options; it’s something else entirely, and we’re all in the midst of figuring out how it can work. Asking an LLM to write code, and having it magically appear? That’s an obvious life-changer. Talking with an LLM about the code you’re partnering with it to write? That’s a less obvious but, I think, equally profound life-changer.
