Learn by Doing: How LLMs Should Reshape Education

The path toward hands-on autonomous learning is through large language models (LLMs). Jon Udell shows how the EdTech sector can use AI.
Jan 22nd, 2024 7:16am by Jon Udell
Photo by Kenny Eliason on Unsplash   

Today my friend Greg Wilson, who was translating a tidyverse/Pandas idiom to SQL, asked me the following question:

OK, what’s wrong with my SQL?

The result doesn’t show 0 for (Bob, clean). Why not?
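Greg's script itself isn't reproduced in this capture. Based on the table names that come up below (person, job, did), a minimal reconstruction of the schema and the problematic query might look like this; the column names are my assumption:

```sql
-- Hypothetical reconstruction: table and column names are assumptions.
CREATE TABLE person (name TEXT);
CREATE TABLE job (name TEXT);
CREATE TABLE did (person TEXT, job TEXT);

INSERT INTO person VALUES ('Alice'), ('Bob');
INSERT INTO job VALUES ('cook'), ('clean');
INSERT INTO did VALUES ('Alice', 'cook'), ('Alice', 'clean'), ('Bob', 'cook');

-- The problematic query: INNER JOINs only keep matching rows,
-- so the (Bob, clean) pair never appears at all.
SELECT person.name, job.name, COUNT(did.job) AS count
FROM person
JOIN did ON person.name = did.person
JOIN job ON job.name = did.job
GROUP BY person.name, job.name;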

I relayed the question to my lead assistant, ChatGPT, and here was the response:

ChatGPT: Your SQL script mostly looks correct, but there’s an issue with the final query that you’re using to combine the tables. Let’s break it down:

Table Creation and Insertion: Your script correctly creates the tables person, job, and did, and inserts data into them. This part seems fine.

Final Query for Combined Data: You’re attempting to combine data from these tables to show which person does which job. However, the query you’ve written using INNER JOIN won’t include combinations where a person doesn’t have a job (like Bob with “clean” in your example). This is because INNER JOIN only includes rows where there is a match in both tables.

To include all combinations of persons and jobs, even when a person doesn’t have a particular job, you should use a CROSS JOIN to create a Cartesian product of person and job, and then a LEFT JOIN with the did table. This way, you will get all combinations, and you can then count the occurrences.

Here is the corrected query:
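The query itself didn't survive in this capture. Assuming tables person(name), job(name), and did(person, job) (my guess at the column names), the pattern ChatGPT describes would read:

```sql
-- Assumed schema: person(name), job(name), did(person, job).
-- CROSS JOIN builds every (person, job) pair; LEFT JOIN keeps pairs
-- with no match in did, so COUNT over did's rows yields 0 for them.
SELECT p.name AS person, j.name AS job, COUNT(d.job) AS count
FROM person p
CROSS JOIN job j
LEFT JOIN did d ON d.person = p.name AND d.job = j.name
GROUP BY p.name, j.name
ORDER BY p.name, j.name;
```

Note that COUNT(d.job) counts only non-NULL values, which is what turns the unmatched (Bob, clean) row into a 0 rather than a 1.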

This query will list every combination of person and job, and count how many times each person has each job. For combinations not present in the did table (like Bob with “clean”), it will show 0.

Rule 2: Never Trust, Always Verify

Mindful of rule 2 (see 7 Guiding Principles for Working with LLMs), here was my next move:

Jon: Sounds good, please write a test to prove it works.

What happened next is something I’ve mentioned before, in How Large Language Models Assisted a Website Makeover: ChatGPT wrote and ran the test. It’s mind-blowing if you haven’t seen this happen yet, and my sense is that many folks have not.

ChatGPT: To test the SQL query, I will set up a simulation of the database environment and then execute the query. This will demonstrate the correctness of the query in handling your requirements. Let’s proceed with the test:

Create the Tables: I’ll create the person, job, and did tables as per your script.

Insert Data: I’ll insert the data you provided into these tables.

Execute the Query: I’ll run the corrected query to show how it handles the combination of person and job, including cases where a person does not have a job record.

Let’s start with this test:
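The script ChatGPT wrote and ran isn't captured here. A sketch of such a test, using Python's built-in sqlite3 module and my assumed schema and column names, could look like this:

```python
import sqlite3

# Assumed schema and data; Greg's actual column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT);
    CREATE TABLE job (name TEXT);
    CREATE TABLE did (person TEXT, job TEXT);
    INSERT INTO person VALUES ('Alice'), ('Bob');
    INSERT INTO job VALUES ('cook'), ('clean');
    INSERT INTO did VALUES ('Alice', 'cook'), ('Alice', 'clean'), ('Bob', 'cook');
""")

# CROSS JOIN yields every (person, job) pair; LEFT JOIN preserves
# pairs absent from did, so COUNT(d.job) reports 0 for them.
rows = conn.execute("""
    SELECT p.name, j.name, COUNT(d.job)
    FROM person p
    CROSS JOIN job j
    LEFT JOIN did d ON d.person = p.name AND d.job = j.name
    GROUP BY p.name, j.name
    ORDER BY p.name, j.name
""").fetchall()

for row in rows:
    print(row)

# The key check: (Bob, clean) appears with a count of 0.
assert ('Bob', 'clean', 0) in rows
```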

As a follow-up, I asked for a version of the query written as a pipeline of CTEs (common table expressions), and for the output of just the cross join component of the query.
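That follow-up query isn't in this capture either. A CTE-pipeline version of the same logic, again assuming my reconstructed schema, might read:

```sql
-- Assumed schema: person(name), job(name), did(person, job).
WITH PersonJobCombos AS (
    -- Step 1: the Cartesian product of all persons and all jobs.
    SELECT p.name AS person, j.name AS job
    FROM person p
    CROSS JOIN job j
),
JobCounts AS (
    -- Step 2: left-join against did, counting matches (0 when none).
    SELECT c.person, c.job, COUNT(d.job) AS count
    FROM PersonJobCombos c
    LEFT JOIN did d ON d.person = c.person AND d.job = c.job
    GROUP BY c.person, c.job
)
SELECT * FROM JobCounts ORDER BY person, job;
```

Running `SELECT * FROM PersonJobCombos` on its own shows the cross join step in isolation, which is what I asked for next.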

ChatGPT: To show the output of the PersonJobCombos CTE, we need to run just that part of the query. This CTE generates all possible combinations of persons and jobs, regardless of whether a person has actually done a job or not. Let’s execute this CTE separately to see its output.

The output of the PersonJobCombos CTE shows all possible combinations of persons and jobs:

Alice is paired with both “cook” and “clean”.

Bob is also paired with both “cook” and “clean”.

This demonstrates the Cartesian product of the person and job tables, creating a combination of each person with each job, regardless of whether they have actually performed that job or not.

Rule 7: Learn by Doing

Now here’s an awkward confession. I wrote a lot of SQL in my last job, and I write a lot in my current job too. In the last gig, I never found a need for cross join. Working with Steampipe I’ve encountered the idiom in many example queries, but never in this simple form. Steampipe queries tend to use cross join in conjunction with set-returning JSONB functions in ways that are hard for my brain to visualize. In those situations, I reach for CTE pipelines in order to build queries as sequences of steps that each do one simple transformation.

I put rule 7 last because it felt most important. Here, for example, I didn’t set out to learn about cross join, I just wanted to know how to report 0 for the Bob/clean pair. Because cross join emerged as an organic part of the solution, I’m likely to remember and be able to refer back to this example. And I hope I’ll reach for it when facing an analogous problem.

That’s not all I might have learned by doing this exercise. I happen to know how to use SQLite from inside Python, but if I hadn’t known that, ChatGPT would have tacitly taught me the method and provided another useful example to refer back to.

Reinventing Education

We’ve always needed this kind of on-demand learning — and now it’s becoming possible. In this example, we’ve seen how a tidyverse/Pandas practitioner can learn how a familiar idiom plays out in SQL. As an SQL practitioner, I can go the other way and learn how a familiar SQL idiom plays out in R or Python. In fact, I just did that, and ChatGPT was again able to not only write the code but also run it — for Python only, at this point (R isn’t supported yet), but it seems inevitable that LLMs will connect to multiple engines for various analytical tasks.

Now let’s imagine a classroom exercise that might have motivated Greg’s schema and query. Here’s how a teacher might describe the assignment:

Teacher: These are the jobs people can do: cook, clean.

These are the people: Alice, Bob.

These are the jobs they have done: Alice: cook, clean; Bob: cook.

Show how to model this scenario using one or more SQL tables, and report the data in this format.

You should expect ChatGPT to write, run, and iterate on a Python script that embeds SQLite and invokes the necessary SQL statements. When you arrive at a script that reports data as above, run it yourself to verify, and include the script.

Now answer this question: How does the SQL output include the row (Bob, clean, 0)?

It wasn’t in the data; where does it come from?

The term cross join is nowhere in sight! The student will need to inspect the generated code, then (with LLM assistance!) explain why cross join is necessary and how it works.

Some teachers will be willing and able to adopt this radical new approach right now. Many will likely wait until it’s embedded in courseware with guardrails to keep students on track. But one way or another, there’s a sea change coming to pedagogy.

The programmer and writer Ellen Ullman likes to say that programmers create systems in their own image and according to their own desires. We were among the first to experience the lifestyle that we all take for granted now: asynchronous, machine-mediated, always-on. And now we’ll be the first to experience LLM-assisted education.

For course exercises that can be solved with Python, teachers and learners are living in an unevenly distributed future. The path to a more even distribution of this kind of hands-on autonomous learning isn’t yet well-lit, but I hope educators are walking it with headlamps and flashlights blazing.
