Learn by Doing: How LLMs Should Reshape Education

The path toward hands-on autonomous learning is through large language models (LLMs). Jon Udell shows how the EdTech sector can use AI.

Jan 22nd, 2024 7:16am by Jon Udell

Featued image for: Learn by Doing: How LLMs Should Reshape Education

Photo by Kenny Eliason on Unsplash

Today my friend Greg Wilson, who was translating a tidyverse/Pandas idiom to SQL, asked me the following question:

OK, what’s wrong with my SQL?

drop table if exists person;

create table person(

ident integer not null primary key,

name text not null

);

drop table if exists job;

create table job(

ident integer not null primary key,

name text not null

);

drop table if exists did;

create table did(

person_id integer not null,

job_id integer not null

);

insert into person values (1, "Alice"), (2, "Bob");

insert into job values (20, "cook"), (30, "clean");

insert into did values(1, 20), (2, 20), (1, 30);

select person.name, job.name, count(*)

from person inner join did inner join job

on person.ident = did.person_id and did.job_id = job.ident

group by person.ident, job.ident

order by person.ident, job.ident;

The result doesn’t show 0 for (Bob, clean). Why not?

I relayed the question to my lead assistant, ChatGPT, and here was the response:

ChatGPT: Your SQL script mostly looks correct, but there’s an issue with the final query that you’re using to combine the tables. Let’s break it down:

Table Creation and Insertion: Your script correctly creates the tables person, job, and did, and inserts data into them. This part seems fine.

Final Query for Combined Data: You’re attempting to combine data from these tables to show which person does which job. However, the query you’ve written using INNER JOIN won’t include combinations where a person doesn’t have a job (like Bob with “clean” in your example). This is because INNER JOIN only includes rows where there is a match in both tables.

To include all combinations of persons and jobs, even when a person doesn’t have a particular job, you should use a CROSS JOIN to create a Cartesian product of person and job, and then a LEFT JOIN with the did table. This way, you will get all combinations, and you can then count the occurrences.

Here is the corrected query:

SELECT person.name, job.name, COUNT(did.person_id) as job_count

FROM person

CROSS JOIN job

LEFT JOIN did ON person.ident = did.person_id AND job.ident = did.job_id

GROUP BY person.name, job.name

ORDER BY person.name, job.name;

This query will list every combination of person and job, and count how many times each person has each job. For combinations not present in the did table (like Bob with “clean”), it will show 0.

Rule 2: Never Trust, Always Verify

Mindful of rule 2 (see 7 Guiding Principles for Working with LLMs), here was my next move:

Jon: Sounds good, please write a test to prove it works.

What happened next is something I’ve mentioned before, in How Large Language Models Assisted a Website Makeover: ChatGPT wrote and ran the test. It’s mind-blowing if you haven’t seen this happen yet, and my sense is that many folks have not.

ChatGPT: To test the SQL query, I will set up a simulation of the database environment and then execute the query. This will demonstrate the correctness of the query in handling your requirements. Let’s proceed with the test:

Create the Tables: I’ll create the person, job, and did tables as per your script. Insert Data: I’ll insert the data you provided into these tables. Execute the Query: I’ll run the corrected query to show how it handles the combination of person and job, including cases where a person does not have a job record.

Let’s start with this test:

import sqlite3
import pandas as pd

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')

# Create tables
create_statements = [
    '''
    CREATE TABLE person (
        ident INTEGER NOT NULL PRIMARY KEY,
        name TEXT NOT NULL
    );
    ''',
    '''
    CREATE TABLE job (
        ident INTEGER NOT NULL PRIMARY KEY,
        name TEXT NOT NULL
    );
    ''',
    '''
    CREATE TABLE did (
        person_id INTEGER NOT NULL,
        job_id INTEGER NOT NULL
    );
    '''
]

for create_statement in create_statements:
    conn.execute(create_statement)

# Insert data
person_data = [(1, "Alice"), (2, "Bob")]
job_data = [(20, "cook"), (30, "clean")]
did_data = [(1, 20), (2, 20), (1, 30)]

conn.executemany('INSERT INTO person VALUES (?, ?);', person_data)
conn.executemany('INSERT INTO job VALUES (?, ?);', job_data)
conn.executemany('INSERT INTO did VALUES (?, ?);', did_data)

# Query to combine the tables
query = '''
SELECT person.name AS PersonName, job.name AS JobName, COUNT(did.person_id) AS JobCount
FROM person
CROSS JOIN job
LEFT JOIN did ON person.ident = did.person_id AND job.ident = did.job_id
GROUP BY person.name, job.name
ORDER BY person.name, job.name;
'''

# Execute the query
result = pd.read_sql_query(query, conn)
conn.close()

result

PersonName JobName  JobCount
0      Alice   clean         1
1      Alice    cook         1
2        Bob   clean         0
3        Bob    cook         1

import sqlite3

import pandas as pd

# Create an in-memory SQLite database

conn = sqlite3.connect(':memory:')

# Create tables

create_statements = [

'''

CREATE TABLE person (

ident INTEGER NOT NULL PRIMARY KEY,

name TEXT NOT NULL

);

''',

'''

CREATE TABLE job (

ident INTEGER NOT NULL PRIMARY KEY,

name TEXT NOT NULL

);

''',

'''

CREATE TABLE did (

person_id INTEGER NOT NULL,

job_id INTEGER NOT NULL

);

'''

]

for create_statement in create_statements:

conn.execute(create_statement)

# Insert data

person_data = [(1, "Alice"), (2, "Bob")]

job_data = [(20, "cook"), (30, "clean")]

did_data = [(1, 20), (2, 20), (1, 30)]

conn.executemany('INSERT INTO person VALUES (?, ?);', person_data)

conn.executemany('INSERT INTO job VALUES (?, ?);', job_data)

conn.executemany('INSERT INTO did VALUES (?, ?);', did_data)

# Query to combine the tables

query = '''

SELECT person.name AS PersonName, job.name AS JobName, COUNT(did.person_id) AS JobCount

FROM person

CROSS JOIN job

LEFT JOIN did ON person.ident = did.person_id AND job.ident = did.job_id

GROUP BY person.name, job.name

ORDER BY person.name, job.name;

'''

# Execute the query

result = pd.read_sql_query(query, conn)

conn.close()

result

PersonName JobName JobCount

0 Alice clean 1

1 Alice cook 1

2 Bob clean 0

3 Bob cook 1

As a follow-up, I asked for a version of the query written as a pipeline of CTEs (common table expressions), and for the output of just the cross join component of the query.

ChatGPT: To show the output of the PersonJobCombos CTE, we need to run just that part of the query. This CTE generates all possible combinations of persons and jobs, regardless of whether a person has actually done a job or not. Let’s execute this CTE separately to see its output.

# Reconnect to the in-memory SQLite database

conn = sqlite3.connect(':memory:')

# Recreate tables and insert data as before

for create_statement in create_statements:

conn.execute(create_statement)

conn.executemany('INSERT INTO person VALUES (?, ?);', person_data)

conn.executemany('INSERT INTO job VALUES (?, ?);', job_data)

conn.executemany('INSERT INTO did VALUES (?, ?);', did_data)

# Query for PersonJobCombos CTE

person_job_combos_query = '''

WITH PersonJobCombos AS (

SELECT person.ident AS PersonID, person.name AS PersonName, job.ident AS JobID, job.name AS JobName

FROM person

CROSS JOIN job

)

SELECT * FROM PersonJobCombos;

# Execute the PersonJobCombos CTE query

person_job_combos_result = pd.read_sql_query(person_job_combos_query, conn)

conn.close()

person_job_combos_result

PersonID PersonName JobID JobName

0 1 Alice 20 cook

1 1 Alice 30 clean

2 2 Bob 20 cook

3 2 Bob 30 clean

The output of the PersonJobCombos CTE shows all possible combinations of persons and jobs:

Alice is paired with both “cook” and “clean”.

Bob is also paired with both “cook” and “clean”.

This demonstrates the Cartesian product of the person and job tables, creating a combination of each person with each job, regardless of whether they have actually performed that job or not.

Rule 7: Learn by Doing

Now here’s an awkward confession. I wrote a lot of SQL in my last job, and I write a lot in my current job too. In the last gig, I never found a need for cross join. Working with Steampipe I’ve encountered the idiom in many example queries, but never in this simple form. Steampipe queries tend to use cross join in conjunction with set-returning JSONB functions in ways that are hard for my brain to visualize. In those situations, I reach for CTE pipelines in order to build queries as sequences of steps that each do one simple transformation.

I put rule 7 last because it felt most important. Here, for example, I didn’t set out to learn about cross join, I just wanted to know how to report 0 for the Bob/clean pair. Because cross join emerged as an organic part of the solution, I’m likely to remember and be able to refer back to this example. And I hope I’ll reach for it when facing an analogous problem.

That’s not all I might have learned by doing this exercise. I happen to know how to use SQLite from inside Python, but if I hadn’t known that, ChatGPT would have tacitly taught me the method and provided another useful example to refer back to.

Reinventing Education

We’ve always needed this kind of on-demand learning — and now it’s becoming possible. In this example, we’ve seen how a tidyverse/Pandas practitioner can learn how a familiar idiom plays out in SQL. As an SQL practitioner, I can go the other way and learn how a familiar SQL idiom plays out in R or Python. In fact, I just did that, and ChatGPT was again able to not only write the code but also run it — for Python only, at this point (R isn’t supported yet), but it seems inevitable that LLMs will connect to multiple engines for various analytical tasks.

Now let’s imagine a classroom exercise that might have motivated Greg’s schema and query. Here’s how a teacher might describe the assignment:

Teacher: These are the jobs people can do: cook, clean.

These are the people: Alice, Bob.

These are the jobs they have done: Alice: cook, clean; Bob: cook.

Show how to model this scenario using one or more SQL tables, and report the data in this format.

PersonName JobName JobCount

Alice clean 1

Alice cook 1

Bob clean 0

Bob cook 1

You should expect ChatGPT to write, run, and iterate on a Python script that embeds SQLite and invokes the necessary SQL statements. When you arrive at a script that reports data as above, run it yourself to verify, and include the script.

Now answer this question: How does the SQL output include this row?

1	Bob clean 0

It wasn’t in the data, where does it come from?

The term cross join is nowhere in sight! The student will need to inspect the generated code, then (with LLM assistance!) explain why cross join is necessary and how it works.

Some teachers will be willing and able to adopt this radical new approach right now. Many will likely wait until it’s embedded in courseware with guardrails to keep students on track. But one way or another, there’s a sea change coming to pedagogy.

The programmer and writer Ellen Ullman likes to say that programmers create systems in their own image and according to their own desires. We were among the first to experience the lifestyle that we all take for granted now: asynchronous, machine-mediated, always-on. And now we’ll be the first to experience LLM-assisted education.

For course exercises that can be solved with Python, teachers and learners are living in an unevenly distributed future. The path to a more even distribution of this kind of hands-on autonomous learning isn’t yet well-lit, but I hope educators are walking it with headlamps and flashlights blazing.

Jon Udell is an author and software developer who explores software tools and technologies and explains them in writing, audio, and video. He is the author of the cult classic Practical Internet Groupware. Past gigs include Lotus, BYTE magazine, Safari...