@simonw
Created November 14, 2025 19:59
1
00:00:00,080 --> 00:00:09,920
You can't talk about NotebookLM without talking about their brilliantly weird podcast thing, where they can generate a podcast about whatever content you've thrown in there.
2
00:00:09,920 --> 00:00:28,320
So I like doing things like: I fed in a big boring report about some generative AI study, and I told them, you are banana slugs, and I want you to have a conversation about this paper from the perspective of banana slugs talking about how this could affect your slug society. And they did it.
3
00:00:28,320 --> 00:00:30,880
It's like a Monty Python skit.
4
00:00:30,960 --> 00:00:32,560
Oh my god.
5
00:00:36,480 --> 00:00:46,480
Welcome to episode six of Accessibility and Gen AI, a podcast that interviews the newsmakers and thought leaders in the world of accessibility and artificial intelligence.
6
00:00:46,480 --> 00:00:50,720
I'm Joe Devon, and I'm joined by my co-host Eamon McGuinness.
7
00:00:50,720 --> 00:00:56,160
And today we are interviewing Simon Willison, a true OG of the web.
8
00:00:56,160 --> 00:00:58,960
He created my favorite framework called Django.
9
00:00:58,960 --> 00:01:09,760
He created Lanyrd and Datasette, and now he is a prolific blogger talking about and building projects on top of AI on a daily basis.
10
00:01:09,760 --> 00:01:16,000
Simon has contributed so much to the world we live in, and it is a true pleasure to have him join us today.
11
00:01:16,000 --> 00:01:17,920
Simon, welcome to the pod.
12
00:01:18,320 --> 00:01:19,200
Thank you very much.
13
00:01:19,200 --> 00:01:21,040
I'm excited to be here.
14
00:01:21,040 --> 00:01:28,320
You know, when I think about all the things that you've done, your impact on the web spans so many influential projects.
15
00:01:28,320 --> 00:01:35,520
But I'd love to understand what has been the most meaningful to you personally, and what does a typical day look like for you?
16
00:01:35,600 --> 00:01:39,200
So I think the most impactful project I've worked on is still Django.
17
00:01:39,200 --> 00:01:42,720
Like, it's extraordinary how far that framework's gone.
18
00:01:42,720 --> 00:01:46,480
You know, NASA have used it, Instagram and Pinterest were built on top of it.
19
00:01:46,480 --> 00:01:52,320
And Meta's Threads is just another Django application, I found out recently.
20
00:01:52,320 --> 00:01:53,680
So that's amazing.
21
00:01:53,680 --> 00:01:59,600
And I love the fact that Django can be classified now as boring technology in that it's the safe choice, right?
22
00:01:59,600 --> 00:02:04,360
If you just want to build something on the web, if you pick Django, you're not going to run into any sharp edges.
23
00:02:04,360 --> 00:02:05,640
There won't be any surprises.
24
00:02:05,640 --> 00:02:07,880
I'm really proud that it's made it to that point.
25
00:02:08,280 --> 00:02:21,240
But these days I'm really excited about the more recent stuff I'm working on: open source tools for data journalism, where the dream is that I want the tools I'm building to help somebody win a Pulitzer Prize in investigative reporting.
26
00:02:21,240 --> 00:02:30,200
And that sounds wildly ambitious, and it is, but that would be such a cool example of the kinds of problems I want to solve.
27
00:02:30,200 --> 00:02:35,720
Like, I build tools for other people, and I want those other people to achieve world-changing results with them.
28
00:02:35,800 --> 00:02:37,080
That is so cool.
29
00:02:37,080 --> 00:02:43,160
And I'm just going to tell you an anecdote, which I hadn't mentioned when we met before.
30
00:02:43,160 --> 00:02:49,960
I had never touched Python, and I needed to deliver an e-commerce project for a client.
31
00:02:49,960 --> 00:02:55,880
And I learned Python and Django and built the app in two weeks.
32
00:02:55,880 --> 00:02:58,040
And my client was so happy.
33
00:02:58,040 --> 00:03:05,800
Because, you know, it had that admin that you guys put in there, and my client was like, you should sell this whole admin thing.
34
00:03:05,800 --> 00:03:10,120
And I was just laughing so loud because this was a two week project for me.
35
00:03:10,120 --> 00:03:11,640
That is so thrilling.
36
00:03:11,640 --> 00:03:13,080
And I've heard a lot of that.
37
00:03:13,080 --> 00:03:15,640
A lot of people got their start in programming, learning Django.
38
00:03:15,640 --> 00:03:18,760
And when we built Django, we never dreamed it would be a beginner's project.
39
00:03:18,760 --> 00:03:23,720
We thought it was, you know, for experienced engineers to quickly build web applications.
40
00:03:23,720 --> 00:03:28,280
But since then there's the Django Girls tutorials that have been running for I think nearly a decade now.
41
00:03:28,280 --> 00:03:29,960
There's been so much activity on that front.
42
00:03:29,960 --> 00:03:34,520
And yeah, I meet people and they're like, oh, Django was my introduction to programming and web development.
43
00:03:34,600 --> 00:03:35,800
That's so exciting.
44
00:03:35,800 --> 00:03:39,640
You know, that wasn't the plan, but it's so thrilling to see that happening.
45
00:03:39,640 --> 00:03:44,520
Yeah, though, to be fair, I did start with PHP and MySQL before.
46
00:03:44,520 --> 00:03:46,560
That was my intro to Python.
47
00:03:46,560 --> 00:03:51,120
It was really hard to get Python working, but the Django part of it was awesome and easy.
48
00:03:51,120 --> 00:03:51,600
Nice.
49
00:03:51,600 --> 00:03:52,480
It's funny you say that.
50
00:03:52,480 --> 00:03:53,520
Simon, pleased to meet you.
51
00:03:53,520 --> 00:03:54,640
You and I have never met before.
52
00:03:54,640 --> 00:03:57,120
So thanks for spending your time with us today.
53
00:03:57,120 --> 00:04:00,160
Um I've heard Python a lot over the past several weeks.
54
00:04:00,160 --> 00:04:05,120
Our youngest son, who's a sophomore at Oregon State, is currently taking a Python class.
55
00:04:05,200 --> 00:04:07,040
He loves it compared to C.
56
00:04:07,040 --> 00:04:08,320
He's like, I'm all over it.
57
00:04:10,080 --> 00:04:11,200
He's a fan.
58
00:04:11,200 --> 00:04:13,120
He's a huge fan, he is.
59
00:04:13,120 --> 00:04:21,680
You mentioned in your initial response your core goal of helping people and making things easier.
60
00:04:21,680 --> 00:04:25,600
Tying that into accessibility, digital accessibility:
61
00:04:25,600 --> 00:04:30,240
How do you see the Gen AI roadmap and accessibility coming together?
62
00:04:30,240 --> 00:04:35,440
And maybe the improvements from an inclusiveness perspective.
63
00:04:35,440 --> 00:04:41,120
So, well, I'm not an expert in accessibility.
64
00:04:41,120 --> 00:04:43,040
It's something I've cared about throughout my entire career.
65
00:04:43,040 --> 00:04:51,040
Like, Django from the very start was always built with semantic HTML and all of those accessibility concerns of 20 years ago in place.
66
00:04:51,040 --> 00:04:57,040
And even these days there's an accessibility working group, which I'm not involved with, that's doing amazing work on the Django admin and so forth.
67
00:04:57,040 --> 00:04:58,960
And so it's something I care very deeply about.
68
00:04:58,960 --> 00:05:04,080
I'm fascinated to learn more about that intersection between generative AI and accessibility myself.
69
00:05:04,080 --> 00:05:10,080
I feel like the most exciting trend over the past year for me has been the vision models, or these multimodal models, right?
70
00:05:10,080 --> 00:05:17,600
They can consume images and video, and the audio stuff has got incredibly powerful in the past two months.
71
00:05:17,600 --> 00:05:24,720
Like, as of now it's not even surprising that you can have an audio conversation with one of these models; two months ago that was barely a feature.
72
00:05:24,720 --> 00:05:25,920
That's so cool.
73
00:05:25,920 --> 00:05:35,560
And I feel like the accessibility benefits of these seem underexplored, at least from my perspective, but so promising.
74
00:05:35,560 --> 00:05:46,520
Some people I've talked to have been skeptical about the accessibility benefits, because their argument is, you know, if you give somebody unreliable technology that might hallucinate and make things up, surely that's harming them.
75
00:05:46,520 --> 00:05:48,360
I don't think that's true.
76
00:05:48,360 --> 00:05:53,000
I feel like people who use screen readers are used to unreliable technology.
77
00:05:53,000 --> 00:05:58,040
You know, a guide dog is a wonderful thing and a very unreliable piece of technology.
78
00:05:58,040 --> 00:06:04,520
So, you know, when you consider that people with accessibility needs have agency, they can understand the limitations of the technology they're using.
79
00:06:04,520 --> 00:06:09,000
I feel like giving them a tool where they can point their phone at something and it can describe it to them.
80
00:06:09,000 --> 00:06:11,640
It's got really good OCR capabilities built in.
81
00:06:11,640 --> 00:06:13,480
You can have an audio conversation with this.
82
00:06:13,480 --> 00:06:19,400
This just feels like a world away from the accessibility technology of just three or four years ago.
83
00:06:19,480 --> 00:06:20,520
Agree, completely agree.
84
00:06:20,680 --> 00:06:29,000
I think that concern ties into, well, maybe the concern of LLMs not being fully inclusive.
85
00:06:29,000 --> 00:06:32,840
And if LLMs are not inclusive, there will be gaps.
86
00:06:32,840 --> 00:06:34,040
But we can solve that.
87
00:06:34,040 --> 00:06:46,280
We can solve that by engaging individuals with disabilities with the prompts, and we can resolve it by making sure that we have a truly comprehensive, non-biased data set when we're building data sets.
88
00:06:46,280 --> 00:06:52,040
I think as long as we get ahead of it, as long as we're aware of that potential gap, we can solve it.
89
00:06:52,040 --> 00:06:53,800
I do, I believe that.
90
00:06:53,800 --> 00:06:57,880
Simon, we are now, I think, on day 12.
91
00:06:57,880 --> 00:07:00,920
Was it 12 days of OpenAI, or 14 days?
92
00:07:00,920 --> 00:07:04,200
I think it's 12 days, and we've reached the final day.
93
00:07:04,200 --> 00:07:08,280
But it also feels like the 12 days of Google AI.
94
00:07:08,280 --> 00:07:14,440
And when Google was doing their announcements, OpenAI sort of did their own bunch of announcements.
95
00:07:14,440 --> 00:07:19,120
It just feels like we're in a war right now, massive competition.
96
00:07:19,120 --> 00:07:22,320
Uh, so much to talk about with respect to that.
97
00:07:22,320 --> 00:07:29,120
But what were you most excited about that was released in the last couple of weeks?
98
00:07:29,120 --> 00:07:31,600
Like, what are the top items?
99
00:07:31,600 --> 00:07:42,880
And as an aside, if any of those tie into accessibility, I would love your thoughts, because I haven't had enough time to even look at most of these.
100
00:07:42,960 --> 00:07:44,240
It's been bewildering.
101
00:07:44,240 --> 00:07:47,200
The whole month of December has just been a whirlwind.
102
00:07:47,200 --> 00:07:50,080
And since when is December the month that people release everything, right?
103
00:07:50,080 --> 00:07:55,680
You'd expect people to be dialing down for the holidays, but no, we've had extraordinary releases from OpenAI.
104
00:07:55,680 --> 00:07:59,360
Google have managed to undercut OpenAI, which has never happened before.
105
00:07:59,360 --> 00:08:06,480
Like last year, every time Google made a Gemini announcement, OpenAI would launch something better that morning, almost as sort of a power move.
106
00:08:06,480 --> 00:08:09,520
The opposite is happening today, which is so fascinating.
107
00:08:09,520 --> 00:08:12,560
Like Google's Gemini team are really ramping up.
108
00:08:12,560 --> 00:08:15,840
There's a bunch of Gemini stuff that's really exciting.
109
00:08:15,840 --> 00:08:24,400
They released Gemini 2.0 Flash, which is sort of the cheapest version of their Gemini 2.0 series, and it's a really impressive model.
110
00:08:24,400 --> 00:08:26,320
I've been playing around with that one a whole lot.
111
00:08:26,320 --> 00:08:33,280
The Gemini models can do audio input and they can do video input, which puts them a step ahead of OpenAI.
112
00:08:33,280 --> 00:08:39,200
OpenAI have some preview audio models, but nothing like what Gemini can do on that sort of multimodal front.
113
00:08:39,200 --> 00:08:55,040
The really fun thing is that Gemini and OpenAI both now have streaming video APIs, where you can literally point your webcam at something, stream video images into the model, talk over them, and have it talk back to you.
114
00:08:55,040 --> 00:08:57,920
And this is absolute science fiction.
115
00:08:57,920 --> 00:09:03,720
Like Gemini managed to squeeze their version of this out the day before OpenAI did, which was extraordinary.
116
00:09:03,880 --> 00:09:07,960
OpenAI, however, productized it; it's in the ChatGPT mobile app now.
117
00:09:08,120 --> 00:09:15,160
So I can fire up ChatGPT, turn on my camera, point it at things, and start having a conversation, including with Santa Claus.
118
00:09:15,160 --> 00:09:18,120
They've got a gimmicky Santa Claus voice that you can talk to.
119
00:09:18,120 --> 00:09:19,800
Like I introduced it to my chickens.
120
00:09:19,800 --> 00:09:22,120
I said, here are my chickens, these are their names.
121
00:09:22,120 --> 00:09:24,920
And then a few minutes later I pointed at a chicken and said, which chicken is this?
122
00:09:24,920 --> 00:09:26,040
And it got the answer right.
123
00:09:26,040 --> 00:09:28,040
Like, what are we even doing?
124
00:09:28,040 --> 00:09:29,640
That's amazing, right?
125
00:09:29,880 --> 00:09:31,720
That's so exciting.
126
00:09:31,720 --> 00:09:38,040
And the accessibility implications of streaming video and audio into these things, that's extraordinary.
127
00:09:38,040 --> 00:09:39,240
Absolutely extraordinary.
128
00:09:39,240 --> 00:09:42,360
Those capabilities became available, what, three or four days ago?
129
00:09:42,360 --> 00:09:44,680
Like this is absolute cutting edge.
130
00:09:44,680 --> 00:09:46,760
This stuff is available over the APIs as well.
131
00:09:46,760 --> 00:09:48,120
Was it just yesterday?
132
00:09:48,120 --> 00:09:49,400
I'm losing track of the days.
133
00:09:49,400 --> 00:10:02,840
OpenAI now have a new WebRTC API for their real-time stuff, and I knocked out a little test web page where you can click a button, and now you're having an audio conversation with one of their GPT voices.
134
00:10:02,840 --> 00:10:06,440
And it was like a dozen lines of JavaScript to get that working.
135
00:10:06,680 --> 00:10:08,200
Unbelievable, right?
136
00:10:08,200 --> 00:10:10,920
So that stuff is, again, just so new.
137
00:10:10,920 --> 00:10:13,880
Like these streaming APIs didn't exist two weeks ago.
138
00:10:13,880 --> 00:10:16,040
Now they're rolling out.
139
00:10:16,040 --> 00:10:20,760
I feel like we've hardly even started dipping our toes into what those can do.
140
00:10:20,760 --> 00:10:27,240
The other exciting thing is that OpenAI dropped the prices of their audio API by a lot.
141
00:10:27,240 --> 00:10:31,480
Like previously it was prohibitively expensive, now it's just about affordable.
142
00:10:31,480 --> 00:10:39,480
Gemini haven't announced the pricing on their audio API yet, but all of their other models are at bargain-basement prices already.
143
00:10:39,480 --> 00:10:44,360
Part of the benefit of the competition is that the pricing just keeps on going down.
144
00:10:44,360 --> 00:10:48,400
It's unbelievably inexpensive to use these vision models right now.
145
00:10:48,640 --> 00:10:58,160
A little while ago, and when I say a while ago I mean like two weeks ago, Amazon announced their Amazon Nova models, which are effectively their version of the Google Gemini models.
146
00:10:58,160 --> 00:11:00,640
They're similarly priced, they have similar capabilities.
147
00:11:00,640 --> 00:11:22,480
And I did a napkin calculation and found that if I wanted to take 67,000 photographs from my photo library and run all of those through either Gemini 1.5 Flash or the cheap Amazon Nova one, it would cost me $10.25 for all 67,000 photos, to get an actually useful text description of each of them.
148
00:11:22,480 --> 00:11:28,160
I ran those numbers three times because I didn't believe them the first time I calculated them.
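That kind of napkin math is easy to reproduce. Here is a sketch of the calculation, with illustrative budget-tier prices and per-image token counts; the specific figures below are assumptions for demonstration, not the exact rates used in the interview:

```python
# Back-of-envelope cost estimate for captioning a photo library with a
# cheap vision model. Prices and token counts are illustrative assumptions.

def batch_cost(num_images, tokens_per_image, output_tokens_per_image,
               input_price_per_m, output_price_per_m):
    """Total cost in dollars for describing num_images photos."""
    input_cost = num_images * tokens_per_image * input_price_per_m / 1_000_000
    output_cost = num_images * output_tokens_per_image * output_price_per_m / 1_000_000
    return input_cost + output_cost

# Hypothetical budget-tier pricing: $0.075 per 1M input tokens,
# $0.30 per 1M output tokens, ~260 tokens per image, ~100 tokens per caption.
cost = batch_cost(67_000, 260, 100, 0.075, 0.30)
print(f"${cost:.2f}")  # prints $3.32
```

Whatever exact rates you plug in, the result comes out at single-digit or low double-digit dollars for tens of thousands of photos, which is why the number is hard to believe at first.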
149
00:11:28,240 --> 00:11:29,440
That's amazing.
150
00:11:29,440 --> 00:11:30,240
Wow, right?
151
00:11:30,240 --> 00:11:31,120
Absolutely incredible.
152
00:11:32,160 --> 00:11:35,680
Yeah, I was gonna ask you what tools you currently use.
153
00:11:35,680 --> 00:11:38,960
It sounds like you use everything on a daily basis.
154
00:11:38,960 --> 00:11:50,160
But from what you see now, or where you see things growing, do you believe there's a current front-runner, or somebody that's gaining constant momentum and getting ahead of the game?
155
00:11:50,160 --> 00:11:51,760
No, and this is new.
156
00:11:51,760 --> 00:11:54,560
Like, six months ago it was OpenAI, right?
157
00:11:54,560 --> 00:11:59,920
OpenAI launched GPT-4, and GPT-4 was the best available model for like nine months.
158
00:11:59,920 --> 00:12:01,520
It felt like an eternity.
159
00:12:01,520 --> 00:12:06,320
Then that broke when Anthropic released their Claude 3 Opus model.
160
00:12:06,320 --> 00:12:08,320
I'd have to look it up; I think it was March this year.
161
00:12:08,320 --> 00:12:11,680
And then they followed that up with Claude 3.5 Sonnet.
162
00:12:11,680 --> 00:12:13,520
None of these names are very catchy.
163
00:12:13,520 --> 00:12:18,000
That's been the best model, the model I use every day, for about six months now.
164
00:12:18,000 --> 00:12:21,600
And I think it's still just ahead of the rest of the pack.
165
00:12:21,600 --> 00:12:24,320
But the Gemini models are catching up really quickly.
166
00:12:24,320 --> 00:12:33,080
The OpenAI models: there are the new o1 models, and they released another one of those a few days ago; those are really interesting as well.
167
00:12:29,920 --> 00:12:43,400
But meanwhile, the Chinese labs put out Qwen 2.5, an openly licensed model that I can run on my laptop, which is in that GPT-4 space.
168
00:12:43,400 --> 00:12:49,320
The Llama models from Meta: Llama 3.3, again GPT-4 level, runs on my laptop directly.
169
00:12:49,320 --> 00:12:50,520
I never thought that would happen.
170
00:12:50,520 --> 00:12:55,400
I thought I'd need a $50,000 mainframe-like computer to run that kind of model.
171
00:12:55,400 --> 00:12:57,480
This is all changing so quickly.
172
00:12:57,480 --> 00:13:05,160
The flip side is there's a suspicion that the scaling thing is no longer holding.
173
00:13:05,160 --> 00:13:10,760
Like it used to be you chuck in more data and more parameters and more GPU time, and you get better models.
174
00:13:10,760 --> 00:13:15,960
But everything seems to be leveling off at the GPT-4o and Claude Sonnet level.
175
00:13:15,960 --> 00:13:20,440
Like the new models are slightly better, but they're not like two times better.
176
00:13:20,440 --> 00:13:26,600
And so the new frontier appears to be this idea of inference scaling.
177
00:13:26,600 --> 00:13:33,720
It's this thing where you take a model that's really strong and you just give it longer to churn away to try and come up with answers.
178
00:13:33,720 --> 00:13:36,920
OpenAI started that out with their o1 model.
179
00:13:36,920 --> 00:13:45,480
And four hours ago, Google released their first version of this, something called Gemini Flash Thinking, I think it's called.
180
00:13:45,480 --> 00:13:50,040
I just literally got some software working against that like half an hour ago.
181
00:13:50,040 --> 00:13:52,920
Um, and it looks... I'm happy. You saw it half an hour ago.
182
00:13:52,920 --> 00:13:54,280
You already have code working.
183
00:13:54,280 --> 00:13:56,440
This is Simon for you.
184
00:13:56,440 --> 00:14:02,200
And the Chinese ones: there's a Qwen model called QwQ, which is the same pattern.
185
00:14:02,200 --> 00:14:03,320
There's another one.
186
00:14:03,320 --> 00:14:05,400
And so this is all happening now.
187
00:14:05,400 --> 00:14:20,080
And what's interesting about those is that they're not better at every task, but they are better at tasks like coding and mathematics, where, if you were a human being with a big notepad and you wrote down your thinking step by step, that would help you solve the problem.
188
00:14:20,080 --> 00:14:22,400
These models are now capable of doing that kind of stuff.
189
00:14:22,400 --> 00:14:24,320
So it's a different kind of better.
190
00:14:24,320 --> 00:14:32,960
They're not just getting universally better at everything, but at certain problem-solving tasks; that's the new frontier that people are working on.
191
00:14:32,960 --> 00:14:35,360
And all of this happened so recently as well.
192
00:14:35,360 --> 00:14:40,080
Like, it's a full-time job keeping up with it, definitely.
193
00:14:40,160 --> 00:14:41,040
It really is.
194
00:14:41,040 --> 00:14:45,920
And it's definitely eating into my days, just trying to keep up.
195
00:14:45,920 --> 00:14:51,760
I am curious, since Eamon brought it up: what do you use on a daily basis?
196
00:14:51,760 --> 00:15:01,680
And what is good enough for you to have signed up for the subscription? Including, are you spending that $200 a month on the new OpenAI model?
197
00:15:01,680 --> 00:15:03,120
Is that any good?
198
00:15:03,120 --> 00:15:08,480
I've seen some people say it's not that good, and then a few people are like, this is the best thing out there.
199
00:15:08,480 --> 00:15:09,680
I'm so torn on that one.
200
00:15:09,680 --> 00:15:11,680
So I'm not spending the $200 a month yet.
201
00:15:11,680 --> 00:15:17,200
At the moment, I pay $20 a month for ChatGPT and $20 a month for Claude.
202
00:15:17,200 --> 00:15:21,520
I would be paying for GitHub Copilot, but I get it for free as an open source maintainer.
203
00:15:21,520 --> 00:15:23,680
And that's it for my subscriptions.
204
00:15:23,680 --> 00:15:28,480
But then I've also got API accounts with basically everyone, and I'm constantly experimenting with the APIs.
205
00:15:28,480 --> 00:15:32,960
The thing is, they're so cheap that most months my API bill across everyone comes to like 10 bucks.
206
00:15:32,960 --> 00:15:38,480
Like, I've never managed to spend more than $20 across all of the APIs in any given month.
207
00:15:38,480 --> 00:15:41,760
So it's not a huge amount of money that I'm spending right now.
208
00:15:41,760 --> 00:15:48,560
Yeah, the $200 thing gets you fewer caps on the o1 model.
209
00:15:48,560 --> 00:15:57,760
Like the o1 model, I think you can use it 50 times before you get locked out of it for a few days, which is a bit frustrating because I've started using that one a little bit more.
210
00:15:57,760 --> 00:16:05,560
And you get this thing called o1 Pro, which I really want to try, but I don't want to spend $200 just to try it.
211
00:16:05,720 --> 00:16:08,600
Just give me one free go at it, you know?
212
00:16:08,600 --> 00:16:14,680
I don't know, I might end up paying $200 a month at some point, but I haven't quite justified it to myself yet.
213
00:16:14,760 --> 00:16:22,440
We just had Ed Summers, the head of accessibility for GitHub, and he announced that Copilot is now free.
214
00:16:22,440 --> 00:16:26,040
So that's now, with some limits, for everyone.
215
00:16:26,120 --> 00:16:27,080
Yeah, it's uh yes.
216
00:16:27,080 --> 00:16:32,040
I forget what the limits are, but the great thing about that is it's not just a free trial.
217
00:16:32,040 --> 00:16:35,720
This is a free tier that GitHub are planning to make permanent.
218
00:16:35,720 --> 00:16:39,080
It's especially important for people around the world, you know.
219
00:16:39,080 --> 00:16:44,200
People in India are much less likely to be able to set up that credit card subscription and so forth.
220
00:16:44,200 --> 00:16:47,640
Now they get access to the Copilot experience.
221
00:16:47,640 --> 00:16:48,840
I'm really excited about that.
222
00:16:48,840 --> 00:16:54,360
I think it's actually the oldest generative AI tool in mainstream use.
223
00:16:54,360 --> 00:16:59,240
Copilot, it turns out, predates ChatGPT by nearly two years.
224
00:16:59,240 --> 00:17:02,680
They released the first version of Copilot in 2021.
225
00:17:02,680 --> 00:17:08,440
And I love that it's not a chat; or at least originally it was that autocomplete interface, which was really innovative.
226
00:17:08,440 --> 00:17:11,960
It was a really interesting way of interacting with those models.
227
00:17:11,960 --> 00:17:15,400
So yeah, I'm a huge Copilot user.
228
00:17:15,400 --> 00:17:20,040
I'm at that point now where if it's not running in my text editor, I feel restricted.
229
00:17:20,040 --> 00:17:24,360
Like, wow, now I'm having to actually type the code out in full myself.
230
00:17:24,360 --> 00:17:36,280
Yeah, and now that they've added a system prompt for accessibility, that's so helpful, because then you can really make sure that what it spits out is much more likely to be accessible.
231
00:17:36,280 --> 00:17:39,240
I've been asking them about that for a year.
232
00:17:39,240 --> 00:17:42,920
So they announced it just now at GitHub Universe.
233
00:17:43,000 --> 00:17:44,040
That's amazing.
234
00:17:44,040 --> 00:17:51,600
I wrote up a thing last night about one of the things you can do with Claude, and ChatGPT has this as well now.
235
00:17:51,600 --> 00:17:58,000
There's this thing called a project, where you set up a project and you can dump a bunch of files into it, but you can also set custom instructions in there.
236
00:17:58,000 --> 00:18:00,640
So it's a nice, easy way of doing system prompts.
237
00:18:00,640 --> 00:18:14,560
And I've been setting up little projects for different types of code that I write, with custom instructions like: always start my HTML documents in this way, always include box-sizing: border-box in the CSS, little things like that.
238
00:18:14,560 --> 00:18:16,080
Um it's fantastic.
239
00:18:16,080 --> 00:18:24,560
Like, I can now one-shot prompt a full page of working code and have all of those little ideas baked into it.
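What a "project" with custom instructions boils down to is a reusable system message prepended to every request. A minimal sketch of that pattern using the common chat-messages shape; the function name is mine for illustration, not any vendor's API:

```python
# Sketch: per-project custom instructions ride along as a system message
# on every request, so each one-off prompt inherits your house rules.

def build_messages(custom_instructions, user_prompt):
    """Combine reusable project instructions with a one-off user prompt."""
    return [
        {"role": "system", "content": custom_instructions},
        {"role": "user", "content": user_prompt},
    ]

instructions = (
    "Always start HTML documents with a doctype and a lang attribute. "
    "Always include box-sizing: border-box in the CSS."
)
messages = build_messages(instructions, "Build me a signup form page.")
print(messages[0]["role"])  # prints system
```

The point is that the instructions are written once and then apply to every prompt in the project, which is why one good accessibility example can shape everything the model produces afterwards.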
240
00:18:24,560 --> 00:18:29,280
It's also interesting because it means that you can use the model for things that aren't in its training data.
241
00:18:29,280 --> 00:18:42,000
Like, I've started using the Python uv tool a lot, which has a way of running Python scripts where you list the dependencies in a magic comment at the top of the file, and then you don't have to think about your dependencies; it just uses them correctly.
242
00:18:42,000 --> 00:18:48,560
And so I built a little custom project which teaches it, gives it one example of, here's how you list your dependencies.
243
00:18:48,560 --> 00:18:52,400
And now I can one-shot a fully working, self-contained Python script.
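The "magic comment" being described is uv's inline script metadata (standardized as PEP 723). A minimal sketch with a hypothetical file name and an empty dependency list, so it also runs under plain Python:

```python
# hello.py (hypothetical name); run with: uv run hello.py
# uv reads the inline script metadata comment below and provisions any
# listed dependencies automatically. This sketch lists none and uses only
# the standard library, so plain `python hello.py` works too.
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
import platform

def describe_runtime():
    """Report the interpreter this self-contained script is running under."""
    return f"Running under Python {platform.python_version()}"

if __name__ == "__main__":
    print(describe_runtime())
```

In a real script you would list actual packages, e.g. `# dependencies = ["httpx"]`, and uv would install them in an isolated environment before running the file.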
244
00:18:52,400 --> 00:19:09,280
So yeah, I absolutely buy that if you have expertise in accessibility, in ARIA stuff that the models aren't doing, you give them one example, just one example of your ideal framework, your ideal layout, and from that point on they'll be really good at producing code that fits that example.
245
00:19:09,280 --> 00:19:10,960
I love to see your enthusiasm here.
246
00:19:10,960 --> 00:19:14,000
Yeah, it's clear how much you love it, it really is.
247
00:19:14,000 --> 00:19:20,240
You personally created a command line tool called LLM.
248
00:19:20,240 --> 00:19:24,640
Can you share with our listeners what that's all about and how useful that can be?
249
00:19:24,640 --> 00:19:35,000
Yeah, so the initial idea around this was that I spend a lot of time in the terminal. I'm on macOS, but you know, I'm a terminal person.
250
00:19:35,160 --> 00:19:44,200
And I realized that large language models and the terminal are a really good match, because at their most basic, a large language model is a thing where you give it a prompt and it gives you a response.
251
00:19:44,200 --> 00:19:47,480
And in the terminal, you're always piping things from one tool to another.
252
00:19:47,480 --> 00:19:53,240
So wouldn't it be great if you could just pipe text into a language model and then have the response come back out again?
253
00:19:53,240 --> 00:19:55,560
So the first version of LLM was exactly that.
254
00:19:55,560 --> 00:19:58,120
It was using the OpenAI API.
255
00:19:58,120 --> 00:20:02,600
And I noticed that nobody had LLM on the Python package repository yet.
256
00:20:02,600 --> 00:20:03,560
So I grabbed that.
257
00:20:03,560 --> 00:20:09,000
It was like a namespace grab, because a three-letter acronym tool felt like a cool thing to have.
258
00:20:09,000 --> 00:20:11,080
And so I built that, and it turns out it is great.
259
00:20:11,080 --> 00:20:18,520
It's really fun being able to say cat myfile.py | llm 'explain this code' and have it spit out an explanation.
260
00:20:18,520 --> 00:20:19,800
That's really fun.
261
00:20:19,800 --> 00:20:25,560
And then I added plugin support so that it could support additional models.
262
00:20:25,560 --> 00:20:31,800
Because, you know, why talk to just OpenAI when you could talk to Anthropic or Gemini or all of these other models as well?
263
00:20:31,800 --> 00:20:37,480
And because that's based on plugins, anyone else can write a plugin that adds support for a new model.
264
00:20:37,480 --> 00:20:40,120
And I also can write plugins that do local models.
265
00:20:40,120 --> 00:20:43,560
So now my little command line tool out of the box does OpenAI.
266
00:20:43,560 --> 00:20:47,240
And if you install a plugin, it can do Gemini, and Anthropic's Claude, and so on.
267
00:20:47,240 --> 00:20:50,840
And then you can install some plugins that will install models on your laptop.
268
00:20:50,840 --> 00:20:54,600
And now you've got a completely offline language model environment.
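The plugin idea can be pictured as a registry that maps model names to interchangeable backends. This is a toy sketch of that pattern for illustration only, not the tool's actual plugin API:

```python
# Toy sketch of a plugin-style model registry: the core tool exposes a
# registry, and each "plugin" registers extra backends under new names.
MODELS = {}

def register_model(name, respond):
    """A plugin calls this to make a new model available by name."""
    MODELS[name] = respond

def prompt(model_name, text):
    """Route a prompt to whichever backend registered under that name."""
    return MODELS[model_name](text)

# The core install ships one backend; imagine it calling a hosted API.
register_model("echo", lambda text: f"echo: {text}")

# Installing a plugin just registers more backends, including fully
# local ones, without touching the core tool at all.
register_model("reverse", lambda text: text[::-1])

print(prompt("echo", "hello"))     # prints echo: hello
print(prompt("reverse", "hello"))  # prints olleh
```

The design payoff is that new models, hosted or local, slot in behind the same `prompt` interface, which is why supporting a newly released model is a matter of writing one small plugin.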
269
00:20:54,600 --> 00:20:55,640
So much fun.
270
00:20:55,640 --> 00:20:59,720
Like, it also means that whenever a new model comes out, I've got something I can do with it.
271
00:20:59,720 --> 00:21:03,000
Like I can be like, okay, new Gemini model, tap, tap, tap, tap, tap.
272
00:21:03,000 --> 00:21:06,200
Now my LLM plugin for that can support that model.
273
00:21:06,200 --> 00:21:13,320
So it sort of helps me stay on top of new developments, because I'm actually writing code that interacts with these models.
274
00:21:13,320 --> 00:21:14,680
And I use that on a daily basis.
275
00:21:14,680 --> 00:21:29,200
Like there are all sorts of things where it's convenient to be in the terminal and quickly ask a question or quickly analyze something, or you can do things like curl a URL and pipe that into the model, and now you can ask questions against a web page.
276
00:21:36,960 --> 00:21:39,040
A lot of people don't know how to use a terminal.
277
00:21:39,040 --> 00:21:41,280
Like it's a power user tool.
278
00:21:41,280 --> 00:21:47,440
And it bothers me that a lot of the stuff I'm building is then only available to people who are terminal users.
279
00:21:47,440 --> 00:21:51,840
So I have an ongoing goal to build a sort of web application on top of LLM.
280
00:21:51,840 --> 00:22:00,000
So you can type LLM space web enter, it runs a local web server, it pops open your browser, and now you've got a GUI where you can start playing with models.
281
00:22:00,000 --> 00:22:03,600
And I'm forever two weeks away from getting that feature working.
282
00:22:03,840 --> 00:22:04,640
Forever.
283
00:22:04,640 --> 00:22:05,520
Yeah.
284
00:22:05,840 --> 00:22:06,880
That'll be fun.
285
00:22:06,880 --> 00:22:10,240
And you're also doing multimodal on the command line.
286
00:22:10,240 --> 00:22:18,640
Oh, and I think I saw something you did that could be used for accessibility, like alt text or image description or something like that, correct?
287
00:22:18,640 --> 00:22:24,720
So that's something I actually use large language models for: most of my alt text these days.
288
00:22:24,720 --> 00:22:40,400
Like whenever I tweet an image or whatever, I've got a Claude Project called Alt Text Writer, and it's got a prompt and an example, and I dump an image in and it gives me the alt text. And I very rarely use it as-is, because that's rude, right?
289
00:22:40,400 --> 00:22:46,640
You should never just dump text onto people that you haven't reviewed yourself, but it's always a good starting point.
290
00:22:46,640 --> 00:22:53,280
And normally I'll edit a tiny little bit, I'll like delete an unimportant detail, or I'll bulk something up, and then I've got alt text that works.
291
00:22:53,280 --> 00:22:56,400
And often it's actually got really good taste.
292
00:22:56,400 --> 00:23:05,000
Like a great example is if you've got a screenshot of an interface, there's a lot of words on it.
293
00:22:59,840 --> 00:23:06,360
And most of those words don't matter.
294
00:23:06,440 --> 00:23:09,160
Like the message you're trying to convey in the alt text is, okay.
295
00:23:09,160 --> 00:23:14,360
It's two panels: on the left is a conversation, on the right there's a preview of the SVG file or something.
296
00:23:14,360 --> 00:23:16,840
My alt text writer normally gets that right.
297
00:23:16,840 --> 00:23:27,320
It's even good at summarizing tables of data, where it will notice that actually what really matters is that Gemini got a score of 57 and Nova got a score of 53.
298
00:23:27,320 --> 00:23:31,320
And so it'll pull those details out and ignore like the release dates and so forth.
299
00:23:31,320 --> 00:23:32,440
That's really cool.
300
00:23:32,840 --> 00:23:39,560
So it's able to prioritize what's on the overall screen, to pick out the key components and key metrics.
301
00:23:39,640 --> 00:23:40,840
It just does it already.
302
00:23:40,840 --> 00:23:42,920
It's just got good taste by default.
303
00:23:42,920 --> 00:23:44,120
And then you can always talk to it.
304
00:23:44,120 --> 00:23:50,120
So it can give you alt text, and you can reply and say, yeah, ignore this column, and then boom, it'll try again.
305
00:23:50,120 --> 00:23:51,160
I love that.
306
00:23:51,160 --> 00:23:56,280
Like I take pride in the alt text on these images, because so many people don't bother.
307
00:23:56,280 --> 00:24:14,680
And I'll often try and drop in little in-jokes, not Easter eggs that would spoil the experience for somebody who's actually using a screen reader, but just little things that make it clear that I'm trying to convey the message that's embedded in the image.
308
00:24:14,680 --> 00:24:16,920
It's really fun, you know.
309
00:24:17,160 --> 00:24:34,040
Simon, I listened to about a one-hour podcast that you did a couple of months ago, an engineering podcast, and you really did a good job of explaining where fine-tuning made sense, which I think you said usually doesn't really make sense.
310
00:24:34,040 --> 00:24:36,600
And you compared it to RAG.
311
00:24:36,600 --> 00:24:43,080
And recently there was another release that OpenAI did to improve fine-tuning.
312
00:24:43,080 --> 00:24:46,240
And where I'm going with this is I'd love for you to explain.
313
00:24:46,320 --> 00:24:54,480
Like, I'm very passionate about coding LLMs, and that the code they produce should be accessible by default.
314
00:24:54,480 --> 00:24:56,640
So there are two aspects to this.
315
00:24:56,640 --> 00:25:03,040
One is: how do you create a model, or a version of the model, that is more accessible?
316
00:25:03,040 --> 00:25:09,200
In other words, you feed it accessible code, right?
317
00:25:09,200 --> 00:25:29,600
Just to try and counteract the bad code that's in the training data, which, as you mentioned on that podcast, you're going to really struggle to do, because there's too much data that might in this case not be accessible.
318
00:25:29,600 --> 00:25:34,080
So, how would you recommend customizing the model?
319
00:25:34,080 --> 00:25:59,520
And then part two of the question is: how would you recommend that I go ahead with my mission of trying to get AI researchers to pay a little more attention to this, perhaps with a benchmark, or by somehow getting them to compete with each other, so that every time there's a new foundation model they would take a look at the accessibility and say, hey, we perform well on this benchmark?
320
00:25:59,600 --> 00:26:06,160
A benchmark is an incredibly good idea; like, an absolutely fantastic idea, to have an accessibility benchmark.
321
00:26:06,160 --> 00:26:11,840
And I feel like it's difficult, like building benchmarks is not a trivial thing, but it's definitely achievable.
322
00:26:11,840 --> 00:26:13,280
There's lots of examples out there.
323
00:26:13,280 --> 00:26:15,040
There are people who could help with that.
324
00:26:15,040 --> 00:26:16,400
I love that idea.
325
00:26:16,400 --> 00:26:28,800
Because yeah, one of the things that's becoming increasingly clear with these models is that a lot of people have this idea that all you do is scrape the entire internet, dump it into the model, and then try and get even more data and dump that in too.
326
00:26:28,800 --> 00:26:32,120
That's not actually an accurate model of how this stuff works.
327
00:26:32,120 --> 00:26:37,800
The more the AI labs experiment, the more it's becoming clear that the quality of the data matters enormously.
328
00:26:37,800 --> 00:26:41,720
Like you really don't just want a random scrape of a bunch of junk.
329
00:26:41,720 --> 00:26:44,760
You want really high quality, well curated data.
330
00:26:44,760 --> 00:26:56,920
There's a lot of work going on right now with synthetic data, where people are artificially creating vast amounts of data and feeding it into their models, because then they know they've fed in, say, a bunch of Python code that passes its unit tests.
331
00:26:56,920 --> 00:26:58,680
And that's just better.
332
00:26:58,680 --> 00:27:07,000
And the flip side of that is that occasionally you hear little hints that the labs are hiring experts just to help with their training data.
333
00:27:07,000 --> 00:27:14,440
Like they will hire expert biologists to help refine and dump in way more high-quality biology data.
334
00:27:14,440 --> 00:27:20,520
There is no reason at all that they couldn't hire expert accessibility engineers to help curate and dump stuff in.
335
00:27:20,520 --> 00:27:23,160
They just need to see that there's demand for that.
336
00:27:23,160 --> 00:27:27,160
So yeah, if there were benchmarks, that would help move the needle on that one.
337
00:27:27,480 --> 00:27:31,960
I've written to all of the foundation model companies and not gotten any response so far.
338
00:27:31,960 --> 00:27:34,200
But anyway, what were you gonna say?
339
00:27:34,200 --> 00:27:37,320
Well, yeah, we should talk a little bit more about fine-tuning.
340
00:27:37,320 --> 00:27:43,880
So everyone who starts working on these models, one of the first things they think is: I wish it knew my stuff, right?
341
00:27:43,880 --> 00:27:47,240
I wish it had been trained on all of the documents within my company.
342
00:27:47,240 --> 00:27:51,960
And so obviously, I should fine-tune a model to train it to understand that information.
343
00:27:51,960 --> 00:27:54,280
That's the thing which mostly doesn't work.
344
00:27:54,280 --> 00:28:01,440
I mean, you can try to do it, but it turns out that dumping a little bit of extra information into a model that's been trained on a giant scrape of the internet doesn't get you far.
345
00:28:01,440 --> 00:28:05,080
There's so much in there already, it's very difficult to bias it in the correct direction.
346
00:28:05,080 --> 00:28:09,480
And my big frustration with fine-tuning is lots of people will sell it to you.
347
00:28:09,480 --> 00:28:12,120
There are very expensive APIs from all of these providers.
348
00:28:12,120 --> 00:28:14,840
There are companies and startups that will help you do this.
349
00:28:15,040 --> 00:28:24,240
But when you ask them for demos, I just want somebody to show me a really clear demo: look, here's the default model, here's the fine-tuned one; the default one sucks at answering this question.
350
00:28:24,240 --> 00:28:26,080
The fine-tuned one is really good at it.
351
00:28:26,080 --> 00:28:33,120
And these demos are really hard to come across, which is one of the reasons I remain skeptical of fine-tuning as a technique.
352
00:28:33,120 --> 00:28:42,800
I think someday it's going to be useful and people will have those demos, but right now I feel like you can spend a lot of time and money and energy and just not get really great results out of it.
353
00:28:42,800 --> 00:28:51,680
The flip side is that the thing that's getting increasingly easy these days is just straight-up prompting using these long-context models.
354
00:28:51,680 --> 00:29:01,840
So just two years ago, most models only accepted up to like 8,000 tokens, which is maybe 20 pages of text.
355
00:29:01,840 --> 00:29:03,760
I'd have to look that up.
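The 20-pages figure roughly checks out. A back-of-the-envelope conversion, assuming common rules of thumb of about 0.75 English words per token and about 300 words per printed page (both approximations, not exact figures):

```python
# Back-of-the-envelope check of "8,000 tokens is about 20 pages".
# Both constants are rough rules of thumb, not exact figures:
# ~0.75 English words per token, ~300 words per printed page.

WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def tokens_to_pages(tokens: int) -> float:
    """Approximate page count for a given token budget."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(tokens_to_pages(8_000)))      # 8,000 tokens: about 20 pages
print(round(tokens_to_pages(1_000_000)))  # a million tokens: about 2,500 pages
```

By the same arithmetic, a million-token context window works out to thousands of pages, which is why "multiple novels in one go" is plausible.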
356
00:29:03,760 --> 00:29:11,520
Today, almost all of the good models will accept 100,000 tokens, and Gemini takes a million or two million tokens.
357
00:29:11,520 --> 00:29:16,640
That's enough that you can dump multiple novels' worth of information into Gemini in one go.
358
00:29:16,640 --> 00:29:31,200
Which means if you wanted to build a model that was really good at accessibility engineering, find like 10,000 or 50,000 tokens' worth of really high quality code, stick that in the prompt, and it'll pick up from those examples.
359
00:29:31,200 --> 00:29:33,520
Models are amazingly receptive to examples.
360
00:29:33,520 --> 00:29:37,440
Like that's the most powerful way to work with them is to give them examples of what you want.
361
00:29:37,440 --> 00:29:48,080
And honestly, even three or four really good examples of well-written accessible code might be enough to start the models along the right route.
362
00:29:48,080 --> 00:29:50,160
And that's a really cheap experiment to run.
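The few-shot approach being described is just prompt assembly: instructions first, then a handful of curated high-quality examples, then the actual task. A minimal sketch, where the example snippets, the wording, and the function name are all hypothetical rather than from any real curated dataset:

```python
# Sketch of the few-shot approach: instructions, then curated examples of
# accessible HTML, then the task. All examples and wording are hypothetical.

EXAMPLES = [
    '<img src="chart.png" alt="Bar chart: revenue grew 40% from 2022 to 2023">',
    '<button aria-expanded="false" aria-controls="menu">Open menu</button>',
    '<label for="email">Email address</label> <input id="email" type="email">',
]

def build_few_shot_prompt(task: str) -> str:
    """Assemble a prompt that leads with accessible-code examples."""
    parts = [
        "You write HTML that follows accessibility best practices.",
        "Imitate the style of these examples:",
    ]
    for i, example in enumerate(EXAMPLES, 1):
        parts.append(f"Example {i}:\n{example}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt("Build a signup form with two fields.")
```

The resulting string would be sent as the prompt to whichever model you're testing, which is what makes this such a cheap experiment to iterate on.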
363
00:29:50,160 --> 00:30:02,280
There are also these prompt caching mechanisms that a lot of the providers have now, where if you send the exact same prompt prefix, it costs way less money on the second and third and fourth goes.
364
00:29:59,840 --> 00:30:03,240
And that's really useful as well.
365
00:30:03,320 --> 00:30:09,400
If you're gonna have a long prompt full of examples, you pay money up front for the first call, and from then on it gets cheaper.
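The caching economics can be illustrated with toy arithmetic. The per-token price and the discount on cached prefix tokens below are made-up placeholders, not any provider's real rates:

```python
# Toy arithmetic for prompt caching. The per-token price and the discount on
# cached prefix tokens are made-up placeholders, not real provider rates.

PRICE_PER_TOKEN = 0.000003   # hypothetical dollars per input token
CACHED_DISCOUNT = 0.9        # hypothetical: cached prefix tokens cost 90% less

def request_cost(prefix_tokens: int, new_tokens: int, prefix_cached: bool) -> float:
    """Cost of one request: a long shared prefix plus some fresh tokens."""
    prefix_rate = PRICE_PER_TOKEN * (1 - CACHED_DISCOUNT) if prefix_cached else PRICE_PER_TOKEN
    return prefix_tokens * prefix_rate + new_tokens * PRICE_PER_TOKEN

# First call pays full price for a 50,000-token prompt full of examples;
# later calls hit the cache for that prefix and cost a fraction as much.
first = request_cost(50_000, 200, prefix_cached=False)
later = request_cost(50_000, 200, prefix_cached=True)
```

The bigger the shared example-filled prefix is relative to the fresh part of each request, the more of the bill the cache discount covers.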
366
00:30:09,400 --> 00:30:10,520
I think that's the way to do it.
367
00:30:10,520 --> 00:30:12,680
It's also really quick to iterate on these.
368
00:30:12,680 --> 00:30:17,560
Like you build a really big prompt, try it, then you tweak it and try it again and see if you get better results.
369
00:30:17,560 --> 00:30:20,840
So I think that's the most promising avenue right now.
370
00:30:20,920 --> 00:30:27,000
Now I finally understand the caching, because I didn't totally get it before: it's the system prompt.
371
00:30:27,000 --> 00:30:32,200
That makes so much sense, because that's going to be a hit every single time.
372
00:30:32,200 --> 00:30:35,400
I mean, it's also common prefixes.
373
00:30:35,400 --> 00:30:43,720
So you might have a system prompt that says you're a useful robot that answers questions based on this document, and then a regular prompt that's the document, and then questions after that.
374
00:30:43,720 --> 00:30:46,920
And as long as the document stays the same, you'll get that benefit.
375
00:30:46,920 --> 00:30:55,080
Also, if you're thinking about doing chat interfaces, the way chats work is each time you say something new, it replays the previous conversation.
376
00:30:55,080 --> 00:30:57,320
And again, that's where caching kicks in.
377
00:30:57,320 --> 00:31:02,440
So if the caching is happening, the subsequent posts in the conversation save a lot of money.
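The replay behavior being described is why chat plus caching works so well: each new turn re-sends the whole prior conversation, so the transcript forms a growing common prefix. A minimal sketch, using an illustrative message format rather than any specific provider's API:

```python
# Why chat plus caching works: each new turn replays the whole prior
# conversation, so the transcript is a growing common prefix that a
# provider-side cache can recognise. The message shape is illustrative.

def next_request(history: list[dict], user_message: str) -> list[dict]:
    """Build the messages for the next turn by replaying the history."""
    return history + [{"role": "user", "content": user_message}]

history = [{"role": "system", "content": "You are a helpful assistant."}]
turn1 = next_request(history, "Summarise this document.")
# The model's reply is appended, then the next turn replays everything:
history = turn1 + [{"role": "assistant", "content": "Here is a summary..."}]
turn2 = next_request(history, "Now shorten it to one sentence.")

# turn2 begins with turn1's messages verbatim: the cacheable prefix.
assert turn2[:len(turn1)] == turn1
```

Because every request starts with the previous request verbatim, the cache can cover everything except the newest messages.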
378
00:31:02,440 --> 00:31:03,240
That makes sense.
379
00:31:03,240 --> 00:31:04,280
That makes sense.
380
00:31:04,280 --> 00:31:13,720
And then for your approach of these really long prompts with examples in them, how would you contrast that with RAG as an approach?
381
00:31:13,720 --> 00:31:15,800
I think it's the same kind of idea.
382
00:31:15,800 --> 00:31:19,640
So RAG stands for retrieval-augmented generation.
383
00:31:19,640 --> 00:31:39,240
And the first version of it was a trick where you ask the model a question, and rather than just answering, it goes and looks in your big corpus of documents for anything that looks roughly similar. Basically, it does a search, gets the results, sticks those into the prompt hidden from you, and then tries to answer the question.
384
00:31:39,240 --> 00:31:40,840
It's a really effective trick.
385
00:31:40,840 --> 00:31:46,800
The answer to "how do I teach the model about my company?" isn't fine-tune a model.
386
00:31:46,800 --> 00:31:50,080
It's to set up a RAG system that can run searches against things.
387
00:31:50,080 --> 00:31:57,440
And really the lesson from that is most of prompt engineering, most of building on top of LLMs is thinking about the context.
388
00:31:57,440 --> 00:32:06,320
It's thinking: okay, what is the best thing I can cram into those 8,000 or 100,000 or a million tokens to increase the chance that I get a good answer?
389
00:32:06,320 --> 00:32:26,400
And yeah, the examples thing is almost like a fixed version of RAG. You could have a system where the user says, I want to build an interface that does this, and you effectively do a RAG search against a hundred examples, find the five most relevant pieces of example code, bung those in the prompt, and then answer the question that way.
390
00:32:26,400 --> 00:32:27,440
And that would work really well.
391
00:32:27,440 --> 00:32:29,440
That's like a very effective technique.
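That search-then-prompt loop can be sketched in a few lines. This toy version scores documents by word overlap instead of embeddings, and the corpus and wording are invented purely for illustration:

```python
# Toy sketch of the RAG pattern just described: score documents by word
# overlap with the question, put the best matches into the prompt hidden
# from the user, then ask the model. Real systems usually use embedding
# search, but the overall shape is the same. Corpus and wording are invented.

CORPUS = {
    "billing": "Invoices are sent on the 1st of each month via email.",
    "support": "Support hours are 9am to 5pm Pacific, Monday to Friday.",
    "refunds": "Refunds are processed within 5 business days of a request.",
}

def _words(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def search(question: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    return sorted(CORPUS.values(),
                  key=lambda doc: len(_words(question) & _words(doc)),
                  reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Hide the retrieved context inside the prompt, RAG-style."""
    context = "\n".join(search(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How quickly are refunds processed?")
```

The assembled prompt, context plus question, is what actually gets sent to the model; the user only ever sees their question and the answer.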
392
00:32:29,440 --> 00:32:36,240
You touched upon it a few minutes ago, how AI has actually got the potential to generate jobs.
393
00:32:36,240 --> 00:32:42,800
As you well know, there's a general concern out there right now about AI replacing many jobs.
394
00:32:42,800 --> 00:32:44,000
What's your response to that?
395
00:32:44,000 --> 00:32:48,160
I know it's a very general question, but it is one of the larger concerns out there right now.
396
00:32:48,160 --> 00:32:49,520
And it deservedly should be.
397
00:32:49,520 --> 00:32:52,640
This is a very disruptive technology.
398
00:32:52,640 --> 00:32:59,760
There are jobs that will be damaged by this, and there are jobs that will be enhanced by this.
399
00:32:59,760 --> 00:33:05,680
And so there's the pessimistic and the optimistic way to look at this.
400
00:33:05,680 --> 00:33:13,520
And I can actually focus on this as a software engineer because it turns out writing code is one of the things these models are best at.
401
00:33:13,520 --> 00:33:24,160
It's interesting: the great thing about code is that it's got fact-checking built in, because if a model spits out code and you run it and get an error, then obviously there's a mistake.
402
00:33:24,160 --> 00:33:32,200
If it spits out an essay, you need to fact check every line of that essay, and that's a much harder process than just trying to run the compiler or whatever.
403
00:33:32,200 --> 00:33:35,080
So a lot of software engineers are terrified of this.
404
00:33:35,080 --> 00:33:41,400
They're like, hey, this is a technology where anyone can get it to write code now.
405
00:33:41,400 --> 00:33:43,880
My 20 years of experience are no longer valuable.
406
00:33:43,880 --> 00:33:50,280
I need to find a new career in plumbing or something which won't be replaced by AI.
407
00:33:50,280 --> 00:34:01,080
My perspective, as a developer who's been using these systems on a daily basis for a couple of years now, is that they enhance my value.
408
00:34:01,080 --> 00:34:08,120
Like I am so much more competent and capable as a developer because I've got these tools assisting me.
409
00:34:08,120 --> 00:34:14,840
I can write code in dozens of new programming languages that I never learned before, but I still get to benefit from my 20 years of experience.
410
00:34:14,840 --> 00:34:35,080
Take somebody off the street who's never written any code before and ask them to build an iPhone app with ChatGPT, and they are going to run into so many pitfalls, because programming isn't just about whether you can write code; it's about thinking through the problems, understanding what's possible and what's not, understanding how to QA, knowing what good code is, having good taste.
411
00:34:35,080 --> 00:34:38,120
There's so much depth to what we do as software engineers.
412
00:34:38,120 --> 00:34:48,360
And I've said before that generative AI probably gives me a two-to-five-times productivity boost on the part of my job that involves typing code into a laptop.
413
00:34:48,360 --> 00:34:50,120
But that's only 10% of what I do.
414
00:34:50,120 --> 00:34:56,360
Like as a software engineer, most of my time isn't actually spent with the typing of the code, it's all of those other activities.
415
00:34:56,360 --> 00:34:59,000
The AI systems help with those other activities too.
416
00:34:59,000 --> 00:35:07,640
They can help me think through architectural decisions and research library options and so on, but I still have to have the agency to understand what I'm doing.
417
00:35:07,640 --> 00:35:10,920
So, as a software engineer, I don't feel threatened.
418
00:35:10,920 --> 00:35:20,480
My most optimistic view of this is that the cost of developing software goes down, because an engineer like myself can be more ambitious and can take on more things.
419
00:35:20,480 --> 00:35:22,880
As a result, demand for software goes up.
420
00:35:22,880 --> 00:35:38,720
Because if you're a company that previously would never have dreamed of building, like, a custom CRM for your industry, because it would have taken 20 engineers a year before you got any results, if it now takes four engineers three months to get results, maybe you're in the market for software engineers in a way that you weren't before.
421
00:35:38,720 --> 00:35:40,880
But that's the software engineering side of things.
422
00:35:40,880 --> 00:35:43,520
That's my rose-tinted-glasses version.
423
00:35:43,520 --> 00:35:49,280
There are other industries where this stuff is just massively disruptive, and I don't think there's a happy ending.
424
00:35:49,280 --> 00:35:52,000
My favorite example there is language translation, right?
425
00:35:52,000 --> 00:36:14,400
If you are a human being who earns money translating text from one language into another, the models are not as good at it as you, but they are good enough, and so vanishingly inexpensive, that a lot of translators now are finding that their job has changed from "translate this from one language to another" to "here is an AI-generated translation, fix it up", and you get paid less for that.
426
00:36:14,400 --> 00:36:15,280
And that sucks.
427
00:36:15,280 --> 00:36:24,800
That's an entire industry of people where, even before ChatGPT, Google Translate about five years ago got good enough that the industry took a massive hit.
428
00:36:24,800 --> 00:36:29,360
The question then is: how many other examples are there like that?
429
00:36:29,360 --> 00:36:46,480
Professional illustrators who worked at the lower end of the scale, like being commissioned to illustrate blog posts, are having a terrible time, because you can now prompt an image generation model and get an illustration that's good enough for your blog post, right?
430
00:36:46,480 --> 00:36:54,400
It's nowhere near what a professional illustrator could do, but it's really taking a chunk out of that end of the market.
431
00:36:54,400 --> 00:37:00,440
I think in the movie industry, the group that are most affected, to my understanding, is concept artists.
432
00:37:00,760 --> 00:37:09,320
It used to be that if a director was dreaming up a sci-fi scenario, they paid artists to come up with those initial concepts to help them think it through.
433
00:37:09,320 --> 00:37:12,840
That's the kind of thing which maybe they're turning to generative AI for.
434
00:37:12,840 --> 00:37:20,520
So yeah, I am not gonna say that there aren't huge negative implications for a whole bunch of people around this stuff.
435
00:37:20,520 --> 00:37:33,480
And ideally this shakes out to a point where maybe concept artists are taking on more ambitious projects and find a new niche in the market that pays well and so forth, but I can't guarantee that's going to happen.
436
00:37:33,480 --> 00:37:34,920
And that sucks, you know.
437
00:37:34,920 --> 00:37:47,080
It sucks. You talk about cars and horses, where the automobile came along and caused mass unemployment among people in the horse industry, and created loads more jobs.
438
00:37:47,080 --> 00:37:49,560
Like now there are more jobs as drivers and people making cars.
439
00:37:49,560 --> 00:37:53,720
It still sucks if you're a professional farrier working on horses, you know.
440
00:37:53,720 --> 00:37:59,560
Being disrupted by technology that creates new jobs isn't great if you're one of those jobs that gets disrupted.
441
00:37:59,640 --> 00:38:00,600
It's interesting.
442
00:38:00,600 --> 00:38:08,440
Justine Bateman, who in my youth was Mallory on Family Ties.
443
00:38:08,440 --> 00:38:11,880
She has a computer science degree, I believe.
444
00:38:11,880 --> 00:38:19,720
And she has spoken out as an actress and a filmmaker; she has been railing against AI in a really big way.
445
00:38:19,720 --> 00:38:31,800
And it's kind of interesting to see a techie go against it, because even though it is awful, the jobs that are going to be lost, there's nothing we're gonna do to stop it.
446
00:38:31,800 --> 00:38:38,560
And it will at least take it away from the big names in Hollywood and democratize it.
447
00:38:38,560 --> 00:38:42,640
So a lot of people are going to be able to make movies for a lot cheaper.
448
00:38:42,640 --> 00:38:43,160
Right?
449
00:38:43,160 --> 00:38:43,480
Right.
450
00:38:43,480 --> 00:38:45,680
It's like the thing with the translators.
451
00:38:45,920 --> 00:38:46,880
There's a trade-off here.
452
00:38:44,920 --> 00:38:51,760
On the one hand, it really sucks if that was your profession and it's been impacted in that way.
453
00:38:51,920 --> 00:38:59,200
But if we now have technology, that means a billion people can have conversations who never could have afforded a human translator before.
454
00:38:59,200 --> 00:39:08,480
I mean, I don't like to make straight-up statements about one thing being worth more than another, but that's a pretty compelling trade-off at that point, you know.
455
00:39:08,480 --> 00:39:16,960
And yeah, so much of filmmaking is expensive and slow-moving and frustrating.
456
00:39:16,960 --> 00:39:23,360
As a programmer, I love when frustrating parts of my job get sped up.
457
00:39:23,360 --> 00:39:36,800
But does the economy shake out so that people who were doing skilled but frustrating aspects of their work still get employed, and earn more money doing more creative things?
458
00:39:36,800 --> 00:39:38,080
God, I hope so.
459
00:39:38,080 --> 00:39:38,960
But I don't know.
460
00:39:38,960 --> 00:39:41,920
I'm not in that industry, I'm not an economist.
461
00:39:41,920 --> 00:39:46,880
I can't say with any certainty that it's gonna play out in a good way.
462
00:39:47,040 --> 00:39:48,240
That's my fear too.
463
00:39:48,240 --> 00:39:50,560
And here's another angle for you.
464
00:39:50,560 --> 00:40:10,240
So yesterday, the sister of a friend of mine got scammed out of $20,000; they kept her on the phone the entire time as she went from Bitcoin ATM to Bitcoin ATM, throwing in some money here, some money there.
465
00:40:10,240 --> 00:40:14,560
Otherwise she would get arrested, and she totally fell for it.
466
00:40:14,560 --> 00:40:41,880
And it hit me that it won't be long before you're going to see scammers take the voice of somebody you know. A lot of people in AI have talked about this, and the proposed solution is that everybody should create a safe word. But then it hit me that all you have to do to get that safe word is call the person whose safe word you need and whose voice you're going to grab.
467
00:40:41,880 --> 00:40:42,840
And you know what I mean?
468
00:40:42,840 --> 00:40:53,240
Like if you have, let's say, two siblings, you get both of their voices, you use ElevenLabs or something to emulate them, and then you play man-in-the-middle, right?
469
00:40:53,240 --> 00:40:56,600
And you grab that safe word and then you hand it over.
470
00:40:56,600 --> 00:41:07,080
It sort of feels like you need a double safe word; you really have to spend some time to get this right, and most people are not going to be able to handle that.
471
00:41:07,080 --> 00:41:12,920
Yeah, I mean, bad people can use this technology to do a lot of bad things.
472
00:41:12,920 --> 00:41:16,680
And in most of these cases, there's always an argument they could have done it before.
473
00:41:16,680 --> 00:41:21,320
Like if you're talented at impersonating voices, you could have pulled off that scam before.
474
00:41:21,320 --> 00:41:29,000
But not a lot of people are talented at impersonating voices, and that extra friction meant it wasn't necessarily a widespread scam.
475
00:41:29,000 --> 00:41:33,080
The quality of voice cloning and so forth these days is shockingly good.
476
00:41:33,080 --> 00:41:41,880
I recently found out that the really good OpenAI voices can be trained on like a 15-second audio sample.
477
00:41:41,880 --> 00:41:45,720
They pay a professional voice actor for 15 seconds of their voice.
478
00:41:45,720 --> 00:41:52,040
And they've deliberately not made that capability available to everyone else, but that's just the way the models work.
479
00:41:52,040 --> 00:42:03,400
So yeah, one of the other things that scares me about this stuff is: as a society, are we ready to understand and cope with this?
480
00:42:03,400 --> 00:42:07,560
And if not, how quickly can we get up to speed?
481
00:42:07,560 --> 00:42:12,280
The one that worries me the most isn't voice cloning, it's the romance scams, right?
482
00:42:12,280 --> 00:42:34,720
The thing where you get a text message out of the blue, you reply, and they try to form a relationship with you. Those romance scams have been run out of effectively sweatshops in places like Indonesia and the Philippines for years, where they get people with good written English skills and effectively force them to pull these scams on people. It's even cheaper if you can get a generative AI model to do that.
483
00:42:34,720 --> 00:42:59,360
And yeah, I think that's just going to be a growing problem: scams are going to become more prevalent because they're cheaper to run. It's always scale. So many of these problems come down to the fact that the bad thing was possible before, but now it's possible at a hundred times the volume. And how do we fight back against that? I don't know.
484
00:42:59,360 --> 00:43:12,480
Yeah, I watched a segment about the romance scams, it was like 16 minutes on some show last week, and it was exactly about that, how it's growing and growing globally. It's just so unfair, it really is.
485
00:43:13,360 --> 00:43:23,200
Many of our listeners on today's podcast will be interested to get a little more insight into your Django story, how you created it.
486
00:43:23,200 --> 00:43:53,800
I know we don't have a lot of time, but could you give us an overview of how it was initiated, your journey there, and where it's at today? Absolutely. So this is going back a long time, 21 years ago, to 2003. I was a university student and I had a blog, and in 2003 there were only about a hundred people with blogs talking about web development, so we all knew each other. And this chap, Adrian Holovaty, was a journalist and web developer working in Kansas, and on his blog he put up a job ad.
487
00:43:53,800 --> 00:43:57,480
And my university offered a year-in-industry placement program.
488
00:43:57,480 --> 00:44:01,960
So you could take a year off from university, go and work somewhere, and then come back again.
489
00:44:01,960 --> 00:44:04,600
And it meant that you could get a student visa.
490
00:44:04,600 --> 00:44:11,960
So I got in touch with Adrian and said, hey, would this work as a year-long paid internship kind of arrangement?
491
00:44:11,960 --> 00:44:12,680
And it did.
492
00:44:12,680 --> 00:44:14,600
So I moved out, moved from London.
493
00:44:14,600 --> 00:44:20,840
I moved from England to Lawrence, Kansas, and spent a year working at this tiny little local newspaper.
494
00:44:20,840 --> 00:44:28,440
And yeah, Adrian and I were both PHP developers who wanted to use Python, and none of the Python web frameworks at the time quite did what we wanted.
495
00:44:28,440 --> 00:44:36,440
So we ended up building our own little thin abstraction layer over the mod_python Apache module to build newspaper websites with.
496
00:44:36,440 --> 00:44:39,960
And honestly, we had no idea that it was ever going to be an open source thing.
497
00:44:39,960 --> 00:44:43,960
We thought it was the CMS that we were using to build these newspaper websites.
498
00:44:43,960 --> 00:44:45,480
But um, I was there for a year.
499
00:44:45,480 --> 00:44:56,600
I left, and then six months after I left, they got the go-ahead from the newspaper's publishers to release this as open source, partly because Ruby on Rails had just come out and was taking the world by storm.
500
00:44:56,600 --> 00:45:01,560
And they were looking at their thing and saying, hey, we've got a thing that looks a bit like Ruby on Rails, but it's for Python.
501
00:45:01,560 --> 00:45:04,600
This company, 37signals, are doing well out of their release.
502
00:45:04,600 --> 00:45:06,680
We should go ahead and put that out into the world.
503
00:45:06,680 --> 00:45:07,320
And they did.
504
00:45:07,320 --> 00:45:13,240
And they called it Django because Adrian Holovaty is a huge Django Reinhardt gypsy jazz fan.
505
00:45:13,240 --> 00:45:18,600
He actually has YouTube videos where he does gypsy jazz guitar covers of different things.
506
00:45:18,840 --> 00:45:20,600
He's a very talented musician.
507
00:45:20,600 --> 00:45:26,360
Yeah, and that put Django out into the world, and it just grew and grew and grew.
508
00:45:26,360 --> 00:45:29,240
And it's been, yeah, it's been out for nearly 20 years now.
509
00:45:29,240 --> 00:45:35,400
We're planning a 20th birthday party for it, hopefully next year, which would be really fun.
510
00:45:35,400 --> 00:45:41,000
But yeah, so I was involved at the very start, and then tangentially involved after that.
511
00:45:41,000 --> 00:45:49,040
I haven't been a core contributing developer for a very long time, but I'd throw ideas over the fence and occasionally knock up a few patches and so forth.
512
00:45:49,040 --> 00:45:50,480
Um, it's just been amazing.
513
00:45:50,480 --> 00:45:57,280
Like watching that grow, watching the community around it grow around the world and seeing all of these things that people have built on top of it.
514
00:45:57,280 --> 00:46:00,480
Yeah, I'm really excited to see how that's worked out.
515
00:46:00,480 --> 00:46:04,720
The engineering quality of uh Django is just top notch.
516
00:46:04,720 --> 00:46:08,480
The only ORM I ever liked, I always hate ORMs.
517
00:46:08,480 --> 00:46:09,920
It's like just goes straight.
518
00:46:10,400 --> 00:46:11,920
And that was nothing to do with me.
519
00:46:11,920 --> 00:46:20,960
The ORM, so, when I was working on it, Adrian built a code generator, because database code is really repetitive.
520
00:46:20,960 --> 00:46:25,280
So he built a thing that generated Python code for you to talk to your models.
521
00:46:25,280 --> 00:46:36,560
And then Malcolm Tredinnick was the person who joined the Django community and helped turn that into what we have today, an incredible piece of work.
522
00:46:36,560 --> 00:46:37,680
It's such a good design.
523
00:46:37,680 --> 00:46:42,000
And yeah, for years after the ORM came out, I still wasn't very good at SQL.
524
00:46:42,000 --> 00:46:44,000
I just relied on what the Django ORM did.
525
00:46:44,000 --> 00:46:49,920
It's only in the past maybe five years that I've got super confident in using SQL for these things instead.
526
00:46:49,920 --> 00:46:51,200
Yeah, interesting.
527
00:46:51,200 --> 00:46:58,560
Well, speaking of data, you have another project called Datasette, which I would love for you to explain.
528
00:46:58,560 --> 00:47:01,680
What does it do that no other database does?
529
00:47:01,680 --> 00:47:06,400
Like what problem is it that you're trying to solve, and where are you going with Datasette?
530
00:47:06,400 --> 00:47:08,320
This is a very interesting question.
531
00:47:08,320 --> 00:47:14,160
And I wish I had the one-sentence answer, but I don't, so I'll have to give you a few paragraphs.
532
00:47:14,160 --> 00:47:23,440
So, Datasette is an open source tool I've been building for nearly seven years now, and it's a Python web application for exploring, analyzing, and publishing data.
533
00:47:23,440 --> 00:47:31,080
The initial idea actually came out of work I'd done at newspapers, where, when you're a newspaper, you often publish data-driven stories.
534
00:47:31,160 --> 00:47:38,200
You'll have a story about the number of hospital beds currently available across the state or whatever.
535
00:47:38,200 --> 00:47:39,960
And those stories come with data.
536
00:47:39,960 --> 00:47:45,080
And I wanted to start encouraging newspapers to publish the data behind the stories.
537
00:47:45,080 --> 00:47:59,000
This is something we started doing when I worked at The Guardian back in 2010, 2011, where the idea was you'd publish a story and then we'd put out the data behind the story, and we'd just publish it as a Google spreadsheet.
538
00:47:59,000 --> 00:48:03,800
We'd have a Google spreadsheet with these are the raw numbers that went into this piece of reporting.
539
00:48:03,800 --> 00:48:10,120
I always felt there should be a better thing for that than a Google spreadsheet, something a little bit more open, with more capabilities.
540
00:48:10,120 --> 00:48:12,280
And so the first version of Datasette was just that.
541
00:48:12,280 --> 00:48:29,720
It was like, okay, take a bunch of data, stick it in a SQLite database, because SQLite doesn't require a separate server, it's just a file, and then deploy an application that gives you a little interface, like a web UI on top of this database where you can click around through it, and a JSON API so that you can start building things against it as well.
542
00:48:29,720 --> 00:48:33,880
And then other features like the ability to export CSV versions and so forth.
543
00:48:33,880 --> 00:48:35,160
So that was the initial idea.
544
00:48:35,160 --> 00:48:38,600
It was what's the best possible way of publishing data online?
545
00:48:38,600 --> 00:48:41,480
Because to my surprise, there weren't really any solutions to that.
546
00:48:41,480 --> 00:48:48,360
Like if you want to publish a million rows of data online, your options are basically stick a CSV file in an S3 bucket.
547
00:48:48,360 --> 00:48:49,560
And how's that useful?
548
00:48:49,560 --> 00:48:56,600
Like, it's great for CSV nerds, but it's not exactly something the general public can engage with.
549
00:48:56,600 --> 00:49:03,080
So that was the initial idea, and then I added plugins, and I realized that my inspiration there was actually WordPress, right?
550
00:49:03,080 --> 00:49:12,360
Where WordPress is a perfectly decent blogging engine with tens of thousands of plugins that mean any publishing problem you have, you can solve with WordPress plus some plugins.
551
00:49:12,360 --> 00:49:17,120
And I thought, okay, what if that was the answer for data exploration and analysis projects?
552
00:49:17,120 --> 00:49:29,440
Like any project you have that involves data, which is basically everything, if you could take dataset plus specific plugins for visualizations that you want to run or export formats, that would be a really cool thing to build.
553
00:49:29,440 --> 00:49:32,800
So that's how the project's been evolving over the past few years.
554
00:49:33,200 --> 00:49:45,440
And so there are, I think, 150 plugins now for things like GeoJSON export, or visualizing everything on a map, and I've started building plugins for editing data as well.
555
00:49:45,440 --> 00:49:56,880
So you can actually use Datasette as a kind of Airtable alternative, where you're loading in your data, making edits to it, running transformations against it, doing geocoding operations, all of that kind of thing.
556
00:49:56,880 --> 00:50:05,600
And I love this project because, thanks to plugins, if there's anything in the world that I think is interesting, I can justify it as a plugin for Datasette.
557
00:50:05,600 --> 00:50:18,160
I can be like, okay, this week I'm into GIS and I'm gonna do geospatial plugins, and then next week I'm doing some weird AI stuff, and I could write plugins for Datasette that use language models to generate SQL queries or whatever it is.
558
00:50:18,160 --> 00:50:21,840
So effectively, this is the project I want to work on for the rest of my life.
559
00:50:21,840 --> 00:50:24,560
If I'm gonna do that, it needs to earn its keep.
560
00:50:24,560 --> 00:50:32,400
So for the past year and a bit, I've been putting together the business model side of it, which is effectively the WordPress thing again.
561
00:50:32,400 --> 00:50:34,880
It's the hosted SaaS version of Datasette.
562
00:50:34,880 --> 00:50:36,800
So Datasette is completely open source.
563
00:50:36,800 --> 00:50:42,160
If you are comfortable running an Ubuntu virtual machine somewhere, go ahead and install it and run it.
564
00:50:42,160 --> 00:50:52,160
Or you'll be able to pay me a sum a month and I will run a private Datasette instance for you and your team, with all of the plugins and the integrations and API stuff and all of that kind of thing.
565
00:50:52,160 --> 00:51:02,280
And I'm working on it at the moment; I need to put the final touches on the billing side so I can actually turn on self-service payments for it, but it's getting there.
566
00:51:02,280 --> 00:51:06,040
That's an exciting one; that's called Datasette Cloud.
567
00:50:59,600 --> 00:51:12,520
It's datasette.cloud, as opposed to Datasette, which is datasette.io, spelt like the word cassette.
568
00:51:12,520 --> 00:51:15,960
So it's d-a-t-a-c-a-s-e-t-t-e.
569
00:51:15,960 --> 00:51:21,880
But yeah, so that's something I'm spending a lot of time on at the moment: the commercial side of the open source project.
570
00:51:21,880 --> 00:51:23,240
Did you say C?
571
00:51:23,240 --> 00:51:24,360
Do you mean S?
572
00:51:24,360 --> 00:51:25,160
I did mean S.
573
00:51:25,160 --> 00:51:26,520
D-A-T-A-S-E-T-T-E.
574
00:51:27,960 --> 00:51:28,680
Thank you for that.
575
00:51:28,680 --> 00:51:29,320
Yeah.
576
00:51:29,400 --> 00:51:35,080
So, Simon, obviously the engagement between developers and accessibility community is key.
577
00:51:35,080 --> 00:51:39,560
It's key for the ongoing progress, both from a conformance and usability perspective.
578
00:51:39,560 --> 00:51:43,480
As a lead developer yourself, you know, how can we bridge that gap?
579
00:51:43,640 --> 00:51:47,480
What are the specific areas we can assist with or share with you?
580
00:51:47,560 --> 00:52:05,160
So, the thing I find most difficult about building accessible sites at the moment: there are the WCAG guidelines, and there are Chrome extensions that do audits and so forth, and I just don't trust them, because just because my site passed an audit, that doesn't mean it's going to work in a screen reader.
581
00:52:05,160 --> 00:52:17,080
Like, especially with modern JavaScript things, where if I'm building an interface where I click through tabs and parts of the page update, how should I make sure that screen readers are notified about the correct piece of the page?
582
00:52:17,080 --> 00:52:20,520
I don't just want to be told that we meet WCAG guidelines, I want demos.
583
00:52:20,520 --> 00:52:29,240
The thing that's missing for me is, I would like almost like a cookbook of accessibility patterns, where it's like, here's how to do modal dialogues and here's how to do tab switching.
584
00:52:29,240 --> 00:52:33,880
And for each one, I want a video of what a screen reader does with that demo.
585
00:52:33,880 --> 00:52:35,360
Or actually, I want multiple videos.
586
00:52:35,360 --> 00:52:42,920
Like show me for the two or three most popular screen readers, how do they behave when you lay out your modal dialogues or your tab interface?
587
00:52:42,920 --> 00:52:47,040
The one that's really relevant right now is chat interfaces, right?
588
00:52:47,040 --> 00:52:52,000
LLMs do that streaming text thing where you ask them a question and everything comes back a word at a time.
589
00:52:52,000 --> 00:52:55,280
How do I make that accessible to a screen reader?
590
00:52:55,280 --> 00:52:57,040
Like there must be patterns.
591
00:52:57,040 --> 00:53:05,760
What those patterns are, it's very difficult for me to find examples of those put together by experts, with proof in video form that they do the right thing.
592
00:53:05,760 --> 00:53:08,800
I would love to see more of that kind of documentation.
593
00:53:08,800 --> 00:53:14,800
Yeah, I think from our perspective, you know, the accessibility community shares as much as it possibly can across the board.
594
00:53:14,800 --> 00:53:17,680
And Joe has played a phenomenal part in that.
595
00:53:17,680 --> 00:53:22,240
But as you well know, different tech stacks; there are commonalities, to your point.
596
00:53:22,240 --> 00:53:30,320
And I do believe there should be some type of base documentation, lessons learned, and again, the videos would go a long way.
597
00:53:30,320 --> 00:53:36,000
We've done it from our perspective, from a ServiceNow perspective, and we're more than willing to share what we've built.
598
00:53:36,000 --> 00:53:41,200
But a more open source, if you will, access to that type of content could go a long way.
599
00:53:41,280 --> 00:53:44,480
My technical preference here: I like HTML and JavaScript.
600
00:53:44,480 --> 00:53:47,040
I don't want React and I don't want Vue and so forth.
601
00:53:47,040 --> 00:53:55,440
I just want an accessible HTML and JavaScript demo of how to do five or six of these common interactions.
602
00:53:55,440 --> 00:54:01,040
And that's enough, like then if somebody's using React, they should be able to port that to React, you know.
603
00:54:01,040 --> 00:54:04,320
But having those examples, the thing I want is proof.
604
00:54:04,320 --> 00:54:11,120
Like every time a new JavaScript library comes out, one of the first things I do is check to see if they've got any documentation about their accessibility.
605
00:54:11,120 --> 00:54:12,400
Most of them don't at all.
606
00:54:12,400 --> 00:54:17,040
Some of them will say, we have the right ARIA tags.
607
00:54:17,040 --> 00:54:18,080
That's still not enough for me.
608
00:54:18,080 --> 00:54:25,680
I won't believe that it's accessible until I see video evidence that it's been tested and shown to work with a screen reader.
609
00:54:25,680 --> 00:54:31,560
And I feel like Adobe do have some of this documentation for some of their accessible React things.
610
00:54:31,960 --> 00:54:45,000
Yeah, so what I'd recommend you look at is Charlie Triplett, who's one of the accessibility architects that helps create what we're working on at my company.
611
00:54:45,000 --> 00:54:48,040
He wrote The Book on Accessibility.
612
00:54:48,040 --> 00:54:58,040
So you can just go to thebookonaccessibility.com, and he was working for T-Mobile, and for T-Mobile he built Magenta A11y.
613
00:54:58,040 --> 00:55:00,920
So MagentaA11y.com.
614
00:55:01,240 --> 00:55:16,440
And it's got acceptance criteria, it's got sample HTML, it's got videos of screen readers and assistive technology, and I think it's gonna provide a lot of the solutions that you're looking for.
615
00:55:16,440 --> 00:55:17,560
That's really exciting.
616
00:55:17,560 --> 00:55:25,320
Like just that, for me, the thing that would make my life easier as an engineer who cares about this: it's demos.
617
00:55:25,320 --> 00:55:34,840
It's straight-up demos that are proven to work, with explanations of why this is the right pattern for doing a tabbed interface or replacing part of the screen.
618
00:55:34,840 --> 00:55:40,120
The other problem I've got at the moment, which is a really interesting one, is tables, just actual tables of data.
619
00:55:40,120 --> 00:55:44,280
Because my software, Datasette, presents tables of data.
620
00:55:44,280 --> 00:55:46,520
I have no control over what those tables are.
621
00:55:46,520 --> 00:55:51,240
Often it'll be like a 40 column table with like 2,000 rows in it.
622
00:55:51,240 --> 00:55:59,000
I don't care how good my table markup is, that's gonna suck in a screen reader if you're trying to make sense of like 2,000 rows of data with 40 columns.
623
00:55:59,000 --> 00:56:06,840
Um, I had a great conversation actually with Ed Summers about this, where we talked about how this is a great opportunity for chat-based interfaces, right?
624
00:56:06,840 --> 00:56:15,920
If you've got that table and you can then say to it, what's the school with the highest number of children from this particular background?
625
00:56:16,080 --> 00:56:17,280
And it gives you an answer.
626
00:56:17,280 --> 00:56:18,160
That's just better.
627
00:56:18,160 --> 00:56:19,920
That's just a better way of interacting.
628
00:56:19,920 --> 00:56:28,080
So I'm really excited, for my own Datasette tool, about what plugins I can build that use language models to give you that sort of conversational interface.
629
00:56:28,080 --> 00:56:29,760
Because, I'll admit, I thought that was a gimmick.
630
00:56:29,760 --> 00:56:33,680
I thought having a conversation with the table feels like that's kind of fun, but is it really useful?
631
00:56:33,680 --> 00:56:36,560
From an accessibility perspective, it's amazingly useful.
632
00:56:36,560 --> 00:56:37,760
Completely, completely agree.
633
00:56:37,760 --> 00:56:40,240
And we touched upon this yesterday with Ed.
634
00:56:40,240 --> 00:56:57,120
From a ServiceNow perspective, we will be releasing a conversational AI with Now Assist, a full conversational AI, with the goal of creating a full conversational AI for an entire user journey from start to finish.
635
00:56:57,120 --> 00:57:01,600
Then we're going to be hopefully overlaying that with what we call intuitive page summarization.
636
00:57:01,600 --> 00:57:10,000
So a user will hear what you just touched upon before: the key parts of that page, the most critical parts of that page, will be read back right away.
637
00:57:10,000 --> 00:57:12,000
So they don't have to be tapping across everything.
638
00:57:12,000 --> 00:57:14,320
So we're actually working on both of them right now.
639
00:57:14,320 --> 00:57:19,120
And isn't it amazing that what you just described is now something that can be built?
640
00:57:19,120 --> 00:57:21,600
Like it's now feasible to build these kinds of things.
641
00:57:21,600 --> 00:57:32,400
With just interacting with a web page, straight up a screenshot of an entire web page fed into a high-quality model will do a good enough job right now that it can start being useful.
642
00:57:32,400 --> 00:57:34,400
And that's fascinating.
643
00:57:34,400 --> 00:57:35,440
Yep, love it.
644
00:57:35,440 --> 00:57:36,000
Yeah.
645
00:57:36,000 --> 00:57:45,760
And in fact, to prepare for this podcast, I took your blog, which is just incredible, and I threw it into NotebookLM.
646
00:57:45,760 --> 00:58:00,120
I took a whole bunch of your podcasts, threw those into NotebookLM, and then some of the questions I asked you, I did honestly get out of NotebookLM, because it was able to munch through so much data and provide that information.
647
00:58:00,120 --> 00:58:01,400
It's such a cool tool.
648
00:58:01,400 --> 00:58:02,760
Have you played with it?
649
00:57:59,840 --> 00:58:06,760
Yeah, I'm really fascinated by NotebookLM.
650
00:58:06,840 --> 00:58:11,240
What you've just described is the best current consumer RAG product.
651
00:58:11,240 --> 00:58:18,440
Like really what you're doing there is you're just dumping a bunch of documents into a RAG system, and when you ask it a question, it looks things up for you and so forth.
652
00:58:18,440 --> 00:58:19,960
And it works really, really well.
653
00:58:19,960 --> 00:58:27,880
The questions that you gave me, I've been on a few podcasts, and they were by far the best selection of questions in advance I've ever seen.
654
00:58:27,880 --> 00:58:31,960
It's so interesting to hear that that was part of your process for putting those together.
655
00:58:31,960 --> 00:58:42,840
The other thing: you can't talk about NotebookLM without talking about their brilliantly weird podcast thing, where they can generate a podcast about whatever content you've thrown in there.
656
00:58:42,840 --> 00:58:43,800
I love that.
657
00:58:43,800 --> 00:58:47,000
It's both a gimmick and incredibly useful.
658
00:58:47,000 --> 00:58:49,880
Um it is spookily good.
659
00:58:49,880 --> 00:58:55,400
Like it's the best version of like fake human voices that I've heard from anything.
660
00:58:55,400 --> 00:59:01,720
They just released a new preview feature last week, I think, where you can now interrupt the podcast and ask them a question.
661
00:59:01,720 --> 00:59:06,680
And so you can basically join in and be part of the the podcast experience.
662
00:59:06,680 --> 00:59:07,800
Very, very weird.
663
00:59:07,800 --> 00:59:10,280
They also added um custom instructions to it.
664
00:59:10,280 --> 00:59:28,680
So I like doing things like, I fed in a big boring report about some generative AI study, and I told them: you are banana slugs, and I want you to have a conversation about this paper from the perspective of banana slugs, talking about how this could affect your slug society. And they did it.
665
00:59:28,680 --> 00:59:33,080
It's like a Monty Python skit. Oh my god.
666
00:59:33,080 --> 00:59:34,520
That's incredible.
667
00:59:35,240 --> 00:59:36,440
How fun, yeah.
668
00:59:36,520 --> 00:59:42,040
And their team announced that they're leaving and starting their own company, so that should be a good one.
669
00:59:42,280 --> 00:59:44,040
That's the constant pattern with this thing.
670
00:59:44,040 --> 00:59:50,400
If you build anything good at all around AI, you should quit your job and raise a million, raise 10 million dollars off of VCs straight away.
671
00:59:44,840 --> 00:59:51,040
Or a billion.
672
00:59:51,200 --> 00:59:52,640
Some of them are getting a billion.
673
00:59:52,800 --> 00:59:54,400
The money is flowing freely right now.
674
00:59:54,400 --> 01:00:00,480
If you ever want to do a startup, if you can prove yourself on a product, then yeah, it's the time to be doing that.
675
01:00:00,480 --> 01:00:04,800
That's something I always like to emphasize is it's important to have fun with this stuff.
676
01:00:04,800 --> 01:00:10,640
Like a lot of people haven't gotten on to how deeply entertaining these things can be if you give them the chance.
677
01:00:10,640 --> 01:00:14,160
Like don't just ask them to tell you jokes, they'll tell you crap jokes.
678
01:00:14,160 --> 01:00:21,120
But if you ask them to be banana slugs and talk about the impact of generative AI on their society, that keeps me entertained all the time.
679
01:00:21,120 --> 01:00:22,960
There's just so much you can do with that.
680
01:00:22,960 --> 01:00:24,080
I could listen to you all day.
681
01:00:24,080 --> 01:00:25,600
I really, really could, Simon.
682
01:00:25,600 --> 01:00:29,040
Um enjoyable, uh, educational.
683
01:00:29,040 --> 01:00:37,760
And just honestly, again, as I mentioned before, your passion for what you do is pretty apparent, and your authenticity is pretty apparent.
684
01:00:37,760 --> 01:00:39,680
So thank you so much for your time today.
685
01:00:39,840 --> 01:00:41,360
Greatly, greatly appreciate it.
686
01:00:41,360 --> 01:00:42,080
Thanks very much.
687
01:00:42,080 --> 01:00:43,840
This has been a really fun conversation.
688
01:00:43,840 --> 01:00:44,640
Thanks, Simon.
689
01:00:44,640 --> 01:00:49,760
And just let our audience know where they can reach you and and read your stuff.
690
01:00:49,760 --> 01:00:56,960
So I'm online at simonwillison.net, that's my blog, which I have updated every day since January the first this year.
691
01:00:56,960 --> 01:01:01,520
So I'm just about to hit a year-long streak, which I'm quite excited about.
692
01:01:01,520 --> 01:01:04,160
Um that will link to all of my other stuff.
693
01:01:04,160 --> 01:01:09,040
I have a very active GitHub account with 900 projects on it at the moment.
694
01:01:09,040 --> 01:01:13,200
And I'm on Bluesky and Mastodon and Twitter and so forth as well.
695
01:01:13,200 --> 01:01:13,600
Great.
696
01:01:13,600 --> 01:01:16,160
Under simonw, @simonw, right?
697
01:01:16,160 --> 01:01:23,760
Uh yes, or @simonwillison.net on Bluesky, and Simon at simonwillison.net on Mastodon.