Introducing WordsPlot

AI-driven text visualization better than word clouds

Word clouds are the problem child of data visualization: typically color, orientation, position, etc., don’t encode any data. We can do better. I wrote an entire book on the many different ways to visualize textual data, along with code examples in D3.js. But people who don’t code ask me, “How can I create one of these?” The short answer: with AI. All of these were created with Claude.ai, and I’ll show you how I made them:

Text-centric versions of bar, scatter, table and beeswarm plots.

All the text visualizations above were created with prompts to Claude.ai that included small datasets. They include interactions such as tooltips and links to YouTube videos. In each case, the element conveying data is complemented with, or replaced by, text: a text-bar, text-scatter, text-table and text-beeswarm. Scroll down to see the prompts for each.

I was going to use AI to write code and then make a website where people could cut and paste their text datasets. But as I started adding menus and buttons, I realized there were lots of little tweaks and features that I wanted. That’s all wrong: it just makes a complicated interface.

Instead, I use AI directly, because it is more malleable. Below is a recipe I call WordsPlot. The recipe has been tested and works fairly well on Claude.ai using Claude 3.7 Sonnet. From last summer to now, AI has improved dramatically in its ability to create small visualizations from small datasets. I’ve gotten to the point where I have text chunks I can assemble to create visualizations. I’ve used them on a number of different datasets and they tend to work, with the usual AI caveats. And they are much easier to adapt and modify using language rather than buttons.

You can also watch a video podcast with Enrico Bertini where I create these live:
https://filwd.substack.com/p/visualizing-text-data-using-ai-w

WordsPlot STEPS:

Step 1. Data. Word clouds use isolated words. Stop doing that. Chunks of text such as names, phrases, and sentences are far more meaningful. There are huge amounts of data out there in tabular format, such as tab-delimited values. We’ll use that.

Make sure your table has at least 2 columns, including one with the phrases of interest, and make sure it has a header row. Here’s a simple example:

Catchphrase	Character	Season	Emotion
Don't have a cow, man!	Bart Simpson	1	Anger
Why you little...!	Homer Simpson	1	Anger
That's unpossible!	Ralph Wiggum	6	Surprise
Sweet merciful crap!	Chief Wiggum	4	Surprise
Thank you, come again!	Apu Nahasapeemapetilon	3	Disgust
… 5-25 rows more …

In this example we have the phrase that we want to visualize (Catchphrase), two categories (Character and Emotion), and one numeric value (Season).
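If you want to sanity-check a table before pasting it into a prompt, here is a small plain-JavaScript sketch (no D3 needed). It parses tab-delimited text with a header row, rejects tables with fewer than two columns, and guesses which columns are numeric and which are categories. The function names `parseTsv` and `columnTypes` are hypothetical illustrations, not part of WordsPlot or any library.

```javascript
// Sketch: parse tab-delimited text (header row first) into row objects.
function parseTsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) throw new Error("No data found.");
  const header = lines[0].split("\t").map((name) => name.trim());
  if (header.length < 2) {
    throw new Error("Need at least 2 columns: the phrase column plus one more.");
  }
  const rows = lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] ?? "").trim();
    });
    return row;
  });
  return { header, rows };
}

// Guess each column's type: "numeric" if every non-empty cell is a number,
// otherwise "category" (the phrase column will also land here).
function columnTypes({ header, rows }) {
  const types = {};
  for (const name of header) {
    const values = rows.map((row) => row[name]).filter((v) => v !== "");
    const numeric = values.length > 0 && values.every((v) => Number.isFinite(Number(v)));
    types[name] = numeric ? "numeric" : "category";
  }
  return types;
}

// Usage with two rows of the Simpsons sample above.
const simpsons = parseTsv(
  "Catchphrase\tCharacter\tSeason\tEmotion\n" +
    "Don't have a cow, man!\tBart Simpson\t1\tAnger\n" +
    "That's unpossible!\tRalph Wiggum\t6\tSurprise\n"
);
console.log(columnTypes(simpsons)); // Season is numeric; the other three are categories
```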

Step 2. Choose your plot. There are 4 kinds of text-centric plots, based on bar, table, scatter, and beeswarm plots, that I’ve been getting fairly good results with. In addition to the phrase, each plot type requires at least one additional column:

Bar: A numeric value and a category value.
Table: Two category values
Scatter: Two numeric values
Beeswarm: One numeric value
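The column requirements above amount to a simple lookup. As a sketch, a hypothetical helper (not part of WordsPlot) that takes the number of numeric and category columns you have besides the phrase column and lists the plot types you could ask for:

```javascript
// Hypothetical helper: which text-centric plots does a table support?
// Counts exclude the phrase column itself.
function possiblePlots(numericCount, categoryCount) {
  const plots = [];
  if (numericCount >= 1 && categoryCount >= 1) plots.push("Text Bar"); // numeric + category
  if (categoryCount >= 2) plots.push("Text Table"); // two categories
  if (numericCount >= 2) plots.push("Text Scatter"); // two numerics
  if (numericCount >= 1) plots.push("Text Beeswarm"); // one numeric
  return plots;
}

// The four-column Simpsons sample: Season is numeric; Character and Emotion are categories.
console.log(possiblePlots(1, 2)); // [ 'Text Bar', 'Text Table', 'Text Beeswarm' ]
```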

Step 3. Consider optional extra encodings. Typically, I like to add color, and maybe text size, font weight, or typeface.

Step 4. Consider interactions: tooltips, and maybe hyperlinks.

Note that if you ask for too many things and/or try to use too much data, you may run into usage limits on the free versions of LLMs.

CREATING the WordsPlot PROMPT:

To create the prompt, cut and paste the text from each section below and fill in the blanks.

A. PROMPT INTRO:

We will create a text-centric data visualization. You will create HTML that I can save and open in a browser. You can use Javascript and D3.js. The dataset is below.

B. PLOT PROMPT:
Pick one of the four visualization types:

Text Bar

1. This will be a variant of a bar chart with horizontal bars.
2. For this plot the X-axis will be _____ (numeric column) and the Y-axis will be _____ (category column). If a category on the y-axis occurs more than once, all data rows for that category should be in successive bars. Make sure the bar colors are sufficiently pale such that the text overtop will be legible against the bar or background.
3. For each mark, you will use the text from ____ (phrase column). This text will be superimposed on each bar. Do not wrap this text if it is long.

Text Table

1. This will be a table filled with text.
2. The rows should be _____ (category column), and the columns should be _____ (different category column). If there are more than 10 rows or 10 columns, group the remaining items into a category "Other" as the 11th row (or column) as needed.
3. Each cell will contain text. We will call this cell text a mark. The text will be _____ (phrase column). If more than one data row fits in a given cell, put it on a separate line in the same cell.
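The table prompt leaves open which 10 rows or columns to keep before the rest become "Other". A sketch of one reasonable choice, assuming you keep the 10 most frequent values (ties broken by first appearance); `groupIntoOther` is a hypothetical name:

```javascript
// Sketch: keep the `limit` most frequent values of a category column and
// relabel the rest as "Other". Keeping the most frequent is an assumption.
function groupIntoOther(values, limit = 10) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  // Map keeps insertion order and Array.prototype.sort is stable,
  // so equally frequent values keep their first-appearance order.
  const kept = new Set(
    [...counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([v]) => v)
  );
  // With `limit` or fewer distinct values, nothing is relabeled.
  return values.map((v) => (kept.has(v) ? v : "Other"));
}

console.log(groupIntoOther(["Pop", "Pop", "Rock", "Jazz", "Metal"], 2));
// [ 'Pop', 'Pop', 'Rock', 'Other', 'Other' ]
```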

Text Scatter

1. This will be a variant of a bubble plot.
2. For this plot, the X-axis will be _____ (numeric column) and the Y-Axis will be _____ (numeric column). You should set the axes to range from the minimum to maximum value.
3. For each mark, you will use text from the _____ (phrase column) instead of circles. If the text is long, wrap the text so that each mark is roughly squarish, and use tight leading. You must use collision detection to adjust mark positions such that no text overlaps other text. Make sure that no text moves so far as to be outside the extents of the chart area.

Text Beeswarm

1. This will be a variant of a beeswarm plot.
2. For this plot, the x-axis will be _____ (numeric column). You should set the axis to range from the minimum to the maximum value.
3. For each mark, you will use text from the _____ (phrase column) instead of dots. If the text is long, wrap the text so that each mark is roughly squarish, and use tight leading. You must use collision detection to adjust mark positions such that no text overlaps other text. Make sure that no text moves so far as to be outside the extents of the chart area.
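Keeping text marks from overlapping while staying inside the chart is the trickiest instruction in the scatter and beeswarm prompts. This is not the code Claude produces (a D3 version would more likely use a force simulation with `d3.forceCollide`, which treats marks as circles). It is a minimal greedy sketch of a rectangle-aware text beeswarm, assuming each mark's text box width and height have already been measured; `textBeeswarm` and its options are hypothetical.

```javascript
// Sketch of a greedy, rectangle-aware text beeswarm (no D3 needed).
// Each mark is { value, w, h }: w and h are the text box size, assumed to be
// measured already (e.g. with getBBox() on an SVG <text> element).
// Returns { x, y, w, h, index } boxes (x, y = top-left corner) in input order.
function textBeeswarm(marks, { width, height, step = 2 }) {
  const values = marks.map((m) => m.value);
  const min = Math.min(...values);
  const max = Math.max(...values);
  // The axis spans the minimum to the maximum value, as the prompt asks.
  const xScale = (v) => (max === min ? width / 2 : ((v - min) / (max - min)) * width);

  const overlaps = (a, b) =>
    a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;

  const placed = [];
  // Place marks in order of value, left to right.
  const order = marks.map((_, i) => i).sort((a, b) => marks[a].value - marks[b].value);
  for (const i of order) {
    const m = marks[i];
    // Center the box on its data position, clamped inside the chart horizontally.
    const x = Math.min(Math.max(xScale(m.value) - m.w / 2, 0), width - m.w);
    const centerY = (height - m.h) / 2;
    let box = null;
    for (let k = 0; ; k++) {
      // Try offsets 0, +step, -step, +2*step, -2*step, ... from the center line.
      const n = Math.ceil(k / 2);
      const offset = k % 2 === 1 ? n * step : -n * step;
      if (Math.abs(offset) > height / 2) break; // no vertical room left
      const candidate = { x, y: centerY + offset, w: m.w, h: m.h };
      const inside = candidate.y >= 0 && candidate.y + m.h <= height;
      if (inside && !placed.some((p) => overlaps(candidate, p))) {
        box = candidate;
        break;
      }
    }
    if (box === null) throw new Error("Chart too small to place every mark without overlap.");
    placed.push({ ...box, index: i });
  }
  return placed.sort((a, b) => a.index - b.index);
}

// Usage: three tied values stack vertically; the fourth sits at the far right.
const boxes = textBeeswarm(
  [
    { value: 1, w: 40, h: 10 },
    { value: 1, w: 40, h: 10 },
    { value: 1, w: 40, h: 10 },
    { value: 5, w: 30, h: 10 },
  ],
  { width: 200, height: 60 }
);
console.log(boxes.map((b) => [b.x, b.y])); // positions: (0, 25), (0, 35), (0, 15), (170, 25)
```

Placing marks in value order and nudging each one vertically is the classic beeswarm idea; the difference from a dot beeswarm is that the overlap test uses text rectangles instead of circles.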

C. OPTIONAL VISUAL ATTRIBUTES:

Add some additional attributes, such as color or font-weight:

Color, using a category column:

COLOR: Color each mark using _____ (category column) with a nice visceral color palette and include a color legend. Make sure the colors have sufficient contrast to be legible against the background, set to the very light color #fffbea.

Color, using a numeric column:

COLOR: Color each mark using _____ (numeric column) with a visceral color ramp and include a color legend. Make sure the colors have sufficient contrast to be legible against the background, set to the very light color #fffbea.
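"Sufficient contrast" can be checked rather than eyeballed. A sketch using the WCAG 2.x contrast-ratio formula against the #fffbea background from the prompts; the 4.5:1 threshold is WCAG's AA level for normal-size text, and the 5% darkening step in `darkenUntilLegible` is an arbitrary choice for illustration. It mirrors the kind of follow-up I ask for when yellow text is hard to read.

```javascript
// WCAG 2.x contrast check against the prompt's background color #fffbea.
const BACKGROUND = "#fffbea";

// Assumes 6-digit hex colors such as "#ffd700".
function hexToRgb(hex) {
  const h = hex.replace("#", "");
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
}

function rgbToHex([r, g, b]) {
  return "#" + [r, g, b].map((c) => c.toString(16).padStart(2, "0")).join("");
}

// Relative luminance per WCAG: linearize each sRGB channel, then weight them.
function relativeLuminance(hex) {
  const [r, g, b] = hexToRgb(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio runs from 1 (identical colors) to 21 (black on white).
function contrastRatio(a, b) {
  const [hi, lo] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// 4.5:1 is the WCAG AA threshold for normal-size text.
function isLegible(hex, minRatio = 4.5) {
  return contrastRatio(hex, BACKGROUND) >= minRatio;
}

// Darken a color in 5% steps until it passes, stopping at black.
function darkenUntilLegible(hex, minRatio = 4.5) {
  let rgb = hexToRgb(hex);
  while (!isLegible(rgbToHex(rgb), minRatio) && rgb.some((c) => c > 0)) {
    rgb = rgb.map((c) => Math.floor(c * 0.95));
  }
  return rgbToHex(rgb);
}

console.log(isLegible("#ffff00")); // false: pure yellow on #fffbea is nearly invisible
console.log(darkenUntilLegible("#ffff00")); // a dark olive that meets 4.5:1
```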

Font-Size:

FONT-SIZE: Set the font-size for each mark using _____ (numeric column). The difference between the smallest font size and largest font size should be less than double.

Font-weight:

FONT-WEIGHT: Set the font-weight for each mark using _____ (numeric column). Use a variable font and the font-weight should range from thin (100) for low values to black (900) for high values.

Typeface:

TYPEFACE: Set the font-family for each mark using _____ (category column).
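The FONT-SIZE and FONT-WEIGHT instructions are both linear mappings from a numeric column. In D3 this would typically be `d3.scaleLinear().domain(d3.extent(values)).range([...])`; the sketch below hand-rolls the same idea so it runs without D3. The 12px to 22px size range is an assumed example that satisfies "less than double".

```javascript
// Hand-rolled linear scale: maps [d0, d1] onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (v) => (d1 === d0 ? (r0 + r1) / 2 : r0 + ((v - d0) / (d1 - d0)) * (r1 - r0));
}

function extent(values) {
  return [Math.min(...values), Math.max(...values)];
}

// FONT-SIZE: 12px to 22px is an assumed range; 22 < 2 * 12, so the
// largest text stays under double the smallest, as the prompt requires.
function fontSizeScale(values, minPx = 12, maxPx = 22) {
  if (maxPx >= 2 * minPx) throw new Error("Largest size must be less than double the smallest.");
  return linearScale(extent(values), [minPx, maxPx]);
}

// FONT-WEIGHT: thin (100) for the lowest value to black (900) for the highest.
// Variable fonts accept in-between weights; we round to whole numbers.
function fontWeightScale(values) {
  const scale = linearScale(extent(values), [100, 900]);
  return (v) => Math.round(scale(v));
}

// Illustrative values for a column like "Albums Sold (M)".
const sold = [10, 25, 30];
const size = fontSizeScale(sold);
const weight = fontWeightScale(sold);
console.log(size(25)); // 19.5
console.log(weight(25)); // 700
```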

D. OPTIONAL INTERACTIONS:

Add some interactions, but keep it simple:

When I hover over a mark, it should provide a tooltip identifying all the data in the row.

When I click on the mark, it should ... e.g. link to the URL provided in ____ (column with a URL)

E. DATA

Finally, add a line that says:

Here's the data:

then paste in the data.

Run it and see what you get! If it’s close, add a request with very specific cleanups, e.g. 1) Part of the legend is off the right edge of the screen, nudge it back to the left. 2) Some of the text does not have sufficient contrast, particularly the yellow text: please improve legibility by adjusting the light colored text, particularly the yellow text.

If it’s still not good after 4 or so attempts at refinement, start over and/or reconsider your data. I typically download the HTML and make small edits: change the title, add a subtitle, maybe change a few colors. Also, some kinds of interactions, such as links, won’t run in the AI’s browser preview, so opening the local download lets you explore them.

Example text-bar chart:

We will create a text-centric data visualization. You will create HTML that I can save and open in a browser. You can use Javascript and D3.js. The dataset is below. 
1. This will be a variant of a bar chart with horizontal bars. 
2. For this plot the X-axis will be "Albums Sold (M)" and the Y-axis will be the columns Artist + "Song Title" concatenated. If a category on the y-axis occurs more than once, all data rows for that category should be in successive bars. Make sure the bar colors are sufficiently pale such that the text overtop will be legible against the bar or background. 
3. For each mark, you will use the text from "Iconic Phrase". This text will be superimposed on each bar. Do not wrap this text if it is long. 
COLOR: Color each mark using "Annoy vs Enjoy" with a visceral color ramp and include a color legend. Make sure the colors have sufficient contrast to be legible against the background, set to the very light color #fffbea.  
When I hover over a mark, it should provide a tooltip identifying all the data in the row.
When I click on the mark, it should link to the URL provided in URL column. 
Here's the data:
Artist	Song Title	Iconic Phrase	Year	Genre	Albums Sold (M)	Annoy vs Enjoy (0–10)	URL
Queen	We Will Rock You	"We will, we will rock you!"	1977	Rock	25	9	https://www.youtube.com/watch?v=-tJYN-eG1zk
ABBA	Dancing Queen	"You can dance, you can jive..."	1976	Disco/Pop	30	8	https://www.youtube.com/watch?v=xFrGuyw1V8s
<more rows...>

This resulted in a text-bar chart that looks like this:

Plot of popular earworms over bars: earworms as chosen by AI, with each iconic lyric over its bar. Is there a correlation between “you can dance” and its sales, versus “how old are you” and its sales?

Depending on the result, I might ask for a refinement or two. I then download the result and explore it in a browser (the external links don’t work in Claude). To share these, I cut and paste them into CodePen. Use the interactive version of this text-bar to click through to the songs you’re now hoping to hear (or not hear) again.

Example text-scatterplot:

We will create a text-centric data visualization. You will create HTML that I can save and open in a browser. You can use Javascript and D3.js. The dataset is below. 
1. This will be a variant of a bubble plot. 
2. For this plot, the X-axis will be Year and the Y-Axis will be "Annoy vs Enjoy". You should set the axes to range from the minimum to maximum value. 
3. For each mark, you will use text from the concatenation of Artist + "Song Title" instead of circles. If the text is long, wrap the text so that each mark is roughly squarish, and use tight leading. You must use collision detection to adjust mark positions such that no text overlaps other text. Make sure that no text moves so far as to be outside the extents of the chart area. 
COLOR: Color each mark using Genre (if there are two genres, use only the first one). Use a nice visceral color palette and include a color legend. Make sure the colors have sufficient contrast to be legible against the background, set to the very light color #fffbea.  
FONT-SIZE: Set the font-size for each mark using "Albums Sold (M)". The difference between the smallest font size and largest font size should be less than double. 
When I hover over a mark, it should provide a tooltip identifying all the data in the row.
When I click on the mark, it should link to the URL provided in URL column. 
Here's the data:
Artist	Song Title	Iconic Phrase	Year	Genre	Albums Sold (M)	Annoy vs Enjoy (0–10)	URL
Queen	We Will Rock You	"We will, we will rock you!"	1977	Rock	25	9	https://www.youtube.com/watch?v=-tJYN-eG1zk
<more rows>

The first result was pretty good, but had a few small issues. A follow-up prompt was required:
1) The marks with yellow text aren’t readable. Darken them a bit.
2) The Y-axis should range from min to max.
3) Part of the legend is cropped, nudge it back to the left a bit.
That produced this scatterplot (interactive version, where I also tweaked some colors and moved the legend):

Earworms over time: there are always new earworms, but the styles are changing. Ketchup song?

The approach should work with other datasets, with the caveat that you’ll run out of tokens if you try too much data with the free version. Here’s a text-scatterplot of Simpsons catchphrases, which tend to be shorter phrases. The one outlier in season 14 squishes the rest of the scatterplot, a common problem with scatterplots in general:

Popular Simpsons’ catchphrases started early and dropped off after season 10.
Is “Everything’s coming up Milhouse!” the last great catchphrase?

Here’s longer text: the first sentences of various best-selling fiction books. Even with a couple of follow-up prompts, it definitely struggles to fit the text in the scatterplot without overlaps while keeping it inside the chart:

First sentence of top non-fiction books. There are as many themes as there are books.

Example text-table:

Moving on to tables: most people don’t think of them as visualizations, but tables organize data, use visual attributes, and can be very effective. Here’s the table prompt. Note that it asks for a few data transformations (use the first genre, convert year to decade, concatenate artist and title), which Claude handles fine:

We will create a text-centric data visualization. You will create HTML that I can save and open in a browser. You can use Javascript and D3.js. The dataset is below. 
1. This will be a table filled with text. 
2. The rows should be Genre (if there are two genres, use only the first one). The columns should be set to decade, which you can derive from Year, please put them in order. If there are more than 10 rows or 10 columns, group the remaining items into a category "Other" as the 11th row (or column) as needed.
3. Each cell will contain text. We will call this cell text a mark. The text will be the concatenation of Artist + "Song Title". If more than one data row fits in a given cell, split that cell with a horizontal line for each extra row. 
COLOR: Color each cell (or subcell) using "Annoy to Enjoy". Use a nice visceral color ramp that goes from red for annoy to blue for enjoy and include a color legend. Make sure the text colors on top has sufficient contrast to be legible against the background. Set empty cells to #fffbea. 
FONT-WEIGHT: Set the font-weight for each mark using "Albums Sold (M)". Use a variable font and the font-weight should range from thin (100) for low values to black (900) for high values. 
When I click on the mark, it should link to the URL provided in URL column. 
Here's the data:

And here’s the result:

Pop has a lot of earworms. The AI seems negative about dance music.
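The three transformations in the table prompt are simple enough to reproduce by hand if you ever need to prepare the data yourself. A sketch, assuming genres are separated by "/" (as in "Disco/Pop") and using ": " to join artist and title (my choice; the prompt doesn't specify a separator). The function names are hypothetical.

```javascript
// First genre only, assuming "/" separates multiple genres.
function firstGenre(genre) {
  return genre.split("/")[0].trim();
}

// Derive a decade label from a year, e.g. 1977 -> "1970s".
function decadeOf(year) {
  return `${Math.floor(Number(year) / 10) * 10}s`;
}

// Concatenate Artist + "Song Title" into one mark label.
function markLabel(row) {
  return `${row["Artist"]}: ${row["Song Title"]}`;
}

function prepareTableRows(rows) {
  // Note: Genre is replaced by its first value.
  const prepared = rows.map((row) => ({
    ...row,
    Genre: firstGenre(row["Genre"]),
    Decade: decadeOf(row["Year"]),
    Label: markLabel(row),
  }));
  // Column order: string sort works for four-digit years ("1970s" < "1980s").
  const decades = [...new Set(prepared.map((r) => r.Decade))].sort();
  // Group labels by (genre, decade) cell; one cell may hold several songs.
  const cells = new Map();
  for (const r of prepared) {
    const key = `${r.Genre}|${r.Decade}`;
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(r.Label);
  }
  return { prepared, decades, cells };
}

// Usage with the two rows shown in the text-bar example.
const table = prepareTableRows([
  { Artist: "Queen", "Song Title": "We Will Rock You", Year: "1977", Genre: "Rock" },
  { Artist: "ABBA", "Song Title": "Dancing Queen", Year: "1976", Genre: "Disco/Pop" },
]);
console.log(table.prepared[1].Genre, table.prepared[1].Decade); // Disco 1970s
```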

Example text-beeswarm:

Last of all, the beeswarm. I wanted to try out typeface, since genres have associations with typefaces (metal bands like medieval blackletter). For this one, I used AI to create a different list that includes other genres, such as metal and jazz.

We will create a text-centric data visualization. You will create HTML that I can save and open in a browser. You can use Javascript and D3.js. The dataset is below. 
1. This will be a variant of a beeswarm plot. 
2. For this plot, the x-axis will be "Annoy to Enjoy". You should set the axes to range from the minimum to the maximum value. 
3. For each mark, you will use text from the concatenation of Artist + "Song Title" instead of dots. If the text is long, wrap the text so that each mark is roughly squarish, and use tight leading. You must use collision detection to adjust mark positions such that no text overlaps other text. Make sure that no text moves so far as to be outside the extents of the chart area. 
FONT-SIZE: Set the font-size for each mark using "Albums Sold (M)". The difference between the smallest font size and largest font size should be less than double. 
COLOR: Color each mark using Year with a visceral color ramp and include a color legend. Make sure the colors have sufficient contrast to be legible against the background, set to the very light color #fffbea.  
TYPEFACE: Set the font-family for each mark using Genre. Try to pick a typeface that resonates with the data value. 
When I hover over a mark, it should provide a tooltip identifying all the data in the row.
When I click on the mark, it should link to the URL provided in URL column. 
Here's the data:

Despite the prompt asking for fonts that resonate with the data values, Claude picked bland fonts. Claude also stopped responding to me at that point. It omitted a legend for genre and used a bland color ramp. Not being able to iterate another round or two with Claude, I decided to do it myself. I spent a bunch of time looking for better fonts on fonts.google.com, squeezing in a font legend, and putting in a super vibrant color ramp. Here’s the result:

Fonts for genre! Devo vs Metallica? Devo got whipped.

So what?

Creating a lot of visualizations using AI prompts raises some interesting questions.

It’s way easier than writing code. A decade ago, I created various visualizations similar to the ones above. It was a lot of work and much more cognitively taxing. Now I can create a bunch of variants without a huge amount of effort. Thanks, Claude!
It’s a long prompt. If we expect people to use AI to analyze data and generate visualizations, simple prompts will get simple analyses and simple visualizations. If you want more, it’s going to be a long prompt. Of course, we could potentially craft a “Text-vis AI” with a hidden system prompt containing much of the above that would steer users toward creating these visualizations. Will interfaces migrate away from buttons and menus?
Prompts are highly malleable. I was able to merge data, extract the first value, convert years to decades, put the color in the foreground or background, tell it to make extra sub-cells, etc. I’m not constrained by whatever features were predetermined. That said, these requests make the prompt even longer or require more iterations.
AI doesn’t have a house style. AI is getting really good at creating code that works and functionally does what you want. Style-wise, it has no preference. Legends are all over the place. I edited every title. Sometimes it wants things centred, sometimes left aligned. I ask for visceral color palettes, which are hit and miss. I thought that because my prompts ask for D3.js, some of the D3 house style you see in every visualization Mike Bostock creates would come along too, but no, it does not automatically make it through.
Text directly in visualizations can be highly engaging! I’ve had songs running through my head (Is it really all about the bass?) and Nelson saying “Ha-ha!” at all my typos. It’s been stupidly fun working with these examples. You don’t get that with standard scatterplots, and maybe only a tiny hint of it with word clouds. Text versions of bar, scatter, table and beeswarm plots hopefully have a promising future, although I’m sure some people will prefer point-and-click versions.