Jupyter::Chatbook Cheatsheet

Quick reference for the Raku package “Jupyter::Chatbook” (available at raku.land and on GitHub).


0) Preliminary steps

Follow the instructions in the README of “Jupyter::Chatbook”:

For installation and setup problems see the issues (both open and closed) of the package’s GitHub repository.
(For example, this comment.)


1) New LLM persona initialization

A) Create persona with #%chat or %%chat (and immediately send first message)

#%chat assistant1, conf=ChatGPT, model=gpt-4.1-mini, prompt="You are a concise technical assistant."
Say hi and ask what I am working on.
# Hi! What are you working on?

Remark: For all “Jupyter::Chatbook” magic specs both prefixes %% and #% can be used.
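For example, the following two cells are equivalent (using the “yoda” persona created below):

```raku
#%chat yoda
How are you?

%%chat yoda
How are you?
```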

Remark: For the prompt argument the following delimiter pairs can be used: '...', "...", «...», {...}, ⎡...⎦.
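For example, the ⎡...⎦ delimiters are convenient for prompts that themselves contain quotes (the persona ID here is hypothetical):

```raku
#%chat quoter, model=gpt-4.1-mini, prompt=⎡Answer with a short "yes" or "no" first.⎦
Is Raku gradually typed?
```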

B) Create persona with #%chat <id> prompt (create only)

#%chat assistant2 prompt, conf=ChatGPT, model=gpt-4.1-mini
You are a code reviewer focused on correctness and edge cases.
# Chat object created with ID : assistant2.

You can use prompt specs from “LLM::Prompts”, for example:

#%chat yoda prompt
@Yoda
# Chat object created with ID : yoda.
Expanded prompt:
⎡You are Yoda.
Respond to ALL inputs in the voice of Yoda from Star Wars.
Be sure to ALWAYS use his distinctive style and syntax. Vary sentence length.⎦

The Raku package “LLM::Prompts” provides a collection of prompts and an implementation of a prompt-expansion Domain Specific Language (DSL).


2) Notebook-wide chat with an LLM persona

Continue an existing chat object

Render the answer as Markdown:

#%chat assistant1 > markdown
Give me a 5-step implementation plan for adding authentication to a FastAPI app. VERY CONCISE.

Magic cell parameter values can be assigned using the equal sign (“=”), as in model=gpt-4.1-mini above:

#%chat assistant1 > markdown
Now rewrite step 2 with test-first details.

Default chat object (NONE)

#%chat
Does vegetarian sushi exist?
# Yes, vegetarian sushi definitely exists! It's a popular option for those who avoid fish or meat. Instead of raw fish, vegetarian sushi typically includes ingredients like:
- Avocado
- Cucumber
- Carrots
- Pickled radish (takuan)
- Asparagus
- Sweet potato
- Mushrooms (like shiitake)
- Tofu or tamago (Japanese omelette)
- Seaweed salad
These ingredients are rolled in sushi rice and nori seaweed, just like traditional sushi. Vegetarian sushi can be found at many sushi restaurants and sushi bars, and it's also easy to make at home.

Using the prompt-expansion DSL to modify the previous chat-cell result:

#%chat
!HaikuStyled>^
# Rice, seaweed embrace,
Avocado, crisp and bright,
Vegetarian.

3) Management of personas (#%chat <id> meta)

Query one persona

#%chat assistant1 meta
prompt
# "You are a concise technical assistant."
#%chat assistant1 meta
say
# Chat: assistant1
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# Prompts: You are a concise technical assistant.
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : user
# content : Say hi and ask what I am working on.
# timestamp : 2026-03-14T09:23:01.989418-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : assistant
# content : Hi! What are you working on?
# timestamp : 2026-03-14T09:23:03.222902-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : user
# content : Give me a 5-step implementation plan for adding authentication to a FastAPI app. VERY CONCISE.
# timestamp : 2026-03-14T09:23:03.400597-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : assistant
# content : 1. Install `fastapi` and `python-jose` for JWT handling.
# 2. Define user model and fake user database.
# 3. Create OAuth2 password flow with `OAuth2PasswordBearer`.
# 4. Implement token creation and verification functions.
# 5. Protect routes using dependency injection for authentication.
# timestamp : 2026-03-14T09:23:05.106661-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : user
# content : Now rewrite step 2 with test-first details.
# timestamp : 2026-03-14T09:23:05.158446-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role : assistant
# content : 2. Write tests to verify user data retrieval and password verification; then define user model and fake user database accordingly.
# timestamp : 2026-03-14T09:23:06.901396-04:00
# Bool::True

Query all personas

#%chat all
keys
# NONE
assistant1
assistant2
ce
gc
html
latex
raku
yoda
#%chat all
gist
# {NONE => LLM::Functions::Chat(chat-id = NONE, llm-evaluator.conf.name = chatgpt, messages.elems = 4, last.message = ${:content("Rice, seaweed embrace, \nAvocado, crisp and bright, \nVegetarian."), :role("assistant"), :timestamp(DateTime.new(2026,3,14,9,23,10.770353078842163,:timezone(-14400)))}), assistant1 => LLM::Functions::Chat(chat-id = assistant1, llm-evaluator.conf.name = ChatGPT, messages.elems = 6, last.message = ${:content("2. Write tests to verify user data retrieval and password verification; then define user model and fake user database accordingly."), :role("assistant"), :timestamp(DateTime.new(2026,3,14,9,23,6.901396036148071,:timezone(-14400)))}), assistant2 => LLM::Functions::Chat(chat-id = assistant2, llm-evaluator.conf.name = chatgpt, messages.elems = 0), ce => LLM::Functions::Chat(chat-id = ce, llm-evaluator.conf.name = chatgpt, messages.elems = 0), gc => LLM::Functions::Chat(chat-id = gc, llm-evaluator.conf.name = chatgpt, messages.elems = 0), html => LLM::Functions::Chat(chat-id = html, llm-evaluator.conf.name = chatgpt, messages.elems = 0), latex => LLM::Functions::Chat(chat-id = latex, llm-evaluator.conf.name = chatgpt, messages.elems = 0), raku => LLM::Functions::Chat(chat-id = raku, llm-evaluator.conf.name = chatgpt, messages.elems = 0), yoda => LLM::Functions::Chat(chat-id = yoda, llm-evaluator.conf.name = chatgpt, messages.elems = 0)}

Delete one persona

#%chat assistant1 meta
delete
# Deleted: assistant1
Gist: LLM::Functions::Chat(chat-id = assistant1, llm-evaluator.conf.name = ChatGPT, messages.elems = 6, last.message = ${:content("2. Write tests to verify user data retrieval and password verification; then define user model and fake user database accordingly."), :role("assistant"), :timestamp(DateTime.new(2026,3,14,9,23,6.901396036148071,:timezone(-14400)))})

Clear message history of one persona (keep persona)

#%chat assistant2 meta
clear
# Cleared messages of: assistant2
Gist: LLM::Functions::Chat(chat-id = assistant2, llm-evaluator.conf.name = chatgpt, messages.elems = 0)

Delete all personas

#%chat all
drop
# Deleted 8 chat objects with names NONE assistant2 ce gc html latex raku yoda.

#%chat <id>|all meta command aliases / synonyms:

  • delete or drop
  • keys or names
  • clear or empty
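For example, the following meta cells are equivalent ways to list the persona names:

```raku
#%chat all
keys

#%chat all
names
```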

4) Regular chat cells vs direct LLM-provider cells

Regular chat cells (#%chat)

  • Stateful across cells (conversation memory stored in chat objects).
  • Persona-oriented via identifier + optional prompt.
  • Backend chosen with conf (default: ChatGPT).

Direct provider cells (#%openai, #%gemini, #%llama, #%dalle)

  • Direct single-call access to provider APIs.
  • Useful for explicit provider/model control.
  • Do not use chat-object memory managed by #%chat.

Remark: For all “Jupyter::Chatbook” magic specs both prefixes %% and #% can be used.

Examples

OpenAI’s (ChatGPT) models:

#%openai > markdown, model=gpt-4.1-mini
Write a regex for US ZIP+4.

Google’s (Gemini) models:

#%gemini > markdown, model=gemini-2.5-flash
Explain async/await in Python using three points, each with fewer than 10 words.

Access locally run llamafile models:

#%llama > markdown
Give me three Linux troubleshooting tips. VERY CONCISE.

Remark: In order to run the magic cell above you have to run a llamafile program/model on your computer. (For example, ./google_gemma-3-12b-it-Q4_K_M.llamafile.)

Access Ollama models:

#%chat ollama > markdown, conf=Ollama
Give me three Linux troubleshooting tips. VERY CONCISE.

Remark: In order to run the magic cell above you have to run an Ollama app on your computer.

Create images using DALL-E:

#%dalle, model=dall-e-3, size=landscape
A dark-mode digital painting of a lighthouse in stormy weather.

5) DALL-E interaction management

For a detailed discussion of the DALL-E interaction in Raku and magic cell parameter descriptions see “Day 21 – Using DALL-E models in Raku”.

Image generation:

#%dalle, model=dall-e-3, size=landscape, style=vivid
A dark-mode digital painting of a lighthouse in stormy weather.

Here we use a DALL-E meta cell to see how many images were generated in a notebook session:

#% dalle meta
elems
# 3

Here we export the second image — using the index 1 — into a file named “stormy-weather-lighthouse-2.png”:

#% dalle export, index=1
stormy-weather-lighthouse-2.png
# stormy-weather-lighthouse-2.png

Here we show all generated images:

#% dalle meta
show

Here we export all images (into file names with the prefix “cheatsheet”):

#% dalle export, index=all, prefix=cheatsheet

6) LLM provider access facilitation

API keys can be passed inline (api-key) or through environment variables.

Notebook-session environment setup

%*ENV<OPENAI_API_KEY> = "YOUR_OPENAI_KEY";
%*ENV<GEMINI_API_KEY> = "YOUR_GEMINI_KEY";
%*ENV<OLLAMA_API_KEY> = "YOUR_OLLAMA_KEY";
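Alternatively, an API key can be passed inline via the api-key argument; a sketch with a placeholder key and a hypothetical persona ID:

```raku
#%chat assistant3, conf=ChatGPT, api-key=YOUR_OPENAI_KEY
Say hi.
```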

Ollama-specific defaults:

  • OLLAMA_HOST (default host fallback is http://localhost:11434)
  • OLLAMA_MODEL (default model if model=... not given)

The magic cells take the argument base-url, which allows using LLMs that have ChatGPT-compatible APIs. For the magic cell #%ollama the argument base-url is a synonym of host.
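For example, here is a sketch of pointing an OpenAI-style magic cell at a local Ollama endpoint (the URL path and model name are assumptions and depend on the local setup):

```raku
#%openai, base-url=http://localhost:11434/v1, model=llama3.2
Give me three Linux troubleshooting tips. VERY CONCISE.
```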


7) Notebook/chatbook session initialization with custom code + personas JSON

Initialization runs when the extension is loaded.

A) Custom Raku init code

  • Env var override: RAKU_CHATBOOK_INIT_FILE
  • If not set, first existing file is used in this order:
  1. ~/.config/raku-chatbook/init.raku
  2. ~/.config/init.raku

Use this for imports/helpers you always want in chatbook sessions.
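For example, a minimal init file might look like the following (the helper sub is just an illustration; the use Graph line mirrors the user-init preloading shown in the articles below):

```raku
# ~/.config/raku-chatbook/init.raku

# Packages to have in every chatbook session
use Graph;

# A helper always at hand
sub hr(UInt:D $n = 60) { say '⸺' x $n }
```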

B) Pre-load personas from JSON

  • Env var override: RAKU_CHATBOOK_LLM_PERSONAS_CONF
  • If not set, first existing file is used in this order:
  1. ~/.config/raku-chatbook/llm-personas.json
  2. ~/.config/llm-personas.json

The supported JSON shape is an array of dictionaries:

[
{
"chat-id": "raku",
"conf": "ChatGPT",
"prompt": "@CodeWriterX|Raku",
"model": "gpt-4.1-mini",
"max_tokens": 8192,
"temperature": 0.4
}
]

Recognized persona spec fields include:

  • chat-id
  • prompt
  • conf (or configuration)
  • model
  • max-tokens
  • temperature
  • base-url
  • api-key
  • evaluator-args (object)

Verify pre-loaded personas:

#%chat all
keys

Military forces interactions graphs

Introduction

Interesting analogies of Rock-Paper-Scissors (RPS) hand games can be made with military forces interactions; see [AAv1]. Those analogies are easily seen using graphs. For example, the extension of the graph of Rock-Paper-Scissors-Lizard-Spock, [Wv1], into the graph “Chuck Norris defeats all” is analogous to the extension of “older” (say, WWII) military forces interactions graphs with drones.

Here is the graph of Rock-Paper-Scissors-Lizard-Spock-ChuckNorris, [AA1]:

Chuck Norris defeats all

In this document (notebook), we use Raku to create graphs that show how military forces interact. We apply the know-how for making graphs for RPS-games detailed in the blog post “Rock-Paper-Scissors extensions”, [AA1].


Setup

The setup is the same as in [AA1] (notebook).


Convenient LLM function

We can define an LLM function that provides the graph edges dataset for different RPS variants. Here is such an LLM function using “LLM::Functions”, [AAp2], and “LLM::Prompts”, [AAp3]:

my sub rps-edge-dataset($description, Str:D $game-name = 'Rock-Paper-Scissors', *%args) {
    llm-synthesize([
        "Give the edges of the graph for this $game-name variant description",
        'Give the edges as an array of dictionaries. Each dictionary with keys "from", "to", "label",',
        'where "label" has the action of "from" over "to".',
        $description,
        llm-prompt('NothingElse')('JSON')
        ], 
        e => %args<llm-evaluator> // %args<e> // %args<conf> // $conf4o-mini,
        form => sub-parser('JSON'):drop
    )
}

Remark: We reuse the sub definition rps-edge-dataset from [AA1].

Remark: Both “LLM::Functions” and “LLM::Prompts” are pre-loaded in Raku chatbooks.


Rock-Paper-Scissors and its Lizard-Spock extensions

Here are the graphs of the standard RPS game and its “Lizard-Spock” extension:

#% html

# Graph edges: LLM-generated and LLM-translated
my @edges-emo =
    { from => '🪨', to => '✂️',   label => 'crushes' },
    { from => '✂️',  to => '📄',  label => 'cuts' },
    { from => '📄', to => '🪨',  label => 'covers' },
    { from => '🪨', to => '🦎',  label => 'crushes' },
    { from => '🦎', to => '🖖',  label => 'poisons' },
    { from => '🖖', to => '✂️',   label => 'smashes' },
    { from => '✂️',  to => '🦎',  label => 'decapitates' },
    { from => '🦎', to => '📄',  label => 'eats' },
    { from => '📄', to => '🖖',  label => 'disproves' },
    { from => '🖖', to => '🪨',  label => 'vaporizes' }
;

# Edge-label rules
my %edge-labels-emo;
@edges-emo.map({ %edge-labels-emo{$_<from>}{$_<to>} = $_<label> });

# RPS-3 Lizard-Spock extension
my $g-emo = Graph.new(@edges-emo, :directed);

# Standard RPS-3 as a subgraph
my $g-rps = $g-emo.subgraph(<🪨 ✂️ 📄>);

# Plot the graphs together
$g-rps.dot(|%opts, edge-labels => %edge-labels-emo, :svg)
~
$g-emo.dot(|%opts, edge-labels => %edge-labels-emo, :svg)


Simple analogy

We consider the following military analogy with RPS:

  • Tanks attack (and defeat) Infantry
  • Guerillas defend against Tanks
  • Infantry attacks Guerillas

Here we obtain the corresponding graph edges using an LLM:

my $war-game = rps-edge-dataset('tanks attack infantry, guerillas defend against tanks, infantry attacks guerillas')

# [{from => Tanks, label => attack, to => Infantry} {from => Guerillas, label => defend, to => Tanks} {from => Infantry, label => attack, to => Guerillas}]

Plotting the graphs together:

#% html
my %edge-labels = Empty; 
for |$war-game -> %r { %edge-labels{%r<from>}{%r<to>} = %r<label> };
Graph.new($war-game, :directed).dot(|%opts-plain, :%edge-labels, :svg)
~
$g-rps.dot(|%opts, edge-labels => %edge-labels-emo, :svg)


Military forces interaction

Here is a Mermaid-JS-made graph of a more complicated military forces interactions diagram; see [NM1]:

Using the diagram’s Mermaid code, the graph edges here are LLM-generated:

#% html
my $mmd-descr = q:to/END/;
graph TD
AT[Anti-tank weapons] --> |defend|Arm[Armor]
Arm --> |attack|IA[Infantry and Artillery] 
Air[Air force] --> |attack|Arm
Air --> |attack|IA
M[Missiles] --> |defend|Air
IA --> |attack|M
IA --> |attack|AT
END

my $war-game2 = rps-edge-dataset($mmd-descr);

$war-game2 ==> to-html(field-names => <from label to>)

Direct assignment (instead of using LLMs):

my $war-game2 = $[
    {:from("Anti-tank weapons"), :label("defend"), :to("Armor")}, {:from("Armor"), :label("attack"), :to("Infantry and Artillery")}, 
    {:from("Air force"), :label("attack"), :to("Armor")}, {:from("Air force"), :label("attack"), :to("Infantry and Artillery")}, 
    {:from("Missiles"), :label("defend"), :to("Air force")}, {:from("Infantry and Artillery"), :label("attack"), :to("Missiles")}, 
    {:from("Infantry and Artillery"), :label("attack"), :to("Anti-tank weapons")}
];

The diagram does not correspond to modern warfare — it is taken from a doctoral thesis, [NM1], discussing reconstruction of historical military data. The corresponding graph can be upgraded with drones in a similar way as the Chuck-Norris-defeats-all upgrade in [AA1].

my $war-forces = Graph.new($war-game2, :directed); 
my $drone = "Air drones";
my $war-game-d = $war-game2.clone.append( $war-forces.vertex-list.map({ %( from => $drone, to => $_, label => 'attack' ) }) );
$war-game-d .= append( ['Missiles', 'Air force'].map({ %(from => $_, to => $drone, label => 'defend') }) );
my $war-forces-d = Graph.new($war-game-d, :directed);

# Graph(vertexes => 6, edges => 14, directed => True)

Here is the corresponding table:

#% html
game-table($war-forces-d, link-value => '⊙', missing-value => '')

                       | Air drones | Air force | Anti-tank weapons | Armor | Infantry and Artillery | Missiles
Air drones             |            | ⊙         | ⊙                 | ⊙     | ⊙                      | ⊙
Air force              | ⊙          |           |                   | ⊙     | ⊙                      |
Anti-tank weapons      |            |           |                   | ⊙     |                        |
Armor                  |            |           |                   |       | ⊙                      |
Infantry and Artillery |            |           | ⊙                 |       |                        | ⊙
Missiles               | ⊙          | ⊙         |                   |       |                        |

Here is the graph with different coloring for “attack” edges (gray) and “defend” edges (blue):

#% html
$war-forces-d.vertex-coordinates = ($war-forces-d.vertex-list Z=> Graph::Cycle($war-forces-d.vertex-count).vertex-coordinates{^$war-forces-d.vertex-count}.values).Hash;

my %edge-labels;
$war-game-d.map({ %edge-labels{$_<from>}{$_<to>} = $_<label> });

my %highlight = 
    'SlateBlue' => Graph.new( $war-game-d.grep(*<label> eq 'defend'), :directed).edges;

$war-forces-d.dot(
    :%highlight,
    |merge-hash(%opts-plain, {:9graph-size, node-width => 0.7}),
    :%edge-labels, 
    :svg
)

Remark: The graph above is just an example — real-life military forces interactions are more complicated.


Generalized antagonism

Following the article “The General Lanchester Model Defining Multilateral Conflicts”, [SM1], we can make a graph for multiple, simultaneous conflicts (narrated exposition is given in the presentation “Upgrading Epidemiological Models into War Models”, [AAv1]):

#% html

# Graph edges
my @multi-conflict-edges = 
    %(from=>1, to=>5, label=>'Neutrality',   :!directed), %(from=>1, to=>3, label=>'Commensalism', :directed),
    %(from=>1, to=>4, label=>'Commensalism', :directed),  %(from=>2, to=>1, label=>'Coercion',     :directed),
    %(from=>2, to=>3, label=>'Alliance',     :!directed), %(from=>2, to=>4, label=>'Guerilla war', :directed),
    %(from=>3, to=>4, label=>'Conflict',     :!directed), %(from=>5, to=>3, label=>'Avoidance',    :directed),
    %(from=>5, to=>4, label=>'Alliance',     :!directed), %(from=>5, to=>2, label=>'Adaptation',   :directed);

@multi-conflict-edges .= deepmap({ $_ ~~ Bool:D ?? $_ !! $_.Str });

# Edge-label rules
my %edge-labels;
@multi-conflict-edges.map({ %edge-labels{$_<from>}{$_<to>} = $_<label> });

# Make an empty graph
my $mc = Graph.new;

# Add each edge depending on its direction specification
for @multi-conflict-edges -> %e { 
    $mc.edge-add(%e<from>, %e<to>, :directed);
    if !%e<directed> {
        $mc.edge-add(%e<to>, %e<from>, :directed)
    }
}

# Vertex coordinates via Cycle graph
$mc.vertex-coordinates = ($mc.vertex-list Z=> Graph::Cycle($mc.vertex-count).vertex-coordinates{^$mc.vertex-count}.values).Hash;

# Graph plot
$mc.dot(|merge-hash(%opts, {node-shape => 'square', :4edge-font-size }), :%edge-labels, highlight => { RosyBrown => <1 3 4>, SlateBlue => <2 5> }, :mixed, :svg)

Remark: The graph above is just for illustration. In order to do mathematical modeling additional interaction data is required; see [AAv1].


References

Articles, books, theses

[AA1] Anton Antonov, “Rock-Paper-Scissors extensions”, (2025), RakuForPrediction at WordPress.

[AJ1] Archer Jones, “The Art of War in the Western World”, (2000), University of Illinois Press. 768 pages, ISBN-10: 0252069668, ISBN-13: 978-0252069666.

[SM1] Sergei Makarenko et al., “Обобщенная модель Ланчестера, формализующая конфликт нескольких сторон”, [Eng. “The General Lanchester Model Defining Multilateral Conflicts”], (2021), Automation of Control Processes № 2 (64), doi: 10.35752/1991-2927-2021-2-64-66-76.

[NM1] Николай В. Митюков, “Математические модели и программные средства для реконструкции военно-исторических данных”, [Eng. “Mathematical Models and Software for the Reconstruction of Military-Historical Data”], (2009), disserCat.

Packages

[AAp1] Anton Antonov, Graph Raku package, (2024-2025), GitHub/antononcube.

[AAp2] Anton Antonov, LLM::Functions Raku package, (2023-2024), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Prompts Raku package, (2023-2024), GitHub/antononcube.

[AAp4] Anton Antonov, Jupyter::Chatbook Raku package, (2023-2024), GitHub/antononcube.

[EMp1] Elizabeth Mattijsen, Text::Emoji Raku package, (2024-2025), GitHub/lizmat.

Videos

[AAv1] Anton Antonov, “Upgrading Epidemiological Models into War Models”, (2024), YouTube/@WolframResearch.

[Wv1] Wozamil, “Rock Paper Scissors Lizard Spock (Extended Cut) ~ The Big Bang Theory ~”, (2012), YouTube/@Wozamil.

Rock-Paper-Scissors extensions

Introduction

It is easy to make a simple Rock-Paper-Scissors (RPS) game graph using the Raku package “Graph”, [AAp1]. Here is such a graph in which the arrow directions indicate which item (vertex) wins:

#%html
my $g0 = Graph.new(<🪨 ✂️ ✂️ 📄 📄 🪨>.Hash):d;
$g0.dot(:3graph-size, engine => 'neato'):svg

Easy, but now we want to:

  • show the edge labels (i.e. the game actions);
  • translate the vertex labels into emojis;
  • extend the game (with “Lizard-Spock”, “Chuck Norris”, and “Fire-Water”);
  • use LLMs to generate graph edges.

In this post (notebook) we show how to do all of the above points.

Remark: Interesting analogies of the presented graphs can be made with warfare graphs, [AAv1]. For example, the graph tanks-infantry-guerillas is analogous to RPS.

TL;DR

  • LLMs “know” the RPS game and its upgrades.
  • LLMs know how to (mostly, reliably) translate to emojis.
  • The package “Graph” (via Graphviz DOT) can produce SVG plots that are readily rendered in different environments.
    • And the graphs of hand-games like RPS look good.
  • The class Graph has handy methods and attributes that make the creation and modification of graphs smooth(er).

Setup

This notebook is a Raku chatbook; hence its Jupyter session preloads certain packages and LLM personas.

# Preloaded in any chatbook
# use LLM::Functions;
# use LLM::Prompts;

# Preloaded in a user init file
# use Graph;

# For this concrete session
use Text::Emoji;

LLM configurations:

my $conf4o = llm-configuration('chat-gpt', model => 'gpt-4o', :4096max-tokens, temperature => 0.4);
my $conf4o-mini = llm-configuration('chat-gpt', model => 'gpt-4o-mini', :4096max-tokens, temperature => 0.4);

($conf4o, $conf4o-mini)».Hash».elems

Default options of Graph.dot:

my $background = '#1F1F1F';
my $engine = 'neato';

my %opts =
    :$background,     
    :6graph-size, 
    :1edge-width,
    :3edge-font-size,
    edge-color => 'LightSlateGray',
    node-width => 0.2, node-height => 0.2, 
    node-shape => 'circle', 
    :node-labels, 
    :8node-font-size,
    node-fill-color => '#1F1F1F',
    node-color => 'LightSlateGray',
    node-stroke-width => 0.6,
    arrow-size => 0.25,
    :$engine;

my %opts-plain = merge-hash(%opts, {:5node-font-size, node-shape => 'ellipse', node-width => 0.27, node-height => 0.15});

(%opts, %opts-plain)».elems

Additional

sub game-table(Graph:D $g, Str:D :$link-value = '+', Str:D :$missing-value = '-') {
    cross-tabulate($g.edges(:dataset), <from>, <to>)
    ==> -> %h { %h.map({ $_.key => ($g.vertex-list Z=> $_.value{$g.vertex-list}).Hash }).Hash }()
    ==> to-dataset(:$missing-value)
    ==> -> %h { for $g.vertex-list { %h{$_}{$_} = ''}; %h }()
    ==> -> %h { $g.vertex-list.map({ [|%h{$_}, "" => $_].Hash }) }()
    ==> to-html(field-names => ["", |$g.vertex-list])
    ==> { .Str.subst('1', $link-value, :g).subst('(Any)', $missing-value, :g) }()
}


LLM request

Raku-chatbooks, [AAp4], can have initialization Raku code and specified preloaded LLM-personas. One such LLM-persona is “raku”. Here we use the “raku” chat object to get Raku code for the edges of the RPS extension Rock-Paper-Scissors-Lizard-Spock, [Wv1].

#% chat raku
Make an array of the edges of a graph for the game Rock-Paper-Scissors-Lizard-Spock.
Each edge is represented with a hash with the keys "from", "to", "label".
The label corresponds to the action taken with the edge, like, "Paper covers Rock", "Paper disproves Spock".

my @edges = (
    { from => 'Rock',     to => 'Scissors', label => 'Rock crushes Scissors' },
    { from => 'Rock',     to => 'Lizard',   label => 'Rock crushes Lizard' },
    { from => 'Paper',    to => 'Rock',     label => 'Paper covers Rock' },
    { from => 'Paper',    to => 'Spock',    label => 'Paper disproves Spock' },
    { from => 'Scissors', to => 'Paper',    label => 'Scissors cuts Paper' },
    { from => 'Scissors', to => 'Lizard',   label => 'Scissors decapitates Lizard' },
    { from => 'Lizard',   to => 'Spock',    label => 'Lizard poisons Spock' },
    { from => 'Lizard',   to => 'Paper',    label => 'Lizard eats Paper' },
    { from => 'Spock',    to => 'Scissors', label => 'Spock smashes Scissors' },
    { from => 'Spock',    to => 'Rock',     label => 'Spock vaporizes Rock' },
);

We use the generated code in the next section.


Plain text graph

Here we create the Rock-Paper-Scissors-Lizard-Spock graph generated with the LLM-magic cell above:

my @edges =
    { from => 'Rock',     to => 'Scissors',  label => 'Rock crushes Scissors' },
    { from => 'Scissors', to => 'Paper',     label => 'Scissors cuts Paper' },
    { from => 'Paper',    to => 'Rock',      label => 'Paper covers Rock' },
    { from => 'Rock',     to => 'Lizard',    label => 'Rock crushes Lizard' },
    { from => 'Lizard',   to => 'Spock',     label => 'Lizard poisons Spock' },
    { from => 'Spock',    to => 'Scissors',  label => 'Spock smashes Scissors' },
    { from => 'Scissors', to => 'Lizard',    label => 'Scissors decapitates Lizard' },
    { from => 'Lizard',   to => 'Paper',     label => 'Lizard eats Paper' },
    { from => 'Paper',    to => 'Spock',     label => 'Paper disproves Spock' },
    { from => 'Spock',    to => 'Rock',      label => 'Spock vaporizes Rock' }
;

my $g = Graph.new(@edges, :directed);

# Graph(vertexes => 5, edges => 10, directed => True)

Here we make the edge labels:

my %edge-labels;
@edges.map({ %edge-labels{$_<from>}{$_<to>} = $_<label>.words[1] });

deduce-type(%edge-labels)

# Assoc(Atom((Str)), Assoc(Atom((Str)), Atom((Str)), 2), 5)

Here we plot the graph:

#% html
$g.dot(|%opts-plain, :%edge-labels):svg

Remark: Currently the class Graph does not “deal” with edge labels, but some of its methods (like dot) do.


Convenient LLM functions

Graph edges

Instead of using chat-cells, we can define an LLM function that provides the graph edges dataset for different RPS variants. Here is such an LLM function using “LLM::Functions”, [AAp2], and “LLM::Prompts”, [AAp3]:

my sub rps-edge-dataset($description, Str:D $game-name = 'Rock-Paper-Scissors', *%args) {
    llm-synthesize([
        "Give the edges of the graph for this $game-name variant description",
        'Give the edges as an array of dictionaries. Each dictionary with keys "from", "to", "label",',
        'where "label" has the action of "from" over "to".',
        $description,
        llm-prompt('NothingElse')('JSON')
        ], 
        e => %args<llm-evaluator> // %args<e> // %args<conf> // $conf4o-mini,
        form => sub-parser('JSON'):drop
    )
}

Remark: Both “LLM::Functions” and “LLM::Prompts” are pre-loaded in Raku chatbooks.

Emoji translations

We can translate the plain-text vertex labels of RPS graphs into emojis in several ways:

  1. Manually
  2. Using to-emoji of “Text::Emoji”, [EMp1]
  3. Via LLMs

Here we take option 2:

my %additional = spock => to-emoji(':vulcan-salute:'), paper => to-emoji(":page-with-curl:");
say (:%additional);
@edges.map(*<from>).map({ $_ => to-emoji(":$_:", %additional) })

# additional => {paper => 📃, spock => 🖖}
# (Rock => 🪨 Scissors => ✂️ Paper => 📃 Rock => 🪨 Lizard => 🦎 Spock => 🖖 Scissors => ✂️ Lizard => 🦎 Paper => 📃 Spock => 🖖)

Again, let us define an LLM function that does the emojification. (I.e. for option 3.)

One way is to do a simple application of the prompt “Emojify” and process its result into a dictionary:

my $res = llm-synthesize( llm-prompt("Emojify")($g.vertex-list), e => $conf4o-mini  );
$res.split(/\s+/, :skip-empty)».trim.Hash

It is better to have a function that provides a more “immediate” result:

my sub emoji-rules($words, *%args) {
    llm-synthesize( [
        llm-prompt("Emojify")($words), 
        'Make a JSON dictionary of the original words as keys and the emojis as values', 
        llm-prompt('NothingElse')('JSON') 
        ], 
        e => %args<llm-evaluator> // %args<e> // %args<conf> // $conf4o-mini,
        form => sub-parser('JSON'):drop
    )
}


Emoji graph

Let us remake the game graph using suitable emojis. Here are the corresponding edges:

my @edges-emo =
    { from => '🪨', to => '✂️',   label => 'crushes' },
    { from => '✂️',  to => '📄',  label => 'cuts' },
    { from => '📄', to => '🪨',  label => 'covers' },
    { from => '🪨', to => '🦎',  label => 'crushes' },
    { from => '🦎', to => '🖖',  label => 'poisons' },
    { from => '🖖', to => '✂️',   label => 'smashes' },
    { from => '✂️',  to => '🦎',  label => 'decapitates' },
    { from => '🦎', to => '📄',  label => 'eats' },
    { from => '📄', to => '🖖',  label => 'disproves' },
    { from => '🖖', to => '🪨',  label => 'vaporizes' }
;

my $g-emo = Graph.new(@edges-emo, :directed);

# Graph(vertexes => 5, edges => 10, directed => True)

Here is a table of the upgraded game that shows the interaction between the different roles (hand plays):

#% html
game-table($g-emo)

   | ✂️ | 📄 | 🖖 | 🦎 | 🪨
✂️ |    | +  | -  | +  | -
📄 | -  |    | +  | -  | +
🖖 | +  | -  |    | -  | +
🦎 | -  | +  | +  |    | -
🪨 | +  | -  | -  | +  |

Here we make the edge labels:

my %edge-labels-emo;
@edges-emo.map({ %edge-labels-emo{$_<from>}{$_<to>} = $_<label> });

deduce-type(%edge-labels-emo)

# Assoc(Atom((Str)), Assoc(Atom((Str)), Atom((Str)), 2), 5)

Here we plot the graph (using a variety of setup options):

#% html
$g-emo.dot(|%opts, edge-labels => %edge-labels-emo):svg


Chuck Norris defeats them all!

Consider the image (from www.merchandisingplaza.us):

Let us try to remake it with a graph plot. At this point we simply add a “foot connection” to all five vertices in the graph(s) above:

my $chuck = "🦶🏻";
my $g-chuck = $g.clone.edge-add( ($chuck X=> $g.vertex-list).Array, :directed);

# Graph(vertexes => 6, edges => 15, directed => True)

But we also have to rename the vertices to be hand-gestures:

$g-chuck .= vertex-replace( { Scissors => '✌🏻', Rock => '✊🏻', Lizard => '🤏🏻', Spock => '🖖🏻', 'Paper' => '✋🏻' } )

# Graph(vertexes => 6, edges => 15, directed => True)

Here is the interactions table of the upgraded game:

#% html
game-table($g-chuck)

     | ✊🏻 | ✋🏻 | ✌🏻 | 🖖🏻 | 🤏🏻 | 🦶🏻
✊🏻 |      | -    | +    | -    | +    | -
✋🏻 | +    |      | -    | +    | -    | -
✌🏻 | -    | +    |      | -    | +    | -
🖖🏻 | +    | -    | +    |      | -    | -
🤏🏻 | -    | +    | -    | +    |      | -
🦶🏻 | +    | +    | +    | +    | +    |

In order to ensure that we get an “expected” graph plot, we can take the vertex coordinates of a wheel graph or compute them by hand. Here we do the latter:

my @vs = <✊🏻 🖖🏻 🤏🏻 ✌🏻 ✋🏻>;
my %vertex-coordinates = @vs.kv.map( -> $i, $v { $v => [cos(π/2 + $i * 2 * π / 5), sin(π/2 + $i * 2 * π / 5)] });
%vertex-coordinates<🦶🏻> = [0, 0];
$g-chuck.vertex-coordinates = %vertex-coordinates;

deduce-type(%vertex-coordinates)

# Struct([✊🏻, ✋🏻, ✌🏻, 🖖🏻, 🤏🏻, 🦶🏻], [Array, Array, Array, Array, Array, Array])

Here we plot the graph:

#% html
$g-chuck.dot(
    background => '#5f5b4f',
    graph-label => 'Chuck Norris Defeats All'.uc,
    font-color => '#b8aa79',
    :6graph-size, 
    :2edge-width,
    :4edge-font-size,
    edge-color => 'AntiqueWhite',
    node-width => 0.56, node-height => 0.56, 
    node-shape => 'circle', 
    :node-labels, 
    :38node-font-size,
    node-fill-color => '#b8aa79',
    node-color => 'Gray',
    node-stroke-width => 0.6,
    arrow-size => 0.26,
    engine => 'neato',
    :svg
)


Using LLMs

Matching the colors

We can use “LLM vision” to get the colors of the original image:

my $url = 'https://www.merchandisingplaza.us/40488/2/T-shirts-Chuck-Norris-Chuck-Norris-Rock-Paper-Scissors-Lizard-Spock-TShirt-l.jpg';
llm-vision-synthesize('What are the dominant colors in this image? Give them in hex code.', $url)

The dominant colors in the image are:

- Olive Green: #5B5D4A
- Beige: #D0C28A
- White: #FFFFFF
- Black: #000000

Graph generation with LLMs

Instead of specifying the graph edges by hand, we can use LLM-vision and suitable prompting. The results are not that good, but YMMV.

my $res2 =
llm-vision-synthesize([
    'Give the edges the graph for this image of Rock-Paper-Scissors-Lizard-Spock-Chuck -- use relevant emojis.',
    'Give the edges as an array of dictionaries. Each dictionary with keys "from" and "to".',
    llm-prompt('NothingElse')('JSON')
    ], 
    $url,
    e => $conf4o,
    form => sub-parser('JSON'):drop
    )

# [{from => ✋, to => ✌️} {from => ✌️, to => ✊} {from => ✊, to => 🦎} {from => 🦎, to => 🖖} {from => 🖖, to => ✋} {from => ✋, to => ✊} {from => ✊, to => ✋} {from => ✌️, to => 🦎} {from => 🦎, to => ✋} {from => 🖖, to => ✌️} {from => ✌️, to => 🖖} {from => 🖖, to => ✊}]

#% html
Graph.new($res2, :directed).dot(:5graph-size, engine => 'neato', arrow-size => 0.5):svg


Rock-Paper-Scissors-Fire-Water

One notable variant is Rock-Paper-Scissors-Fire-Water. Here is its game table:

#% html
my @edges = |('🔥' X=> $g0.vertex-list), |($g0.vertex-list X=> '💦'), '💦' => '🔥';
my $g-fire-water = $g0.clone.edge-add(@edges, :directed);

game-table($g-fire-water)

✂️💦📄🔥🪨
✂️++
💦+
📄++
🔥+++
🪨++
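The edge construction above can be mirrored in Python (plain-word gesture names stand in for the emojis): fire beats every base gesture, every base gesture beats water, and water beats fire. Tallying out-degrees reproduces the "+" counts in the game-table rows:

```python
# Plain-word stand-ins for the emoji gestures (assumption for illustration).
base = ["rock", "paper", "scissors"]
base_beats = [("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")]

# Fire beats every base gesture; every base gesture beats water; water beats fire.
edges = (base_beats
         + [("fire", v) for v in base]
         + [(v, "water") for v in base]
         + [("water", "fire")])

# Out-degree of each vertex == number of '+' marks in its game-table row.
wins = {}
for frm, _to in edges:
    wins[frm] = wins.get(frm, 0) + 1

print(wins["fire"], wins["water"])  # 3 1
```

Fire dominates with three wins and water has only one, exactly as in the table above.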

Here is the graph:

#% html
$g-fire-water.dot(|%opts, engine => 'neato'):svg


Complete RPS upgrade via LLMs

Consider the game RPS-9:

my $txt = data-import('https://www.umop.com/rps9.htm', 'plaintext');
text-stats($txt)

# (chars => 2143 words => 355 lines => 46)

Extract the game description:

my ($start, $end) = 'relationships in RPS-9:', 'Each gesture beats out';
my $txt-rps9 = $txt.substr( $txt.index($start) + $start.chars .. $txt.index($end) - 1 ) 

ROCK POUNDS OUT
FIRE, CRUSHES SCISSORS, HUMAN &
SPONGE.
FIRE MELTS SCISSORS, 
BURNS PAPER, HUMAN & SPONGE.
SCISSORS SWISH THROUGH AIR,
CUT PAPER, HUMAN & SPONGE.
HUMAN CLEANS WITH SPONGE,
WRITES PAPER, BREATHES
AIR, DRINKS WATER.
SPONGE SOAKS PAPER, USES
AIR POCKETS, ABSORBS WATER,
CLEANS GUN.
PAPER FANS AIR,
COVERS ROCK, FLOATS ON WATER,
OUTLAWS GUN.
AIR BLOWS OUT FIRE,
ERODES ROCK, EVAPORATES WATER,
TARNISHES GUN.
WATER ERODES ROCK, PUTS OUT
FIRE, RUSTS SCISSORS & GUN.
GUN TARGETS ROCK,
FIRES, OUTCLASSES SCISSORS, SHOOTS HUMAN.
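The marker-based extraction used above — take the text strictly between a start phrase and an end phrase — is easy to sketch in Python; the page text here is a shortened, hypothetical stand-in:

```python
def between(text, start, end):
    """Return the substring strictly after the first `start` marker and
    before the first `end` marker."""
    i = text.index(start) + len(start)
    return text[i:text.index(end)]

# Hypothetical page text, for illustration only.
page = ("... the relationships in RPS-9: ROCK POUNDS OUT FIRE ... "
        "Each gesture beats out three others.")
print(between(page, "relationships in RPS-9:", "Each gesture beats out"))
```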

Here we invoke the LLM function defined earlier to get the edges of the corresponding graph:

my @rps-edges = |rps-edge-dataset($txt-rps9)

# [{from => ROCK, label => POUNDS OUT, to => FIRE} {from => ROCK, label => CRUSHES, to => SCISSORS} {from => ROCK, label => CRUSHES, to => HUMAN}, ..., {from => GUN, label => FIRES, to => FIRE}]

Here we translate the plaintext vertices into emojis:

my %emojied = emoji-rules(@rps-edges.map(*<from to>).flat.unique.sort)

# {AIR => 🌬️, FIRE => 🔥, GUN => 🔫, HUMAN => 👤, PAPER => 📄, ROCK => 🪨, SCISSORS => ✂️, SPONGE => 🧽, WATER => 💧}

Here is the graph plot:

#% html
my $g-rps9 = Graph.new(@rps-edges, :directed).vertex-replace(%emojied);
$g-rps9.vertex-coordinates = $g-rps9.vertex-list Z=> Graph::Cycle(9).vertex-coordinates.values;

my %edge-labels = Empty;
@rps-edges.map({ %edge-labels{%emojied{$_<from>}}{%emojied{$_<to>}} = "\"$_<label>\"" });

my %opts2 = %opts , %(:14node-font-size, node-shape => 'circle', node-width => 0.3, edge-width => 0.4);
$g-rps9.dot(|%opts2, :!edge-labels, engine => 'neato', :svg)

Here is the game table:

#% html
game-table($g-rps9)

✂️🌬️👤💧📄🔥🔫🧽🪨
✂️+++
🌬️++++
👤++++
💧++++
📄++++
🔥++++
🔫+++
🧽++++
🪨++++

Future plans

In the (very near) future I plan to use the RPS graph-making know-how built up here to make military-forces interaction graphs. (Discussed in [AJ1, SM1, NM1, AAv1].)


References

Articles, books, theses

[AJ1] Archer Jones, “The Art of War in the Western World”, (2000), University of Illinois Press. 768 pages, ISBN-10: 0252069668, ISBN-13: 978-0252069666.

[SM1] Sergei Makarenko et al., “Обобщенная модель Ланчестера, формализующая конфликт нескольких сторон”, [Eng. “The General Lanchester Model Defining Multilateral Conflicts”], (2021), Automation of Control Processes № 2 (64), doi: 10.35752/1991-2927-2021-2-64-66-76.

[NM1] Николай В. Митюков, “Математические модели и программные средства для реконструкции военно-исторических данных”, [Eng. “Mathematical Models and Software for the Reconstruction of Military-Historical Data”], (2009), disserCat.

Packages

[AAp1] Anton Antonov, Graph Raku package, (2024-2025), GitHub/antononcube.

[AAp2] Anton Antonov, LLM::Functions Raku package, (2023-2024), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Prompts Raku package, (2023-2024), GitHub/antononcube.

[AAp4] Anton Antonov, Jupyter::Chatbook Raku package, (2023-2024), GitHub/antononcube.

[EMp1] Elizabeth Mattijsen, Text::Emoji Raku package, (2024-2025), GitHub/lizmat.

Videos

[AAv1] Anton Antonov, “Upgrading Epidemiological Models into War Models”, (2024), YouTube/@WolframResearch.

[Wv1] Wozamil, “Rock Paper Scissors Lizard Spock (Extended Cut) ~ The Big Bang Theory ~”, (2012), YouTube/@Wozamil.

Doomsday clock parsing and plotting

Introduction

The Doomsday Clock is a symbolic timepiece maintained by the Bulletin of the Atomic Scientists (BAS) since 1947. It represents how close humanity is perceived to be to global catastrophe, primarily nuclear war but also including climate change and biological threats. The clock’s hands are set annually to reflect the current state of global security; midnight signifies theoretical doomsday.

In this post (notebook) we consider two tasks:

  • Parsing of Doomsday Clock reading statements
  • Evolution of Doomsday Clock times
    • We extract relevant Doomsday Clock timeline data from the corresponding Wikipedia page.
      • (Instead of using a page from BAS.)
    • We show how timeline data from that Wikipedia page can be processed with LLMs.
    • The result plot shows the evolution of the minutes to midnight.
      • The plot could show trends, highlighting significant global events that influenced the clock setting.
      • Hence, we put in informative callouts and tooltips.

The data extraction and visualization in the post (notebook) serve educational purposes or provide insights into historical trends of global threats as perceived by experts. We try to make the ingestion and processing code universal and robust, suitable for multiple evaluations now or in the (near) future.

Remark: Keep in mind that the Doomsday Clock is a metaphor and its settings are not just data points but reflections of complex global dynamics (as assessed by certain experts and a board of sponsors).

Remark: Currently (2024-12-30) the Doomsday Clock is set at 90 seconds before midnight.

Remark: This post (notebook) is the Raku-version of the Wolfram Language (WL) notebook with the same name, [AAn1]. That is why the “standard” Raku-grammar approach is not used. (Although, in the preliminary versions of this work relevant Raku grammars were generated via both LLMs and Raku packages.)

I was very impressed by the looks and tune-ability of WL’s ClockGauge, so, I programmed a similar clock gauge in Raku’s package “JavaScript::D3” (which is based on D3.js.)


Setup

use LLM::Functions;
use LLM::Prompts;
use LLM::Configurations;
use Text::SubParsers;

use Data::Translators;
use Data::TypeSystem;
use Data::Importers;
use Data::Reshapers;
use Hash::Merge;

use FunctionalParsers :ALL;
use FunctionalParsers::EBNF;

use Math::DistanceFunctions::Edit;

use Lingua::NumericWordForms;

JavaScript::D3

my $background = 'none';
my $stroke-color = 'Ivory';
my $fill-color = 'none';

JavaScript::Google::Charts

my $format = 'html';
my $titleTextStyle = { color => 'Ivory' };
my $backgroundColor = '#1F1F1F';
my $legendTextStyle = { color => 'Silver' };
my $legend = { position => "none", textStyle => {fontSize => 14, color => 'Silver'} };

my $hAxis = { title => 'x', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'}, logScale => False, format => 'scientific'};
my $vAxis = { title => 'y', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'}, logScale => False, format => 'scientific'};

my $annotations = {textStyle => {color => 'Silver', fontSize => 10}};
my $chartArea = {left => 50, right => 50, top => 50, bottom => 50, width => '90%', height => '90%'};

my $background = '#1F1F1F';

Functional parsers

my sub parsing-test-table(&parser, @phrases) {
    my @field-names = ['statement', 'parser output'];
    my @res = @phrases.map({ @field-names Z=> [$_, &parser($_.words).raku] })».Hash.Array;
    to-html(@res, :@field-names)
}

Data ingestion

Here we ingest the Doomsday Clock timeline page and show corresponding statistics:

my $url = "https://thebulletin.org/doomsday-clock/past-announcements/";
my $txtEN = data-import($url, "plaintext");

text-stats($txtEN)

# (chars => 73722 words => 11573 lines => 756)

By observing the (plain) text of that page, we see that the Doomsday Clock time setting can be extracted from the sentence(s) that begin with the following phrase:

my $start-phrase = 'Bulletin of the Atomic Scientists';
my $sentence = $txtEN.lines.first({ / ^ $start-phrase /})

# Bulletin of the Atomic Scientists, with a clock reading 90 seconds to midnight


Grammar and parsers

Here is a grammar in Extended Backus-Naur Form (EBNF) for parsing Doomsday Clock statements:

my $ebnf = q:to/END/;
<TOP> = <clock-reading>  ;
<clock-reading> = <opening> , ( <minutes> | [ <minutes> , [ 'and' | ',' ] ] , <seconds> ) , 'to' , 'midnight' ;
<opening> = [ { <any> } ] , 'clock' , [ 'is' ] , 'reading' ; 
<any> = '_String' ;
<minutes> = <integer> <& ( 'minute' | 'minutes' ) ;
<seconds> = <integer> <& ( 'second' | 'seconds' ) ;
<integer> = '_Integer' <@ &{ $_.Int } ;
END

text-stats($ebnf)

# (chars => 364 words => 76 lines => 6)

Remark: The EBNF grammar above can be obtained with LLMs using a suitable prompt with example sentences. (We do not discuss that approach further in this notebook.)

Here the parsing functions are generated from the EBNF string above:

my @defs = fp-ebnf-parse($ebnf, <CODE>, name => 'Doomed2', actions => 'Raku::Code').head.tail;
.say for @defs.reverse

# my &pINTEGER = apply(&{ $_.Int }, symbol('_Integer'));
# my &pSECONDS = sequence-pick-left(&pINTEGER, (alternatives(symbol('second'), symbol('seconds'))));
# my &pMINUTES = sequence-pick-left(&pINTEGER, (alternatives(symbol('minute'), symbol('minutes'))));
# my &pANY = symbol('_String');
# my &pOPENING = sequence(option(many(&pANY)), sequence(symbol('clock'), sequence(option(symbol('is')), symbol('reading'))));
# my &pCLOCK-READING = sequence(&pOPENING, sequence((alternatives(&pMINUTES, sequence(option(sequence(&pMINUTES, option(alternatives(symbol('and'), symbol(','))))), &pSECONDS))), sequence(symbol('to'), symbol('midnight'))));
# my &pTOP = &pCLOCK-READING;

Remark: The function fp-ebnf-parse has a variety of actions for generating code from EBNF strings. For example, with actions => 'Raku::Class' the generation above would produce a class, which might be more convenient for further development (via inheritance or direct changes).

Here the imperative code above — assigned to @defs — is re-written using the infix form of the parser combinators:

my &pINTEGER = satisfy({ $_ ~~ /\d+/ }) «o {.Int};
my &pMINUTES = &pINTEGER «& (symbol('minute') «|» symbol('minutes')) «o { [minute => $_,] };
my &pSECONDS = &pINTEGER «& (symbol('second') «|» symbol('seconds')) «o { [second => $_,] };
my &pANY = satisfy({ $_ ~~ /\w+/ });
my &pOPENING = option(many(&pANY)) «&» symbol('clock') «&» option(symbol('is')) «&» symbol('reading');
my &pCLOCK-READING = &pOPENING «&» (&pMINUTES «|» option(&pMINUTES «&» option(symbol('and') «|» symbol(','))) «&» &pSECONDS) «&» symbol('to') «&» symbol('midnight');
my &pTOP = &pCLOCK-READING;

We must redefine the parser pANY (corresponding to the EBNF rule “<any>”) in order to prevent pANY from gobbling the word “clock” and thereby making the parser pOPENING fail.

&pANY = satisfy({ $_ ne 'clock' && $_ ~~ /\w+/});

Here are random sentences generated with the grammar:

.say for fp-random-sentence($ebnf, 12).sort;

# clock  reading 681 minutes to midnight
#  clock  reading 788 minutes to midnight
#  clock is reading  584 seconds to midnight
#  clock is reading  721 second to midnight
#  clock is reading 229 minute and 631 second to midnight
#  clock is reading 458 minutes to midnight
#  clock is reading 727 minute to midnight
# F3V; clock is reading 431 minute to midnight
# FXK<GQ 3RJJJ clock is reading  369 seconds to midnight
# NRP FNSEE K0EQO OPE clock is reading 101 minute to midnight
# QJDV; R<K7S; JMQ>HD AA31 clock is reading 369 minute  871 second to midnight
# QKQGK FZJ@BB M8C1BD BPI;C: clock  reading 45 minute  925 second to midnight

Verifications of the (sub-)parsers:

"90 seconds".words.&pSECONDS

# ((() [second => 90]))

"That doomsday clock is reading".words.&pOPENING

# ((() (((((That doomsday)) clock) (is)) reading)))

Here the “top” parser is applied:

my $str = "the doomsday clock is reading 90 seconds to midnight";
$str.words.&pTOP

# ((() ((((((((the doomsday)) clock) (is)) reading) (() [second => 90])) to) midnight)))

Here the sentence extracted above is parsed and interpreted into an association with keys “minute” and “second”:

$sentence.words.&pTOP.tail.flat.grep(* ~~ Pair)

# (second => 90)

Let us redefine pCLOCK-READING to return a minutes-&-seconds dictionary, and pTOP to return a corresponding date-time:

&pCLOCK-READING = &pCLOCK-READING «o { $_.flat.grep(* ~~ Pair).Hash };

&pTOP = &pCLOCK-READING «o { 
    Date.today.DateTime.earlier(seconds => ($_<minute> // 0) * 60 + ($_<second>// 0) ) 
};

Here we assign and show the results of those two parsers:

my $doom-reading = $sentence.words.&pCLOCK-READING.head.tail;
my $doom-time = $sentence.words.&pTOP.head.tail;

.say for (:$doom-reading, :$doom-time)

# doom-reading => {second => 90}
# doom-time => 2024-12-31T23:58:30Z
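The date-time interpretation — so many seconds before midnight at the start of today, mirroring the Date.today.DateTime.earlier call above — can be sketched in Python:

```python
from datetime import datetime, timedelta

def doom_time(reading, today=None):
    """Interpret a {'minute': m, 'second': s} clock reading as a concrete
    time: that many seconds before midnight at the start of `today`."""
    today = today or datetime.today()
    midnight = datetime(today.year, today.month, today.day)
    seconds = reading.get("minute", 0) * 60 + reading.get("second", 0)
    return midnight - timedelta(seconds=seconds)

print(doom_time({"second": 90}, today=datetime(2025, 1, 1)))
# 2024-12-31 23:58:30
```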


Plotting the clock

Using the interpretation derived above, we plot the corresponding clock with js-d3-clock-gauge:

#% js
js-d3-clock-gauge($doom-time)

Let us define a map with clock-gauge plot options.

my @scale-ranges = (0, 0.01 ... 0.66).rotor(2=>-1).map({ ([0, 60], $_) });
my @scale-ranges2 = (0, 0.01 ... 0.82).rotor(2=>-1).map({ ([0, 60], $_) });
my %opts = 
    background => 'none',
    stroke-color => 'Black', stroke-width => 0,
    title-color => 'Ivory', title-font-family => 'Helvetica',
    hour-hand-color => 'Orange', second-hand-color => 'Red',
    color-scheme => 'Magma',
    fill-color => 'AntiqueWhite',
    :@scale-ranges,
    color-scheme-interpolation-range => (0.11, 0.95),
    margins => {top => 60, left => 20, right => 20, bottom => 60},
    height => 420,
    gauge-labels => {Doomsday => [0.5, 0.35], 'clock' => [0.5 ,0.28]}, 
    gauge-labels-color => 'DarkSlateGray',
    gauge-labels-font-family => 'Krungthep',
    ;

%opts.elems

# 16
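The @scale-ranges construction above pairs consecutive radii, each attached to the full [0, 60] minute scale, producing the concentric color bands of the gauge. Here is a Python sketch of the same idea:

```python
def scale_ranges(stop, step=0.01):
    """Consecutive radius pairs, each with the full [0, 60] minute scale,
    giving one thin concentric band per radius step."""
    n = int(round(stop / step))
    radii = [round(i * step, 2) for i in range(n + 1)]
    return [([0, 60], [r0, r1]) for r0, r1 in zip(radii, radii[1:])]

bands = scale_ranges(0.66)
print(len(bands))  # 66
```

With many thin bands, the color-scheme interpolation over the band index yields a smooth radial gradient.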

Here are different “doomsday clock” examples:

#% js
[
   {color-scheme => 'Plasma', fill-color => 'MistyRose', gauge-labels-color => 'Orchid'},
   {color-scheme => 'Spectral', fill-color => '#4e65ac', stroke-color => 'DarkRed', stroke-width => 10, gauge-labels => %()},
   {color-scheme => 'Cividis', fill-color => 'DarkSlateGray', gauge-labels => {Doomsday => [0.5, 0.6], 'clock' => [0.5 ,0.36]}, scale-ranges => @scale-ranges2},
].map({ js-d3-clock-gauge(:23hour, :58minute, :30second, |merge-hash(%opts.clone, $_, :!deep)) }).join("\n")


More robust parsing

More robust parsing of Doomsday Clock statements can be obtained in these three ways:

  • “Fuzzy” match of words
    • For misspellings like “doomsdat” instead of “doomsday.”
  • Parsing of numeric word forms.
    • For statements, like, “two minutes and twenty five seconds.”
  • Delegating the parsing to LLMs when grammar parsing fails.

Fuzzy matching

The parser satisfy can be used to handle misspellings (via, say, edit-distance from “Math::DistanceFunctions”):

#% html
my &pDD = satisfy({ edit-distance($_, "doomsday") ≤ 2 }) «o {"doomsday"};
my @phrases = "doomsdat", "doomsday", "dumzday";

parsing-test-table(&pDD, @phrases)

statement | parser output
doomsdat | (((), “doomsday”),).Seq
doomsday | (((), “doomsday”),).Seq
dumzday | ().Seq

But since “FunctionalParsers” provides the generic parser fuzzy-symbol (which takes a word and a distance as arguments), we use that parser below.

#% html
my &pDD2 = fuzzy-symbol("doomsday", 2);
my @phrases = "doomsdat", "doomsday", "dumzday";

parsing-test-table(&pDD2, @phrases)

statement | parser output
doomsdat | (((), “doomsday”),)
doomsday | (((), “doomsday”),)
dumzday | ()
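Both satisfy-with-edit-distance and fuzzy-symbol rest on the Levenshtein edit distance. Here is a minimal Python sketch of the idea: a dynamic-programming edit distance plus a predicate factory playing the role of fuzzy-symbol:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # Deletion, insertion, or substitution (free if characters match).
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fuzzy_symbol(word, d):
    """Return a predicate accepting tokens within edit distance d of `word`."""
    return lambda tok: edit_distance(tok, word) <= d

is_doomsday = fuzzy_symbol("doomsday", 2)
print([w for w in ["doomsdat", "doomsday", "dumzday"] if is_doomsday(w)])
# ['doomsdat', 'doomsday']
```

“doomsdat” is one substitution away and passes; “dumzday” needs three edits and is rejected, matching the tables above.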

In order to include the misspelling handling in the grammar, we rewrite the parsers manually. (The grammar is small, so it is not that hard to do.)

my &pINTEGER = satisfy({ $_ ~~ /\d+/ }) «o {.Int};
my &pMINUTES = &pINTEGER «& (fuzzy-symbol('minute', 2) «|» fuzzy-symbol('minutes', 2)) «o { [minute => $_,] };
my &pSECONDS = &pINTEGER «& (fuzzy-symbol('second', 2) «|» fuzzy-symbol('seconds', 2)) «o { [second => $_,] };
my &pANY = satisfy({ edit-distance($_, 'clock') > 2 && $_ ~~ /\w+/ });
my &pOPENING = option(many(&pANY)) «&» fuzzy-symbol('clock', 1) «&» option(symbol('is')) «&» fuzzy-symbol('reading', 2);
my &pCLOCK-READING = &pOPENING «&» (&pMINUTES «|» option(&pMINUTES «&» option(symbol('and') «|» symbol(','))) «&» &pSECONDS) «&» symbol('to') «&» fuzzy-symbol('midnight', 2);

&pCLOCK-READING = &pCLOCK-READING «o { $_.flat.grep(* ~~ Pair).Hash };

&pTOP = &pCLOCK-READING «o { 
    Date.today.DateTime.earlier(seconds => ($_<minute> // 0) * 60 + ($_<second>// 0) ) 
};

Here is a verification table with correct- and incorrect spellings:

#% html
my @phrases =
    "doomsday clock is reading 2 seconds to midnight", 
    "dooms day cloc is readding 2 minute and 22 sekonds to mildnight";

parsing-test-table(shortest(&pCLOCK-READING), @phrases)

statement | parser output
doomsday clock is reading 2 seconds to midnight | (((), {:second(2)}),)
dooms day cloc is readding 2 minute and 22 sekonds to mildnight | (((), {:minute(2), :second(22)}),)

Parsing of numeric word forms

One way to make the parsing more robust is to implement the ability to parse integer names (i.e. numeric word forms), not just digit sequences.

Remark: For a fuller discussion — and code — of numeric word forms parsing see the tech note “Integer names parsing” of the paclet “FunctionalParsers”, [AAp1].

First, we make an association that connects integer names with corresponding integer values:

my %worded-values = (^100).map({ to-numeric-word-form($_) => $_ });
%worded-values.elems

# 100

Remark: The function to-numeric-word-form is provided by “Lingua::NumericWordForms”, [AAp3].

Here is how the rules look:

%worded-values.pick(6)

# (ninety four => 94 forty three => 43 ninety eight => 98 seventy three => 73 ninety two => 92 eleven => 11)

Here we program the integer names parser:

my &pUpTo10 = alternatives( |(^10)».&to-numeric-word-form.map({ symbol($_.trim) }) );
my &p10s = alternatives( |(10, 20 ... 90)».&to-numeric-word-form.map({ symbol($_.trim) }) );
my &pWordedInteger = (&p10s «&» &pUpTo10 «|» &p10s «|» &pUpTo10) «o { %worded-values{$_.flat.join(' ')} };

Here is a verification table of that parser:

#% html
my @phrases = "three", "fifty seven", "thirti one";
parsing-test-table(&pWordedInteger, @phrases)

statement | parser output
three | (((), 3),).Seq
fifty seven | (((), 57), ((“seven”,), 50)).Seq
thirti one | ().Seq

There are two parsing results for “fifty seven”, because &pWordedInteger is defined with the alternatives:

&p10s «&» &pUpTo10 «|» &p10s «|» &pUpTo10

Both the first alternative (consuming the whole phrase) and the bare &p10s (consuming just “fifty”) succeed. This can be remedied by using just or shortest:

#% html
parsing-test-table( just(&pWordedInteger), @phrases)

statement | parser output
three | (((), 3),).Seq
fifty seven | (((), 57),).Seq
thirti one | ().Seq
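For illustration, here is a Python sketch of the same worded-integer parsing idea — try tens-plus-units first, then bare tens, teens, and units (the word tables, an assumption here, cover 0-99):

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine".split())}
TEENS = {w: 10 + i for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

def worded_integer(tokens):
    """Parse a numeric word form below 100; return (value, leftover) or None."""
    if tokens and tokens[0] in TENS:
        # Prefer the longest match: 'fifty seven' over just 'fifty'.
        if len(tokens) > 1 and tokens[1] in UNITS:
            return TENS[tokens[0]] + UNITS[tokens[1]], tokens[2:]
        return TENS[tokens[0]], tokens[1:]
    if tokens and tokens[0] in TEENS:
        return TEENS[tokens[0]], tokens[1:]
    if tokens and tokens[0] in UNITS:
        return UNITS[tokens[0]], tokens[1:]
    return None

print(worded_integer("fifty seven".split()))  # (57, [])
print(worded_integer("thirti one".split()))   # None
```

Trying the compound form first gives the “shortest leftover” behavior that just/shortest provide in the combinator setting.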

Let us change &pINTEGER to parse both integers and integer names:

#% html
&pINTEGER = satisfy({ $_ ~~ /\d+/ }) «o {.Int} «|» &pWordedInteger;

my @phrases = "12", "3", "three", "forty five";
parsing-test-table( just(&pINTEGER), @phrases)

statement | parser output
12 | ($((), 12),).Seq
3 | ($((), 3),).Seq
three | ($((), 3),).Seq
forty five | ($((), 45),).Seq

Remark: &pINTEGER has to be evaluated before the definitions of the rest of the parsers programmed in the previous subsection.

Let us try the new parser using integer names for the clock time:

my $str = "the doomsday clock is reading two minutes and forty five seconds to midnight";

$str.words
==> take-first(&pCLOCK-READING)()

# ((() {minute => 2, second => 45}))

Enhance with LLM parsing

There are multiple ways to employ LLMs for extracting “clock readings” from arbitrary Doomsday Clock statements, readouts, and measures. Here we use LLM few-shot training:

my &flop = llm-example-function([
    "the doomsday clock is reading two minutes and forty five seconds to midnight" => '{"minute":2, "second": 45}', 
    "the clock of the doomsday gives 92 seconds to midnight" => '{"minute":0, "second": 92}', 
    "The bulletin atomic scientist maybe is set to a minute an 3 seconds." => '{"minute":1, "second": 3}'
   ], 
   e => $conf4o,
   form => sub-parser('JSON')
)

Here is an example invocation:

&flop("Maybe the doomsday watch is at 23:58:03")

# {minute => 1, second => 57}

The following function combines the grammar-based parsing with the LLM example function — the latter is used as a fallback when the grammar fails:

my sub get-clock-reading(Str:D $st) {
    my $op = just(&pCLOCK-READING)($st.words); 
    my %h = $op.elems > 0 && $op.head.head.elems == 0 ?? $op.head.tail !! &flop( $st );
    return Date.today.DateTime.earlier(seconds => (%h<minute> // 0) * 60 + (%h<second> // 0) ) 
}

# &get-clock-reading
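The grammar-first, LLM-fallback pattern of get-clock-reading can be distilled into a few lines of Python; the grammar and LLM parsers below are toy stand-ins (a real setup would plug in an actual parser and an LLM call):

```python
def robust_reading(statement, grammar_parse, llm_parse):
    """Grammar first, LLM fallback: only call the (slower, costlier) LLM
    extractor when the grammar parser fails."""
    result = grammar_parse(statement)
    return result if result is not None else llm_parse(statement)

# Toy stand-ins: a "grammar" handling one fixed shape, and a hypothetical
# LLM extractor represented by a plain function.
grammar = lambda s: {"second": 90} if "90 seconds to midnight" in s else None
llm = lambda s: {"minute": 1, "second": 30}

print(robust_reading("clock is reading 90 seconds to midnight", grammar, llm))
print(robust_reading("that dooms-day watch is 1 and a half minute before the boom", grammar, llm))
```

The design point: deterministic parsing stays cheap and reproducible for well-formed input, while the LLM absorbs the long tail of free-form statements.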

Robust parser demo

Here is the application of the combined function above over a certain “random” Doomsday Clock statement:

my $s = "You know, sort of, that dooms-day watch is 1 and half minute be... before the big boom. (Of doom...)";

$s.&get-clock-reading

# 2024-12-31T23:58:30Z

Remark: The same type of robust grammar-and-LLM combination is explained in more detail in the video “Robust LLM pipelines (Mathematica, Python, Raku)”, [AAv1]. (See, also, the corresponding notebook [AAn1].)


Timeline

In this section we extract Doomsday Clock timeline data and make a corresponding plot.

Parsing page data

Instead of using the official Doomsday Clock timeline page, we use Wikipedia.
We can extract the Doomsday Clock timeline using LLMs. Here we get the plaintext of the Wikipedia page and show statistics:

my $url = "https://en.wikipedia.org/wiki/Doomsday_Clock";
my $txtWk = data-import($url, "plaintext");

text-stats($txtWk)

# (chars => 42728 words => 6231 lines => 853)

Here we get the Doomsday Clock timeline table from that page in JSON format using an LLM (or ingest a previous extraction saved as a CSV file):

my $res; 
if False {
  $res = llm-synthesize([
    "Give the time table of the doomsday clock as a time series that is a JSON array.", 
    "Each element of the array is a dictionary with keys 'Year', 'MinutesToMidnight', 'Time', 'Summary', 'Description'.",
    "Do not shorten or summarize the descriptions -- use their full texts.",
    "The column 'Summary' should have summaries of the descriptions, each summary no more than 10 words.",
    $txtWk, 
    llm-prompt("NothingElse")("JSON")
   ], 
   e => $conf4o,
   form => sub-parser('JSON'):drop
  );
} else {
  my @field-names = <Year MinutesToMidnight Time Summary Description>;
  my $url = 'https://raw.githubusercontent.com/antononcube/RakuForPrediction-blog/refs/heads/main/Data/doomsday-clock-timeline-table.csv';
  $res = data-import($url, headers => 'auto');
  $res = $res.map({ my %h = $_.clone; %h<Year> = %h<Year>.Int; %h<MinutesToMidnight> = %h<MinutesToMidnight>.Num; %h }).Array
}

deduce-type($res)

# Vector(Struct([Description, MinutesToMidnight, Summary, Time, Year], [Str, Num, Str, Str, Int]), 26)

Here the LLM result is tabulated:

#% html
my @field-names = <Year MinutesToMidnight Time Summary Description>;
$res ==> to-html(:@field-names, align => 'left')

Remark: The LLM derived summaries in the table above are based on the descriptions in the column “Reason” in the Wikipedia data table.
The tooltips of the plot below use the summaries.

Timeline plot

In order to have an informative Doomsday Clock evolution plot, we partition the dataset’s time series into step-function pairs:

my @dsDoomsdayTimes = |$res;
my @ts0 = @dsDoomsdayTimes.map({ <Year MinutesToMidnight role:tooltip> Z=> $_<Year MinutesToMidnight Summary> })».Hash;

my @ts1 = @dsDoomsdayTimes.rotor(2=>-1).map({[ 
    %( <Year MinutesToMidnight mark role:tooltip> Z=> $_.head<Year MinutesToMidnight MinutesToMidnight Summary>),
    %( <Year MinutesToMidnight mark role:tooltip> Z=> [$_.tail<Year>, $_.head<MinutesToMidnight>, NaN, '']) 
]}).map(*.Slip);

@ts1 = @ts1.push( merge-hash(@ts0.tail, {mark => @ts0.tail<MinutesToMidnight>}) );

deduce-type(@ts1):tally

# Vector(Struct([MinutesToMidnight, Year, mark, role:tooltip], [Num, Int, Num, Str]), 51)
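The rotor-based step-function construction can be sketched in Python: each consecutive pair of points yields the real point (with a marker) plus a carrier point that holds the previous value until the next year, its marker set to NaN so no dot is drawn:

```python
def step_series(points):
    """Turn [(x, y), ...] into a piecewise-constant series of
    (x, y, mark) rows; NaN marks suppress the scatter dots on carriers."""
    out = []
    for (x0, y0), (x1, _y1) in zip(points, points[1:]):
        out.append((x0, y0, y0))            # real point: marker drawn at y0
        out.append((x1, y0, float("nan")))  # carry y0 to x1: no marker
    x_last, y_last = points[-1]
    out.append((x_last, y_last, y_last))    # close the final step
    return out

pts = [(1947, 7), (1949, 3), (1953, 2)]   # illustrative (year, minutes) data
rows = step_series(pts)
print(len(rows))  # 5
```

For n input points this yields 2(n-1) + 1 rows — matching the 26 records becoming 51 rows above.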

Here we add callout annotations indicating the year and the minutes before midnight:

my @ts2 = @ts1.map({ 
    my %h = $_.clone;
    my $s = ($_<MinutesToMidnight> * 60) mod 60;
    $s = $s > 0 ?? " {$s}s" !! '';
    if %h<mark> === NaN { 
        %h<role:annotation> = '';
    } else { 
        %h<role:annotation> = "{%h<Year>}: {floor($_<MinutesToMidnight>)}m" ~ $s; 
    }
    %h
});

deduce-type(@ts2):tally

# Vector(Struct([MinutesToMidnight, Year, mark, role:annotation, role:tooltip], [Num, Int, Num, Str, Str]), 51)
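The callout formatting — whole minutes plus a seconds suffix for fractional readings — can be sketched in Python:

```python
import math

def callout(year, minutes_to_midnight):
    """Format a callout like '1947: 7m' or '2024: 1m 30s'; fractional
    minutes become a seconds suffix."""
    secs = round(minutes_to_midnight * 60) % 60
    label = f"{year}: {math.floor(minutes_to_midnight)}m"
    return label + (f" {secs}s" if secs else "")

print(callout(1947, 7))    # 1947: 7m
print(callout(2024, 1.5))  # 2024: 1m 30s
```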

Finally, here is the plot:

#% html
js-google-charts('ComboChart',
    @ts2,
    column-names => <Year MinutesToMidnight mark role:annotation role:tooltip>,
    width => 1200,
    height => 500,
    title => "Doomsday clock: minutes to midnight, {@dsDoomsdayTimes.map(*<Year>).Array.&{ (.min, .max).join('-') }}",
    series => {
        0 => {type => 'line', lineWidth => 4, color => 'DarkOrange'},
        1 => {type => 'scatter', pointSize => 10, opacity => 0.1, color => 'Blue'},
    },
    hAxis => { title => 'Year',  format => '####', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'},     
                viewWindow => { min => 1945, max => 2026}
            },
    vAxes => { 
        0 => { title => 'Minutes to Midnight', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'} }, 
        1 => { titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'}, ticks => (^18).map({ [ v => $_, f => ($_ ?? "23:{60-$_}" !! '00:00' ) ] }).Array }
    },
    :$annotations,
    :$titleTextStyle,
    :$backgroundColor,
    :$legend,
    :$chartArea,
    :$format,
    div-id => 'DoomsdayClock',
    :!png-button
)

Remark: The plot should be piecewise constant — simple linear interpolation between the blue points would suggest gradual change of the clock times.

Remark: By hovering with the mouse over the blue points the corresponding descriptions can be seen. We considered using clock-gauges as tooltips, but showing clock-settings reasons is more informative.

Remark: The plot was intentionally made to resemble the timeline plot in Doomsday Clock’s Wikipedia page.

Remark: The plot has deficiencies:

  • Tooltips with arbitrary width
    • This can be remedied with the (complicated) HTML tooltip procedure described in [AA1].
      • But I decided to just make the LLM data extraction to produce short summaries of the descriptions.
  • No right vertical axis ticks
    • The Doomsday Clock timeline plot in Wikipedia and its reproduction in [AAn1] have an “HH:MM” time axis.

I gave up smoothing out those deficiencies after attempting to fix or address each of them a few times. (It is not that important to figure out Google Charts interface settings for that kind of plots.)

Conclusion

As expected, parsing, plotting, or otherwise processing the Doomsday Clock settings and statements are excellent didactic subjects for textual analysis (or parsing) and temporal data visualization. The visualization could serve educational purposes or provide insights into historical trends of global threats as perceived by experts. (Remember, the clock’s settings are not just data points but reflections of complex global dynamics.)

One possible application of the code in this notebook is to make a “web service” that returns clock images with Doomsday Clock readings.

References

Articles, notebooks

[AA1] Anton Antonov, “Geographic Data in Raku Demo”, (2024), RakuForPrediction at WordPress.

[AAn1] Anton Antonov, “Doomsday clock parsing and plotting”, (2024), Wolfram Community.

[AAn2] Anton Antonov, “Making robust LLM computational pipelines from software engineering perspective”, (2024), Wolfram Community.

Paclets

[AAp1] Anton Antonov, FunctionalParsers Raku package, (2023-2024), GitHub/antononcube.

[AAp2] Anton Antonov, “FunctionalParsers”, (2023), Wolfram Language Paclet Repository.

[AAp3] Anton Antonov, Lingua::NumericWordForms Raku package, (2021-2024), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “Robust LLM pipelines (Mathematica, Python, Raku)”, (2024), YouTube/@AAA4prediction.

Chatbook New Magic Cells

Introduction

In this blog post (notebook) we showcase the “magic” cells recently added (May 2024) to the notebooks of “Jupyter::Chatbook”, [AA1, AAp5, AAv1].

“Jupyter::Chatbook” provides “LLM-ready” notebooks and is built on “Jupyter::Kernel”, [BDp1], created by Brian Duggan. A general principle of “Jupyter::Chatbook” is that the Raku packages used to implement the interactive service-access cells are also pre-loaded into the notebooks’ Raku contexts. (I.e. at the beginning of the notebooks’ Raku sessions.)

Here is a mind-map that shows the Raku packages that are “pre-loaded” and the available interactive cells:

#% mermaid, format=svg, background=SlateGray
mindmap
(**Chatbook**)
(Direct **LLM** access)
OpenAI
ChatGPT
DALL-E
Google
PaLM
Gemini
MistralAI
LLaMA
(Direct **DeepL** access)
Plain text result
JSON result
(**Notebook-wide chats**)
Chat objects
Named
Anonymous
Chat meta cells
Prompt DSL expansion
(Direct **MermaidInk** access)
SVG result
PNG result
(Direct **Wolfram|Alpha** access)
wa1["Plain text result"]
wa2["Image result"]
wa3["Pods result"]
(**Pre-loaded packages**)
LLM::Functions
LLM::Prompts
Text::SubParsers
Data::Translators
Data::TypeSystem
Clipboard
Text::Plot
Image::Markup::Utilities
WWW::LLaMA
WWW::MermaidInk
WWW::OpenAI
WWW::PaLM
WWW::Gemini
WWW::WolframAlpha
Lingua::Translation::DeepL

Remark: A recent improvement is that Mermaid-JS cells take arguments for the output format and background. Since the beginning of March 2024 the default output format is SVG, with which diagrams are obtained 2-3 times faster. Before that, “PNG” was the default format (and the only one available).

The structure of the rest of the notebook:

  • DeepL
    Translation from multiple languages into multiple other languages
  • Google’s Gemini
    Replaces both PaLM and Bard
  • Wolfram|Alpha
    Computational search engine

DeepL

In this section we show magic cells for direct access to the translation service DeepL. The API key can be set as a magic cell argument; without such a setting, the env variable DEEPL_AUTH_KEY is used. See “Lingua::Translation::DeepL”, [AAp1], for more details.

#% deepl, to-lang=German, formality=less, format=text
I told you to get the frames from the other warehouse!
# Ich habe dir gesagt, du sollst die Rahmen aus dem anderen Lager holen!

#% deepl, to-lang=Russian, formality=more, format=text
I told you to get the frames from the other warehouse!
# Я же просил Вас взять рамки с другого склада!

DeepL’s source languages:

#% html
deepl-source-languages().pairs>>.Str.sort.List

==> to-html(:multicolumn, columns => 4)
bulgarian BG | finnish FI | japanese JA | slovak SK
chinese ZH | french FR | latvian LV | slovenian SL
czech CS | german DE | lithuanian LT | spanish ES
danish DA | greek EL | polish PL | swedish SV
dutch NL | hungarian HU | portuguese PT | turkish TR
english EN | indonesian ID | romanian RO | ukrainian UK
estonian ET | italian IT | russian RU | (Any)

DeepL’s target languages:

#% html
deepl-target-languages().pairs>>.Str.sort.List

==> to-html(:multicolumn, columns => 4)
bulgarian BG | estonian ET | japanese JA | russian RU
chinese simplified ZH | finnish FI | latvian LV | slovak SK
czech CS | french FR | lithuanian LT | slovenian SL
danish DA | german DE | polish PL | spanish ES
dutch NL | greek EL | portuguese PT | swedish SV
english EN | hungarian HU | portuguese brazilian PT-BR | turkish TR
english american EN-US | indonesian ID | portuguese non-brazilian PT-PT | ukrainian UK
english british EN-GB | italian IT | romanian RO | (Any)

Google’s Gemini

In this section we show magic cells for direct access to the LLM service Gemini by Google. The API key can be set as a magic cell argument; without such a setting the env variable GEMINI_API_KEY is used. See “WWW::Gemini”, [AAp2], for more details.

Using the default model

#% gemini
Which LLM you are and what is your model?
I am Gemini, a multi-modal AI language model developed by Google.
#% gemini
Up to which date you have been trained?
I have been trained on a massive dataset of text and code up until April 2023. However, I do not have real-time access to the internet, so I cannot access information beyond that date. If you have any questions about events or information after April 2023, I recommend checking a reliable, up-to-date source.

Using a specific model

In this subsection we repeat the questions above and redirect the output to be formatted as Markdown.

#% gemini > markdown, model=gemini-1.5-pro-latest
Which LLM are you? What is the name of the model you use?
I'm currently running on the Gemini Pro model.

I can't share private information that could identify me specifically, but I can tell you that I am a large language model created by Google AI.
#% gemini > markdown, model=gemini-1.5-pro-latest
Up to which date you have been trained?
I can access pretty up-to-date information, which means I don't really have a "knowledge cut-off" date like some older models.

However, it’s important to remember:

  • I am not constantly updating. My knowledge is based on a snapshot of the internet taken at a certain point in time.
  • I don’t have access to real-time information. I can’t tell you what happened this morning, or what the stock market is doing right now.
  • The world is constantly changing. Even if I had information up to a very recent date, things would still be outdated quickly!

If you need very specific and current information, it’s always best to consult reliable and up-to-date sources.


Wolfram|Alpha

In this section we show magic cells for direct access to Wolfram|Alpha (W|A) by Wolfram Research, Inc. The API key can be set as a magic cell argument; without such a setting the env variable WOLFRAM_ALPHA_API_KEY is used. See “WWW::WolframAlpha”, [AAp3], for more details.

W|A provides different API endpoints. Currently, “WWW::WolframAlpha” gives access to three of them: simple, result, and query. In a W|A magic cell the endpoint can be specified with the argument “type” or its synonym “path”.

Simple (image output)

When using the W|A’s API /simple endpoint we get images as results.

#% wolfram-alpha
Calories in 5 servings of potato salad.

Here is how the image above can be generated and saved in a regular code cell:

my $imgWA = wolfram-alpha('Calories in 5 servings of potato salad.', path => 'simple', format => 'md-image');
image-export('WA-calories.png', $imgWA)
WA-calories.png

Result (plaintext output)

#% w|a, type=result
Biggest province in China
The biggest administrative division in China by area is Xinjiang, China. The area of Xinjiang, China is about 629869 square miles.

Pods (Markdown output)

#% wa, path=query
GDP of China vs USA in 2023

Input interpretation

scanner: Data

China United States | GDP | nominal 2023

Results

scanner: Data

China | $17.96 trillion per year
United States | $25.46 trillion per year
(2022 estimates)

Relative values

scanner: Data

| visual | ratios | | comparisons
United States | | 1.417 | 1 | 41.75% larger
China | | 1 | 0.7055 | 29.45% smaller

GDP history

scanner: Data

Economic properties

scanner: Data

| China | United States
GDP at exchange rate | $17.96 trillion per year (world rank: 2nd) | $25.46 trillion per year (world rank: 1st)
GDP at parity | $30.33 trillion per year (world rank: 1st) | $25.46 trillion per year (world rank: 2nd)
real GDP | $16.33 trillion per year (price-adjusted to year-2000 US dollars) (world rank: 2nd) | $20.95 trillion per year (price-adjusted to year-2000 US dollars) (world rank: 1st)
GDP in local currency | ¥121 trillion per year | $25.46 trillion per year
GDP per capita | $12720 per year per person (world rank: 93rd) | $76399 per year per person (world rank: 12th)
GDP real growth | +2.991% per year (world rank: 131st) | +2.062% per year (world rank: 158th)
consumer price inflation | +1.97% per year (world rank: 175th) | +8% per year (world rank: 91st)
unemployment rate | 4.89% (world rank: 123rd highest) | 3.61% (world rank: 157th highest)
(2022 estimate)

GDP components

scanner: Data

| China | United States
final consumption expenditure | $9.609 trillion per year (53.49%) (world rank: 2nd) (2021) | $17.54 trillion per year (68.88%) (world rank: 1st) (2019)
gross capital formation | $7.688 trillion per year (42.8%) (world rank: 1st) (2021) | $4.504 trillion per year (17.69%) (world rank: 2nd) (2019)
external balance on goods and services | $576.7 billion per year (3.21%) (world rank: 1st) (2022) | -$610.5 billion per year (-2.4%) (world rank: 206th) (2019)
GDP | $17.96 trillion per year (100%) (world rank: 2nd) (2022) | $25.46 trillion per year (100%) (world rank: 1st) (2022)

Value added by sector

scanner: Data

| China | United States
agriculture | $1.311 trillion per year (world rank: 1st) (2022) | $223.7 billion per year (world rank: 3rd) (2021)
industry | $7.172 trillion per year (world rank: 1st) (2022) | $4.17 trillion per year (world rank: 2nd) (2021)
manufacturing | $4.976 trillion per year (world rank: 1st) (2022) | $2.497 trillion per year (world rank: 2nd) (2021)
services, etc. | $5.783 trillion per year (world rank: 2nd) (2016) | $13.78 trillion per year (world rank: 1st) (2015)

Download and export pods images

W|A’s query-pods contain URLs to images (which expire within a day.) We might want to download and save those images. Here is a way to do it:

# Pods as JSON text -- easier to extract links from
my $pods = wolfram-alpha-query('GDP of China vs USA in 2023', format => 'json');

# Extract URLs
my @urls = do with $pods.match(/ '"src":' \h* '"' (<-["]>+) '"'/, :g) {
$/.map({ $_[0].Str })
};

# Download images as Markdown images (that can be shown in Jupyter notebooks or Markdown files)
my @imgs = @urls.map({ image-import($_, format => 'md-image') });

# Export images
for ^@imgs.elems -> $i { image-export("wa-$i.png", @imgs[$i] ) }
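The “src”-extraction regex above can be sanity-checked on a small inlined sample (the JSON fragment and URLs below are made up for illustration):

```raku
# A tiny JSON fragment mimicking W|A pod data (URLs are made up)
my $sample = '{"subpods":[{"img":{"src":"https://example.com/a.png"}},{"img":{"src":"https://example.com/b.png"}}]}';

# Same extraction as above: collect every "src" value
my @urls = do with $sample.match(/ '"src":' \h* '"' (<-["]>+) '"'/, :g) {
    $/.map({ $_[0].Str })
};

say @urls.elems;  # 2
```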

References

Articles

[AA1] Anton Antonov, “Jupyter::Chatbook”, (2023), RakuForPrediction at WordPress.

Packages

[AAp1] Anton Antonov, Lingua::Translation::DeepL Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::Gemini Raku package, (2024), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::WolframAlpha Raku package, (2024), GitHub/antononcube.

[AAp4] Anton Antonov, WWW::OpenAI Raku package, (2024), GitHub/antononcube.

[AAp5] Anton Antonov, Jupyter::Chatbook Raku package, (2024), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017), GitHub/bduggan.

Videos

[AAv1] Anton Antonov, “Integrating Large Language Models with Raku”, (2023), YouTube/@therakuconference6823.

WWW::WolframAlpha

Introduction

This blog post announces the Raku package “WWW::WolframAlpha”, which provides access to the answer engine Wolfram|Alpha, [WA1, Wk1]. For more details on Wolfram|Alpha’s API usage see the documentation, [WA2].

Remark: To use the Wolfram|Alpha API one has to register and obtain an authorization key.


Installation

Package installation from both sources uses the zef installer (which should be bundled with the “standard” Rakudo installation file.)

To install the package from Zef ecosystem use the shell command:

zef install WWW::WolframAlpha

To install the package from the GitHub repository use the shell command:

zef install https://github.com/antononcube/Raku-WWW-WolframAlpha.git

Usage examples

Remark: When the authorization key, auth-key, is specified to be Whatever, the wolfram-alpha* functions attempt to use the env variable WOLFRAM_ALPHA_API_KEY.
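That fallback can be sketched as follows; this is an illustrative sketch only, and the helper name resolve-auth-key is made up (it is not part of the package’s API):

```raku
# Hypothetical helper illustrating the auth-key fallback logic
sub resolve-auth-key($auth-key = Whatever) {
    # An explicit string key wins; otherwise fall back to the env variable
    return $auth-key if $auth-key ~~ Str:D;
    %*ENV<WOLFRAM_ALPHA_API_KEY> // Nil;
}

say resolve-auth-key('my-secret-key');  # my-secret-key
```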

The package has a universal “front-end” function wolfram-alpha for the different endpoints provided by the Wolfram|Alpha Web API.

(Plaintext) results

Here is a result call:

use WWW::WolframAlpha;
wolfram-alpha-result('How many calories in 4 servings of potato salad?');

# about 720 dietary Calories

Simple (image) results

Here is a simple call (produces an image):

wolfram-alpha-simple('What is popularity of the name Larry?', format => 'md-image');

Remark: Pretty good conjectures of Larry Wall’s birth year or age can be made using the obtained graphs.

Full queries

For the so-called full queries Wolfram|Alpha returns complicated pod data in either XML or JSON format; see “Explanation of Pods”.

Here we get the result of a full query and show its (complicated) data type (using “Data::TypeSystem”):

use Data::TypeSystem;

my $podRes = wolfram-alpha-query('convert 44 lbs to kilograms', output => 'json', format => 'hash');

deduce-type($podRes)
# Assoc(Atom((Str)), Assoc(Vector(Atom((Str)), 18), Tuple([Atom((Int)) => 4, Atom((Rat)) => 2, Atom((Str)) => 10, Struct([count, template, type, values, word], [Int, Str, Str, Array, Str]) => 1, Tuple([Struct([error, expressiontypes, id, numsubpods, position, scanner, subpods, title], [Bool, Hash, Str, Int, Int, Str, Array, Str]), Struct([error, expressiontypes, id, numsubpods, position, primary, scanner, subpods, title], [Bool, Hash, Str, Int, Int, Bool, Str, Array, Str]), Struct([error, expressiontypes, id, numsubpods, position, scanner, states, subpods, title], [Bool, Array, Str, Int, Int, Str, Array, Array, Str]), Struct([error, expressiontypes, id, numsubpods, position, scanner, subpods, title], [Bool, Hash, Str, Int, Int, Str, Array, Str]), Struct([error, expressiontypes, id, numsubpods, position, scanner, states, subpods, title], [Bool, Hash, Str, Int, Int, Str, Array, Array, Str]), Struct([error, expressiontypes, id, numsubpods, position, scanner, subpods, title], [Bool, Array, Str, Int, Int, Str, Array, Str])]) => 1], 18), 18), 1)

Here we convert the query result into Markdown (data-translation can also be used):

wolfram-alpha-pods-to-markdown($podRes, header-level => 4):plaintext;

Input interpretation

scanner: Identity

convert 44 lb (pounds) to kilograms

Result

scanner: Identity

19.96 kg (kilograms)

Additional conversions

scanner: Unit

3 stone 2 pounds

19958 grams

Comparison as mass

scanner: Unit

≈ 1.6 × mass of a Good Delivery gold bar ( 400 oz t )

Interpretations

scanner: Unit

mass

Corresponding quantities

scanner: Unit

Relativistic energy E from E = mc^2: | 1.794×10^18 J (joules) | 1.12×10^37 eV (electronvolts)

Weight w of a body from w = mg: | 44 lbf (pounds-force) | 1.4 slugf (slugs-force) | 196 N (newtons) | 1.957×10^7 dynes | 19958 ponds

Volume V of water from V = m/ρ_(H_2O): | 5.3 gallons | 42 pints | 20 L (liters) | 19958 cm^3 (cubic centimeters) | (assuming conventional water density ≈ 1000 kg/m^3)


Command Line Interface

Playground access

The package provides a Command Line Interface (CLI) script:

wolfram-alpha --help

# Usage:
#   wolfram-alpha [<words> ...] [--path=<Str>] [--output-format=<Str>] [-a|--auth-key=<Str>] [--timeout[=UInt]] [-f|--format=<Str>] [--method=<Str>] -- Command given as a sequence of words.
#   
#     --path=<Str>             Path, one of 'result', 'simple', or 'query'. [default: 'result']
#     --output-format=<Str>    The format in which the response is returned. [default: 'Whatever']
#     -a|--auth-key=<Str>      Authorization key (to use WolframAlpha API.) [default: 'Whatever']
#     --timeout[=UInt]         Timeout. [default: 10]
#     -f|--format=<Str>        Format of the result; one of "json", "hash", "values", or "Whatever". [default: 'Whatever']
#     --method=<Str>           Method for the HTTP POST query; one of "tiny" or "curl". [default: 'tiny']

Remark: When the authorization key argument “auth-key” is set to “Whatever”, wolfram-alpha attempts to use the env variable WOLFRAM_ALPHA_API_KEY.


Mermaid diagram

The following flowchart corresponds to the steps in the package function wolfram-alpha-query:


References

[AAp1] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.

[WA1] Wolfram Alpha LLC, Wolfram|Alpha.

[WA2] Wolfram Alpha LLC, Web API documentation.

[Wk1] Wikipedia entry, WolframAlpha.

ML::NLPTemplateEngine

This blog post announces and describes the Raku package “ML::NLPTemplateEngine”, which aims to create (nearly) executable code for various computational workflows.

The package’s data and implementation make up a Natural Language Processing (NLP) Template Engine (TE), [Wk1], that incorporates Question Answering Systems (QAS’), [Wk2], and Machine Learning (ML) classifiers.

The current version of the NLP-TE of the package heavily relies on Large Language Models (LLMs) for its QAS component.

Future plans involve incorporating other types of QAS implementations.

The Raku package implementation closely follows the Wolfram Language (WL) implementations in “NLP Template Engine”, [AAr1, AAv1], and the WL paclet “NLPTemplateEngine”, [AAp2, AAv2].

An alternative, more comprehensive approach to building workflows code is given in [AAp2].

Problem formulation

We want to have a system (i.e. TE) that:

  1. Generates relevant, correct, executable programming code based on natural language specifications of computational workflows
  2. Can automatically recognize the workflow types
  3. Can generate code for different programming languages and related software packages

The points above are given in order of importance; the most important are placed first.

Reliability of results

One of the main reasons to re-implement the WL NLP-TE, [AAr1, AAp1], in Raku is to have a more robust way of utilizing LLMs to generate code. That goal is more or less achieved with this package, but YMMV: if incomplete or wrong results are obtained, run the NLP-TE with different LLM parameter settings or different LLMs.
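That advice can be automated with a simple retry loop. Below is a minimal, self-contained sketch: concretize is stubbed here so the snippet runs on its own; with “ML::NLPTemplateEngine” loaded, the stub should be removed:

```raku
# Stub standing in for ML::NLPTemplateEngine's concretize (illustration only)
sub concretize($spec, *%args) {
    %args<llm> eq 'gemini' ?? 'QRMonUnit[dfTemp]' !! ''
}

my $spec = 'Compute quantile regression over the dataset dfTemp.';
my $code = '';

# Try several LLM services until a non-empty result is obtained
for <chatgpt gemini> -> $llm {
    $code = concretize($spec, template => 'QuantileRegression', llm => $llm);
    last if $code.chars > 0;
}

say $code;  # QRMonUnit[dfTemp]
```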


Installation

From Zef ecosystem:

zef install ML::NLPTemplateEngine;

From GitHub:

zef install https://github.com/antononcube/Raku-ML-NLPTemplateEngine.git

Usage examples

Quantile Regression (WL)

Here the template is automatically determined:

use ML::NLPTemplateEngine;

my $qrCommand = q:to/END/;
Compute quantile regression with probabilities 0.4 and 0.6, with interpolation order 2, for the dataset dfTempBoston.
END

concretize($qrCommand);
# qrObj=
# QRMonUnit[dfTempBoston]⟹
# QRMonEchoDataSummary[]⟹
# QRMonQuantileRegression[12, {0.4, 0.6}, InterpolationOrder->2]⟹
# QRMonPlot["DateListPlot"->False,PlotTheme->"Detailed"]⟹
# QRMonErrorPlots["RelativeErrors"->False,"DateListPlot"->False,PlotTheme->"Detailed"];

Remark: In the code above the template type, “QuantileRegression”, was determined using an LLM-based classifier.

Latent Semantic Analysis (R)

my $lsaCommand = q:to/END/;
Extract 20 topics from the text corpus aAbstracts using the method NNMF. 
Show statistical thesaurus with the words neural, function, and notebook.
END

concretize($lsaCommand, template => 'LatentSemanticAnalysis', lang => 'R');
# lsaObj <-
# LSAMonUnit(aAbstracts) %>%
# LSAMonMakeDocumentTermMatrix(stemWordsQ = TRUE, stopWords = Automatic) %>%
# LSAMonEchoDocumentTermMatrixStatistics(logBase = 10) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "IDF", localWeightFunction = "None", normalizerFunction = "Cosine") %>%
# LSAMonExtractTopics(numberOfTopics = 20, method = "NNMF", maxSteps = 16, minNumberOfDocumentsPerTerm = 20) %>%
# LSAMonEchoTopicsTable(numberOfTerms = 20, wideFormQ = TRUE) %>%
# LSAMonEchoStatisticalThesaurus(words = c("neural", "function", "notebook"))

Random tabular data generation (Raku)

my $command = q:to/END/;
Make random table with 6 rows and 4 columns with the names <A1 B2 C3 D4>.
END

concretize($command, template => 'RandomTabularDataset', lang => 'Raku', llm => 'gemini');
# random-tabular-dataset(6, 4, "column-names-generator" => <A1 B2 C3 D4>, "form" => "Table", "max-number-of-values" => 24, "min-number-of-values" => 6, "row-names" => False)

Remark: In the code above it was specified to use Google’s Gemini LLM service.


How does it work?

The following flowchart describes the series of steps through which the NLP Template Engine processes a computation specification and executes code to obtain results:

Here’s a detailed narration of the process:

  1. Computation Specification:
    • The process begins with a “Computation spec”, which is the initial input defining the requirements or parameters for the computation task.
  2. Workflow Type Decision:
    • A decision step asks if the workflow type is specified.
  3. Guess Workflow Type:
If the workflow type is not specified, the system utilizes a classifier to guess the relevant workflow type.
  4. Raw Answers:
    • Regardless of how the workflow type is determined (directly specified or guessed), the system retrieves “raw answers”, crucial for further processing.
  5. Processing and Templating:
    • The raw answers undergo processing (“Process raw answers”) to organize or refine the data into a usable format.
    • Processed data is then utilized to “Complete computation template”, preparing for executable operations.
  6. Executable Code and Results:
    • The computation template is transformed into “Executable code”, which when run, produces the final “Computation results”.
  7. LLM-Based Functionalities:
    • The classifier and the answers finder are LLM-based.
  8. Data and Templates:
    • Code templates are selected based on the specifics of the initial spec and the processed data.

Bring your own templates

0. Load the NLP-Template-Engine package (and others):

use ML::NLPTemplateEngine;
use Data::Importers;
use Data::Summarizers;

1. Get the “training” templates data (from a CSV file you have created or changed) for a new workflow (“SendMail”):

my $url = 'https://raw.githubusercontent.com/antononcube/NLP-Template-Engine/main/TemplateData/dsQASParameters-SendMail.csv';
my @dsSendMail = data-import($url, headers => 'auto');

records-summary(@dsSendMail, field-names => <DataType WorkflowType Group Key Value>);
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+
# | DataType        | WorkflowType   | Group                       | Key                        | Value                                                                            |
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+
# | Questions => 48 | SendMail => 60 | All                   => 9  | ContextWordsToRemove => 12 | 0.35                                                                       => 9  |
# | Defaults  => 7  |                | Who the email is from => 4  | Threshold            => 12 | {_String..}                                                                => 8  |
# | Templates => 3  |                | What it the content   => 4  | TypePattern          => 12 | to                                                                         => 4  |
# | Shortcuts => 2  |                | What it the body      => 4  | Parameter            => 12 | _String                                                                    => 4  |
# |                 |                | What it the title     => 4  | Template             => 3  | {"to", "email", "mail", "send", "it", "recipient", "addressee", "address"} => 4  |
# |                 |                | What subject          => 4  | body                 => 1  | None                                                                       => 4  |
# |                 |                | Who to send it to     => 4  | Emailing             => 1  | body                                                                       => 3  |
# |                 |                | (Other)               => 27 | (Other)              => 7  | (Other)                                                                    => 24 |
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+

2. Add the ingested data for the new workflow (from the CSV file) into the NLP-Template-Engine:

add-template-data(@dsSendMail);
# (ParameterTypePatterns Shortcuts Questions Templates Defaults ParameterQuestions)

3. Parse natural language specification with the newly ingested and onboarded workflow (“SendMail”):

"Send email to [email protected] with content RandomReal[343], and the subject this is a random real call."
        ==> concretize(template => "SendMail") 
# SendMail[<|"To"->{"[email protected]"},"Subject"->"this is a random real call","Body"->{"RandomReal[343]"},"AttachedFiles"->None|>]

4. Experiment with running the generated code!


References

Articles

[Wk1] Wikipedia entry, Template processor.

[Wk2] Wikipedia entry, Question answering.

Functions, packages, repositories

[AAr1] Anton Antonov, “NLP Template Engine”, (2021-2022), GitHub/antononcube.

[AAp1] Anton Antonov, NLPTemplateEngine WL paclet, (2023), Wolfram Language Paclet Repository.

[AAp2] Anton Antonov, DSL::Translators Raku package, (2020-2024), GitHub/antononcube.

[WRI1] Wolfram Research, FindTextualAnswer, (2018), Wolfram Language function, (updated 2020).

Videos

[AAv1] Anton Antonov, “NLP Template Engine, Part 1”, (2021), YouTube/@AAA4Prediction.

[AAv2] Anton Antonov, “Natural Language Processing Template Engine” presentation given at WTC-2022, (2023), YouTube/@Wolfram.

Notebook transformations

Introduction

In this blog post we describe a series of different (computational) notebook transformations using different tools. We use a series of recent articles and notebooks for processing the English and Russian texts of a 2-hour long interview. The workflows given in the notebooks are in Raku and Wolfram Language (WL).

Remark: Wolfram Language (WL) and Mathematica are used as synonyms in this document.

Remark: Using notebooks with Large Language Model (LLM) workflows is convenient because the WL LLM functions are also implemented in Python and Raku, [AA1, AAp1, AAp2].

We can say that this blog post attempts to advertise the Raku package “Markdown::Grammar”, [AAp3], demonstrated in the videos [AAv5, AAv6].

TL;DR: Using Markdown as an intermediate format, we can easily transform between Jupyter and Mathematica notebooks.


Transformation trip

The transformation trip starts with the notebook of the article  “LLM aids for processing of the first Carlson-Putin interview”, [AA1]. 

  1. Make the Raku Jupyter notebook
  2. Convert the Jupyter notebook into Markdown
    • Using Jupyter’s built-in converter
  3. Publish the Markdown version to WordPress, [AA2]
  4. Convert the Markdown file into a Mathematica notebook
  5. Publish that to Wolfram Community
    • That notebook was deleted by moderators, because it does not feature Wolfram Language (WL)
  6. Make the corresponding Mathematica notebook using WL LLM functions
  7. Publish to Wolfram Community
  8. Make the Russian version with the Russian transcript
  9. Publish to Wolfram Community
    • That notebook was deleted by the moderators, because it is not in English
  10. Convert the Mathematica notebook to Markdown
    • Using Kuba Podkalicki’s M2MD, [KPp1]
  11. Publish to WordPress, [AA3]
  12. Convert the Markdown file to Jupyter
  13. Re-make the (Russian described) workflows using Raku, [AAn5]
  14. Re-make workflows using Python, [AAn6], [AAn7]

Here is the corresponding Mermaid-JS diagram (using the package “WWW::MermaidInk”, [AAp6]):

use WWW::MermaidInk;

my $diagram = q:to/END/;
graph TD
A[Make the Raku Jupyter notebook] --> B[Convert the Jupyter notebook into Markdown]
B --> C[Publish to WordPress]
C --> D[Convert the Markdown file into a Mathematica notebook]
D --> E[Publish that to Wolfram Community]
E --> F[Make the corresponding Mathematica notebook using WL functions]
F --> G[Publish to Wolfram Community]
G --> H[Make the Russian version with the Russian transcript]
H --> I[Publish to Wolfram Community]
I --> J[Convert the Mathematica notebook to Markdown]
J --> K[Publish to WordPress]
K --> L[Convert the Markdown file to Jupyter]
L --> M[Re-make the workflows using Raku]
M --> N[Re-make the workflows using Python]
N -.-> Nen([English])
N -.-> Nru([Russian])
C -.-> WordPress{{Word Press}}
K -.-> WordPress
E -.-> |Deleted:<br>features Raku| WolframCom{{Wolfram Community}}
G -.-> WolframCom
I -.-> |"Deleted:<br>not in English"|WolframCom
D -.-> MG[[Markdown::Grammar]]
B -.-> Ju{{Jupyter}}
L -.-> jupytext[[jupytext]]
J -.-> M2MD[[M2MD]]
E -.-> RakuMode[[RakuMode]]
END

say mermaid-ink($diagram, format => 'md-image');

Clarifications

Russian versions

The first Carlson-Putin interview that is processed in the notebooks was held both in English and Russian. I think doing just the English study is “half-baked.” Hence, I did the workflows with the Russian text and translated the related explanations into Russian.

Remark: The Russian versions are done in all three programming languages: Python, Raku, Wolfram Language. See [AAn4, AAn5, AAn7].

Using different programming languages

From my point of view, having a Raku-enabled Mathematica / WL notebook is a strong statement about WL. A fair amount of coding was required for the paclet “RakuMode”, [AAp4].

Implementing that functionality is predicated on WL having extensive external-evaluation capabilities.

When we compare WL, Python, and R over Machine Learning (ML) projects, WL always appears to be the best choice for ML. (Overall.)

I do use these sets of comparison posts at Wolfram Community to support my arguments in discussions regarding which programming language is better. (Or bigger.)

Example comparison: WL workflows

The following three Wolfram Community posts are more or less the same content — “Workflows with LLM functions” — but in different programming languages:

Example comparison: LSA over mandala collections

The following Wolfram Community posts are more or less the same content — “LSA methods comparison over random mandalas deconstruction”, [AAv1] — but in different programming languages:

Remark: The movie, [AAv1], linked in those notebooks also shows a comparison with the LSA workflow in R.

Using Raku with LLMs

I generally do not like using Jupyter notebooks, but using Raku with LLMs is very convenient [AAv2, AAv3, AAv4]. WL is clunkier when it comes to pre- or post-processing of LLM results.

Also, the Raku chatbooks, [AAp5], provide a better environment for displaying the often Markdown-formatted results of LLMs. (Like the ones in the notebooks discussed here.)


References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (2024), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “LLM помогает в обработке первого интервью Карлсона-Путина”, (2024), MathematicaForPrediction at WordPress.

[AA4] Anton Antonov, “Markdown to Mathematica converter”, (2022). Wolfram Community.

Notebooks

[AAn1] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (Raku/Jupyter), (2024), RakuForPrediction-book at GitHub/antononcube.

[AAn2] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (Raku/Mathematica), (2024), WolframCloud/antononcube.

[AAn3] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (WL/Mathematica), (2024), WolframCloud/antononcube.

[AAn4] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (in Russian), (WL/Mathematica), (2024), WolframCloud/antononcube.

[AAn5] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (in Russian), (Raku/Jupyter), (2024), RakuForPrediction-book at GitHub/antononcube.

[AAn6] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (Python/Jupyter), (2024), PythonForPrediction-blog at GitHub/antononcube.

[AAn7] Anton Antonov, “LLM aids for processing of the first Carlson-Putin interview”, (in Russian), (Python/Jupyter), (2024), PythonForPrediction-blog at GitHub/antononcube.

Packages, paclets

[AAp1] Anton Antonov, LLM::Functions Raku package, (2023-2024), GitHub/antononcube.

[AAp2] Anton Antonov, LLM::Prompts Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, Markdown::Grammar Raku package, (2022-2023), GitHub/antononcube.

[AAp4] Anton Antonov, RakuMode WL paclet, (2022-2023), Wolfram Language Paclet Repository.

[AAp5] Anton Antonov, Jupyter::Chatbook Raku package, (2023-2024), GitHub/antononcube.

[AAp6] Anton Antonov, WWW::MermaidInk Raku package, (2023), GitHub/antononcube.

[KPp1] Kuba Podkalicki’s, M2MD WL paclet, (2018-2023), GitHub/kubaPod.

Videos

[AAv1] Anton Antonov “Random Mandalas Deconstruction in R, Python, and Mathematica (Greater Boston useR Meetup, Feb 2022)” (2022), YouTube/@AAA4Prediction.

[AAv2] Anton Antonov, “Jupyter Chatbook LLM cells demo (Raku)” (2023), YouTube/@AAA4Prediction.

[AAv3] Anton Antonov, “Jupyter Chatbook multi cell LLM chats teaser (Raku)”, (2023), YouTube/@AAA4Prediction.

[AAv4] Anton Antonov “Integrating Large Language Models with Raku”, (2023), YouTube/@therakuconference6823.

[AAv5] Anton Antonov, “Markdown to Mathematica converter (CLI and StackExchange examples)”, (2022), Anton A. Antonov’s channel at YouTube.

[AAv6] Anton Antonov, “Markdown to Mathematica converter (Jupyter notebook example)”, (2022), Anton A. Antonov’s channel at YouTube.

TLDR LLM solutions for software manuals

… aka “How to use software manuals effectively without reading them”

Introduction

In this blog post (generated from this Jupyter notebook) we use Large Language Model (LLM) functions, [AAp1, AA1], for generating (hopefully) executable, correct, and harmless code for Operating System resource management.

In order to be concrete and useful, we take the Markdown files of the articles “It’s time to rak!”, [EM1], that explain the motivation and usage of the Raku module “App::Rak”, [EMp1], and we show how meaningful file-finding shell commands can be generated via LLMs exposed to the code-with-comments from those articles.

In other words, we prefer to apply the Too Long; Didn’t Read (TLDR) attitude to the articles and the related Raku module README (or user guide) file. (Because “App::Rak” is useful, but it has too many parameters, and we prefer not to learn that much about them.)

Remark: We say that “App::Rak” uses a Domain Specific Language (DSL), which is done with Raku’s Command Line Interface (CLI) features.

Procedure outline

  1. Clone the corresponding article repository
  2. Locate and ingest the “App::Rak” dedicated Markdown files
  3. Extract code blocks from the Markdown files
  4. Get comment-and-code line pairs from the code blocks
    • Using Raku text manipulation capabilities
      • (After observing code examples)
  5. Generate from the comment-and-code pairs LLM few-shot training rules
  6. Use the LLM example function to translate natural language commands into (valid and relevant) “App::Rak” DSL commands
    • With a few or a dozen natural language commands
  7. Use LLMs to generate natural language commands in order to test LLM-TLDR-er further

Step 6 says how we do our TLDR — we use LLM-translations of natural language commands.
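Step 4 (pairing comment lines with the commands that follow them) can be carried out with basic Raku text manipulation. Here is a small self-contained sketch over an inlined sample block; the sample comment-and-command lines are simplified stand-ins for the articles’ code blocks:

```raku
# Sample code block in the comment-then-command style of the articles
my $code-block = q:to/END/;
# produce extensive version information
rak --version --verbose
# look for the literal string "ve" in the current directory
rak ve
END

# Pair each comment line with the command on the following line
my @pairs = $code-block.lines.rotor(2).map({ .head.subst(/^ '#' \s*/, '') => .tail });

.say for @pairs;
```

Pairs like these can then be fed to llm-example-function as few-shot training rules.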

Alternative procedure

Instead of using Raku to process text we can make LLM functions for extracting the comment-and-code pairs. (That is also shown below.)

Extensions

  1. Using LLMs to generate:
    • Stress tests for “App::Rak”
    • Variants of the gathered commands
      • And make new training rules with them
    • EBNF grammars for gathered commands
  2. Compare OpenAI and PaLM and/or their different models
    • Which one produces the best results?
    • Which ones produce better results for which subsets of commands?

Article’s structure

The exposition below follows the outline of the procedure subsections above.

The stress-testing and EBNF-generation extensions have their own sections: “Translating randomly generated commands” and “Grammar generation”, respectively.

Remark: The article/document/notebook was made with the Jupyter framework, using the Raku package “Jupyter::Kernel”, [BD1].


Setup

use Markdown::Grammar;
use Data::Reshapers;
use Data::Summarizers;
use LLM::Functions;
use Text::SubParsers;


Workflow

File names

my $dirName = $*HOME ~ '/GitHub/lizmat/articles';
my @fileNames = dir($dirName).grep(*.Str.contains('time-to-rak'));
@fileNames.elems

4

Texts ingestion

Here we ingest the text of each file:

my %texts = @fileNames.map({ $_.basename => slurp($_) });
%texts.elems

4

Here are the character counts per document:

%texts>>.chars

{its-time-to-rak-1.md => 7437, its-time-to-rak-2.md => 8725, its-time-to-rak-3.md => 14181, its-time-to-rak-4.md => 9290}

Here are the word counts per document:

%texts>>.words>>.elems

{its-time-to-rak-1.md => 1205, its-time-to-rak-2.md => 1477, its-time-to-rak-3.md => 2312, its-time-to-rak-4.md => 1553}

Get Markdown code blocks

With the function md-section-tree we extract code blocks from Markdown documentation files into data structures amenable to further programmatic manipulation (in Raku). Here we get the code blocks from each text:

my %docTrees = %texts.map({ $_.key => md-section-tree($_.value, modifier => 'Code', max-level => 0) });
%docTrees>>.elems

{its-time-to-rak-1.md => 1, its-time-to-rak-2.md => 11, its-time-to-rak-3.md => 24, its-time-to-rak-4.md => 16}

Here we put all blocks into one array:

my @blocks = %docTrees.values.Array.&flatten;
@blocks.elems

52

Extract command-and-code line pairs

Here from each code block we parse-extract comment-and-code pairs and we form the LLM training rules:

my @rules;
@blocks.map({ 
    given $_ { 
        for m:g/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n / -> $m {
           @rules.push( ($m<comment>.Str.trim => $m<code>.Str.trim) ) 
         } } }).elems

52

Here is the number of rules:

@rules.elems

69

Here is a sample of the rules:

.say for @rules.pick(4)

save --after-context as -A, requiring a value => rak --after-context=! --save=A
Show all directory names from current directory down => rak --find --/file
Reverse the order of the characters of each line => rak '*.flip' twenty
Show number of files / lines authored by Scooby Doo => rak --blame-per-line '*.author eq "Scooby Doo"' --count-only

Nice tabulation with LLM function

In order to tabulate “nicely” the rules in the Jupyter notebook, we make an LLM function to produce an HTML table and then specify the corresponding “magic cell.” (This relies on the Jupyter-magics features of [BDp1].) Here is an LLM conversion function, [AA1]:

my &ftbl = llm-function({"Convert the $^a table $^b into an HTML table."}, e=>llm-configuration('PaLM', max-tokens=>800))

-> **@args, *%args { #`(Block|5361560043184) ... }

Here is the HTML table derivation:

%%html
my $tblHTML=&ftbl("plain text", to-pretty-table(@rules.pick(12).sort, align => 'l', field-names => <Key Value>))

Produce the frequencies of the letters in file “twenty” => rak 'slip .comb' twenty --type=code --frequencies
Search all files and all subdirectories => rak foo *
Search for literal string “foo” from the current directory => rak foo
Show all filenames from current directory on down => rak --find --treasure
Show all the lines that consist of “seven” => rak ^seven$ twenty
Show all unique “name” fields in JSON files => rak --json-per-file '*' --unique
Show the lines ending with “o” => rak o$ twenty
add / change description -i at a later time => rak --description='Do not care about case' --save=i
look for literal string “foo”, don’t check case or accents => rak foo -im
remove the --frobnicate custom option => rak --save=frobnicate
same, with a regular expression => rak '/ foo $/'
save --ignorecase as -i, without description => rak --ignorecase --save=i

Nice tabulation with “Markdown::Grammar”

Instead of using LLMs for HTML conversion it is more “productive” to use the HTML interpreter provided by “Markdown::Grammar”:

%%html
sub to-html($x) { md-interpret($x.Str.lines[1..*-2].join("\n").subst('+--','|--', :g).subst('--+','--|', :g), actions=>Markdown::Actions::HTML.new) }
to-pretty-table(@rules.pick(12).sort) ==> to-html

Find files that have “lib” in their name from the current dir => rak lib --find
Look for strings containing y or Y => rak --type=contains --ignorecase Y twenty
Show all directory names from current directory down => rak --find --/file
Show all lines with numbers between 1 and 65 => rak '/ \d+ /'
Show the lines that contain “six” as a word => rak §six twenty
look for “Foo”, while taking case into account => rak Foo
look for “foo” in all files => rak foo
produce extensive help on filesystem filters => rak --help=filesystem --pager=less
save --context as -C, setting a default of 2 => rak --context='[2]' --save=C
save searching in Rakudo’s committed files as --rakudo => rak --paths='~/Github/rakudo' --under-version-control --save=rakudo
search for “foo” and show 4 lines of context => rak foo -C=4
start rak with configuration file at /usr/local/rak-config.json => RAK_CONFIG=/usr/local/rak-config.json rak foo

Remark: Of course, in order to program the above sub we need to know how to use “Markdown::Grammar”. Producing HTML tables with LLMs is much easier — only knowledge of “spoken English” is required.

Code generation examples

Here we define an LLM function for generating “App::Rak” shell commands:

my &frak = llm-example-function(@rules, e => llm-evaluator('PaLM'))

-> **@args, *%args { #`(Block|5361473489952) ... }

my @cmds = ['Find files that have ".nb" in their names', 'Find files that have ".nb"  or ".wl" in their names',
 'Show all directories of the parent directory', 'Give me files without extensions and that contain the phrase "notebook"', 
 'Show all that have extension raku or rakumod and contain Data::Reshapers'];

my @tbl = @cmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;

@tbl.&dimensions

(5 2)

Here is a table showing the natural language commands and the corresponding translations to the “App::Rak” CLI DSL:

%%html
to-pretty-table(@tbl, align=>'l', field-names => <Command App::Rak>) ==> to-html

Find files that have “.nb” in their names => rak --extensions=nb --find
Find files that have “.nb” or “.wl” in their names => rak --find --extensions=nb,wl
Show all directories of the parent directory => rak --find --/file --parent
Give me files without extensions and that contain the phrase “notebook” => rak --extensions= --type=contains notebook
Show all that have extension raku or rakumod and contain Data::Reshapers => rak '/ Data::Reshapers /' --extensions=raku,rakumod

Verification

Of course, the obtained “App::Rak” commands have to be verified to:

  • Work
  • Produce expected results

We can program this verification with Raku or with the Jupyter framework, but we are not doing that here. (We do the verification manually, outside of this notebook.)

Remark: I tried a dozen generated commands. Most worked. One did not work because of the current limitations of “App::Rak”. Others needed appropriate nudging to produce the desired results.
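The verification could also be automated. Here is a minimal sketch (the helper name verify-command is hypothetical; it assumes “App::Rak” is installed so that rak is on the PATH, and treats a zero exit status as “works”):

```raku
# Hypothetical sketch: run a generated command and record its exit status.
# Assumes "App::Rak" is installed and `rak` is available on the PATH.
sub verify-command(Str $cmd) {
    my $proc = shell $cmd, :out, :err;
    %( command => $cmd,
       exit    => $proc.exitcode,
       output  => $proc.out.slurp(:close) )
}

# Example usage (not executed in this notebook):
# verify-command( &frak('Show all directory names from current directory down') );
```

Checking that the results are the *expected* ones would still require human judgment (or a test oracle), which is why we do it manually here.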

Here is an example of a command that produces code that “does not work”:

&frak("Give all files that have extensions .nd and contain the command Classify")

rak '*.nd <command> Classify' --extensions=nd

Here are a few more:

&frak("give the names of all files in the parent directory")

rak --find --/file --/directory

&frak("Find all directories in the parent directory")

rak --find --/file --parent

Here is a generated command that exposes an “App::Rak” limitation:

&frak("Find all files in the parent directory")

rak --find ..


Translating randomly generated commands

Consider testing the applicability of the approach by generating a “good enough” sample of natural language commands for finding files or directories.

We can generate such commands via an LLM. Here we define an LLM function with two parameters that returns a Raku list:

my &fcg = llm-function({"Generate $^a natural language commands for finding $^b in a file system. Give the commands as a JSON list."}, form => sub-parser('JSON'))

-> **@args, *%args { #`(Block|5361560082992) ... }

my @gCmds1 = &fcg(4, 'files').flat;
@gCmds1.raku

["Find all files in the current directory", "Find all files with the .txt extension in the current directory", "Search for all files with the word 'report' in the file name", "Search for all files with the word 'data' in the file name in the Documents folder"]

Here are the corresponding translations to the “App::Rak” DSL:

%%html
my @tbl1 = @gCmds1.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl1 ==> to-pretty-table(align=>'l', field-names => <Command App::Rak>) ==> to-html

Find all files in the current directory => rak --find
Find all files with the .txt extension in the current directory => rak --extensions=txt
Search for all files with the word ‘report’ in the file name => rak report --find
Search for all files with the word ‘data’ in the file name in the Documents folder => rak data Documents

Let us redo the generation and translation using different specs:

my @gCmds2 = &fcg(4, 'files that have certain extensions or contain certain words').flat;
@gCmds2.raku

["Find all files with the extension .txt", "Locate all files that have the word 'project' in their name", "Show me all files with the extension .jpg", "Find all files that contain the word 'report'"]

%%html
my @tbl2 = @gCmds2.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl2 ==> to-pretty-table( align=>'l', field-names => <Command App::Rak>) ==> to-html

Find all files with the extension .txt => rak --extensions=txt
Locate all files that have the word ‘project’ in their name => rak --find project
Show me all files with the extension .jpg => rak --extensions=jpg
Find all files that contain the word ‘report’ => rak report --find

Remark: Ideally, there would be an LLM-based system that 1) hallucinates “App::Rak” commands, 2) executes them, and 3) files GitHub issues if it thinks the results are sub-par. (All done automatically.) On a more practical note, we can use a system that has the first two components “only” to stress test “App::Rak”.
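The two-component stress tester can be sketched with the functions already defined above (&fcg and &frak); this is only an illustration, assuming “App::Rak” is installed so the translated commands can actually be executed:

```raku
# Sketch of the two-component stress tester described above:
# 1) generate natural language commands, 2) translate and execute them.
my @stress = &fcg(10, 'files').flat;

for @stress -> $nl {
    my $cmd  = &frak($nl);           # translate to an "App::Rak" command
    my $proc = shell $cmd, :out, :err;
    note "FAILED: $nl ==> $cmd" if $proc.exitcode != 0;
}
```

The third component (filing GitHub issues) would need a judgment step, human or LLM-based, on top of the exit-status check.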


Alternative programming with LLM

In this subsection we show how to extract comment-and-code pairs using LLM functions. (Instead of working hard with Raku regexes.)

Here is an LLM function that specifies the extraction:

my &fcex = llm-function({"Extract consecutive line pairs in which the first starts with '#' and the second with '\$' from the text $_. Group the lines as key-value pairs and put them in JSON format."}, 
form => 'JSON') 

-> **@args, *%args { #`(Block|5361473544264) ... }

Here are three code blocks:

%%html
my @focusInds = [3, 12, 45];
[@blocks[@focusInds],] ==> to-pretty-table(align=>'l') ==> to-html

# Look for “ve” at the end of all lines in file “twenty”
$ rak --type=ends-with ve twenty
twenty
5:fi𝐯𝐞
12:twel𝐯𝐞

# Show the lines containing “ne”
$ rak ne twenty
twenty
1:o𝐧𝐞
9:ni𝐧𝐞
19:ni𝐧𝐞teen

# List all known extensions
# rak --list-known-extensions

Here we extract the command-and-code lines from the code blocks:

%%html
&fcex(@blocks[@focusInds]) ==> to-pretty-table(align=>'l') ==> to-html

# List all known extensions => # rak --list-known-extensions
# Show the lines containing “ne” => $ rak ne twenty
# Look for “ve” at the end of all lines in file “twenty” => $ rak --type=ends-with ve twenty

Grammar generation

The “right way” of translating natural language DSLs to CLI DSLs like the one of “App::Rak” is to make a grammar for the natural language DSL and the corresponding interpreter. This might be a lengthy process, so we might consider replacing it, or jump-starting it, with LLM-based grammar generation: we ask an LLM to generate a grammar for a collection of DSL sentences. (For example, the keys of the rules above.) In this subsection we make a “teaser” demonstration of the latter approach.

Here we create an LLM function for generating grammars over collections of sentences:

my &febnf = llm-function({"Generate an $^a grammar for the collection of sentences:\n $^b "}, e => llm-configuration("OpenAI", max-tokens=>900))

-> **@args, *%args { #`(Block|5060670827264) ... }

Here we generate an EBNF grammar for the “App::Rak” code-example commands:

my $ebnf = &febnf('EBNF', @rules>>.key)

 Look for the lines that contains two consecutive words that start with "ba" Show all the lines where the fifth character is "e"

SentenceList → Sentence | SentenceList Sentence

Sentence → ProduceResultsPipe | SpecifyLiteral | SpecifyRegExp | SaveIgnoreCase | SaveIgnoremark | AddChangeDescIgnoreCase | LiteralStringCheck | SaveWhitespace | SearchRakudo | SaveAfterContext | SaveBeforeContext | SaveContext | SearchContext | SmartCase | SearchCase | RemoveOption | StartRak | SearchFile | SearchSubDir | Extension | NoExtension | BehaviourFiles | HelpFilesystem | SearchDir | FindName | FindNumber | FindScooby | FindAnywhere | FindWord | FindStart | FindEnd | NumberCharacters | FindY | FindU | FindNE | FindSix | FindSeven | FindEight | FreqLetters | ShowContain | TitleCase | ReverseOrder | Optionally

ProduceResultsPipe → "produce" "results" "without" "any" "highlighting"
SpecifyLiteral → "specify" "a" "literal" "pattern" "at" "the" "end" "of" "a" "line"
SpecifyRegExp → "same," "with" "a" "regular" "expression"
SaveIgnoreCase → "save" "--ignorecase" "as" "-i," "without" "description"
SaveIgnoremark → "save" "--ignoremark" "as" "-m," "with" "description"
AddChangeDescIgnoreCase → "add" "/" "change" "description" "-i" "at" "a" "later" "time"
LiteralStringCheck → "look" "for" "literal" "string" "\"foo\"," "don't" "check" "case" "or" "accents"
SaveWhitespace → "save" "looking" "for" "whitespace" "at" "end" "of" "a" "line" "as" "--wseol"
SearchRakudo → "search" "for" "'sub" "min'" "in" "Rakudo's" "source"
SaveAfterContext → "save" "--after-context" "as" "-A," "requiring" "a" "value"
SaveBeforeContext → "save" "--before-context" "as" "-B," "requiring" "a" "value"
SaveContext → "save" "--context" "as" "-C," "setting" "a" "default" "of" "2"
SearchContext → "search" "for" "\"foo\"" "and" "show" "two" "lines" "of" "context"
SmartCase → "set" "up" "smartcase" "by" "default"
SearchCase → "look" "for" "\"Foo\"," "while" "taking" "case" "into" "account"
RemoveOption → "remove" "the" "--frobnicate" "custom" "option"
CheckOption → "check" "there's" "no" "\"frobnicate\"" "option" "anymore"
StartRak → "start" "rak" "with" "configuration" "file" "at" "/usr/local/rak-config.json"
SearchFile → "look" "for" "\"foo\"" "in" "all" "files"
SearchSubDir → "search" "all" "files" "and" "all" "subdirectories"
Extension → "only" "accept" "files" "with" "the" ".bat" "extension"
NoExtension → "only" "accept" "files" "without" "extension"
BehaviourFiles → "only" "accept" "Raku" "and" "Markdown" "files" 
HelpFilesystem → "produce" "extensive" "help" "on" "


References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Graph representation of grammars”, (2023), RakuForPrediction at WordPress.

[EM1] Elizabeth Mattijsen, “It’s time to rak! Series’ Articles”, (2022), Lizmat series at Dev.to.

Packages, repositories

[AAp1] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.

[AAp5] Anton Antonov, Markdown::Grammar Raku package, (2023), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

[EMp1] Elizabeth Mattijsen, App::Rak Raku package, (2022-2023), GitHub/lizmat.

[EMr1] Elizabeth Mattijsen, articles, (2018-2023) GitHub/lizmat.

Further work on WWW::OpenAI

Introduction

The Raku package “WWW::OpenAI” provides access to the machine learning service OpenAI, [OAI1]. For more details of the OpenAI’s API usage see the documentation, [OAI2].

The package “WWW::OpenAI” was announced approximately two months ago — see [AA1]. This blog post shows all the improvements and additions since then, together with the “original” features.

Remark: The Raku package “WWW::OpenAI” is much “less ambitious” than the official Python package, [OAIp1], developed by OpenAI’s team.

Demo videos

Functionalities

  • Universal “front-end” – to access OpenAI’s services
  • Models — list OpenAI models
  • Code generation — code generation using chat- and text-completion
  • Image generation — using DALL-E
  • Moderation — probabilities of different moderation labels
  • Audio transcription and translation — voice to text
  • Embeddings — text into numerical vector
  • Finding textual answers — text inquiries
  • CLI — for the “best” UI

Before running the usage examples…

Remark: To use the OpenAI API one has to register and obtain an authorization key.

Remark: When the authorization key, auth-key, is specified to be Whatever then the functions openai-* attempt to use the env variable OPENAI_API_KEY.
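The fallback behavior can be illustrated with a small sketch (the helper resolve-auth-key is hypothetical; the package implements this logic internally):

```raku
# Illustrative sketch of the auth-key fallback described above:
# if auth-key is Whatever, fall back to the OPENAI_API_KEY env variable.
sub resolve-auth-key($auth-key = Whatever) {
    return $auth-key if $auth-key ~~ Str && $auth-key.chars;
    %*ENV<OPENAI_API_KEY>
        // die 'Cannot find an authorization key: set the env variable OPENAI_API_KEY.';
}
```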

The following flowchart corresponds to the steps in the package function openai-playground:


Universal “front-end”

The package has a universal “front-end” function openai-playground for the different functionalities provided by OpenAI.

Here is a simple call for a “chat completion”:

use WWW::OpenAI;
openai-playground('Where is Roger Rabbit?', max-tokens => 64);
# [{finish_reason => stop, index => 0, logprobs => (Any), text => 
# 
# Roger Rabbit is a fictional character created by Disney in 1988. He has appeared in several movies and television shows, but is not an actual person.}]

Another one using Bulgarian:

openai-playground('Колко групи могат да се намерят в този облак от точки.', max-tokens => 64);
# [{finish_reason => length, index => 0, logprobs => (Any), text => 
# 
# В зависимост от размера на облака от точки, може да бъдат}]

Remark: The function openai-completion can be used instead in the examples above. See the section “Create chat completion” of [OAI2] for more details.
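For instance, the first example above can be written equivalently as (a sketch mirroring the openai-playground call):

```raku
use WWW::OpenAI;

# Equivalent to the openai-playground call above, using the more
# specific completion function:
openai-completion('Where is Roger Rabbit?', max-tokens => 64);
```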


Models

The current OpenAI models can be found with the function openai-models:

openai-models
# (ada ada-code-search-code ada-code-search-text ada-search-document ada-search-query ada-similarity ada:2020-05-03 babbage babbage-code-search-code babbage-code-search-text babbage-search-document babbage-search-query babbage-similarity babbage:2020-05-03 code-davinci-edit-001 code-search-ada-code-001 code-search-ada-text-001 code-search-babbage-code-001 code-search-babbage-text-001 curie curie-instruct-beta curie-search-document curie-search-query curie-similarity curie:2020-05-03 cushman:2020-05-03 davinci davinci-if:3.0.0 davinci-instruct-beta davinci-instruct-beta:2.0.0 davinci-search-document davinci-search-query davinci-similarity davinci:2020-05-03 gpt-3.5-turbo gpt-3.5-turbo-0301 if-curie-v2 if-davinci-v2 if-davinci:3.0.0 text-ada-001 text-ada:001 text-babbage-001 text-babbage:001 text-curie-001 text-curie:001 text-davinci-001 text-davinci-002 text-davinci-003 text-davinci-edit-001 text-davinci:001 text-embedding-ada-002 text-search-ada-doc-001 text-search-ada-query-001 text-search-babbage-doc-001 text-search-babbage-query-001 text-search-curie-doc-001 text-search-curie-query-001 text-search-davinci-doc-001 text-search-davinci-query-001 text-similarity-ada-001 text-similarity-babbage-001 text-similarity-curie-001 text-similarity-davinci-001 whisper-1)

Code generation

There are two types of completions: text and chat. Let us illustrate the differences in their usage with Raku code generation. Here is a text completion:

openai-completion(
        'generate Raku code for making a loop over a list',
        type => 'text',
        max-tokens => 120,
        format => 'values');
# my @list = <a b c d e f g h i j>;
# for @list -> $item {
#     say $item;
# }

Here is a chat completion:

openai-completion(
        'generate Raku code for making a loop over a list',
        type => 'chat',
        max-tokens => 120,
        format => 'values');
# Here's an example of how to make a loop over a list in Raku:
# 
# ```
# my @list = (1, 2, 3, 4, 5);
# 
# for @list -> $item {
#     say $item;
# }
# ```
# 
# In this code, we define a list `@list` with some values. Then, we use a `for` loop to iterate over each item in the list. The `-> $item` syntax specifies that we want to assign each item to the variable `$item` as we loop through the list. Finally, we use the

Remark: The argument “type” and the argument “model” have to “agree.” (I.e. be found agreeable by OpenAI.) For example:

  • model => 'text-davinci-003' implies type => 'text'
  • model => 'gpt-3.5-turbo' implies type => 'chat'
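Specifying both arguments explicitly looks like this (a sketch; the calls mirror the completion examples above):

```raku
use WWW::OpenAI;

# The "model" and "type" arguments must agree with each other:
openai-completion('generate Raku code for making a loop over a list',
        model      => 'gpt-3.5-turbo',   # a chat model...
        type       => 'chat',            # ...so type must be 'chat'
        max-tokens => 120,
        format     => 'values');

# Using, say, model => 'text-davinci-003' with type => 'chat'
# (or vice versa) is rejected by OpenAI.
```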

Remark: The video “Streamlining ChatGPT code generation and narration workflows (Raku)” uses OpenAI-access packages made with Raku and Wolfram Language (WL). Nevertheless, the code generation demonstrated there can be replicated with “WWW::OpenAI”.


Image generation

Remark: See the files “Image-generation*” for more details.

Images can be generated with the function openai-create-image — see the section “Images” of [OAI2].

Here is an example:

my $imgB64 = openai-create-image(
        "racoon with a sliced onion in the style of Raphael",
        response-format => 'b64_json',
        n => 1,
        size => 'small',
        format => 'values',
        method => 'tiny');

Here are the options descriptions:

  • response-format takes the values “url” and “b64_json”
  • n takes a positive integer, for the number of images to be generated
  • size takes the values ‘1024×1024’, ‘512×512’, ‘256×256’, ‘large’, ‘medium’, ‘small’.

Here we generate an image, get its URL, and place (embed) a link to it via the output of the code cell:

my @imgRes = |openai-create-image(
        "racoon and onion in the style of Roy Lichtenstein",
        response-format => 'url',
        n => 1,
        size => 'small',
        method => 'tiny');

'![](' ~ @imgRes.head<url> ~ ')';

Moderation

Here is an example of using OpenAI’s moderation:

my @modRes = |openai-moderation(
"I want to kill them!",
format => "values",
method => 'tiny');

for @modRes -> $m { .say for $m.pairs.sort(*.value).reverse; }
# violence => 0.9635829329490662
# hate => 0.2717878818511963
# hate/threatening => 0.006235524546355009
# sexual => 8.503619142175012e-07
# violence/graphic => 2.7227645915672838e-08
# self-harm => 1.6152158499593838e-09
# sexual/minors => 1.3727728953583096e-09

Audio transcription and translation

Here is an example of using OpenAI’s audio transcription:

my $fileName = $*CWD ~ '/resources/HelloRaccoonsEN.mp3';
say openai-audio(
        $fileName,
        format => 'json',
        method => 'tiny');
# {
#   "text": "Raku practitioners around the world, eat more onions!"
# }

To do translations use the named argument type:

my $fileName = $*CWD ~ '/resources/HowAreYouRU.mp3';
say openai-audio(
        $fileName,
        type => 'translations',
        format => 'json',
        method => 'tiny');
# {
#   "text": "How are you, bandits, hooligans? I've lost my mind because of you. I've been working as a guard for my whole life."
# }

Embeddings

Embeddings can be obtained with the function openai-embeddings. Here is an example of finding the embedding vectors for each of the elements of an array of strings:

my @queries = [
    'make a classifier with the method RandomForest over the data dfTitanic',
    'show precision and accuracy',
    'plot True Positive Rate vs Positive Predictive Value',
    'what is a good meat and potatoes recipe'
];

my $embs = openai-embeddings(@queries, format => 'values', method => 'tiny');
$embs.elems;
# 4

Here we show:

  • That the result is an array of four vectors each with length 1536
  • The distributions of the values of each vector
use Data::Reshapers;
use Data::Summarizers;

say "\$embs.elems : { $embs.elems }";
say "\$embs>>.elems : { $embs>>.elems }";
records-summary($embs.kv.Hash.&transpose);
# $embs.elems : 4
# $embs>>.elems : 1536 1536 1536 1536
# +--------------------------------+------------------------------+-------------------------------+-------------------------------+
# | 3                              | 1                            | 0                             | 2                             |
# +--------------------------------+------------------------------+-------------------------------+-------------------------------+
# | Min    => -0.6049936           | Min    => -0.6674932         | Min    => -0.5897995          | Min    => -0.6316293          |
# | 1st-Qu => -0.0128846505        | 1st-Qu => -0.012275769       | 1st-Qu => -0.013175397        | 1st-Qu => -0.0125476065       |
# | Mean   => -0.00075456833016081 | Mean   => -0.000762535416627 | Mean   => -0.0007618981246602 | Mean   => -0.0007296895499115 |
# | Median => -0.00069939          | Median => -0.0003188204      | Median => -0.00100605615      | Median => -0.00056341792      |
# | 3rd-Qu => 0.012142678          | 3rd-Qu => 0.011146013        | 3rd-Qu => 0.012387738         | 3rd-Qu => 0.011868718         |
# | Max    => 0.22202122           | Max    => 0.22815572         | Max    => 0.21172291          | Max    => 0.21270473          |
# +--------------------------------+------------------------------+-------------------------------+-------------------------------+

Here we find the corresponding dot products and (cross-)tabulate them:

use Data::Reshapers;
use Data::Summarizers;
my @ct = (^$embs.elems X ^$embs.elems).map({ %( i => $_[0], j => $_[1], dot => sum($embs[$_[0]] >>*<< $embs[$_[1]])) }).Array;

say to-pretty-table(cross-tabulate(@ct, 'i', 'j', 'dot'), field-names => (^$embs.elems)>>.Str);
# +---+----------+----------+----------+----------+
# |   |    0     |    1     |    2     |    3     |
# +---+----------+----------+----------+----------+
# | 0 | 1.000000 | 0.724412 | 0.756557 | 0.665149 |
# | 1 | 0.724412 | 1.000000 | 0.811169 | 0.715543 |
# | 2 | 0.756557 | 0.811169 | 1.000000 | 0.698977 |
# | 3 | 0.665149 | 0.715543 | 0.698977 | 1.000000 |
# +---+----------+----------+----------+----------+

Remark: Note that the fourth element (the cooking recipe request) is an outlier. (Judging by the table with dot products.)
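Since OpenAI’s embedding vectors are unit-normalized, the dot products in the table can be read as cosine similarities. A quick sanity check of the norms (a sketch, reusing $embs from above):

```raku
# Check that each embedding vector has (approximately) unit Euclidean norm,
# so the dot products in the table can be read as cosine similarities.
my @norms = $embs.map({ sqrt sum($_ >>*<< $_) });
say @norms;   # each value is expected to be ≈ 1
```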


Finding textual answers

Here is an example of finding textual answers:

my $text = "Lake Titicaca is a large, deep lake in the Andes 
on the border of Bolivia and Peru. By volume of water and by surface 
area, it is the largest lake in South America";

openai-find-textual-answer($text, "Where is Titicaca?")
# [Andes on the border of Bolivia and Peru .]

By default openai-find-textual-answer tries to give short answers. If the option “request” is Whatever, then depending on the number of questions the request is one of these phrases:

  • “give the shortest answer of the question:”
  • “list the shortest answers of the questions:”

In the example above the full query given to OpenAI’s models is

Given the text “Lake Titicaca is a large, deep lake in the Andes on the border of Bolivia and Peru. By volume of water and by surface area, it is the largest lake in South America” give the shortest answer of the question:
Where is Titicaca?

Here we get a longer answer by changing the value of “request”:

openai-find-textual-answer($text, "Where is Titicaca?", request => "answer the question:")
# [Titicaca is in the Andes on the border of Bolivia and Peru .]

Remark: The function openai-find-textual-answer is inspired by the Mathematica function FindTextualAnswer; see [JL1]. Unfortunately, at this time implementing the full signature of FindTextualAnswer with OpenAI’s API is not easy. (Or cheap to execute.)

Multiple questions

If several questions are given to the function openai-find-textual-answer then all questions are spliced with the given text into one query (that is sent to OpenAI.)

For example, consider the following text and questions:

my $query = 'Make a classifier with the method RandomForest over the data dfTitanic; show precision and accuracy.';

my @questions =
        ['What is the dataset?',
         'What is the method?',
         'Which metrics to show?'
        ];

Then the query sent to OpenAI is:

Given the text: “Make a classifier with the method RandomForest over the data dfTitanic; show precision and accuracy.” list the shortest answers of the questions:

  1. What is the dataset?
  2. What is the method?
  3. Which metrics to show?

The answers are assumed to be given in the same order as the questions, each answer on a separate line. Hence, by splitting the OpenAI result into lines we get the answers corresponding to the questions.

If the questions are missing question marks, the result may begin with a completion line followed by the answers. In that situation the answers are not parsed and a warning message is given.
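The per-line pairing described above can be sketched as follows (the helper name pair-answers is hypothetical; the package does this internally):

```raku
# Hypothetical sketch of the answer parsing described above:
# split the LLM result into lines and pair them with the questions.
sub pair-answers(Str $llm-result, @questions) {
    my @answers = $llm-result.lines.grep(*.chars);
    @questions.elems == @answers.elems
        ?? (@questions Z=> @answers).Hash
        !! do { warn 'Unexpected number of answer lines.'; @answers }
}
```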


CLI

Playground access

The package provides a Command Line Interface (CLI) script:

openai-playground --help
# Usage:
#   openai-playground <text> [--path=<Str>] [-n[=UInt]] [--max-tokens[=UInt]] [-m|--model=<Str>] [-r|--role=<Str>] [-t|--temperature[=Real]] [-l|--language=<Str>] [--response-format=<Str>] [-a|--auth-key=<Str>] [--timeout[=UInt]] [--format=<Str>] [--method=<Str>] -- Text processing using the OpenAI API.
#   openai-playground [<words> ...] [-m|--model=<Str>] [--path=<Str>] [-n[=UInt]] [--max-tokens[=UInt]] [-r|--role=<Str>] [-t|--temperature[=Real]] [-l|--language=<Str>] [--response-format=<Str>] [-a|--auth-key=<Str>] [--timeout[=UInt]] [--format=<Str>] [--method=<Str>] -- Command given as a sequence of words.
#   
#     <text>                     Text to be processed or audio file name.
#     --path=<Str>               Path, one of 'chat/completions', 'images/generations', 'moderations', 'audio/transcriptions', 'audio/translations', 'embeddings', or 'models'. [default: 'chat/completions']
#     -n[=UInt]                  Number of completions or generations. [default: 1]
#     --max-tokens[=UInt]        The maximum number of tokens to generate in the completion. [default: 100]
#     -m|--model=<Str>           Model. [default: 'Whatever']
#     -r|--role=<Str>            Role. [default: 'user']
#     -t|--temperature[=Real]    Temperature. [default: 0.7]
#     -l|--language=<Str>        Language. [default: '']
#     --response-format=<Str>    The format in which the generated images are returned; one of 'url' or 'b64_json'. [default: 'url']
#     -a|--auth-key=<Str>        Authorization key (to use OpenAI API.) [default: 'Whatever']
#     --timeout[=UInt]           Timeout. [default: 10]
#     --format=<Str>             Format of the result; one of "json" or "hash". [default: 'json']
#     --method=<Str>             Method for the HTTP POST query; one of "tiny" or "curl". [default: 'tiny']

Remark: When the authorization key argument “auth-key” is set to “Whatever”, then openai-playground attempts to use the env variable OPENAI_API_KEY.

Finding textual answers

The package provides a CLI script for finding textual answers:

openai-find-textual-answer --help
# Usage:
#   openai-find-textual-answer <text> -q=<Str> [--max-tokens[=UInt]] [-m|--model=<Str>] [-t|--temperature[=Real]] [-r|--request=<Str>] [-p|--pairs] [-a|--auth-key=<Str>] [--timeout[=UInt]] [--format=<Str>] [--method=<Str>] -- Text processing using the OpenAI API.
#   openai-find-textual-answer [<words> ...] -q=<Str> [--max-tokens[=UInt]] [-m|--model=<Str>] [-t|--temperature[=Real]] [-r|--request=<Str>] [-p|--pairs] [-a|--auth-key=<Str>] [--timeout[=UInt]] [--format=<Str>] [--method=<Str>] -- Command given as a sequence of words.
#   
#     <text>                     Text to be processed or audio file name.
#     -q=<Str>                   Questions separated with '?' or ';'.
#     --max-tokens[=UInt]        The maximum number of tokens to generate in the completion. [default: 300]
#     -m|--model=<Str>           Model. [default: 'Whatever']
#     -t|--temperature[=Real]    Temperature. [default: 0.7]
#     -r|--request=<Str>         Request. [default: 'Whatever']
#     -p|--pairs                 Should question-answer pairs be returned or not? [default: False]
#     -a|--auth-key=<Str>        Authorization key (to use OpenAI API.) [default: 'Whatever']
#     --timeout[=UInt]           Timeout. [default: 10]
#     --format=<Str>             Format of the result; one of "json" or "hash". [default: 'json']
#     --method=<Str>             Method for the HTTP POST query; one of "tiny" or "curl". [default: 'tiny']
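As an illustration (the text and question are made up; this assumes OPENAI_API_KEY is set):

```shell
# Ask two questions over a given text and return question-answer pairs (-p).
# Flags as documented in the help listing above.
openai-find-textual-answer 'Lake Titicaca is a large lake on the border of Bolivia and Peru, at an elevation of 3812 meters.' \
  -q='Where is the lake?; What is the elevation of the lake?' -p
```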

Refactoring

Separate files for each OpenAI functionality

The original implementation of “WWW::OpenAI” had a design and implementation very similar to those of “Lingua::Translation::DeepL”, [AAp1].

The original code was then substantially refactored: each OpenAI functionality targeted by “WWW::OpenAI” now has its code placed in a separate file.

The refactoring, of course, required a reasonably comprehensive suite of unit tests. Since running those tests costs money (they invoke the paid OpenAI API), the tests are placed in the “./xt” directory.

De-Cro-ing the requesting code

The first implementation of “WWW::OpenAI” used “Cro::HTTP::Client” to access OpenAI’s services. Often when I use “Cro::HTTP::Client” on macOS I get the error:

Cannot locate symbol 'SSL_get1_peer_certificate' in native library

(See longer discussions about this problem here and here.)

Given these problems with “Cro::HTTP::Client”, and since working implementations based on curl and “HTTP::Tiny” already existed, I decided to make “WWW::OpenAI” more lightweight by removing the code related to “Cro::HTTP::Client”.


References

Articles

[AA1] Anton Antonov, “WWW::OpenAI (for ChatGPT and other statistical gimmicks)”, (2023), RakuForPrediction at WordPress.

[JL1] Jérôme Louradour, “New in the Wolfram Language: FindTextualAnswer”, (2018), blog.wolfram.com.

Packages, platforms

[AAp1] Anton Antonov, Lingua::Translation::DeepL Raku package, (2022), GitHub/antononcube.

[AAp2] Anton Antonov, Text::CodeProcessing Raku package, (2021), GitHub/antononcube.

[OAI1] OpenAI Platform, OpenAI platform.

[OAI2] OpenAI Platform, OpenAI documentation.

[OAIp1] OpenAI, OpenAI Python Library, (2020), GitHub/openai.

Videos

[AAv1] Anton Antonov, “Racoons playing with pearls and onions”, (2023), YouTube/@AAA4prediction.

[AAv2] Anton Antonov, “OpenAIMode demo (Mathematica)”, (2023), YouTube/@AAA4prediction.

[AAv3] Anton Antonov, “OpenAIMode code generation workflows demo (Mathematica et al.)”, (2023), YouTube/@AAA4prediction.

[AAv4] Anton Antonov, “Streamlining ChatGPT code generation and narration workflows (Raku)”, (2023), YouTube/@AAA4prediction.