Pi Day 2026: Formulas, Series, and Plots for π

Introduction

  • Happy Pi Day! Today (3/14) we celebrate the most famous mathematical constant: π ≈ 3.141592653589793…
  • π is irrational and transcendental, and it appears in circles, waves, probability, physics, and even random walks.
  • Raku (with its built-in π constant, excellent rational support, lazy lists, and Unicode operators) makes experimenting with π relatively easy and enjoyable.
  • In this blog post (notebook) we explore a selection of formulas and algorithms.

0. Setup

use Math::NumberTheory;
use BigRoot;
use Image::Markup::Utilities;
use Graphviz::DOT::Chessboard;
use Data::Reshapers;
use JavaScript::D3;
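use JavaScript::D3::Utilities;
use Data::Importers;            # provides data-import (used in section 2)
use Data::Summarizers;          # provides records-summary, tally, pareto-principle-statistic
use Statistics::Distributions;  # assumed source of random-variate and BenfordDistribution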

D3.js

#%javascript
require.config({
paths: {
d3: 'https://d3js.org/d3.v7.min'
}});
require(['d3'], function(d3) {
console.log(d3);
});

my $title-color = 'Ivory';
my $stroke-color = 'SlateGray';
my $background = 'none';  # assumed value; the plots below pass :$background

1. Continued fraction approximation

The built-in Raku constant pi (or π) is a double-precision Num, so it carries only about 16 significant digits:

say π.fmt('%.25f')
# 3.1415926535897930000000000

One way to remedy that is to use continued fractions. For example, using the (first) sequence line of the On-Line Encyclopedia of Integer Sequences (OEIS) entry A001203 produces π with precision 56:

my @s = 3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, 1, 2, 1, 3, 1, 2, 1;
my $pi56 = from-continued-fraction(@s».FatRat.List);
# 3.14159265358979323846264338327950288419716939937510582097

Here we verify the precision using Wolfram Language:

"wolframscript -code 'N[Pi, 100] - $pi56'"
andthen .&shell(:out)
andthen .out.slurp(:close)
# 0``56.

More details can be found in Wolfram MathWorld page “Pi Continued Fraction”, [EW1].
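
To make the reconstruction step concrete, here is a minimal sketch of how a finite continued fraction can be folded back into a rational number in plain Raku. The sub name `cf-to-rational` is hypothetical (it is not part of “Math::NumberTheory”); it only illustrates what a function like `from-continued-fraction` has to compute.

```raku
# Evaluate [a0; a1, a2, ...] from the innermost term outwards.
# Hypothetical helper, for illustration only.
sub cf-to-rational(@terms) {
    my $value = @terms.tail.FatRat;
    for @terms.head(* - 1).reverse -> $a {
        $value = $a + 1 / $value;
    }
    $value
}

say cf-to-rational([3, 7]).nude;          # (22 7)
say cf-to-rational([3, 7, 15, 1]).nude;   # (355 113)
```

The first four terms already give the classic approximation 355/113; each additional term refines it.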


2. Continued fraction terms plots

It is interesting to plot the terms of the continued fraction of π.

First we ingest more “pi-terms” from OEIS A001203 (20k terms):

my @ds = data-import('https://oeis.org/A001203/b001203.txt').split(/\s/)».Int.rotor(2);
my @terms = @ds».tail;
@terms.elems
# 20000

Here is the summary:

sink records-summary(@terms)
# +-------------------+
# | numerical         |
# +-------------------+
# | 1st-Qu => 1       |
# | Median => 2       |
# | Min    => 1       |
# | Max    => 20776   |
# | Mean   => 12.6809 |
# | 3rd-Qu => 5       |
# +-------------------+

Here is an array plot of the first 128 terms of the continued fraction approximating π:

#% html
my @mat = |@terms.head(128)».&integer-digits(:2base);
my $max-digits = @mat».elems.max;
@mat .= map({ [|(0 xx ($max-digits - $_.elems)), |$_] });
dot-matrix-plot(transpose(@mat), size => 10):svg

Next, we show the Pareto principle manifestation for the continued fraction terms. First we observe that the terms have a distribution similar to Benford’s law:

#% js
my @tally-pi = tally(@terms).sort(-*.value).head(16) <</>> @terms.elems;
my @terms-b = random-variate(BenfordDistribution.new(:10base), 2_000);
my @tally-b = tally(@terms-b).sort(-*.value).head(16) <</>> @terms-b.elems;
js-d3-bar-chart(
[
|@tally-pi.map({ %( x => $_.key, y => $_.value, group => 'π') }),
|@tally-b.map({ %( x => $_.key, y => $_.value, group => 'Benford') })
],
plot-label => "Pi continued fraction terms vs. Benford's law",
:$title-color,
:$background)

Here is the Pareto principle plot — ≈5% of the unique term values correspond to ≈80% of the terms:

#% js
js-d3-list-line-plot(
pareto-principle-statistic(@terms),
plot-label => "Pareto principle statistic for the Pi continued fraction terms",
:$title-color,
:$background,
stroke-width => 5,
:grid-lines
)

3. Classic Infinite Series

There are many ways to express π as an infinite sum; some converge slowly, others surprisingly fast.

Leibniz–Gregory series (1671/ Madhava earlier)
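
For reference, the series in its standard form, which the Raku sub in this section sums term by term (the index k is a notational choice):

```latex
\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}
    = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right)
```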

Raku implementation:

sub pi-leibniz($n) {
4 * [+] map { ($_ %% 2 ?? 1 !! -1) / (2 * $_.FatRat + 1) }, 0 ..^ $n
}
my $piLeibniz = pi-leibniz(1_000);
# 3.140592653839792925963596502869395970451389330779724489367457783541907931239747608265172332007670207231403885276038710899938066629552214564551237742887150050440512339302537072825852760246628025562008569471700451065826106184744099667808080815231833582150382088582680381403109153574884416966097481526954707518119416184546424446286573712097944309435229550466609113881892172898692240992052089578302460852737674933105951137782047028552762288434104643076549100475536363928011329215789260496788581009721784276311248084584199773204673225752150684898958557383759585526225507807731149851003571219339536433193219280858501643712664329591936448794359666472018649604860641722241707730107406546936464362178479780167090703126423645364670050100083168338273868059379722964105943903324595829044270168232219388683725629678859726914882606728649659763620568632099776069203461323565260334137877

Verify with Wolfram Language (again):

"wolframscript -code 'N[Pi, 1000] - $piLeibniz'"
andthen .&shell(:out)
andthen .out.slurp(:close)
# 0.000999999750000312499...814206`866.9999998914263

Nilakantha series (faster convergence):
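
For reference, the series in its standard form; the signs and denominators match the terms computed by `pi-nilakantha` in this section (the index k is a notational choice):

```latex
\pi = 3 + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \, 4}{(2k)(2k+1)(2k+2)}
    = 3 + \frac{4}{2 \cdot 3 \cdot 4} - \frac{4}{4 \cdot 5 \cdot 6} + \frac{4}{6 \cdot 7 \cdot 8} - \cdots
```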

Raku:

sub pi-nilakantha($n) {
3 + [+] map {
($_ %% 2 ?? -1 !! 1 ) * 4 / ((2 * $_.FatRat) * (2 * $_ + 1) * (2 * $_ + 2))
}, 1 .. $n
}
pi-nilakantha(1_000);
# 3.141592653340542051900128736253203567152539255317954874674304859504426172618558702218695071137605738966036069683335561974900086119307836254205910905806190030949758215864755464129701335459521079534522811851010296642538249613529207613335816447914992502190861349451746347920350033634355181084537761886275546599078437173552420948534950023442771396391252038722980428723971632669306434394851189528826699233048019261441283970866004550291393472342649870962106821115715774722114776992400455398838055772839725805047379519366309217982783671029012753365224924699602163737619311405432798527164991008945233085366633073462699045511265528492985424805854418596455931463431855615794431867539190155631617285217459790661344075940516099637034367441911754544671168909454186231972510120715400925996293656987342326715209388299050131213232932065481743222390684073879385764855135985734675127240826
"wolframscript -code 'N[Pi, 1000] - {pi-nilakantha(1_000)}'"
andthen .&shell(:out)
andthen .out.slurp(:close)
# 2.4925118...83814206`860.3966372344514*^-10

3. Beautiful Products

Wallis product (1655) — elegant infinite product:
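
For reference, the product in its standard form. The running product in this section starts from 2 and multiplies in one factor per step, so it approaches π itself:

```latex
\frac{\pi}{2} = \prod_{n=1}^{\infty} \frac{(2n)(2n)}{(2n-1)(2n+1)}
              = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots
```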

Raku running product:

my $p = 2.0;
for 1 .. 1_000 -> $n {
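# Note: the printed ratio is taken against $piLeibniz (itself accurate to about three decimal places), not against π.
$p *= (2 * $n) * (2 * $n) / ( (2 * $n - 1 ) * ( 2 * $n + 1) );
say "$n → {$p / $piLeibniz} relative error" if $n %% 100;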
}
# 100 → 0.9978331595460779 relative error
# 200 → 0.9990719099195204 relative error
# 300 → 0.9994865459690567 relative error
# 400 → 0.9996941876848563 relative error
# 500 → 0.9998188764663584 relative error
# 600 → 0.9999020455903246 relative error
# 700 → 0.9999614733132168 relative error
# 800 → 1.0000060557070767 relative error
# 900 → 1.0000407377794782 relative error
# 1000 → 1.000068487771041 relative error

4. Very Fast Modern Series — Chudnovsky Algorithm

One of the fastest-converging series used in record computations:
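
For reference, the Chudnovsky series in its standard form (the index k is a notational choice):

```latex
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty}
  \frac{(-1)^k \, (6k)! \, (13591409 + 545140134\,k)}
       {(3k)! \, (k!)^3 \, 640320^{\,3k + 3/2}}
```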

Each term adds roughly 14 correct digits. It cannot be implemented easily in core Raku, since the built-in sqrt and non-integer power operations return double-precision Num values rather than arbitrary-precision results.
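
One possible workaround is sketched below. It assumes that an integer Newton iteration is an acceptable substitute for a bignum square root. The subs `isqrt` and `pi-chudnovsky` are hypothetical names introduced here for illustration. The sketch uses the identity 640320^(3/2) / 12 = 426880 · √10005, so that only √10005 has to be approximated.

```raku
# Integer square root (floor) by Newton's iteration, exact Int arithmetic only.
sub isqrt(Int:D $n) {
    return 0 if $n == 0;
    my $x = 1 +< (($n.msb + 2) div 2);   # a power of two that is ≥ sqrt($n)
    loop {
        my $y = ($x + $n div $x) div 2;
        return $x if $y >= $x;
        $x = $y;
    }
}

# Chudnovsky partial sum with $terms terms; sqrt(10005) is taken to $digits decimals.
sub pi-chudnovsky(Int:D $terms, Int:D $digits) {
    my $sum = [+] (^$terms).map: -> $k {
        my $num = ([*] 1 .. 6 * $k) * (13591409 + 545140134 * $k);
        my $den = ([*] 1 .. 3 * $k) * ([*] 1 .. $k) ** 3 * 640320 ** (3 * $k);
        (-1) ** $k * FatRat.new($num, $den)
    };
    my $scale     = 10 ** $digits;
    my $sqrt10005 = FatRat.new(isqrt(10005 * $scale ** 2), $scale);
    426880 * $sqrt10005 / $sum
}

say pi-chudnovsky(4, 50);
```

The accuracy of the result is bounded both by the number of terms and by the number of digits requested for the square root, so the two arguments should be increased together.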


5. Spigot Algorithms — Digits “Drip” One by One

Spigot algorithms compute decimal digits using only integer arithmetic — no floating-point errors accumulate.

The classic Rabinowitz–Wagon spigot (based on a transformed Wallis product) produces base-10 digits sequentially.

Simple (but bounded) version outline in Raku:

sub spigot-pi($digits) {
my $len = (10 * $digits / 3).floor + 1;
my @a = 2 xx $len;
my @result;
for 1..$digits {
my $carry = 0;
for $len-1 ... 0 -> $i {
my $x = 10 * @a[$i] + $carry * ($i + 1);
@a[$i] = $x % (2 * $i + 1);
$carry = $x div (2 * $i + 1);
}
@result.push($carry div 10);
@a[0] = $carry % 10;
# (handle carry-over / nines adjustment in full impl)
}
@result.head(1).join('.') ~ @result[1..*].join
}
spigot-pi(50);
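# 314159265358979323846264338327941028841971693993751
# (Note the run "279410288": π has "27950288" there. A predigit of 10 was appended as "10" because carry-over is not handled.)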
"wolframscript -code 'N[Pi, 100] - {spigot-pi(50).FatRat / 10e49.FatRat}'"
andthen .&shell(:out)
andthen .out.slurp(:close)
# 2.3969628881355243801510070603398913366797194459230781640628621`41.37966130996076*^-16

6. BBP Formula — Hex Digits Without Predecessors

Bailey–Borwein–Plouffe (1995) formula lets you compute the nth hexadecimal digit of π directly (without earlier digits):
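
For reference, the formula in its standard form; the four fractions are the ones summed by `bbp-digit-sum` in this section:

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}
      \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)
```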

It is very popular for distributed π-hunting projects and is the best-known digit-extraction algorithm.

Raku snippet for partial sum (base 16 sense):

sub bbp-digit-sum($n) {
[+] (0..$n).map: -> $k {
my $r = 1/16**$k;
$r * (4/(8*$k+1) - 2/(8*$k+4) - 1/(8*$k+5) - 1/(8*$k+6))
}
}
say bbp-digit-sum(100).base(16).substr(0,20);
# 3.243F6B

7. (Instead of) Conclusion

  • π is conjectured (but not proven) to be normal; if it is, every finite sequence of digits, your birthday included, appears in it infinitely often.
  • The Feynman point: six consecutive 9s starting at digit 762.
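  • Memorization records run to tens of thousands of digits; an unofficial recitation of 100,000 digits (Akira Haraguchi, 2006) has also been reported.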
  • π appears in the normal distribution, quantum mechanics, random walks, and Buffon’s needle problem (the crossing probability is 2/π when the needle length equals the line spacing).

Let us plot a random walk using the terms of the continued fraction of π (the 20k terms of OEIS A001203) to determine directions:

#% js
my @path = angle-path(@terms)».reverse».List;
my &pi-path-map = {
given @terms[$_] // 0 {
when $_ ≤ 100 { 0 }
when $_ ≤ 1_000 { 1 }
default { 2 }
}
}
@path = @path.kv.map( -> $i, $p {[|$p, &pi-path-map($i).Str]});
my %opts = color-scheme => 'Observable10', background => '#1F1F1F', :!axes, :!legends, stroke-width => 2;
js-d3-list-line-plot(@path, :800width, :500height, |%opts)

In the plot above the blue segments correspond to origin terms ≤ 100, yellow segments to terms between 100 and 1000, and red segments to origin terms greater than 1000.


References

[EW1] Eric Weisstein, “Pi Continued Fraction”, Wolfram MathWorld.

Collatz conjecture visualizations

Introduction

This blog post (notebook) presents various visualizations related to the Collatz conjecture, [WMW1, Wk1] using Raku.

The Collatz conjecture, a renowned, unsolved mathematical problem, questions whether iteratively applying two basic arithmetic operations will lead every positive integer to ultimately reach the value of 1.

In this notebook the so-called “shortcut” version of the Collatz function is used:
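
Written out (with f as an assumed name for the function), consistent with the `collatz` sub defined later in this post:

```latex
f(n) =
\begin{cases}
  \dfrac{n}{2}     & \text{if } n \equiv 0 \pmod{2}, \\[1ex]
  \dfrac{3n+1}{2}  & \text{if } n \equiv 1 \pmod{2}.
\end{cases}
```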

That function is used repeatedly to form a sequence, beginning with any positive integer, and taking the result of each step as the input for the next.

The Collatz conjecture is: This process will eventually reach the number 1, regardless of which positive integer is chosen initially.

Raku-wise, subs for the Collatz sequences are easy to define. The visualizations are done with the packages “Graph”, [AAp1], “JavaScript::D3”, [AAp2], and “Math::NumberTheory”, [AAp3].

There are many articles, blog posts, and videos dedicated to visualizations of the Collatz conjecture. (For example, [KJR1, PZ1, Vv1]).

Remark: Consider following the warnings in [Vv1] and elsewhere:

Do not work on this [Collatz] problem! (Do real math instead.)

Remark: The notebook’s visualizations based on “JavaScript::D3” look a lot like the visualizations in [PZ1], since D3.js is used in both.

Remark: See the Bulgarian version of this post: “Визуализации свързани с хипотезата на Колац” (“Visualizations related to the Collatz conjecture”).


Setup

use Data::Reshapers;
use Data::Summarizers;
use Data::TypeSystem;
use Data::Translators;
use Graph;
use JavaScript::D3;
use Math::NumberTheory;

#%javascript

require.config({
     paths: {
     d3: 'https://d3js.org/d3.v7.min'
}});

require(['d3'], function(d3) {
     console.log(d3);
});

my $background = 'none';
my $stroke-color = 'Ivory';
my $fill-color = 'none';
my $title-color = 'DarkGray';

Additional subs are defined for getting color-blending sequences.

sub darker-shades(Str $hex-color, Int $steps) {
    my @rgb = $hex-color.subst(/ ^ '#'/).comb(2).map({ :16($_) });
    my @shades;
    for 1..$steps -> $step {
        my @darker = @rgb.map({ ($_ * (1 - $step / ($steps + 1))).Int });
        @shades.push: '#' ~ @darker.map({ sprintf '%02X', $_ }).join;
    }
    return @shades;
}

#say darker-shades("#34495E", 5);

sub blend-colors(Str $color1, Str $color2, Int $steps) {
    my @rgb1 = $color1.subst(/ ^ '#'/).comb(2).map({ :16($_) });
    my @rgb2 = $color2.subst(/ ^ '#'/).comb(2).map({ :16($_) });
    my @blended;

    for ^$steps -> $step {
        my @blend = (@rgb1 Z @rgb2).map({
            ($_[0] + ($step / $steps) * ($_[1] - $_[0])).Int
        });
        @blended.push: '#' ~ @blend.map({ sprintf '%02X', $_ }).join;
    }
    
    return @blended;
}

#say blend-colors("#34495E", "#FFEBCD", 5);


Collatz function definition

Here is a sub for the shortcut version of the Collatz function:

sub collatz(UInt $n is copy, Int:D $max-steps = 1000) {
    return [] if $n == 0;
    my @sequence = $n;
    while $n != 1 && @sequence.elems < $max-steps {
        $n = ($n %% 2 ?? $n div 2 !! (3 * $n + 1) / 2).Int;
        @sequence.push: $n;
    }
    return @sequence;
}

Here is an example using 26 as a sequence seed (i.e. starting value):

collatz(26)

# [26 13 20 10 5 8 4 2 1]

The next integer, 27, produces a much longer sequence:

collatz(27).elems

# 71


Simple visualizations

Collatz sequence numbers

Here is the simplest, informative Collatz sequence — or hailstone numbers — plot:

#% js
js-d3-list-line-plot(collatz(171), :$background, :$title-color, title => 'Hailstone numbers of 171')

Let us make a multi-line plot for a selection of seeds.

my @data = (1..1_000).map({ collatz($_) }).grep({ 30 ≤ $_.elems ≤ 150 && $_.max ≤ 600 }).pick(10).sort(*.head).map({my $i = $_.head; $_.kv.map(-> $x, $y {%(group => $i, :$x, :$y )}).Array }).map(*.Slip).Array;

deduce-type(@data)

# Vector(Assoc(Atom((Str)), Atom((Int)), 3), 320)

#% js
js-d3-list-line-plot(@data.flat, :$background)

Remark: Simple sampling, like the code block below, would generally produce sequences with very non-uniform lengths and max-values.
Hence, we do the filtering above.

my @data = (^100).pick(9).sort.map(-> $i {collatz($i).kv.map(-> $x, $y {%(group => $i, :$x, :$y )}).Array }).map(*.Slip).Array;


Distributions

Here are Collatz sequences and their corresponding lengths and max-values:

my $m = 100_000;
my @cSequences = (1..$m).map({ collatz($_) });
my @cLengths = @cSequences».elems;
my @cMaxes = @cSequences».max;

my @dsCollatz = (1...$m) Z @cLengths Z @cMaxes;
@dsCollatz = @dsCollatz.map({ <seed length max>.Array Z=> $_.Array })».Hash;

sink records-summary(@dsCollatz, field-names => <seed length max>)

# +-------------------+--------------------+------------------------+
# | seed              | length             | max                    |
# +-------------------+--------------------+------------------------+
# | Min    => 1       | Min    => 1        | Min    => 1            |
# | 1st-Qu => 25000.5 | 1st-Qu => 47       | 1st-Qu => 42272        |
# | Mean   => 50000.5 | Mean   => 72.88948 | Mean   => 320578.18243 |
# | Median => 50000.5 | Median => 68       | Median => 85292        |
# | 3rd-Qu => 75000.5 | 3rd-Qu => 97       | 3rd-Qu => 162980       |
# | Max    => 100000  | Max    => 222      | Max    => 785412368    |
# +-------------------+--------------------+------------------------+

Here are histograms of the Collatz sequence lengths and max-value distributions:

#% js
js-d3-histogram(
    @cLengths, 
    100,
    :$background,
    :600width, 
    :400height, 
    title => "Collatz sequences lengths distribution (up to $m)",
    :$title-color
  )
~
js-d3-histogram(
    @cMaxes».log(10), 
    100,
    :$background,
    :600width, 
    :400height, 
    title => "Collatz sequences lg(max-value) distribution (up to $m)",
    :$title-color
  )

Here is a scatter plot of seed vs. sequence length:

#% js
js-d3-list-plot(
    @cLengths, 
    :$background, 
    :2point-size,
    :800width, 
    :400height, 
    title => 'Collatz sequences lengths',
    x-label => 'seed',
    y-label => 'sequence length',
    :$title-color
  )


Benford’s law adherence

It is of interest to see the manifestation of Benford’s law for the first digits of Collatz hailstones.
Here is the corresponding digit tally:

my %digitTally = @cSequences.race(:4degree).map({ $_».comb».head }).flat.&tally

# {1 => 2067347, 2 => 1375360, 3 => 870823, 4 => 857427, 5 => 581237, 6 => 448351, 7 => 334441, 8 => 443043, 9 => 310919}

Here is a comparison with the corresponding Benford’s law values:

#% html
sub benford-law(UInt:D $d, UInt:D $b = 10) { log($d + 1, $b) - log($d, $b) };

my @dsDigitTally = 
    %digitTally.sort(*.key.Int).map({%( 
        digit => $_.key, 
        value => round($_.value / %digitTally.values.sum, 10 ** -7), 
        benford => round(benford-law($_.key.Int), 10 ** -7)) }) 
==> to-html(field-names => <digit value benford>)

digit  value      benford
1      0.2836276  0.30103
2      0.1886912  0.1760913
3      0.1194717  0.1249387
4      0.1176338  0.09691
5      0.0797422  0.0791812
6      0.0615111  0.0669468
7      0.0458833  0.0579919
8      0.0607828  0.0511525
9      0.0426562  0.0457575
Good adherence is observed for a relatively modest number of sequences.
Here is a corresponding bar chart:

#% js
my @data = 
    |@dsDigitTally.map({ <x y group>.Array Z=> [|$_<digit value>, 'Collatz'] })».Hash,
    |@dsDigitTally.map({ <x y group>.Array Z=> [|$_<digit benford>, 'Benford'] })».Hash;
    
js-d3-bar-chart(
    @data,
    title => "First digits frequencies (up to $m)",
    :$title-color,
    x-label => 'digit',
    y-label => 'frequency', 
    :!grid-lines, 
    :$background, 
    :700width, 
    :400height,
    margins => { :50left }
)


Sunflower embedding

A certain concentric pattern emerges in the spiral embedding plots of the Collatz sequence lengths mod 8. (Additionally taking the result mod 3, as in the code below, makes the pattern clearer.)
Similarly, a clear spiral pattern is seen for the maximum values.

#% js
my @sunflowerLengths = sunflower-embedding(16_000, with => { collatz($_).elems mod 8 mod 3 + 1}):d;
my @sunflowerMaxes = sunflower-embedding(16_000, with => { collatz($_).max mod 8 mod 3 + 1}):d;

js-d3-list-plot(@sunflowerLengths, 
    background => 'none',
    point-size => 4,
    width => 500, height => 440, 
    :!axes, 
    :!legends,
    color-scheme => 'Observable10',
    margins => {:20top, :20bottom, :50left, :50right}
 )

~

js-d3-list-plot(@sunflowerMaxes, 
    background => 'none',
    point-size => 4,
    width => 500, height => 440, 
    :!axes, 
    :!legends,
    color-scheme => 'Observable10',
    margins => {:20top, :20bottom, :50left, :50right}
 )


Small graphs

Define a sub for graph-edge relationship between consecutive integers in Collatz sequences:

proto sub collatz-edges(|) {*}

multi sub collatz-edges(Int:D $n) {
    ($n mod 3 == 2) ?? [$n => 2 * $n, $n => (2 * $n - 1) / 3] !! [$n => 2 * $n,]
}

multi sub collatz-edges(@edges where @edges.all ~~ Pair:D) {
    my @leafs = @edges».value.unique;
    @edges.append(@leafs.map({ collatz-edges($_.Int) }).flat)
}

# &collatz-edges
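
The edge rule in the first multi can be read as inverting the shortcut Collatz function. A short derivation, using f for that function and m for a predecessor of n (both names are notation assumed here):

```latex
\begin{aligned}
m \text{ even}: \quad & f(m) = \frac{m}{2} = n \;\Longrightarrow\; m = 2n
  \quad \text{(always a valid predecessor)}, \\
m \text{ odd}:  \quad & f(m) = \frac{3m+1}{2} = n \;\Longrightarrow\; m = \frac{2n-1}{3}, \\
                      & \text{which is an integer iff } 2n \equiv 1 \pmod{3}
                        \iff n \equiv 2 \pmod{3}.
\end{aligned}
```

Since 2n − 1 is odd, the second predecessor is automatically odd whenever it is an integer, so the condition `$n mod 3 == 2` is the only check needed.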

For didactic purposes let us derive the edges of a graph using a certain small number of iterations:

my @edges = Pair.new(2, 4);

for ^12 { @edges = collatz-edges(@edges) }

deduce-type(@edges)

# Vector((Any), 536)

Make the graph:

my $g = Graph.new(@edges.map({ $_.value.Str => $_.key.Str })):directed

# Graph(vertexes => 155, edges => 154, directed => True)

Plot the graph using suitable embedding:

#% html
$g.dot(
    engine => 'dot',
    :$background,
    vertex-label-color => 'Gray',
    vertex-shape => 'ellipse',
    vertex-width => 0.8,
    vertex-height => 0.6,
    :24vertex-font-size,
    edge-thickness => 6,
    graph-size => 12
):svg

The Collatz sequence paths can be easily followed in the tree graph.


Big graph

Let us make a bigger, visually compelling graph:

my $root = 64;
my @edges = Pair.new($root, 2 * $root);
for ^20 { @edges = collatz-edges(@edges) }
my $gBig = Graph.new(@edges.map({ $_.value.Str => $_.key.Str })):!directed;

# Graph(vertexes => 2581, edges => 2580, directed => False)

Next we find the path lengths from the root to each vertex in order to do some sort of concentric coloring:

my %path-lengths = $gBig.vertex-list.race(:4degree).map({ $_ => $gBig.find-path($_, $root.Str).head.elems });
%path-lengths.values.unique.elems

# 22

We make a blend of these colors:

JavaScript::D3::Utilities::get-named-colors()<darkred plum orange>

# (#8B0000 #DDA0DD #FFA500)

Here is the graph plot:

#%html
my %classes = $gBig.vertex-list.classify({ %path-lengths{$_} });
my @colors = |blend-colors("#8B0000", "#DDA0DD", 16), |blend-colors("#DDA0DD", "#FFA500", %classes.elems - 16);
my %highlight = %classes.map({ @colors[$_.key - 1] => $_.value });

$gBig.dot(
    engine => 'neato',
    :%highlight,
    :$background,
    vertex-shape => 'circle',
    vertex-width => 0.55,
    :0vertex-font-size,
    vertex-color => 'Red',
    vertex-stroke-width => 2,
    edge-thickness => 8,
    edge-color => 'Purple',
    graph-size => 10
):svg


References

Articles, blog posts

[KJR1] KJ Runia, “The Collatz Conjecture”, (2020), OpenCurve.info.

[PZ1] Parker Ziegler, “Playing with the Collatz Conjecture”, (2021), ObservableHQ.

[Wk1] Wikipedia entry, “Collatz conjecture”.

[WMW1] Wolfram MathWorld entry, “Collatz Problem”.

Packages

[AAp1] Anton Antonov, Graph Raku package, (2024-2025), GitHub/antononcube.

[AAp2] Anton Antonov, JavaScript::D3 Raku package, (2022-2025), GitHub/antononcube.

[AAp3] Anton Antonov, Math::NumberTheory Raku package, (2025), GitHub/antononcube.

Videos

[Vv1] Veritasium, “The Simplest Math Problem No One Can Solve – Collatz Conjecture”, (2021), YouTube@Veritasium.

Geographic Data in Raku Demo

Last weekend I made a demo presentation showcasing the capabilities of the Raku packages:

This post encapsulates the essence of that presentation, offering a walk-through of how these packages can be leveraged to create good, informative geographic visualizations.

Here is the video recording of the presentation, [AAv4]:


Main Packages

The primary focus of our exploration is on two Raku packages:

  1. Data::Geographics: This package provides comprehensive country and city data, which is crucial for geographic data visualization and analysis.
  2. JavaScript::Google::Charts: This package interfaces with Google Charts, an established framework for creating various types of charts, including geographic plots.

Geographic Data Visualization

Data::Geographics: The Protagonist

The “Data::Geographics” package is the star of the presentation. It provides extensive data on countries and cities, which is essential for geographic data visualization and analysis. Initially, I attempted to create geographic plots using JavaScript freehand, but it proved challenging. Instead, I found it more practical to use the “JavaScript::Google::Charts” package, which offers a more structured framework for creating pre-defined chart types.

Creating Geographic Plots

Using the “JavaScript::Google::Charts” package, I demonstrated how to generate geographic plots. For instance, we visualized country data with a simple plot highlighting countries known to the “Data::Geographics” package in shades of green, while unknown regions were depicted in gray. (That is the presentation’s “opening image.”)

Notably, Google Charts geo plots can be generated with suitable Large Language Model prompts and directly displayed in Raku chatbooks.

Data Analysis with Data::Geographics

Beyond simple visualization, certain analytical tasks can be done using the country data in “Data::Geographics”. For example, I conducted a rudimentary analysis of gross domestic product (GDP) and electricity production using linear regression.

The package also includes city data, enabling us to perform proximity searches and create neighbor graphs.


Practical Demonstrations

Country Data

Currently, “Data::Geographics” knows about 29 countries (≈195 data elements for each.) Here are the countries:

#% html
use Data::Geographics;
country-data().keys.sort ==> to-html(:multicolumn, columns => 3)

Botswana        Hungary      Serbia
Brazil          Iran         Slovakia
Bulgaria        Iraq         SouthAfrica
Canada          Japan        SouthKorea
China           Mexico       Spain
CzechRepublic   NorthKorea   Sweden
Denmark         Poland       Turkey
Finland         Romania      Ukraine
France          Russia       UnitedStates
Germany         SaudiArabia  (Any)

Name Recognition

The package “DSL::Entity::Geographics” was specially made to recognize city and country names, which is particularly useful for conversational agents.

Here is a named entity recognition example:

use DSL::Entity::Geographics;
entity-city-and-state-name('Las Vegas, Nevada', 'Raku::System')

# United_States.Nevada.Las_Vegas

Correlation Plots

We created correlation plots to analyze the relationship between GDP and electricity production. Using Google Charts’ built-in functionality, we plotted regression lines to visualize trends. However, Google Charts’ very nice “trend lines” functionality has certain limitations with logarithmic plots, which gave us the excuse to do linear regression with “Math::Fitting”:
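
As an illustration of the idea only, here is a minimal least-squares sketch in plain Raku, fitted in log-log coordinates. It does not use or reproduce the “Math::Fitting” API; the sub name `linear-fit` and the sample values are hypothetical.

```raku
# Ordinary least-squares fit of y ≈ intercept + slope * x.
# Hypothetical helper, not part of "Math::Fitting".
sub linear-fit(@xs, @ys) {
    my $n      = @xs.elems;
    my $mean-x = @xs.sum / $n;
    my $mean-y = @ys.sum / $n;
    my $sxy = [+] (@xs Z @ys).map({ ($_[0] - $mean-x) * ($_[1] - $mean-y) });
    my $sxx = [+] @xs.map({ ($_ - $mean-x) ** 2 });
    my $slope = $sxy / $sxx;
    %( slope => $slope, intercept => $mean-y - $slope * $mean-x )
}

# Made-up data following a power law y = 3 * x ** 0.9 (not real GDP figures).
my @gdp         = 1e9, 1e10, 1e11, 1e12;
my @electricity = @gdp.map({ 3 * $_ ** 0.9 });

# A power law is a straight line in log-log coordinates; the slope is the exponent.
my %fit = linear-fit(@gdp».log(10), @electricity».log(10));
say %fit<slope>;   # close to 0.9 by construction
```

Fitting the logarithms rather than the raw values is what makes the regression line meaningful on a logarithmic plot.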

City Data Tabulation and Visualization

City data visualization was another highlight. We filtered city data to display information such as population and location. By integrating Google Maps links, we provided an interactive way to explore city locations.

Tabulation

#% html
@dsCityData.pick(12)
==> { .sort(*<ID>) }()
==> to-html(field-names => <State City Population LocationLink>)
==> { $_.subst(:g, / <?after '<td>'> ('http' .*?) <before '</td>'> /, { "<a href=\"$0\">link</a>" }) }()

State          City        Population  LocationLink
Alabama        Montgomery  200603      link
California     Fresno      542107      link
Massachusetts  Worcester   206518      link
Nevada         Las Vegas   641903      link
Texas          El Paso     678815      link
Virginia       Chesapeake  249422      link

City locations

Here are city locations plotted with “JavaScript::D3”:

Here are city locations plotted with “JavaScript::Google::Charts”:

Remark: In both plots above Las Vegas, Nevada and cities close to it are given focus.

Proximity Searches

Using the “Math::Nearest” package, we performed proximity searches to find the nearest neighbors of a given city. This feature is particularly useful for geographic analysis and planning.
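
To show what a proximity search involves, here is a brute-force sketch in plain Raku using the haversine (great-circle) distance. It is not the “Math::Nearest” API. The subs `haversine-km` and `nearest-cities` are hypothetical, and the coordinates are rough values for illustration rather than data from “Data::Geographics”.

```raku
# Great-circle distance in km between two (lat, lon) points given in degrees.
sub haversine-km(@p, @q, :$radius = 6371) {
    my ($lat1, $lon1, $lat2, $lon2) = (|@p, |@q).map(* * pi / 180);
    my $h = sin(($lat2 - $lat1) / 2) ** 2
          + cos($lat1) * cos($lat2) * sin(($lon2 - $lon1) / 2) ** 2;
    2 * $radius * asin(sqrt($h))
}

# Approximate coordinates, for illustration only.
my %cities =
    'Las Vegas'   => (36.17, -115.14),
    'Henderson'   => (36.04, -114.98),
    'Los Angeles' => (34.05, -118.24),
    'Phoenix'     => (33.45, -112.07);

# Names of the $k cities closest to $name (brute force over all candidates).
sub nearest-cities(%coords, Str $name, Int $k = 2) {
    my @others = %coords.keys.grep(* ne $name);
    @others.sort({ haversine-km(%coords{$name}, %coords{$_}) }).head($k).List
}

say nearest-cities(%cities, 'Las Vegas');   # (Henderson Los Angeles)
```

Brute force is fine for a handful of cities; a dedicated nearest-neighbor structure pays off when the number of points is large.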

Graph Visualization

For visualizing neighbor graphs, we used the packages “WWW::MermaidInk” and “JavaScript::D3”. The former interfaces with a web service to generate graph diagrams. The latter has its own built-in graph plotting functionalities. (Based on the force-directed graph plotting component of D3.js.)

Both approaches allow the creation of appealing visual representations of city connections.

Here is a Nearest Neighbor Graph plotted with “JavaScript::D3”:

Here is a Nearest Neighbor Graph plotted with “WWW::MermaidInk”:


Future Plans and Enhancements

While the current capabilities of “Data::Geographics” and “JavaScript::Google::Charts” are impressive, there is always room for improvement. Future plans include:

  • Enhancing the “Math::Fitting” package to support multidimensional regression.
  • Exploring the potential of “JavaScript::D3” for more flexible and advanced visualizations.

Conclusion

In summary, the combination of “Data::Geographics” and “JavaScript::Google::Charts” in Raku provides a powerful toolkit for geographic data visualization and analysis. “JavaScript::D3” is also very applicable to exploratory data analysis. The function objects (functors) created by “Math::Nearest” and “Math::Fitting” make them very convenient to use.


References

Articles, blog posts

[AA1] Anton Antonov, “Age at creation for programming languages stats”, (2024), RakuForPrediction.

Packages

[AAp1] Anton Antonov, Data::Geographics Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, Data::Reshapers Raku package, (2021-2024), GitHub/antononcube.

[AAp3] Anton Antonov, Data::Summarizers Raku package, (2021-2023), GitHub/antononcube.

[AAp4] Anton Antonov, Data::Translators Raku package, (2023-2024), GitHub/antononcube.

[AAp5] Anton Antonov, Data::TypeSystem Raku package, (2023-2024), GitHub/antononcube.

[AAp6] Anton Antonov, DSL::Entity::Geographics Raku package, (2021-2024), GitHub/antononcube.

[AAp7] Anton Antonov, Math::DistanceFunctions Raku package, (2024), GitHub/antononcube.

[AAp8] Anton Antonov, Math::Nearest Raku package, (2024), GitHub/antononcube.

[AAp9] Anton Antonov, JavaScript::D3 Raku package, (2022-2024), GitHub/antononcube.

[AAp10] Anton Antonov, JavaScript::Google::Charts Raku package, (2024), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “The Raku-ju hijack hack for D3.js”, (2022), YouTube/@AAA4prediction. (7 min.)

[AAv2] Anton Antonov, “Random mandalas generation (with D3.js via Raku)”, (2022), YouTube/@AAA4prediction. (2 min.)

[AAv3] Anton Antonov, “Exploratory Data Analysis with Raku”, (2024), YouTube/@AAA4prediction. (21 min.)

[AAv4] Anton Antonov, “Geographics data in Raku demo”, (2024), YouTube/@AAA4prediction. (37 min.)

Age at creation for programming languages stats

Introduction

In this post (notebook) we ingest programming languages creation data from “Programming Language DataBase” and visualize several statistics of it.

We do not examine the data source and we do not want to reason too much about the data using the stats. We started this notebook by just wanting to make the bubble charts (both 2D and 3D.) Nevertheless, we are tempted to say and justify statements like:

  • Pareto holds, as usual.
  • Language creators tend to do it more than once.
  • Beware the Second system effect.

References

Here are reference links with explanations and links to dataset files:


Setup

use Data::Importers;
use Data::Reshapers;
use Data::Summarizers;
use Data::TypeSystem;

use JavaScript::D3;

Data ingestion

Here we ingest the TSV file:

my $url = "https://pldb.io/posts/age.tsv";
my @dsData = data-import($url, headers => 'auto');

deduce-type(@dsData)
# Vector(Assoc(Atom((Str)), Atom((Str)), 13), 214)

Here we define a preferred order of the columns:

my @field-names = ['id', 'name', |(@dsData.head.keys (-) <id name>).keys.sort];
# [id name ageAtCreation appeared creators foundationScore inboundLinksCount measurements numberOfJobsEstimate numberOfUsersEstimate pldbScore rank tags]

Convert suitable column values to integers:

@dsData = @dsData.map({
$_<ageAtCreation> = $_<ageAtCreation>.UInt;
$_<rank> = $_<rank>.Int;
$_<pldbScore> = $_<pldbScore>.Int;
$_<appeared> = $_<appeared>.Int;
$_<numberOfUsersEstimate> = $_<numberOfUsersEstimate>.Int;
$_<numberOfJobsEstimate> = $_<numberOfJobsEstimate>.Int;
$_<foundationScore> = $_<foundationScore>.Int;
$_<measurements> = $_<measurements>.Int;
$_<inboundLinksCount> = $_<inboundLinksCount>.Int;
$_
}).Array;

deduce-type(@dsData)
# Vector(Struct([ageAtCreation, appeared, creators, foundationScore, id, inboundLinksCount, measurements, name, numberOfJobsEstimate, numberOfUsersEstimate, pldbScore, rank, tags], [Int, Int, Str, Int, Str, Int, Int, Str, Int, Int, Int, Int, Str]), 214)

Show summary:

sink records-summary(@dsData, max-tallies => 7, field-names => @field-names.sort[^7]);
sink records-summary(@dsData, max-tallies => 7, field-names => @field-names.sort[7..12]);

Focus languages to be used in the plots below:

my @focusLangs = ["C++", "Fortran", "Java", "Mathematica", "Perl 6", "Raku", "SQL", "Wolfram Language"];
# [C++ Fortran Java Mathematica Perl 6 Raku SQL Wolfram Language]

Here we find the most important tags (used in the plots below):

my @topTags = @dsData.map(*<tags>).&tally.sort({ $_.value }).reverse.head(7)>>.key;
# [pl dataNotation textMarkup library grammarLanguage queryLanguage stylesheetLanguage]

Here we add the column “group” based on the focus languages and most important tags:

@dsData = @dsData.map({ 
$_<group> = do if $_<name> ∈ @focusLangs { "focus" } elsif $_<tags> ∈ @topTags { $_<tags> } else { "other" };
$_
});

deduce-type(@dsData)
# Vector(Struct([ageAtCreation, appeared, creators, foundationScore, group, id, inboundLinksCount, measurements, name, numberOfJobsEstimate, numberOfUsersEstimate, pldbScore, rank, tags], [Int, Int, Str, Int, Str, Str, Int, Int, Str, Int, Int, Int, Int, Str]), 214)

Distributions

Here are the distributions of the variables/columns:

  • age at creation (“ageAtCreation”)
    • i.e. “How old was the creator?”
  • year of appearance (“appeared”)
    • i.e. “In what year was the programming language announced?”
#% js
my %opts = title-color => 'Silver', background => 'none', bins => 40, format => 'html', div-id => 'hist';
js-d3-histogram(@dsData.map(*<ageAtCreation>), title => 'Age at creation', |%opts)
~
js-d3-histogram(@dsData.map(*<appeared>), title => 'Appeared', |%opts)

Here are corresponding Box-Whisker plots:

#% js
my %opts = :horizontal, :outliers, title-color => 'Silver', stroke-color => 'White', background => 'none', width => 400, format => 'html', div-id => 'box';
js-d3-box-whisker-chart(@dsData.map(*<ageAtCreation>), title => 'Age at creation', |%opts)
~
js-d3-box-whisker-chart(@dsData.map(*<appeared>), title => 'Appeared', |%opts)

Here are tables of the corresponding statistics:

my @field-names = <ageAtCreation appeared>;
sink records-summary(select-columns(@dsData, @field-names), :@field-names)
# +---------------------+-----------------------+
# | ageAtCreation | appeared |
# +---------------------+-----------------------+
# | Min => 16 | Min => 1948 |
# | 1st-Qu => 30 | 1st-Qu => 1978 |
# | Mean => 36.766355 | Mean => 1993.009346 |
# | Median => 35 | Median => 1994.5 |
# | 3rd-Qu => 42 | 3rd-Qu => 2008 |
# | Max => 70 | Max => 2023 |
# +---------------------+-----------------------+
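
Some of the statistics in such a table are easy to reproduce in core Raku. The sketch below uses made-up ages (not the dataset) and a hand-written `median` helper, which is an assumption of this example; quartiles are left out because their definition varies between tools.

```raku
# Made-up ages, not taken from the dataset
my @ages = 16, 30, 35, 42, 70;

# Hypothetical helper: middle value, or mean of the two middle values
sub median(@xs) {
    my @s = @xs.sort;
    my $n = @s.elems;
    $n %% 2 ?? (@s[$n div 2 - 1] + @s[$n div 2]) / 2 !! @s[$n div 2]
}

say "Min => {@ages.min}";                  # Min => 16
say "Mean => {@ages.sum / @ages.elems}";   # Mean => 38.6
say "Median => {median(@ages)}";           # Median => 35
say "Max => {@ages.max}";                  # Max => 70
```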

Pareto principle manifestation

Number of creations

Here is the Pareto principle statistic for the number of created (or renamed) programming languages per creator:

my %creations = @dsData.map(*<creators>).&tally;
my @paretoStats = pareto-principle-statistic(%creations);
@paretoStats.head(6)
# (Niklaus Wirth => 0.037383 Breck Yunits => 0.070093 John Backus => 0.093458 Chris Lattner => 0.11215 Larry Wall => 0.130841 Tim Berners-Lee => 0.149533)
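
Judging by the output above, the statistic is a list of cumulative fractions over the items sorted by decreasing count. A minimal core Raku sketch of that idea, with made-up counts and without claiming to match the package implementation, could look like this:

```raku
# Made-up counts per creator
my %counts = A => 4, B => 3, C => 2, D => 1;

# Sort by decreasing count, then accumulate and normalize by the total
my @sorted = %counts.sort(-*.value);
my $total  = %counts.values.sum;
my @cum    = [\+] @sorted».value;
my @pareto = @sorted».key Z=> @cum.map(* / $total);

say @pareto;   # [A => 0.4 B => 0.7 C => 0.9 D => 1]
```

Reading the result: the top two of the four items (50%) account for 70% of the total.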

Here is the corresponding plot:

#% js
js-d3-list-plot( @paretoStats>>.value,
title => 'Pareto principle: number languages per creators team',
title-color => 'Silver',
background => 'none',
:grid-lines,
format => 'html',
div-id => 'langPareto'
)

Remark: We can see that ≈30% of the creators correspond to ≈50% of the languages.

Popularity

Obviously, programmers can and do use more than one programming language. Nevertheless, it is interesting to see the Pareto principle plot for the languages’ “mind share” based on the estimated number of users.

#% js
my %users = @dsData.map({ $_<name> => $_<numberOfUsersEstimate>.Int });
my @paretoStats = pareto-principle-statistic(%users);
say @paretoStats.head(8);

js-d3-list-plot( @paretoStats>>.value,
title => 'Pareto principle: number users per language',
title-color => 'Silver',
background => 'none',
:grid-lines,
format => 'html',
div-id => 'popPareto'
)

Remark: Again, the plot above is “wrong” — programmers use more than one programming language.


Correlations

In order to see meaningful correlations (pairwise plots), we take logarithms of the columns with large values:

my @corColnames = <appeared ageAtCreation numberOfUsersEstimate numberOfJobsEstimate rank measurements>;
my @dsDataVar = select-columns(@dsData, @corColnames);
@dsDataVar = @dsDataVar.map({
my %h = $_.clone;
%h<numberOfUsersEstimate> = log(%h<numberOfUsersEstimate> + 1, 10);
%h<numberOfJobsEstimate> = log(%h<numberOfJobsEstimate> + 1, 10);
%h
}).Array;

deduce-type(@dsDataVar)

# Vector(Struct([ageAtCreation, appeared, measurements, numberOfJobsEstimate, numberOfUsersEstimate, rank], [Int, Int, Int, Num, Num, Int]), 214)
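
The `+ 1` inside the logarithm matters for languages whose estimates are zero. A short sketch, with a hypothetical helper name `lg1`, shows the difference:

```raku
# Hypothetical helper mirroring the transformation above
sub lg1($x) { log($x + 1, 10) }

say log(0, 10);   # -Inf, which cannot be placed on a plot axis
say lg1(0);       # 0
say lg1(999);     # close to 3, up to floating-point rounding
```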

Here we make a Cartesian product of the focus columns and make a scatter plot for each pair of that product:

#% js
(@corColnames X @corColnames)>>.reverse>>.Array.map( -> $c {
my @points = @dsDataVar.map({ %( x => $_{$c.head}, y => $_{$c.tail} ) });
js-d3-list-plot( @points, width => 180, height => 180, x-label => $c.head, y-label => $c.tail, format => 'html', div-id => 'cor')
}).join("\n")
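
The `X` operator used above produces every ordered pair of column names, so six columns give 36 small plots. A reduced sketch with three made-up column choices:

```raku
# Three column names are enough to show the pattern
my @cols  = <appeared rank measurements>;
my @pairs = (@cols X @cols);

say @pairs.elems;     # 9
say @pairs.head(3);   # ((appeared appeared) (appeared rank) (appeared measurements))
```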

Remark: Given the names of the data columns and the corresponding obvious interpretations we can say that the stronger correlations make sense.


Bubble chart 2D

In this section we make an informative 2D bubble chart with tooltips.

Here we make a dataset for the bubble chart:

my @dsData2 = @dsData.map({
%( x => $_<appeared>, y => $_<ageAtCreation>, z => log($_<numberOfUsersEstimate>, 10), group => $_<group>, label => "<b>{$_<name>}</b> by {$_<creators>}")
});

deduce-type(@dsData2)
# Vector(Struct([group, label, x, y, z], [Str, Str, Int, Int, Num]), 214)

Here is the bubble chart:

#% js
js-d3-bubble-chart(@dsData2,
z-range-min => 1,
z-range-max => 16,
title-color => 'Silver',
title-font-size => 20,
x-label => "appeared",
y-label => "age at creation",
title => 'Age at creation',
width => 1200,
margins => { left => 60, bottom => 50, right => 200},
background => 'none',
:grid-lines,
format => 'html',
div-id => 'bubbleLang'
);

Remark: The programming language J is a clear outlier because of its creators’ ages.


Second system effect traces

In this section we try, and fail, to demonstrate that the more programming languages a team of creators makes, the less successful those languages are. (Maybe because they are more cumbersome and suffer from the second-system effect?)

Remark: This section is mostly made “for fun.” It is not true that each set of languages per creators team is made of comparable languages. For example, complementary languages can be in the same set. (See HTTP, HTML, URL.) Some sets are just made of the same language under different names. (See Perl 6 and Raku, and Mathematica and Wolfram Language.) Also, older languages would have the first-mover advantage.

Make a creators-to-index association:

my %creators = @dsData.map(*<creators>).&tally.pairs.grep(*.value > 1);
my %nameToIndex = %creators.keys.sort Z=> ^%creators.elems;
%nameToIndex.elems
# 40

Make a bubble chart dataset with relative popularity per creators team:

my @nUsers = @dsData.grep({ %creators{$_<creators>}:exists });

@nUsers = |group-by(@nUsers, <creators>).map({

my $m = max(1, $_.value.map(*<numberOfUsersEstimate>).max.sqrt);

$_.value.map({ %( x => $_<appeared>, y => %nameToIndex{$_<creators>}, z => $_<numberOfUsersEstimate>.sqrt/$m, group => $_<creators>, label => "<b>{$_<name>}</b>" ) })

})>>.Array.flat;

@nUsers .= sort(*<group>);

deduce-type(@nUsers)
# Vector(Struct([group, label, x, y, z], [Str, Str, Int, Int, Num]), 110)
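
The normalization above scales each language by the most popular language of the same team, on a square-root scale. The following core Raku sketch uses `classify` instead of the package function, with made-up rows and a hypothetical `users` field, to show the effect of the `max(1, ...)` guard:

```raku
# Made-up rows: two hypothetical teams
my @rows =
    %( creators => 'Team A', name => 'A1', users => 10_000 ),
    %( creators => 'Team A', name => 'A2', users => 100 ),
    %( creators => 'Team B', name => 'B1', users => 0 ),
    %( creators => 'Team B', name => 'B2', users => 0 );

my @scaled = @rows.classify(*<creators>).values.map(-> @team {
    # Guard with max(1, ...) so an all-zero team does not divide by zero
    my $m = max(1, @team.map(*<users>).max.sqrt);
    |@team.map(-> %r { %( name => %r<name>, z => %r<users>.sqrt / $m ) })
});

for @scaled.sort(*<name>) -> %r {
    say "{%r<name>}: {%r<z>}";
}
# A1: 1
# A2: 0.1
# B1: 0
# B2: 0
```

Within each team the most popular language gets the value 1, so bubble sizes are comparable only inside a team, not across teams.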

Here is the corresponding bubble chart:

#% js
js-d3-bubble-chart(@nUsers,
title => 'Second system effect',
title-color => 'Silver',
title-font-size => 20,
x-label => "appeared",
y-label => "creators",
z-range-min => 3,
z-range-max => 10,
width => 1000,
height => 900,
margins => { left => 60, bottom => 50, right => 200},
background => 'none',
grid-lines => (Whatever, %nameToIndex.elems),
opacity => 0.9,
format => 'html',
div-id => 'secondBubble'
);

From the plot above we cannot decisively say that:

The most recent creation of a team of programming language creators is not the team’s most popular creation.

That statement, though, does hold for a fair number of cases.


References

Articles, notebooks

[AA1] Anton Antonov, “Age at creation for programming languages stats”, (2024), MathematicaForPrediction at WordPress.

[AAn1] Anton Antonov, “Computational exploration for the ages of programming language creators dataset”, (2024), Wolfram Community.

Packages

[AAp1] Anton Antonov, Data::Importers Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, Data::Reshapers Raku package, (2021-2024), GitHub/antononcube.

[AAp3] Anton Antonov, Data::Summarizers Raku package, (2021-2023), GitHub/antononcube.

[AAp4] Anton Antonov, JavaScript::D3 Raku package, (2022-2024), GitHub/antononcube.

[AAp5] Anton Antonov, Jupyter::Chatbook Raku package, (2023-2024), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “Exploratory Data Analysis with Raku”, (2024), YouTube/@AAA4Prediction.

Statistics on the ages of programming language creators

Introduction

In this article (and the corresponding notebook) we load a dataset characterizing the creation of various programming languages from the site “Programming Language DataBase” and visualize several statistics over it.

We do not examine the data source here, and we do not particularly want to reason too much about the data. (Neither with these statistics nor in general.)

We started the computations below simply because we wanted to make bubble charts (both 2D and 3D). Nevertheless, we are tempted to state and justify claims like:

  • The Pareto principle holds, as usual.
  • Language creators tend to do it more than once.
  • Beware of the manifestation of the “second-system effect”.

References

Here are reference links with explanations and links to data files:


Setup

use Data::Importers;
use Data::Reshapers;
use Data::Summarizers;
use Data::TypeSystem;

use JavaScript::D3;

Data ingestion

Here we get the TSV file:

my $url = "https://pldb.io/posts/age.tsv";
my @dsDataLines = data-import($url).lines.map({ $_.split("\t") })>>.Array;
deduce-type(@dsDataLines)
# Vector(Vector(Atom((Str)), 13), 216)

Here we make the dataset:

my @field-names = @dsDataLines.head.Array;
my @dsData = @dsDataLines.tail(*-2).map({ @field-names.Array Z=> $_.Array })>>.Hash;

deduce-type(@dsData)
# Vector(Assoc(Atom((Str)), Atom((Str)), 13), 214)
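
The header-to-record step can be tried on a tiny in-memory example. This is a sketch in core Raku with made-up TSV text; it skips only the header line, whereas the code above drops two leading lines of the downloaded file.

```raku
# Made-up TSV text standing in for the downloaded file
my $tsv = "name\tappeared\nLangA\t1995\nLangB\t2010";

# Split into lines, then each line into fields
my @lines  = $tsv.lines.map(*.split("\t").Array);
my @header = @lines[0].list;

# Pair each field with its column name and collect into a hash per row
my @records = @lines.skip(1).map({ (@header Z=> .list).Hash });

say @records.elems;      # 2
say @records[1]<name>;   # LangB
```

At this stage every value is still a string, which is why a conversion step to integers is needed afterwards.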

Convert suitable column values to integers:

@dsData = @dsData.map({
$_<ageAtCreation> = $_<ageAtCreation>.UInt;
$_<rank> = $_<rank>.Int;
$_<pldbScore> = $_<pldbScore>.Int;
$_<appeared> = $_<appeared>.Int;
$_<numberOfUsersEstimate> = $_<numberOfUsersEstimate>.Int;
$_<numberOfJobsEstimate> = $_<numberOfJobsEstimate>.Int;
$_<foundationScore> = $_<foundationScore>.Int;
$_<measurements> = $_<measurements>.Int;
$_<inboundLinksCount> = $_<inboundLinksCount>.Int;
$_
}).Array;

deduce-type(@dsData)
# Vector(Struct([ageAtCreation, appeared, creators, foundationScore, id, inboundLinksCount, measurements, name, numberOfJobsEstimate, numberOfUsersEstimate, pldbScore, rank, tags], [Int, Int, Str, Int, Str, Int, Int, Str, Int, Int, Int, Int, Str]), 214)

Show a summary of the dataset:

sink records-summary(@dsData, max-tallies => 7, field-names => @field-names.sort[^7]);
sink records-summary(@dsData, max-tallies => 7, field-names => @field-names.sort[7..12]);




Focus languages to be used in the plots below:

my @focusLangs = ["C++", "Fortran", "Java", "Mathematica", "Perl 6", "Raku", "SQL", "Wolfram Language"];
# [C++ Fortran Java Mathematica Perl 6 Raku SQL Wolfram Language]

Here we find the most important tags (used in the plots below):

my @topTags = @dsData.map(*<tags>).&tally.sort({ $_.value }).reverse.head(7)>>.key;
# [pl textMarkup dataNotation library grammarLanguage stylesheetLanguage queryLanguage]

Here we add the column “group” based on the focus languages and most important tags:

@dsData = @dsData.map({ 
$_<group> = do if $_<name> ∈ @focusLangs { "focus" } elsif $_<tags> ∈ @topTags { $_<tags> } else { "other" };
$_
});

deduce-type(@dsData)
# Vector(Struct([ageAtCreation, appeared, creators, foundationScore, group, id, inboundLinksCount, measurements, name, numberOfJobsEstimate, numberOfUsersEstimate, pldbScore, rank, tags], [Int, Int, Str, Int, Str, Str, Int, Int, Str, Int, Int, Int, Int, Str]), 214)

Distributions

Here are the distributions of the variables/columns:

  • age at creation (“ageAtCreation”)
    • i.e. “How old was the creator?”
  • year of appearance (“appeared”)
    • i.e. “In what year was the programming language announced?”
#% js
my %opts = title-color => 'Silver', background => 'none', bins => 40, format => 'html', div-id => 'hist';
js-d3-histogram(@dsData.map(*<ageAtCreation>), title => 'Age at creation', |%opts)
~
js-d3-histogram(@dsData.map(*<appeared>), title => 'Appeared', |%opts)

Here are the corresponding Box-Whisker plots:

#% js
my %opts = :horizontal, :outliers, title-color => 'Silver', stroke-color => 'White', background => 'none', width => 400, format => 'html', div-id => 'box';
js-d3-box-whisker-chart(@dsData.map(*<ageAtCreation>), title => 'Age at creation', |%opts)
~
js-d3-box-whisker-chart(@dsData.map(*<appeared>), title => 'Appeared', |%opts)

Here are tables of the corresponding statistics:

my @field-names = <ageAtCreation appeared>;
sink records-summary(select-columns(@dsData, @field-names), :@field-names)





Pareto principle manifestation

Number of creations

Here is the Pareto principle statistic for the number of created (or merely renamed) programming languages per creator:

my %creations = @dsData.map(*<creators>).&tally;
my @paretoStats = pareto-principle-statistic(%creations);
@paretoStats.head(6)
# (Niklaus Wirth => 0.037383 Breck Yunits => 0.070093 John Backus => 0.093458 Chris Lattner => 0.11215 Larry Wall => 0.130841 Tim Berners-Lee => 0.149533)

Here is the corresponding plot:

#% js
js-d3-list-plot( @paretoStats>>.value,
title => 'Pareto principle: number of languages per creators team',
title-color => 'Silver',
background => 'none',
:grid-lines,
format => 'html',
div-id => 'langPareto'
)

Remark: We can see that ≈30% of the creators correspond to ≈50% of the languages.

Popularity

Obviously, programmers can use more than one programming language. Nevertheless, it is interesting to see the Pareto principle plot for the languages’ “mind share” based on the estimated number of users.

#% js
my %users = @dsData.map({ $_<name> => $_<numberOfUsersEstimate>.Int });
my @paretoStats = pareto-principle-statistic(%users);
say @paretoStats.head(8);

js-d3-list-plot( @paretoStats>>.value,
title => 'Pareto principle: number of users per language',
title-color => 'Silver',
background => 'none',
:grid-lines,
format => 'html',
div-id => 'popPareto'
)

Remark: Again, the plot above is “wrong”, since programmers use more than one programming language.


Correlations

In order to see meaningful correlations (pairwise plots of columns), we take logarithms of the columns with large values:

my @corColnames = <appeared ageAtCreation numberOfUsersEstimate numberOfJobsEstimate rank measurements>;
my @dsDataVar = select-columns(@dsData, @corColnames);
@dsDataVar = @dsDataVar.map({
my %h = $_.clone;
%h<numberOfUsersEstimate> = log(%h<numberOfUsersEstimate> + 1, 10);
%h<numberOfJobsEstimate> = log(%h<numberOfJobsEstimate> + 1, 10);
%h
}).Array;

deduce-type(@dsDataVar)

# Vector(Struct([ageAtCreation, appeared, measurements, numberOfJobsEstimate, numberOfUsersEstimate, rank], [Int, Int, Int, Num, Num, Int]), 214)

Here we make a Cartesian product of the focus columns and make a scatter plot for each pair of that product:

#% js
(@corColnames X @corColnames)>>.reverse>>.Array.map( -> $c {
my @points = @dsDataVar.map({ %( x => $_{$c.head}, y => $_{$c.tail} ) });
js-d3-list-plot( @points, width => 180, height => 180, x-label => $c.head, y-label => $c.tail, format => 'html', div-id => 'cor')
}).join("\n")

Remark: Given the names of the columns and the corresponding obvious interpretations, we can say that the stronger correlations make sense.


Bubble chart 2D

In this section we make an informative 2D bubble chart with tooltips.

Here we make an array of associations (dictionaries) for the bubble chart:

my @dsData2 = @dsData.map({
%( x => $_<appeared>, y => $_<ageAtCreation>, z => log($_<numberOfUsersEstimate>, 10), group => $_<group>, label => "<b>{$_<name>}</b> by {$_<creators>}")
});

deduce-type(@dsData2)
# Vector(Struct([group, label, x, y, z], [Str, Str, Int, Int, Num]), 214)

Here is the bubble chart:

#% js
js-d3-bubble-chart(@dsData2,
z-range-min => 1,
z-range-max => 16,
title-color => 'Silver',
title-font-size => 20,
x-label => "appeared",
y-label => "age at creation",
title => 'Age at creation',
width => 1200,
margins => { left => 60, bottom => 50, right => 200},
background => 'none',
:grid-lines,
format => 'html',
div-id => 'bubbleLang'
);

Remark: The programming language J is a clear outlier because of its creators’ ages.


Second system effect traces

In this section we try, and fail, to show that the more programming languages a team of creators makes, the less successful those languages are. (Maybe because they are more cumbersome and suffer from the second-system effect.)

Remark: This section is mostly made “for fun”. It is not true that each set of languages per creators team is made of comparable languages. For example, complementary languages can be in the same set. (See HTTP, HTML, URL.) Some sets are made of the same language under different names. (See Perl 6 and Raku, and Mathematica and Wolfram Language.) Also, older languages have the first-mover advantage.

Make a creators-to-index association:

my %creators = @dsData.map(*<creators>).&tally.pairs.grep(*.value > 1);
my %nameToIndex = %creators.keys.sort Z=> ^%creators.elems;
%nameToIndex.elems
# 40

Make a bubble chart dataset with relative popularity per creators team:

my @nUsers = @dsData.grep({ %creators{$_<creators>}:exists });

@nUsers = |group-by(@nUsers, <creators>).map({

my $m = max(1, $_.value.map(*<numberOfUsersEstimate>).max.sqrt);

$_.value.map({ %( x => $_<appeared>, y => %nameToIndex{$_<creators>}, z => $_<numberOfUsersEstimate>.sqrt/$m, group => $_<creators>, label => "<b>{$_<name>}</b>" ) })

})>>.Array.flat;

@nUsers .= sort(*<group>);

deduce-type(@nUsers)
# Vector(Struct([group, label, x, y, z], [Str, Str, Int, Int, Num]), 110)

Here is the corresponding bubble chart:

#% js
js-d3-bubble-chart(@nUsers,
title => 'Second system effect',
title-color => 'Silver',
title-font-size => 20,
x-label => "appeared",
y-label => "creators",
z-range-min => 3,
z-range-max => 10,
width => 1000,
height => 900,
margins => { left => 60, bottom => 50, right => 200},
background => 'none',
grid-lines => (Whatever, %nameToIndex.elems),
opacity => 0.9,
format => 'html',
div-id => 'secondBubble'
);

From the plot above we cannot decisively say that:

The most recent creation of a team of programming language creators is not the team’s most popular creation.

That statement, though, does hold for a fair number of cases.


References

Articles, notebooks

[AA1] Anton Antonov, “Age at creation for programming languages stats”, (2024), MathematicaForPrediction at WordPress.

[AAn1] Anton Antonov, “Computational exploration for the ages of programming language creators dataset”, (2024), Wolfram Community.

Packages

[AAp1] Anton Antonov, Data::Importers Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, Data::Reshapers Raku package, (2021-2024), GitHub/antononcube.

[AAp3] Anton Antonov, Data::Summarizers Raku package, (2021-2023), GitHub/antononcube.

[AAp4] Anton Antonov, JavaScript::D3 Raku package, (2022-2024), GitHub/antononcube.

[AAp5] Anton Antonov, Jupyter::Chatbook Raku package, (2023-2024), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “Exploratory Data Analysis with Raku”, (2024), YouTube/@AAA4Prediction.