1/25/22, 2:16 PM Exploring Microsoft PowerPoint AI, using Python | by Piero Paialunga | Jan, 2022 | Towards Data Science
Photo by Tadas Sar on Unsplash
Exploring Microsoft PowerPoint AI, using Python
https://towardsdatascience.com/exploring-powerpoint-ai-using-python-75f94d55f8f4
Here’s how to replicate the PowerPoint AI using Machine Learning and Python
Piero Paialunga 23 hours ago · 5 min read
A couple of days ago I was working on a PowerPoint presentation for my PhD research
and this happened:
Screenshot made by me. SEE THE ALT TEXT!
It was not the exact same image. It was actually far more detailed, with the x label, y label, title and all of that, but that is not really important right now.
The very interesting thing is the Alt Text. The AI system of PowerPoint is not only able to
detect that we actually have a 2d plot (or Chart) but it recognizes that we are talking
about a boxplot!
Of course, I don’t know exactly how they do this, but as I work with Machine Learning
and Data Science all the days of my life I can try to take a guess. As readers may
know, the technology most widely used to classify images is the
Convolutional Neural Network (CNN).
They may have used CNNs as a multi-class classifier. Here is an example of a butterfly
image classifier (more than 70 species/classes). A far more complicated thing that they
may have done is image captioning. Nonetheless, CNNs are very likely used in their deep
learning algorithm, at the very minimum as basic building blocks of something that is much
larger and more complex.
In this very small example I will show how to build a Machine Learning
model that distinguishes boxplots from other kinds of plots, for example
lineplots.
Let’s do this.
0. The Libraries
These are the libraries that I used for this notebook:
import keras
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings("ignore")
from matplotlib import image
import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import random
from os import listdir
from os.path import isfile, join
import math
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
# Assumption: the RandomWords import is not visible in the source listing.
# In the random-word package it is exposed as below; `r` is the name that
# the plotting functions call.
from random_word import RandomWords
r = RandomWords()
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.serif'] = 'Ubuntu'
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['image.cmap'] = 'jet'
plt.rcParams['image.interpolation'] = 'none'
plt.rcParams['figure.figsize'] = (12, 10)
In a few words, I used keras, matplotlib, and a curious library known as
RandomWords that generates random English words. I used it to make up the x and y
axis labels.
1. Data Generation
The really fun part of this notebook is the data generation. I tried to build the
lineplots and boxplots in as general a way as possible, making up the x and y
labels and creating different lines and boxplots.
With this setup you can create a virtually unlimited number and variety of plots. I
created two classes of data and performed a binary classification, but you can slightly
modify the code and create multiple classes.
Let’s dive in:
1.1 Line Plots
The code that I used to create the line plot is the following:
def plot_line():
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line)  # [truncated in source]
    x_max,x_min = 10,-10
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)  # [truncated in source]
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        pick_degree = np.random.choice(np.arange(1,5,  # [truncated in source]
        y = 0
        for degree in range(pick_degree):
            k_random = np.random.choice(np.linspace(k  # [truncated in source]
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    #plt.savefig(savename)
It has several sources of randomness:
The x axis label and y axis label have random names
The x axis limits are random
The y axis shows polynomials with a random number of terms and random
coefficient values
The number of lines is random as well
Here is an example:
plot_line()
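The listing above is cut off at the right margin in several places, so the exact arguments cannot be recovered. The following is only a minimal sketch of the same idea with every elided value replaced by an assumption: the right x limit is drawn from [0, x_max), the number of polynomial terms from 1 to 4, and the coefficients from a 50-point grid on [k_min, k_max]. The helper name random_polynomials and the seeded generator are hypothetical and not part of the original notebook.

```python
import numpy as np

def random_polynomials(rng, n_max_line=10, x_min=-10, x_max=10,
                       k_min=-5, k_max=5, n_points=100):
    """Return x and a list of random polynomial curves y(x).

    Assumptions (not from the article): the right x limit is drawn from
    [0, x_max), each curve has 1 to 4 terms, and coefficients come from a
    50-point grid on [k_min, k_max].
    """
    n_line = rng.integers(1, n_max_line)          # 1 .. n_max_line - 1 curves
    left = rng.choice(np.arange(x_min, 0, 0.1))   # random negative left limit
    right = rng.choice(np.arange(0, x_max, 0.1))  # random non-negative right limit
    x = np.linspace(left, right, n_points)
    curves = []
    for _ in range(n_line):
        n_terms = rng.integers(1, 5)              # powers 0 .. n_terms - 1
        y = np.zeros_like(x)
        for degree in range(n_terms):
            k = rng.choice(np.linspace(k_min, k_max, 50))
            y = y + k * x**degree
        curves.append(y)
    return x, curves

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
x, curves = random_polynomials(rng)
```

Each entry of curves can then be drawn with plt.plot(x, y), exactly as in the function above.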
1.2 Box Plots
The code that I used to create the box plot is the following:
def plot_box():
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    plt.figure(figsize=(10,10))
    for n in range(n_box):
        x_lims = [np.random.choice(np.arange(x_min,0,  # [truncated in source]
        x = np.linspace(min(x_lims),max(x_lims),100)
        column_name = r.get_random_word()
        column_names.append(column_name)
        pick_sigma = x.mean()/np.random.choice(sigma_  # [truncated in source]
        pick_sigma = np.abs(pick_sigma)
        column_values.append(np.random.normal(x.mean(  # [truncated in source]
    column_values = np.array(column_values).T
    data = pd.DataFrame(column_values,columns=column_  # [truncated in source]
    sns.boxplot(data=data)
There are several sources of randomness here as well:
The x axis label and y axis label have random names
The x axis limits are random
The names of the quantities on the x axis are random
The y axis shows samples from Gaussian distributions with random
standard deviations
The number of boxplots is random as well
plot_box()
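Since several lines of the box plot listing are truncated in the source, here is a small sketch of just the data-generation step under stated assumptions: each column is a set of Gaussian samples whose mean is the midpoint of a random interval and whose standard deviation is that mean divided by a random integer from 1 to 9. The function name random_box_columns, the sample count of 100 and the way the right limit is drawn are assumptions, not the author's code.

```python
import numpy as np

def random_box_columns(rng, n_max_box=5, x_min=-10, x_max=10, n_samples=100):
    """Return an array of shape (n_samples, n_box) of Gaussian samples."""
    n_box = rng.integers(1, n_max_box)             # 1 .. n_max_box - 1 boxes
    sigma_s = np.arange(1, 10, 1)                  # candidate divisors 1..9
    columns = []
    for _ in range(n_box):
        left = rng.choice(np.arange(x_min, 0, 0.1))
        right = rng.choice(np.arange(0, x_max, 0.1))   # assumed upper range
        mean = (left + right) / 2                  # midpoint of the interval
        sigma = abs(mean / rng.choice(sigma_s))    # random, non-negative spread
        columns.append(rng.normal(mean, sigma, n_samples))
    return np.array(columns).T                     # one column per box

rng = np.random.default_rng(1)   # seeded only to make the sketch repeatable
values = random_box_columns(rng)
```

Wrapping values in a pandas DataFrame with random column names and passing it to sns.boxplot(data=...) would then reproduce the plotting step shown above.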
1.3 Training Set and Test Set
Actually, the code that I used to build the training set and test set is slightly
different from the code above, which I used to show you the results. Here is what
you will need:
Here you create the plots:
def plot_line(savename):
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line)  # [truncated in source]
    x_max,x_min = 10,-10
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)  # [truncated in source]
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        pick_degree = np.random.choice(np.arange(1,5,  # [truncated in source]
        y = 0
        for degree in range(pick_degree):
            k_random = np.random.choice(np.linspace(k  # [truncated in source]
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    plt.savefig(savename)
    #plt.show()
    plt.close()
def plot_box(savename):
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    # [rest of the function cut off in source]
Here you create num plots of each kind and store them. CREATE THE TrainingSet AND
TestSet FOLDERS FIRST OR IT WON’T WORK!
def build_training_set(num):
    mypath = 'TrainingSet/'
    for n in range(1,num+1):
        print('%i instance has been started'%(n))
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        print('Boxplot %i has been stored!'%(n))
        plot_line(mypath+'lineplot_'+str(n)+'.png')
        print('Lineplot %i has been stored!'%(n))
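Because plt.savefig fails when the target folder does not exist, one way to avoid creating the folders by hand is to create them from Python before building the sets. This is a small sketch, not part of the original notebook; the helper name ensure_dataset_dirs is hypothetical, and a temporary directory is used here only so that the example does not touch your working directory.

```python
import os
import tempfile

def ensure_dataset_dirs(root, names=("TrainingSet", "TestSet")):
    """Create the dataset folders under root if they are missing."""
    paths = []
    for name in names:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)   # no error if it already exists
        paths.append(path)
    return paths

root = tempfile.mkdtemp()                  # stand-in for your project folder
created = ensure_dataset_dirs(root)
created_again = ensure_dataset_dirs(root)  # safe to call more than once
```

In the notebook itself you would call ensure_dataset_dirs('.') once, before build_training_set and build_test_set.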
Here you read them and label them:
def extract_training_set():
    mypath = 'TrainingSet/'
    onlyfiles = [f for f in listdir(mypath) if isfile  # [truncated in source]
    training_set_arrays = []
    training_set_labels = []
    for file in onlyfiles:
        split_file = file.split('.')
        if split_file[-1]=='png':
            training_set_labels.append(split_file[0].  # [truncated in source]
            # [rest of the function cut off in source]
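The labelling line is truncated in the source, but the file names follow the pattern boxplot_N.png and lineplot_N.png, so the label can be read from the part before the underscore. The sketch below shows that step in isolation; the helper name label_from_filename is hypothetical, and splitting on the underscore is an assumption about what the cut-off line does.

```python
import os

def label_from_filename(filename):
    """Return 'boxplot' or 'lineplot' for a PNG file name, else None."""
    stem, ext = os.path.splitext(filename)
    if ext.lower() != ".png":
        return None                 # skip anything that is not a PNG
    return stem.split("_")[0]       # text before the first underscore

files = ["boxplot_1.png", "lineplot_12.png", "notes.txt"]
labels = [label_from_filename(f) for f in files]
```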
Once these functions are defined, you can build your dataset like this:
build_training_set(50)
1 instance has been started
Boxplot 1 has been stored!
Lineplot 1 has been stored!
2 instance has been started
Boxplot 2 has been stored!
Lineplot 2 has been stored!
3 instance has been started
Boxplot 3 has been stored!
Lineplot 3 has been stored!
4 instance has been started
Boxplot 4 has been stored!
Lineplot 4 has been stored!
5 instance has been started
Boxplot 5 has been stored!
Lineplot 5 has been stored!
6 instance has been started
Boxplot 6 has been stored!
Lineplot 6 has been stored!
7 instance has been started
Boxplot 7 has been stored!
Lineplot 7 has been stored!
Here are some examples of the training set:
plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_train))
    plt.imshow(X_train[x], cmap=plt.cm.binary)
    plt.xlabel(labels_train[x], fontsize=60)
plt.show()
The exact same process has to be repeated for the test set, and the string labels have to be
converted into something a ML model can work with (sklearn will do this for you with the
so-called LabelEncoder):
def build_test_set(num):
    mypath = 'TestSet/'
    for n in range(1,num+1):
        print('%i instance has been started'%(n))
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        print('Boxplot %i has been stored!'%(n))
        plot_line(mypath+'lineplot_'+str(n)+'.png')
        print('Lineplot %i has been stored!'%(n))
build_test_set(10)
1 instance has been started
Boxplot 1 has been stored!
Lineplot 1 has been stored!
2 instance has been started
Boxplot 2 has been stored!
Lineplot 2 has been stored!
3 instance has been started
Boxplot 3 has been stored!
Lineplot 3 has been stored!
4 instance has been started
Boxplot 4 has been stored!
Lineplot 4 has been stored!
5 instance has been started
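The label-encoding step is not shown in the listings, so here is a short sketch of how it might look with sklearn's LabelEncoder. The label list is made up for illustration; the variable names le, labels_train and train_labels are chosen to match the names used elsewhere in the notebook, but the exact code is an assumption.

```python
from sklearn.preprocessing import LabelEncoder

labels_train = ["boxplot", "lineplot", "lineplot", "boxplot"]  # made-up labels

le = LabelEncoder()
train_labels = le.fit_transform(labels_train)  # classes are sorted alphabetically
decoded = le.inverse_transform(train_labels)   # back to the original strings
```

Because the classes are sorted alphabetically, 'boxplot' is encoded as 0 and 'lineplot' as 1. The same fitted encoder should be reused with le.transform for the test labels so that both sets share one mapping.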
2. Machine Learning Model
The Machine Learning model that we are going to use is basically a stack of
convolutional layers and max pooling operations. It ends with a single sigmoid unit
that outputs the probability that the image belongs to the class encoded as 1.
The model is the same one I used in an article I published previously, where you can find
more details about how the architecture actually works.
size = X_train[0].shape[0]
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(3, (3, 3), input_shape = (size,  # [truncated in source]
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Conv2D(3, (3, 3), input_shape = (size,  # [truncated in source]
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
#classifier.add(Dense(units = 32, activation = 'relu'
classifier.add(Dense(units = 1, activation = 'sigmoid  # [truncated in source]
# Compiling the CNN
classifier.summary()
ERROR! Session/line number was not unique in database.
logging moved to new session 87
Model: "sequential_4"
______________________________________________________
Layer (type) Output Shape
======================================================
conv2d_7 (Conv2D) (None, 718, 718, 3)
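Only the first row of the model summary survives in the source, but the remaining shapes can be worked out by hand. The sketch below does that arithmetic, assuming the Keras defaults ('valid' padding and stride 1 for Conv2D, non-overlapping 2x2 windows for MaxPooling2D) and an input side of 720 pixels, which is inferred from the 718 in the summary rather than stated in the article.

```python
def conv_out(size, kernel=3):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output side length of non-overlapping max pooling."""
    return size // pool

size = 720                 # assumed input side, inferred from 718 = 720 - 3 + 1
s1 = conv_out(size)        # after the first Conv2D
s2 = pool_out(s1)          # after the first MaxPooling2D
s3 = conv_out(s2)          # after the second Conv2D
s4 = pool_out(s3)          # after the second MaxPooling2D
flat = s4 * s4 * 3         # Flatten: 3 filters in the last Conv2D
print(s1, s2, s3, s4, flat)
```

Under these assumptions the single sigmoid unit receives a flattened vector of 95052 values, which is where almost all of the model's weights live.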
Here is how you train and test your model:
train_images, test_images = X_train,X_test
train_images=np.array(train_images)
test_images=np.array(test_images)
train_images, test_images = train_images / 255.0, tes  # [truncated in source]
Train_images=[]
Test_images=[]
for i in range(len(train_images)):
    a=train_images[i].reshape(size,size,3)
    Train_images.append(a)
Train_images=np.array(Train_images)
for j in range(len(test_images)):
    b=test_images[j].reshape(size,size,3)
    Test_images.append(b)
Test_images=np.array(Test_images)
train_images,test_images=Train_images, Test_images
classifier.compile(optimizer = 'adam', loss = 'binary  # [truncated in source]
history = classifier.fit(train_images, train_labels,  # [truncated in source]
                    validation_data=(test_images, test_la  # [truncated in source]
Train on 100 samples, validate on 20 samples
Epoch 1/10
100/100 [==============================] - 12s 122ms/step - loss: 15.4656 - accuracy: 0.5200 - val_loss: 13.8420 - val_accuracy: 0.5000
Epoch 2/10
And as we can see, the final result is perfect. Even though this may sound exciting, I have to
say that the task is pretty easy (we are all able to distinguish a plot with boxes
from a plot with lines) and the model is more than powerful enough (a little bit of
overkill here).
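The preprocessing before training scales the pixel values and reshapes every image one by one. The same result can be obtained in a single vectorised step, sketched below on tiny made-up images. The helper name prepare_images is hypothetical, and dividing by 255 assumes 8-bit pixel values in the range 0 to 255; PNG files read with matplotlib's image.imread already come back as floats in [0, 1], in which case the division should be dropped.

```python
import numpy as np

def prepare_images(images, size):
    """Stack images into a float32 batch of shape (n, size, size, 3) in [0, 1]."""
    batch = np.asarray(images, dtype=np.float32) / 255.0  # assumes 0..255 input
    return batch.reshape(-1, size, size, 3)

# Five made-up 4x4 RGB images filled with the maximum 8-bit value.
fake_images = [np.full((4, 4, 3), 255, dtype=np.uint8) for _ in range(5)]
batch = prepare_images(fake_images, size=4)
```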
3. Final Results
As final proof that the model correctly distinguishes boxplots from lineplots, here are
some examples:
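# Threshold the sigmoid output at 0.5; a bare .astype(int) would turn every probability below 1.0 into 0
y_pred = (classifier.predict(X_test) > 0.5).astype(int).ravel()
y_pred_string = le.inverse_transform(y_pred)
y_pred_string
array(['lineplot', 'boxplot', 'boxplot', 'lineplot',
       'lineplot'  [output truncated in source]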
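A sigmoid unit outputs a probability between 0 and 1, so turning it into a class label needs a threshold. Casting the probabilities directly to integers truncates toward zero, which only works when the network output is saturated at exactly 1.0. The sketch below uses made-up probabilities to show the difference.

```python
import numpy as np

# Made-up sigmoid outputs with the (n, 1) shape that predict() returns.
probs = np.array([[0.97], [0.03], [0.51], [0.49]])

truncated = probs.astype(int).ravel()            # truncates toward zero
thresholded = (probs > 0.5).astype(int).ravel()  # 1 where probability > 0.5

print(truncated)    # [0 0 0 0]
print(thresholded)  # [1 0 1 0]
```

The flattened, thresholded array is the form that le.inverse_transform expects when mapping predictions back to 'boxplot' and 'lineplot'.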
And here are the plots:
plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_test))
    plt.imshow(X_test[x], cmap=plt.cm.binary)
    plt.title('Predicted label: "%s", Real label: "%s  # [truncated in source]
plt.show()