Machine learning tools with Python in Excel
Make smarter decisions with your data using machine learning.
With just a few steps, you can use the scikit-learn library to uncover insights (no complicated setup needed)
and bring your data to life with easy-to-read visualizations.
Note: Use of this template requires an active Microsoft 365 subscription.
Start with a dataset
Iris dataset (as Excel table)
sepal_length sepal_width petal_length petal_width
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3 1.4 0.1
4.3 3 1.1 0.1
5.8 4 1.2 0.2
5.7 4.4 1.5 0.4
5.4 3.9 1.3 0.4
5.1 3.5 1.4 0.3
5.7 3.8 1.7 0.3
5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2
5.1 3.7 1.5 0.4
4.6 3.6 1 0.2
5.1 3.3 1.7 0.5
4.8 3.4 1.9 0.2
5 3 1.6 0.2
5 3.4 1.6 0.4
5.2 3.5 1.5 0.2
5.2 3.4 1.4 0.2
4.7 3.2 1.6 0.2
4.8 3.1 1.6 0.2
5.4 3.4 1.5 0.4
5.2 4.1 1.5 0.1
5.5 4.2 1.4 0.2
4.9 3.1 1.5 0.1
5 3.2 1.2 0.2
5.5 3.5 1.3 0.2
4.9 3.1 1.5 0.1
4.4 3 1.3 0.2
5.1 3.4 1.5 0.2
5 3.5 1.3 0.3
4.5 2.3 1.3 0.3
4.4 3.2 1.3 0.2
5 3.5 1.6 0.6
5.1 3.8 1.9 0.4
4.8 3 1.4 0.3
5.1 3.8 1.6 0.2
4.6 3.2 1.4 0.2
5.3 3.7 1.5 0.2
5 3.3 1.4 0.2
7 3.2 4.7 1.4
6.4 3.2 4.5 1.5
6.9 3.1 4.9 1.5
5.5 2.3 4 1.3
6.5 2.8 4.6 1.5
5.7 2.8 4.5 1.3
6.3 3.3 4.7 1.6
4.9 2.4 3.3 1
6.6 2.9 4.6 1.3
5.2 2.7 3.9 1.4
5 2 3.5 1
5.9 3 4.2 1.5
6 2.2 4 1
6.1 2.9 4.7 1.4
5.6 2.9 3.6 1.3
6.7 3.1 4.4 1.4
5.6 3 4.5 1.5
5.8 2.7 4.1 1
6.2 2.2 4.5 1.5
5.6 2.5 3.9 1.1
5.9 3.2 4.8 1.8
6.1 2.8 4 1.3
6.3 2.5 4.9 1.5
6.1 2.8 4.7 1.2
6.4 2.9 4.3 1.3
6.6 3 4.4 1.4
6.8 2.8 4.8 1.4
6.7 3 5 1.7
6 2.9 4.5 1.5
5.7 2.6 3.5 1
5.5 2.4 3.8 1.1
5.5 2.4 3.7 1
5.8 2.7 3.9 1.2
6 2.7 5.1 1.6
5.4 3 4.5 1.5
6 3.4 4.5 1.6
6.7 3.1 4.7 1.5
6.3 2.3 4.4 1.3
5.6 3 4.1 1.3
5.5 2.5 4 1.3
5.5 2.6 4.4 1.2
6.1 3 4.6 1.4
5.8 2.6 4 1.2
5 2.3 3.3 1
5.6 2.7 4.2 1.3
5.7 3 4.2 1.2
5.7 2.9 4.2 1.3
6.2 2.9 4.3 1.3
5.1 2.5 3 1.1
5.7 2.8 4.1 1.3
6.3 3.3 6 2.5
5.8 2.7 5.1 1.9
7.1 3 5.9 2.1
6.3 2.9 5.6 1.8
6.5 3 5.8 2.2
7.6 3 6.6 2.1
4.9 2.5 4.5 1.7
7.3 2.9 6.3 1.8
6.7 2.5 5.8 1.8
7.2 3.6 6.1 2.5
6.5 3.2 5.1 2
6.4 2.7 5.3 1.9
6.8 3 5.5 2.1
5.7 2.5 5 2
5.8 2.8 5.1 2.4
6.4 3.2 5.3 2.3
6.5 3 5.5 1.8
7.7 3.8 6.7 2.2
7.7 2.6 6.9 2.3
6 2.2 5 1.5
6.9 3.2 5.7 2.3
5.6 2.8 4.9 2
7.7 2.8 6.7 2
6.3 2.7 4.9 1.8
6.7 3.3 5.7 2.1
7.2 3.2 6 1.8
6.2 2.8 4.8 1.8
6.1 3 4.9 1.8
6.4 2.8 5.6 2.1
7.2 3 5.8 1.6
7.4 2.8 6.1 1.9
7.9 3.8 6.4 2
6.4 2.8 5.6 2.2
6.3 2.8 5.1 1.5
6.1 2.6 5.6 1.4
7.7 3 6.1 2.3
6.3 3.4 5.6 2.4
6.4 3.1 5.5 1.8
6 3 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3 5.2 2.3
6.3 2.5 5 1.9
6.5 3 5.2 2
6.2 3.4 5.4 2.3
5.9 3 5.1 1.8
species (fifth column of the table above, one value per row)
rows 1-50: setosa
rows 51-100: versicolor
rows 101-150: virginica

TIP
Make sure your dataset mainly has numbers as values. In this example, the "species" column will be excluded from the clustering in the Python formula.

GOOD TO KNOW
The target variable (what the model predicts) must be in numbers. This is the "species" column in this example. The Python script handles this using label encoding, converting the categories into numbers:
- setosa -> 0
- versicolor -> 1
- virginica -> 2
K-means clustering
GOOD TO KNOW
K-means clustering is an unsupervised machine learning algorithm that groups similar data points into clusters. In this example, the Iris dataset contains measurements of iris flowers from three species. The goal is to partition the data into groups with shared characteristics.
Learn more.
1. Define the number of clusters
TIP
Use the elbow method to find the optimal number of clusters. Look for the elbow
point where the graph sharply changes from steep to flat.
Select cell G14 to see the Python formula.
TRY HERE
n_clusters: 10
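The formula in cell G14 is not reproduced in this text version. As a rough guide, the stand-alone sketch below shows how the n_clusters input could drive a scikit-learn KMeans model, with an elbow-method scan as suggested in the TIP; load_iris stands in for the Excel table, and the variable names are illustrative.

# K-means on the iris measurements with a chosen number of clusters.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

iris = load_iris(as_frame=True)
X = iris.data  # numeric measurement columns only; "species" is not included

n_clusters = 10  # value currently entered in the "TRY HERE" cell above
model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
print(model.labels_[:10])

# Elbow method (see TIP): inertia for k = 1..10; look for the point where
# the curve changes from steep to flat.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 11)
]
print(inertias)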
2. Visualize the clusters
Select cell Q11 to see the Python formula.
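The chart behind the formula in cell Q11 is not reproduced here either. One common way to visualize the clusters, sketched below under the same stand-in assumptions, is a scatter plot of two measurements colored by cluster label.

# Scatter plot of the clusters from the previous step.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris(as_frame=True).data  # stand-in for the Excel table
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Color each point by its cluster label; any pair of measurements works.
plt.scatter(X["petal length (cm)"], X["petal width (cm)"], c=labels)
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.title("K-means clusters")
plt.show()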
Logistic regression
GOOD TO KNOW
Logistic regression is a supervised machine learning algorithm used for classification tasks. In this example, the Iris dataset contains measurements of iris flowers from three species. The goal is to predict a flower's species (setosa, versicolor, or virginica) based on its features (sepal length, sepal width, petal length, and petal width).
Learn more.
1. Define the train split percentage
TIP
A train-test split is needed for this model. The
training set teaches the model using examples
with known species, while the testing set
evaluates its performance on new, unseen data.
TRY HERE
train percentage split: 40
Based on the input percentage, the data will
be split as: 40% train, 60% test.
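The workbook's formula is not shown in this text version; the sketch below illustrates how the train split percentage could feed scikit-learn's train_test_split and a LogisticRegression model, again using load_iris as a stand-in for the Excel table.

# Train-test split and logistic regression on the iris data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target  # target is already encoded as 0/1/2

train_percentage = 40  # the "TRY HERE" input above: 40% train, 60% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_percentage / 100, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")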
2. Visualize the model's performance
Select cell P11 to see the sample Python formula.
GOOD TO KNOW
The confusion matrix provides a detailed breakdown of the
model's performance by comparing the true labels with the
predicted labels for each variable. The chart shows the counts of
true positives (TP), true negatives (TN), false positives (FP), and
false negatives (FN).
Confusion matrix layout (rows = actual class, columns = predicted class):

              predicted A   predicted B   predicted C
actual A          TP            FN            FN
actual B          FP            TP            TN
actual C          FP            FP            TP

Definitions
TP: Actual class matches predicted class
FN: Actual class is positive but predicted as another class
FP: Actual class is negative but predicted as positive
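As a rough guide to how such a chart can be produced, the sketch below computes and plots a confusion matrix with scikit-learn's ConfusionMatrixDisplay, re-fitting the same kind of model as in step 1; the actual chart in the workbook is not reproduced here.

# Confusion matrix for the logistic regression model.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.4, random_state=0, stratify=iris.target
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# Rows are actual species, columns are predicted species.
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test, display_labels=iris.target_names)
plt.show()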
Random forest
GOOD TO KNOW
Random forest is a supervised machine learning algorithm used for classification and feature importance. It combines multiple decision trees to make predictions, essentially a "forest" of trees working together. Each tree in the forest is trained on a random subset of the data and considers a random subset of features when making splits, a process called randomization. In this example, the Iris dataset contains measurements of iris flowers from three species. The goal is to determine which features (sepal length, sepal width, petal length, and petal width) are most important in predicting the species (setosa, versicolor, or virginica).
Learn more.
1. Define the number of trees
TRY HERE
tree count: 10
TIP
Experiment with different tree counts to see how it affects the model's feature importance shown in step 2.
2. Your features below
Select cell O10 to see the sample Python formula.
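The formula in cell O10 is not reproduced in this text version. The stand-alone sketch below shows how the tree count could drive a scikit-learn RandomForestClassifier and how the resulting feature importances could be charted; load_iris stands in for the Excel table.

# Random forest with a chosen tree count, plus a feature-importance chart.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

tree_count = 10  # the "TRY HERE" input above
model = RandomForestClassifier(n_estimators=tree_count, random_state=0).fit(X, y)

# Feature importance: how much each measurement contributes to the predictions.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values()
importances.plot(kind="barh", title="Random forest feature importance")
plt.show()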