ORANGE DATA MINING NO-CODE TOOL
CASE STUDY: PALMER PENGUINS MODEL
Rudransh lamba
X-C
Roll no. 26
1. Problem Scoping
Who
Who are the stakeholders?
Research communities and data scientists of Antarctica.
What do you know about them?
They collect data on different species of penguins.
What
What is the problem?
It is difficult to identify the species of some Palmer
Penguins.
How do you know that it is a problem?
Environmental – Harsh climate and remote, icy terrain.
Biological – Migration, nesting behaviour, and stress
from human contact.
Logistical – Limited access to islands and short research
seasons.
Data gaps – Missing values in the dataset itself point to
real-world collection difficulties and limits.
Where
What is the context/situation in which the stakeholders
experience the problem?
Collecting data from the remote continent of Antarctica.
Where is the problem located?
In Antarctica.
Why
Why will this solution be of value to the stakeholders?
The solution will help predict the species of Palmer
Penguins from the collected data.
How will the solution improve their situation?
They can study the data without being in the harsh
climatic conditions of Antarctica.
Our: Research Community (Who)
Has a problem that: It is difficult to identify the
species of some Palmer Penguins (What)
When / while: Collecting data from the remote continent
of Antarctica (Where)
An ideal solution would: Predict the species of Palmer
Penguins from the collected data (Why)
2. Data Acquisition
Data acquired from
[Link]
ilTyUhmUv4DWT1BFsaCoQ2BmF
[Link]
study-palmer-penguins
3. Data Exploration
- Open the Orange Data Mining (ODM) tool
- Insert the training and testing data files from the
Google Drive link
- Notice the missing values
- Insert a Feature Statistics widget and connect the
output of Train Data to the input of Feature Statistics
- Insert an Impute widget and connect it to Train Data
- In the Impute widget, select "Remove instances with
unknown values"
- Connect Feature Statistics to the Impute widget
Now the data is clean and without any missing values.
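For readers who want to see what this cleaning step does to the data, here is a minimal Python sketch of the same idea: count the missing values per column, then drop every row that has any. The rows and column names below are hypothetical toy values chosen for illustration, not taken from the case-study files, and the sketch does not use Orange itself.

```python
# Hypothetical toy rows (not from the real dataset); None marks a missing value.
# The column names are assumed for illustration.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1, "flipper_length_mm": 181},
    {"species": "Adelie", "bill_length_mm": None, "flipper_length_mm": None},
    {"species": "Gentoo", "bill_length_mm": 46.1, "flipper_length_mm": 211},
    {"species": "Chinstrap", "bill_length_mm": 46.5, "flipper_length_mm": None},
]


def count_missing(data):
    """Count missing values per column, like a quick look at Feature Statistics."""
    counts = {}
    for row in data:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def drop_incomplete(data):
    """Keep only the rows in which every column has a value."""
    return [row for row in data if all(v is not None for v in row.values())]


clean_rows = drop_incomplete(rows)
print(count_missing(rows))        # missing values before cleaning
print(count_missing(clean_rows))  # every count is 0 after cleaning
print(len(rows), "->", len(clean_rows), "rows")
```

Dropping rows is the simplest choice but it shrinks the dataset; filling in a replacement value is the usual alternative when too many rows would be lost.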
- We need to change the feature type for species from
Categorical Feature to Categorical Label.
- Add a Select Columns widget and connect it to the
Impute widget
- Drag the species feature to the Target box
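Marking species as the target tells the tool which column to predict and which columns to learn from. A minimal sketch of that separation in plain Python follows; the rows and the column names are hypothetical, and the helper `split_features_target` is made up for this illustration.

```python
TARGET = "species"  # assumed name of the column we want to predict

# Hypothetical toy rows, already free of missing values.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1, "flipper_length_mm": 181},
    {"species": "Gentoo", "bill_length_mm": 46.1, "flipper_length_mm": 211},
]


def split_features_target(data, target):
    """Separate each row into its input features and its target label."""
    features = [{k: v for k, v in row.items() if k != target} for row in data]
    labels = [row[target] for row in data]
    return features, labels


X, y = split_features_target(rows, TARGET)
print(X)  # inputs the model learns from
print(y)  # labels the model learns to predict
```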
- Splitting the data: insert a Data Sampler widget and
connect it to Select Columns
- Insert a Data Info widget and connect it to Data Sampler
- Insert another Data Info widget and connect it to Data
Sampler
- Double-click on Data Info (1)
The data has been split now.
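The split can be pictured as shuffling the rows and cutting the list in two. The sketch below is one possible way to do that in plain Python; the 70% proportion and the fixed seed are assumptions made for the example (the case study does not state which settings were used), and `sample_split` is a hypothetical helper, not an Orange function.

```python
import random


def sample_split(data, proportion=0.7, seed=0):
    """Shuffle the rows and return (data_sample, remaining_data).

    proportion and seed are assumed values for this illustration.
    """
    shuffled = list(data)                  # copy so the input is left unchanged
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-ins for ten cleaned penguin rows
data_sample, remaining_data = sample_split(rows)
print(len(data_sample), len(remaining_data))  # 7 3
```

The two parts never share a row, which is what allows the second part to act as unseen test data later on.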
4. Modelling and Evaluation
- Insert a Test and Score widget and connect it to Data
Sampler
- Insert a Tree widget and connect it to the input of
Test and Score
- Connect Data Sampler to Test and Score a second time
- Double-click the new connection, disconnect Data
Sample from Test Data, and connect Remaining Data to
Test Data instead.
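To give a feel for what a tree model learns, here is a deliberately simplified stand-in: a one-split "decision stump" that searches for the single threshold on one feature that misclassifies the fewest training rows. This is not the algorithm behind the Tree widget, which builds many nested splits; the flipper lengths below are invented toy values, and `fit_stump` and `predict` are hypothetical helpers.

```python
from collections import Counter

# Invented toy training rows: (flipper_length_mm, species).
train = [
    (181, "Adelie"), (186, "Adelie"), (190, "Adelie"), (193, "Adelie"),
    (210, "Gentoo"), (215, "Gentoo"), (220, "Gentoo"), (230, "Gentoo"),
]


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Return (threshold, left_label, right_label) with the fewest training errors."""
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, x):
    """Apply the single learned test to one value."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


stump = fit_stump(train)
print(stump)
print(predict(stump, 185), predict(stump, 225))
```

A full decision tree repeats this kind of search inside each resulting group, which is how it can separate more than two species.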
Evaluation
These are the evaluation results for the model.
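One of the scores usually read first in such results is classification accuracy: the share of test rows whose predicted species matches the actual one. The sketch below shows that calculation on hypothetical label lists; the values are made up for illustration and are not the results of this case study.

```python
def classification_accuracy(actual, predicted):
    """Fraction of rows where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels for five test penguins (not real results).
actual = ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"]
predicted = ["Adelie", "Gentoo", "Adelie", "Adelie", "Gentoo"]

print(classification_accuracy(actual, predicted))  # 4 of 5 correct -> 0.8
```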
- To evaluate another model, insert a Random Forest
widget and connect it to Test and Score.
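A random forest combines many trees, each trained on a slightly different view of the data. In the classic formulation each tree casts a vote and the most common species wins; some implementations average class probabilities instead, so treat the following as a sketch of the idea and not of the widget's internals. The votes are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one penguin.
votes = ["Gentoo", "Adelie", "Gentoo", "Gentoo", "Chinstrap"]
print(majority_vote(votes))  # Gentoo
```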
5. Prediction
- Connect the Predictions widget to the Test Data output
- Connect the Random Forest widget to the Predictions
widget (the connection is dotted because we are not
feeding it data yet)
- Connect Data Sampler to the Random Forest model (the
connection is now solid)
These are all the predictions made by Random Forest
- We can also connect Data Sampler to the Tree model and
connect the Tree model to Predictions (using two models
at the same time)
Final predictions made by both models (Random Forest
and Tree)
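When two models predict on the same rows, the interesting rows are the ones where they disagree. A minimal sketch of that comparison follows; the three label lists are hypothetical and do not reproduce the predictions from this case study, and `disagreements` is a made-up helper.

```python
def disagreements(actual, tree_pred, forest_pred):
    """List (row index, actual, tree, forest) for rows where the two models differ."""
    return [
        (i, a, t, f)
        for i, (a, t, f) in enumerate(zip(actual, tree_pred, forest_pred))
        if t != f
    ]


# Hypothetical labels for four test penguins (not real results).
actual = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
tree_pred = ["Adelie", "Gentoo", "Adelie", "Adelie"]
forest_pred = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]

for row in disagreements(actual, tree_pred, forest_pred):
    print(row)  # (2, 'Chinstrap', 'Adelie', 'Chinstrap')
```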