Jamhuriya University of Science & Technology (JUST)
CA416 - Principles of Data Science
Chapter 9: Introduction to Machine Learning & Algorithms.
Shafie Abdi Mohamed
[email protected]
Topics
• Introduction to Machine Learning
• Why Machine Learning
• Machine learning life cycle process
• Types of ML
• Finding Dataset Repositories Online
• Lecture Tutorial
Introduction to Machine Learning
Discussion: What do you know about machine learning?
Machine learning gives systems the ability to learn and improve automatically from experience, without being
explicitly programmed for each activity. It is the development of computer programs that can access data and,
through a series of algorithms, use that data to learn for themselves what action should be taken.
The primary objective of machine learning is to allow the system to learn automatically, with little or no human intervention,
so that it can adjust and take action accordingly. The learning process begins with the system observing
reference data and experiences based on that data. The system then learns which actions to
take when specific patterns appear in a data set.
Field of Machine learning in Data Science
• Machine learning is an exciting field and a subset of
artificial intelligence (AI). In other words, machine learning
(ML) is a type of AI that enables software
applications to improve their prediction accuracy without
being explicitly programmed to do so.
Why Machine Learning?
Machine learning gives enterprises a view of trends in customer behavior and business operational patterns, and supports the
development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine
learning a central part of their operations. Machine learning has become a significant competitive differentiator for
many companies.
These applications show why machine learning is important.
Machine learning life cycle process
Machine learning gives computer systems the ability to learn automatically without being explicitly programmed.
But how does a machine learning system work? This can be described using the machine learning life cycle, a cyclic
process for building an efficient machine learning project. The main purpose of the life cycle is to
find a solution to the problem or project.
1. Gathering Data: The first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data
related to the problem.
2. Data Preparation: The step where we put our data into a suitable place and prepare it for use in machine learning
training (including data exploration).
3. Data Wrangling: The process of cleaning and converting raw data into a usable format (handling missing values, duplicate data,
invalid data, and noise).
4. Data Analysis: The cleaned and prepared data is passed on to the analysis step (selecting analytical techniques,
building models, and reviewing the results).
5. Train Model: In this step we train the model so that it performs better and gives a better
outcome for the problem.
6. Test Model: Testing the model determines its percentage accuracy against the requirements of the project or
problem.
7. Deployment: The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
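The seven steps above can be sketched end to end in a few lines. This is only an illustration, not the lecture's own code: it assumes pandas and scikit-learn are installed, uses a tiny made-up table (the column names `spend` and `sales` and all values are hypothetical), and stands in for deployment by saving and reloading the model with `pickle`.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Gathering data: a made-up table standing in for a real data source.
raw = pd.DataFrame({
    "spend": [10, 20, 30, 40, 50, 60, 70, 80, 80, None],
    "sales": [25, 45, 65, 85, 105, 125, 145, 165, 165, 185],
})

# 2. Data preparation: explore the data before using it.
summary = raw.describe()

# 3. Data wrangling: drop rows with missing values, then drop duplicate rows.
clean = raw.dropna().drop_duplicates()

# 4. Data analysis: choose a technique (here, linear regression) and split the data.
X = clean[["spend"]]
y = clean["sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 5. Train model.
model = LinearRegression().fit(X_train, y_train)

# 6. Test model: R^2 on rows the model has not seen.
test_score = model.score(X_test, y_test)

# 7. Deployment: serialize the trained model, then reload it as a serving system would.
deployed = pickle.loads(pickle.dumps(model))
prediction = deployed.predict(pd.DataFrame({"spend": [90]}))[0]
print(f"test R^2 = {test_score:.3f}, predicted sales at spend 90 = {prediction:.1f}")
```

In a real project each step is usually far larger, and the cycle is repeated when testing shows the model does not meet the requirements.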
Types of Machine learning
Machine learning involves showing a large volume of data to a machine so that it can learn and make predictions, find
patterns, or classify data. The three machine learning types are supervised, unsupervised, and reinforcement learning.
1. Supervised Learning: uses a labeled training set to teach models to yield the desired output.
2. Unsupervised Learning: uses machine learning algorithms to analyze and cluster unlabeled datasets.
3. Reinforcement Learning: an area of machine learning concerned with taking suitable actions to maximize reward in a particular
situation.
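The difference between the first two types can be seen on the same handful of points. The sketch below assumes scikit-learn and NumPy are installed. The points and labels are made up, and `LogisticRegression` and `KMeans` are just one possible choice of algorithm for each type.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Made-up 2-D points forming two well-separated groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

# Supervised: labels are provided, and the model learns to reproduce them.
y = np.array([0, 0, 0, 1, 1, 1])
classifier = LogisticRegression().fit(X, y)
predicted_labels = classifier.predict([[1.1, 0.9], [5.1, 5.0]])

# Unsupervised: no labels are given; KMeans groups the points by similarity.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = clusterer.labels_

print("supervised predictions:", predicted_labels)
print("unsupervised cluster ids:", cluster_ids)
```

Note that the cluster ids are arbitrary: the algorithm finds the two groups but does not know what they mean.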
Types of Machine Learning and Algorithms
Reinforcement learning can be used for:
1. Robotics for industrial automation
2. ML & data processing
3. Creating training systems
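To make "taking suitable action to maximize reward" concrete, here is a small sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. The function name `run_bandit`, the reward probabilities, and the parameter values are all hypothetical choices for illustration. Only the Python standard library is used.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; returns estimated action values and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: take the action with the highest estimated value so far.
            action = max(range(len(values)), key=values.__getitem__)
        # The environment pays reward 1 with the action's (hidden) probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print("estimated values:", [round(v, 2) for v in values])
print("action judged best:", best_action)
```

The agent is never told which action is best. It learns this only from the rewards it receives.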
Types of Machine learning- Supervised
Supervised learning is effective for a variety of business purposes, including sales forecasting, inventory
optimization, and fraud detection. Some examples of use cases include:
• Predicting real estate prices
• Classifying whether bank transactions are fraudulent or not
• Finding disease risk factors
• Determining whether loan applicants are low-risk or high-risk
• Predicting the failure of industrial equipment's mechanical parts
Finding Dataset Repositories Online
Dataset Finders
• Kaggle: This data science platform has many interesting, user-contributed datasets for cognitive computing.
• UCI Machine Learning Repository: A go-to resource for open datasets for decades. Users can
access the data without registering.
• Google Dataset Search: Indexes over 25 million datasets from across the internet.
Searching for a Dataset: Kaggle
Open the link below:
https://www.kaggle.com/
You may contribute your own dataset or download any of the uploaded datasets:
https://www.kaggle.com/datasets
Open the link below to view an example dataset:
https://www.kaggle.com/datasets/hanifalirsyad/coffee-scrap-coffeereview
Lecture Tutorial: Linear Regression Project
• In this session, we will use a practical regression project to implement the entire
machine learning pipeline.
• This will be based on a real-world advertising dataset.
• The overall task is future sales prediction, given past records of advertising spend and sales.
Linear regression is an algorithm that models a linear relationship between one or more independent variables and a
dependent variable in order to predict the outcome of future events.
The dataset used here contains product sales data together with the
advertising cost incurred by the business on various advertising platforms. Below is the description of
all the columns in the dataset:
1. TV: Advertising cost, in dollars, spent on TV;
2. Radio: Advertising cost, in dollars, spent on Radio;
3. Newspaper: Advertising cost, in dollars, spent on Newspaper;
4. Sales: Number of units sold.
You can download the dataset from: https://raw.githubusercontent.com/amankharwal/Website-data/master/advertising.csv
Import necessary Libraries
You may need to install the plotly library for figure visualization, as follows:
pip install plotly
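Before importing, it can help to check which libraries are actually available. The sketch below uses only the standard library. The list of import names is an assumption about what this tutorial needs, since the original import cell is not reproduced here.

```python
from importlib.util import find_spec

# Import names assumed for this tutorial; adjust the list to match your own code.
REQUIRED = ["pandas", "numpy", "sklearn", "plotly"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in REQUIRED if find_spec(name) is None]

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All tutorial libraries are available.")
```

Note that the import name and the pip package name can differ: `sklearn` is installed with `pip install scikit-learn`.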
Getting and preparing the data
The dataset contains almost 200 records and 4 variables.
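A minimal sketch of loading and inspecting the data with pandas follows. The helper `load_advertising` is a hypothetical name, and the CSV header names (`TV`, `Radio`, `Newspaper`, `Sales`) are assumed from the column description. So that the example runs offline, it reads a three-row stand-in whose values are made up.

```python
import io

import pandas as pd

DATA_URL = "https://raw.githubusercontent.com/amankharwal/Website-data/master/advertising.csv"


def load_advertising(source):
    """Read the advertising CSV from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# Offline stand-in with the same four columns; the values are made up.
sample_csv = io.StringIO(
    "TV,Radio,Newspaper,Sales\n"
    "100.0,20.0,30.0,12.0\n"
    "200.0,40.0,10.0,20.0\n"
    "50.0,5.0,60.0,7.5\n"
)
data = load_advertising(sample_csv)

print(data.head())  # first rows of the table
print(data.shape)   # (number of records, number of variables)
```

With an internet connection, `load_advertising(DATA_URL)` should load the real file instead.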
Preprocessing Step: Missing and Duplicate values
There are no duplicate values in this dataset.
There are no missing values in this dataset.
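Checks like these are commonly done in pandas with `isnull()` and `duplicated()`. The sketch below runs them on a small made-up table that deliberately contains one missing value and one duplicate row, so the counts are non-zero. On the advertising dataset both would be expected to be zero, as stated above.

```python
import pandas as pd

# Made-up rows: the last row repeats the first, and one Radio value is missing.
data = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 100.0],
    "Radio": [20.0, None, 5.0, 20.0],
    "Sales": [12.0, 20.0, 7.5, 12.0],
})

missing_per_column = data.isnull().sum()       # count of missing values in each column
duplicate_rows = int(data.duplicated().sum())  # rows identical to an earlier row

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```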
Preprocessing Step: Outliers
There also appear to be no outliers in this dataset.
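One common way to check for outliers numerically is the interquartile range (IQR) rule. This is not necessarily the method used in the lecture, which may have relied on a plot. The helper `iqr_outliers` and the sample values below are hypothetical.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Made-up values: 95.0 is far from the rest and should be flagged.
sales = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 95.0])
outliers = iqr_outliers(sales)
print(outliers.tolist())
```

An empty result for every column would support the conclusion that the dataset has no outliers under this rule.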
Visualizing Data
Ordinary Least Squares (OLS) regression is a common technique for estimating the
coefficients of linear regression equations, which describe the relationship between
one or more independent quantitative variables and a dependent variable (simple
or multiple linear regression).
The plot suggests that sales increase as Newspaper advertising spend increases.
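OLS chooses the intercept and slope that minimize the sum of squared differences between the observed and predicted values. The sketch below computes them directly with NumPy on made-up, noise-free data, so the known coefficients are recovered. The variable names `spend` and `sales` are illustrative only.

```python
import numpy as np

# Made-up data generated from sales = 3 + 0.5 * spend, so the answer is known.
spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = 3.0 + 0.5 * spend

# Design matrix: a column of ones for the intercept plus the predictor.
A = np.column_stack([np.ones_like(spend), spend])

# Least-squares solution minimizing ||A @ coef - sales||^2.
coef, residuals, rank, _ = np.linalg.lstsq(A, sales, rcond=None)
intercept, slope = coef
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

If the figures are drawn with plotly, `plotly.express.scatter(..., trendline="ols")` adds such a fitted line to a scatter plot. That option additionally requires the statsmodels package.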
Splitting data into train and test
The split() function generates the indices for splitting the data into training and test sets, in that order.
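In scikit-learn, this description matches the `split()` method of splitter classes such as `ShuffleSplit`. Whether the lecture used that or the `train_test_split` helper is an assumption either way, so the sketch below shows both on made-up arrays. The 80/20 ratio and `random_state=42` are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

X = np.arange(20).reshape(10, 2)  # 10 made-up samples, 2 features each
y = np.arange(10)

# split() yields (train_indices, test_indices), in that order.
splitter = ShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X))
X_train, X_test = X[train_idx], X[test_idx]

# train_test_split wraps the same idea and returns the arrays directly.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx), X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the split reproducible, which is useful when comparing models.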
Linear regression algorithm and model training
The test score is equivalent to about 90%, which is a good result.
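A minimal sketch of training and scoring a linear regression model with scikit-learn follows. It uses synthetic data rather than the advertising file, so its score will differ from the one reported above. The coefficients, noise level, and split settings are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))  # columns play the role of TV, Radio, Newspaper
y = 5 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# For regressors, score() returns R^2; 1.0 would be a perfect fit.
r2 = model.score(X_test, y_test)
print(f"test R^2 = {r2:.3f}")
```

If the lecture's score came from `score()`, a value of about 0.90 would mean the model explains roughly 90% of the variance in sales on the test set.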
Future sales prediction
This means the model predicts that sales will reach 32.9 on average across all the products of the company.
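Prediction is a matter of passing new advertising budgets to the trained model. The sketch below trains on made-up, noise-free rows, so it does not reproduce the 32.9 figure above. The budget `[120.0, 25.0, 35.0]` and the coefficients are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training rows in the order [TV, Radio, Newspaper].
X = np.array([
    [100.0, 20.0, 30.0],
    [200.0, 40.0, 10.0],
    [50.0, 5.0, 60.0],
    [150.0, 25.0, 45.0],
    [250.0, 10.0, 40.0],
])
# Noise-free target: sales = 2 + 0.04*TV + 0.1*Radio + 0.01*Newspaper.
y = 2.0 + X @ np.array([0.04, 0.1, 0.01])

model = LinearRegression().fit(X, y)

# A new advertising budget, in the same column order as the training data.
new_budget = np.array([[120.0, 25.0, 35.0]])
predicted_sales = model.predict(new_budget)[0]
print(f"predicted sales = {predicted_sales:.2f}")
```

The input must have the same number and order of features as the data the model was trained on.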