10-Feb-22
Machine Learning
Divya R, Dept of CS, KU
Example Learning Problems
• A handwriting recognition learning problem:
– Task T: recognizing and classifying handwritten words within images
– Performance measure P: percentage of words correctly classified
– Training experience E: a database of handwritten words with given classifications
• A robot driving learning problem:
– Task T: driving on public four-lane highways using vision sensors
– Performance measure P: average distance traveled before an error (as judged by a human overseer)
– Training experience E: a sequence of images and steering commands recorded while observing a human driver
What is Machine Learning?
• The term machine learning was first introduced by Arthur Samuel in 1959.
• Machine learning is an application of artificial intelligence that involves algorithms and data that automatically analyse and make decisions by themselves without human intervention.
• It describes how computers perform tasks on their own by learning from previous experience.
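For the handwriting recognition problem, performance measure P is just a percentage. The sketch below is minimal and assumes that the classifier's outputs and the true labels from the training experience E are already available as lists of words. The word lists and the function name `percent_correct` are hypothetical.

```python
def percent_correct(predicted, actual):
    """Performance measure P: percentage of words classified correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical classifier output vs. the true labels
predicted = ["cat", "dog", "bird", "fish"]
actual = ["cat", "dog", "bard", "fish"]
print(percent_correct(predicted, actual))  # 75.0
```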
How does Machine Learning work?
• A machine learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
• The accuracy of the predicted output depends on the amount of data: in general, a larger amount of data helps build a better model that predicts the output more accurately.
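The sketch below shows the learn-from-history, then predict-for-new-data loop with one of the simplest possible models: a nearest-centroid classifier that stores the average feature vector of each label. The model choice, the fruit data and the function names `train` and `predict` are illustrative assumptions. They are not a method prescribed by the slides.

```python
import math

def train(history):
    """Build a model from historical (features, label) pairs:
    the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for features, label in history:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(model, features):
    """Predict the label whose centroid is closest to the new data point."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Hypothetical historical data: (weight in g, diameter in cm) -> fruit
history = [
    ((150, 7.0), "apple"), ((170, 8.0), "apple"), ((160, 7.5), "apple"),
    ((5, 1.5), "grape"), ((6, 2.0), "grape"), ((4, 1.8), "grape"),
]
model = train(history)
print(predict(model, (155, 7.2)))  # apple
print(predict(model, (5.5, 1.7)))  # grape
```

Each centroid is an average over the historical examples. More representative data therefore tends to move it closer to the true class average, which is one concrete way more data can improve predictions.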
Machine Learning Applications
• Product recommendations in online shopping platforms
• Speech recognition
• Natural language processing
• Sentiment and emotion analysis
• Traffic prediction
• Virtual personal assistant
• Social media services
• Email spam filtering
• Anomaly detection
• Fraud detection and prevention
• Content recommendation (news, music, movies, and so on)
• Weather forecasting
• Stock market forecasting
• Market basket analysis
• Object and scene recognition in images
Example: Spam Filter
Slide from Mackassy
Example: Digit Recognition
Slide from Mackassy
Issues in Machine Learning
• What algorithms perform best for which type of problems
and representations?
• How much training data is sufficient?
• How can prior knowledge be used?
• How can you choose a useful next training experience?
• How does noisy data influence accuracy?
• How do you reduce a learning problem to a set of function
approximations?
• How can the learner automatically alter its representation to
improve its ability to represent and learn the target
function?
Issues in Machine Learning (cont..)
• Not enough training data: Most machine learning algorithms need a lot of data to work properly. Even a simple task may need thousands of examples, and advanced tasks such as image or speech recognition may need lakhs or even millions of examples.
• Lack of Quality Data: Noisy, dirty, and incomplete data are the main enemies of effective machine learning. The remedy is to take the time to evaluate and scope the data through careful data governance, data integration, and data exploration until the data is clean.
• Irrelevant Features: The training data should contain as many relevant features as possible and few or no irrelevant ones.
• Nonrepresentative training data: If we train the model on a nonrepresentative training set, its predictions will not be accurate; it will be biased against one class or group.
• Overfitting and Underfitting: Overfitting happens when the model performs well on the training data but fails to generalize to new data. Underfitting happens when the model is too simple to learn the underlying structure of the data.
• Lack of Skilled Resources
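Overfitting and underfitting can be illustrated with a toy experiment. The data is synthetic and invented here: the true trend is y = 2x, and the training labels carry ±1 noise. Three hypothetical models are compared. A constant "predict the mean" model is too simple and underfits. A least-squares line matches the trend. A 1-nearest-neighbour model memorizes the training set and overfits.

```python
def mse(predictions, targets):
    """Mean squared error between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def fit_mean(xs, ys):
    """Too simple (underfits): always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Least-squares straight line: matches the true (linear) trend."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    """Memorizes the training set (overfits): copies the nearest training label."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Synthetic data (assumed): true trend y = 2x; training labels carry +/-1 noise
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [1, 1, 5, 5, 9, 9, 13, 13]
test_x = [1.4, 3.4, 5.4]
test_y = [2 * x for x in test_x]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train_x, train_y)
    results[name] = (mse([model(x) for x in train_x], train_y),
                     mse([model(x) for x in test_x], test_y))
    print(f"{name:5s} train MSE = {results[name][0]:.2f}  test MSE = {results[name][1]:.2f}")
```

The memorizing 1-NN model has zero training error but a larger test error than the straight line. The constant model has a large error on both sets.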
Machine Learning Areas
• Supervised Learning: data and the corresponding labels are given.
• Unsupervised Learning: only data is given; no labels are provided.
• Semi-supervised Learning: labels are present for some (but not all) of the data.
• Reinforcement Learning: an agent interacting with the world makes observations, takes actions, and is rewarded or punished; it should learn to choose actions in such a way as to obtain a lot of reward.
Supervised (Inductive) Learning
• Supervised learning is based on supervision; it is like a student learning under the supervision of a teacher.
• The program is given labelled input data together with the expected output data.
• It learns from training data containing sets of examples and produces two kinds of results:
– Classification: predicts the class of the data it is presented with.
– Regression: predicts a numerical value.
Steps Involved in Supervised Learning
• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the labelled data into a training set, a validation set, and a test set.
• Determine the input features of the training dataset; they should carry enough information for the model to predict the output accurately.
• Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
• Execute the algorithm on the training dataset. Some algorithms also need a validation set, a subset of the training data, to tune their control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
Classification
• Classification algorithms are used when the output variable is categorical, for example two classes such as Yes-No, Male-Female, True-False, etc.
• In classification, we predict labels y (classes) for inputs x.
• Examples:
– OCR (input: images, classes: characters)
– Medical diagnosis (input: symptoms, classes: diseases)
– Automatic essay grader (input: document, classes: grades)
– Fraud detection (input: account activity, classes: fraud / no fraud)
– Customer service email routing
– Recommended articles in a newspaper, recommended books
– DNA and protein sequence identification
– Categorization and identification of astronomical images
– Financial investments
– Biometrics: recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
– … many more
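The supervised learning steps (split the labelled data, train, tune a control parameter on the validation set, evaluate once on the test set) can be sketched as follows. The sketch makes several assumptions. The data is synthetic, with one numeric feature and a "yes"/"no" label. The algorithm is a k-nearest-neighbour classifier, swapped in for brevity instead of the SVM or decision tree named above. Its k plays the role of the control parameter. The 60/20/20 split and all function names are hypothetical choices.

```python
import random
from collections import Counter

def split_dataset(rows, train_pct=60, val_pct=20, seed=0):
    """Shuffle, then split into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of rows whose label the k-NN model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

# Synthetic labelled data (assumed): label is "yes" when x >= 10
data = [(x, "yes" if x >= 10 else "no") for x in range(20)]
train, val, test = split_dataset(data)

# The validation set tunes k; the test set is used once, at the end
best_k = max((1, 3, 5), key=lambda k: accuracy(train, val, k))
print("k =", best_k, "test accuracy =", accuracy(train, test, best_k))
```

Keeping the test set out of the choice of k means the final accuracy estimates performance on genuinely unseen data.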
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Naïve Bayes and Bayesian Belief Networks
• Neural Networks
• Support Vector Machines
• and more...
Illustrating Classification Task
Training Set (Learn Model → Model):
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test Set (Apply Model to predict the unknown classes):
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
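The learn-model / apply-model flow on this table can be sketched with a deliberately tiny rule learner in the spirit of Find-S. It learns the most specific rule that covers every "Yes" example. It is chosen only for brevity and is not one of the techniques listed above. Attrib3 is read as a number in thousands, and the function names are hypothetical. With only three positive examples such a rule is fragile, so treat this as an illustration of the flow, not a recommended classifier.

```python
# Training set from the table: (Tid, Attrib1, Attrib2, Attrib3 in K, Class)
TRAIN = [
    (1, "Yes", "Large", 125, "No"), (2, "No", "Medium", 100, "No"),
    (3, "No", "Small", 70, "No"), (4, "Yes", "Medium", 120, "No"),
    (5, "No", "Large", 95, "Yes"), (6, "No", "Medium", 60, "No"),
    (7, "Yes", "Large", 220, "No"), (8, "No", "Small", 85, "Yes"),
    (9, "No", "Medium", 75, "No"), (10, "No", "Small", 90, "Yes"),
]
# Test set: class labels unknown
TEST = [
    (11, "No", "Small", 55), (12, "Yes", "Medium", 80), (13, "Yes", "Large", 110),
    (14, "No", "Small", 95), (15, "No", "Large", 67),
]

def learn_rule(rows, positive="Yes"):
    """Learn model: most specific rule covering all positive rows.
    A categorical attribute keeps a value only if every positive shares it
    (None means "any value"); Attrib3 becomes the [min, max] range of the positives."""
    pos = [r for r in rows if r[4] == positive]
    a1 = {r[1] for r in pos}
    a2 = {r[2] for r in pos}
    return {
        "attrib1": next(iter(a1)) if len(a1) == 1 else None,
        "attrib2": next(iter(a2)) if len(a2) == 1 else None,
        "attrib3": (min(r[3] for r in pos), max(r[3] for r in pos)),
    }

def apply_rule(rule, attrib1, attrib2, attrib3):
    """Apply model: predict "Yes" only if every condition of the rule holds."""
    if rule["attrib1"] is not None and attrib1 != rule["attrib1"]:
        return "No"
    if rule["attrib2"] is not None and attrib2 != rule["attrib2"]:
        return "No"
    low, high = rule["attrib3"]
    return "Yes" if low <= attrib3 <= high else "No"

rule = learn_rule(TRAIN)
train_acc = sum(apply_rule(rule, *r[1:4]) == r[4] for r in TRAIN) / len(TRAIN)
predictions = {r[0]: apply_rule(rule, *r[1:]) for r in TEST}
print(rule)         # {'attrib1': 'No', 'attrib2': None, 'attrib3': (85, 95)}
print(train_acc)    # 1.0
print(predictions)  # {11: 'No', 12: 'No', 13: 'No', 14: 'Yes', 15: 'No'}
```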
Steps
Training: Training Images → Image Features; Image Features + Training Labels → Training → Learned model
Testing: Test Image → Image Features → Learned model → Prediction
Slide credit: D. Hoiem and L. Lazebnik
Classification example: Object recognition
https://ai.googleblog.com/2014/09/building-deeper-understanding-of-images.html
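The training and testing pipelines can be mirrored in a few lines. Both pipelines extract the same features, training stores a learned model, and testing applies it. Everything here is a hypothetical toy. The "images" are 3×3 grids of '#' and '.' pixels. The two hand-made features (dark pixels in the middle column and in the middle row) stand in for real image features. The learned model is a 1-nearest-neighbour lookup. Real object recognition systems use far richer, usually learned, features.

```python
import math

def image_features(image):
    """Hypothetical hand-made features for a 3x3 image of '#'/'.' pixels:
    (dark pixels in the middle column, dark pixels in the middle row)."""
    middle_column = sum(row[1] == "#" for row in image)
    middle_row = image[1].count("#")
    return (middle_column, middle_row)

def train(images, labels):
    """Training: extract features and keep them with their labels (1-NN model)."""
    return [(image_features(img), lab) for img, lab in zip(images, labels)]

def predict(model, image):
    """Testing: extract the same features, then apply the learned model."""
    feats = image_features(image)
    return min(model, key=lambda pair: math.dist(feats, pair[0]))[1]

training_images = [
    (".#.", ".#.", ".#."),
    (".#.", ".#.", "##."),
    ("...", "###", "..."),
    ("...", "###", "#.."),
]
training_labels = ["vertical", "vertical", "horizontal", "horizontal"]

model = train(training_images, training_labels)
test_image = (".#.", ".#.", "...")
print(predict(model, test_image))  # vertical
```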
Regression
• Regression algorithms are used when there is a relationship between the input variables and a continuous output variable.
• Regression is a supervised learning technique that helps find the correlation between variables and lets us predict a continuous output variable from one or more predictor variables.
• More specifically, regression analysis helps us understand how the value of the dependent variable changes with one independent variable when the other independent variables are held fixed.
• It is mainly used for prediction, forecasting, time series modeling, and determining cause-and-effect relationships between variables.
Regression (cont..)
• The relationship between variables in a linear regression model can be illustrated by predicting the salary of an employee on the basis of years of experience.
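The salary-vs-experience example can be worked through with ordinary least squares for a single predictor. This is a minimal sketch: the five (years, salary) pairs are invented for illustration, salary is in thousands, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: years of experience -> salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [31, 34, 41, 44, 50]

intercept, slope = fit_simple_linear_regression(years, salary)
print(f"salary = {intercept:.1f} + {slope:.1f} * years")             # salary = 25.6 + 4.8 * years
print(f"predicted salary for 6 years: {intercept + slope * 6:.1f}")  # predicted salary for 6 years: 54.4
```

The slope reads as "each extra year of experience adds about 4.8 thousand to the predicted salary" for this made-up data.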
Regression (cont..)
• Example: price of a used car
• x: car attributes; y: price
• $y = g(x \mid \theta)$, where $g(\cdot)$ is the model and $\theta$ its parameters
Regression Types
• Linear Regression
• Logistic Regression (despite its name, mainly used for classification)
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
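For the used-car example, one common choice for $g$ is a linear model. This is an assumption here; the slide does not fix the form of the model. Given a training set $\mathcal{X} = \{(\mathbf{x}^t, y^t)\}_{t=1}^{N}$, the parameters $\theta$ are then typically chosen to minimize the squared error:

```latex
g(\mathbf{x} \mid \theta) = \mathbf{w}^{\top}\mathbf{x} + w_0, \qquad \theta = \{\mathbf{w}, w_0\}

E(\theta \mid \mathcal{X}) = \sum_{t=1}^{N} \left[ y^{t} - g(\mathbf{x}^{t} \mid \theta) \right]^{2},
\qquad \theta^{*} = \arg\min_{\theta} E(\theta \mid \mathcal{X})
```

Other regression types in the list differ mainly in the form of $g$ (polynomial, tree, ensemble) or in extra penalty terms added to $E$ (as in ridge and lasso).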
Unsupervised Learning
• Unsupervised learning is a learning method in which a machine learns without any supervision.
• The input data has no labelled responses.
• The machine is trained on data that has not been labelled, classified, or categorized, and the algorithm must act on that data without any supervision.
• The goal of unsupervised learning is to restructure the input data into new features or into groups of objects with similar patterns.
• Methods:
– clustering: group data according to "distance"
– association: find frequent co-occurrences
Regression
Colorize B&W images automatically
https://tinyclouds.org/colorize/
Clustering
Crime prediction using k-means clustering
http://www.grdjournals.com/uploads/article/GRDJE/V02/I05/0176/GRDJEV02I050176.pdf
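Clustering by "distance" can be made concrete with plain k-means (Lloyd's algorithm). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal sketch under stated assumptions. The six 2-D points are invented and are not crime data. The initial centroids are passed in explicitly to keep the result deterministic, whereas practical implementations usually pick them randomly or with k-means++.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```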
Learning Associations
• Method for discovering interesting relationships between attributes
• Classic application: Market Basket Analysis
• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
• Basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services.
• Example: P(bread | milk) = 0.7
Association Analysis Applications
• Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items, the goal is to find groups of items that are frequently purchased together.
• Bioinformatics
• Medical Diagnosis
• Web mining
• Medical Treatments (each patient is represented as a transaction containing the ordered set of diseases)
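P(Y | X) for basket analysis can be estimated by counting. Among the transactions that contain X, it is the fraction that also contain Y; this quantity is usually called the rule's confidence. The twelve transactions below are hypothetical, made up so that the estimate reproduces the slide's example value P(bread | milk) = 0.7. The function names are also illustrative.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Estimate P(Y | X): among transactions containing X, the fraction also containing Y."""
    with_x = [t for t in transactions if x <= t]
    return sum(y <= t for t in with_x) / len(with_x)

transactions = [
    {"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "bread", "butter"},
    {"milk", "eggs"}, {"milk", "bread"}, {"bread", "butter"},
    {"milk", "bread", "eggs"}, {"milk"}, {"milk", "bread"},
    {"milk", "butter"}, {"eggs", "butter"}, {"milk", "bread"},
]
print(confidence(transactions, {"milk"}, {"bread"}))  # 0.7
print(support(transactions, {"milk", "bread"}))       # fraction of baskets with both items
```

Milk appears in 10 of the 12 baskets, and bread appears in 7 of those 10, hence 0.7.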
Reinforcement Learning
• This model is used for making a sequence of decisions.
• Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action.
• It is learning by interacting with the environment.
• It can be described as a trial-and-error method for finding the best outcome based on experience.
• The goal of the agent is to collect the most reward points, and in doing so it improves its performance.
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
Reinforcement learning
Learning to play Break Out
https://www.youtube.com/watch?v=V1eYniJ0Rnk
The goal of the robot is to get the reward, the diamond, while avoiding the hurdles (fire). The robot learns by trying the possible paths and then choosing the path that gives it the reward with the fewest hurdles. Each right step gives the robot a reward, and each wrong step subtracts from its reward. The total reward is calculated when the robot reaches the final reward, the diamond.
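The robot's trial-and-error learning can be sketched with tabular Q-learning. The slides do not name an algorithm; Q-learning is swapped in here as one standard choice. Several simplifications are assumptions. The maze is reduced to a 1-D corridor of five cells with the diamond in the last cell, and hurdles are omitted. Reaching the diamond gives +10, and every other step costs -1. The agent explores by choosing random actions.

```python
import random

N_STATES = 5        # corridor cells 0..4 (assumed layout); the diamond is in cell 4
GOAL = N_STATES - 1
ACTIONS = (-1, 1)   # move one cell left or right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # stay inside the corridor
    if next_state == GOAL:
        return next_state, 10.0, True               # reached the diamond
    return next_state, -1.0, False                  # small penalty for every other step

def q_learning(episodes=300, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-learning with purely random (trial-and-error) exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = ["right" if q[(s, 1)] > q[(s, -1)] else "left" for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should move right in every cell, straight toward the diamond. The per-step penalty is what makes the shortest path the most rewarding one.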