Chapter 1: Introduction
Notes on MLAPP
Wu Ziqing
School of Computer Science and Engineering
Nanyang Technological University
13/07/2018
Outline
1 What is Machine Learning
Definition
2 Types of Machine Learning
Supervised Machine Learning
Classification
Regression
Unsupervised Machine Learning
Clustering
Dimensionality Reduction
Graph Structure
Matrix Completion
Reinforcement Learning
3 Basic Concepts in Machine Learning
Parametric and Non-Parametric models
KNN classifier
The Curse of Dimensionality
Parametric Models for Classification and Regression
Linear Regression
Logistic Regression
Overfitting
Model Selection
No Free Lunch Theorem
What is Machine Learning
Definition of Machine Learning:
A set of methods that can automatically detect patterns in data,
and then use the uncovered patterns to predict future data, or to
perform other kinds of decision making under uncertainty.
Types of Machine Learning
Supervised/Predictive Machine Learning
Unsupervised/Descriptive Machine Learning
Reinforcement Learning
Supervised Machine Learning
Goal: to learn a mapping from inputs x to outputs y, given a labelled
input-output set:
D = {(x_i, y_i)}_{i=1}^N
D: also called the Training Set.
N: number of training samples.
x_i: usually a D-dimensional vector; can also be a complex structured
object: an image, a text document, a time series, and so on.
Also called the Attributes, Features, or Covariates.
y_i: if it is categorical, the problem is called Classification or
Pattern Recognition.
If it is real-valued, the problem is called Regression.
If it is categorical with some natural ordering, the problem is called
Ordinal Regression.
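As a minimal sketch (with made-up numbers), a training set can be represented
as a feature matrix X and a label vector y:

    import numpy as np

    # A made-up training set D = {(x_i, y_i)}: each row of X is one input
    # x_i (here a 2-dimensional vector), and y holds the corresponding labels.
    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])   # N = 3 training samples
    y = np.array([0, 1, 1])      # categorical y_i -> a classification problem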
Supervised Machine Learning
Classification
Types of classification according to output y ∈ {1, ..., C}:
Binary Classification: C = 2, there are only two possible labels of y,
e.g., 0 or 1.
Multiclass Classification: C > 2, there are more than two possible
labels of y.
Single Output Classification: the class labels are mutually exclusive,
thus each input x can be mapped to exactly one label of y.
Multi-label (Multiple Output) Classification: the class labels are not
mutually exclusive, thus each input x can be mapped to several labels.
(e.g., a person can be mapped to both tall and strong)
This is better viewed as predicting multiple related binary class labels.
(e.g., a person is tall/not tall, strong/not strong)
Supervised Machine Learning
Classification
Function Approximation: We can formalise classification as follows:
We assume that y = f(x) for some unknown function f, and our goal is
to estimate f given the training set D.
We can then predict the output ŷ = f̂(x) = argmax_{c=1,...,C} p(y = c | x, D),
which means finding the most probable class label given a novel input
x and the training set D. Making predictions on novel inputs this way
is called Generalisation.
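As a minimal illustration (the class posterior below is made up), the
prediction is just an argmax over the C class probabilities:

    import numpy as np

    # Assumed posterior p(y = c | x, D) over C = 3 classes for a novel input x.
    probs = np.array([0.1, 0.7, 0.2])
    y_hat = np.argmax(probs)   # MAP estimate: the most probable class label
    print(y_hat)               # -> 1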
Supervised Machine Learning
Classification
Some real-world applications of classification:
Document classification (e.g., email spam filtering)
Image classification (e.g., handwriting recognition)
Object detection (e.g., face detection)
Supervised Machine Learning
Regression
Estimate f(x) to predict a continuous output ŷ given an input x and a
training set D.
f̂(x) can be a model such as linear regression, polynomial regression, and
so on.
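As a minimal sketch on synthetic data, a linear fit can be obtained by least
squares (np.polyfit here):

    import numpy as np

    # Fit y ≈ w0 + w1*x to noisy synthetic data, then predict at a new input.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

    w1, w0 = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
    y_hat = w0 + w1 * 5.0              # continuous prediction at x = 5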
Unsupervised Machine Learning
Goal: Find interesting patterns given only the input data (also known as
knowledge discovery):
D = {x_i}_{i=1}^N
The task can be formalised as (unconditional) density estimation, i.e.,
the model should be in the form of:
p(x_i | θ)
Unsupervised Machine Learning
Clustering
To cluster a set of data into groups:
1 find the number of clusters K, where K* = argmax_K p(K | D)
2 infer which cluster each data point x_i belongs to:
z_i = argmax_k p(z_i = k | x_i, D)
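As a concrete sketch using scikit-learn's GaussianMixture, with the BIC score
as a stand-in for the argmax_K p(K | D) model-selection step:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Two synthetic Gaussian blobs; choose K by the lowest BIC score, then
    # assign each point to its most probable cluster.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    models = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(1, 6)]
    best = min(models, key=lambda m: m.bic(X))  # model-selection step
    z = best.predict(X)                         # z_i = argmax_k p(k | x_i, D)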
Unsupervised Machine Learning
Dimensionality Reduction
For high dimensional data, the variability of the data may be captured by
only a few latent factors. We can perform dimensionality reduction by
projecting the high dimensional data onto a lower dimensional subspace that
captures the essence of the data.
Advantages:
better prediction accuracy
faster nearest neighbour searches
easier visualisation of high dimensional data
The most common approach to dimensionality reduction is called
Principal Component Analysis (PCA).
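A minimal sketch using scikit-learn's PCA on random data:

    import numpy as np
    from sklearn.decomposition import PCA

    # Project 10-dimensional data onto the 2-dimensional subspace that
    # captures the most variance.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))

    pca = PCA(n_components=2)
    Z = pca.fit_transform(X)              # (100, 2) projection of the data
    print(pca.explained_variance_ratio_)  # variance captured per component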
Unsupervised Machine Learning
Graph Structure
We want to learn how the variables in a data set are correlated,
represented by a graph structure, by computing:
Ĝ = argmax_G p(G | D)
In the graph, each variable is represented by a node and each edge
represents a direct dependence between two variables.
This Sparse Graphical Model can help to:
Gain knowledge by interpreting the graph structure.
Get better joint probability estimators to model correlations and make
predictions.
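As one concrete (assumed) instance of this idea, scikit-learn's
GraphicalLasso estimates a sparse Gaussian graphical model; zeros in the
estimated precision matrix correspond to missing edges:

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    # Fit a sparse precision matrix; its non-zero off-diagonal entries give
    # the edges of the dependence graph. (A common estimator, not the
    # general argmax_G p(G | D) computation itself.)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))

    model = GraphicalLasso(alpha=0.1).fit(X)
    edges = np.abs(model.precision_) > 1e-4   # adjacency pattern of the graph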
Unsupervised Machine Learning
Matrix Completion
Sometimes the design matrix will have missing values, encoded as NaN (Not
a Number). Matrix Completion (Imputation) can infer plausible values for
the missing entries. Matrix Completion can be applied to:
Image inpainting: to denoise/complete an image.
Collaborative filtering: to complete the missing entries in a
user-item rating matrix.
Market Basket Analysis: in a binary matrix, predict which cells will be
turned on given that a few have been turned on (i.e., predict which items
will be purchased together).
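A minimal sketch of one simple imputation scheme (iterative low-rank SVD
approximation; not the book's specific method):

    import numpy as np

    # Fill missing entries with column means, then alternate between a rank-r
    # SVD approximation and restoring the observed entries.
    def complete(X, rank=2, iters=50):
        mask = np.isnan(X)
        filled = np.where(mask, np.nanmean(X, axis=0), X)  # initial guess
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            filled = np.where(mask, low_rank, X)  # keep observed values fixed
        return filled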
Types of Machine Learning
Reinforcement Learning
Goal: Learn how to behave given occasional reward/punishment signals.
Parametric and Non-Parametric Models
Parametric Model: the model has a fixed number of parameters.
Advantage: faster to use
Disadvantage: makes stronger assumptions about the nature of the data
distribution
Non-Parametric Model: its number of parameters grows with the
amount of training data.
Advantage: more flexible
Disadvantage: computationally intractable for large datasets
Parametric and Non-Parametric models
KNN classifier
The K Nearest Neighbour (KNN) classifier is an example of a
non-parametric classifier. It works as follows:
1 look at the K nearest neighbours of a test point x
2 calculate the fraction of each class among these neighbours:
p(y = c | x, D, K) = (1/K) Σ_{i ∈ N_K(x,D)} I(y_i = c)
where N_K(x, D) denotes the (indices of the) K nearest points to x in
D, and I(e) is the indicator function, which equals 1 iff e is true.
3 assign to x the class c with the highest fraction
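A minimal from-scratch sketch of this procedure (Euclidean distance assumed):

    import numpy as np

    # Estimate p(y = c | x, D, K) as the fraction of the K nearest
    # neighbours with label c, and predict the argmax over classes.
    def knn_predict(X_train, y_train, x, K=3):
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to each x_i
        neighbours = np.argsort(dists)[:K]           # indices in N_K(x, D)
        classes, counts = np.unique(y_train[neighbours], return_counts=True)
        return classes[np.argmax(counts)]            # most frequent class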
Parametric and Non-Parametric models
The Curse of Dimensionality
Curse of Dimensionality: with high dimensional data, some algorithms
perform poorly.
For example, in D-dimensional KNN, for a hypercube¹ to contain a
fraction f of the total data as neighbours, the expected edge length is
e_D(f) = f^(1/D). For a 10-dimensional dataset, even to contain just 1% of
the data, the cube needs to extend to 63% of the data's range along each
dimension.
¹ an n-dimensional shape whose edges are axis-aligned, mutually
perpendicular, and of equal length
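The formula is easy to check numerically:

    # Edge length e_D(f) = f**(1/D) a hypercube needs to cover a fraction f
    # of uniformly distributed data, for several dimensions D.
    f = 0.01
    for D in (1, 3, 10, 100):
        print(D, f ** (1 / D))   # D = 10 gives ~0.63, i.e. 63% per dimension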
Parametric Models for Classification and Regression
Linear Regression
Linear regression asserts that the response is a linear function of the
input:
y(x) = wᵀx + ε = Σ_{j=1}^{D} w_j x_j + ε, where ε ~ N(µ, σ²)
Or it can be written as:
p(y | x, θ) = N(y | µ(x), σ²(x)),
where µ(x) = wᵀx, σ²(x) = σ², and θ = (w, σ²)
We can also replace x with a non-linear function of x, Φ(x), to obtain
Polynomial Regression:
p(y | x, θ) = N(y | wᵀΦ(x), σ²(x))
Replacing the input with a function of the input is known as basis
function expansion.
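A minimal sketch of basis function expansion: build polynomial features
Φ(x) = [1, x, x², x³] and fit w by ordinary least squares:

    import numpy as np

    # Polynomial regression via an explicit design matrix of basis functions.
    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 100)
    y = np.sin(x) + rng.normal(scale=0.2, size=x.shape)  # noisy target

    Phi = np.vander(x, 4, increasing=True)      # columns: x^0, x^1, x^2, x^3
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None) # least-squares weights
    y_hat = Phi @ w                             # fitted values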
Parametric Models for Classification and Regression
Logistic Regression
To adapt linear regression to (binary) classification, we perform the
following two steps:
1 change the Gaussian distribution into a Bernoulli distribution, which
is more suitable for a binary response:
p(y | x, w) = Ber(y | µ(x)), where µ(x) = p(y = 1 | x)
2 pass the linear combination of the inputs through a sigmoid function,
to ensure 0 ≤ µ(x) ≤ 1:
µ(x) = sigm(wᵀx), where sigm(η) = 1 / (1 + e^(−η))
If we set a threshold of 0.5,
ŷ(x) = 1 ⇐⇒ p(y = 1 | x) > 0.5
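A minimal sketch of this decision rule, with made-up (not fitted) weights w:

    import numpy as np

    # Squash the linear combination wᵀx through the sigmoid, then threshold.
    def sigm(eta):
        return 1.0 / (1.0 + np.exp(-eta))

    w = np.array([2.0, -1.0])   # assumed weights, not learned from data
    x = np.array([0.5, 0.2])
    mu = sigm(w @ x)            # µ(x) = p(y = 1 | x)
    y_hat = int(mu > 0.5)       # threshold at 0.5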
Parametric Models for Classification and Regression
Overfitting
If we try to model every minor variation in the data, we are likely to
model the noise as well as the true signal. This is called Overfitting.
Although an overfit model may achieve perfect results on the training set,
it may predict poorly on novel data.
Parametric Models for Classification and Regression
Model Selection
To determine whether the model is so complex that it overfits, we can
compute the Misclassification Rate, which is the proportion of incorrect
predictions:
err(f, D) = (1/N) Σ_{i=1}^{N} I(f(x_i) ≠ y_i)
What we are really interested in is the Generalisation Error, the expected
misclassification rate over future data. Since we do not have future data,
we can divide the available data into a training set and a
validation set.
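In code, the misclassification rate is a one-liner:

    import numpy as np

    # err(f, D): fraction of predictions that disagree with the true labels.
    def err(y_pred, y_true):
        return np.mean(y_pred != y_true)

    print(err(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))  # -> 0.25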
Parametric Models for Classification and Regression
Model Selection (Cont.)
If the validation set is too small to give a reliable estimate, we can use
the technique of Cross Validation (CV):
1 Divide the data into K folds.
2 For each fold k ∈ {1, 2, ..., K}, train the model on the remaining
folds and test against the k-th fold.
3 Do the training and testing in a round-robin fashion, then average
the K error estimates.
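A minimal from-scratch sketch of K-fold CV; fit and predict are assumed
user-supplied callables for training a model and making predictions:

    import numpy as np

    # Rotate which fold is held out, and average the error over the K rounds.
    def cross_val_err(fit, predict, X, y, K=5):
        folds = np.array_split(np.random.permutation(len(X)), K)
        errs = []
        for k in range(K):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            model = fit(X[train], y[train])
            errs.append(np.mean(predict(model, X[test]) != y[test]))
        return float(np.mean(errs))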
Parametric Models for Classification and Regression
No Free Lunch Theorem
No Free Lunch Theorem: There is no universally best model.
“All models are wrong, but some are useful.” — George Box