Introduction

The document provides an introduction to Machine Learning (ML), defining it as a field that enables computers to learn from data without being explicitly programmed. It discusses the differences between traditional programming and ML, outlines when ML is applicable, and presents various problems that require ML solutions, such as handwriting recognition and automated driving. Additionally, it highlights the essential components of a machine learning algorithm, including datasets and model building.


Introduction to Machine Learning¹

MA325 - Machine Learning

A. Senthil Thilak

Department of Mathematical and Computational Sciences


National Institute of Technology Karnataka

¹ Topics from reference (Mitchell, 1997): Chapter 1
A. Senthil Thilak (NITK) Introduction to Machine Learning 1 / 44
What is Machine Learning (ML)?

• ARTHUR SAMUEL (1959) - A field of study that gives computers the
ability to learn without being explicitly programmed.
• TOM M. MITCHELL (1998) - The study of algorithms that
  • improve their performance P
  • at some task T
  • with experience E
A well-defined learning task is given by <P, T, E>.
• HERBERT ALEXANDER SIMON (Father of AI) (1950s) - A process
by which a system improves its performance from experience, (or) ML is
concerned with computer programs that automatically improve their
performance through experience.

A. Senthil Thilak (NITK) Introduction to Machine Learning 2 / 44


What is Machine Learning (ML)?

• Precisely, ML is the branch of Computer Science that deals with the
extraction of knowledge from data.
• ML algorithms are those that automate the decision-making process from
known examples.
• At the intersection of Statistics, Artificial Intelligence and Computer
Science/Programming.
• Also known as Predictive Analytics or Statistical Learning, and is part
of the data-driven research domain.
• Applications have become ubiquitous in everyday life
- From automatic recommendations of movies to watch, to what food to
order or which products to buy, to personalized online radio and
recognizing your friends in your photos, etc.

A. Senthil Thilak (NITK) Introduction to Machine Learning 3 / 44


Rule-based/Traditional Programming vs. Machine Learning

• Rule-based/Traditional Programming:

  Data + Program → Computer → Output

• Machine Learning:

  Data + Output → Computer → Program

Slide credit: Pedro Domingos

A. Senthil Thilak (NITK) Introduction to Machine Learning 4 / 44
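The contrast between the two diagrams can be sketched in code. A minimal, self-contained Python sketch (the task, the data and all names are invented for illustration): the traditional program applies a rule a human wrote, while the learning version derives its rule from labeled (data, output) pairs.

```python
# Toy illustration of the two diagrams above (names and data invented).
# Task: label a number as 1 ("big") or 0 ("small").

# Rule-based/traditional programming: a human writes the rule.
def rule_based(x):
    return 1 if x >= 10 else 0  # threshold hand-coded by the programmer

# Machine learning: the "program" (here, the threshold) is derived from
# (data, output) pairs instead of being written by hand.
def learn_threshold(examples):
    """Return the threshold that best separates the labeled examples."""
    best_t, best_acc = None, -1.0
    for t, _ in examples:
        acc = sum((1 if x >= t else 0) == y for x, y in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [(2, 0), (5, 0), (8, 0), (11, 1), (14, 1), (20, 1)]
t = learn_threshold(data)            # learned from data, not hand-coded
learned = lambda x: 1 if x >= t else 0
```

Both functions compute the same kind of output; the difference is only in where the rule comes from, which is exactly what the two diagrams depict.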


When do we use ML?

• Human expertise does not exist (Space Navigation)
• Humans cannot explain their expertise (Speech Recognition)
• Models require customization (Personalized Medicine)
• Models depend on huge amounts of data (Genomics)

Learning is not always necessary/useful!!! - There is no need to “learn” to
calculate payroll.

A. Senthil Thilak (NITK) Introduction to Machine Learning 5 / 44


Problems that require ML

A classic example of a task that requires machine learning:
• Reading hand-written codes: It is very hard to say what makes a 2!
(Slide credit: Geoffrey Hinton)
• Recognizing a face from a digital image: The main problem is that the
way the pixels are perceived by a computer is different from how humans
perceive a face!


Few more Problems that require ML

• Recognizing patterns
- Facial identities or facial expressions
- Handwritten or spoken words
- Medical images
• Generating patterns
- Generating images or motion sequences
• Recognizing anomalies
- Suspicious/Unusual credit card transactions
- Unusual patterns of sensor readings in a nuclear power plant
• Prediction
- Future stock prices or currency exchange rates
- Weather forecasting

A. Senthil Thilak (NITK) Introduction to Machine Learning 7 / 44


Learning Task - Some well-posed Learning problems:
Learning: Improve task T with respect to performance metric P with
experience E
• The Checkers Learning Problem:
  • T: Playing checkers
  • P: % of games won against an opponent
  • E: Playing practice games against itself
• Handwriting Recognition Learning Problem:
  • T: Recognizing and classifying handwritten words within images
  • P: % of words correctly classified
  • E: Database of handwritten words with given human classifications
• Robotic/Automated Driving Learning Problem:
  • T: Driving on public four-lane highways using vision sensors
  • P: Average distance travelled before a human-judged error
  • E: A sequence of images and steering commands recorded while observing a
    human driver
• Spam Filtering Learning Problem:
  • T: Categorize email messages as spam or legitimate
  • P: % of email messages correctly classified
  • E: Database of emails with human-given labels as spam/legitimate

A. Senthil Thilak (NITK) Introduction to Machine Learning 8 / 44
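Mitchell's triple lends itself to a small data structure. A minimal Python sketch (the class name and fields are invented for illustration), instantiated with the spam-filtering example from the slide:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LearningTask:
    """A well-posed learning task as Mitchell's <T, P, E> triple."""
    task: str         # T: the task to be performed
    performance: str  # P: the metric used to measure success
    experience: str   # E: the experience the system learns from

# The spam-filtering problem from the slide, expressed as a triple.
spam_filter = LearningTask(
    task="Categorize email messages as spam or legitimate",
    performance="% of email messages correctly classified",
    experience="Database of emails with human-given labels as spam/legitimate",
)
```

Writing a problem down this way is a quick check that it is well posed: if any of the three fields cannot be filled in concretely, the learning task is not yet well defined.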
Building blocks of a Machine Learning Algorithm

1 Dataset (Structured/Unstructured) (PRIMARY)
  • Data scrubbing (if needed) - The process of refining the dataset to make it
    workable by modifying and/or removing incomplete, incorrectly
    formatted, irrelevant or duplicate data.
  • Training dataset
  • Testing dataset
2 Building a Learning Model/Algorithm (Multiple approaches exist!!!)
3 Performance/Error analysis
4 Error Minimization & Model generalization

A. Senthil Thilak (NITK) Introduction to Machine Learning 9 / 44


Anatomy of a Dataset

• Constitutes the input variables, called Features, needed for prediction and,
possibly, known outputs, called Labels.
• May be Structured or Unstructured.
• Structured dataset - Data is defined and labeled in a table with a schema,
(or) a tabular dataset containing data organized into rows and columns.
• Column - Feature (also known as a Variable, a Dimension or an
Attribute) - Represented as a Vector.
• Multiple columns/vectors together are represented as matrices.
• Row - A single observation of a given feature/variable set (also known as
a case or value).

A. Senthil Thilak (NITK) Introduction to Machine Learning 10 / 44


Anatomy of a Dataset (contd...)

A tabular (table-based) dataset contains data organized in rows and columns.
Each column is a feature; a feature is also known as a variable, a dimension
or an attribute, but they all mean the same thing. Each individual row
represents a single observation of a given feature/variable; rows are
sometimes referred to as a case or value.

Figure: A tabular dataset

Each column is known as a vector. Vectors store your X and y values, and
multiple vectors (columns) are commonly referred to as matrices. In the case
of supervised learning, y will already exist in your dataset and is used to
identify patterns in relation to the independent variables (X). The y values
are commonly expressed in the final column.

Figure: A labeled dataset (the y value is often, but not always, expressed in the far-right column)

A. Senthil Thilak (NITK) Introduction to Machine Learning 11 / 44
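The X/y anatomy described above can be shown with a tiny table. A minimal sketch using pandas; the dataset and column names (sqft, bedrooms, price) are invented for illustration:

```python
import pandas as pd

# A tiny structured dataset; column names and values are invented.
# Each column is a feature vector, each row one observation.
df = pd.DataFrame({
    "sqft":     [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "price":    [150_000, 210_000, 260_000, 340_000],  # label y, final column
})

X = df[["sqft", "bedrooms"]]  # feature matrix: multiple column vectors
y = df["price"]               # label vector used in supervised learning
```

Here `X` is a 4×2 matrix (two feature vectors over four observations) and `y` is the length-4 label vector taken from the final column, matching the layout in the figures above.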
Data Scrubbing/Cleaning

• The process of refining the dataset to make it workable by modifying
and/or removing incomplete, incorrectly formatted, irrelevant or
duplicate data.
• May involve Text → Numeric data conversion & redesigning the
features.
• May be laborious & time consuming.

Data Scrubbing involves
• Feature Selection - Identify the features most relevant to the hypothesis:
  • Remove irrelevant features/variables.
  • Merge multiple features into one, if possible (Downside - leads to less
    information about relationships between specific features).
• Row Compression

A. Senthil Thilak (NITK) Introduction to Machine Learning 12 / 44


Data Scrubbing (contd...)

• Row Compression - Reduce the number of rows and thereby compress the
total number of data points; this can involve merging two or more rows
into one (e.g., in a dataset of animals, the rows “Tiger” and “Lion” can be
merged and renamed “Carnivore”).
• One-hot Encoding - Convert text-based features into numerical values by
transforming features into binary form - 1/0 or T/F - May sometimes
lead to additional new features.
• Handling missing data through mode, median and the like.

A. Senthil Thilak (NITK) Introduction to Machine Learning 13 / 44
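Two of the scrubbing steps above can be sketched with pandas. A minimal example, assuming pandas is available; the animal dataset and its values are invented for illustration:

```python
import pandas as pd

# Toy dataset with a text feature and a missing value (values invented).
df = pd.DataFrame({
    "animal": ["Tiger", "Lion", "Zebra"],
    "speed":  [60.0, 50.0, None],   # a missing entry to scrub
})

# One-hot encoding: the text feature becomes binary (1/0) columns,
# which may add new features to the dataset.
encoded = pd.get_dummies(df, columns=["animal"])

# Handle the missing value with the column median.
encoded["speed"] = encoded["speed"].fillna(encoded["speed"].median())
```

Note that encoding the single `animal` column produced three binary columns (one per category), which is the “additional new features” downside mentioned above.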
Setting up the data

• Next to data cleaning/scrubbing, split the data into two segments for
training and testing.
• Do not test the model with the same data used for training!!!
• Split by rows, not by columns.
• Preferred ratio - 70:30 or 80:20, i.e., the training data should account for
70 to 80 percent of the rows in the dataset, and the remaining 20 to 30
percent of the rows form the test data.

Figure: Training and test partitioning of the dataset (70/30)

A. Senthil Thilak (NITK) Introduction to Machine Learning 14 / 44
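The row-wise split above can be done in one call. A minimal sketch, assuming scikit-learn is installed; the toy arrays are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 50 rows, 2 feature columns (values invented).
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# shuffle=True randomizes the rows before splitting (avoiding ordering
# bias); test_size=0.3 gives the 70:30 row-wise partition described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)
```

Both partitions keep all feature columns; only the rows are divided, which is exactly the “split by rows, not by columns” rule.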
Setting up the data (contd...)

• Before splitting, it is important to randomize all rows in the dataset to avoid bias in the
model; the rows might have been arranged sequentially depending on the time the data was
collected or some other factor. Non-randomized data may lead to missing important variance.
• Cross-validation
  • Though splitting data can be effective in developing models from existing
    data, a question mark remains as to whether the model will work on new
    data.
  • If the existing dataset is too small to construct an accurate model, or if the
    training/test partition of the data is not appropriate, this can lead to poor
    estimations of performance.
  • An effective solution is Cross-validation!!!

A. Senthil Thilak (NITK) Introduction to Machine Learning 15 / 44


Setting up the data (contd...)

• Before splitting, it is important to randomize all rows in the dataset to avoid bias in the model; the data might have been arranged sequentially depending on the time it was collected or some other factor. Non-randomized data may lead to missing important variance.
• Cross-validation
• Though splitting data can be effective in developing models from existing data, a question mark remains as to whether the model will work on new data.
• If the existing dataset is too small to construct an accurate model, or if the training/test partition of the data is not appropriate, this can lead to poor estimates of performance.
• An effective solution is cross validation!!! It can be done by two primary methods - exhaustive cross validation & k-fold validation.

A. Senthil Thilak (NITK) Introduction to Machine Learning 15 / 44
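To see why exhaustive cross validation is rarely practical, count the splits it must evaluate; the sample sizes below are illustrative:

```python
from math import comb

# Exhaustive cross validation evaluates every way of choosing the test set.
# With n samples and a fixed test-set size p, that is comb(n, p) splits.
n, p = 10, 3
exhaustive_splits = comb(n, p)    # comb(10, 3) = 120 splits for a tiny dataset

# k-fold validation, by contrast, evaluates only k splits (e.g. k = 5).
k_fold_splits = 5
```

The combinatorial count grows explosively with n, which is why the non-exhaustive k-fold method is the common choice in practice.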


Setting up the data (contd...)

• Exhaustive cross validation:
• Involves finding and testing all possible combinations to divide the original sample into a training set and a test set.
• k-fold validation:
• Involves splitting data into k assigned buckets and reserving one of those buckets to test the training model at each round.

A. Senthil Thilak (NITK) Introduction to Machine Learning 16 / 44


Figure 2: k-fold validation
Setting up the data (contd...)

• Data are first randomly assigned to k equal-sized buckets. One bucket is then reserved as the test bucket and is used to measure and evaluate the performance of the model trained on the remaining (k − 1) buckets.
• This is repeated k times ("folds"). Each time, a different bucket is reserved to test the training model generated by the other buckets.
• The process continues until every bucket has been used as both a training and a test bucket. The results are then aggregated/combined to formulate a single model.
• By using all available data for both training and testing purposes, the k-fold validation technique dramatically minimizes potential error (such as overfitting) caused by relying on a fixed split of training and test data.

A. Senthil Thilak (NITK) Introduction to Machine Learning 17 / 44
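The k-fold procedure above can be sketched with NumPy alone (the function name and seed are illustrative):

```python
import numpy as np

def k_fold_splits(n_samples, k, seed=0):
    """Randomly assign samples to k buckets; each round reserves one bucket for testing."""
    rng = np.random.default_rng(seed)
    buckets = np.array_split(rng.permutation(n_samples), k)  # k roughly equal buckets
    for i in range(k):
        test_idx = buckets[i]                                # the reserved test bucket
        train_idx = np.concatenate(
            [b for j, b in enumerate(buckets) if j != i])    # the other (k - 1) buckets
        yield train_idx, test_idx

folds = list(k_fold_splits(n_samples=10, k=5))
```

Across the k rounds every sample serves exactly once as test data; in practice the per-fold scores would then be averaged into a single performance estimate.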


How much data is needed?

• In general, machine learning works best when the training dataset includes a full range of feature combinations.
• For instance, we need to know the investment capability of data scientists with a university degree, 5+ years of professional experience, a monthly/annual salary and other sources of income who don't have children, as well as data scientists with the same profile who do have children.
• The more combinations available, the more effective the model will be at capturing how each attribute affects y (the data scientist's investment capability).
• At a minimum, a machine learning model should typically have ten times as many data points as the total number of features.
• The other point to remember is that more relevant data is usually better!!!

A. Senthil Thilak (NITK) Introduction to Machine Learning 18 / 44
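The ten-times rule of thumb can be written as a quick sanity check (the helper name is ours, not a standard API):

```python
def meets_ten_times_rule(n_rows, n_features, factor=10):
    """Heuristic: a dataset should have at least `factor` times as many
    data points (rows) as it has features."""
    return n_rows >= factor * n_features

# A model with 6 features would want at least 60 rows under this heuristic.
ok = meets_ten_times_rule(n_rows=60, n_features=6)
```

This is only a lower bound; richer feature interactions generally call for far more data than the heuristic suggests.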


Handling Issues in Model fitting

• Model fitting - The process of adjusting the model parameters to effectively capture the patterns and relationships in the input data.
• Issues in model fitting - Noisy data, Overfitting and Underfitting

A. Senthil Thilak (NITK) Introduction to Machine Learning 19 / 44


Handling noise in Machine learning

Noise in ML - Random or irrelevant data resulting in unpredictable situations that differ from what is expected.

Causes of Noise:
- Inaccurate data collection - Errors during data collection, such as malfunctioning sensors or human error during data entry.
- Measurement errors, such as inaccurate instruments or environmental conditions.
- Inherent variability resulting from either natural fluctuations or unforeseen events.
- Inappropriate data preprocessing operations like normalization or transformation may unintentionally add noise.
- Inaccurate data point labeling or annotation can introduce noise and affect the learning process.

Is noise always bad? No, as it represents the unpredictability of real-world scenarios. However, too much noise might obscure important patterns and reduce model performance. Noise can sometimes add diversity, which improves the robustness and generalization of the model.

A. Senthil Thilak (NITK) Introduction to Machine Learning 20 / 44


Types of noise:

- Feature Noise: Refers to superfluous (redundant) or irrelevant features in the dataset, causing confusion and disruption in the learning process.
- Systematic Noise: Recurring biases or mistakes in measurement or data collection procedures, resulting in biased or incorrect data.
- Random Noise: Unpredictable fluctuations in data brought in by variables such as measurement errors or ambient circumstances.
- Background Noise: Information in the data that is unnecessary or irrelevant and could distract the model from learning.

A. Senthil Thilak (NITK) Introduction to Machine Learning 21 / 44


Ways to Handle noise

Handling noise is important, as it may result in unreliable models and predictions.
• Data preprocessing: Includes methods such as data cleaning, normalization, and outlier elimination, which improve data quality and lessen noise from errors or inconsistencies.
• Mathematical techniques: Like using Fourier transforms to transform signals from the time or spatial domain to the frequency domain. This helps to identify and filter out noise by representing the signal as a combination of different frequencies.
• Specific learning strategies: Like constructive learning, which involves training a model to distinguish between clean and noisy data instances. This requires labeled data where the noise level is known. The model learns to classify instances as either clean or noisy, allowing the removal of noisy data points from the dataset.
• Encoding techniques: Using autoencoders, which consist of an encoder and a decoder. The encoder compresses the input data into a lower-dimensional representation, while the decoder reconstructs the original data from this representation. Autoencoders can be trained to reconstruct clean signals while effectively filtering out noise during the reconstruction process.
• Principal Component Analysis (PCA): A dimensionality reduction technique that identifies the principal components of a dataset - orthogonal vectors that capture the maximum variance in the data. By projecting the data onto a reduced set of principal components, PCA can help reduce noise by focusing on the most informative dimensions of the data while discarding noise-related dimensions.
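A minimal sketch of the Fourier-transform idea on synthetic data (the signal, noise level, and frequency cutoff are all illustrative): the noisy signal is moved to the frequency domain, high-frequency bins are zeroed (a crude low-pass filter), and the signal is reconstructed.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)                  # slow 3 Hz signal of interest
noisy = clean + 0.5 * rng.standard_normal(t.size)  # additive measurement noise

spectrum = np.fft.rfft(noisy)                      # time domain -> frequency domain
cutoff = 10
spectrum[cutoff:] = 0                              # discard high-frequency (mostly noise) bins
denoised = np.fft.irfft(spectrum, n=t.size)        # frequency domain -> time domain

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

Because the 3 Hz signal lies entirely below the cutoff while the noise spreads across all frequencies, the reconstruction error drops sharply; a real application would choose the cutoff from the known signal bandwidth.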
Noise compensation techniques
Dealing with noisy data are crucial in machine learning to improve model robustness and
generalization performance.
Two common approaches for compensating for noisy data are cross-validation and ensemble
models.
• Cross-validation: By training on different subsets of data, cross-validation helps in
reducing the impact of noise in the data. It also aids in avoiding overfitting by providing
a more accurate estimate of the model’s performance.
• Ensemble models: Involves combining multiple individual models to improve
predictive performance compared to using a single model. Ensemble models work by
aggregating the predictions of multiple base models, such as decision trees, neural
networks, or other machine learning algorithms. Ensemble methods are particularly
effective when individual models may be sensitive to noise or may overfit the data. They
help in improving robustness and generalization performance by reducing the variance of
the predictions.
• Data augmentation: A technique used to increase the diversity of a dataset without
actually collecting new data. It helps to create new, modified versions of data that help
the model learn to generalize better.

A. Senthil Thilak (NITK) Introduction to Machine Learning 23 / 44


Noise compensation techniques
Dealing with noisy data is crucial in machine learning to improve model robustness and
generalization performance.
Common approaches for compensating for noisy data are cross-validation, ensemble models,
and data augmentation.
• Cross-validation: By training on different subsets of data, cross-validation helps in
reducing the impact of noise in the data. It also aids in avoiding overfitting by providing
a more accurate estimate of the model’s performance.
• Ensemble models: Involves combining multiple individual models to improve
predictive performance compared to using a single model. Ensemble models work by
aggregating the predictions of multiple base models, such as decision trees, neural
networks, or other machine learning algorithms. Ensemble methods are particularly
effective when individual models may be sensitive to noise or may overfit the data. They
help in improving robustness and generalization performance by reducing the variance of
the predictions.
• Data augmentation: A technique used to increase the diversity of a dataset without
actually collecting new data. It helps to create new, modified versions of data that help
the model learn to generalize better. Models trained with augmented data are more
robust to variations and distortions in real-world data.
A. Senthil Thilak (NITK) Introduction to Machine Learning 23 / 44
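A minimal sketch of the first two techniques (assuming scikit-learn is available; the dataset is synthetic): cross-validation averages performance over different train/test splits, and a bagging ensemble aggregates many decision trees to reduce variance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data with deliberately noisy labels (flip_y).
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
ensemble = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                             n_estimators=50, random_state=0)

# 5-fold cross-validation: each score comes from a different train/test split,
# so the averaged estimate is less sensitive to noise in any one subset.
tree_acc = cross_val_score(tree, X, y, cv=5).mean()
ensemble_acc = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}  bagging ensemble: {ensemble_acc:.3f}")
```

On noisy data the ensemble typically scores at least as well as the single tree, since averaging many base models reduces the variance of the predictions.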
Problems of Underfitting and Overfitting

We can determine whether a predictive model is underfitting (↑ Training error) or
overfitting (↑ Prediction/Testing error) the training data by looking at
the prediction error on the training data and the evaluation data.

Figure: (a) Underfitting, (b) Overfitting, (c) Appropriate Fitting

A. Senthil Thilak (NITK) Introduction to Machine Learning 24 / 44


Underfitting

• Refers to the phenomenon in which a model performs poorly on the training data.
• Poor performance on the training data may be because the model is too simple (the input
features are not expressive enough) to describe the target well.
• It poorly represents the complete picture of the predominant data pattern. It also arises
when the training data set is too small or not representative of the population data.
• An underfit model also fails to satisfactorily predict new data points; predictions on
unseen data are weak, since unseen instances are unfamiliar relative to the training
data set.
• Handling an underfit model: Performance can be improved by increasing model
flexibility. To increase model flexibility, we can try the following:
• Add new domain-specific features and more feature (Cartesian) products, and
change the types of feature processing used.
• Decrease the amount of regularization used.

A. Senthil Thilak (NITK) Introduction to Machine Learning 25 / 44
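The flexibility point can be illustrated with a NumPy-only sketch (the data-generating process is made up): a straight line underfits quadratic data, and adding a more expressive feature (x²) brings the training error down.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 0.3, 200)   # true relation is quadratic

def train_mse(X, y):
    # Least-squares fit, then mean squared error on the training set itself.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

X_lin = np.column_stack([np.ones_like(x), x])          # too simple: underfits
X_quad = np.column_stack([np.ones_like(x), x, x**2])   # more flexible features

mse_lin = train_mse(X_lin, y)
mse_quad = train_mse(X_quad, y)
print(mse_lin, mse_quad)  # training error is high for the underfit linear model
```

High training error with the linear features is the signature of underfitting; the richer feature set matches the predominant pattern.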


Overfitting I

• Refers to the phenomenon in which a model performs well on the training data, but
poorly on unseen data.

Overfitting II

• Handling an overfit model: Performance can be improved by reducing model
flexibility. To reduce model flexibility, we can try the following:
• Feature selection: Use fewer feature combinations and decrease the number of
numeric attributes.
• Increase the amount of regularization used.
• Overfitting is more predominant in cases where the loss function is learnt from a
complex statistical machine learning model with more flexibility. So, introduce
constraints in the loss function (like non-parametric statistical learning models) to
improve the learning process.
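As an illustration of the regularization bullet, a NumPy-only sketch (synthetic data; the closed-form ridge solution stands in for whatever penalty a given model uses): a degree-12 polynomial is fit to 15 noisy points, with the penalty strength `alpha` constraining the loss function.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 15)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(3 * x_test)          # noiseless test targets

def poly_features(x, degree=12):
    # Monomial features 1, x, x^2, ..., x^degree (flexible enough to overfit).
    return np.column_stack([x**d for d in range(degree + 1)])

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

X_tr, X_te = poly_features(x_train), poly_features(x_test)
results = {}
for alpha in (0.0, 1e-3, 1.0):
    w = ridge_fit(X_tr, y_train, alpha)
    results[alpha] = np.mean((X_te @ w - y_test) ** 2)
    print(f"alpha={alpha}: test MSE = {results[alpha]:.3f}")
```

With alpha = 0 the flexible model chases the noise in the 15 training points; a nonzero penalty constrains the weights and usually improves the error on unseen data.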
Tradeoff between bias and variance I
Among various ways to evaluate a model’s performance, Bias and Variance help us in
parameter tuning and choosing a better-fit model.
Bias:
• Bias (error) - Difference between actual or expected values and the predicted values.
• Bias is a systematic error that occurs due to wrong assumptions in the learning process.
• Low Bias: Means fewer assumptions are made to build the target function. Hence, the
model will closely match the training dataset.
• High Bias: Means more assumptions are made to build the target function. Hence, the
model will not match the training dataset closely (Underfitting).
Variance:
• Measures the spread in data from its mean position. Equivalently, the deviation of
predictions from one repetition to another, i.e., the change in performance of a
predictive model when trained on different subsets of the training data.
• Low variance: Model is less sensitive to changes in the training data and produces
consistent estimates of the target function with different subsets of data from the same
distribution.
• High variance: Model is very sensitive to changes in the training data and results in
significant changes in the estimate of the target function when trained on different
subsets of data from the same distribution. This is the case of overfitting.
Tradeoff between bias and variance II

• Low bias and low variance → Samples are acceptable representatives of the population.
• High bias and low variance → Samples are fairly consistent, but not particularly
representative of the population.
• Low bias and high variance → Samples vary widely in their consistency, and only some
may be representative of the population.
• High bias and high variance → Samples are neither consistent nor likely to be
representative of the population.

Best Scenario - Low bias and low variance!


Tradeoff between bias and variance III

How to find the right balance?


• To find the right balance, start with a simple model and gradually
increase complexity until you get satisfactory results.
• Have an appropriate split of the data into training, validation, and testing
sets. Use the validation set to select the best model complexity.
• Use appropriate regularization techniques to prevent overfitting.

A. Senthil Thilak (NITK) Introduction to Machine Learning 30 / 44
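The second bullet can be sketched with NumPy (synthetic data; polynomial degree stands in for model complexity): the validation set picks the complexity, and the test set is touched only once at the end.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * x) + rng.normal(0, 0.2, 300)

# 60/20/20 split into training, validation, and test sets.
idx = rng.permutation(300)
tr, va, te = idx[:180], idx[180:240], idx[240:]

def val_mse(degree):
    # Fit on the training set, score on the validation set.
    coeffs = np.polyfit(x[tr], y[tr], degree)
    return np.mean((np.polyval(coeffs, x[va]) - y[va]) ** 2)

# Start simple and increase complexity; keep the degree that validates best.
best_degree = min(range(1, 10), key=val_mse)

# Final, one-time evaluation on the held-out test set.
coeffs = np.polyfit(x[tr], y[tr], best_degree)
test_mse = np.mean((np.polyval(coeffs, x[te]) - y[te]) ** 2)
print(best_degree, test_mse)
```

Because the test set played no role in choosing the degree, its error is an honest estimate of generalization performance.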


Model Tuning
• Hyperparameter: A parameter whose value is set before the learning process begins.
• Hyperparameters govern many aspects of the behavior of machine learning models, such
as their ability to learn features from data, degree of generalizability in performance
when presented with new data, the time and memory cost of training the model, etc.
• Different hyperparameters often result in models with significantly different
performance. So, tuning is a crucial aspect of the training process and a key element
for the quality of the resulting prediction accuracies.
• Common methods for tuning: Grid search, Random search, Optimization, Latin
hypercube sampling
Figure: Schematic representation of the tuning process proposed by Kuhn and Johnson (2013)

However, choosing appropriate hyperparameters is challenging (Montesinos-López et al. 2018a).
Hyperparameter tuning finds the best version of a statistical machine learning model by
running many training jobs on the original data with different hyperparameter settings.

A. Senthil Thilak (NITK) Introduction to Machine Learning 31 / 44
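As a sketch of the grid search option (assuming scikit-learn is available; the model, dataset, and candidate values are illustrative): each hyperparameter setting is scored by cross-validation before the final model is chosen.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8, None]},  # candidate hyperparameter values
    cv=5,                                       # 5-fold cross-validation per candidate
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Random search and optimization-based tuning follow the same pattern, differing only in how candidate settings are generated.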
Metrics to evaluate the prediction performance

• Choice of metric depends on the type of response variable.


• Quantitative Measures: Mean-square error (MSE), Pearson's correlation coefficient,
Mean absolute error (MAE), Mean arctangent absolute percentage error (MAAPE)
• Binary and Ordinal Measures: Confusion matrix, True false negatives (TFN), True
false positives (TFP), Total True negatives (TTN), Total true positives (TTP), Proportion
of cases correctly classified (PCCC), Kappa coefficient, Area under the receiver
operating characteristic curve (AUC - ROC), Likelihood estimator, etc.
• Count measures: Spearman’s correlation, Likelihood estimator, etc.

A. Senthil Thilak (NITK) Introduction to Machine Learning 32 / 44
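A few of these metrics computed by hand in NumPy (the toy response vectors are made up):

```python
import numpy as np

# Quantitative response: MSE and MAE.
y_true = np.array([3.0, 1.5, 2.0, 4.0])
y_pred = np.array([2.5, 1.0, 2.5, 4.5])
mse = np.mean((y_true - y_pred) ** 2)       # mean squared error
mae = np.mean(np.abs(y_true - y_pred))      # mean absolute error

# Binary response: 2x2 confusion matrix entries and the proportion of
# cases correctly classified (PCCC).
t = np.array([1, 0, 1, 1, 0, 1])            # true labels
p = np.array([1, 0, 0, 1, 1, 1])            # predicted labels
tp = np.sum((t == 1) & (p == 1))
tn = np.sum((t == 0) & (p == 0))
fp = np.sum((t == 0) & (p == 1))
fn = np.sum((t == 1) & (p == 0))
pccc = (tp + tn) / len(t)
print(mse, mae, (tp, fp, fn, tn), pccc)
```

In practice these come from a library, but the formulas make clear what each number measures and why the choice depends on the response type.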


Paradigms to design a Learning system

• Choosing a Training experience


• Choosing the Target Function
• Choosing a Representation for the Target function
• Choosing a Function Approximation Algorithm
• The Final Design

A. Senthil Thilak (NITK) Introduction to Machine Learning 33 / 44


An example - Designing a learning system for Checkers
problem

• T: Play Checkers
• P: % of games won against an opponent
• What experience to choose? (Training experience from which the system
learns)
• What exactly should be learnt? (Target Function)
• How shall it be represented? (Representation of Target function)
• What specific algorithm/mechanism to adopt for learning? (Learning
Model/Algorithm)
• The Final design !!!

A. Senthil Thilak (NITK) Introduction to Machine Learning 34 / 44


An example - Designing a learning system for Checkers
problem (contd...)

I Choosing the Training Experience: Has a significant impact on success or
failure of the learning algorithm.
Main Attributes:
1 Does it provide DIRECT or INDIRECT feedback on the choices made
by the system?
• Checkers game example:
- DIRECT −→ Individual checkers board states and the correct move for
each. (EASY!!!)
- INDIRECT −→ Involves the move sequences and final outcomes of various
games played. (Correctness inferred indirectly from the fact that the
game was eventually won or lost!!!)

A. Senthil Thilak (NITK) Introduction to Machine Learning 35 / 44


An example - Designing a learning system for Checkers
problem (contd...)

2 The degree to which the learner controls the sequence of training
examples - Teacher or Not?
- The learner might rely on the teacher to select informative board states
and to provide the correct move for each, or the learner might itself propose
board states that it finds particularly confusing and ask the teacher for the
correct move.
- Alternatively, it may have complete control over the board states and
(indirect) training classifications by experimenting with novel board states
(without a teacher).
- A number of settings exist for learning!!! (Random process, Queries to
an expert teacher, Auto-exploring environment, etc.)

A. Senthil Thilak (NITK) Introduction to Machine Learning 36 / 44


3 How well does it represent the distribution/pattern of data similar to that of
future test data/unseen instances?
- We observe that most current theory of machine learning rests on the
crucial assumption that the distribution of training examples is identical to
the distribution of test examples. Despite our need to make this
assumption in order to obtain theoretical results, it is important to keep in
mind that this assumption must often be violated in practice.
- To proceed further with the design, let us assume that our system will train
by playing games against itself.
- This has the advantage that no external trainer is needed and hence, it
allows the system to generate as much training data as time permits. We
now have a fully specified learning task.

A. Senthil Thilak (NITK) Introduction to Machine Learning 37 / 44


An example - Designing a learning system for Checkers
problem (contd...)
II Choosing the Target Function / Function approximation: (Determines
the exact type of knowledge to be learnt and how this will be used by the
learner)
• Checkers game example: (With B → Set of legal board states & M →
Set of legal moves)
- ChooseMove: B → M ???
- V: B → R (Assigns higher scores/real values for better board
states; Successful learning helps in selecting the best move from
any current board position)
Examples:
• V (b) = 100, if b → won (final board state)
• V (b) = −100, if b → lost (final board state)
• V (b) = 0, if b → drawn (final board state)
• V (b) = V (b′), if b → is not a final board state & b′ is the next final board state
achievable starting from b in an optimal play (Not operational & requires a
plausible approximation!!!)
An example - Designing a learning system for Checkers
problem (contd...)

III Choosing Representation for the Target Function:
• Collection of Rules?
• Neural Network?
• Polynomial/Linear/Non-linear function of features?
- Checkers problem:

V̂ (b) = w0 + w1 x1 + w2 x2 + · · · + w6 x6 , where

• x1 = #black pieces on the board;
• x2 = #red pieces;
• x3 = #black kings;
• x4 = #red kings;
• x5 = #black pieces threatened by red;
• x6 = #red pieces threatened by black.
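The linear form above translates directly into code (a minimal Python sketch; extracting the feature values x1, ..., x6 from an actual board is assumed to happen elsewhere):

```python
def v_hat(weights, features):
    """Evaluate V_hat(b) = w0 + w1*x1 + ... + w6*x6.

    weights:  [w0, w1, ..., w6]
    features: [x1, ..., x6], the six board features of state b
    """
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))
```

For instance, a (made-up) weight vector that rewards black material and penalizes red material, such as [0, 2, -2, 5, -5, -3, 3], scores an evenly matched position near zero.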
Estimating Training Values

• V (b) : the true target function (Ideal Target fn.)


• V̂ (b) : the learned function (Modified/Actual Learning fn.)
• Vtrain (b) : the training value

Rule for estimating training values:

Vtrain (b) ← V̂ (Successor(b))

A. Senthil Thilak (NITK) Introduction to Machine Learning 40 / 44


Weight Tuning

LMS Weight Update rule:


• (Do repeatedly) Select a training example b at random:
1 Compute the error for b:

error(b) = Vtrain (b) − V̂ (b)

2 For each board feature xi , update the weight wi :

wi ← wi + η · error(b) · xi ,

where η is a small constant, say 0.1, that moderates the rate of learning.

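Combining the training-value estimate Vtrain(b) ← V̂(Successor(b)) with the LMS rule gives a one-step update that can be sketched as follows (an illustrative Python sketch; the weight w0 is updated with a constant feature x0 = 1, an assumption consistent with the linear form):

```python
def v_hat(weights, features):
    # Linear evaluation: V_hat(b) = w0 + w1*x1 + ... + w6*x6
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * error(b) * x_i (with x0 = 1 for w0)."""
    error = v_train - v_hat(weights, features)
    weights[0] += eta * error          # x0 = 1
    for i, x in enumerate(features, start=1):
        weights[i] += eta * error * x
    return weights
```

Repeated updates on the same example shrink |error(b)|, moving V̂ toward the training value.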


Figure: Final Design (Mitchell, Figure 1.1) — the final design of the
checkers learning program consists of four interacting modules:

• Performance System: takes a new problem (an initial game board) as input,
plays the game using the learned evaluation function V̂ to select its next
move at each step, and outputs a solution trace (the game history). Its
performance is expected to improve as the evaluation function becomes
increasingly accurate.
• Critic: takes the history or trace of the game as input and produces as
output a set of training examples of the target function — one example
⟨b, Vtrain (b)⟩ for each game state b in the trace, with Vtrain (b) estimated
as above.
• Generalizer: takes the training examples as input and outputs the
hypothesis V̂ (here, the linear function of six board features, with weights
tuned by the LMS rule).
• Experiment Generator: takes the current hypothesis as input and outputs a
new problem (initial game board) for the Performance System to explore.

Note that this design constrains the evaluation function to depend on only
the six specific board features provided; learning can succeed only to the
extent that the true target function V can indeed be represented (at least
approximately) as a linear combination of these features.

Figure: Summary of Design Choices (Mitchell, Figure 1.2) — the design is
summarized by four choices: the type of training experience, the target
function (e.g., ChooseMove: B → M vs. V: B → R), the representation of the
learned function (e.g., a linear function of six features vs. an artificial
neural network), and the learning algorithm.
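The interaction of the four modules can be illustrated with a toy training loop (a hedged Python sketch: the random feature vectors and outcomes stand in for real game play, so `train` below is a hypothetical analogue of the design, not a checkers engine):

```python
import random

def v_hat(weights, features):
    # Linear evaluation: V_hat(b) = w0 + w1*x1 + ... + w6*x6
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def train(num_games=50, eta=0.001, seed=0):
    """Toy analogue of the Experiment Generator / Performance System /
    Critic / Generalizer loop from the final design."""
    rng = random.Random(seed)
    weights = [0.0] * 7
    for _ in range(num_games):
        # Experiment Generator + Performance System: produce a "game trace"
        # (random feature vectors standing in for successive board states).
        trace = [[rng.randint(0, 12) for _ in range(6)] for _ in range(10)]
        final_value = rng.choice([100, -100, 0])  # outcome of the game
        # Critic: V_train(b) is V_hat of the successor state; the last
        # state in the trace gets the true outcome value.
        for i, features in enumerate(trace):
            v_train = (v_hat(weights, trace[i + 1])
                       if i + 1 < len(trace) else final_value)
            # Generalizer: LMS weight update (x0 = 1 for w0).
            error = v_train - v_hat(weights, features)
            weights[0] += eta * error
            for j, x in enumerate(features, start=1):
                weights[j] += eta * error * x
    return weights
```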
Challenges in Machine Learning

• What algorithms can approximate functions well (and when)?


• How does the number of training examples influence accuracy?
• How does the complexity of the hypothesis representation impact accuracy?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?
• How can prior knowledge of learner help?
• What clues can we get from biological learning systems?
• How can systems alter their own representations?



References

Alpaydin, E. (2020). Introduction to Machine Learning. MIT Press.


Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, Inc., USA, 1
edition.

