

Designing Learning System


1. Designing a Learning System
2. Inductive Bias and Hypothesis
3. Hypothesis Evaluation

Modelling and Evaluation of ML Models

[Pipeline: Dataset → Pre-processing → Model Selection/Design → Model Training → Model Evaluation → Final Model; the data X and label y feed into the pipeline, and the model learns the definition of a function f]

• The output of an ML model is nothing but a program or a function f
• What about the definition of f? i.e. f(X) = ?
• The definition of f depends on its type and the parameters used (learnable or non-learnable)
• During training, the ML model learns the definition of f
• After training, for input X the function f predicts the label ŷ, i.e. X → ŷ or f : X → ŷ


Designing a Learning System


• ML according to Tom Mitchell
• A computer program is said to learn from
experience E
• with respect to some class of tasks T
• and performance measure P,
• if its performance at tasks in T, as measured
by P, improves with experience E.


Designing a Learning System


• Step 1) Choosing Training Experience:
• The first design choice we face is to choose the type of
training experience from which the system will learn.
• The type of training experience available can have a
significant impact on success or failure of the learner.
• One key attribute is whether the training experience
provides direct or indirect feedback regarding the
choices made by the performance system.
• Direct feedback-individual checkers board states and the
correct move for each
• Indirect feedback-the move sequences and final outcomes
of various games played
• Let's consider direct feedback


Designing a Learning System


• Step 2) Choosing Target Function:
• The next design choice is to determine exactly what type of knowledge will be
learned i.e. the function that is to be learned
• E.g. for checker game, the most obvious choice for the type of information to
be learned is a program, or function, that chooses the best legal move for
any given board state.
• Let us call this function ChooseMove : B → M to indicate that
• this function accepts as input any board from the set of legal board states B
• and produces as output some move from the set of legal moves M
• Although ChooseMove is an obvious choice, it is difficult to learn.
• A better choice is an evaluation function that assigns a numerical score to any given board state.


Designing a Learning System


• Step 2) Choosing Target Function cont…
• Let us call this target function V : B → ℝ to denote that V maps any legal board state from the set B to some real value in ℝ. When selecting a move, V is applied to the successor board states, not to the current board state.
• If the system can successfully learn such a target function V, then it can easily
use it to select the best move from any current board position.
• What exactly should be the value of the target function V for any given
board state?
• Let us define the target value V(b) for an arbitrary board state b in B as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
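These rules can be written down almost literally; the sketch below is illustrative only, since is_final, is_win, is_loss and best_reachable_final_state are hypothetical helpers standing in for a real checkers engine (case 4 is exactly the part that is impractical to compute directly):

```python
def target_value(b):
    """Ideal target function V(b) for a checkers board state b (illustration only)."""
    if is_final(b):                 # hypothetical helper: is b a terminal board state?
        if is_win(b):               # case 1: final state, won
            return 100
        if is_loss(b):              # case 2: final state, lost
            return -100
        return 0                    # case 3: final state, drawn
    # case 4: value of the best final state reachable from b with optimal play;
    # computing this requires searching ahead to the end of the game.
    return target_value(best_reachable_final_state(b))
```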


Designing a Learning System


• Step 2) Choosing Target Function cont…
• Cases 1-3 are easy, but determining the value of V(b) for case (4) requires searching ahead along the optimal line of play until the end of the game, which is very difficult.
• The alternative is to use an approximate function defined over local board features, acquiring only some approximation to the target function V.
• In general too, the ideal target function given only an operational description of the problem is very difficult to learn perfectly.
• Thus we have
1. The ideal target function V for the problem
2. The approximate function V̂, which is what is actually learned
• Thus, the process of learning the target function is often called function approximation. The approximate function V̂ is also referred to as a hypothesis.


Designing a Learning System


• Step 3) Choosing Representation for Target function
• For the ideal target function V, we need to represent the approximate function V̂.
• There are a number of options for representing the function:
• using a large table with a distinct entry specifying the value for each distinct board state
• or using a collection of rules that match against features of the board state
• or a quadratic polynomial function of predefined board features, or an artificial neural network.
• Let us choose a simple representation:
• for any given board state b, the function V̂ will be calculated as a linear combination of the following board features:
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x4: the number of red pieces threatened by black
• Thus, represent V̂(b) as a linear function as follows:
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4
• where w0 through w4 are numerical coefficients, or weights, to be chosen by the learning algorithm
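To make the representation concrete, here is a minimal Python sketch of this linear evaluation function; the weight values in the example call are made up purely for illustration:

```python
def v_hat(features, weights):
    """V_hat(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4.

    features: [x1, x2, x3, x4] extracted from a board state b
    weights:  [w0, w1, w2, w3, w4] chosen by the learning algorithm
    """
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

# 3 black pieces, 2 red pieces, 1 black piece threatened, 0 red pieces threatened
print(v_hat([3, 2, 1, 0], [0.0, 1.0, -1.0, -2.0, 2.0]))   # -> -1.0
```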


Designing a Learning System


• Partial design of a checkers learning program:
• Task T: playing checkers
• Performance measure P: percent of games won in the world tournament
• Training experience E: games played against itself (direct feedback)
• Target function: V : B → ℝ
• Target function representation:
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4


Designing a Learning System


• Step 4) Choosing Function Approximation Algorithm
• In order to learn the target function we require a set of training examples,
• each describing a specific board state b and the training value Vtrain(b) for b.
• i.e. an ordered pair of the form <b, Vtrain(b)>
• E.g. <{x1=2, x2=0, x3=0, x4=0}, +100>; it is a win for black as x2=0
• Rules for estimating training values:
• As the function V̂ gives the approximate value of an arbitrary board state, the training value of a board state can be estimated using V̂ of its successor state:
Vtrain(b) ← V̂(Successor(b))
• All that remains is to specify the learning algorithm for choosing the weights wi to best fit the set of training examples <b, Vtrain(b)>
• It is done by minimizing the squared error
E = ∑_{<b, Vtrain(b)> ∈ training examples} (Vtrain(b) − V̂(b))²
• Update each wi based on the error, i.e. the LMS rule wi ← wi + η·(Vtrain(b) − V̂(b))·xi, where η is a small constant (learning rate)
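A small sketch of this training scheme under the definitions above; the toy training examples and the learning-rate value are assumptions used only for illustration:

```python
def lms_update(weights, features, v_train, eta=0.01):
    """One LMS step: wi <- wi + eta * (Vtrain(b) - V_hat(b)) * xi, with x0 = 1 for the bias w0."""
    x = [1.0] + list(features)
    error = v_train - sum(w * xi for w, xi in zip(weights, x))
    return [w + eta * error * xi for w, xi in zip(weights, x)]

# Toy examples of the form <b, Vtrain(b)>, with b described by its features (x1, x2, x3, x4)
training_examples = [([2, 0, 0, 0], 100), ([0, 3, 0, 1], -100)]
weights = [0.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(50):                                   # repeated passes over the examples
    for features, v_train in training_examples:
        weights = lms_update(weights, features, v_train)
print(weights)
```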


Designing a Learning System-Summary


Hypothesis Space
• Generic Notations
• Training examples <b, Vtrain(b)>, i.e. (X, y). Here, X → data, y → label
• Approximate function V̂(b), i.e. (X, f(X)) or (X, ŷ). Here, ŷ → predicted label
• X = {x1, x2, x3, …, xn}. Here, the xi are features
• In the last example, what we saw:
• We are given some data and we tried to do induction to identify a function
represented in terms of features that can explain the data.
• I.e. We consider each data point one by one to get an estimated outcome and
based on error we update the function.
• Often the identified function is referred to as one of the hypotheses for the problem, and the process is called induction learning/inductive learning.
• There are different kinds of functions i.e. classification/regression algorithms

Hypothesis Space
• Function representations
• Linear or non-linear function
• Decision Tree
• Neural Network
• Single layer perceptron
• Multi-layer perceptron


Hypothesis Space
• For a problem multiple features can be considered to define the function.
• Set of all possible instances is called feature space or instance space.
• An instance is a data point in the feature space, known as feature vector.
• Let's consider two features x1 and x2; we need to find the class of "?"

[Figure: a point "?" plotted in the (x1, x2) feature space, together with several candidate hypotheses, e.g. a line f(x) = mx + c, that separate the two classes]


Hypothesis Space
• A hypothesis ℎ is a function that best describes the target based on training examples in
supervised machine learning.
• Hypothesis space is the set of all possible legal hypotheses.
• The parameters and the selected features in terms of which the hypothesis (function) is represented are called the hypothesis language.
• With a specific hypothesis language a particular class of functions can be obtained after training.
• E.g. for linear regression the function obtained after training is a line

[Figure: with a linear hypothesis language, the hypothesis obtained after training is a line f(x) = mx + c in the (x1, x2) feature space]

Inductive Bias
• We can think of supervised learning as a program that explores the hypothesis space.
• Different parameter settings will lead to different functions that map the input to the output.
• Thus the hypothesis space in general is very large.
• The set of hypotheses that can be produced can be restricted by a language bias. How?
• Certain rules or assumptions are applied to narrow down the search space for a suitable hypothesis; these are referred to as the Inductive Bias.


Inductive Bias
• Example
• With two features, we can have a convex, concave, or linear function.
• We can put a restriction and consider only linear functions.
• Thereby, the hypothesis space is reduced to linear functions only.

[Figure: the same data in the (x1, x2) feature space separated by a non-linear (convex/concave) boundary and by a straight line]


Inductive Bias
• Biases are of two types
• Restrictions: puts restrictions to limit the hypothesis space
• Preferences: impose an ordering to prefer specific kinds of hypotheses
• A classical example is "Occam's Razor"
• It is a philosophical principle that "the simpler hypothesis is better"
• i.e. if something can be described with a shorter description, that hypothesis is to be preferred over a more complex one
• It is a kind of preference for simpler hypotheses over complex ones
• Another example- Maximum margin in SVM
• Maximizing the boundary between classes in SVM


Inductive Learning
• In ML (supervised learning), the main aim is to induce a general function (or hypothesis) based on training examples.
• Inductive learning aims to determine a hypothesis h identical to the target concept c over the entire set of instances X, while the only information available about c is its value over the training examples.
• Inductive learning is inherently a problematic approach
• A unique hypothesis cannot be obtained unless all possible examples are used during training, which in general is not possible.
• Thus, inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
• However, the assumption regarding unseen instances is that the hypothesis that best fits the observed training data will also fit them well (the inductive learning hypothesis).


Generalization
• What class of hypothesis or function will be obtained after training?
• It depends on
• the data, i.e. from training what features are selected
• the function parameters, i.e. the weights that are learned during training
• what type of restrictions or biases that are imposed
• In ML, coming up with a function is all about doing generalization.
• Generalization errors
• Bias error: how much the model estimated from the training examples differs from the true model.
• Error due to inaccurate assumptions/simplifications made by the model
• Variance error: how much the models estimated from different training sets differ from each other.
• Error due to different training sets producing different estimated models


Overfitting and Underfitting

• Overfitting:
• The model is too “complex” and fits irrelevant characteristics (i.e. noise) in the
data.
• Low bias and High variance
• Low training error and High test error
• Underfitting:
• The model is too “simple” to represent all relevant characteristics in the data.
• High bias and Low variance
• High training error and high test error


Hypothesis Evaluation
• Evaluation of learning system or model (i.e. hypothesis) is crucial
• The model is designed to predict the class of future unseen data points
• Thus testing on unseen data points is very important
• Evaluation strategies
• Dataset splitting or sampling
• Hold-out/K-fold cross validation
• Evaluation measures/metrics
• Error
• Accuracy
• Precision/Recall
• Other measures to be discussed later on


Dataset splitting
• Datasets are commonly divided into three distinct subsets:
• training, validation, and testing sets, each serving a specific purpose in the
model development and evaluation process.


Dataset splitting
• Training set
• This is the largest portion of the dataset and is used to train the machine learning
model.
• Validation set
• This subset is used during the model development phase to fine-tune
hyperparameters and make decisions about the model architecture.
• It is considered a part of the training of the model.
• It provides an unbiased evaluation of the model's performance on unseen data
during training, helping to prevent overfitting.
• Testing set
• This subset is used only once, after the model has been fully trained.
• It provides a final, unbiased evaluation of the model's performance on truly unseen
data, giving an indication of how the model is expected to perform in a real-world
deployment scenario.
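A sketch of one common way to produce such a three-way split, using scikit-learn's train_test_split twice; the 70/15/15 ratio and the toy data are assumptions, not something prescribed here:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(-1, 2)      # toy feature matrix: 100 samples, 2 features
y = np.arange(100)                     # toy labels

# Hold out 15% for testing, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, random_state=42
)
print(len(X_train), len(X_val), len(X_test))   # roughly 70 / 15 / 15
```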


Dataset splitting
• Hold-Out Method
• The dataset is split into two sets: one for training and one for testing.
• Advantage
• It's a simple and quick way to evaluate a model.
• Disadvantage
• May lead to higher bias, as a portion of the dataset is never used in training.

Example: a Size/Price dataset split by the hold-out method
Training (Size, Price): (1256, 250), (2333, 400), (2789, 435), (1400, 200), (1500, 145), (3999, 500), (3200, 430), (1399, 199), (1578, 180), (2677, 240), (2500, 299), (3000, 350)
Testing (Size, Price): (1400, 170), (2544, 280), (3599, 450)

Dataset splitting
• K-fold cross validation
• Split the data into k parts (e.g. k=3)
• Train the model k times, each time leaving one fold out for testing
• Final performance is the average over all k tests
• Advantages
• Best for small datasets
• Every data point is used for both training and testing
• Disadvantages
• Time consuming for large datasets
• It may not be ideal for imbalanced datasets

[The same Size/Price dataset as above, now partitioned into folds, with one fold held out for testing in each iteration]
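A minimal sketch of k-fold cross-validation using scikit-learn's KFold with k = 3; the linear-regression model and the toy Size/Price values are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

# Toy Size/Price data in the spirit of the example above
X = np.array([[1256], [2333], [2789], [1400], [1500], [3999], [3200], [1399], [1578]])
y = np.array([250, 400, 435, 200, 145, 500, 430, 199, 180])

kf = KFold(n_splits=3, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))   # R^2 on the held-out fold
print(np.mean(scores))   # final performance = average over the k folds
```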

Dataset splitting
• Stratified k-fold cross validation
• It ensures that each fold of the cross-validation process maintains the same class distribution as the entire dataset.
• The dataset is divided into k folds while maintaining the proportion of classes
in each fold.
• During each iteration, one-fold is used for testing and the remaining folds are
used for training.
• The process is repeated k times with each fold serving as the test set exactly
once just like k-fold cross validation.
• Advantages
• Very useful for imbalanced data
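A short sketch of the stratified variant using scikit-learn's StratifiedKFold; the imbalanced toy labels are an assumption, used only to show that each test fold keeps the overall class proportions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(12).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])   # imbalanced: 8 of class 0, 4 of class 1

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))   # each test fold keeps the 2:1 class ratio -> [2 1]
```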


Dataset splitting
• Leave-one-out cross-validation (LOOCV)
• A single data point is used as the test set, and the remaining data points are used for
training the model.
• This process is repeated for each data point in the dataset, resulting in n iterations,
where n is the number of data points.
• The final performance metric is the average of the results from all iterations.
• It is the extreme case of k-fold cross-validation, with k = n.
• Advantage
• It reduces bias in performance estimation and gives a more robust evaluation compared to a single validation-set split.
• Disadvantages
• Variance Issues: May produce high variance in the model performance estimate
since it produces different training sets.
• Scalability Issues: Not always suitable for large datasets due to the computational
burden.
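As a brief sketch, LOOCV with scikit-learn's LeaveOneOut (equivalent to KFold with k = n); the toy data and the choice of absolute error as the metric are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

X = np.array([[1256], [2333], [2789], [1400], [1500], [3999]])
y = np.array([250, 400, 435, 200, 145, 500])

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):       # n iterations, one test point each
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(abs(model.predict(X[test_idx])[0] - y[test_idx][0]))
print(np.mean(errors))   # final metric = average over all n iterations
```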


Evaluation measures/metrics
• Training/Testing Error
• Training Error:
• This refers to the error a machine learning model exhibits when evaluated on the
same data it was trained on.
• It measures how well the model has learned the patterns and relationships within
the training dataset.
• A very low training error, especially when combined with a high test error, can be an
indicator of overfitting, where the model has memorized the training data rather
than learning generalizable features.
• Test Error:
• This refers to the error a machine learning model exhibits when evaluated on a
separate, unseen dataset, known as the test set.
• The test set is independent of the training data and is used to assess the model's
ability to generalize to new, real-world data.
• A low test error indicates that the model is performing well on data it has not
encountered during training, suggesting good generalization.


Evaluation measures/metrics
• Error:
• Absolute error:
E = ∑i |h(xi) − yi|
• Squared sum error:
E = ∑i (h(xi) − yi)²
• Number of misclassifications:
E = ∑i δ(h(xi), yi), where δ(h(xi), yi) = 1 if h(xi) ≠ yi, and 0 otherwise
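These three error measures can be computed directly; a small sketch with made-up labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])   # actual labels y_i
y_pred = np.array([1, 1, 1, 0, 0, 1])   # hypothesis outputs h(x_i)

absolute_error = np.sum(np.abs(y_pred - y_true))      # sum of |h(x_i) - y_i|
squared_error  = np.sum((y_pred - y_true) ** 2)       # sum of (h(x_i) - y_i)^2
misclassified  = np.sum(y_pred != y_true)             # number of misclassifications
print(absolute_error, squared_error, misclassified)   # -> 2 2 2
```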


Evaluation measures/metrics
• Confusion matrix
• True Positive: Both predicted and actual positive
• False Positive: Predicted positive but actually negative
• True Negative: Both predicted and actual negative
• False Negative: Predicted negative but actually positive
• Measures
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F-measure = (2 · Precision · Recall) / (Precision + Recall)
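A small sketch computing these measures from the four confusion-matrix counts; the example counts are made up:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))   # -> (0.85, 0.8, 0.888..., 0.842...)
```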

