ML Unit-1

The document discusses key concepts in machine learning, including VC dimensions, PAC learning, hypothesis spaces, inductive bias, and generalization. It highlights the challenges of high-dimensional data, such as sparsity and overfitting, and emphasizes the importance of dimensionality reduction techniques. Additionally, it explains the significance of inductive bias in guiding model selection and the balance between bias and variance for effective generalization.


 VC DIMENSIONS

 PAC
 HYPOTHESIS SPACES
 INDUCTIVE BIAS
 GENERALIZATION
 BIAS-VARIANCE TRADE-OFF
VAPNIK-CHERVONENKIS (VC) DIMENSIONS
• The VC dimension provides a measure of the complexity of a space of functions, which allows the probably approximately correct (PAC) framework to be extended to spaces containing an infinite number of functions.

• The VC dimension is a measure of the complexity or capacity of a class of functions f(α).

• The VC dimension measures the largest number of examples that can be explained (shattered) by the family f(α).
DATA SET

• As the number of features (dimensions) in a dataset increases, the amount of data needed to generalize accurately grows.

• In the context of data analysis and machine learning, dimensions refer to the features or attributes of the data.

• Example dataset: houses - price, size, number of bedrooms, location.

• If we add more dimensions to the dataset, the volume of the space increases and the data becomes sparse.

 1D - a line of points
 2D - an area
 3D - a volume
Problems

• Data sparsity - clustering and classification become challenging.

• Increased computation - more resources and time are needed.

• Overfitting - reduces the model's ability to generalize to new data.

• Euclidean distance - the difference in distances between data points tends to become negligible (see the sketch after this list).

• Performance degradation - algorithms such as k-nearest neighbors can drop in performance.

• Visualization challenges - high-dimensional data is hard to visualize, making EDA more difficult.
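A small, hedged experiment illustrating the Euclidean-distance point above: with synthetic uniform data (an assumption made purely for illustration), the relative gap between the nearest and farthest point from a query shrinks as the number of dimensions grows.

# Distance concentration in high dimensions (illustrative sketch).
import numpy as np

rng = np.random.default_rng(42)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))        # 500 random points in d dimensions
    q = rng.uniform(size=d)               # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative contrast={contrast:.3f}")  # shrinks as d grows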


Solution

• In high-dimensional data, the data points sit at the edges or corners of the space, making the data sparse.

• The curse of dimensionality refers to the challenges and complications that arise when analysing and organising data in high-dimensional spaces (100-1000 dimensions).

• The solution to the curse of dimensionality is "dimensionality reduction".

Dimensionality reduction

• Dimensionality reduction is a process that reduces the number of random variables under consideration by obtaining a set of principal variables.

• By reducing dimensionality, we can retain the most important information in the dataset while discarding the redundant or less important features.

• Dimensionality reduction methods:

1. PCA - Principal Component Analysis
2. LDA - Linear Discriminant Analysis
3. t-SNE - t-Distributed Stochastic Neighbor Embedding
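A minimal sketch of dimensionality reduction with PCA. The dataset below is synthetic, and the choice of 10 input features and 2 retained components is an assumption made only for illustration.

# Reduce a 10-dimensional dataset to its 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                     # 200 samples, 10 features
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)     # make two features redundant

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (200, 2)
print(pca.explained_variance_ratio_)     # fraction of variance each component keeps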
Vapnik-Chervonenkis (VC) dimension

• The Vapnik-Chervonenkis (VC) dimension is a measure of the size (capacity, complexity, expressive power, richness, or flexibility) of a class of sets.

• The VC dimension is a measure of the capacity of a hypothesis set to fit different data sets.

• The VC dimension is a measure of the complexity of a machine learning model.

• The VC dimension is a measure of a model's capacity, which is used to guide the model selection process while developing machine learning applications.

• The VC dimension is also a measure of the difficulty of the machine learning problem. It is the cardinality of the largest set of points that the algorithm can shatter.
(VC) - Shattering

• Shattering is the ability of a model to classify a set of points perfectly, for every possible combination of labels on those points.

• The VC dimension of a model is the size of the largest set of points that the model can shatter.

For two points m and n, the four possible labelings are:

 h(m)=0; h(n)=0
 h(m)=0; h(n)=1
 h(m)=1; h(n)=0
 h(m)=1; h(n)=1

If the model can realize all four labelings (for example, by dividing the line into two segments), the two points in the dataset are shattered and the VC dimension is at least 2.
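A brute-force way to check shattering as just described: a hypothesis class shatters a point set if every possible labeling of the points can be classified perfectly. The sketch below assumes the hypothesis class of linear classifiers, approximated by scikit-learn's LogisticRegression with very weak regularization; the specific point sets are illustrative.

# Check whether a linear classifier can shatter a given set of points.
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

def can_shatter(points):
    """True if some linear classifier realizes every labeling of `points`."""
    n = len(points)
    for labels in product([0, 1], repeat=n):
        if len(set(labels)) < 2:       # all-0 or all-1 labelings are trivial
            continue
        clf = LogisticRegression(C=1e6, max_iter=10_000)
        clf.fit(points, labels)
        if clf.score(points, labels) < 1.0:
            return False               # this labeling cannot be realized
    return True

three_points = np.array([[0, 0], [1, 0], [0, 1]])        # non-collinear
four_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])    # XOR layout
print(can_shatter(three_points))   # True  -> linear classifiers shatter these 3 points
print(can_shatter(four_xor))       # False -> these 4 points are not shattered

This matches the classical result that linear classifiers in the plane have VC dimension 3.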
(VC)

• r = 1 if x is a positive example; r = 0 if x is a negative example.

• For a dataset containing N points, the N points can be labeled in 2^N ways as positive and negative.

• If, for every one of these labelings, we can find a hypothesis h ∈ H that is consistent with it, we say that H shatters the N points.

• The maximum number of points that can be shattered by H is called the Vapnik-Chervonenkis (VC) dimension.

• The VC dimension of H is denoted VC(H), and it measures the capacity of H.
(VC)

• The VC dimension is the capacity of a machine learning algorithm.

• Capacity - its ability to learn from a given dataset.

• Accuracy - its ability to correctly identify labels for a given dataset.

• The VC dimension acts as a guiding light in model selection: the capacity of a classification model reflects the complexity of the model.

• Example: after some finite number of examples, the learner will have learned the correct concept (though it might not even know it!). "Correct" means it agrees with the target concept on the labels for all data.
Probably Approximately Correct (PAC) learning

• In PAC learning, the goal is to find a hypothesis that performs well on unseen examples, given a sample of labeled training examples drawn independently and identically from an unknown probability distribution.

Applications of PAC learning in machine learning

• Supervised Learning: PAC learning is particularly applicable to supervised learning problems, where a model is trained on labeled examples to make predictions on new, unseen examples.

• Sample Selection: PAC learning guides the process of selecting representative training samples, ensuring that the selected samples are informative and cover the underlying distribution adequately.

• Model Selection and Evaluation: PAC learning provides a theoretical framework for selecting and evaluating models based on their generalization performance.

• Active Learning: Active learning strategies use PAC learning principles to actively query and select the most informative or uncertain instances from an unlabeled dataset.

• Computational Learning Theory: PAC learning provides insights into the feasibility and complexity of learning tasks.
PAC

• PAC learning provides a theoretical framework for understanding the sample complexity, generalization performance, and guarantees of learning from data.

• It plays a crucial role in shaping the design, evaluation, and analysis of machine learning algorithms.

• PAC analysis relates: the probability of successful learning, the number of training examples, the complexity of the hypothesis space, the accuracy to which the target function is approximated, and the manner in which training examples are presented.

• Instances X (the set of instances or objects in the world)

• Target concept c (a subset of the instance space)

• Hypothesis space H (a collection of concepts over X)

• Training data D (examples from the instance space)


Probably Approximately Correct (PAC) learning

• PAC learning is a framework for analyzing the efficiency of machine learning algorithms.

• The goal of PAC learning is to design algorithms that can learn a target concept with high probability and high accuracy, given a finite amount of labeled training data.

• Hypothesis Class (H): the set of possible hypotheses or classifiers that the learning algorithm can output.

• Concept Class (C): the set of all possible target concepts. The goal is for the learning algorithm to output a hypothesis that approximates the true concept.

• A concept is a function that maps instances to binary labels (0 or 1).

(PAC)

• Sample complexity: the number of labeled examples needed so that the learned hypothesis is "probably approximately correct" with high probability.

Error and Confidence:

• Error (ε): the maximum allowable error rate for the learned hypothesis. The hypothesis is considered correct if its error is less than ε.

• Confidence (δ): the desired confidence parameter; 1 - δ is the probability that the hypothesis is "probably approximately correct."

• A learning algorithm is PAC if, for any ε > 0 and δ > 0, with probability at least 1 - δ (the confidence level), the algorithm outputs a hypothesis h such that the error of h is at most ε.
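For a finite hypothesis class and a consistent learner, a standard PAC bound states that m >= (1/ε)(ln|H| + ln(1/δ)) examples suffice. The sketch below simply evaluates that bound; the values of |H|, ε, and δ are illustrative assumptions.

# Evaluate the PAC sample-complexity bound for a finite hypothesis class.
from math import ceil, log

def pac_sample_bound(h_size, epsilon, delta):
    """Examples sufficient so that, with probability >= 1 - delta,
    a consistent hypothesis has true error at most epsilon."""
    return ceil((log(h_size) + log(1.0 / delta)) / epsilon)

print(pac_sample_bound(h_size=2**10, epsilon=0.05, delta=0.01))  # about 231 examples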
(PAC)

• Theoretical results in PAC learning provide bounds on the number of training examples needed to achieve a certain level of confidence and error.

• Imagine we are doing classification with categorical inputs, where all inputs and outputs are binary (e.g., gender as a binary attribute). There is a machine f(x, h) with hypotheses h1, h2, ... ∈ H.

Example hypotheses (over binary attributes such as citizenship):

• X1 ∧ X2

• If there are 3 attributes, what is the complete set of hypotheses in f?

(PAC)

• In probably approximately correct (PAC) learning, given a concept class C, an error rate (ε), and a confidence parameter (δ), the goal is to achieve a low error rate with high confidence.

Noise Tolerance:

• PAC learning often assumes that the training data may contain some amount of noise or errors.

• Noise tolerance refers to the ability of a learning algorithm to still learn the underlying concept in the presence of such noise.

PAC

• PAC learning provides a rigorous theoretical framework for analyzing the performance of learning
algorithms in terms of their ability to generalize from limited data, control error rates, and achieve
high confidence in their predictions.
HYPOTHESIS SPACES
Hypothesis spaces

• Hypothesis space (H): the hypothesis space is defined as the set of all possible legal hypotheses; hence it is also known as the hypothesis set.

• It is used by supervised machine learning algorithms to determine the best possible hypothesis to describe the target function, i.e., the one that best maps inputs to outputs.

• It is often constrained by the framing of the problem, the choice of model, and the choice of model configuration.

• A hypothesis (h) is a single hypothesis that maps inputs to proper outputs; it can be evaluated and used to make predictions.

• y = mx + b
Hypothesis spaces

y = mx + b

Where:
y - range (the output)
m - slope of the line that divides the data
x - domain (the input)
b - intercept (constant)

• Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data.

• Hypothesis space (H) is the set of all legal, possible ways to divide the coordinate plane so that it best maps inputs to proper outputs; a sketch of a single hypothesis drawn from this space follows.
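A minimal sketch of one hypothesis h drawn from the hypothesis space H of all lines y = mx + b, chosen here by least squares; the data is synthetic and the true slope and intercept are assumptions made for illustration.

# Fit a single hypothesis h (one particular m and b) from the space of all lines.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=1.0, size=50)   # true m = 2, b = 1, plus noise

h = LinearRegression().fit(x, y)          # picks one hypothesis out of H
print("m ~", h.coef_[0], " b ~", h.intercept_)
print("prediction at x = 4:", h.predict([[4.0]])[0])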
Inductive bias
Bias

• Bias is the systematic error of a model: how far its average prediction is from the true values (often tracked through the average squared difference between predictions and true values).

• It is a measure of how well your model fits the data.

• Zero bias would mean that the model captures the true data-generating process perfectly, and both your training and validation loss would go to (nearly) zero. That is unrealistic.

Inductive Bias

• Every machine learning model requires some type of architecture design and some initial assumptions about the data it analyzes.

• Every belief that we make about the data is a form of inductive bias.

• Inductive biases play an important role in the ability of machine learning models to generalize to unseen data.
Inductive bias

• Given a training dataset, we need some additional constraints or criteria to help us better fit the
training samples, so that the trained model can make better predictions on the unseen samples (i.e.,
generalize beyond the training data).

• The additional constraints or criteria here are called inductive bias.

• In traditional machine learning, every algorithm has its own inductive biases.

• Inductive bias refers to the set of assumptions that a learning algorithm makes to predict outputs for inputs it has never seen.

• It's the bias of a model towards making a particular kind of assumption in order to generalize from its
training data to unseen situations.
Importance of inductive bias

• Learning from Limited Data - Inductive bias helps models generalize to unseen data based on the assumptions they carry.

• Guiding Learning- Given a dataset, there can be countless hypotheses that fit the data. Inductive bias
helps the algorithm choose one plausible hypothesis.

• Preventing Overfitting

• A model with no bias or assumptions might fit the training data perfectly, capturing every minute
detail, including noise. This is known as overfitting.

• An inductive bias can prevent a model from overfitting by making it favor simpler hypotheses.
Types of Inductive Bias

Preference Bias - It expresses a preference for some hypotheses over others. For example, in decision tree algorithms like ID3, the preference is for shorter trees over longer trees.

Restriction Bias- It restricts the set of hypotheses considered by the algorithm. For instance, a linear
regression algorithm restricts its hypothesis to linear relationships between variables.
Examples of Inductive Bias

Decision Trees- a bias towards shorter trees and splits that categorize the data most distinctly at each
level.

k-Nearest Neighbors (k-NN)-The algorithm assumes that instances that are close to each other in the
feature space have similar outputs.

Neural Networks: They have a bias towards smooth functions. The architecture itself (number of layers, number of neurons) can also impose bias.

Linear Regression: Assumes a linear relationship between the input features and the output.
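An illustrative comparison of two of these biases on the same synthetic curve: linear regression (restriction bias, only straight lines) versus a shallow decision tree (preference for a few axis-aligned splits). The data and tree depth are assumptions made for the example.

# Two inductive biases fitting the same nonlinear target.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=200)        # a nonlinear target

linear = LinearRegression().fit(x, y)                   # can only draw a straight line
tree = DecisionTreeRegressor(max_depth=4).fit(x, y)     # piecewise-constant fit

print("linear R^2:", round(linear.score(x, y), 3))      # poor: the bias is too restrictive here
print("tree   R^2:", round(tree.score(x, y), 3))        # much better on this curve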
GENERALIZATION
Generalization

• Generalization refers to your model's ability to adapt properly to new, previously unseen data, drawn
from the same distribution as the one used to create the model.

• Supervised learning in the domain of machine learning refers to a way for the model to learn and understand data.

• Based on this training data, the model learns to make predictions.

• The term 'generalization' refers to the model's capability to adapt and react properly to previously
unseen, new data, which has been drawn from the same distribution as the one used to build the
Model.

• Generalization examines how well a model can digest new data and make correct predictions after being trained on a training set.
• Whether a model is able to generalize is the key to its success.

• If you train a model too well on training data, it will be incapable of generalizing. In such cases, it
will end up making erroneous predictions when it's given new data. This would make the model
ineffective even though it's capable of making correct predictions for the training data set. This is
known as overfitting.

• The inverse (underfitting) is also true, which happens when you train a model with inadequate data. In
cases of underfitting, your model would fail to make accurate predictions even with the training data.
This would make the model just as useless as overfitting.

• Generalization is a measure of how your model performs on predicting unseen data, So, it is
important to come up with the best-generalized model to give better performance against future data.
Let us first understand what is underfitting and overfitting, and then see what are the best practices to
train a generalized model.
What is Underfitting?

• Underfitting is a state where the model cannot fit the training data and is also not able to generalize to new data. You can notice it with the help of the loss function during training. A simple rule of thumb: if both the training loss and the cross-validation loss are high, your model is underfitting.

• Lack of data, not enough features, lack of variance in the training data, or a high regularization rate can cause underfitting. A simple solution is to add more shuffled data to your training.

UNDERFITTING: TRAINING LOSS HIGH, TESTING LOSS HIGH.


What is Overfitting?

• Overfitting is a situation where your model learns the training data too closely, including its variance; experts describe it as the model starting to memorize the noise instead of learning. A simple rule of thumb to identify overfitting: if your training loss is low and your cross-validation loss is high, your model is overfitting.

• Uncleaned data, training for too many steps, or higher complexity of the model (due to larger weights) can cause overfitting. It is always recommended to preprocess the data, create a good data pipeline, and select only necessary and meaningful features with good variance.

OVERFITTING: TRAINING LOSS LOW, TESTING LOSS HIGH.
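A common way to see both regimes at once, using the rules of thumb above: fit polynomials of increasing degree and compare training error with held-out error. The dataset, degrees, and split below are illustrative assumptions.

# Under- and overfitting with polynomial regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(60, 1))
y = np.cos(2 * np.pi * x[:, 0]) + 0.2 * rng.normal(size=60)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(x_tr))
    te = mean_squared_error(y_te, model.predict(x_te))
    # degree 1: both errors high (underfitting)
    # degree 15: training error tiny, test error large (overfitting)
    print(f"degree={degree:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")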


Best practices to get a generalized model

• It is important to have a training dataset with good variance (i.e., a shuffled dataset).

• Split the data into training and evaluation sets.

• The evaluation set is used to cross-validate the trained model. It is always good to ensure that the distribution across all the datasets is stationary (the same). To achieve this goal, you can track the performance of a machine learning algorithm over time as it works with a set of training data, plotting both the skill on the training data and the skill on a test dataset that you have held back from the training process.

• Training the model for too long causes a continual decrease in the error on the training dataset; at the same time, due to the model's decreasing ability to generalize, the error on the test set starts to increase again (overfitting).

Regularization

• Regularization is a method to avoid high variance and overfitting, as well as to increase generalization. Without going into details, regularization aims to keep coefficients close to zero; a sketch follows.
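A minimal sketch of this idea using Ridge regression, which penalizes large coefficients and pulls them toward zero; the dataset and the alpha value are illustrative assumptions.

# Regularization shrinks coefficients and reduces variance.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))                    # few samples, many features
y = X[:, 0] + 0.1 * rng.normal(size=30)          # only the first feature matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS   sum of |coefficients|:", round(np.abs(ols.coef_).sum(), 3))    # large, noisy
print("Ridge sum of |coefficients|:", round(np.abs(ridge.coef_).sum(), 3))  # shrunk toward zero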
Low Bias: A low bias model will make fewer assumptions about the form of the target function.

High Bias: A model with a high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. A high bias model also cannot perform well on new data.

Variance: tells how much a random variable differs from its expected value.

Low variance means there is a small variation in the prediction of the target function with changes in the training data set.

High variance shows a large variation in the prediction of the target function with changes in the training dataset.
Ways to Reduce High Variance:

• Reduce the number of input features or parameters, since the model is overfitted.

• Do not use an overly complex model.

• Increase the training data.

• Increase the regularization term.


Bias-Variance Trade-Off

• While building the machine learning model, it is really important to take care of bias and
variance in order to avoid overfitting and underfitting in the model.

• If the model is very simple with fewer parameters, it may have low variance and high bias.

• Whereas, if the model is complex, with a large number of parameters, it will have high variance and low bias.

• So, it is required to strike a balance between bias and variance errors, and this balance between the bias error and the variance error is known as the bias-variance trade-off.
The properties of inductive bias

• The strength of an inductive bias describes how much it limits the size of the hypothesis space that the learner can search.

• A strong inductive bias gives the learner a relatively small search space, while a weak inductive bias provides a broader search space for the learner.

• How to measure it? VC dimension theory (Vapnik-Chervonenkis dimension).

• Correctness: only the correct inductive bias can ensure that the learner successfully learns the target concept.

• Conversely, under an incorrect inductive bias, the learner cannot learn the correct target concept no matter how many training samples are used.

• How to measure it? PAC learning theory (Probably Approximately Correct).


Trade-offs

• While inductive bias helps models generalize from training data, there's a trade-off.

• A strong inductive bias means the model might not be flexible enough to capture all patterns in the
data.

• On the other hand, too weak a bias could lead the model to overfit the training data.

• Inductive bias is the "background knowledge" or set of assumptions that guides a machine learning algorithm.

• It's essential for generalization, especially when the training data is sparse or noisy.

• However, choosing the right type and amount of inductive bias for a particular problem is an art and is
crucial for the success of the model.
VARIANCE

• A model is said to have high variance if its predictions are sensitive to small changes in the input. When a model does not perform as well on new data as it does on the training data set, there is a possibility that the model has high variance.

• It basically tells how scattered the predicted values are from the actual values.

• Bias: Error in training data


• Variance: Error in test data

• A statistical model is said to be overfitted when it captures much more detail from the training data (including noise) than necessary.

• Overfitting: training data accuracy is high and test data accuracy is low.

• To avoid overfitting, we could stop the training at an earlier stage (early stopping).

• Underfitting: training data accuracy is low and test data accuracy is low. Underfitting implies that the model still has capacity to learn, so you would simply train for more iterations or collect more data.
Bias-Variance Tradeoff

• The bias-variance tradeoff is a stand-alone theory that provides a different perspective on generalization.

• The bias-variance tradeoff in machine learning involves a tradeoff between approximation and
generalization, aiming to minimize the error in learning.

• The bias-variance analysis quantifies how well the best hypothesis performs in approximating the
target function, taking into account the overall ability of the hypothesis set to approximate the
function.

• The decomposition of the out-of-sample error into approximation and generalization components can
help understand the behavior of the hypothesis and its performance on different data sets.
• The variance term in the bias-variance tradeoff arises from the fact that we only have access to one
dataset at a time, resulting in different outcomes for each dataset.

• The bias-variance tradeoff can be quantified by two terms: variance, the expected squared difference between the predicted values and their own average (expected) prediction, and bias, the difference between that average prediction and the true values; a compact decomposition is given at the end of this section.

• Increasing the size of the hypothesis set reduces bias but increases variance, while decreasing the size of the hypothesis set increases bias but reduces variance.

• The bias-variance tradeoff highlights the importance of finding the right balance between model
complexity and data resources in a learning situation.

• Overfitting occurs when a complex model with many degrees of freedom fits the training set perfectly
but fails to generalize well, resulting in a high out-of-sample error and no real learning.

• Ensemble learning methods, such as bagging, rely on reducing variance by averaging multiple models or predictions, leading to improved performance.
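The statements above can be summarized by the standard bias-variance decomposition. Assuming y = f(x) + ε with noise variance σ², and writing f̂_D for the hypothesis learned from a training set D, the expected squared error at a point x decomposes as:

\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]
  = \big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2
  + \mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]
  + \sigma^2
  = \text{bias}^2 + \text{variance} + \text{irreducible noise}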
