Miscellaneous Terms

Uploaded by

Aakash Bhat
Miscellaneous Terms in Machine Learning
By:
Dr Rashmi Popli
Associate Professor
Department of Computer Engineering

Recommender systems
• Recommender systems are information filtering systems that address
information overload: they filter large volumes of dynamically generated
information down to the items that match a user’s preferences,
interests, or observed behavior.
• A recommender system predicts whether a particular user would prefer
an item, based on the user’s profile and historical behavior.

Filtering
• Content-Based Filtering
• Collaborative Filtering

Content-Based Filtering
• Definition: A content-based recommendation engine suggests
relevant items to users based on the features of those items.
Working:
• It considers the individual user’s preferences and focuses on the content
attributes of items.
• For example, if a user frequently searches for “yellow dresses” on an
e-commerce website, a content-based recommendation engine will suggest
other dresses of the same color.
• Music services like Spotify often use content-based filtering to recommend
songs based on a user’s listening history.
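
The idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular service works: the catalogue, the four binary features, and the function names are all hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical catalogue: each item is described by content features
# in the order (is_dress, is_yellow, is_red, is_shoe).
ITEMS = {
    "yellow sundress": [1, 1, 0, 0],
    "yellow maxi dress": [1, 1, 0, 0],
    "red evening dress": [1, 0, 1, 0],
    "red sneakers": [0, 0, 1, 1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_by_content(liked, items, top_n=2):
    """Rank unseen items by similarity to the items the user liked."""
    dims = len(next(iter(items.values())))
    # User profile = average feature vector of the liked items.
    profile = [sum(items[name][d] for name in liked) / len(liked)
               for d in range(dims)]
    candidates = [name for name in items if name not in liked]
    ranked = sorted(candidates,
                    key=lambda name: cosine(profile, items[name]),
                    reverse=True)
    return ranked[:top_n]


recommendations = recommend_by_content(["yellow sundress"], ITEMS)
```

Only the item features and this one user's history are used; no other user's behavior enters the ranking.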

[Figure: content-based filtering]
Collaborative Filtering
• Definition: Collaborative filtering recommends items based
on similarity measures between users or items.
• Working:
• It identifies patterns by analyzing user behavior across a community of users.
• If users with tastes similar to the target user's have liked certain items,
those items are recommended to the target user.
• For instance, when you use Spotify, it suggests music liked by other users with
similar tastes.
• Amazon, Netflix, and other platforms use collaborative filtering to
recommend products or movies.
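
A user-based variant of this idea can be sketched as follows. The ratings, user names, and function names are hypothetical, and the scoring rule (a similarity-weighted sum of other users' ratings) is one simple choice among many.

```python
import math

# Hypothetical ratings (1-5); a missing key means "not rated".
RATINGS = {
    "asha": {"song_a": 5, "song_b": 4, "song_c": 1},
    "bilal": {"song_a": 5, "song_b": 5, "song_d": 4},
    "chen": {"song_a": 1, "song_c": 5, "song_e": 4},
}


def user_similarity(r1, r2):
    """Cosine similarity computed over the items both users rated."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in shared)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in shared))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in shared))
    return dot / (n1 * n2)


def recommend_collaborative(user, ratings, top_n=1):
    """Score unseen items by a similarity-weighted sum of others' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = user_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


top_pick = recommend_collaborative("asha", RATINGS)
```

Here "asha" rates songs much like "bilal" does, so the song bilal liked outranks the one liked by the dissimilar user "chen". No item features are needed, only the rating patterns.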

[Figure: collaborative filtering]

Hybrid Recommendation Engines
• Some recommendation systems combine content-based and collaborative
filtering techniques. These hybrid engines draw on both item features and
community behavior to personalize recommendations.
• Examples: Services like Amazon and Spotify often employ hybrid approaches
to enhance recommendation accuracy.

Overfitting-Underfitting
[Figure: example objects described by their features; surviving labels: Sphere, Shape, Size, Radius, Eat, Play]

Overfitting
• Overfitting occurs when a machine learning model learns the training data too well, capturing noise
and random fluctuations in the data rather than the underlying patterns.
• Characteristics:
• The model performs exceptionally well on the training data but poorly on new, unseen data.
• It may exhibit high accuracy or low error on the training set but fails to generalize to other datasets.
• Overfit models often have overly complex structures or too many parameters, capturing noise instead of the
actual patterns.
• Causes:
• Too many features or parameters relative to the amount of training data.
• Training the model for too many epochs, allowing it to memorize the training data.
• Using an overly complex model architecture.
• Prevention and Remedies:
• Use regularization techniques (e.g., L1 or L2 regularization) to penalize overly complex models.
• Increase the amount of training data.
• Use simpler model architectures.
• Employ techniques like dropout, which randomly deactivates units during training, to prevent over-reliance on specific units or features.
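
To make the regularization remedy concrete, here is a minimal sketch for the simplest possible model, y ≈ w·x with no intercept. The data and the function name are hypothetical. Minimizing the squared error plus an L2 penalty l2·w² gives the closed form w = Σxy / (Σx² + l2), so a larger penalty pulls the weight toward zero.

```python
def fit_slope(xs, ys, l2=0.0):
    """Least-squares slope for y ~ w * x with an L2 penalty l2 * w**2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + l2)


# Hypothetical data, roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_plain = fit_slope(xs, ys)          # no penalty
w_ridge = fit_slope(xs, ys, l2=5.0)  # penalized: weight shrinks toward 0
```

The penalty strength (here 5.0) is a hyperparameter: too small and it does little, too large and the model starts to underfit.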

Underfitting
• Underfitting occurs when a machine learning model is too simple to capture the underlying
patterns in the training data, resulting in poor performance on both the training data and new
data.
• Characteristics
• The model performs poorly on both the training set and new, unseen data.
• It may have high training error or low accuracy.
• Causes
• Using a model that is too simple or has too few parameters.
• Insufficient training time or insufficient data to capture the underlying patterns.
• Prevention and Remedies
• Use more complex model architectures.
• Increase the number of features or use feature engineering to provide the model with more
information.
• Train the model for a sufficient number of epochs to allow it to learn the patterns in the data.

[Figure: overfitting versus underfitting]

Gradient Descent in Machine Learning
• What is a gradient?
The gradient of a function is the vector of its partial derivatives. It measures
how much the output changes in response to a small change in each input.
• What is gradient descent?
• It is a numerical optimization algorithm that searches for the parameters
of a model (for a neural network, the weights and biases) that minimize a
defined cost function.
• Gradient descent (GD) is widely used in machine learning and deep
learning. It iteratively adjusts the parameters in the direction of the
negative gradient of the cost function until the updates become negligible,
that is, until it reaches a minimum (for non-convex costs, possibly only a
local one).
• Gradient descent is a fundamental optimization algorithm in machine
learning, used to minimize the cost or loss function during model training.
• It iteratively adjusts model parameters by moving in the direction of the
steepest decrease in the cost function.
• The algorithm calculates gradients, the partial derivatives of the cost
function with respect to each parameter.
• These gradients guide the updates, moving the parameters toward values
that yield a lower cost.
• Gradient descent is versatile and applies to many machine learning
models, including linear regression and neural networks. Choosing the
learning rate is crucial: too small and convergence is slow, too large and
the updates overshoot the optimum.
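
The update rule described above can be sketched for a one-feature linear regression with mean squared error as the cost. The data, learning rate, and step count are illustrative assumptions, not recommended values.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
```

On this data the parameters approach w = 2 and b = 1. A much larger learning rate would make the updates diverge instead of converge.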

Loss Function
• A function that measures the difference between the predicted values
and the actual values. It guides the optimization process by quantifying
how well the model performs.
• Loss functions fall into two classes based on the type of learning task:
• Regression models: predict continuous values.
• Classification models: predict the output from a finite set of categorical
values.
• Common regression losses:
• Mean Squared Error (MSE): the average of the squared differences between predicted and actual values.
• Mean Absolute Error (MAE): the average of the absolute differences between predicted and actual values.
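
The two losses named above, MSE and MAE, can be written out directly. The sample values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

mse_value = mse(y_true, y_pred)  # (1 + 0 + 4) / 3
mae_value = mae(y_true, y_pred)  # (1 + 0 + 2) / 3
```

Because MSE squares each error, the single error of 2 contributes four times as much as the error of 1, which makes MSE more sensitive to outliers than MAE.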

AUC-ROC Curve
• The ROC (Receiver Operating Characteristic) curve is a graphical
representation of a binary classification model’s performance across
classification thresholds, plotting the true positive rate against the
false positive rate.
• AUC, the Area Under the ROC Curve, summarizes that curve as a single
number.
• It is commonly used in machine learning to assess the ability of a
model to distinguish between two classes, typically the positive class
(e.g., presence of a disease) and the negative class (e.g., absence of a
disease).

Receiver Operating Characteristic (ROC) Curve
• ROC stands for Receiver Operating Characteristic, and the ROC curve
is a graphical representation of the effectiveness of a binary
classification model.

• It plots the true positive rate (TPR) vs the false positive rate (FPR) at
different classification thresholds.
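
As a rough sketch, the points of a ROC curve can be computed by sweeping a threshold over the predicted scores, using TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The labels and scores below are hypothetical, and the function assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels (1 = positive) and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1): more positives are caught, at the price of more false alarms.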

Area Under the Curve (AUC)
• AUC stands for Area Under the Curve; here it is the area under the
ROC curve.
• It measures the overall performance of the binary classification model
across all thresholds.
• As both TPR and FPR range from 0 to 1, the area always lies between
0 and 1. A model that guesses at random scores about 0.5.
• A greater value of AUC denotes better model performance.
• The goal is to maximize this area, which corresponds to achieving a high
TPR at a low FPR across thresholds.
• Equivalently, the AUC is the probability that the model assigns a
randomly chosen positive instance a higher predicted score than a
randomly chosen negative instance.
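
The probability interpretation of AUC can be computed directly by comparing every positive instance with every negative one. This is a minimal sketch with hypothetical data; ties are counted as half a win, a common convention, and the quadratic loop is only practical for small datasets.

```python
def auc_pairwise(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive) and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_pairwise(labels, scores)  # 8 of the 9 pairs are ordered correctly
```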

Validation Set
• A subset of data held out from training, used to tune hyperparameters
and assess the model’s performance during training, helping to detect
and limit overfitting.

Cross validation
• Cross validation is a technique used in machine learning to evaluate the
performance of a model on unseen data.
• It involves dividing the available data into multiple folds or subsets, using
one of these folds as a validation set, and training the model on the
remaining folds.
• This process is repeated multiple times, each time using a different fold as
the validation set.
• Finally, the results from each validation step are averaged to produce a
more robust estimate of the model’s performance.
• Cross validation is an important step in the machine learning process and
helps to ensure that the model selected for deployment is robust and
generalizes well to new data.
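
The fold rotation described above can be sketched as follows. The function names are hypothetical, `evaluate` stands in for "train on these indices, score on those", and the sketch does not shuffle the data, which real pipelines usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by `evaluate` over the k folds."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
```

Every sample lands in the validation set exactly once, so the averaged score uses all of the data without ever scoring a model on samples it was trained on.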

Variance
• A measure of how much a model’s predictions change when it is trained
on different subsets of the training data.
• Low variance suggests the model is consistent: its predictions vary
little from one training subset to another.
• High variance (with low bias) suggests the model may be overfitting,
reading too deeply into the noise found in each training set.

Epoch
• An epoch is one complete pass through the entire training dataset. Multiple
epochs are often required to train a model effectively.
• During an epoch, every training sample in the dataset is processed by the
model, and its weights and biases are updated according to the computed
loss or error.
• Up to a point, increasing the number of epochs improves the model by giving
it more opportunity to learn the patterns in the data. With too many epochs
the model may overfit, so it is important to monitor the model’s performance
on a validation set during training and stop when the validation performance
starts to degrade.
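
The stopping rule in the last bullet is often called early stopping. A minimal sketch, assuming the validation loss has already been recorded once per epoch; the `patience` parameter and the loss values are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the 1-based epoch with the best validation loss, stopping
    once the loss has failed to improve for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical validation losses: improving, then degrading.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = early_stopping_epoch(losses)
```

In practice one would keep the model weights from the returned epoch rather than those from the last epoch trained.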

Example of an Epoch
• If we train a model on a dataset of 1000 samples, one epoch means the
model has processed all 1000 samples once.
• If the dataset has 1000 samples and a batch size of 100 is used, there
are 10 batches in total. Each epoch then consists of 10 iterations, with
each iteration processing one batch of 100 samples.
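
The arithmetic in this example can be checked with a short sketch. The function names are hypothetical; when the batch size does not divide the dataset evenly, the last batch is simply smaller.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


dataset = list(range(1000))
n_iterations = iterations_per_epoch(len(dataset), 100)
batch_list = list(batches(dataset, 100))
```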

Tuning
• The process of adjusting model hyperparameters to optimize
performance, often involving techniques like grid search, random
search, or Bayesian optimization.
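
Grid search, the simplest of these techniques, tries every combination of candidate values and keeps the best one. In this sketch `toy_validation_score` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation score"; the grid values are illustrative.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(learning_rate, l2):
    """Hypothetical score that peaks at learning_rate=0.1, l2=0.01."""
    return -((learning_rate - 0.1) ** 2) - ((l2 - 0.01) ** 2)


grid = {"learning_rate": [0.01, 0.1, 1.0], "l2": [0.0, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_validation_score)
```

The cost grows multiplicatively with each added hyperparameter (here 3 × 3 = 9 evaluations), which is why random search or Bayesian optimization is often preferred for larger search spaces.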

Few-shot learning
• Few-shot learning is a setting, often tackled with meta-learning, in
which a model must learn a new task from only a handful of labeled
examples.
• It is like teaching the model to recognize things or do tasks without
overwhelming it with examples: it needs only a few. Few-shot learning
focuses on the model’s ability to learn quickly and efficiently from new
and unseen data.

Example
• Suppose you want a computer to recognize a new type of car, and you
show it a few pictures of that car instead of hundreds. The computer uses
this small amount of information to recognize similar cars on its own.
This is few-shot learning.

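Bias
• Bias in machine learning refers to the tendency of a model to
consistently favor specific outcomes or predictions over others, for
example because of the data it was trained on.
• In the bias-variance sense, bias is the systematic error introduced by
overly simple assumptions; high bias is associated with underfitting.
• Reducing bias is essential to ensure fair and accurate predictions.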

Imbalanced Data
• Imbalanced data refers to a dataset where the class distribution is
significantly skewed, so that some classes have far fewer instances than
others. Handling imbalanced data is essential to prevent predictions
biased toward the majority class.

Joint Probability
• Joint probability is the probability of two or more events occurring
simultaneously. In machine learning, joint probability is often used in
modeling and inference tasks.
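
As a small illustration, a joint probability can be estimated from counts of co-occurring events, and it satisfies P(A and B) = P(A) × P(B | A). The observations below are hypothetical.

```python
from collections import Counter

# Hypothetical observations of (weather, commute) pairs.
observations = [
    ("rain", "late"), ("rain", "late"), ("rain", "on_time"),
    ("sun", "on_time"), ("sun", "on_time"), ("sun", "on_time"),
    ("sun", "late"), ("sun", "on_time"),
]
counts = Counter(observations)
total = len(observations)


def joint(weather, commute):
    """P(weather and commute): fraction of observations with both."""
    return counts[(weather, commute)] / total


def marginal_weather(weather):
    """P(weather): fraction of observations with this weather."""
    return sum(c for (w, _), c in counts.items() if w == weather) / total


def conditional_commute(commute, weather):
    """P(commute | weather) = P(weather and commute) / P(weather)."""
    return joint(weather, commute) / marginal_weather(weather)


p_rain_and_late = joint("rain", "late")  # 2 of 8 observations
```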

Normalization
• Normalization is scaling numerical features to a standard range to
prevent one feature from dominating the learning process over
others.
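
One common form is min-max scaling, which maps each feature to the range [0, 1]. This is a minimal sketch with hypothetical values; other schemes, such as standardizing to zero mean and unit variance, are equally common.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical house sizes in square metres.
house_sizes = [50.0, 100.0, 150.0, 250.0]
scaled_sizes = min_max_scale(house_sizes)
```

The minimum and maximum should be computed on the training data only and then reused for validation and test data, so that no information leaks from held-out samples.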

Transfer Learning
• Transfer learning is a technique where a pre-trained model is used as
a starting point for a new, related machine-learning task.
• It enables leveraging knowledge learned from one task to improve
performance on another.

Weight
• In machine learning, weights are the parameters of a model that are
adjusted during training to minimize the error or loss function.

Convergence
• A state reached during the training of a model when the loss changes
very little between each iteration.

Dimension
• In machine learning and data science, “dimension” has a different
meaning than in physics: the dimension of a dataset is the number of
features each sample has.
• E.g., in an object detection application, a 28 × 28 image with 3 color
channels flattens to 28 × 28 × 3 = 2352 input features, so each image is
a 2352-dimensional input.
• In house price prediction, if house size is the only feature, the data
is 1-dimensional.

Extrapolation
• Making predictions outside the range of a dataset.

• E.g., “My dog barks, so all dogs must bark.” In machine learning we
often run into trouble when we extrapolate outside the range of our
training data.

Noise
• Any irrelevant information or randomness in a dataset which obscures
the underlying pattern.

Null Accuracy
• The baseline accuracy achieved by always predicting the most
frequent class.

• E.g., “B has the highest frequency, so let’s guess B every time.”
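
Null accuracy is easy to compute from the label counts alone. The labels below are hypothetical.

```python
from collections import Counter


def null_accuracy(labels):
    """Return the most frequent class and the accuracy of always guessing it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical labels: B appears 6 times out of 10.
labels = ["A", "B", "B", "B", "C", "B", "A", "B", "B", "C"]
baseline = null_accuracy(labels)
```

A trained model is only useful if its accuracy clearly beats this baseline, which matters most on imbalanced data where the baseline can already be high.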

Confusion Matrix
• True Positive (TP) - Your model correctly predicted the positive class.
For example, identifying a spam email as spam.
• True Negative (TN) - Your model correctly predicted the negative
class. For example, identifying a regular email as not spam.
• False Positive (FP) - Your model incorrectly predicted the positive
class. For example, identifying a regular email as spam.
• False Negative (FN) - Your model incorrectly predicted the negative
class. For example, identifying a spam email as a regular email.
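
The four cells can be counted by comparing each prediction with the actual label. This sketch follows the spam example above; the label names and data are hypothetical.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # spam identified as spam
        elif p == positive:
            fp += 1  # regular email identified as spam
        elif a == positive:
            fn += 1  # spam identified as a regular email
        else:
            tn += 1  # regular email identified as not spam
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]
cells = confusion_counts(actual, predicted)
```

Accuracy is (TP + TN) divided by the total, but the individual cells show which kind of mistake the model makes.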

• Type 1 Error
• A false positive. Consider a company optimizing hiring practices to
reduce false positives in job offers. A type 1 error occurs when a
candidate seems good and is hired, but turns out to be a poor fit.
• Type 2 Error
• A false negative. The candidate was great, but the company passed on
them.
