Uploaded by

Aakash Bhat


Miscellaneous Terms in Machine Learning
By: Dr Rashmi Popli
Associate Professor
Department of Computer Engineering

Dr Rashmi
Recommender systems
• Recommender systems are information filtering systems that address
information overload: they filter large amounts of dynamically
generated information down to the fragments that match a user’s
preferences, interests, or observed behavior toward particular items.
• A recommender system predicts whether a particular user would prefer
an item, based on the user’s profile and historical information.

Filtering
• Content-Based Filtering
• Collaborative Filtering

Content-Based Filtering
• Definition: A content-based recommendation engine suggests
relevant items to users based on the features of those items.
• Working:
• It considers the individual user’s preferences and focuses on the content
attributes of items.
• For example, if a user frequently searches for “yellow dresses” on an
e-commerce website, a content-based recommendation engine will suggest
other dresses of the same color.
• Music services like Spotify often use content-based filtering to recommend
songs based on a user’s listening history.
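The idea of matching items by their content attributes can be sketched as follows. This is a minimal illustration, not any platform's actual method: the catalogue, the four features and the function names are all hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical catalogue. Each item is described by four hand-made content
# features: (is_dress, is_yellow, is_formal, is_denim).
ITEM_FEATURES = {
    "yellow sundress": [1.0, 1.0, 0.0, 0.0],
    "yellow evening gown": [1.0, 1.0, 1.0, 0.0],
    "red sundress": [1.0, 0.0, 0.0, 0.0],
    "blue jeans": [0.0, 0.0, 0.0, 1.0],
    "black suit": [0.0, 0.0, 1.0, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(liked_item, catalogue, top_n=2):
    """Rank the other items by how similar their features are to the liked item."""
    liked = catalogue[liked_item]
    scored = [
        (cosine_similarity(liked, features), name)
        for name, features in catalogue.items()
        if name != liked_item
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


recommendations = recommend_similar("yellow sundress", ITEM_FEATURES)
print(recommendations)
```

Only the features of the items are used here; no other user's behavior enters the ranking.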

Collaborative Filtering
• Definition: Collaborative filtering recommends items based
on similarity measures between users or items.
• Working:
• It identifies patterns by analyzing user behavior across a community of users.
• If users with similar tastes have liked certain items, those items are
recommended to a user in that group who has not yet interacted with them.
• For instance, Spotify suggests music liked by other users with
similar tastes.
• Amazon, Netflix, and other platforms use collaborative filtering to
recommend products or movies.
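A user-based variant of this idea can be sketched as below. It is a toy illustration under stated assumptions: the ratings table, user names and function names are hypothetical, similarity is cosine similarity over the items two users have both rated, and candidates are ranked by a similarity-weighted sum of other users' ratings.

```python
import math

# Hypothetical ratings (1 to 5) given by three users to a handful of songs.
RATINGS = {
    "asha": {"song_a": 5, "song_b": 4, "song_c": 1},
    "ben": {"song_a": 5, "song_b": 5, "song_d": 4},
    "chen": {"song_a": 1, "song_c": 5, "song_e": 5},
}


def user_similarity(ratings_1, ratings_2):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(ratings_1) & set(ratings_2)
    if not shared:
        return 0.0
    dot = sum(ratings_1[item] * ratings_2[item] for item in shared)
    norm_1 = math.sqrt(sum(ratings_1[item] ** 2 for item in shared))
    norm_2 = math.sqrt(sum(ratings_2[item] ** 2 for item in shared))
    return dot / (norm_1 * norm_2)


def recommend(user, ratings, top_n=1):
    """Score unseen items by summing other users' ratings, weighted by similarity."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = user_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


top_for_asha = recommend("asha", RATINGS)
print(top_for_asha)
```

In this toy data "asha" rates songs much like "ben" and unlike "chen", so the song that only "ben" has rated ranks first.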

Hybrid Recommendation Engines
• Some recommendation systems combine content-based and collaborative
filtering techniques, so that both item features and community behavior
contribute to the personalized recommendations.
• Examples: Services like Amazon and Spotify often employ hybrid approaches
to enhance recommendation accuracy.

Overfitting-Underfitting
[Figure: illustration of overfitting and underfitting in terms of an object’s
features. Labels recoverable from the slide: Sphere, Shape, Size, Radius, Eat, Play.]

Overfitting
• Overfitting occurs when a machine learning model learns the training data too well, capturing noise
and random fluctuations in the data rather than the underlying patterns.
• Characteristics:
• The model performs exceptionally well on the training data but poorly on new, unseen data.
• It may exhibit high accuracy or low error on the training set but fails to generalize to other datasets.
• Overfit models often have overly complex structures or too many parameters, capturing noise instead of the
actual patterns.
• Causes:
• Too many features or parameters relative to the amount of training data.
• Training the model for too many epochs, allowing it to memorize the training data.
• Using an overly complex model architecture.
• Prevention and Remedies:
• Use regularization techniques (e.g., L1 or L2 regularization) to penalize overly complex models.
• Increase the amount of training data.
• Use simpler model architectures.
• Employ techniques like dropout during training (for neural networks) to prevent over-reliance on
specific units or features.

Underfitting
• Underfitting occurs when a machine learning model is too simple to capture the underlying
patterns in the training data, resulting in poor performance on both the training data and new
data.
• Characteristics
• The model performs poorly on both the training set and new, unseen data.
• It may have high training error or low accuracy.
• Causes
• Using a model that is too simple or has too few parameters.
• Insufficient training time or insufficient data to capture the underlying patterns.
• Prevention and Remedies
• Use more complex model architectures.
• Increase the number of features or use feature engineering to provide the model with more
information.
• Train the model for a sufficient number of epochs to allow it to learn the patterns in the data.
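The contrast between underfitting and overfitting can be seen on a toy problem. The sketch below is illustrative only: the data, the two deliberately flipped training labels and the three model builders (`fit_constant`, `fit_threshold`, `fit_memorizer`) are hypothetical, chosen so that one model is too simple, one matches the underlying rule and one memorizes the noise.

```python
# Toy 1-D data: the true rule is "label 1 when x >= 5".
# Two training labels (at x=2 and x=7) are deliberately flipped to act as noise.
TRAIN = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Clean, unseen test points that follow the true rule.
TEST = [(0.4, 0), (1.6, 0), (2.3, 0), (3.4, 0), (4.4, 0),
        (5.6, 1), (6.6, 1), (7.3, 1), (8.4, 1), (9.4, 1)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_constant(train):
    """Too simple: always predicts the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Picks the single threshold with the best training accuracy."""
    candidates = [x for x, _ in train]
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train))
    return lambda x: int(x >= best)


def fit_memorizer(train):
    """Too flexible: copies the label of the nearest training point, noise included."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


results = {}
for name, fit in [("constant", fit_constant),
                  ("threshold", fit_threshold),
                  ("memorizer", fit_memorizer)]:
    model = fit(TRAIN)
    results[name] = (accuracy(model, TRAIN), accuracy(model, TEST))
    print(name, results[name])
```

The constant model scores poorly on both sets (underfitting), while the memorizer is perfect on the training set but worse than the threshold model on the test set (overfitting).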

Gradient Descent in Machine Learning
• What is a Gradient?
The gradient of a function is the vector of its partial derivatives; it
describes how the function’s output changes in response to small changes
in its inputs.
• What is Gradient Descent?
• It is a numerical optimization algorithm that aims to find the optimal
parameters (weights and biases) of a neural network by minimizing a
defined cost function.
• Gradient Descent (GD) is widely used in machine learning and deep learning
to minimise the cost (or loss) function of a model during training. It works
by iteratively adjusting the weights or parameters of the model in the
direction of the negative gradient of the cost function until it converges
to a minimum, which for non-convex cost functions may be only a local minimum.

• At each step the algorithm computes the gradient, i.e. the partial
derivatives of the cost function with respect to each parameter, and moves
the parameters in the direction of the steepest decrease in the cost.
• These gradients guide the updates towards parameter values that yield a
low cost.
• Gradient Descent is versatile and applicable to various machine learning
models, including linear regression and neural networks.
• Choosing the learning rate is crucial: too small a value makes convergence
slow, while too large a value can overshoot the minimum.
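The update rule described above can be sketched for the simplest case, a straight line fitted with mean squared error. This is a minimal sketch under assumptions: the data, the learning rate of 0.05, the step count and the function name `gradient_descent` are illustrative choices, not prescribed values.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the minimum is at w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

With this data a much larger learning rate (roughly above 0.15) makes the updates diverge instead of converge, which is the overshooting behavior the slide warns about.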
Loss Function
• A function that measures the difference between the predicted values
and the actual values. It guides the optimization process by quantifying
how well the model performs.
• Loss functions are grouped into two classes based on the type of
learning task:
• Regression losses, for models that predict continuous values. Common examples:
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)
• Classification losses, for models that predict the output from a finite set
of categorical values.
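The two regression losses named above can be written in a few lines. The function names and the sample values are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2) / 3
print(mse, mae)
```

Because MSE squares each error, the single error of 2 contributes four times as much as the error of 1, whereas MAE weights errors linearly.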
AUC-ROC Curve
• The ROC (Receiver Operating Characteristic) curve is a graphical
representation of a binary classification model’s performance across
different classification thresholds, plotting the true positive rate
against the false positive rate.
• AUC-ROC is the Area Under the ROC curve: a single number that
summarises the curve.
• It is commonly used in machine learning to assess the ability of a
model to distinguish between two classes, typically the positive class
(e.g., presence of a disease) and the negative class (e.g., absence of a
disease).

Receiver Operating Characteristic (ROC) Curve
• ROC stands for Receiver Operating Characteristic, and the ROC curve
is the graphical representation of the effectiveness of a binary
classification model.
• It plots the true positive rate (TPR) vs the false positive rate (FPR) at
different classification thresholds.

Area Under the Curve (AUC)
• AUC stands for the Area Under the Curve; here it is the area under
the ROC curve.
• It measures the overall performance of the binary classification model.
• As both TPR and FPR range from 0 to 1, the area always lies
between 0 and 1.
• A greater value of AUC denotes better model performance; a value of
0.5 corresponds to random guessing.
• Because the AUC aggregates over all thresholds, maximizing it favours
models that achieve a high TPR at a low FPR across thresholds, rather
than at one given threshold.
• The AUC equals the probability that the model will assign a
randomly chosen positive instance a higher predicted score
than a randomly chosen negative instance.
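The probabilistic reading of the AUC can be checked directly by comparing every positive score with every negative score. The sketch below is illustrative: the labels, scores and function names are hypothetical, ties are counted as half a win, and `tpr_fpr` shows how one point of the ROC curve is obtained at a chosen threshold.

```python
def tpr_fpr(labels, scores, threshold):
    """True positive rate and false positive rate when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc_point = tpr_fpr(labels, scores, threshold=0.5)
area = auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
print(roc_point, area)
```

The pairwise loop is quadratic in the number of samples, which is fine for an illustration but not how large evaluations are usually computed.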
Validation Set
• A subset of data used to tune hyperparameters and assess the
model’s performance during training, helping to prevent overfitting.

Cross validation
• Cross validation is a technique used in machine learning to evaluate the
performance of a model on unseen data.
• It involves dividing the available data into multiple folds or subsets, using
one of these folds as a validation set, and training the model on the
remaining folds.
• This process is repeated multiple times, each time using a different fold as
the validation set.
• Finally, the results from each validation step are averaged to produce a
more robust estimate of the model’s performance.
• Cross validation is an important step in the machine learning process and
helps to ensure that the model selected for deployment is robust and
generalizes well to new data.
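The fold rotation described above can be sketched as follows. This is a simplified illustration: the function names, the toy targets and the "model" (which just predicts the mean of the training targets) are assumptions, and the data is not shuffled before splitting, which is usually done in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(n_samples, k, train_and_score):
    """Average the score returned by train_and_score over all k folds."""
    scores = [train_and_score(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical targets and a deliberately trivial model.
TARGETS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def mean_model_error(train, validation):
    """Train: take the mean of the training targets. Score: mean absolute error."""
    prediction = sum(TARGETS[i] for i in train) / len(train)
    return sum(abs(TARGETS[i] - prediction) for i in validation) / len(validation)


average_error = cross_validate(len(TARGETS), 3, mean_model_error)
print(average_error)
```

Each sample is used for validation exactly once and for training in the other k - 1 rounds.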
Variance
• A measure of how much the predictions of a model change when it is
trained on different subsets of the training data.
• Low variance suggests your model is internally consistent, with
predictions varying little from one training subset to another.
• High variance (with low bias) suggests your model may be overfitting
and reading too deeply into the noise found in every training set.

Epoch
• An epoch is one complete pass through the entire training dataset. Multiple
epochs are often required to train a model effectively.
• During an epoch, every training sample in the dataset is processed by the
model, and its weights and biases are updated in accordance with the computed
loss or error.
• In general, increasing the number of epochs improves the performance of the
model up to a point, by allowing it to learn more complex patterns in the data.
With too many epochs the model may overfit, so it is important to monitor the
model’s performance on a validation set during training and stop training when
the validation performance starts to degrade.
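Stopping when validation performance stops improving is commonly called early stopping. The sketch below assumes the per-epoch validation losses are already available as a list; the function name, the `patience` parameter and the loss values are illustrative.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are numbered from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Never ran out of patience: training ran for every recorded epoch.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving until epoch 2, then getting worse.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)
```

In a real training loop the parameters saved at `best_epoch` would typically be the ones kept.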
Example of an Epoch
• If we are training a model on a dataset of 1000 samples, one epoch
means that all 1000 samples have been processed once.
• If the dataset has 1000 samples and a batch size of 100 is used, there
are 1000 / 100 = 10 batches in total. In this case, each epoch
consists of 10 iterations, with each iteration processing one
batch of 100 samples.
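The arithmetic in this example can be sketched in code. The function names are illustrative, and rounding up for a final partial batch is an assumption about how leftover samples are handled.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


def batches(samples, batch_size):
    """Yield consecutive slices of the dataset, one per iteration."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


dataset = list(range(1000))
n_iterations = iterations_per_epoch(len(dataset), 100)
batch_sizes = [len(batch) for batch in batches(dataset, 100)]
print(n_iterations, batch_sizes)
```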

Tuning
• The process of adjusting model hyperparameters to optimize
performance, often involving techniques like grid search, random
search, or Bayesian optimization.
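Of the techniques listed, grid search is the simplest to sketch: try every combination of candidate values and keep the best-scoring one. Everything named below is hypothetical, and `toy_validation_score` merely stands in for "train a model with these hyperparameters and score it on the validation set".

```python
import itertools


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid; return the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(learning_rate, max_depth):
    # Hypothetical stand-in for a real validation score; it peaks at
    # learning_rate = 0.1 and max_depth = 3.
    return -((learning_rate - 0.1) ** 2) - (max_depth - 3) ** 2


PARAM_GRID = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [1, 3, 5]}
best_params, best_score = grid_search(PARAM_GRID, toy_validation_score)
print(best_params, best_score)
```

The number of combinations grows multiplicatively with each added hyperparameter, which is one reason random search and Bayesian optimization are used as alternatives.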

Few-shot learning
• Few-shot learning is often treated as a type of meta-learning: the
model learns how to learn, so that it can adapt to a new task without
being retrained from scratch.
• It is like teaching the model to recognize things or do tasks,
but instead of overwhelming it with a lot of examples, it only needs a
few. Few-shot learning focuses on enhancing the model’s capability to
learn quickly and efficiently from new and unseen data.

Example
• Suppose you want a computer to recognize a new type of car, and you
show it only a few pictures of that car instead of hundreds. The computer
uses this small amount of information to recognize similar cars on its own.
This process is known as few-shot learning.

Bias
• Bias in machine learning refers to the tendency of a model to
consistently favor specific outcomes or predictions over others, for
example because of the data it was trained on or because the model is
too simple for the task.
• Reducing bias is essential to ensure fair and accurate predictions.

Imbalanced Data
• Imbalanced data refers to a dataset in which the distribution of classes
is significantly skewed, so that some classes have far fewer instances
than others. Handling imbalanced data is essential to prevent model
predictions that are biased towards the majority class.

Joint Probability
• Joint probability is the probability of two or more events occurring
simultaneously. In machine learning, joint probability is often used in
modeling and inference tasks.
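A joint probability can be estimated from data by counting how often two events occur together. The observations and function names below are hypothetical; the last lines check the product rule P(A, B) = P(A | B) · P(B) on the same counts.

```python
# Hypothetical paired observations: (weather, whether a game was played).
OBSERVATIONS = [
    ("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes"),
    ("sunny", "no"), ("rainy", "no"), ("sunny", "yes"), ("rainy", "no"),
]


def joint_probability(observations, a, b):
    """Estimate P(A = a, B = b) as the fraction of observations equal to (a, b)."""
    return sum(1 for obs in observations if obs == (a, b)) / len(observations)


def marginal_second(observations, b):
    """Estimate P(B = b) from the second element of each observation."""
    return sum(1 for obs in observations if obs[1] == b) / len(observations)


def conditional_first_given_second(observations, a, b):
    """Estimate P(A = a | B = b) using only the observations where B = b."""
    with_b = [obs for obs in observations if obs[1] == b]
    return sum(1 for obs in with_b if obs[0] == a) / len(with_b)


p_joint = joint_probability(OBSERVATIONS, "rainy", "no")
p_product = (conditional_first_given_second(OBSERVATIONS, "rainy", "no")
             * marginal_second(OBSERVATIONS, "no"))
print(p_joint, p_product)
```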

Normalization
• Normalization is scaling numerical features to a standard range to
prevent one feature from dominating the learning process over
others.
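One common way to scale a feature to a standard range is min-max scaling to [0, 1]. The sketch below is one possible choice, not the only form of normalization; the function name and the sample values are illustrative.

```python
def min_max_scale(values):
    """Linearly rescale values so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical feature, e.g. house sizes in square metres.
sizes = [10.0, 20.0, 30.0, 50.0]
scaled_sizes = min_max_scale(sizes)
print(scaled_sizes)
```

After scaling, a feature measured in the thousands and one measured in fractions both lie in the same range, so neither dominates purely because of its units.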

Transfer Learning
• Transfer learning is a technique where a pre-trained model is used as
a starting point for a new, related machine-learning task.
• It enables leveraging knowledge learned from one task to improve
performance on another.

Weight
• In machine learning, weights are the parameters of a model that are
adjusted during training to minimize the error or loss function.

Convergence
• A state reached during the training of a model when the loss changes
very little between each iteration.

Dimension
• In machine learning and data science, “dimension” means something
different than in physics: the dimension of the data is the number of
features in the dataset.
• E.g. in an object detection application, a flattened 28 × 28 image with
3 color channels gives 28 × 28 × 3 = 2352 input features.
• In house price prediction, if house size is the only feature,
we call the data 1-dimensional.

Extrapolation
• Making predictions outside the range of a dataset.
• E.g. My dog barks, so all dogs must bark. In machine learning we
often run into trouble when we extrapolate outside the range of our
training data.

Noise
• Any irrelevant information or randomness in a dataset which obscures
the underlying pattern.

Null Accuracy
• Baseline accuracy that can be achieved by always predicting the most
frequent class.
• E.g. “B has the highest frequency, so let’s guess B every time.”
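Null accuracy is simply the share of the most frequent class. The labels and the function name below are illustrative.

```python
from collections import Counter


def null_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical labels in which class "B" is the most frequent.
labels = ["B", "A", "B", "C", "B", "A", "B", "B"]
baseline_label, baseline_accuracy = null_accuracy(labels)
print(baseline_label, baseline_accuracy)
```

A trained model is only useful if its accuracy clearly exceeds this baseline.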

Confusion Matrix
• True Positive (TP) - Your model correctly predicted the positive class. For
example, identifying a spam email as spam.
• True Negative (TN) - Your model correctly predicted the negative
class. For example, identifying a regular email as not spam.
• False Positive (FP) - Your model incorrectly predicted the positive
class. For example, identifying a regular email as spam.
• False Negative (FN) - Your model incorrectly predicted the negative
class. For example, identifying a spam email as a regular email.
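The four counts can be tallied directly from true and predicted labels. In this sketch 1 stands for the positive class (spam) and 0 for the negative class (regular email); the label lists and the function name are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels where 1 is the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # spam correctly flagged as spam
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # regular email correctly left alone
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # regular email wrongly flagged as spam
        else:
            counts["FN"] += 1  # spam wrongly treated as regular email
    return counts


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```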

• Type 1 Error
• False Positives. Consider a company optimizing hiring practices to
reduce false positives in job offers. A type 1 error occurs when a
candidate seems good and the company hires them, but they turn out to
be a poor fit.
• Type 2 Error
• False Negatives. The candidate was great but the company passed on
them.
