Machine Learning


What is machine learning?

 Machine learning is the field of study that gives computers the ability to learn from
past experience and data without being explicitly programmed
o Arthur Samuel, 1959
 Or, in simple words, machine learning is the science (or art) of programming
computers to learn from data
Supervised Learning
Definition
 Supervised Learning is a type of Machine Learning where the model is trained
using labeled data (data with both inputs and correct outputs). The computer
uses this information to learn the relationship between inputs and outputs.
It’s called “supervised” because it’s like a teacher guiding the computer.
Simple Example
 Imagine you want to teach a computer to recognize whether an email is spam
or not:
 You show the computer lots of emails.
 Each email is labeled as “spam” or “not spam.”
 The computer learns what spam emails look like based on these labeled
examples
 Later, you give it a new email, and it predicts whether it is spam or not
Supervised learning solves two types of problems
Regression and Classification
 Regression
o Predicts continuous numeric values
o E.g.
 Population growth prediction
 Life expectancy prediction
 Market forecasting/prediction
 Advertising popularity prediction
 Stock price prediction
o Algorithms (a minimal code sketch follows this list)
 Linear Regression (single feature)
 Multiple Linear Regression (many features)
 Ridge Regression (regularized linear regression)
 Lasso Regression (another regularized version)
 Support Vector Regression (SVR)
 Decision Tree Regression / Random Forest Regression
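
A minimal regression sketch using scikit-learn's LinearRegression; the experience/salary numbers are made-up illustrations, not from these notes:

# Linear regression: fit a line to numeric targets and predict a new value
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])            # single feature, e.g. years of experience (illustrative)
y = np.array([30000, 35000, 42000, 50000, 58000])  # numeric target, e.g. salary (illustrative)

model = LinearRegression()
model.fit(X, y)                     # estimate slope and intercept from the data
print(model.predict([[6]]))         # predict a continuous value for an unseen input
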
 Classification
o Classifies records into discrete classes/labels
o E.g.
 Determine whether a received email is spam or ham
 Identify customer segments
 Decide whether a bank loan should be granted
 Predict whether a student will pass or fail an examination
o Algorithms
 Logistic Regression
 Decision Tree
 Random Forest
 Support Vector Machine (SVM)
 K-Nearest Neighbors (KNN)
 Naïve Bayes
 AdaBoost
 Gradient Boosting / XGBoost
Unsupervised Learning
Definition
Unsupervised Learning is when the model is trained on unlabeled data (only inputs,
no correct outputs).
The goal is to find patterns, groups, or structures hidden in the data.

Examples

 Customer segmentation (grouping customers based on buying habits)


 Market basket analysis (finding which products are bought together)
 Anomaly detection (fraud detection, unusual activity in sensors)
 Document/topic clustering (news grouped by topics automatically)
 Image compression (reducing size without labels)

It solves problems such as clustering, association rules, and dimensionality reduction

1. Clustering – discover the inherent groupings in the data, such as grouping customers by
purchasing behavior (a minimal code sketch follows this list)
o e.g., K-Means, Hierarchical Clustering, DBSCAN
2. Association Rules – finding relationships between items

An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people that buy X also tend to buy Y

E.g.

▪ Market basket analysis

Algorithms

▪ Apriori

▪ Eclat

▪ FP-Growth (used in market basket analysis)

3. Dimensionality Reduction

 The number of input features, variables, or columns present in a given dataset is known
as dimensionality, and the process to reduce these features is called dimensionality
reduction
 It is a way of converting a higher-dimensional dataset into a lower-dimensional one while
ensuring that it still provides similar information.
 In many cases a dataset contains a huge number of input features, which makes the
predictive modeling task more complicated and increases resource usage and training/testing time
 Because it is very difficult to visualize or make predictions on a training dataset with a
high number of features, dimensionality reduction techniques are required in such cases
o Features Selection
 Filter
 Wrapper
 Embedded
o Features Extraction
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
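
A minimal unsupervised-learning sketch combining K-Means clustering and PCA from scikit-learn; the random data, number of clusters, and number of components are illustrative assumptions:

# Unsupervised learning: find clusters and reduce dimensionality without labels
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                          # 100 unlabeled samples with 5 features (synthetic)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)           # clustering: a group index for each sample

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)              # dimensionality reduction: 5 features -> 2 components
print(cluster_ids[:10], X_reduced.shape)
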

Reinforcement Learning
 A type of machine learning where an agent learns by interacting with an
environment, taking actions, and receiving rewards or penalties.
Goal: Learn the best strategy (policy) to maximize long-term rewards.
 It is employed by various software and machines to find the best possible
behavior or path it should take in a specific situation
 Reinforcement learning differs from the supervised learning in a way that in
supervised learning the training data has the answer key with it so the model
is trained with the correct answer itself whereas in reinforcement learning,
there is no answer but the reinforcement agent decides what to do to perform
the given task
 In the absence of a training dataset, it is bound to learn from its own experience
 Like humans learn in the real world – through trial and error. The agent
(software/machine) learns by experience:
o Action → Environment → Reward/Penalty → Learn → Improve
 Examples
o Resource management in computer clusters
o Traffic Light Control
o Robotics
o Web system configuration
o Chemistry
 Algorithms
o Q-Learning
o Deep Q-Learning
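
A minimal sketch of the tabular Q-Learning update rule; the states, action, reward, and the alpha/gamma values are hypothetical placeholders:

# Q-Learning update: Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))              # action-value table, learned from experience

alpha, gamma = 0.1, 0.9                          # learning rate and discount factor (illustrative)
state, action, reward, next_state = 0, 1, 1.0, 2 # one hypothetical action -> environment -> reward step

Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
print(Q[state, action])                          # the agent's improved estimate for that state-action pair
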
Batch Learning / Offline Learning

 Production: the server/environment on which the deployed code is going to run


 Conventional way of training a model: use the whole dataset to train the model
 No incremental learning
 Generally trained on your own system (not the production server)
 Training data -> train model on the system using the data -> test data -> deploy model
 If the dataset is large (big data), training is costly and time-consuming
 Problem with batch learning

The model is static -> once trained, its knowledge does not change -> even after 1 year
there is no change in the model -> it keeps giving recommendations based on the
previous year's dataset.

Alternatively, we need to keep updating and redeploying the model (e.g., every month).
If the data is large (big data), this creates hardware and availability problems:

Train the model -> put it on the server -> pull the ML model again -> retrain with the
previous and updated data -> redeploy the model -> repeat the process again
and again.
The above process is very time-consuming.
In simple terms:
Batch learning is a machine learning approach where the
model is trained on the entire dataset all at once, instead of being updated
continuously as new data arrives [no incremental learning]. After training, the model
is used for making predictions, and it only changes if it is retrained on new data from scratch.

Batch learning: advantages and drawbacks


If you want a batch learning system to know about new data, you need to train a
new version of the system from scratch on the full dataset, then stop the old system
and replace it with the new one
▪ The whole process of training, evaluating, and launching a Machine
Learning system can be automated easily

↳ MLOps pipeline
▪ Training using the full set of data can take many hours
▪ Typically train a new system only every 24 hours or even just weekly
▪ Training on the full set of data requires a lot of computing resources (CPU,
memory space, disk space, disk I/O, network I/O)

Online Learning/ Incremental Learning


Online learning is a method where the model learns continuously and sequentially, updating
itself with each new data point or small group of data points.
Online learning is like learning day by day from what you see, rather than studying
everything all at once; it helps the computer keep improving step by step as new information
arrives.
Fast
Less costly
Less time-consuming
Can train on the server
Online learning does incremental learning using mini-batches of data, training the
model sequentially; since the batches are small, the model can be trained on the server

Each learning step is fast and cheap, so the system can learn about new data on the
server, as it arrives
Online learning is great for systems that receive data as a continuous flow (e.g.,
stock prices) and need to adapt to change rapidly or autonomously
It is also a good option if you have limited computing resources: once an online
learning system has learned about new data instances, it does not need them
anymore, so you can discard them
This can save a huge amount of space
Out-of-core learning: training systems on huge datasets that cannot fit in one
machine's main memory
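
A minimal sketch of incremental training with scikit-learn's SGDClassifier and partial_fit; the synthetic mini-batches and the learning-rate settings are illustrative assumptions:

# Online / incremental learning: update the model one mini-batch at a time
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(learning_rate="constant", eta0=0.01)   # eta0 = learning rate (illustrative value)
classes = np.array([0, 1])                                   # all classes must be declared up front

rng = np.random.RandomState(0)
for step in range(10):                       # each loop simulates a new mini-batch arriving on the server
    X_batch = rng.rand(32, 4)                # 32 samples, 4 features (synthetic)
    y_batch = (X_batch[:, 0] > 0.5).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)
    # old batches can now be discarded, saving space

print(model.predict(rng.rand(3, 4)))
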

When to use:
 When there is concept drift
 If the software has a volatile nature
 When the frequency of data changes is high
 When a cost-effective solution is needed
 When a faster solution is needed

Learning Rate:
Decides how fast the model adapts to new data
One important parameter of online learning systems is how fast they should
adapt to changing data
If you set a high learning rate, then your system will rapidly adapt to new
data, but it will also tend to quickly forget the old data
If you set a low learning rate, the system will have more inertia;
that is, it will learn more slowly, but it will also be less sensitive to
noise in the new data or to sequences of nonrepresentative data points
(outliers)
We need to find the correct learning rate; otherwise the model may
 learn new patterns and forget old patterns, or
 learn new patterns too slowly
Disadvantages:
 Tricky to use
 Risky
 Online learning is less stable and gives lower accuracy than batch learning
o e.g., if the server is hacked and the hacker feeds it spam/bad data
 This can be handled using a monitoring system such as anomaly detection, or
by rolling back the model
Instance Based
The system learns the examples by heart (it stores each training instance)
Then generalizes to new cases by using a similarity measure to compare them to
the learned examples (or a subset of them)
It is called instance-based because it builds the hypotheses from the training
instances
It is also known as memory-based learning or lazy-learning
Ex. K Nearest Neighbor (KNN)

Model Based
 Model-> formula
 Train model from training data to estimate model parameters i.e. discover
patterns
 Store the built model in suitable format
 Generalize the rules of the model (e.g., a pickled model) instead of keeping the training set
 Predict the unseen instance (data) using the model
 It requires a known model form
 It takes less memory compared to the instance based learning
 E.g. ▪ Linear Regression
End to End for Model Deployment
Model Evaluation
Mean Absolute Error (MAE):
measures the average magnitude of the errors in a set of forecasts, without
considering their direction
measures accuracy for continuous variables

The MAE is a linear score which means that all the individual differences are
weighted equally in the average

Mean Squared Error (MSE)/ mean squared deviation (MSD):


measures the average of the squares of the errors,
i.e., the average squared difference between the actual and estimated values
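
A minimal sketch computing MAE and MSE with scikit-learn's metrics; the actual/predicted values below are made-up numbers:

# MAE = mean(|y_true - y_pred|);  MSE = mean((y_true - y_pred)^2)
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]    # actual values (illustrative)
y_pred = [2.5, 5.0, 4.0, 8.0]    # model estimates (illustrative)

print(mean_absolute_error(y_true, y_pred))   # every error weighted equally (linear score)
print(mean_squared_error(y_true, y_pred))    # squaring penalizes large errors more heavily
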

Classification
 Prediction of class/label values
 Classification is a supervised machine learning method where the model tries to predict the correct
label for a given input
 In classification, the model is fully trained on the training data and then evaluated on test data before
it is used to predict on unseen data

Learners

 There are two types of Learners


o Eager Learner
 Logistic Regression
 SVM
 Naïve Bayes
 Decision Tree
 Artificial Neural Network

o Lazy Learners or Instance-Based Learners [KNN or case-based reasoning]


 Eager Learner

Model based [Model = Formula]

These are machine learning algorithms that first build a model from the training dataset
before making any future predictions

They spend more time during training because of their eagerness to achieve better
generalization, and they require less time for prediction

Training Time  High


Prediction Time  Low

 Lazy Learner or instance based learner


No model is created
They memorize the training data, and during prediction they find the nearest
neighbors from the training dataset, which makes prediction slow

Training Time -> Low


Prediction Time -> High

Types of classification

 There are three types of classification


o Binary Classification
o Multi-class Classification
o Multilabel Classification

Binary Classification

Labels = 2 (e.g., Email -> Spam or Not Spam)

Output = 1 class per instance

The goal is to classify the input into one of two mutually exclusive categories

The training data in such situations is labeled as True/False, 0/1, or Spam/Not Spam

Multiclass Classification
Labels > 2 (e.g., Person -> Upper class, Middle class, or Lower class)

Output = 1 class per instance

• Each instance belongs to exactly one class out of multiple possible classes.
• Classes are mutually exclusive (choosing one means excluding the others).

• Example:
- Predicting the type of fruit (Apple, Banana, Mango, Orange) → one fruit at a time.
- Handwritten digit recognition (0–9) → only one digit per image.

Multilabel Classification
• Each instance can belong to multiple classes simultaneously.
• Classes are not mutually exclusive (an instance may have zero, one, or several
labels).
• Example:
- Predicting movie genres → a movie can be Action + Comedy + Drama.
- Detecting objects in an image → one picture can have Dog + Car + Tree.
We can cite:
 Multi-label Decision Tree
 Multi-label Gradient Boosting
 Multi-label Random Forest
Logistic Regression
Designed for Binary Classification

1) Definition

 Logistic Regression is a supervised machine learning algorithm used for classification


problems (mostly binary).
 It predicts the probability of an instance belonging to a class.
 Decision is made by applying a threshold (commonly 0.5).

2) How it works

 It takes input features and combines them linearly with weights.


 The result is passed through a sigmoid function, which compresses the value into a range
between 0 and 1.
 The output can be interpreted as a probability.

3) Training / Learning

 Parameters (weights) are learned using Maximum Likelihood Estimation (MLE).


 The objective is to minimize the log loss (cross-entropy loss).
 Optimization is usually done with Gradient Descent or advanced solvers (like Newton’s
method, LBFGS).
 Regularization (L1 or L2) is often applied to avoid overfitting and handle
multicollinearity.

4) Decision Making

 If predicted probability ≥ threshold → Class 1.


 If predicted probability < threshold → Class 0.
 Threshold can be adjusted depending on business needs (e.g., medical tests may need
higher recall).

Example: Cat vs Dog


 A Logistic Regression model is trained with images of cats (label = 0) and dogs (label =
1).
 For a new test image, the model predicts a probability:
 The decision is made by comparing it against a threshold (commonly 0.5).

👉 Suppose the model outputs: P(Dog) = 0.82

 Since probability ≥ 0.5 threshold, the model decides → Dog.

👉 Another image gives: P(Dog) = 0.23

 Since probability < 0.5 threshold, the model decides → Cat.

Decision Rule (in words)

 If probability ≥ 0.5 → predict Dog


 If probability < 0.5 → predict Cat
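
A minimal scikit-learn sketch of this probability-plus-threshold decision rule; the two-feature "image" data and the resulting probability are invented for illustration:

# Logistic regression: P(class 1) >= 0.5 -> Dog, otherwise Cat
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]])  # hypothetical image features
y = np.array([0, 0, 1, 1])                                      # 0 = Cat, 1 = Dog

clf = LogisticRegression()
clf.fit(X, y)

p_dog = clf.predict_proba([[0.85, 0.8]])[0, 1]   # sigmoid output: probability of class 1 (Dog)
threshold = 0.5
print(p_dog, "Dog" if p_dog >= threshold else "Cat")
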

5) Evaluation Metrics

 Confusion Matrix for understanding TP, TN, FP, FN.


 Accuracy (useful only if classes are balanced).
 Precision, Recall, F1-score (better for imbalanced datasets).
 ROC-AUC / PR-AUC for probability ranking and imbalanced problems.
 Calibration curves when probability outputs are directly used in decision-making.

6) Assumptions

 Requires the dependent variable to be binary [two categories/classes]


 Only the meaningful variables should be included

 The independent variables should be independent of each other, i.e., the model should have
little or no multicollinearity
 Requires quite large sample sizes
7) Extensions

 Multinomial Logistic Regression → for multiclass problems (more than 2 classes).


 One-vs-Rest approach → trains multiple binary logistic models for each class.
 Multilabel classification → sigmoid applied independently to each label.

8) Advantages

 Simple and easy to implement.


 Allows easy regularization to prevent overfitting, while yielding probabilities as the
prediction result
 Allows easy model updating using stochastic gradient descent
 Output probabilities are not affected by removing variables that are uncorrelated with the
output or that are multicollinear

9) Disadvantages

 fails to solve non-linear problems


 underperforms when there are multiple or non-linear decision boundaries.
 It fails to capture more complex relationships.
 Without proper identification of independent variables Logistic Regression fails to
perform correctly.
 Logistic Regression can only predict a categorical (discrete) outcome

10) Key Interview Highlights

 Logistic Regression is for classification, not regression.


 Uses a sigmoid function to output probabilities.
 Parameters are estimated by maximum likelihood.
 Evaluated using precision, recall, F1, ROC-AUC (not just accuracy).
 Assumes linear relationship in log-odds, not raw features.
 Interpretation is often in terms of odds ratios.
Sigmoid Function

 Formula:
o sigmoid(z) = 1 / (1 + e^(-z))
 Range:
o Output is always between 0 and 1.
o This makes it suitable for probability representation.

 In ML usage:
o In binary classification, sigmoid maps raw model output (logit) into a probability
of belonging to the positive class.
o Example: if sigmoid(z) = 0.82 → interpreted as 82% probability of being class
1.
o In multilabel classification, sigmoid is applied to each label independently (since
each can have its own probability).
o In multiclass classification, we usually use softmax instead of sigmoid, because
probabilities must sum to 1 across classes.

🔑 Key Point:

 Sigmoid ≈ probability for one independent outcome.


 Softmax ≈ probability distribution for mutually exclusive outcomes.
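
A minimal NumPy sketch contrasting sigmoid (independent probabilities) and softmax (a distribution that sums to 1); the logit values are arbitrary examples:

# Sigmoid: each score becomes an independent probability in (0, 1)
# Softmax: a vector of scores becomes a probability distribution summing to 1
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))           # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])     # arbitrary raw model outputs
print(sigmoid(logits))                  # multilabel style: each label scored independently
print(softmax(logits), softmax(logits).sum())   # multiclass style: probabilities sum to 1.0
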
Naïve Bayes

Used in text classification


Naïve  assumes that all features are conditionally independent of each other,
given the class label.

Bayes’ Theorem  finds the probability of an event occurring given the probability
of another event that has already occurred

A and B are events

P(A)  Prior Probability

 is happening or event A before seeing any evidence

P(B)  Evidence

 already happen or total probability of observing B under all possible conditions.

P(B|A)  Likelihood

 probability of observing event B given that A is true.

P(A|B)  Posterior Probability

 A happening given that event B has already happened


 Posterior = (Prior × Likelihood) ÷ Evidence

Naïve Bayes has been studied extensively since the 1960s


Types of Naïve Bayes:

 Gaussian Naïve Bayes classifier


o When features are continuous numbers [marks, height, weight]
 Multinomial Naive Bayes [Multiclass Classification]
o Data has counts of things [Spam Detection, Sentiment Analysis]
o How many times a word appears
 Bernoulli Naive Bayes [Binary Classification]
o Only cares whether something is present or not [Yes or No]
o All features are binary
o Ex. Email spam check
 Is the word “win” in the email? Yes / No
 Categorical Naïve Bayes
o Features are categorical (can take more than two discrete values)
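
A minimal text-classification sketch using Multinomial Naïve Bayes on word counts; the tiny spam/ham corpus is invented for illustration:

# Multinomial Naive Bayes on word counts (spam vs ham)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "win a free prize", "project status update"]
labels = ["spam", "ham", "spam", "ham"]          # illustrative labels

vectorizer = CountVectorizer()                   # counts how many times each word appears
X_counts = vectorizer.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X_counts, labels)

print(clf.predict(vectorizer.transform(["free money prize"])))   # expected: ['spam']
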
KNN [K-Nearest Neighbors]

 It is a supervised machine learning algorithm which can be used for both classification
and regression problems
 K  the number of nearest neighbors
 Instance-based algorithm or lazy learner
 No model is created; it memorizes the training data, finds the nearest neighbors of the
input, and predicts the output
 Requires more time for prediction
 However, it is widely used for classification in industry
 It assumes that similar data points are in close proximity

Working:

 Classify by the vote of its neighbors, with the case being assigned to the class most
common amongst its K nearest neighbors, measured by a distance function

 Choose the optimal value for K by examining the dataset [usually between 3 and 10 for
most datasets]
 Cross-validation is another way to retrospectively determine a good K value by using an
independent dataset to validate the K value
 Disadvantage:
o Computationally expensive – it stores all the training data in memory
o High memory requirement
 Applications of KNN
o Recommender system
o Relevant document classification
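
A minimal KNN classification sketch with scikit-learn; K = 3 and the toy 2-D points are illustrative choices:

# K-Nearest Neighbors: prediction is a majority vote among the closest training points
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]   # toy points
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)    # K = number of neighbors that vote
knn.fit(X_train, y_train)                    # lazy learner: essentially just stores the data

print(knn.predict([[2, 2], [8, 7]]))         # each query classified by its 3 nearest neighbors
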
SVM [Support Vector Machine]

 It is Supervised Machine Learning Algorithm which is use for both


classification as well as regression
 It is mostly used for classification problems
 It separates the classes using a hyperplane
 In a two-dimensional space the hyperplane is a line dividing the plane into two parts, with
each class lying on either side
 There can be multiple ways to choose a separating hyperplane, but SVM's goal is
to choose the maximum-margin hyperplane.
Picture this: you’re on a quest to find the perfect algorithm that can effortlessly
distinguish between apples and oranges, even when they’re mixed together in a
basket. Enter Support Vector Machines, or SVM

Class labels are denoted as

-1  for -ve class

+1  for +ve class

The main task of the classification problem is to find the best separating
hyperplane/ Decision boundary.

In an n-dimensional feature space the separating hyperplane is an (n-1)-dimensional boundary,
which can be either linear or non-linear. The data points closest to the hyperplane, which
determine its position and margin, are called support vectors.
What exactly are Margins?

 The further the data points are from the decision boundary (beyond the margin), the more
confidently they are classified.
 Margins represent the width of the corridor that the SVM algorithm aims to
maximize when finding the optimal hyperplane to separate different classes
of data. The larger the margin, the greater the confidence in the
classification made by the SVM model.
Linear and Non-Linear SVM

Linear SVM:

 In linear SVM, it separates data by a straight line or hyperplane in the


input space, rendering it suitable for linearly separable data.
 The key advantage of linear SVM lies in its simplicity and efficiency.

Non-Linear SVM

 Non-linear SVM is employed when the relationship between features and


classes is not linear and cannot be separated by a straight line or
hyperplane in the input space.
 It addresses this by mapping the input data into a higher-dimensional
feature space where it becomes linearly separable.
Optimization Technique used in SVM

 This optimization problem aims to minimize the classification error while maximizing
the margin, which is the distance between the decision boundary and the closest data
points from each class.

 Hard constrain that Support Vector Machine follows:- each data point
must lie on the correct side of the margin and there should be no
misclassification.

Hard and Soft SVM

Hard SVM

 Algorithm aims to find the hyperplane that separates the classes with the
maximum margin while strictly enforcing that all data points are correctly
classified.
 Assuming that the data is linearly separable, it implies the existence of at
least one hyperplane that can perfectly separate the classes without any
misclassifications.
 However, Hard SVM does not tolerate any misclassification errors and
demands the data to be perfectly separable, which can be overly restrictive
and might lead to poor performance on noisy or overlapping datasets.
Soft SVM

 Also Known as C-SVM (C - regularization parameter)


 relaxes the strict requirement of Hard SVM by allowing some
misclassification errors.
 It introduces a regularization parameter (C) that controls the trade-off
between maximizing the margin and minimizing the classification error.
 A smaller value of C allows for a wider margin and more
misclassifications, while a larger value of C penalizes misclassifications
more heavily, leading to a narrower margin.
 Soft SVM is suitable for cases where the data may not be perfectly
separable or contains noise or outliers.
 It provides a more robust and flexible approach to classification, often
yielding better performance in practical scenarios.

Relation between Regularization parameter (C) and SVM


 Low C → Wide margin, allows more misclassifications, better generalization.
 High C → Narrow margin, fewer misclassifications, risk of overfitting

Kernel trick in SVM

The kernel trick maps the input data implicitly into a higher-dimensional space so that a
linear separator can be found there.
Common kernels: Linear SVM, Polynomial SVM, RBF SVM

Library:
from sklearn.svm import SVC
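
A minimal SVC sketch showing the kernel choice and the regularization parameter C discussed above; the toy points and parameter values are illustrative:

# Soft-margin SVM: kernel sets linear vs non-linear boundary; C trades margin width vs misclassification
from sklearn.svm import SVC

X_train = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]   # toy 2-D points
y_train = [-1, -1, -1, 1, 1, 1]                              # -1 = negative class, +1 = positive class

clf = SVC(kernel="rbf", C=1.0)       # low C -> wider margin; high C -> penalize misclassification more
clf.fit(X_train, y_train)

print(clf.support_vectors_)          # the points that define the maximum-margin boundary
print(clf.predict([[1, 1], [4, 4]]))
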
Decision Tree

Decision Tree Classifier

ID3 (Iterative Dichotomiser 3): a decision tree algorithm that uses information gain based on
entropy to select the best attribute for splitting.

CART (Classification and Regression Tree): a decision tree algorithm that uses the Gini Index
(for classification) or variance reduction (for regression) to split data.
CART always creates a binary tree.

a. Entropy and Gini Index  measure the purity of a split


b. Information Gain  decides on which feature the tree splits
Example:-
Let's go with a practical example:
Problem statement: Can Suraj go to play tennis?

Node types (each non-leaf node holds a test condition):
 Root Node
 Parent Node
 Child Node
 Leaf Node

Hyperparameter
 max_depth = number of levels / height of the tree
Entropy:-
 Entropy H(S) = -p log2(p) - q log2(q), where p and q are the proportions of the two classes
 Range between 0 and 1
 If entropy is 1  not a pure split
 If entropy is 0  pure split

Gini Index / Gini Impurity:


 Gini impurity = 1 - (p^2 + q^2), i.e., 1 minus the sum of squared class proportions
 Range between 0 and 1 (at most 0.5 for two classes)
 If Gini impurity is 0  pure split
 If Gini impurity is 0.5  not a pure split (maximally impure for two classes)

[Figure: Entropy H(S) and Gini impurity plotted against the class probability p; entropy peaks at 1 and Gini impurity at 0.5 when p+ = p- = 0.5]
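
A small sketch computing entropy and Gini impurity for a binary split, with p and q = 1 - p as the class proportions defined above:

# Entropy H(S) = -p*log2(p) - q*log2(q);  Gini impurity = 1 - (p^2 + q^2)
import math

def entropy(p):
    q = 1 - p
    if p in (0, 1):
        return 0.0                      # a pure node has zero entropy
    return -p * math.log2(p) - q * math.log2(q)

def gini(p):
    q = 1 - p
    return 1 - (p**2 + q**2)

print(entropy(0.5), gini(0.5))          # 1.0 and 0.5 -> maximally impure split
print(entropy(1.0), gini(1.0))          # 0.0 and 0.0 -> pure split
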

About Decision tree:


 It is a supervised machine learning algorithm that predicts the output by splitting the data
into branches based on decision rules
 Every node except the leaf nodes contains a conditional statement
 Works for both classification and regression problems
 Splitting is done using information gain with the Gini index or entropy [the choice can depend
on dataset size]
 Unlike linear models, it captures non-linear relationships quite well
Application:
 Most popular algorithm use in Data Mining
Assumptions:
 At the beginning, the whole dataset is considered as the root node
 Feature values are preferred to be categorical
 If the values are continuous, they are discretized prior to building the model
 Records are distributed recursively on the basis of attribute values
Advantages
 Easy to interpret and understand, even for non-technical users.
 No statistical knowledge required to interpret results.
 Graphical representation is intuitive and hypothesis-friendly.
 Helps in quick data exploration and variable importance detection.
 Robust to missing values and outliers to some extent.

Disadvantages

 Prone to overfitting if not pruned or constrained.


 Not suitable for continuous variables (information loss on binning).
 Can be unstable (small data changes → very different tree).
 Biased results if data is imbalanced.

Pre-pruning (Early Stopping)

 Stops the tree from growing once it reaches a certain condition.


 Criteria: maximum depth, minimum samples per split, minimum information gain, etc.
 Prevents overfitting during training itself.
 Faster, less complex, but may cause underfitting if stopped too early.

Post-pruning (Pruning after Full Growth)

 Allows the tree to grow fully, then removes (cuts back) the branches that add little
predictive power.
 Based on validation error, cost complexity pruning, or statistical tests.
 Reduces overfitting while keeping useful splits.
 More accurate but computationally expensive

 If the dataset is big, pre-pruning is used; otherwise, post-pruning is used
 The more efficient of the two is pre-pruning, because it prunes while the tree is being created
 Pre-pruning = Stop early to avoid overfitting.
 Post-pruning = Grow fully, then trim to simplify.
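
A minimal scikit-learn sketch contrasting pre-pruning (stopping criteria set before growth) with post-pruning (cost-complexity pruning after growth); the dataset and parameter values are illustrative:

# Pre-pruning vs post-pruning with DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: constrain the tree while it is being built (early stopping)
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: let the tree grow, then trim weak branches via cost-complexity pruning
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post_pruned.fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
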
Ensemble Learning
Ensembling is the art of combining a diverse set of learners [individual models] to improve
the stability and predictive power of the model.

Primarily used to improve the model's:


 Classification
 Prediction
 Function approximation
 Performance
Or
 Reduce the risk of an unfortunate selection of a single poor model
Goals:
 Assigning Confidence  Bagging
 Improving Accuracy  Boosting
 Give one model O/P to another as I/P  Stacking

Bagging
 Stands for bootstrap aggregation
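
A minimal bagging (bootstrap aggregation) sketch with scikit-learn; the base estimator and parameter values are illustrative choices:

# Bagging: train many learners on bootstrap samples of the data and aggregate their votes
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

bag = BaggingClassifier(DecisionTreeClassifier(),  # individual (diverse) learners
                        n_estimators=10,           # number of bootstrap models
                        bootstrap=True,            # sample the training data with replacement
                        random_state=0)
bag.fit(X, y)
print(bag.predict(X[:3]))
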

You might also like