DATA ANALYTICS
(R1UC402T)
UNIT-II
Linear Correlation- Regression Modelling- Multivariate Analysis-
Bayesian Modelling- Inference and Bayesian Networks- Support
vector and Kernel Methods- Analysis of time series- Linear System
Analysis- Non Linear Dynamics- Rule Induction- Basic Fuzzy and
Neural Networks
Linear correlation refers to a straight-line relationship between two
variables. A correlation can range between -1 (perfect negative
relationship) and +1 (perfect positive relationship), with 0 indicating no
straight-line relationship. Linear correlation is a measure of dependence
between two random variables.
Definition
Let X and Y be two random variables. The linear correlation coefficient
(or Pearson's correlation coefficient) between X and Y, denoted by
Corr[X,Y], is defined as
Corr[X,Y] = Cov[X,Y] / (σ[X]·σ[Y])
where Cov[X,Y] is the covariance between X and Y, and σ[X] and σ[Y] are
their standard deviations.
Note that, in principle, the ratio is well-defined only if σ[X] and σ[Y] are
strictly greater than zero. However, it is often assumed that Corr[X,Y]=0
when one of the two standard deviations is zero. This is equivalent to
assuming that 0/0 = 0, because Cov[X,Y]=0 when one of the two standard
deviations is zero.
Interpretation
The interpretation is similar to that of covariance: the correlation
between X and Y provides a measure of how similar their deviations from
their respective means are.
Linear correlation has the property of being bounded between -1 and 1:
-1 ≤ Corr[X,Y] ≤ 1
Thanks to this property, correlation makes it easy to gauge the intensity
of the linear dependence between two random variables: the closer the
correlation is to 1, the stronger the positive linear dependence between
X and Y (and the closer it is to -1, the stronger the negative linear
dependence between X and Y).
Terminology
The following terminology is often used:
1. If Corr[X,Y]>0 then X and Y are said to be positively linearly
correlated (or simply positively correlated).
2. If Corr[X,Y]<0 then X and Y are said to be negatively linearly
correlated (or simply negatively correlated).
3. If Corr[X,Y]≠0 then X and Y are said to be linearly correlated (or
simply correlated).
4. If Corr[X,Y]=0 then X and Y are said to be uncorrelated.
Correlation of a random variable with itself
Let X be a random variable; then Corr[X,X] = 1.
Symmetry
The linear correlation coefficient is symmetric:
Corr[X,Y]=Corr[Y,X]
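The following is a minimal sketch of how the correlation coefficient defined above can be computed from sample data with numpy; the data values are made up purely for illustration.

import numpy as np

def pearson_corr(x, y):
    """Sample estimate of Corr[X, Y] = Cov[X, Y] / (sigma_X * sigma_Y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # Cov[X, Y]
    return cov / (x.std() * y.std())                 # divide by the two standard deviations

# Illustrative data (values chosen arbitrarily)
x = [2, 4, 5, 7, 9, 12]
y = [10, 19, 24, 33, 41, 55]
print(round(pearson_corr(x, y), 3))   # close to +1: strong positive linear correlation

Note that Corr[X,X] = 1 and Corr[X,Y] = Corr[Y,X] can be checked directly with this function.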
Regression Modelling:
Regression modelling includes many techniques for modeling and analyzing
several variables when the focus is on the relationship between a
dependent variable and one or more independent variables (or
'predictors'). In all cases, a function of the independent variables,
called the regression function, is to be estimated.
Correlation and linear regression are not the same. Correlation quantifies
the degree to which two variables are related; it does not fit a line
through the data points. You simply compute a correlation coefficient (r)
that tells you how much one variable tends to change when the other one
does.
REGRESSION:
Data can be smoothed by fitting data to a function such as with
regression.
Linear regression involves finding the best line to fit two variables or
attributes so that one attribute can be used to predict the other.
Multiple linear regression: More than two attributes are involved and the
data are fit to a multidimensional surface.
Linear Regression: Straight-line regression analysis involves a response
variable Y and a single predictor variable X. It is the simplest form of
regression and models Y as a linear function of X, i.e.
Y = b + wX
where the variance of Y is assumed to be constant, and b, w are
regression coefficients specifying the Y-intercept and slope of the line.
The regression coefficients w and b can also be thought of as weights, so
that we can equivalently write
Y = w0 + w1X
These coefficients can be solved for by the method of least squares, which
estimates the best-fitting straight line as the one that minimizes the
error between the actual data and the estimate of the line.
The regression coefficients can be estimated using
w1 = Σ(i=1 to |D|) (xi − x̄)(yi − ȳ) / Σ(i=1 to |D|) (xi − x̄)²
w0 = ȳ − w1·x̄
where x̄ and ȳ are the mean values of x and y over the training set D.
Example: straight-line regression using the method of least squares.
X (years experience)    Y (salary, in $1000s)
3 30
8 57
9 64
13 72
3 36
6 43
11 59
21 90
1 20
16 83
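Below is a minimal sketch that fits the straight line Y = w0 + w1X by the least-squares formulas above, using the (experience, salary) pairs from the table; the predicted value at 10 years is just an illustrative query.

import numpy as np

# (years experience, salary in $1000s) from the table above
x = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16], dtype=float)
y = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83], dtype=float)

# Least-squares estimates: w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
x_bar, y_bar = x.mean(), y.mean()
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
w0 = y_bar - w1 * x_bar          # intercept, so the line passes through (x_bar, y_bar)

print(f"Y = {w0:.1f} + {w1:.1f} X")                          # fitted line
print(f"predicted salary for 10 years: {w0 + w1 * 10:.1f}")  # illustrative prediction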
The distance between two binary variables can be based on the notion of
similarity. For example, the asymmetric binary similarity between the
objects i and j, sim(i,j), can be computed as
sim(i,j) = q / (q + r + s) = 1 − d(i,j)
where q is the number of attributes that equal 1 for both objects, r the
number that equal 1 for i but 0 for j, and s the number that equal 0 for i
but 1 for j. The coefficient sim(i,j) is called the Jaccard coefficient.
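A small sketch of this similarity measure follows; the two binary attribute vectors are made up for illustration.

def jaccard_similarity(i, j):
    """sim(i, j) = q / (q + r + s) for two binary (0/1) vectors."""
    q = sum(a == 1 and b == 1 for a, b in zip(i, j))  # 1 in both objects
    r = sum(a == 1 and b == 0 for a, b in zip(i, j))  # 1 in i only
    s = sum(a == 0 and b == 1 for a, b in zip(i, j))  # 1 in j only
    return q / (q + r + s)

# Illustrative binary attribute vectors
i = [1, 0, 1, 1, 0, 0]
j = [1, 1, 0, 1, 0, 0]
print(jaccard_similarity(i, j))       # 2 / (2 + 1 + 1) = 0.5
print(1 - jaccard_similarity(i, j))   # corresponding dissimilarity d(i, j)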
MULTIPLE LINEAR REGRESSION
A multiple linear regression model based on two predictor attributes or
variables A₁ and A₂ is
y = w0 + w1x₁ + w2x₂
where x₁ and x₂ are the values of attributes A₁ and A₂ in a tuple x.
Multiple regression problems are solved with software packages such
as SAS, SPSS, and S-PLUS.
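As an alternative to those packages, the sketch below solves the same two-predictor model by ordinary least squares with numpy; the training data are made up for illustration.

import numpy as np

# Made-up training data with two predictor attributes A1 and A2
X = np.array([[3.0, 1.0],
              [8.0, 0.0],
              [9.0, 1.0],
              [13.0, 1.0],
              [6.0, 0.0]])
y = np.array([30.0, 57.0, 64.0, 72.0, 43.0])

# Add a column of ones so the intercept w0 is estimated as well
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimizes ||y - X w||^2
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
w0, w1, w2 = w
print(f"y = {w0:.2f} + {w1:.2f} x1 + {w2:.2f} x2")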
BAYESIAN CLASSIFICATION
Bayesian classifiers are statistical classifiers. They can predict
class membership probabilities, such as the probability that a given
tuple belongs to a particular class. Bayesian classifiers have
exhibited high accuracy and speed when applied to large databases.
Naive Bayesian classification assumes that the effect of an attribute
value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence.
It is made to simplify the computations involved, and in this sense the
classifier is considered "naive".
BAYES THEOREM
Let X be a data tuple; in Bayesian terms, X is considered "evidence".
Let H be some hypothesis, such as that the data tuple X belongs to a
specified class C.
P(H/X) represents the probability that tuple X belongs to class C,
given that we know the attribute description of X.
P(H/X) is the posterior probability of H conditioned on X.
FOR EXAMPLE,
A customer is described by the attributes age and income, and X is a
35-year-old customer with an income of $40,000. Suppose that H is the
hypothesis that our customer will buy a computer, given that we know the
customer's age and income.
P(H) is the prior probability of H. For our example, this is the
probability that any given customer will buy a computer, regardless of
age, income, or any other information.
Similarly, P(X/H) is the posterior probability of X conditioned on H.
P(X) is the prior probability of X.
These probabilities can be estimated using Bayes' theorem.
BAYES THEOREM:
P(H/X) = P(X/H) P(H) / P(X)
How is Bayes' theorem used in the naive Bayesian classifier?
The naive Bayesian classifier, or simple Bayesian classifier, works as
follows:
1. Let D be a training set of tuples and their associated class
labels. Each tuple is represented by an n-dimensional attribute
vector, X = (x1, x2, ..., xn), depicting n measurements made on the
tuple from n attributes A1, A2, ..., An.
2. Suppose there are m classes C1, C2, ..., Cm.
Given a tuple X, the classifier predicts that X belongs to the class
having the highest posterior probability, conditioned on X, i.e.,
the naive Bayesian classifier predicts that tuple X belongs to the
class Ci if and only if
P(Ci/X) > P(Cj/X)  for 1 ≤ j ≤ m, j ≠ i.
Thus, we maximize P(Ci/X). The class Ci for which P(Ci/X) is maximized
is called the maximum posteriori hypothesis.
By Bayes' theorem,
P(Ci/X) = P(X/Ci) P(Ci) / P(X)
3. As P(X) is constant for all classes, only P(X/Ci) P(Ci) needs to be
maximized. If the class prior probabilities are not known, then it is
commonly assumed that the classes are equally likely, that is,
P(C1) = P(C2) = ... = P(Cm), and we therefore maximize P(X/Ci).
4. Given a data set with many attributes, it would be extremely
computationally expensive to compute P(X/Ci). To reduce the computation,
the naive assumption of class conditional independence is made:
P(X/Ci) = Π(k=1 to n) P(xk/Ci) = P(x1/Ci) × P(x2/Ci) × ... × P(xn/Ci)
For each attribute, we look at whether the attribute is categorical or
continuous-valued. To compute P(X/Ci), consider the following:
a) If Ak is categorical, then P(xk/Ci) is the number of tuples
of class Ci in D having the value xk for Ak, divided by
|Ci,D|, the number of tuples of class Ci in D.
b) If Ak is continuous-valued, it is typically assumed to have a Gaussian
distribution with mean μ and standard deviation σ, defined by
g(x, μ, σ) = (1 / (√(2π)·σ)) · e^(−(x−μ)² / (2σ²))
so that P(xk/Ci) = g(xk, μCi, σCi).
5. To predict the class label of X, P(X/Ci) P(Ci) is evaluated for each
class Ci.
The classifier predicts that the class label of tuple X is the class Ci
if and only if
P(X/Ci) P(Ci) > P(X/Cj) P(Cj)  for 1 ≤ j ≤ m, j ≠ i.
In other words, the predicted class label is the class Ci for which
P(X/Ci) P(Ci) is the maximum.
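The following is a minimal sketch of these steps for continuous-valued attributes with the Gaussian assumption above; the training tuples, attribute values, and class labels are made up for illustration.

import math
from collections import defaultdict

def gaussian(x, mu, sigma):
    """g(x, mu, sigma) = 1/(sqrt(2*pi)*sigma) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def train(D):
    """D: list of (attribute_vector, class_label). Returns class priors and per-attribute (mu, sigma)."""
    by_class = defaultdict(list)
    for x, c in D:
        by_class[c].append(x)
    priors, params = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(D)                       # P(Ci)
        params[c] = []
        for col in zip(*rows):                               # one column per attribute Ak
            mu = sum(col) / len(col)
            sigma = (sum((v - mu) ** 2 for v in col) / len(col)) ** 0.5 or 1e-6  # avoid zero sigma
            params[c].append((mu, sigma))
    return priors, params

def predict(x, priors, params):
    """Pick the class Ci maximizing P(X/Ci) P(Ci) = P(Ci) * prod_k g(xk, mu, sigma)."""
    best_c, best_score = None, -1.0
    for c in priors:
        score = priors[c]
        for xk, (mu, sigma) in zip(x, params[c]):
            score *= gaussian(xk, mu, sigma)                 # class conditional independence
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# Illustrative data: (age, income in $1000s) -> buys_computer
D = [((25, 40), "yes"), ((35, 60), "yes"), ((45, 30), "no"), ((50, 25), "no"), ((30, 55), "yes")]
priors, params = train(D)
print(predict((35, 40), priors, params))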
Bayesian Belief Networks Or Belief Networks Or Bayesian Networks
Or Probabilistic Networks
A Bayesian belief network specifies joint conditional probability
distributions. It allows class conditional independencies to be defined
between subsets of variables. Trained Bayesian belief networks can be used
for classification.
A belief network is defined by two components: a directed acyclic graph
and a set of conditional probability tables.
A Simple Bayesian Belief Network
If an arc is drawn from a node Y to a node Z, then Y is the parent or
immediate predecessor of Z, and Z is a descendant of Y. Each variable is
conditionally independent of its non-descendants in the graph, given its
parents.
The arcs in the figure allow a representation of causal knowledge. For
example, having lung cancer is influenced by a person's family history of
lung cancer, as well as by whether or not the person is a smoker.
Note that the variable positive X-Ray is independent of whether the
patient has a family history of lung cancer or is a smoker, given that
we know that the patient has lung cancer.
In other words, once we know the outcome of the variable lung
cancer, then the variables family history and smoker do not provide
any additional information regarding positive X-Ray.
The arcs also show the variable lung cancer is conditionally independent
of Emphysema, given its parents, family history and smoker.
A belief network has one conditional probability table (CPT) for each
variable. The CPT for a variable Y specifies the conditional distribution
P(Y/parents(Y)), where parents(Y) are the parents of Y.
        FH, S   FH, ~S   ~FH, S   ~FH, ~S
LC      0.8     0.5      0.7      0.1
~LC     0.2     0.5      0.3      0.9
This shows a CPT for the variable lung cancer. The conditional
probability for each known value of lung cancer is given for each
possible combination of values of its parents. For instance, from the
upper leftmost and bottom rightmost entries of the table we see that
P (Lung Cancer=yes /Family History =yes, Smoker =yes) =0.8
P (Lung Cancer=no /Family History =no, Smoker =no) =0.9
Let X = (x1, x2, ..., xn) be a data tuple described by the variables or
attributes Y1, Y2, ..., Yn, respectively. Recall that each variable is
conditionally independent of its non-descendants in the network graph,
given its parents. This allows the network to provide a complete
representation of the existing joint probability distribution with the
following equation:
P(x1, x2, ..., xn) = Π(i=1 to n) P(xi / Parents(Yi))
where P(x1, x2, ..., xn) is the probability of a particular combination of
values of X, and the values for P(xi / Parents(Yi)) correspond to the
entries in the CPT for Yi.
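As a hedged sketch, the code below evaluates this product for the FamilyHistory/Smoker/LungCancer fragment of the network using the CPT shown above; the prior probabilities for FamilyHistory and Smoker are made-up values, since the notes give only the LungCancer CPT.

# CPT for LungCancer from the table above: P(LC / FH, S)
p_lc_given = {(True, True): 0.8, (True, False): 0.5,
              (False, True): 0.7, (False, False): 0.1}

# Assumed (hypothetical) priors for the parentless nodes -- not given in the notes
p_fh = 0.3      # P(FamilyHistory = yes)
p_s = 0.4       # P(Smoker = yes)

def joint(fh, s, lc):
    """P(fh, s, lc) = P(fh) * P(s) * P(lc / fh, s)."""
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    p_lc = p_lc_given[(fh, s)]
    return p * (p_lc if lc else 1 - p_lc)

# e.g. probability that a smoker with a family history has lung cancer
print(joint(fh=True, s=True, lc=True))   # 0.3 * 0.4 * 0.8 = 0.096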
A node within the network can be selected as an “output” node
representing a class label attribute. There may be more than one output
node. Rather than returning a single class label, the classification
process can return a probability distribution that gives the probability of
each class.
Multivariate analysis
It is a set of techniques used for the analysis of data sets that contain
more than one variable, and the techniques are especially valuable when
working with correlated variables.
Instead of looking at several variables separately, in multivariate
analysis we look at them simultaneously, and hence we are able to study
the interrelationships between the variables.
Application areas:
Social science (gender, age, nationality of an individual).
Climatology (min temp, max temp, rainfall, humidity) on a day.
Econometrics (input costs, production, profit) of a firm.
Medical (BP, pulse rate) of persons.
Administrative (admissions, operations, discharges, deaths) per
day in hospital.
Multivariate analysis is classified as
Classification of individuals.
Dimension reduction.
Cause -effect relationship
Cluster analysis:
Each cluster is homogeneous within itself but different from the other
clusters. Cluster analysis tells us how the individuals are similar and
dissimilar among themselves. How the clusters differ is answered by
discriminant analysis.
Discriminant analysis: studies the properties of a given cluster and
thereby identifies the differences between the clusters.
Can a newly arrived individual be assigned to one of the clusters? This is
where classification comes in: the problem of assigning a new individual
to a cluster is referred to as a classification problem, as sketched below.
Discriminant analysis and classification graph: if new data arrive, we
plot the new X and Y values and check which cluster they lie in.
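A small sketch of that assignment step follows, using made-up cluster centroids in the (X, Y) plane and a simple nearest-centroid rule as a stand-in for the classification described above.

import math

# Hypothetical cluster centroids in the (X, Y) plane
centroids = {"cluster A": (2.0, 3.0), "cluster B": (8.0, 7.0)}

def assign(point):
    """Assign a new individual to the cluster whose centroid is closest."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))

print(assign((7.5, 6.0)))   # lands in cluster B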
Dimensionality reduction
PCA (K-L method)
Factor analysis
PCA searches for k n-dimensional orthogonal vectors that can best be
used to represent the data, where k ≤ n. It combines the essence of the
attributes by creating an alternative, smaller set of variables. The
entire data set can then be projected onto this smaller set. PCA often
reveals relationships that were not previously suspected.
The basic procedure is as follows:
The input data are normalized, so that each attribute falls within the
same range. This step helps ensure that attributes with large domains
will not dominate attributes with smaller domains.
PCA computes k orthonormal vectors that provide a basis for the
normalized input data. These are unit vectors, each pointing in a
direction perpendicular to the others. These vectors are referred to as
the principal components. The input data are a linear combination of
the principal components.
The principal components are sorted in the order of decreasing
“significance” or strength. The principal components essentially serve
as the new set of axes for the data, providing important information
about variance.
The figure shows the first two principal components, Y1 and Y2, for a
given set of data originally mapped to the axes X1 and X2. This
information helps identify groups or patterns within the data.
Because the components are sorted in decreasing order of significance,
the size of the data can be reduced by eliminating the weaker components,
i.e., those with low variance. Using the strongest principal components,
it should be possible to reconstruct a good approximation of the original
data.
It can be applied to ordered and unordered attributes, and can handle
sparse and skewed data.
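The sketch below follows the procedure just described (normalize, find orthogonal components from the covariance matrix, sort by variance, project onto the strongest k); the input matrix is made up for illustration.

import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    # 1. Normalize each attribute so large-domain attributes do not dominate
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Orthogonal unit vectors: eigenvectors of the covariance matrix
    cov = np.cov(Xn, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 3. Sort components by decreasing "significance" (variance explained)
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # 4. Project the data onto the strongest k components
    return Xn @ components

# Illustrative data: 5 tuples, 3 attributes
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 0.2],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])
print(pca(X, k=2))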
Rule Induction
A decision tree is a structure that includes a root node, branches, and leaf
nodes. Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class label. The
topmost node in the tree is the root node.
The following decision tree is for the concept buys_computer, which
indicates whether a customer at a company is likely to buy a computer.
Each internal node represents a test on an attribute. Each leaf node
represents a class.
The benefits of having a decision tree are as follows:
It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are
simple and fast.
Then, for each attribute A,
Info_A(D) = Σ(j=1 to v) (|Dj| / |D|) × Info(Dj)
where |Dj| / |D| is the weight of the jth partition.
Info_A(D) is the expected information required to classify a tuple from D
based on the partitioning by A. The smaller the expected information, the
greater the purity of the partitions.
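As a sketch of this attribute-selection step, the code below computes Info(D), Info_A(D), and the resulting information gain Gain(A) = Info(D) − Info_A(D) for a categorical attribute; the small buys_computer-style data set is made up for illustration.

import math
from collections import Counter

def info(labels):
    """Info(D) = -sum_i p_i * log2(p_i), the expected information (entropy) of D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_a(values, labels):
    """Info_A(D) = sum_j |Dj|/|D| * Info(Dj), after partitioning D on attribute A."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        dj = [l for a, l in zip(values, labels) if a == v]   # partition Dj where A = v
        total += (len(dj) / n) * info(dj)
    return total

# Made-up data: attribute 'age group' vs class 'buys_computer'
age  = ["youth", "youth", "middle", "senior", "senior", "middle", "youth", "senior"]
buys = ["no",    "no",    "yes",    "yes",    "no",     "yes",    "yes",   "yes"]

gain = info(buys) - info_a(age, buys)     # Gain(A) = Info(D) - Info_A(D)
print(round(gain, 3))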