Data Analytics Unit 2
Digital Notes
[Department of Computer Science Engineering]
Subject Name : Introduction to Data Analytics
Subject Code : BCDS-501
Course : B. Tech
Branch : CSE
Semester : V
Prepared by : Mr. Anand Prakash Dwivedi
Unit – 2
Data Analysis:
What is Regression Analysis?
Regression analysis is a statistical technique for modeling and estimating the relationship between a dependent variable and one or more independent variables. The simple linear regression model is

y = β0 + β1x + ε

where y is the dependent variable, x is the independent variable, β0 and β1 are the population intercept and slope, and ε is a random error term. The estimated regression line is

ŷi = b0 + b1xi

where b0 and b1 are the sample estimates of β0 and β1.
Simple Linear Regression Example
A real estate agent wishes to examine the relationship between the selling price
of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (y) = house price in $1000s
Independent variable (x) = square feet
Least Squares Regression
The least squares method chooses the estimates b0 and b1 so as to minimize the sum of squared differences between the observed and predicted values of y.
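A minimal Python sketch of the least-squares computation for this example follows. The ten (square feet, price) pairs are invented for illustration; only the setup (10 houses, size in square feet, price in $1000s) comes from the example above.

    # Least-squares fit of house price on square footage.
    # The data pairs are hypothetical, invented for illustration.
    import numpy as np

    square_feet = np.array([1400, 1600, 1700, 1875, 1100,
                            1550, 2350, 2450, 1425, 1700])
    price = np.array([245, 312, 279, 308, 199,
                      219, 405, 324, 319, 255])  # house price in $1000s

    # Least-squares estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
    x_bar, y_bar = square_feet.mean(), price.mean()
    b1 = ((square_feet - x_bar) * (price - y_bar)).sum() / ((square_feet - x_bar) ** 2).sum()
    b0 = y_bar - b1 * x_bar

    print(f"fitted line: y-hat = {b0:.2f} + {b1:.4f} x")
    print(f"predicted price of a 2000 sq ft house ($1000s): {b0 + b1 * 2000:.1f}")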
Polynomial Regression
This is a special case of multivariate regression, with only one independent variable
x, but an x-y relationship which is clearly nonlinear (at the same time, there is no
‘physical’ model to rely on).
y = β0 + β1x + β2x^2 + β3x^3 + … + βnx^n + ε

Effectively, this is the same as having a multivariate model with x1 ≡ x, x2 ≡ x^2, x3 ≡ x^3, and so on.
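A short sketch of this reduction: fitting a polynomial is an ordinary least-squares fit on the columns 1, x, x^2, …, x^n. The synthetic data and the degree chosen below are assumptions for illustration only.

    # Polynomial regression as multivariate linear regression on 1, x, x^2, ..., x^n.
    # Synthetic data; the true curve is quadratic plus noise.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 4, 30)
    y = 1.0 + 2.0 * x - 0.5 * x ** 2 + rng.normal(0, 0.2, x.size)

    n = 2  # degree of the polynomial
    X = np.vander(x, n + 1, increasing=True)       # columns: 1, x, x^2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
    print("b0, b1, b2 =", beta)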
NONLINEAR REGRESSION
This is a model with one independent variable (the results can be easily extended to
several) and ‘n’ unknown parameters, which we will call b1,
b2, ... bn:
y = f(x, b) + ε

where f(x, b) is a specific (given) function of the independent variable and the 'n' parameters.
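A brief sketch of fitting such a model numerically, here using SciPy's curve_fit; the particular f (an exponential decay) and the synthetic data are assumptions chosen only to illustrate the idea.

    # Nonlinear regression y = f(x, b) + error, fitted with SciPy.
    # f is an assumed example model (exponential decay with two parameters).
    import numpy as np
    from scipy.optimize import curve_fit

    def f(x, b1, b2):
        return b1 * np.exp(-b2 * x)

    rng = np.random.default_rng(1)
    x = np.linspace(0, 5, 40)
    y = f(x, 2.5, 1.3) + rng.normal(0, 0.05, x.size)  # synthetic observations

    b_hat, b_cov = curve_fit(f, x, y, p0=[1.0, 1.0])  # p0 = initial guess for b
    print("estimated parameters b1, b2:", b_hat)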
• In situations where
– a latent construct cannot be appropriately represented as a
continuous variable,
– ordinal or discrete indicators do not reflect underlying continuous
variables,
– the latent variables cannot be assumed to be normally distributed,
traditional Gaussian modeling is clearly not appropriate.
• In addition, normal-distribution analysis imposes minimum requirements on the number of observations and assumes that variables are measured on a continuous scale.
Bayesian analysis works with three kinds of probability:
• A priori probability
• Conditional probability
• A posteriori probability
Bayes’ Theorem
Why does it matter? If 1% of a population have cancer, then for a screening test with 80% sensitivity and 95% specificity:

P[Test +ve | Cancer] = 80%
P[Test +ve] = 0.01 × 0.80 + 0.99 × 0.05 = 0.0575, so P[Test +ve] / P[Cancer] = 5.75
P[Cancer | Test +ve] = P[Test +ve | Cancer] × P[Cancer] / P[Test +ve] ≈ 14%

… i.e. most positive results are actually false alarms.
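The same numbers can be checked with a few lines of Python; nothing here is new, it simply reproduces the calculation above.

    # Bayes' theorem applied to the screening-test numbers from the text.
    p_cancer = 0.01      # prevalence: 1% of the population
    sensitivity = 0.80   # P[Test +ve | Cancer]
    specificity = 0.95   # P[Test -ve | No Cancer]

    # Total probability of a positive test:
    p_pos = p_cancer * sensitivity + (1 - p_cancer) * (1 - specificity)

    # Posterior probability of cancer given a positive test:
    p_cancer_given_pos = sensitivity * p_cancer / p_pos
    print(p_pos)               # 0.0575
    print(p_cancer_given_pos)  # about 0.139, i.e. roughly 14%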
So a Bayesian network is a pair BN = (DAG, CPD): a directed acyclic graph together with a set of conditional probability distributions. Each node in the graph represents a random variable.
What is Inference in BN?
— Using a Bayesian network to compute probabilities is called inference
— In general, inference involves queries of the form:
P( X | E )
where X is the query variable and E is the evidence variable.
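As a minimal illustration, the query P(X | E) can be answered in a two-node network C → D by enumerating the joint distribution and normalizing. The CPD numbers below are invented for illustration.

    # Inference P(C | D = true) in a two-node network C -> D by enumeration.
    # The CPD numbers are invented for illustration.
    p_c = {True: 0.2, False: 0.8}          # prior P(C)
    p_d_given_c = {True: 0.9, False: 0.1}  # P(D = true | C)

    # Enumerate the joint P(C, D = true) and normalize to get the posterior.
    joint = {c: p_c[c] * p_d_given_c[c] for c in (True, False)}
    z = sum(joint.values())
    posterior = {c: joint[c] / z for c in joint}
    print(posterior)  # P(C = true | D = true) = 0.18 / 0.26, about 0.69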
Summary
— Bayesian methods provide a sound theory and framework for the implementation of classifiers.
— Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables.
— Computing exact values is NP-complete or NP-hard; it is typical to make simplifying assumptions or use approximate methods.
— Many Bayesian tools and systems exist.
— Bayesian networks are an efficient and effective representation of the joint probability distribution of a set of random variables.
Efficient:
o Local models
o Independence (d-separation)
Effective:
o Algorithms take advantage of structure to
  - compute posterior probabilities
  - compute the most probable instantiation
  - support decision making
Introduction to SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine learning
algorithms which are used both for classification and regression. But generally, they
are used in classification problems. SVMs were first introduced in the 1960s and later refined in the 1990s. They have their own unique way of implementation as compared to
other machine learning algorithms. Lately, they are extremely popular because of their
ability to handle multiple continuous and categorical variables.
Working of SVM
An SVM model is basically a representation of different classes in a hyperplane in
multidimensional space. The hyperplane will be generated in an iterative manner by
SVM so that the error can be minimized. The goal of SVM is to divide the datasets into
classes to find a maximum marginal hyperplane (MMH).
The following are important concepts in SVM −
Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
Hyperplane − A hyperplane is the decision plane or boundary that separates a set of objects belonging to different classes.
Margin − The margin may be defined as the gap between two lines drawn on the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
The main goal of SVM is to divide the datasets into classes so as to find a maximum marginal hyperplane (MMH), which is done in the following two steps −
First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
Then, it chooses the hyperplane that separates the classes correctly.
SVM Kernels
In practice, the SVM algorithm is implemented with a kernel that transforms the input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In simple words, the kernel converts non-separable problems into separable problems by adding more dimensions. This makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used by SVM.
Linear Kernel
It can be used as a dot product between any two observations. The formula of the linear kernel is as below −

K(x, xi) = sum(x ∗ xi)

From the above formula, we can see that the kernel between two vectors, say x and xi, is the sum of the products of each pair of input values.
Polynomial Kernel

The polynomial kernel is a more generalized form of the linear kernel and can distinguish curved or nonlinear input spaces. A common form is

K(x, xi) = (1 + sum(x ∗ xi))^d

where d is the degree of the polynomial.

RBF Kernel

The RBF (radial basis function) kernel, mostly used in SVM classification, maps the input space into an indefinite-dimensional space. The following formula explains it mathematically −

K(x, xi) = exp(−gamma ∗ sum((x − xi)^2))

Here, gamma ranges from 0 to 1. We need to manually specify it in the learning algorithm. A good default value of gamma is 0.1.
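A short classification sketch using scikit-learn (an assumed library choice; the notes do not name one), comparing the linear and RBF kernels on synthetic data:

    # SVM classification with linear and RBF kernels on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel, gamma=0.1)  # gamma is used by the RBF kernel only
        clf.fit(X_train, y_train)
        print(kernel, "test accuracy:", clf.score(X_test, y_test))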
Time Series Analysis
• Aim:
– To collect and analyze the past observations to develop an appropriate model which
can then be used to generate future values for the series.
• Time Series Forecasting is based on the idea that the history of occurrences over
time can be used to predict the future
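As a minimal sketch of this idea, a simple moving-average forecast predicts the next value from the average of the most recent observations; the monthly series below is invented for illustration.

    # Moving-average forecast over a hypothetical monthly sales series.
    sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]

    window = 3
    forecast = sum(sales[-window:]) / window  # average of the last 3 observations
    print(f"next-period forecast: {forecast:.1f}")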
Application
• Business
• Economics
• Finance
• Science and Engineering
Rule Induction
• Some rule induction systems induce more complex rules, in which values of attributes may be expressed by negation of some values or by a value subset of the attribute domain.
• Data from which rules are induced are usually presented in a form similar to a table, in which cases (or examples) are labels (or names) for rows, and variables are labeled as attributes and a decision. We will restrict our attention to rule induction from such tables.
• A very simple example of such a table is presented as Table 1.1, in which the attributes are Temperature, Headache, Weakness and Nausea, and the decision is Flu. The set of all cases labeled by the same decision value is called a concept. For Table 1.1, the case set {1, 2, 4, 5} is the concept of all cases affected by flu (for each case from this set the corresponding value of Flu is yes).
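A small sketch of the concept computation; the attribute values below are hypothetical (Table 1.1 itself is not reproduced in these notes), but the decision column matches the statement that cases 1, 2, 4 and 5 have Flu = yes.

    # Computing the concept {1, 2, 4, 5} from a hypothetical decision table.
    cases = {
        1: {"Temperature": "high",      "Headache": "yes", "Flu": "yes"},
        2: {"Temperature": "very_high", "Headache": "yes", "Flu": "yes"},
        3: {"Temperature": "normal",    "Headache": "no",  "Flu": "no"},
        4: {"Temperature": "high",      "Headache": "no",  "Flu": "yes"},
        5: {"Temperature": "high",      "Headache": "yes", "Flu": "yes"},
        6: {"Temperature": "normal",    "Headache": "yes", "Flu": "no"},
    }

    # A concept is the set of all cases labeled by the same decision value.
    concept_flu_yes = {c for c, row in cases.items() if row["Flu"] == "yes"}
    print(concept_flu_yes)  # {1, 2, 4, 5}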
Applications of Neural Networks
• Investment analysis
• Control systems & monitoring
• Mobile computing
• Marketing and financial applications
• Forecasting – sales, market research, meteorology
Advantages:
• A neural network can perform tasks that a linear program can not.
• When an element of the neural network fails, it can continue without any
problem by their parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be implemented in any application without any problem.
Disadvantages:
• The neural network needs training to operate.
• The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
• Large neural networks require high processing time.
Conclusions
• Neural networks provide the ability to build more human-like AI.
• They take rough approximations and hard-coded reactions out of AI design (e.g. rules and FSMs).
• They still require a lot of fine-tuning during development.
Principal Component Analysis (PCA)
• The PCA method is a statistical method for feature selection and dimensionality reduction.
• Feature selection is a process whereby a data space is transformed into a feature space. In principle, both spaces have the same dimensionality.
• However, in the PCA method, the transformation is designed in such a way that the data set can be represented by a reduced number of "effective" features and yet retain most of the intrinsic information contained in the data; in other words, the data set undergoes a dimensionality reduction.
• Suppose that we have a vector x of dimension m and we wish to transmit it using l numbers, where l < m. If we simply truncate the vector x, we will cause a mean square error equal to the sum of the variances of the elements eliminated from x.
• So, we ask: Does there exist an invertible linear transformation T such that the
truncation of Tx is optimum in the mean-squared sense?
• Clearly, the transformation T should have the property that some of its
components have low variance.
• Principal Component Analysis maximises the rate of decrease of variance and is
the right choice.
• Before we present neural-network (Hebbian-based) algorithms that do this, we first present the statistical analysis of the problem.
• Assume that the input vector X has zero mean: E[X] = 0, where E is the statistical expectation operator. If X does not have zero mean, we first subtract the mean from X before we proceed with the rest of the analysis.
• Let q denote a unit vector, also of dimension m, onto which the vector X is to be projected. This projection is defined by the inner product of the vectors X and q:
• A = X^T q = q^T X
• ||q|| = (q^T q)^(1/2) = 1
• The projection A is a random variable with a mean and variance related to the statistics of vector X. Assuming that X has zero mean, we can calculate the mean value of the projection A:
• E[A] = q^T E[X] = 0
• The variance of A is therefore the same as its mean-square value, and so we can write:
• σ² = E[A²] = E[(q^T X)(X^T q)] = q^T E[X X^T] q = q^T R q
• The m-by-m matrix R is the correlation matrix of the random vector X, formally defined as the expectation of the outer product of the vector X with itself, as shown:
• R = E[X X^T]
• We observe that the matrix R is symmetric, which means that R^T = R.
• Let q1, q2, …, qm denote the unit (column) eigenvectors of R. The projections of x onto these eigenvectors can be collected into a single vector:
• a = [a1, a2, …, am]^T = [x^T q1, x^T q2, …, x^T qm]^T = Q^T x
• Where Q is the matrix which is constructed by the (column) eigenvectors of R.
• From the above we see that:
• x = Qa
• This is nothing more than a coordinate
transformation from the input space, of vector x, to the feature space of the
vector a.
• From the perspective of pattern recognition, the usefulness of the PCA method is that it provides an effective technique for dimensionality reduction.
• In particular, we may reduce the number of features needed for effective data representation by discarding those linear combinations in the previous formula that have small variances, and retain only those terms that have large variances.
• Let λ1, λ2, …, λl denote the l largest eigenvalues of R. We may then approximate the vector x by keeping only the corresponding l terms:
• x ≈ a1 q1 + a2 q2 + … + al ql
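A compact numerical sketch of the whole procedure (zero-mean data, correlation matrix, eigendecomposition, truncation to the top l components); the synthetic data and the choice l = 2 are assumptions for illustration.

    # PCA via eigendecomposition of the correlation matrix R = E[X X^T].
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 5))   # 500 samples of a 5-dimensional vector
    X = X - X.mean(axis=0)          # subtract the mean first

    R = (X.T @ X) / X.shape[0]      # sample estimate of E[X X^T]
    eigvals, Q = np.linalg.eigh(R)  # R is symmetric; columns of Q are eigenvectors

    l = 2                                  # number of components to keep
    idx = np.argsort(eigvals)[::-1][:l]    # indices of the l largest eigenvalues
    Q_l = Q[:, idx]

    a = X @ Q_l            # feature-space coordinates a = Q^T x, per sample
    x_approx = a @ Q_l.T   # approximation of x from the top-l components
    print("fraction of variance retained:", eigvals[idx].sum() / eigvals.sum())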
Definition of fuzzy
Fuzzy – “not clear, distinct, or precise; blurred”
Definition of fuzzy logic
A form of knowledge representation suitable for notions that
cannot be defined precisely, but which depend upon their
contexts.
The problem
Change the speed of a heater fan based on the room temperature and humidity.
A temperature control system has four settings
Cold, Cool, Warm, and Hot
Humidity can be defined by:
Low, Medium, and High
Using this we can define the fuzzy set.
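A small sketch of such fuzzy sets as membership functions; the triangular shapes and the temperature breakpoints (in °C) are assumptions chosen only for illustration.

    # Triangular fuzzy membership functions for the temperature settings.
    def triangular(x, a, b, c):
        # Membership rises from a to a peak at b, then falls to c.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    temperature_sets = {
        "Cold": lambda t: triangular(t, -10, 0, 10),
        "Cool": lambda t: triangular(t, 5, 12, 20),
        "Warm": lambda t: triangular(t, 15, 23, 30),
        "Hot":  lambda t: triangular(t, 25, 35, 45),
    }

    t = 18
    print({name: round(mu(t), 2) for name, mu in temperature_sets.items()})
    # at 18 degrees the room is partly "Cool" (0.25) and partly "Warm" (0.38)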
ANTI-LOCK BRAKING SYSTEM (ABS)
Nonlinear and dynamic in nature
Inputs for Intel Fuzzy ABS are derived from
Brake
4 WD
Feedback
Wheel speed
Ignition
Outputs
Pulsewidth
Error lamp
Stochastic search
Stochastic search and optimization techniques are used in a vast number of areas,
including aerospace, medicine, transportation, and finance, to name but a
few. Whether the goal is refining the design of a missile or aircraft, determining the
effectiveness of a new drug, developing the most efficient timing strategies for
traffic signals, or making investment decisions in order to increase profits, stochastic
algorithms can help researchers and practitioners devise optimal solutions to
countless real-world problems.