
MAHARANA PRATAP GROUP OF INSTITUTIONS

KOTHI MANDHANA, KANPUR


(Approved by AICTE, New Delhi and affiliated to Dr. AKTU, Lucknow)

Digital Notes
[Department of Computer Science Engineering]
Subject Name : Introduction to Data Analytics
Subject Code : BCDS-501
Course : B. Tech
Branch : CSE
Semester : V
Prepared by : Mr. Anand Prakash Dwivedi
Unit – 2

Data Analysis:
What is Regression Analysis?

Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable

• Dependent variable: the variable we wish to explain
• Independent variable: the variable used to explain the dependent variable

Simple Linear Regression Model


• Only one independent variable, x
• The relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x

Population Linear Regression

The population regression model:


y = β0 + β1x + ε

where:
• y — dependent variable
• β0 — population y-intercept
• β1 — population slope coefficient
• x — independent variable
• ε — random error term (residual)

The model has a linear component (β0 + β1x) and a random error component (ε).
Linear Regression Assumptions
• The underlying relationship between the x variable and the y variable is linear
• The distribution of the errors has constant variability
• Error values are normally distributed
• Error values are independent (over time)

Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

ŷi = b0 + b1x

where:
• ŷi — estimated (or predicted) y value
• b0 — estimate of the regression intercept
• b1 — estimate of the regression slope
• x — independent variable
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of a one-unit change in x
Finding the Least Squares Equation

• The coefficients b0 and b1 will be found using computer software, such as Excel's Data Analysis add-in or MegaStat
• Other regression measures will also be computed as part of computer-based regression analysis

Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
• Dependent variable (y) = house price in $1000s
• Independent variable (x) = square feet
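
A minimal Python sketch of fitting this example with least squares (the ten data points below are illustrative sample values, not taken from the notes):

import numpy as np

sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])  # $1000s

# Least squares estimates: b1 = cov(x, y) / var(x), b0 = y-bar - b1 * x-bar
b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
print(f"price-hat = {b0:.2f} + {b1:.4f} * sqft")
print(f"Predicted price of a 2000 sq ft house: {b0 + b1 * 2000:.1f} thousand dollars")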

APPLICATION OF REGRESSION ANALYSIS IN RESEARCH

i. It helps in formulating and determining the functional relationship between two or more variables.
ii. It helps in establishing a cause-and-effect relationship between two variables in economics and business research.
iii. It helps in predicting and estimating the value of a dependent variable such as price, production, or sales.
iv. It helps to measure the variability or spread of values of a dependent variable with respect to the regression line.

USE OF REGRESSION IN ORGANIZATIONS

In the field of business, regression is widely used in:
• Predicting future production
• Investment analysis
• Forecasting sales, etc.
It is also used in sociological studies and economic planning to make projections of population, birth rates, and death rates. The success of a business thus depends on the correctness of the various estimates it is required to make.
METHODS OF STUDYING REGRESSION:

GRAPHICALLY:
• Free hand curve
• Least squares

ALGEBRAICALLY:
• Least squares
• Deviation method from arithmetic mean
• Deviation method from assumed mean

MULTIVARIATE (LINEAR) REGRESSION


This is a regression model with multiple independent variables. Here there are independent (regressor) variables x1, x2, …, xn with only one dependent (response) variable y. The model therefore takes the following form:

yi = β0 + β1x1,i + β2x2,i + … + βnxn,i + εi

where, in each regressor, the first subscript labels the variable and the second the observation.
NB: The exact values of β and ε are, and will always remain, unknown.
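
As an illustration of how the β coefficients are estimated in practice, here is a NumPy sketch using simulated data (all values below are assumed for the example); the intercept is obtained by prepending a column of ones to the regressor matrix:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 observations of x1, x2, x3
true_beta = np.array([2.0, 0.5, -1.0])
y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=100)  # beta0 = 1, plus noise

A = np.column_stack([np.ones(len(X)), X])     # column of ones for the intercept
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)                               # close to [1.0, 2.0, 0.5, -1.0]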

Polynomial Regression
This is a special case of multivariate regression, with only one independent variable x, but an x-y relationship that is clearly nonlinear (and, at the same time, there is no 'physical' model to rely on):

y = β0 + β1x + β2x² + β3x³ + … + βnxⁿ + ε

Effectively, this is the same as having a multivariate model with x1 ≡ x, x2 ≡ x², x3 ≡ x³, and so on.
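
A short sketch of fitting such a polynomial model (data simulated for illustration); numpy.polynomial builds the x, x², x³ columns and solves the least-squares problem in one call:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 50)
y = 1 - 2 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=x.size)

coefs = np.polynomial.polynomial.polyfit(x, y, deg=3)   # returns [b0, b1, b2, b3]
print(coefs)                                            # close to [1, -2, 0, 0.5]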
NONLINEAR REGRESSION
This is a model with one independent variable (the results can easily be extended to several) and n unknown parameters, which we will call b1, b2, …, bn:

y = f(x, b) + ε

where f(x, b) is a specific (given) function of the independent variable and the n parameters.
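
For a concrete (assumed) choice of f, nonlinear least squares can be carried out with SciPy's curve_fit; here f(x, b) is an exponential-decay model with parameters b1 and b2, and any other given function could be substituted:

import numpy as np
from scipy.optimize import curve_fit

def f(x, b1, b2):
    return b1 * np.exp(-b2 * x)       # the given model f(x, b)

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 40)
y = f(x, 2.5, 1.3) + rng.normal(scale=0.05, size=x.size)   # simulated data

b_hat, b_cov = curve_fit(f, x, y, p0=[1.0, 1.0])   # p0: initial guess for b
print(b_hat)                                        # close to [2.5, 1.3]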

Introduction to Bayesian Modeling


• From the social science researcher's point of view, the requirements of traditional frequentist statistical analysis are very demanding.
• For example, the assumption of normality of both the phenomenon under investigation and the data is a prerequisite for traditional parametric frequentist calculations.
• Continuous variables: age, income, temperature, …
• In situations where
  – a latent construct cannot be appropriately represented as a continuous variable,
  – ordinal or discrete indicators do not reflect underlying continuous variables, or
  – the latent variables cannot be assumed to be normally distributed,
traditional Gaussian modeling is clearly not appropriate.
• In addition, normal distribution analysis sets minimum requirements for the number of observations, and the measurement level of the variables should be continuous.

Introduction to Bayesian Modeling


• Frequentist parametric statistical techniques are designed for normally distributed (both theoretically and empirically) indicators that have linear dependencies:
  – Univariate normality
  – Multivariate normality
  – Bivariate linearity
• The essence of Bayesian inference is in the rule, known as Bayes' theorem, that tells us how to update our initial probabilities P(H) if we see evidence E, in order to find out P(H|E):

P(H|E) = P(E|H) · P(H) / P(E)

• A priori probability
• Conditional probability
• A posteriori probability
Bayes' Theorem
Why does it matter? If 1% of a population have cancer, then for a screening test with 80% sensitivity and 95% specificity:

P[Test +ve | Cancer] = 80%
P[Test +ve] = 0.01 × 0.80 + 0.99 × 0.05 = 5.75%
P[Cancer | Test +ve] = (0.80 × 0.01) / 0.0575 ≈ 14%

... i.e. most positive results are actually false alarms.

Mixing up P[A | B] with P[B | A] is the Prosecutor's Fallacy; a small probability of evidence given innocence need NOT mean a small probability of innocence given evidence.
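
The same arithmetic in a few lines of Python (values taken from the example above):

p_cancer = 0.01         # prevalence, P[Cancer]
sensitivity = 0.80      # P[Test +ve | Cancer]
specificity = 0.95      # P[Test -ve | No cancer]

p_pos = p_cancer * sensitivity + (1 - p_cancer) * (1 - specificity)
p_cancer_given_pos = sensitivity * p_cancer / p_pos

print(p_pos)               # 0.0575
print(p_cancer_given_pos)  # about 0.139, i.e. roughly 14%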
What is a Bayesian Network?
A Bayesian network (BN) is a graphical model for depicting probabilistic relationships
among a set of variables.
• A BN encodes the conditional independence relationships between the variables in the graph structure.
• It provides a compact representation of the joint probability distribution over the variables.
• A problem domain is modeled by a list of variables X1, …, Xn.
• Knowledge about the problem domain is represented by a joint probability P(X1, …, Xn).
• Directed links represent direct causal influences.
• Each node has a conditional probability table quantifying the effects from its parents.
• There are no directed cycles.

A Bayesian network consists of:
• A directed acyclic graph (DAG)
• A set of conditional probability tables, one for each node in the graph
So BN = (DAG, CPD)

• DAG: directed acyclic graph (the BN's structure)
  – Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
  – Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
• CPD: conditional probability distribution (the BN's parameters)
  – Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT)

So, what is a DAG?

Directed acyclic graphs use only unidirectional arrows to show the direction of causation, and follow general graph principles: a node A is a parent of another node B if there is an arrow from node A to node B.

Informally, an arrow from node X to node Y means X has a direct influence on Y.

Each node in the graph represents a random variable.
What is Inference in BN?
— Using a Bayesian network to compute probabilities is called inference
— In general, inference involves queries of the form:

P( X | E )
where X is the query variable and E is the evidence variable.
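
As a minimal illustration (a sketch, not from the notes), inference by enumeration in a two-node network Cancer → Test, using the CPTs from the screening example earlier; P(X | E) is obtained by summing the joint probability over the relevant assignments and normalising by the evidence:

# Prior CPT for Cancer and conditional CPT for Test given Cancer
p_cancer = {True: 0.01, False: 0.99}
p_pos_given_cancer = {True: 0.80, False: 0.05}

def joint(cancer, test_pos):
    p_test = p_pos_given_cancer[cancer] if test_pos else 1 - p_pos_given_cancer[cancer]
    return p_cancer[cancer] * p_test

# Query P(Cancer = true | Test = +ve)
p_evidence = joint(True, True) + joint(False, True)   # P(Test = +ve)
print(joint(True, True) / p_evidence)                 # about 0.139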

Limitations of Bayesian Networks

• Typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role
• Significant computational cost (exact inference is an NP-hard task)
• The probability of an unanticipated event is not accounted for

Representing causality in Bayesian Networks

— A causal Bayesian network, or simply a causal network, is a Bayesian network whose arcs are interpreted as indicating cause-effect relationships.
— To build a causal network:
  • Choose a set of variables that describes the domain
  • Draw an arc to a variable from each of its direct causes (domain knowledge required)
Summary

• Bayesian methods provide a sound theory and framework for implementing classifiers.
• Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables.
• Computing exact values is NP-complete or NP-hard; it is typical to make simplifying assumptions or use approximate methods.
• Many Bayesian tools and systems exist.
• Bayesian networks are an efficient and effective representation of the joint probability distribution of a set of random variables.
  – Efficient: local models; independence (d-separation)
  – Effective: algorithms take advantage of the structure to compute posterior probabilities, compute the most probable instantiation, and support decision making.

Introduction to SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms used for both classification and regression, though they are generally used for classification problems. SVMs were first introduced in the 1960s and later refined in the 1990s. SVMs have a unique way of implementation compared to other machine learning algorithms. Lately, they have become extremely popular because of their ability to handle multiple continuous and categorical variables.

Working of SVM
An SVM model is basically a representation of different classes separated by a hyperplane in a multidimensional space. The hyperplane is generated iteratively by SVM so that the error is minimized. The goal of SVM is to divide the datasets into classes by finding a maximum marginal hyperplane (MMH).

The following are important concepts in SVM:
• Support vectors — the data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
• Hyperplane — a decision plane or boundary that separates a set of objects having different classes.
• Margin — the gap between two lines drawn through the closest data points of different classes. It is calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.

The main goal of SVM is to divide the datasets into classes by finding a maximum marginal hyperplane (MMH), which is done in the following two steps (see the sketch below):
• First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
• Then, it chooses the hyperplane that separates the classes correctly.
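
A hedged scikit-learn sketch (library usage assumed, toy data made up) showing these steps: fit a linear maximum-margin classifier and read off the support vectors that define the separating hyperplane.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])  # two clusters
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)    # the data points closest to the hyperplane
print(clf.predict([[3, 3]]))   # classify a new point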

SVM Kernels
In practice, the SVM algorithm is implemented with a kernel that transforms an input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In simple words, the kernel converts non-separable problems into separable problems by adding more dimensions. This makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used by SVM.
Linear Kernel

• It can be used as a dot product between any two observations. The formula of the linear kernel is:

K(x, xi) = sum(x * xi)

• From the formula, we can see that the product between the two vectors x and xi is the sum of the multiplication of each pair of input values.

Polynomial Kernel

• It is a more generalized form of the linear kernel and can distinguish curved or nonlinear input spaces. The formula for the polynomial kernel is:

K(x, xi) = (1 + sum(x * xi))^d

• Here d is the degree of the polynomial, which we need to specify manually in the learning algorithm.

Radial Basis Function (RBF) Kernel

• The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. The following formula expresses it mathematically:

K(x, xi) = exp(−gamma * ||x − xi||²)

• Here, gamma ranges from 0 to 1 and must be specified manually in the learning algorithm. A good default value of gamma is 0.1.
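
The three kernel formulas written out directly in NumPy (an illustrative sketch; d and gamma are user-chosen hyperparameters):

import numpy as np

def linear_kernel(x, xi):
    return np.dot(x, xi)                            # sum(x * xi)

def polynomial_kernel(x, xi, d=3):
    return (1 + np.dot(x, xi)) ** d                 # (1 + sum(x * xi))^d

def rbf_kernel(x, xi, gamma=0.1):
    return np.exp(-gamma * np.sum((x - xi) ** 2))   # exp(-gamma * ||x - xi||^2)

x, xi = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x, xi), polynomial_kernel(x, xi), rbf_kernel(x, xi))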
Time Series Analysis
• Aim: to collect and analyze past observations in order to develop an appropriate model which can then be used to generate future values for the series.
• Time series forecasting is based on the idea that the history of occurrences over time can be used to predict the future.
Application
• Business
• Economics
• Finance
• Science and Engineering
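
As a minimal example of using past observations to generate a future value (data made up for illustration), a k-period simple moving average taken as the one-step-ahead forecast:

import numpy as np

sales = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])  # past series
k = 3
forecast = sales[-k:].mean()    # average of the last k observations
print(round(forecast, 1))       # one-step-ahead forecast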

An overview of nonlinear dynamics: Fundamental concepts


• A system may be defined as an orderly working totality, a set of units combined by nature, by science, or by art to form a whole.
• A system is not just a set of elements but also includes interactions, both between the system's elements and with the 'external world'.
• Interactions may be static or dynamic, i.e. through an exchange of mass, energy, or electric charge, or through an exchange of information.
• A living organism is an open system, supplied with free energy from biochemical reactions. There are also effects of information interactions.
• In physics, the state of a system at a given moment of time is characterized by the values of its state variables (at that moment).
• The minimum number of independent state variables necessary to characterize the system's state is called the number of degrees of freedom of the system. If a system has n degrees of freedom, then any state of the system may be characterized by a point in an n-dimensional space with appropriately defined coordinates, called the system's phase space.
Fundamental concepts and definitions

• A process is defined as a series of gradual changes in a system that succeed one another. Every process exhibits a characteristic time, τ, that defines the time scale for this process. In the system's phase space a process is represented by a series of connected points called a trajectory.
• An attractor is a subset of the system's phase space that attracts trajectories (i.e. the system tends towards the states that belong to some attractor).
• A signal is a detectable physical quantity or impulse (such as a voltage, current, or magnetic field strength) by which information can be transmitted from a given system to other systems, e.g. to a measuring device (EEG, ECG, EMG).
• Noise is any unwanted signal that interferes with the desired signal.
Nonlinear vs linear

• Linearity in science means more or less the same as proportionality or additivity. But linearity has its limits (nonlinearity means nonadditivity).
• Reductionism, the methodological attitude of explaining properties of a system through properties of its elements alone, may work only for linear systems.
• Some systems have properties that depend more on the way the elements are connected than on the specific properties of the individual elements.
• Far from equilibrium vs equilibrium: thermodynamic equilibrium means a complete lack of differences between different parts of the system and, as a consequence, a complete lack of changes in the system; all processes are stopped. 'Living' states of any system are nonequilibrium states.
• Equilibrium, the unique state when all properties are equally distributed, is the state of 'death'. This is true not just for a single cell or an organism. In systems close to equilibrium one can observe linear processes, while in systems far from equilibrium processes are nonlinear. Life appears to be a nonlinear phenomenon.

RULE INDUCTION

• Rule induction is one of the most important techniques of machine learning. Since regularities hidden in data are frequently expressed in terms of rules, rule induction is at the same time one of the fundamental tools of data mining. Usually rules are expressions of the form

if (attribute-1, value-1) and (attribute-2, value-2) and ··· and (attribute-n, value-n) then (decision, value).

• Some rule induction systems induce more complex rules, in which values of attributes may be expressed by the negation of some values or by a value subset of the attribute domain.
• Data from which rules are induced are usually presented in a form similar to a table in which cases (or examples) are labels (or names) for rows, and variables are labeled as attributes and a decision. We will restrict our attention to rule induction that belongs to supervised learning: all cases are preclassified by an expert. In other words, the decision value is assigned by an expert to each case. Attributes are independent variables and the decision is a dependent variable.
• A very simple example of such a table is presented as Table 1.1, in which the attributes are Temperature, Headache, Weakness, and Nausea, and the decision is Flu. The set of all cases labeled by the same decision value is called a concept. For Table 1.1, the case set {1, 2, 4, 5} is the concept of all cases affected by flu (for each case from this set the corresponding value of Flu is yes).
the corresponding value of Flu is yes).

Table 1.1 (attributes: Temperature, Headache, Weakness, Nausea; decision: Flu)

Case | Temperature | Headache | Weakness | Nausea | Flu
  1  |    41.6     |   yes    |   yes    |   no   | yes
  2  |    39.8     |   yes    |   no     |   yes  | yes
  3  |    36.8     |   no     |   no     |   no   | no
  4  |    37.0     |   yes    |   yes    |   yes  | yes
  5  |    38.8     |   no     |   yes    |   no   | yes
  6  |    40.2     |   no     |   no     |   no   | no
  7  |    36.6     |   no     |   yes    |   no   | no
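
A small Python sketch (not part of the notes) that encodes Table 1.1 and tests a candidate rule of the form shown above, e.g. "if (Headache, yes) and (Weakness, yes) then (Flu, yes)":

cases = [
    {"Case": 1, "Temperature": 41.6, "Headache": "yes", "Weakness": "yes", "Nausea": "no",  "Flu": "yes"},
    {"Case": 2, "Temperature": 39.8, "Headache": "yes", "Weakness": "no",  "Nausea": "yes", "Flu": "yes"},
    {"Case": 3, "Temperature": 36.8, "Headache": "no",  "Weakness": "no",  "Nausea": "no",  "Flu": "no"},
    {"Case": 4, "Temperature": 37.0, "Headache": "yes", "Weakness": "yes", "Nausea": "yes", "Flu": "yes"},
    {"Case": 5, "Temperature": 38.8, "Headache": "no",  "Weakness": "yes", "Nausea": "no",  "Flu": "yes"},
    {"Case": 6, "Temperature": 40.2, "Headache": "no",  "Weakness": "no",  "Nausea": "no",  "Flu": "no"},
    {"Case": 7, "Temperature": 36.6, "Headache": "no",  "Weakness": "yes", "Nausea": "no",  "Flu": "no"},
]

conditions = {"Headache": "yes", "Weakness": "yes"}
matched = [c for c in cases if all(c[a] == v for a, v in conditions.items())]
print([c["Case"] for c in matched])              # cases covered: [1, 4]
print(all(c["Flu"] == "yes" for c in matched))   # True: the rule is consistent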

What are Neural Networks?

• Models of the brain and nervous system
• Highly parallel
  o Process information much more like the brain than a serial computer
• Learning
  o Very simple principles
  o Very complex behaviours
• Applications
  o As powerful problem solvers
  o As biological models

A neural network is a method of computing based on the interaction of multiple connected processing elements:
• A powerful technique for solving many real-world problems
• The ability to learn from experience in order to improve performance
• The ability to deal with incomplete information

Basics Of Neural Network


• A biological approach to AI
• First developed in 1943
• Comprised of one or more layers of neurons
• Several types exist; we'll focus on feed-forward and feedback networks

Types of Neural Networks

Neural network types can be classified based on the following attributes (a minimal feed-forward sketch in code follows this list):
• Connection type
  - Static (feedforward)
  - Dynamic (feedback)
• Topology
  - Single layer
  - Multilayer
  - Recurrent
• Learning methods
  - Supervised
  - Unsupervised
  - Reinforcement
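
To make the feedforward (static) type concrete, here is a minimal NumPy sketch of a one-hidden-layer network with random weights, forward pass only; training (e.g. backpropagation) is omitted:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 4 hidden -> 1 output neuron

def forward(x):
    h = sigmoid(W1 @ x + b1)       # hidden-layer activations
    return sigmoid(W2 @ h + b2)    # network output in (0, 1)

print(forward(np.array([0.5, -1.0, 2.0])))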
Neural Network Applications
• Pattern recognition
• Investment analysis
• Control systems and monitoring
• Mobile computing
• Marketing and financial applications
• Forecasting: sales, market research, meteorology
Advantages:
• A neural network can perform tasks that a linear program cannot.
• When an element of the neural network fails, the network can continue without any problem because of its parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be implemented in any application without any problem.
Disadvantages:
• The neural network needs training to operate.
• The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
• Large neural networks require high processing time.
Conclusions
• Neural networks provide the ability to build more human-like AI.
• They take rough approximation and hard-coded reactions out of AI design (i.e. rules and FSMs).
• They still require a lot of fine-tuning during development.

Principal Component Analysis

• The PCA method is a statistical method for feature selection and dimensionality reduction.
• Feature selection is a process whereby a data space is transformed into a feature space. In principle, both spaces have the same dimensionality.
• However, in the PCA method the transformation is designed in such a way that the data set can be represented by a reduced number of 'effective' features and yet retain most of the intrinsic information contained in the data; in other words, the data set undergoes a dimensionality reduction.
• Suppose that we have a vector x of dimension m and we wish to transmit it using l numbers, where l < m. If we simply truncate the vector x, we will cause a mean-square error equal to the sum of the variances of the elements eliminated from x.
• So we ask: does there exist an invertible linear transformation T such that the truncation of Tx is optimum in the mean-squared sense?
• Clearly, the transformation T should have the property that some of its components have low variance.
• Principal component analysis maximizes the rate of decrease of variance and is the right choice.
• Before we present neural network (Hebbian-based) algorithms that do this, we first present the statistical analysis of the problem.

• Let X be an m-dimensional random vector representing the environment of interest. We assume that the vector X has zero mean:

E[X] = 0

where E is the statistical expectation operator. If X does not have zero mean, we first subtract the mean from X before proceeding with the rest of the analysis.

• Let q denote a unit vector, also of dimension m, onto which the vector X is to be projected. This projection is defined by the inner product of the vectors X and q:

A = Xᵀq = qᵀX

subject to the constraint

||q|| = (qᵀq)^(1/2) = 1

• The projection A is a random variable with a mean and variance related to the statistics of the vector X. Assuming that X has zero mean, we can calculate the mean value of the projection A:

E[A] = qᵀE[X] = 0

• The variance of A is therefore the same as its mean-square value, so we can write:

σ² = E[A²] = E[(qᵀX)(Xᵀq)] = qᵀE[XXᵀ]q = qᵀRq

• The m-by-m matrix R is the correlation matrix of the random vector X, formally defined as the expectation of the outer product of the vector X with itself, as shown:

R = E[XXᵀ]
• We observe that the matrix R is symmetric, which means that Rᵀ = R.
• Let q1, q2, …, qm denote the orthonormal eigenvectors of R. The projections of x onto these eigenvectors, aj = xᵀqj, may be collected into a single vector:

a = [a1, a2, …, am]ᵀ = [xᵀq1, xᵀq2, …, xᵀqm]ᵀ = Qᵀx

where Q is the matrix constructed from the (column) eigenvectors of R.
• From the above we see that x = Qa. This is nothing more than a coordinate transformation from the input space of the vector x to the feature space of the vector a.
• From the perspective of pattern recognition, the usefulness of the PCA method is that it provides an effective technique for dimensionality reduction.
• In particular, we may reduce the number of features needed for effective data representation by discarding those linear combinations in the previous formula that have small variances and retaining only the terms that have large variances.
• Let λ1, λ2, …, λl denote the largest l eigenvalues of R. We may then approximate the vector x by the truncated expansion

x̂ = a1q1 + a2q2 + … + alql
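
The analysis above, sketched in NumPy on simulated data (assumed for illustration): estimate R, take its eigenvectors as the columns of Q, form a = Qᵀx, and reconstruct x from the top-l components.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # introduce correlation between features
X -= X.mean(axis=0)                       # enforce E[X] = 0

R = (X.T @ X) / len(X)                    # sample estimate of R = E[X X^T]
eigvals, Q = np.linalg.eigh(R)            # R is symmetric, so Q is orthogonal
order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
Q = Q[:, order]

l = 2                                     # keep the l largest-variance components
A = X @ Q[:, :l]                          # a = Q^T x for every sample
X_hat = A @ Q[:, :l].T                    # truncated reconstruction x-hat
print(np.mean((X - X_hat) ** 2))          # mean-square truncation error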

WHAT IS FUZZY LOGIC?

Definition of fuzzy
Fuzzy – “not clear, distinct, or precise; blurred”
Definition of fuzzy logic
A form of knowledge representation suitable for notions that
cannot be defined precisely, but which depend upon their
contexts.

FUZZY LOGIC IN CONTROL SYSTEMS


Fuzzy logic provides a more efficient and resourceful way to build control systems. Some examples:
• Temperature controller
• Anti-lock brake system (ABS)

TEMPERATURE CONTROLLER
• The problem: change the speed of a heater fan based on the room temperature and humidity.
• A temperature control system has four settings: Cold, Cool, Warm, and Hot.
• Humidity can be defined by: Low, Medium, and High.
• Using these we can define the fuzzy sets (a sketch of membership functions follows).
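
An illustrative sketch of such fuzzy sets in Python (the temperature breakpoints below are assumed, not from the notes), using triangular membership functions:

def triangular(x, a, b, c):
    # membership rises linearly from a to a peak at b, then falls to zero at c
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

sets = {"Cold": (-5, 5, 15), "Cool": (5, 15, 25), "Warm": (15, 25, 35), "Hot": (25, 35, 45)}
temp = 22.0
print({name: round(triangular(temp, *abc), 2) for name, abc in sets.items()})
# 22 degrees is partly Cool (0.3), mostly Warm (0.7), and not Cold or Hot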

BENEFITS OF USING FUZZY LOGIC

ANTI-LOCK BRAKE SYSTEM (ABS)
• Nonlinear and dynamic in nature
• Inputs for the Intel fuzzy ABS are derived from: brake, 4WD, feedback, wheel speed, ignition
• Outputs: pulse width, error lamp

Stochastic search
Stochastic search and optimization techniques are used in a vast number of areas,
including aerospace, medicine, transportation, and finance, to name but a
few. Whether the goal is refining the design of a missile or aircraft, determining the
effectiveness of a new drug, developing the most efficient timing strategies for
traffic signals, or making investment decisions in order to increase profits, stochastic
algorithms can help researchers and practitioners devise optimal solutions to
countless real-world problems.

Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control is a graduate-level introduction to the principles, algorithms, and practical aspects of stochastic optimization, including applications drawn from engineering, statistics, and computer science. The treatment is both rigorous and broadly accessible, distinguishing this text from much of the current literature and providing students, researchers, and practitioners with a strong foundation for the often-daunting task of solving real-world problems.

The most widely used stochastic algorithms include (a minimal sketch of one of them, simulated annealing, follows this list):
• Random search
• Recursive linear estimation
• Stochastic approximation
• Simulated annealing
• Genetic and evolutionary algorithms
• Machine (reinforcement) learning
• Model selection
• Simulation-based optimization
• Markov chain Monte Carlo
• Optimal experimental design
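
A minimal simulated-annealing sketch in Python (the test function and cooling schedule are assumed for illustration): downhill moves are always accepted, uphill moves with probability exp(-delta/T), and the temperature T is gradually lowered.

import math
import random

def f(x):
    return x**2 + 10 * math.sin(x)         # multimodal test function (assumed)

random.seed(0)
x, T = 8.0, 5.0                            # start far from the optimum, hot T
while T > 1e-3:
    candidate = x + random.gauss(0, 1)     # random neighbouring solution
    delta = f(candidate) - f(x)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = candidate                      # accept downhill, sometimes uphill
    T *= 0.995                             # geometric cooling schedule
print(x, f(x))                             # near the global minimum (x ≈ -1.3)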
