UNIT - 2
Que 1 : How do you calculate the chi-square (χ²) test? Explain with an example. x2
Que 2 : Discuss in detail the maximum likelihood estimate with an example. x2
Que 3 : Explain the different types of variables used in regression modelling. x1
Que 4 : What is multivariate analysis? Describe it in detail. x1
Que 5 : What is regression analysis? Explain simple & multiple regression. x1
Que 6 : What is Bayesian modelling? How does it work? Describe its advantages and disadvantages. x1
Chi Square Test :
The chi-square test is a statistical procedure used to determine whether there is a significant association between two categorical variables, or whether there is a significant difference between observed and expected frequencies in categorical data.
It is a non-parametric test, meaning it does not assume any particular distribution for the data.
Types:
1. Chi-Square Test of Independence: Tests whether two categorical variables are related or independent of each other (used with contingency tables).
2. Chi-Square Goodness-of-Fit Test: Tests whether the observed distribution of data matches an expected theoretical distribution.
Formula:
χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
where Oᵢ is the observed frequency and Eᵢ is the expected frequency in category i.
Assumptions:
1. Data must be categorical (nominal or ordinal).
2. Observations should be independent.
3. Categories of the variables must be mutually exclusive.
4. Expected frequency in each category should be at least 5.
5. Data should be randomly selected to minimize bias.
Steps of Chi Square Test :
1. State the Hypotheses
2. Construct a Contingency Table
3. Calculate Expected Frequencies
4. Calculate the Chi-Square Statistic
5. Determine Degrees of Freedom (df = (rows − 1) × (columns − 1) for a test of independence)
6. Compare the statistic to the critical value at the chosen significance level
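The steps above can be sketched in pure Python for a 2×2 contingency table. The observed counts below are made-up illustrative data; 3.841 is the standard χ² critical value for df = 1 at α = 0.05.

```python
# Chi-square test of independence on a 2x2 contingency table.
# Rows: group A / group B; columns: preference 1 / preference 2 (illustrative data).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]          # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]    # [50, 50]
grand_total = sum(row_totals)                        # 100

# Expected frequency for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

critical = 3.841  # chi-square critical value for df = 1, alpha = 0.05
print(f"chi2 = {chi2:.3f}, df = {df}")
print("Reject H0: variables are associated" if chi2 > critical
      else "Fail to reject H0")
```

Here χ² = 4.0 > 3.841, so the null hypothesis of independence is rejected at the 5% level.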
Correlation analysis :
Correlation analysis is a statistical method used to measure and
evaluate the strength and direction of the relationship between two or
more variables.
1. It helps identify whether changes in one variable are associated
with changes in another and quantifies the degree of this
association.
2. Used to discover if there is a relationship between variables and
how strong that relationship may be.
3. Commonly applied in market research, social sciences, and data analysis to identify patterns, trends, and significant connections.
Correlation Coefficient:
The correlation coefficient (often denoted as r) quantifies the strength
and direction of the linear relationship between two variables.
It ranges from −1 to +1:
+1: Perfect positive correlation (both variables increase together).
-1: Perfect negative correlation (one variable increases as the other
decreases).
0: No linear correlation.
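As a sketch, r can be computed directly from its definition (covariance divided by the product of the standard deviations); the study-hours and score values below are made up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: cov(x, y) / (std(x) * std(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

hours_studied = [1, 2, 3, 4, 5]        # illustrative data
exam_scores   = [52, 58, 63, 70, 77]   # rises with hours -> strong positive r

r = pearson_r(hours_studied, exam_scores)
print(f"r = {r:.3f}")  # close to +1: strong positive correlation
```

Because scores rise almost linearly with hours, r comes out very close to +1.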
Types of Correlation :
Positive Correlation: Both variables move in the same direction (as one
increases, so does the other).
Negative Correlation: Variables move in opposite directions (as one
increases, the other decreases).
No Correlation: No relationship between variables.
[Diagram: scatter plots of positive, negative, and no correlation]
Application :
1. Business analytics
2. Medical Research
3. Weather forecast
4. Scientific research
Maximum Likelihood Estimation (MLE) :
1. Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on observed data.
2. The fundamental idea behind MLE is to find the parameter values that maximize the likelihood of the observed data.
Likelihood Function: The likelihood function is defined as the joint
probability of the observed data given the parameters. It is denoted
as L(θ; x), where θ represents the parameters and x represents the
observed data.
Steps to Perform MLE
1. Assume a probability distribution for your data (e.g., normal,
binomial).
2. Write the likelihood function using the assumed distribution
and the sample.
3. Take the log of the likelihood function (log-likelihood).
4. Differentiate the log-likelihood with respect to the
parameter(s).
5. Set derivative = 0 and solve to find the parameter(s) that
maximize the function.
Applications of MLE
Estimating parameters in machine learning models (e.g., logistic
regression).
Used in Bayesian analysis (the likelihood function is combined with a prior).
Widely used in probability modeling and hypothesis testing.
Example: MLE for Estimating the Mean of a Normal Distribution
1. Sample data: x = [2, 3, 4]
2. Assume: the data come from a normal distribution N(μ, σ²).
3. Assume the variance σ² = 1 is known.
4. Goal: estimate the mean μ using MLE.
Working through the steps above, the log-likelihood is maximized when μ̂ equals the sample mean: μ̂ = (2 + 3 + 4) / 3 = 3.
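The worked example can be checked numerically: for N(μ, 1) the log-likelihood is maximized at the sample mean. A minimal sketch, using a grid search only to make the maximization visible (analytically the answer is μ̂ = x̄):

```python
import math

data = [2, 3, 4]   # sample from the example
sigma2 = 1.0       # variance assumed known

def log_likelihood(mu):
    """Log-likelihood of the sample under N(mu, sigma2)."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

# Grid search over candidate means in [1.00, 5.00] to locate the maximum.
candidates = [i / 100 for i in range(100, 501)]
mle_mu = max(candidates, key=log_likelihood)

sample_mean = sum(data) / len(data)
print(f"MLE of mu = {mle_mu}, sample mean = {sample_mean}")  # both 3.0
```

The grid maximum lands exactly on the sample mean, matching the closed-form result from setting the derivative of the log-likelihood to zero.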
Multivariate Analysis :
1. Multivariate analysis refers to statistical techniques that
simultaneously examine three or more variables to understand
the relationships and patterns between them.
2. It is generally performed to uncover patterns, correlations, and
dependencies among multiple variables.
3. Unlike univariate (one variable) or bivariate (two variables)
analysis, multivariate analysis provides a more comprehensive
view by considering multiple variables simultaneously.
Assumptions in MVA
Normality: Data should follow a normal distribution.
Linearity: Relationship among variables should be linear.
Homogeneity of variance: Variances across groups should be
equal.
No multicollinearity: Independent variables should not be
highly correlated.
Advantages :
a. Helps in dimensionality reduction and model performance.
b. More efficient than univariate and bivariate analysis.
c. Reveals hidden patterns and relationships.
Technique | Purpose | Example Use
Multiple Linear Regression (MLR) | Predict a continuous dependent variable based on several independent variables. | Predict house price based on area, location, and number of rooms.
Multiple Logistic Regression | Predicts a yes/no outcome using several variables. | Predicting if a student passes based on hours studied, attendance, and grades.
Multivariate Analysis of Variance (MANOVA) | Compare group means on multiple dependent variables simultaneously. | Evaluate the effect of teaching method on student scores in math and science.
Factor Analysis (FA) | Reduces many variables into a few underlying factors. | Combining diet, exercise, and sleep into a "health" factor.
Principal Component Analysis (PCA) | Transforms many variables into a few uncorrelated components. | Reducing height, weight, and age into a "body size" component.
Cluster Analysis | Groups similar observations together based on several variables. | Segmenting customers by spending, age, and purchase frequency.
Discriminant Analysis (DA) | Classify observations into groups based on predictor variables. | Classify loan applicants as risky or safe.
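As a sketch of one technique from the table, here is PCA on two made-up correlated variables (height and weight). The data are standardized, their 2×2 covariance matrix is built, and its eigenvalues are solved analytically (avoiding external libraries); the larger eigenvalue's share of the total is the variance explained by the first component.

```python
import math

# Illustrative data: two correlated variables (height in cm, weight in kg).
x = [150, 160, 170, 180, 190]
y = [50, 58, 65, 74, 80]

def standardize(v):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / (n - 1))
    return [(a - mean) / sd for a in v]

xs, ys = standardize(x), standardize(y)
n = len(xs)

# 2x2 covariance matrix of standardized data (= correlation matrix).
sxx = sum(a * a for a in xs) / (n - 1)
syy = sum(b * b for b in ys) / (n - 1)
sxy = sum(a * b for a, b in zip(xs, ys)) / (n - 1)

# Eigenvalues of [[sxx, sxy], [sxy, syy]], solved analytically.
mean_diag = (sxx + syy) / 2
delta = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
lam1, lam2 = mean_diag + delta, mean_diag - delta  # lam1 >= lam2

explained = lam1 / (lam1 + lam2)
print(f"first component explains {explained:.1%} of the variance")
```

Because height and weight here are almost perfectly correlated, one "body size" component captures nearly all of the variance, which is exactly the dimensionality reduction the table describes.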
Regression Analysis / Modelling :
Regression analysis is a statistical method used to examine the relationship between a dependent variable and one or more independent variables (predictors).
Regression modelling is a statistical technique used to estimate and model the relationships between a dependent (outcome) variable and one or more independent (predictor) variables.
Purpose:
1. To predict the value of a dependent variable based on
independent variables.
2. To understand the strength and direction of relationships
between dependent & independent variables.
Component | Meaning
Dependent Variable (Y) | The outcome variable being predicted.
Independent Variable (X) | The predictor(s) used to predict the dependent variable (Y).
Intercept (β₀) | The value of the dependent variable when all independent variables are zero.
Slope (β₁, β₂, ...) | Amount Y changes for a one-unit change in X.
Error Term (ε) | Difference between observed and predicted values (residuals) of the dependent variable.
These components combine in the model Y = β₀ + β₁X₁ + β₂X₂ + … + ε.
Assumptions of Linear Regression:
1. Linearity – Relationship between X and Y is linear.
2. Independence – Observations are independent of each other.
3. Homoscedasticity – Constant variance of errors.
4. Normality – Residuals (observed minus predicted values) are normally distributed.
5. No Multicollinearity – Predictors (independent variables) are not highly correlated with each other.
Type | Description
Simple Linear Regression | One independent variable predicts the dependent variable; the relationship is modeled with a straight line.
Multiple Linear Regression | More than one independent variable predicts the dependent variable.
Logistic Regression | Used when the dependent variable is categorical (usually binary); predicts a categorical (yes/no or 0/1) outcome.
Polynomial Regression | Models a nonlinear relationship between X and Y using polynomial terms.
Ridge/Lasso Regression | Regularized regression to prevent overfitting in high-dimensional data; used in machine learning.
Applications:
Predicting prices (e.g., real estate, stock).
Medical studies (e.g., effect of lifestyle on health).
Economics (e.g., impact of interest rates on GDP).
Business forecasting (e.g., sales prediction).