Dva 2

The document covers various statistical methods including the chi-square test, maximum likelihood estimation, multivariate analysis, and regression analysis. It explains how to calculate the chi-square test, the principles of maximum likelihood estimation, and the types of regression models such as simple and multiple regression. Additionally, it discusses the assumptions and applications of these statistical techniques in various fields.


UNIT - 2

Que 1: How do you calculate the chi-square test? Explain with an example. (x2)

Que 2: Discuss in detail the maximum likelihood estimate with an example. (x2)

Que 3: Explain the different types of variables used in regression modelling. (x1)

Que 4: What is multivariate analysis? Describe in detail. (x1)

Que 5: What is regression analysis? Explain simple and multiple regression. (x1)

Que 6: What is Bayesian modelling? How does it work? Describe its advantages and disadvantages. (x1)
Chi-Square Test:
The chi-square test is a statistical procedure used to determine whether there is a significant association between two categorical variables, or whether there is a significant difference between observed and expected frequencies in categorical data.

It is a non-parametric test, meaning it does not assume any particular distribution for the data.

Types:

1. Chi-Square Test of Independence: Tests whether two categorical variables are related or independent of each other (used with contingency tables).
2. Chi-Square Goodness-of-Fit Test: Tests whether the observed distribution of data matches an expected theoretical distribution.

Formula:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ

where Oᵢ is the observed frequency and Eᵢ is the expected frequency in category i.

Assumptions:

1. Data must be categorical (nominal or ordinal).
2. Observations should be independent.
3. Categories of the variables must be mutually exclusive.
4. Expected frequency in each category should be at least 5.
5. Data should be randomly selected to minimize bias.

Steps of the Chi-Square Test:

1. State the hypotheses.
2. Construct a contingency table.
3. Calculate expected frequencies.
4. Calculate the chi-square statistic.
5. Determine the degrees of freedom.
6. Compare the statistic to the critical value and draw a conclusion.
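As an illustrative sketch (not from the notes), the steps above can be run on a small, hypothetical 2×2 contingency table in pure Python:

```python
# Chi-square test of independence, computed directly from
# chi2 = sum((O - E)^2 / E) with E = (row total * column total) / grand total.

def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows, cols = len(observed), len(observed[0])
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[r][c] for r in range(rows)) for c in range(cols)]

    chi2 = 0.0
    for r in range(rows):
        for c in range(cols):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed[r][c] - expected) ** 2 / expected
    return chi2, (rows - 1) * (cols - 1)

# Hypothetical data: preference (yes/no) by group (A/B)
table = [[20, 30],
         [30, 20]]
stat, df = chi_square(table)
print(stat, df)  # 4.0 with df = 1
```

Here every expected count is 25, so the statistic is 4.0 with one degree of freedom; since 4.0 exceeds the 5% critical value of 3.841, the null hypothesis of independence would be rejected.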

Correlation analysis:
Correlation analysis is a statistical method used to measure and evaluate the strength and direction of the relationship between two or more variables.

1. It helps identify whether changes in one variable are associated with changes in another and quantifies the degree of this association.
2. Used to discover whether there is a relationship between variables and how strong that relationship may be.
3. Commonly applied in market research, social sciences, and data analysis to identify patterns, trends, and significant connections.
Correlation Coefficient:

The correlation coefficient (often denoted r) quantifies the strength and direction of the linear relationship between two variables. It ranges from −1 to +1:

+1: Perfect positive correlation (both variables increase together).
−1: Perfect negative correlation (one variable increases as the other decreases).
0: No linear correlation.
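As a hedged sketch (not part of the notes), Pearson's r can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), using made-up sample data:

```python
# Pearson correlation coefficient from first principles.

def pearson_r(x, y):
    """Return Pearson's r for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))                     # covariance numerator
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return num / den

# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))  # ~0.7746: a fairly strong positive correlation
```

For data lying exactly on a rising line (e.g. y = 2x), the same function returns 1.0, matching the "perfect positive correlation" case above.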
Types of Correlation:

Positive Correlation: Both variables move in the same direction (as one increases, so does the other).

Negative Correlation: Variables move in opposite directions (as one increases, the other decreases).

No Correlation: No relationship between the variables.

(Diagram: scatter plots illustrating all three types.)

Applications:

1. Business analytics
2. Medical research
3. Weather forecasting
4. Scientific research
Maximum Likelihood Estimation (MLE):
1. Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on observed data.
2. The fundamental idea behind MLE is to find the parameter values that maximize the likelihood of the observed data.

Likelihood Function: The likelihood function is defined as the joint probability of the observed data given the parameters. It is denoted L(θ; x), where θ represents the parameters and x represents the observed data.

Steps to Perform MLE:

1. Assume a probability distribution for your data (e.g., normal, binomial).
2. Write the likelihood function using the assumed distribution and the sample.
3. Take the log of the likelihood function (log-likelihood).
4. Differentiate the log-likelihood with respect to the parameter(s).
5. Set the derivative equal to 0 and solve for the parameter(s) that maximize the function.
Applications of MLE:

 Estimating parameters in machine learning models (e.g., logistic regression).
 Used in Bayesian analysis (the likelihood is combined with the prior).
 Widely used in probability modelling and hypothesis testing.

Example: MLE for Estimating the Mean of a Normal Distribution

1. Sample data: x = [2, 3, 4].
2. Assume: the data come from a normal distribution N(μ, σ²).
3. Assume the variance σ² = 1 is known.
4. Goal: estimate the mean μ using MLE.

Solution: the log-likelihood is ℓ(μ) = −(1/2) Σ (xᵢ − μ)² + constant. Setting dℓ/dμ = Σ (xᵢ − μ) = 0 gives μ̂ = (1/n) Σ xᵢ = (2 + 3 + 4)/3 = 3; the MLE of the mean is the sample mean.
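A minimal numeric check of this example (a sketch, not from the notes): compute the sample-mean estimate and confirm the log-likelihood is highest there.

```python
import math

# MLE for the mean of a normal distribution with known variance sigma^2 = 1.

def log_likelihood(mu, data, sigma2=1.0):
    """Log-likelihood of a normal sample with known variance."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

data = [2, 3, 4]
mu_hat = sum(data) / len(data)  # closed-form MLE: the sample mean
print(mu_hat)  # 3.0

# The log-likelihood at mu_hat beats nearby candidate values of mu.
assert log_likelihood(mu_hat, data) > log_likelihood(2.9, data)
assert log_likelihood(mu_hat, data) > log_likelihood(3.1, data)
```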
Multivariate Analysis :
1. Multivariate analysis refers to statistical techniques that
simultaneously examine three or more variables to understand
the relationships and patterns between them.
2. It is generally performed to uncover patterns, correlations, and
dependencies among multiple variables.
3. Unlike univariate (one-variable) or bivariate (two-variable) analysis, multivariate analysis provides a more comprehensive view by considering multiple variables simultaneously.

Assumptions in MVA:

 Normality: Data should follow a normal distribution.
 Linearity: Relationships among variables should be linear.
 Homogeneity of variance: Variances across groups should be equal.
 No multicollinearity: Independent variables should not be highly correlated.

Advantages:

a. Helps in dimensionality reduction and model performance.
b. More efficient than univariate and bivariate analysis.
c. Reveals hidden patterns and relationships.
Techniques (Technique — Purpose — Example Use):

Multiple Linear Regression (MLR): Predicts a continuous dependent variable based on several independent variables. Example: predict house price based on area, location, and number of rooms.

Multiple Logistic Regression: Predicts a yes/no outcome using several variables. Example: predicting whether a student passes based on hours studied, attendance, and grades.

Multivariate Analysis of Variance (MANOVA): Compares group means on multiple dependent variables simultaneously. Example: evaluate the effect of teaching method on student scores in math and science.

Factor Analysis (FA): Reduces many variables into a few underlying factors. Example: combining diet, exercise, and sleep into a "health" factor.

Principal Component Analysis (PCA): Transforms many variables into a few uncorrelated components. Example: reducing height, weight, and age into a "body size" component.

Cluster Analysis: Groups similar observations together based on several variables. Example: segmenting customers by spending, age, and purchase frequency.

Discriminant Analysis (DA): Classifies observations into groups based on predictor variables. Example: classify loan applicants as risky or safe.
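To make one of these techniques concrete, here is a hedged pure-Python sketch of PCA on two variables, using the closed-form eigendecomposition of a 2×2 covariance matrix on made-up data (illustrative only; real work would use a linear-algebra library):

```python
import math

# PCA for two variables: find the direction of maximum variance
# from the 2x2 sample covariance matrix [[a, b], [b, c]].

def pca_2d(x, y):
    """Return (largest eigenvalue, smallest eigenvalue, first principal direction)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    a = sum(d * d for d in dx) / (n - 1)               # variance of x
    c = sum(d * d for d in dy) / (n - 1)               # variance of y
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)   # covariance of x, y
    # Eigenvalues of a symmetric 2x2 matrix (closed form)
    mean_eig = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mean_eig + spread, mean_eig - spread
    # Eigenvector for lam1: (b, lam1 - a); axis-aligned fallback when b == 0
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam1, lam2, (vx / norm, vy / norm)

# Hypothetical perfectly collinear data: one component captures all the variance.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
lam1, lam2, direction = pca_2d(x, y)
print(lam1 / (lam1 + lam2))  # ~1.0: the first component explains ~100% of variance
```

The principal direction comes out proportional to (1, 2), i.e. along the line y = 2x, which is exactly the "uncorrelated component" PCA would report for this data.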
Regression Analysis / Modelling:
Regression analysis is a statistical method used to examine the relationship between a dependent variable and one or more independent (predictor) variables.

Regression modelling is a statistical technique used to estimate and model the relationships between a dependent (outcome) variable and one or more independent (predictor) variables.

Purpose:

1. To predict the value of the dependent variable based on the independent variables.
2. To understand the strength and direction of the relationships between the dependent and independent variables.

Components (Component — Meaning), as they appear in the linear model Y = β₀ + β₁X₁ + ... + βₖXₖ + ε:

Dependent Variable (Y): The outcome variable being predicted.
Independent Variable (X): The predictor(s) used to predict the dependent variable (Y).
Intercept (β₀): The value of the dependent variable when all independent variables are zero.
Slope (β₁, β₂, ...): The amount Y changes for a one-unit change in X.
Error Term (ε): The difference between observed and predicted values (residuals) of the dependent variable.
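The intercept and slope above can be estimated by ordinary least squares. A small sketch with made-up data, using β̂₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β̂₀ = ȳ − β̂₁x̄:

```python
# Simple linear regression (one predictor) fitted by ordinary least squares.

def fit_simple_regression(x, y):
    """Return (intercept b0, slope b1) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)  # 1.0 2.0

# Error term per observation: here every residual is 0 (a perfect fit)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```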
Assumptions of Linear Regression:

1. Linearity – The relationship between X and Y is linear.
2. Independence – Observations are independent of each other.
3. Homoscedasticity – Errors have constant variance.
4. Normality – Residuals (differences between observed and predicted values) are normally distributed.
5. No Multicollinearity – Predictors (independent variables) are not highly correlated with each other.

Types (Type — Description):

Simple Linear Regression: One independent variable predicts the dependent variable; the relationship is modeled with a straight line.
Multiple Linear Regression: More than one independent variable predicts the dependent variable.
Logistic Regression: Used when the dependent variable is categorical (usually binary); predicts a categorical (yes/no or 0/1) outcome.
Polynomial Regression: Models a nonlinear relationship between X and Y using polynomial terms.
Ridge/Lasso Regression: Regularized regression to prevent overfitting in high-dimensional data; widely used in machine learning.

Applications:

 Predicting prices (e.g., real estate, stock).
 Medical studies (e.g., effect of lifestyle on health).
 Economics (e.g., impact of interest rates on GDP).
 Business forecasting (e.g., sales prediction).
