Chapter 2: Bivariate analysis
1/ Multivariate Analysis:
Multivariate analysis involves examining the relationships and patterns that exist among multiple variables
simultaneously. This means that it looks at the joint distribution of several variables in a dataset.
Multivariate analysis is used when you want to understand how several variables interact with each other and how
they collectively influence the outcomes or patterns in your data.
It can include techniques such as multiple regression, principal component analysis, factor analysis, cluster analysis, and discriminant analysis, among others.
2/ Bivariate Analysis:
Bivariate analysis, on the other hand, is a subset of statistical analysis that focuses on the relationships between exactly
two variables.
It specifically deals with examining the associations, differences, or dependencies between two variables. For example,
it may involve comparing two groups using a t-test or conducting an analysis of variance (ANOVA) to see if there are
statistically significant differences between two or more groups.
Bivariate analysis is a more straightforward and limited form of analysis that does not take into account the joint
distribution of more than two variables.
1. Types of bivariate relationships.
There are two types of bivariate relationships:
Dependency relationships (more frequent): We distinguish between the independent variable and the
dependent variable.
Interdependence relationships: the two variables influence each other.
2. Pearson correlation analysis.
This technique is used when the two variables studied are quantitative. To compute the correlation coefficient linking these two variables, we apply the following formula:
$$ r(x, y) = \frac{\operatorname{Cov}(X, Y)}{S_X \, S_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \times \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
If r is close to 1: strong positive linear correlation.
If r is close to 0: lack of linear correlation.
If r is close to −1: strong negative linear correlation.
Hypothesis testing at a risk level α:
$$ H_0: r = 0 \ (\text{absence of relationship}) \qquad H_1: r \neq 0 \ (\text{existence of a relationship}) $$
Test statistic (Student's t with $n - 2$ degrees of freedom):
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
Decision rule: if p-value > α, we accept H0 (absence of relationship); if p-value < α, we reject H0 (existence of a relationship).
Remark: the p-value is the probability, under H0, of obtaining a test statistic at least as extreme as the one observed; rejecting H0 only when the p-value is below α keeps the risk of rejecting H0 by error at the level α.
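A minimal computational sketch of this procedure, assuming Python with NumPy and SciPy and two hypothetical arrays x and y of paired quantitative observations: scipy.stats.pearsonr returns both r and the two-sided p-value, and the t statistic can be checked by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations of two quantitative variables
x = np.array([12.0, 15.5, 9.8, 20.1, 17.3, 14.2, 11.0, 18.6])
y = np.array([30.2, 34.9, 25.1, 41.0, 38.4, 33.0, 27.5, 39.2])

# Pearson correlation coefficient and its two-sided p-value
r, p_value = stats.pearsonr(x, y)

# Hand computation of the t statistic with n - 2 degrees of freedom
n = len(x)
t = r / np.sqrt((1 - r**2) / (n - 2))

alpha = 0.05
print(f"r = {r:.3f}, t = {t:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: existence of a linear relationship")
else:
    print("Accept H0: absence of a linear relationship")
```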
3. Cross-table analysis and Chi-square test
Hypotheses:
$$ H_0: \text{absence of relationship} \qquad H_1: \text{existence of a relationship} $$
Statistic of the test (Chi-square with $(l - 1)(c - 1)$ degrees of freedom):
$$ X^2_{(l-1)(c-1)} = \sum_{i, j} \frac{(n_{ij} - T_{ij})^2}{T_{ij}}, \qquad T_{ij} = \frac{\text{Total of row } i \times \text{Total of column } j}{\text{Grand Total}} $$
where $n_{ij}$ is the observed frequency and $T_{ij}$ is the expected frequency under H0.
Decision rule: if p-value > α, we accept H0 (absence of relationship); if p-value < α, we reject H0 (existence of a relationship).
Intensity of the relationship (Cramér's V):
$$ V = \sqrt{\frac{X^2}{n(L - 1)}}, \qquad \text{with } L = \min(l, c) $$
Cramér's V is between 0 and 1; a value of V close to 1 means a strong relationship.
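A short sketch of the cross-table analysis, assuming a hypothetical contingency table of observed frequencies n_ij: scipy.stats.chi2_contingency returns the X² statistic, its p-value, the degrees of freedom and the expected frequencies T_ij, from which Cramér's V follows directly.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table (rows = levels of variable 1, columns = levels of variable 2)
observed = np.array([[20, 30, 10],
                     [15, 25, 20]])

# Chi-square test of independence: statistic, p-value, degrees of freedom, expected frequencies T_ij
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Cramer's V: intensity of the relationship
n = observed.sum()
L = min(observed.shape)          # L = min(number of rows, number of columns)
cramers_v = np.sqrt(chi2 / (n * (L - 1)))

print(f"X^2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.4f}, V = {cramers_v:.3f}")
```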
4. Comparison test of two means (independent samples)
This test is applied in the mixed case (one quantitative variable and one categorical variable) where the categorical variable has two levels. Here it is assumed that the studied population is divided into two groups according to the two modalities of the categorical variable (e.g. sex: male, female).
$$ t_{n_1 + n_2 - 2} = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} $$
with $s_i$ the standard deviation inside group $i$, $\bar{x}_i$ the mean of group $i$, and $n_i$ the size of group $i$.
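A minimal sketch, assuming two hypothetical samples (e.g. a measurement taken on the male and female groups): scipy.stats.ttest_ind with equal_var=True uses the same pooled standard deviation as the formula above.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for the two groups defined by the categorical variable
group1 = np.array([68.0, 71.5, 65.2, 70.1, 69.8, 72.3, 66.7])
group2 = np.array([63.4, 61.9, 65.0, 60.8, 64.2, 62.7])

# Pooled standard deviation and t statistic computed by hand
n1, n2 = len(group1), len(group2)
s1, s2 = group1.std(ddof=1), group2.std(ddof=1)
s = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_manual = (group1.mean() - group2.mean()) / (s * np.sqrt(1 / n1 + 1 / n2))

# Same test with SciPy (equal_var=True gives the pooled-variance Student t-test)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)

print(f"t (manual) = {t_manual:.3f}, t (scipy) = {t_stat:.3f}, p-value = {p_value:.4f}")
```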
5. ANALYSIS OF VARIANCE: ANOVA (case of independent samples)
Fisher test statistic:
$$ F = \frac{SCG / (k - 1)}{SCE / (n - k)} $$
$$ SCG = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{X})^2, \qquad SCE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 $$
$n_i$: size of group $i$, with $n = n_1 + \dots + n_k$ the total sample size.
$x_{ij}$: observation $j$ of group $i$ ($i = 1, \dots, k$ and $j = 1, \dots, n_i$).
$\bar{x}_i$: the mean of group $i$.
$\bar{X}$: the global mean.
SCG: sum of squares between groups (numerator, with $k - 1$ degrees of freedom).
SCE: sum of squares within groups (error term, with $n - k$ degrees of freedom).
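As a sketch under the same assumptions as above (hypothetical data for k = 3 independent groups), scipy.stats.f_oneway computes the one-way ANOVA F statistic, and the between/within sums of squares can be checked by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for k = 3 independent groups
groups = [np.array([5.1, 4.8, 5.6, 5.0]),
          np.array([6.2, 6.8, 6.0, 6.5, 6.3]),
          np.array([4.2, 4.0, 4.5, 4.1])]

# One-way ANOVA with SciPy
f_stat, p_value = stats.f_oneway(*groups)

# Hand computation of SCG (between groups) and SCE (within groups)
all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, n = len(groups), len(all_obs)
scg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between-group sum of squares
sce = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within-group sum of squares
f_manual = (scg / (k - 1)) / (sce / (n - k))

print(f"F (scipy) = {f_stat:.3f}, F (manual) = {f_manual:.3f}, p-value = {p_value:.4f}")
```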