Correlation Matrix
What is a correlation matrix?
A correlation matrix is a table that displays the correlation coefficients between multiple
variables. A correlation coefficient measures the degree to which two variables are
related or associated with each other.
A correlation matrix typically displays the correlations between each pair of variables in
a symmetric matrix. The diagonal of the matrix displays the correlation of each
variable with itself, which is always 1. The upper triangle of the matrix displays the
correlations between the variable pairs, while the lower triangle is a mirror image of the
upper triangle, containing the same values.
Correlation matrices are commonly used in statistical analysis and data science to
identify patterns and relationships among variables. They can be useful for identifying
variables that are highly correlated and may be redundant, and for selecting variables
for inclusion in statistical models.
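As a quick sketch of how this looks in practice, here is a minimal example using pandas; the column names and values are invented for illustration:

import pandas as pd

# Hypothetical dataset: three numeric variables (names made up for the example)
df = pd.DataFrame({
    "age":    [23, 35, 47, 52, 61, 30],
    "income": [31000, 52000, 74000, 81000, 90000, 45000],
    "debt":   [12000, 9000, 6000, 5000, 2000, 10000],
})

# pandas computes Pearson correlations by default; note the 1.0 diagonal
# and the symmetry between the upper and lower triangles
corr = df.corr()
print(corr)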
What are the different types of correlations?
There are several types of correlations, including:
1. Pearson correlation: This measures the linear relationship between two continuous
variables. It ranges from -1 to 1, where -1 indicates a perfect negative correlation, 0
indicates no correlation, and 1 indicates a perfect positive correlation.
2. Spearman correlation: This measures the monotonic relationship between two variables.
It is often used when the relationship between the variables is not linear, but the
variables are still ordinal or continuous. Like the Pearson correlation, it ranges from
-1 to 1.
3. Kendall correlation: This measures the ordinal association between two variables. It is
similar to the Spearman correlation, but it is more appropriate for small sample sizes.
It also ranges from -1 to 1.
4. Point-biserial correlation: This measures the correlation between one continuous
variable and one dichotomous variable. It also ranges from -1 to 1.
5. Phi coefficient: This measures the correlation between two dichotomous variables. It
also ranges from -1 to 1.
6. Cramer's V: This measures the association between two categorical variables. Unlike
the others, it ranges from 0 to 1, where 0 indicates no association and 1 indicates a
perfect association.
Each type of correlation is appropriate for different types of variables and research
questions. The choice of correlation measure depends on the nature of the variables
being studied and the research question being asked.
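To make these options concrete, here is a minimal sketch using scipy.stats; the sample data and the contingency table are invented for illustration, and Cramer's V is computed by hand from the chi-square statistic (for a 2x2 table it equals the absolute phi coefficient):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])   # roughly linear in x
group = np.array([0, 0, 0, 1, 1, 1])             # dichotomous variable

r, _ = stats.pearsonr(x, y)              # 1. linear relationship
rho, _ = stats.spearmanr(x, y)           # 2. monotonic relationship
tau, _ = stats.kendalltau(x, y)          # 3. ordinal association
rpb, _ = stats.pointbiserialr(group, y)  # 4. continuous vs dichotomous

# 6. Cramer's V for two categorical variables, from the chi-square statistic
table = np.array([[10, 20], [20, 10]])   # hypothetical contingency table
chi2, _, _, _ = stats.chi2_contingency(table)
cramers_v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))

print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")
print(f"Point-biserial={rpb:.2f}, Cramer's V={cramers_v:.2f}")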
A monotonic relationship is a type of association between two variables
where the direction of the association is consistent. In other words, as the
value of one variable increases or decreases, the value of the other variable
consistently increases or decreases as well.
However, in contrast to a linear relationship, the association between the
variables does not have to be perfectly straight or change at a constant rate
over the entire range of values. The rate of change can vary, but the direction
of the relationship stays the same.
For example, if we look at the relationship between the age of a car and its
resale value, we might find a monotonic relationship: the older the car, the
lower its resale value tends to be, even if the value does not drop by the same
amount for each additional year of age.
The Spearman correlation coefficient is a common statistic used to measure
the strength and direction of a monotonic relationship between two
variables.
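A small sketch of the distinction, with invented data: y = x**3 is strictly increasing (perfectly monotonic) but not linear, so Spearman reports 1.0 while Pearson falls short of 1.

import numpy as np
from scipy import stats

x = np.linspace(1, 10, 50)
y = x ** 3          # strictly increasing, but clearly non-linear

r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)

print(f"Pearson r = {r:.3f}")       # high, but below 1 (relationship is not linear)
print(f"Spearman rho = {rho:.3f}")  # exactly 1.0 (relationship is perfectly monotonic)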
A correlation of 0.8 or above between predictors is often treated as a sign of multicollinearity, but it should be confirmed via VIF and Tolerance.
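A minimal sketch of that check using statsmodels; the predictor names and values are invented, and x2 is built to be nearly collinear with x1 so the inflated VIF is visible:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical predictors; x2 is constructed to be nearly collinear with x1
X = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2],
    "x3": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
})
X = add_constant(X)  # VIF is usually computed on a design matrix with an intercept

# Common rule of thumb: VIF above 5-10 flags multicollinearity; Tolerance = 1 / VIF
for i, col in enumerate(X.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{col}: VIF = {vif:.2f}, Tolerance = {1 / vif:.3f}")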
R and R-squared
Rough rules of thumb for the correlation coefficient:
Below .3 is usually not acceptable (weak).
.3 to .7 is moderate.
Above .7 is good (strong).
1. R-squared (Coefficient of Determination): R-squared measures the proportion
of the dependent variable's variance that is explained by the independent
variables in a regression model. It ranges from 0 to 1, where 0 indicates that
none of the variance is explained by the model, and 1 indicates that all of
the variance is explained. Generally, higher R-squared values are preferred
because they indicate a better fit of the model to the data. However, the
interpretation of an acceptable R-squared value can vary depending on the
discipline or specific research area. In some fields, an R-squared value above
0.70 or 0.80 might be considered good, while in others, a lower value may be
acceptable. (A short numeric sketch of both R and R-squared follows after this list.)
2. Correlation coefficient (R): The correlation coefficient measures the strength
and direction of the linear relationship between two variables. It ranges from
-1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a
perfect positive correlation, and 0 indicates no correlation. Similarly, the
interpretation of an acceptable correlation coefficient depends on the
specific context. In general, a correlation coefficient above 0.7 or below -0.7
is often considered strong, while values between 0.3 and 0.7 (positive or
negative) may be considered moderate. However, as with R-squared,
acceptable values for the correlation coefficient can vary based on the field
of study and research objectives.
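As a worked numeric sketch of both quantities (data invented for illustration): for a simple linear regression with one predictor, R-squared equals the square of the Pearson correlation R.

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 4.1, 5.8, 8.4, 9.9, 12.1])

# Correlation coefficient R
r, _ = stats.pearsonr(x, y)

# Fit a simple linear regression and compute R-squared = 1 - SS_res / SS_tot
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"R         = {r:.4f}")
print(f"R-squared = {r_squared:.4f}")  # equals r**2 in simple linear regression
print(f"r**2      = {r ** 2:.4f}")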