058 1
Link: https://www.kaggle.com/datasets/uciml/adult-census-income
Qualitative Variable:
Ordinal Variable:
Write a description of (and the reason for using) each of these tests, and give the formulas for
Chi-square, Contingency Coefficient, Phi & Cramer’s V, Lambda, Uncertainty Coefficient, Gamma,
Spearman, Kendall tau-b, Kendall tau-c, and Somers’ d:
1. Chi-Square
Description:
Formula:
χ² = ∑ (O − E)² / E
Where:
O = Observed frequency
E = Expected frequency (calculated as Row Total × Column Total / Grand Total)
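The chi-square formula above can be sketched in a few lines of plain Python. This is only an illustration: the function name `chi_square` and the 2×2 counts are hypothetical and are not taken from the census data.

```python
def chi_square(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = Row Total x Column Total / Grand Total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    # df = (rows - 1) x (columns - 1)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2x2 counts, for illustration only (not from the census data).
example = [[10, 20],
           [30, 40]]
chi2, df = chi_square(example)
print(round(chi2, 4), df)
```

The same function works for any number of rows and columns, as long as no expected count is zero.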
2. Contingency Coefficient
Description:
Measures the strength of association between two categorical variables.
Derived from the chi-square test and ranges from 0 to 1 (higher values indicate stronger
association).
Limited by the size of the contingency table.
Formula:
C = √(χ² / (χ² + N))
Where:
χ² = Chi-square statistic
N = Total number of cases
3. Phi Coefficient
Description:
Measures the association between two binary categorical variables (2×2 contingency
table).
Similar to Pearson’s correlation coefficient but for categorical data.
Formula:
ϕ = √(χ² / N)
4. Cramer’s V
Description:
Formula:
V = √(χ² / (N(k − 1)))
Where:
k = Smaller of the number of rows and the number of columns
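5. Lambda
Description:
Formula:
λ = (E1 − E2) / E1
Where:
E1 = Prediction errors made without knowing the independent variable
E2 = Prediction errors made when the independent variable is known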
6. Uncertainty Coefficient
Description:
Formula:
U = (H(X) − H(X∣Y)) / H(X)
Where:
H(X) = Entropy of X
H(X∣Y) = Conditional entropy of X given Y
7. Gamma
Formula:
γ = (C − D) / (C + D)
Where:
C = Number of concordant pairs
D = Number of discordant pairs
8. Spearman’s Rank Correlation
Description:
Measures the strength and direction of the association between two ranked variables.
A non-parametric alternative to Pearson’s correlation.
Formula:
ρ = 1 − 6∑dᵢ² / (n(n² − 1))
Where:
dᵢ = Difference between the two ranks of observation i
n = Number of observations
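A minimal sketch of the Spearman formula above, assuming the data have already been converted to ranks and contain no ties. The function name `spearman_rho` and the five rank pairs are hypothetical. With many tied ranks (as with a two-category income variable) this shortcut formula is not exact, and software normally computes the Pearson correlation of the mid-ranks instead.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks (shortcut formula, no ties)."""
    n = len(rank_x)
    # d_i is the difference between the two ranks of observation i
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for five observations (illustration only).
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 3))
```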
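9. Kendall’s Tau-b
Description:
Formula:
τb = (C − D) / √((C + D + Tx)(C + D + Ty))
Where:
Tx = Number of pairs tied only on X
Ty = Number of pairs tied only on Y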
10. Kendall’s Tau-c
Description:
A variant of Kendall’s Tau that accounts for different row and column sizes.
Used when the number of categories in the two variables differs.
Formula:
τc = 2(C − D) / (n²(m − 1)/m)
Where:
n = Number of observations
m = Smaller of the number of rows and the number of columns
11. Somers’ D
Description:
Measures the association between two ordinal variables, considering one as dependent.
An asymmetric measure derived from Kendall’s Tau.
Formula:
DYX = (C − D) / (C + D + TY)
Where:
TY = Number of pairs tied on the dependent variable Y only (not tied on X)
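Gamma, Kendall's tau-b, Kendall's tau-c and Somers' D all use the same pair counts (C, D and the ties), so one sketch can cover them. The helper `pair_counts` and the 3×2 table below are hypothetical and only illustrate the formulas; rows are assumed to be the ordered levels of X and columns the ordered levels of Y.

```python
import math


def pair_counts(table):
    """Count concordant, discordant and tied pairs in an ordered table."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = tx = ty = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for i2 in range(i, n_rows):
                for j2 in range(n_cols):
                    if (i2, j2) <= (i, j):
                        continue  # visit each pair of distinct cells once
                    pairs = table[i][j] * table[i2][j2]
                    if i2 == i:
                        tx += pairs  # same row: tied on X only
                    elif j2 == j:
                        ty += pairs  # same column: tied on Y only
                    elif j2 > j:
                        c += pairs   # both variables increase: concordant
                    else:
                        d += pairs   # X increases, Y decreases: discordant
    return c, d, tx, ty


# Hypothetical 3x2 ordinal table (illustration only, not the census data).
table = [[10, 5],
         [6, 6],
         [3, 10]]
c, d, tx, ty = pair_counts(table)
n = sum(map(sum, table))
m = min(len(table), len(table[0]))

gamma = (c - d) / (c + d)
tau_b = (c - d) / math.sqrt((c + d + tx) * (c + d + ty))
tau_c = 2 * (c - d) / (n ** 2 * (m - 1) / m)
somers_d_yx = (c - d) / (c + d + ty)  # Y (columns) treated as dependent
somers_d_xy = (c - d) / (c + d + tx)  # X (rows) treated as dependent

print(c, d, tx, ty)
print(round(gamma, 3), round(tau_b, 3), round(tau_c, 3))
print(round(somers_d_yx, 3), round(somers_d_xy, 3))
```

Gamma ignores ties, so it is never smaller in absolute value than tau-b or Somers' D computed from the same table.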
1:Chi-Square:
Crosstabs
                                      income
Education       Statistic         <=50K     >50K    Total
Some-college    Count                59       98      157
                Expected Count     43.7    113.3    157.0
Doctorate       Count                 1       48       49
                Expected Count     13.6     35.4     49.0
Chi-Square Tests
Significant Association: The Pearson Chi-Square test result (χ² = 101.441, p < 0.001; SPSS
displays this as .000) indicates a statistically significant relationship between the two
categorical variables tested.
Degrees of Freedom (df = 4): df = (rows − 1) × (columns − 1). With two income categories,
df = 4 corresponds to five education categories.
Expected Counts Met Assumptions: No cells had an expected count less than 5, so the
chi-square approximation is appropriate.
Conclusion:
Since p < 0.05, we reject the null hypothesis of independence: the two variables have a
statistically significant relationship.
2:Contingency Coefficient:
Procedure: Analyze > Descriptive Statistics > Crosstabs.
One Nominal variable: Workclass
Another Nominal variable : Income Level
Independent            Dependent           Why
Workclass (Nominal)    Income (Nominal)    Does job type impact income?
Crosstabs
                                          income
Workclass           Statistic         <=50K     >50K    Total
?                   Count                16       14       30
                    Expected Count      8.6     21.4     30.0
Federal-gov         Count                14       25       39
Self-emp-inc        Count                 6       67       73
                    Expected Count     20.9     52.1     73.0
Self-emp-not-inc    Count                32       84      116
                    Expected Count     33.2     82.8    116.0
State-gov           Count                 5       33       38
                    Expected Count     10.9     27.1     38.0
Total               Count               286      713      999
                    Expected Count    286.0    713.0    999.0
Symmetric Measures
Statistical Significance: Since p < 0.001 (shown by SPSS as .000), which is below 0.05, we reject
the null hypothesis: there is a statistically significant relationship between the two
categorical variables.
Strength of Association:
C = 0.173 suggests a weak association between the two variables.
C cannot reach 1 in a finite table, but values near its upper limit indicate a strong association;
0.173 is low.
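As a quick check, C can be recomputed from the chi-square value and the number of cases reported for the Workclass × Income table (χ² = 30.997, N = 999). This is a small sketch; the function name is hypothetical, and the upper-limit formula √((k − 1)/k), with k the smaller table dimension, is the standard bound for C.

```python
import math


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Values as reported in the SPSS output for Workclass x Income.
chi2_reported = 30.997
n_cases = 999

c = contingency_coefficient(chi2_reported, n_cases)

# Upper limit of C when the smaller table dimension is k (income has k = 2 levels).
k = 2
c_max = math.sqrt((k - 1) / k)

print(round(c, 3), round(c_max, 3))
```

Comparing C with its upper limit rather than with 1 gives a fairer idea of how weak the association is.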
3:Phi & Cramer’s V:
Crosstabs
                                           income
Occupation           Statistic         <=50K     >50K    Total
?                    Count                16       14       30
                     Expected Count      8.6     21.4     30.0
Adm-clerical         Count                40       29       69
                     Expected Count     19.8     49.2     69.0
Armed-Forces         Count                 0        1        1
                     Expected Count       .3       .7      1.0
Craft-repair         Count                50       69      119
                     Expected Count     34.1     84.9    119.0
Farming-fishing      Count                 9       11       20
                     Expected Count      5.7     14.3     20.0
Handlers-cleaners    Count                 8        7       15
                     Expected Count      4.3     10.7     15.0
Machine-op-inspct    Count                17       20       37
                     Expected Count     10.6     26.4     37.0
Other-service        Count                17        7       24
                     Expected Count      6.9     17.1     24.0
Protective-serv      Count                 5       14       19
Tech-support         Count                 8       27       35
                     Expected Count     10.0     25.0     35.0
Transport-moving     Count                23       21       44
                     Expected Count     12.6     31.4     44.0
Total                Count               286      713      999
                     Expected Count    286.0    713.0    999.0
Chi-Square Tests
Symmetric Measures
Interpretation
Phi (.392, p < .001): Phi is a measure of association between two categorical variables. A value
of 0.392 suggests a moderate association. The p-value (reported by SPSS as .000) indicates that
this association is statistically significant.
Cramer's V (.392, p < .001): Cramer's V adjusts Phi for table size through the factor (k − 1),
where k is the smaller of the number of rows and columns. Income has only two categories, so
k − 1 = 1 and V equals Phi.
N of Valid Cases (999): 999 cases were used in the analysis.
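The reason Phi and Cramer's V agree here can be sketched as follows. The chi-square value for the Occupation × Income table is not shown in the output above, so it is back-calculated from the reported Phi; that back-calculation is an assumption made only for this illustration. The row count passed in (11, the rows visible above) does not affect the result, because income has only 2 columns.

```python
import math


def phi(chi2, n):
    """Phi = sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, n_rows, n_cols):
    """V = sqrt(chi2 / (N (k - 1))), k = smaller table dimension."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


n_cases = 999
# ASSUMPTION: chi-square implied by the reported Phi of .392 (not an SPSS value).
chi2_implied = 0.392 ** 2 * n_cases

phi_value = phi(chi2_implied, n_cases)
v_value = cramers_v(chi2_implied, n_cases, n_rows=11, n_cols=2)
print(round(phi_value, 3), round(v_value, 3))
```

With a table that has at least three rows and three columns, k − 1 is larger than 1 and V becomes smaller than Phi.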
4:Lambda:
Crosstabs
                        Income
Workclass           <=50K     >50K    Total
?                      16       14       30
Federal-gov            14       25       39
Local-gov              28       54       82
Self-emp-inc            6       67       73
Self-emp-not-inc       32       84      116
State-gov               5       33       38
Total                 286      713      999
Chi-Square Tests
Interpretation
Lambda                    Value    Asymp. Sig. (p-value)
Symmetric (Overall λ)     0.003
Workclass Dependent       0.000
Income Dependent          0.007
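These values are very close to 0, indicating that knowing one variable barely improves
prediction of the other.
With Income dependent (λ = 0.007), knowing "Workclass" reduces the errors in predicting
"Income Level" by only about 0.7%; with Workclass dependent (λ = 0.000), knowing "Income Level"
does not reduce the errors at all.
Since the p-value is greater than 0.05, lambda is not statistically different from zero.
Lambda can stay near 0 even when the chi-square test is significant, because it only changes
when the most frequent category changes: here >50K is the most frequent income level in every
workclass row shown except "?".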
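The lambda values can be reproduced from the counts in the crosstab above. One workclass row is missing from the listed rows (they do not add up to the totals), so the sketch below recovers it by subtraction from the column totals; treating the remainder as a single category (presumably Private) is an assumption. The function name is hypothetical.

```python
# Workclass x Income counts (<=50K, >50K) as listed in the crosstab.
listed = {
    "?": (16, 14),
    "Federal-gov": (14, 25),
    "Local-gov": (28, 54),
    "Self-emp-inc": (6, 67),
    "Self-emp-not-inc": (32, 84),
    "State-gov": (5, 33),
}
totals = (286, 713)

# ASSUMPTION: the row not shown is one category, recovered from the totals.
remainder = tuple(
    total - sum(row[j] for row in listed.values())
    for j, total in enumerate(totals)
)
table = list(listed.values()) + [remainder]


def lambda_col_dependent(table):
    """Lambda = (E1 - E2) / E1, predicting the column variable from the rows."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    e1 = n - max(col_totals)                        # errors ignoring the rows
    e2 = sum(sum(row) - max(row) for row in table)  # errors using each row's mode
    return (e1 - e2) / e1


lam_income = lambda_col_dependent(table)
# Transposing the table makes Workclass the dependent variable.
lam_workclass = lambda_col_dependent([list(col) for col in zip(*table)])
print(round(lam_income, 3), round(lam_workclass, 3))
```

Only the "?" row has a different most frequent income level from the rest, which is why knowing Workclass removes just 2 of the 286 prediction errors.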
5:Uncertainty Coefficient:
Procedure: Analyze > Descriptive Statistics > Crosstabs.
Move one nominal variable: Workclass
Move another nominal variable: Income Level
✅ Chi-Square
✅ Uncertainty Coefficient
Crosstabs
                        Income
Workclass           <=50K     >50K    Total
?                      16       14       30
Federal-gov            14       25       39
Local-gov              28       54       82
Self-emp-inc            6       67       73
Self-emp-not-inc       32       84      116
State-gov               5       33       38
Total                 286      713      999
Chi-Square Tests
                      Value       Df    Asymp. Sig. (2-sided)
Pearson Chi-Square    30.997a      6    .000
Likelihood Ratio      34.528       6    .000
N of Valid Cases      999
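The uncertainty coefficient for Income given Workclass can be sketched from the same counts using entropies. As in the crosstab, one workclass row is not listed, so it is recovered from the column totals here; that is an assumption. A useful cross-check is that the Likelihood Ratio statistic equals 2 × N × (H(X) − H(X∣Y)) when natural logarithms are used, which should come out close to the 34.528 reported above.

```python
import math

# Workclass x Income counts (<=50K, >50K) as listed in the crosstab.
rows = [(16, 14), (14, 25), (28, 54), (6, 67), (32, 84), (5, 33)]
totals = (286, 713)

# ASSUMPTION: the row not shown is one category, recovered from the totals.
rows.append(tuple(t - sum(r[j] for r in rows) for j, t in enumerate(totals)))


def entropy(counts):
    """Shannon entropy (natural log) of a list of counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


n = sum(totals)
h_income = entropy(totals)
# H(Income | Workclass): entropy within each row, weighted by the row's share.
h_income_given_wc = sum(sum(r) / n * entropy(r) for r in rows)

u_income = (h_income - h_income_given_wc) / h_income

# Cross-check against the Likelihood Ratio row of the Chi-Square Tests table.
g2 = 2 * n * (h_income - h_income_given_wc)
print(round(u_income, 3), round(g2, 3))
```

A value of U near 0 means that knowing Workclass removes only a small share of the uncertainty about Income.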
Directional Measures
6:Gamma:
Crosstabs
                    income
Education       <=50K     >50K    Total
Some-college       59       98      157
Doctorate           1       48       49
Chi-Square Tests
Symmetric Measures
A negative Gamma means that as one variable increases, the other tends to decrease.
2. p-value = 0.002
7:Spearman:
Nonparametric Correlations
Correlations
                                                          education    income
Spearman's rho    education    Correlation Coefficient        1.000    -.121**
                               N                                820       820
                  income       Correlation Coefficient       -.121**    1.000
                               N                                820       999
The ** flag marks the correlation (ρ = -.121) as statistically significant at the 1% level,
so it is unlikely to be due to random chance. The coefficient is small and negative, so the
relationship is weak.
The number of observations (N) is 820 for education and 999 for income, so 179 cases have no
education value. The correlation is computed from the 820 cases that have both values.
8:Kendall tau-b:
Nonparametric Correlations
Correlations
                                                           education    income
Kendall's tau_b    education    Correlation Coefficient        1.000    -.110**
                                N                                820       820
                   income       Correlation Coefficient       -.110**    1.000
                                N                                820       999
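P-value = 0.001, so the correlation (τb = -.110) is statistically significant at the 1% level.
The difference in sample sizes (820 versus 999) shows that education is missing for 179 cases.
If those cases differ systematically from the rest, the estimated correlation could be biased.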
Conclusion
10:Kendall tau-c:
Select two ordinal variables (e.g., Education Level and Income Level)
and move them to the Variables box.
Crosstabs
Case Processing Summary
                        Cases
                        Valid              Missing            Total
                        N      Percent     N      Percent     N
education * income      820    82.1%       179    17.9%       999

                    Income
Education       <=50K     >50K    Total
Some-college       59       98      157
Doctorate           1       48       49
Symmetric Measures
The test statistic (-3.064) is larger in absolute value than 1.96, so the correlation is
statistically significant at the 5% level.
The standard error (0.040) indicates the precision of the estimate; lower values mean a more
precise estimate.
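The link between the test statistic and the p-value can be sketched as below, assuming the statistic is treated as approximately standard normal (the usual large-sample approach). The function name is hypothetical. The result rounds to 0.002, the p-value quoted for Gamma and Somers' d.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


p = two_sided_p(-3.064)
print(round(p, 3))
```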
Conclusion
11:Somers’d:
Crosstabs
                    income
Education       <=50K     >50K    Total
Some-college       59       98      157
Doctorate           1       48       49
Directional Measures
This suggests a weak negative relationship between Education Level and Income Level.
The negative value means that as education level increases, income level tends to slightly
decrease.
Somers’ D Interpretation
Dependent variable          Effect                      Interpretation
Education (D = -0.151)      Stronger negative effect    Predicting education from income: a weak
                                                        negative association. In this dataset, higher
                                                        education is associated with slightly lower
                                                        income.
Income (D = -0.080)         Weaker negative effect      Predicting income from education: an even
                                                        weaker negative association.
p-value = 0.002 (< 0.05)
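The two Somers' D values can be checked against the Kendall tau-b reported earlier (-.110). From the formulas, the product of the two directional D values equals τb squared, so τb is their geometric mean with the common sign. This is only a consistency check on the rounded values reported above, not additional SPSS output.

```python
import math

# Somers' D values as reported in the Directional Measures output.
d_education_dependent = -0.151
d_income_dependent = -0.080

# tau-b^2 = D(education dependent) * D(income dependent); both are negative,
# so tau-b takes the negative root.
tau_b = -math.sqrt(d_education_dependent * d_income_dependent)
print(round(tau_b, 3))
```

The result agrees with the reported τb of -.110 up to rounding.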
Conclusion