
Chi-Square Test For Categorical Variables

Uploaded by ankit.malik

10/7/24, 4:22 PM about:blank

Chi-Square Test for Categorical Variables


Introduction

The chi-square test is a statistical method used to determine if there is a significant association between two categorical variables. This test is widely used in various
fields, including social sciences, marketing, and healthcare, to analyze survey data, experimental results, and observational studies.

Concept

The chi-square test is a non-parametric statistical method used to examine the association between two categorical variables. It evaluates whether the observed
frequencies deviate significantly from the frequencies that would be expected if the variables were independent. Under this assumption of independence, the test
statistic approximately follows a chi-square distribution, which is used to judge whether the observed deviations could plausibly have arisen by random chance.

Null Hypothesis and Alternative Hypothesis

The chi-square test involves formulating two hypotheses:

Null Hypothesis ($H_0$) - Assumes that there is no association between the categorical variables (they are independent), implying that any observed differences are
due to random chance.

Alternative Hypothesis ($H_1$) - Assumes that the variables are associated (not independent), indicating that the observed differences are not due to chance alone.

Formula

The chi-square statistic is calculated using the formula:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where
$O_i$ is the observed frequency for cell $i$.
$E_i$ is the expected frequency for cell $i$, calculated as:

$$E_i = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

The sum is taken over all cells in the contingency table.
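The formula above can be sketched in plain Python. This is a minimal illustration, not a library implementation. The function names `expected_frequencies` and `chi_square_statistic` are hypothetical. The table is assumed to be a list of rows of non-negative counts with no zero row or column totals.

```python
def expected_frequencies(table):
    """Return E[i][j] = row_total[i] * col_total[j] / grand_total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) iterates columns
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Usage with the gender / product-preference counts from the first example below
observed = [[20, 30], [25, 25]]
print(expected_frequencies(observed))            # [[22.5, 27.5], [22.5, 27.5]]
print(round(chi_square_statistic(observed), 4))  # 1.0101
```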

The calculated chi-square statistic is then compared to a critical value from the chi-square distribution table. This table provides critical values for different degrees
of freedom ($df$) and significance levels ($\alpha$). If the calculated statistic exceeds the critical value, the null hypothesis is rejected.
The degrees of freedom for the test are calculated as:

$$df = (r - 1) \times (c - 1)$$

where $r$ is the number of rows and $c$ is the number of columns in the table.
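The degrees of freedom and the decision rule can be sketched with the standard library alone. This sketch assumes a 2 × 2 table, so df = 1. In that case a chi-square variable is the square of a standard normal variable. That identity lets `statistics.NormalDist` supply the critical value and `math.erfc` supply the p-value. For other df values, a statistics library such as `scipy.stats.chi2` would be needed. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df1(alpha=0.05):
    """Critical value for df = 1: the square of the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def p_value_df1(chi2_stat):
    """Upper-tail probability P(X > chi2_stat) for X ~ chi-square with df = 1."""
    return math.erfc(math.sqrt(chi2_stat / 2))


print(degrees_of_freedom(2, 2))        # 1
print(round(critical_value_df1(), 3))  # 3.841
```

Rejecting $H_0$ when the statistic exceeds the critical value is equivalent to rejecting it when the p-value is below $\alpha$.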
Applications
1. Market Research: Analyzing the association between customer demographics and product preferences.
2. Healthcare: Studying the relationship between patient characteristics and disease incidence.
3. Social Sciences: Investigating the link between social factors (e.g., education level) and behavioral outcomes (e.g., voting patterns).
4. Education: Examining the connection between teaching methods and student performance.
5. Quality Control: Assessing the association between manufacturing conditions and product defects.

Practical Example - Weak Association

Suppose a researcher wants to determine if there is an association between gender (male, female) and preference for a new product (like, dislike). The researcher
surveys 100 people and records the following data:

Category    Like   Dislike   Total
Male          20        30      50
Female        25        25      50
Total         45        55     100

Step 1: Calculate Expected Frequencies

Using the formula for expected frequencies:

$$E_{\text{Male, Like}} = \frac{50 \times 45}{100} = 22.5$$

$$E_{\text{Male, Dislike}} = \frac{50 \times 55}{100} = 27.5$$

$$E_{\text{Female, Like}} = \frac{50 \times 45}{100} = 22.5$$

$$E_{\text{Female, Dislike}} = \frac{50 \times 55}{100} = 27.5$$
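The four expected frequencies above can also be computed in one step. They are the outer product of the row totals and column totals, divided by the grand total. This sketch assumes NumPy is available, and the variable names are illustrative.

```python
import numpy as np

# Observed counts: rows = Male, Female; columns = Like, Dislike
observed = np.array([[20, 30],
                     [25, 25]])

row_totals = observed.sum(axis=1)  # [50, 50]
col_totals = observed.sum(axis=0)  # [45, 55]
grand_total = observed.sum()       # 100

# E[i, j] = row_total[i] * col_total[j] / grand_total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```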
Step 2: Compute Chi-Square Statistic

$$\chi^2 = \frac{(20 - 22.5)^2}{22.5} + \frac{(30 - 27.5)^2}{27.5} + \frac{(25 - 22.5)^2}{22.5} + \frac{(25 - 27.5)^2}{27.5}$$

$$\chi^2 = \frac{(-2.5)^2}{22.5} + \frac{(2.5)^2}{27.5} + \frac{(2.5)^2}{22.5} + \frac{(-2.5)^2}{27.5}$$

$$\chi^2 = \frac{6.25}{22.5} + \frac{6.25}{27.5} + \frac{6.25}{22.5} + \frac{6.25}{27.5}$$

$$\chi^2 \approx 0.278 + 0.227 + 0.278 + 0.227 \approx 1.010$$

Step 3: Determine Degrees of Freedom

$$df = (2 - 1) \times (2 - 1) = 1$$

Step 4: Interpret the Result

Using a chi-square distribution table, we compare the calculated chi-square value (1.010) with the critical value for one degree of freedom at a significance level
of, e.g., $\alpha = 0.05$. The critical value from chi-square distribution tables is approximately 3.841.

Since 1.010 < 3.841, we fail to reject the null hypothesis. This sample therefore provides no evidence of a significant association between gender and product preference.
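As a cross-check, SciPy's `scipy.stats.chi2_contingency` can run the same test on this table. This sketch assumes SciPy is installed. For a table with one degree of freedom, that function applies Yates' continuity correction by default. The sketch therefore passes `correction=False` to reproduce the uncorrected statistic computed by hand above.

```python
from scipy.stats import chi2, chi2_contingency

observed = [[20, 30],   # Male: Like, Dislike
            [25, 25]]   # Female: Like, Dislike

# correction=False disables Yates' continuity correction (applied by default when df = 1)
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

critical = chi2.ppf(0.95, dof)  # critical value at alpha = 0.05

print(f"chi2 = {stat:.3f}, df = {dof}, p = {p_value:.3f}")
print(f"critical value = {critical:.3f}")
print("reject H0" if stat > critical else "fail to reject H0")
```

With the correction left on, the reported statistic would be smaller than the hand-computed value. The conclusion for this table would not change.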

Practical Example - Strong Association


Consider a study investigating the relationship between smoking status (smoker, non-smoker) and the incidence of lung disease (disease, no disease). The researcher
collects data from 200 individuals and records the following information:

Category      Disease   No Disease   Total
Smoker             50           30      80
Non-Smoker         20          100     120
Total              70          130     200

Step 1: Calculate Expected Frequencies

Using the formula for expected frequencies:

$$E_{\text{Smoker, Disease}} = \frac{80 \times 70}{200} = 28$$

$$E_{\text{Smoker, No Disease}} = \frac{80 \times 130}{200} = 52$$

$$E_{\text{Non-Smoker, Disease}} = \frac{120 \times 70}{200} = 42$$

$$E_{\text{Non-Smoker, No Disease}} = \frac{120 \times 130}{200} = 78$$
Step 2: Compute Chi-Square Statistic

$$\chi^2 = \frac{(50 - 28)^2}{28} + \frac{(30 - 52)^2}{52} + \frac{(20 - 42)^2}{42} + \frac{(100 - 78)^2}{78}$$

$$\chi^2 = \frac{(22)^2}{28} + \frac{(-22)^2}{52} + \frac{(-22)^2}{42} + \frac{(22)^2}{78}$$

$$\chi^2 = \frac{484}{28} + \frac{484}{52} + \frac{484}{42} + \frac{484}{78}$$

$$\chi^2 \approx 17.286 + 9.308 + 11.524 + 6.205 \approx 44.32$$

Step 3: Determine Degrees of Freedom

$$df = (2 - 1) \times (2 - 1) = 1$$

Step 4: Interpret the Result

Using a chi-square distribution table, we compare the calculated chi-square value (44.32) with the critical value for one degree of freedom at a significance level
of, e.g., $\alpha = 0.05$, which is approximately 3.841. Since 44.32 > 3.841, we reject the null hypothesis. This indicates a significant association between smoking status
and the incidence of lung disease in this sample.
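The same steps, applied to the smoking data, can be sketched in plain Python. Because df = 1, the sketch obtains the p-value from `math.erfc`. It relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, an assumption that holds only for df = 1. The variable names are illustrative.

```python
import math

# Observed counts: rows = Smoker, Non-Smoker; columns = Disease, No Disease
observed = [[50, 30],
            [20, 100]]

row_totals = [sum(row) for row in observed]        # [80, 120]
col_totals = [sum(col) for col in zip(*observed)]  # [70, 130]
grand_total = sum(row_totals)                      # 200

# Expected count for each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all four cells
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # 1
p_value = math.erfc(math.sqrt(stat / 2))            # valid for df = 1 only

print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value:.1e}")
print("reject H0" if stat > 3.841 else "fail to reject H0")
```

It reports a statistic of about 44.32 and a p-value far below 0.05, consistent with rejecting $H_0$.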

Conclusion
The chi-square test is a powerful tool for analyzing the relationship between categorical variables. By comparing observed and expected frequencies, researchers can
determine if there is a statistically significant association, providing valuable insights in various fields of study.
