Script

The document discusses various types of variables used in data analysis, including categorical, quantitative, date and time, and text variables. It presents a probability analysis of student satisfaction and multitasking, revealing that high digital media use correlates with lower academic satisfaction, while some satisfied students still multitask. Additionally, it evaluates the distribution of data, concluding that the empirical rule is not applicable due to the distribution's non-normality, while Chebyshev's inequality holds true.

Uploaded by

aakankshagswm
Copyright
© All Rights Reserved

INTRODUCTION

1. Categorical Variables

a. Nominal (No inherent order)

 Device type (e.g., Smartphone, Laptop, Tablet, Laptop Conf)

 Type of application (e.g., email, gaming, social media, etc.)

 Preferred study material (e.g., E-Book, Cal Books)

 Features used (e.g., Red, App Bio, Old Usage, etc.)

b. Ordinal (Ordered categories)

 Usage level (e.g., Very Low, Low, Moderate, High)

 Hours of use (e.g., 1–2 hours, 3–4 hours, 5–6 hours, >10 hours)

 Frequency of digital distraction (e.g., Rarely, Sometimes, Often, Always)

 Impact on performance (e.g., 1, 2, 3, 4, 5 where higher numbers likely mean greater impact)

 Frequency of app use (e.g., Rarely, Sometimes, Often, Always)

 Study frequency (e.g., Rarely, Often, Always, etc.)

2. Quantitative Variables

a. Discrete Numerical

 Interval counts (e.g., 1-3, 4-6, etc.; these are categories representing counts but not precise values)

 Numerical scales (e.g., 1, 2, 3, 4, 5 in columns assessing impact, effect, or frequency)

b. Continuous Numerical

 Time taken, marks obtained, etc.

3. Date and Time Variables

 Date (e.g., 05-Oct, Nov-20, 01-Feb, etc.)

 Time of day (e.g., 21:30, 20:30, 13:00, etc.)

 Recency of usage (e.g., >15, <5; can be considered ordinal or interval, depending on exact meaning)

4. Text Variables (Open-ended/Qualitative)

 Other comments or examples (e.g., "e.g. email", "Gaming", "Social Med", etc.)

INDIVIDUAL ASSIGNMENT PROBABILITY ANALYSIS
The events chosen are:
 Event A: Satisfaction is high (greater than or equal to 4)
 Event B: Multitasking is high (greater than or equal to 4)

Probability Calculations
1. P(A) - Probability of High Satisfaction
From the table, P(A) = 0.25 {n(A)/n(S), i.e. 10/40}
This means 25% of the subjects reported high satisfaction with their academic
performance.
2. P(B) - Probability of High Multitasking
From the table, P(B) = 0.425 {n(B)/n(S), i.e. 17/40, where n(B) = 4 + 13 = 17}
This means 42.5% of the subjects reported high multitasking, i.e., using their phones
during lectures.
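The two marginal probabilities can be reproduced from the cell counts of the contingency table. The sketch below is illustrative only: it assumes the cell counts given in the verification under part 4 (6, 4 and 13) and takes the low-satisfaction, low-multitasking cell to be the remainder, 40 - 23 = 17; the dictionary layout and variable names are hypothetical.

```python
from fractions import Fraction

# Cell counts of the 2x2 contingency table (A = high satisfaction, B = high multitasking).
# The low/low cell is an assumed remainder: 40 - (6 + 4 + 13) = 17.
counts = {
    ("A", "B"): 4,           # high satisfaction, high multitasking
    ("A", "not B"): 6,       # high satisfaction, low multitasking
    ("not A", "B"): 13,      # low satisfaction, high multitasking
    ("not A", "not B"): 17,  # low satisfaction, low multitasking (assumed remainder)
}
n_total = sum(counts.values())  # 40 subjects

# Marginal counts: add the cells that belong to each event.
n_a = counts[("A", "B")] + counts[("A", "not B")]   # 4 + 6 = 10
n_b = counts[("A", "B")] + counts[("not A", "B")]   # 4 + 13 = 17

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
print(f"P(A) = {n_a}/{n_total} = {float(p_a)}")
print(f"P(B) = {n_b}/{n_total} = {float(p_b)}")
```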

3. P(A and B) - Probability of both High Satisfaction and High Multitasking
From the first table, the intersection is 4 out of 40 subjects.
P(A and B) = 4/40 = 0.1
This means 10% of subjects reported both high satisfaction and high multitasking with
digital devices during lectures.
4. P(A or B) - Probability of either High Satisfaction or High Multitasking (or both)
The formula for the union of two events is: P(A or B) = P(A) + P(B) - P(A and B)
We need to subtract P(A and B) to avoid counting the overlap twice.
If we simply added P(A) + P(B), the people who have both high satisfaction AND high
multitasking would be counted twice.
To correct this, we subtract P(A and B) once, so that those in the overlap are counted
only once.
Using our values:
 P(A) = 0.25 (25% have high satisfaction)
 P(B) = 0.425 (42.5% have high multitasking)
 P(A and B) = 0.1 (10% have both high satisfaction and high multitasking)
P(A or B) = 0.25 + 0.425 - 0.1 = 0.575
From the original contingency table, we can verify this:
 High satisfaction, low multitasking: 6 people
 High satisfaction, high multitasking: 4 people
 Low satisfaction, high multitasking: 13 people
 Total people in A or B: 6 + 4 + 13 = 23 out of 40, and 23/40 = 0.575 (57.5%)

This means 57.5% of all subjects in the study reported high satisfaction, high
multitasking with digital devices, or both.
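As a quick check, the addition rule and the direct count can be compared with exact fractions. This is a minimal sketch; the counts n(A) = 10, n(B) = 4 + 13 = 17 and n(A and B) = 4 come from the contingency table, and the variable names are only illustrative.

```python
from fractions import Fraction

n_total = 40
n_a, n_b, n_a_and_b = 10, 17, 4  # counts for A, B and their overlap

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_a_and_b = Fraction(n_a_and_b, n_total)

# Addition rule: subtract the overlap once so it is not double-counted.
p_a_or_b = p_a + p_b - p_a_and_b

# Direct count: (A only) + (both) + (B only) = 6 + 4 + 13.
n_a_or_b_direct = (n_a - n_a_and_b) + n_a_and_b + (n_b - n_a_and_b)

print(float(p_a_or_b), Fraction(n_a_or_b_direct, n_total) == p_a_or_b)
```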

5. P(A | B) - Probability of High Satisfaction given High Multitasking
In other words, how likely is a student to be highly satisfied with their academic
performance, given that they multitask heavily on their phone during lectures?
P(A | B) = P(A and B) / P(B)
P(A | B) = 0.1 / 0.425 = 0.235294
Probability of high academic satisfaction GIVEN high digital media use during lectures:
 Among students who frequently use digital media during lectures, only 23.5% are
satisfied with their academic performance, slightly below the overall rate of 25%.
 This suggests that digital media use during lectures is associated with somewhat
lower academic satisfaction.
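The conditional probability is simply a ratio of counts, n(A and B)/n(B) = 4/17. The sketch below (illustrative variable names, counts from the contingency table) also computes P(A | not B), the satisfaction rate among the 23 students with low multitasking, 6 of whom are highly satisfied. That comparison is an extra step derived from the same table, not a figure reported in the original analysis; it gives about 26.1% versus 23.5%.

```python
from fractions import Fraction

n_total = 40
n_b = 17                  # high multitaskers (4 + 13)
n_a_and_b = 4             # high satisfaction AND high multitasking
n_a_and_not_b = 6         # high satisfaction, low multitasking
n_not_b = n_total - n_b   # 23 low multitaskers

# Conditional probability as a ratio of counts: P(A | B) = n(A and B) / n(B).
p_a_given_b = Fraction(n_a_and_b, n_b)
# The same ratio within the low-multitasking group, for comparison.
p_a_given_not_b = Fraction(n_a_and_not_b, n_not_b)

print(f"P(A | B)     = {float(p_a_given_b):.3f}")
print(f"P(A | not B) = {float(p_a_given_not_b):.3f}")
```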
6. Apply Bayes' Theorem
Probability of high digital media use GIVEN high academic satisfaction
Bayes' Theorem: P(B | A) = [P(A | B) × P(B)] / P(A)
We've already calculated:
 P(A | B) = 0.235294 (rounded to 0.235 in the table)
 P(B) = 0.425
 P(A) = 0.25
P(B | A) = [0.235294 × 0.425] / 0.25 = 0.1 / 0.25 = 0.4
This means that 40% of those with high satisfaction also report high multitasking:
 Among students who are satisfied with their academic performance, 40% still engage
in high digital media use during lectures.
 Interestingly, some academically satisfied students continue to multitask with digital
media.
 This may suggest that academically satisfied students use digital media to access
study materials, notes and lectures, or that they multitask efficiently. If so, their use
of digital media may be largely positive.
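Bayes' theorem can be checked numerically against the direct count: 4 of the 10 highly satisfied students are also high multitaskers. A minimal sketch in plain floating point, using the values above (variable names are illustrative):

```python
p_a = 0.25            # P(A): high satisfaction
p_b = 0.425           # P(B): high multitasking
p_a_given_b = 4 / 17  # P(A | B), about 0.235294

# Bayes' theorem reverses the direction of conditioning.
p_b_given_a = p_a_given_b * p_b / p_a

# Direct check from the counts: 4 of the 10 highly satisfied students multitask heavily.
direct = 4 / 10

print(round(p_b_given_a, 6), round(direct, 6))
```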

Major Findings
Digital media use appears to be associated with lower academic satisfaction:
Students who use digital media heavily during lectures are slightly less likely to be
satisfied with their academic performance (23.5% vs. the 25% overall rate).

Some successful students still multitask:
Even among academically satisfied students, 40% engage in high digital media use,
suggesting that some students may be able to manage both, though they are in the minority.

The majority of satisfied students avoid heavy digital media use:
60% of academically satisfied students report low digital media use during lectures,
which is consistent with the idea that focused attention during lectures benefits academic
performance.

Statistical Analysis: Empirical Rule vs Chebyshev's Inequality


Can the Empirical rule be used?
Is This Curve Normally Distributed?

1. Shape of the Distribution


 The curve is unimodal and roughly symmetric, but not perfectly bell-shaped.
 The peak occurs at the 21–30 class (midpoint 25.5), with frequencies tapering off
on either side.
 However, the distribution appears slightly skewed to the left, since
Mean = 22.275 < Median = 23.632 < Mode = 24.158 (mean < median < mode indicates negative skew).
THEREFORE, THE EMPIRICAL RULE WILL NOT BE APPLICABLE, AS THE
DISTRIBUTION IS NOT NORMAL.
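The three measures of central tendency can be reproduced from the grouped frequency table. This sketch assumes the conventions the reported figures appear to use: the midpoints listed in the frequency table (including 32.5 for the 31–35 class), the lower class limit 21 (rather than the boundary 20.5) for the 21–30 median and modal class, and a class width of 10. Under those assumptions it returns approximately 22.275, 23.632 and 24.158.

```python
midpoints = [5, 15.5, 25.5, 32.5]
freqs = [2, 13, 19, 6]
n = sum(freqs)  # 40

# Grouped mean: sum(f * x) / n
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# The 21-30 class (index 2) is both the median class and the modal class.
lower, width = 21, 10         # assumed: lower class limit and class width
cum_before = sum(freqs[:2])   # 15 observations fall before the 21-30 class
f_prev, f_mod, f_next = freqs[1], freqs[2], freqs[3]

# Grouped median: L + ((n/2 - CF) / f) * h
median = lower + (n / 2 - cum_before) / f_mod * width

# Grouped mode: L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h
mode = lower + (f_mod - f_prev) / ((f_mod - f_prev) + (f_mod - f_next)) * width

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f}")
print("mean < median < mode:", mean < median < mode)
```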
2. Empirical Rule Check
 Mean: 22.275
 Standard Deviation: 7.10
Calculated Intervals:
 μ ± 1σ: 15.17 to 29.38
 μ ± 2σ: 8.07 to 36.48
 μ ± 3σ: 0.96 to 43.59
Share of Data in Each Interval (from the frequency table, counting each class by its midpoint):
 μ ± 1σ (15.17 to 29.38): 80% of the data
 μ ± 2σ (8.07 to 36.48): 95% of the data
 μ ± 3σ (0.96 to 43.59): 100% of the data

Class    Mid-Point    Frequency    Probability
0-10     5            2            0.05
11-20    15.5         13           0.325
21-30    25.5         19           0.475
31-35    32.5         6            0.15

From the frequency data:
 0–10: 2 (5%)
 11–20: 13 (32.5%)
 21–30: 19 (47.5%)
 31–35: 6 (15%)
The interval μ ± 1σ (15.17 to 29.38) contains the midpoints of the 11–20 and 21–30
classes (15.5 and 25.5), but not those of the 0–10 or 31–35 classes (5 and 32.5).
Counting those two classes gives 32.5% + 47.5% = 80% of the data, which is more
than the 68% expected for a normal distribution.
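The midpoint-counting approximation can be written out for all three intervals. This is a sketch that treats each class as lying entirely at its midpoint (an assumption: the 11–20 class actually straddles the lower limit 15.17), and it recomputes the mean and population standard deviation from the same table. It yields 80%, 95% and 100% for k = 1, 2, 3.

```python
import math

# Grouped data from the frequency table (midpoints as listed there).
midpoints = [5, 15.5, 25.5, 32.5]
freqs = [2, 13, 19, 6]
n = sum(freqs)  # 40

mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Population standard deviation (divide by n), matching the Data Summary.
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

empirical = {1: 0.68, 2: 0.95, 3: 0.997}  # normal-curve percentages
shares = {}
for k, expected in empirical.items():
    lo, hi = mean - k * sd, mean + k * sd
    # Approximation: a whole class counts as inside when its midpoint is inside.
    inside = sum(f for f, x in zip(freqs, midpoints) if lo <= x <= hi)
    shares[k] = inside / n
    print(f"k={k}: [{lo:.2f}, {hi:.2f}] actual {shares[k]:.1%}, normal ~{expected:.1%}")
```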
3. Comparison to Normal Distribution
 In a normal distribution, about 68% of the data falls within 1σ, but here it is about 80%.
THEREFORE, THE EMPIRICAL RULE IS NOT APPLICABLE.
4. Visual Inspection
 The plotted curve is not a perfect bell shape.
Conclusion
This distribution is not perfectly normal.
 It is roughly symmetric and unimodal, but the percentage within 1σ (80%, vs. 68% for
a normal curve) does not match the empirical rule, although the 2σ and 3σ figures (95%
and 100%) are close to the normal values of 95% and 99.7%.
 There is a slight left skew, and the spread of the data is not what a normal curve
would predict.
 It is approximately normal, but not exactly. For statistical analysis, you should
be cautious about assuming normality.
Data Summary

From the probability distribution table:


 Mean (μ): 22.275
 Variance: 50.46188
 Standard Deviation (σ): 7.103662
 Total observations: 40
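The Data Summary values follow from the probability column of the frequency table. The sketch below assumes the population (divide-by-n) variance, which is what the reported 50.46188 corresponds to, and uses the shortcut formula Var(X) = E[X²] - μ²; the standard deviation comes out at about 7.104.

```python
import math

midpoints = [5, 15.5, 25.5, 32.5]
probs = [0.05, 0.325, 0.475, 0.15]  # frequency / 40

# Mean of the grouped distribution: E[X] = sum(p * x)
mean = sum(p * x for p, x in zip(probs, midpoints))
# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
e_x2 = sum(p * x ** 2 for p, x in zip(probs, midpoints))
variance = e_x2 - mean ** 2
sd = math.sqrt(variance)

print(f"mean={mean:.3f} variance={variance:.4f} sd={sd:.4f}")
```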
Empirical Rule Analysis
The Empirical Rule (68-95-99.7 rule) applies to normal distributions and states:
 68% of data falls within μ ± 1σ
 95% of data falls within μ ± 2σ
 99.7% of data falls within μ ± 3σ
Calculating the intervals:
 μ - 1σ = 22.275 - 7.104 = 15.17
 μ + 1σ = 22.275 + 7.104 = 29.38
 μ - 2σ = 22.275 - 14.208 = 8.07
 μ + 2σ = 22.275 + 14.208 = 36.48
 μ - 3σ = 22.275 - 21.312 = 0.96
 μ + 3σ = 22.275 + 21.312 = 43.59

Empirical Rule Testing:


Within 1σ (15.17 to 29.38):
 Expected: 68% (27.2 observations)
 Actual: Classes 11-20 (13) + 21-30 (19) = 32 observations (the classes whose midpoints lie in the interval)
 Actual percentage: 32/40 = 80%
 Result: 80% > 68%
Chebyshev's Inequality Analysis
Chebyshev's Inequality applies to any distribution and states:
At least (1 - 1/k²) of the data falls within k standard deviations of the mean.

Chebyshev's Testing:
k = 1: At least 0% within 1σ
 Actual: 80% (Satisfies)
k = 2: At least 75% within 2σ
 Actual: 95% (Satisfies)
k = 3: At least 88.9% within 3σ
 Actual: 100% (Satisfies)
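The Chebyshev check can be tabulated in a few lines. This sketch takes the observed coverage figures reported in this section (80%, 95%, 100%) as inputs and compares each with the bound 1 - 1/k²; it does not recompute them from the raw data.

```python
# Observed coverage reported in this section (classes counted by midpoint).
observed = {1: 0.80, 2: 0.95, 3: 1.00}

results = {}
for k, actual in observed.items():
    # Chebyshev: at least 1 - 1/k^2 of any distribution lies within k standard deviations.
    bound = 1 - 1 / k ** 2
    results[k] = (bound, actual >= bound)
    print(f"k={k}: at least {bound:.1%}, actual {actual:.0%}, satisfied={actual >= bound}")
```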
[Figure: frequency curve with vertical lines marking the standard deviation boundaries (μ ± 1σ, μ ± 2σ, μ ± 3σ), showing how much of the data falls within each interval.]
Conclusion
Chebyshev's universality: Chebyshev's inequality holds for any distribution with a finite
variance, so it is satisfied here even though the empirical rule is not.
Therefore, because the distribution is not normal (it is left-skewed), the empirical rule is
not valid for this data, while Chebyshev's inequality is clearly satisfied.
