
Assignment in Stat Level 1

The document provides a comprehensive overview of basic statistics concepts, including data types, probability calculations, expected values, and measures of central tendency such as mean, median, and mode. It also covers confidence intervals, skewness, kurtosis, and interpretations of boxplots and histograms. Additionally, it includes practical examples and calculations related to various statistical problems.

Uploaded by

Najma Gani
Copyright © All Rights Reserved

BASIC STATISTICS LEVEL - 1

Q1) Identify the data type (discrete or continuous) for the following:

Activity                                  Data Type
Number of beatings from Wife              Discrete
Results of rolling a die                  Discrete
Weight of a person                        Continuous
Weight of Gold                            Continuous
Distance between two places               Continuous
Length of a leaf                          Continuous
Dog's weight                              Continuous
Blue Color                                Discrete
Number of kids                            Discrete
Number of tickets in Indian railways      Discrete
Number of times married                   Discrete
Gender (Male or Female)                   Discrete

Q2) Identify the data type of each of the following as Nominal, Ordinal, Interval or Ratio.

Data                                      Data Type
Gender                                    Nominal
High School Class Ranking                 Ordinal
Celsius Temperature                       Interval
Weight                                    Ratio
Hair Color                                Nominal
Socioeconomic Status                      Ordinal
Fahrenheit Temperature                    Interval
Height                                    Ratio
Type of living accommodation              Nominal
Level of Agreement                        Ordinal
IQ (Intelligence Scale)                   Interval
Sales Figures                             Ratio
Blood Group                               Nominal
Time Of Day                               Ordinal
Time on a Clock with Hands                Interval
Number of Children                        Ratio
Religious Preference                      Nominal
Barometer Pressure                        Ratio
SAT Scores                                Interval
Years of Education                        Ratio

Q3) Three coins are tossed. Find the probability that two heads and one tail are obtained.
Ans:
Total possible outcomes = {HHH, HHT, HTT, TTT, HTH, THH, TTH, THT} = 8
Favourable outcomes = {HHT, HTH, THH} = 3
$P(\text{two heads, one tail}) = \frac{3}{8} = 0.375$

So, the probability of getting two heads and one tail is 0.375.
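As a quick check, the sample space can be enumerated by brute force. This is a small Python sketch of our own; the assignment's later answers use R. It assumes fair coins, so all 8 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing three fair coins
outcomes = list(product("HT", repeat=3))

# Favourable outcomes: exactly two heads (and therefore exactly one tail)
favourable = [o for o in outcomes if o.count("H") == 2]

p = Fraction(len(favourable), len(outcomes))
print(len(outcomes), len(favourable), p, float(p))  # 8 3 3/8 0.375
```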

Q4) Two dice are rolled. Find the probability that the sum is
a) equal to 1
b) less than or equal to 4
c) divisible by both 2 and 3

Ans:
Total number of outcomes when two dice are rolled = 6 × 6 = 36.
S = {(1, 1)(1, 2)(1, 3)(1, 4)(1, 5)(1, 6)
(2, 1)(2, 2)(2, 3)(2, 4)(2, 5)(2, 6)
(3, 1)(3, 2)(3, 3)(3, 4)(3, 5)(3, 6)
(4, 1)(4, 2)(4, 3)(4, 4)(4, 5)(4, 6)
(5, 1)(5, 2)(5, 3)(5, 4)(5, 5)(5, 6)
(6, 1)(6, 2)(6, 3)(6, 4)(6, 5)(6, 6)}

a) Equal to 1

The smallest possible sum is 1 + 1 = 2, so no outcome has a sum of 1: n(A) = 0 and P(A) = 0/36 = 0.

b) Less than or equal to 4

B = {(1, 1)(1, 2)(1, 3)(2, 1)(2, 2)(3, 1)}
n(B) = 6
$P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = \frac{1}{6}$

So, the probability that the sum is less than or equal to 4 is 1/6.

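c) Sum is divisible by 2 and 3

A sum divisible by both 2 and 3 is divisible by 6, so the sum must be 6 or 12.
C = {(1, 5)(2, 4)(3, 3)(4, 2)(5, 1)(6, 6)}
n(C) = 6
$P(C) = \frac{n(C)}{n(S)} = \frac{6}{36} = \frac{1}{6}$

So, the probability that the sum is divisible by both 2 and 3 is 1/6.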
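The same brute-force idea checks all three parts of Q4. This is a minimal Python sketch that assumes two fair dice. `prob` is a hypothetical helper name introduced here.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely ordered outcomes of rolling two fair dice
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on the pair of dice."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_a = prob(lambda r: sum(r) == 1)       # a) sum equal to 1
p_b = prob(lambda r: sum(r) <= 4)       # b) sum at most 4
p_c = prob(lambda r: sum(r) % 6 == 0)   # c) divisible by 2 and 3, i.e. by 6
print(p_a, p_b, p_c)  # 0 1/6 1/6
```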
Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at random. What is the probability that none of the balls drawn is blue?

Ans:
Total number of ways to draw 2 balls from 7 = $\binom{7}{2} = 21$

Number of ways to draw 2 balls with no blue ball (both from the 5 red or green balls) = $\binom{5}{2} = 10$

Probability that none of the balls drawn is blue = $\frac{\binom{5}{2}}{\binom{7}{2}} = \frac{10}{21} \approx 0.476$
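The combination counts can be checked with Python's `math.comb` (Python 3.8+). This sketch evaluates the ratio above exactly.

```python
from fractions import Fraction
from math import comb

total = comb(7, 2)      # any 2 of the 7 balls: 21 ways
no_blue = comb(5, 2)    # both from the 5 non-blue balls: 10 ways
p = Fraction(no_blue, total)
print(total, no_blue, p, round(float(p), 4))  # 21 10 10/21 0.4762
```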

Q6) Calculate the expected number of candies for a randomly selected child.
Below are the probabilities of the candy counts for children (ignoring the nature
of the child; a generalized view).
CHILD Candies count Probability
A 1 0.015
B 4 0.20
C 3 0.65
D 5 0.005
E 6 0.01
F 2 0.120
Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20

Ans:
Expected number $E(X) = \sum x \cdot P(x)$
$= 1(0.015) + 4(0.20) + 3(0.65) + 5(0.005) + 6(0.01) + 2(0.120)$
$= 0.015 + 0.8 + 1.95 + 0.025 + 0.06 + 0.24$
$= 3.09$
So, the expected number of candies for a randomly selected child is 3.09.
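The expected value is a probability-weighted sum, which is easy to check in Python. The lists below are copied from the table. The check that the probabilities sum to 1 is an extra sanity step we added.

```python
# Candy counts and their probabilities, copied from the table above
counts = [1, 4, 3, 5, 6, 2]
probs = [0.015, 0.20, 0.65, 0.005, 0.01, 0.120]

assert abs(sum(probs) - 1) < 1e-9  # a valid distribution sums to 1

expected = sum(x * p for x, p in zip(counts, probs))
print(round(expected, 2))  # 3.09
```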
Q7) Calculate the mean, median, mode, variance, standard deviation and range for the
columns Points, Score and Weigh of the given dataset, and comment on the values /
draw inferences.
Use Q7.csv file
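The Q7 answer depends on Q7.csv, which is not reproduced here, so no results are shown. Below is a Python sketch of how the statistics could be computed. It assumes the file has a header row with columns named Points, Score and Weigh, as in the question. `summarize` and `summarize_csv` are hypothetical helper names. Variance and standard deviation use the n - 1 divisor, matching R's `var()` and `sd()`.

```python
import csv
import statistics

def summarize(values):
    """Descriptive statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),     # every most-frequent value
        "variance": statistics.variance(values),  # sample variance, divisor n - 1
        "sd": statistics.stdev(values),           # sample standard deviation
        "range": max(values) - min(values),
    }

def summarize_csv(lines, columns=("Points", "Score", "Weigh")):
    """Summarize named columns of CSV text (a file object or any iterable of lines)."""
    rows = list(csv.DictReader(lines))
    return {col: summarize([float(row[col]) for row in rows]) for col in columns}

# Usage (assumes Q7.csv is in the working directory):
# with open("Q7.csv", newline="") as f:
#     for col, stats in summarize_csv(f).items():
#         print(col, stats)
```

`multimode` returns a list because a column can have more than one mode.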
Q8) Calculate the expected value for the problem below.
a) The weights (X) of patients at a clinic (in pounds) are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the expected
value of the weight of that patient?
Ans:
Each of the 9 patients is equally likely to be chosen (probability 1/9), so the expected value is the mean:
$E(X) = \frac{\sum x}{n} = \frac{108 + 110 + 123 + 134 + 135 + 145 + 167 + 187 + 199}{9} = \frac{1308}{9} = 145.33$
The expected value of the weight of that patient is 145.33 pounds.
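This short Python check writes the expected value explicitly as the sum of x·(1/n). That shows why it reduces to the ordinary mean.

```python
from statistics import mean

weights = [108, 110, 123, 134, 135, 145, 167, 187, 199]  # pounds

# Each of the n patients has probability 1/n, so
# E(X) = sum(x * 1/n) = sum(x) / n, the ordinary mean.
expected = sum(w * (1 / len(weights)) for w in weights)
print(round(expected, 2), round(mean(weights), 2))  # 145.33 145.33
```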

Q9) Calculate skewness and kurtosis, and draw inferences on the following data:
a) Cars speed and distance: use Q9_a.csv
b) SP and Weight (WT): use Q9_b.csv
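No results are given for Q9, and the CSV files are not reproduced here. Below is a Python sketch of one common definition: the moment coefficient of skewness g1 and the excess kurtosis g2, both using n as the divisor. The function names are our own. Tools such as pandas `Series.skew()` and `Series.kurt()` apply small-sample corrections, so their values will differ slightly.

```python
def moments(values):
    """Second, third and fourth central moments (divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m2, m3, m4

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    m2, m3, _ = moments(values)
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    m2, _, m4 = moments(values)
    return m4 / m2 ** 2 - 3

# Usage (hypothetical): speed = [...]  # e.g. the speed column of Q9_a.csv
# print(skewness(speed), excess_kurtosis(speed))
```

A positive g1 points to a right (positive) skew. A positive g2 points to heavier tails than a normal distribution.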
Q10) Draw inferences about the following boxplot and histogram.

Ans (histogram):
- Most of the data lie on the left, at lower weights.
- The data are positively (right) skewed.
- Most chickens weigh between 50 and 100 g.
- Very few chickens weigh more than 300 g.
- Beyond the peak, the frequency decreases as weight increases.

Ans (boxplot):

The boxplot data are positively skewed.

The data contain outliers.

Q11) Suppose we want to estimate the average weight of an adult male in
Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our sample
weighs 200 pounds, and the standard deviation of the sample is 30 pounds.
Calculate the 94%, 98% and 96% confidence intervals.
Ans:
n = 2000
$\bar{x} = 200$
s = 30
Confidence interval estimate $= \bar{x} \pm z \frac{s}{\sqrt{n}} = 200 \pm z \frac{30}{\sqrt{2000}}$
94% confidence:
z = 1.880794
$200 \pm 1.88 \times \frac{30}{\sqrt{2000}} = 200 \pm 1.88 \times \frac{30}{44.72} = 200 \pm 1.88 \times 0.67084 = 200 \pm 1.2611$
= 198.74 to 201.26
The 94% confidence interval is 198.74 to 201.26 pounds.

98% confidence:
z = 2.326348
$200 \pm 2.33 \times \frac{30}{\sqrt{2000}} = 200 \pm 2.33 \times \frac{30}{44.72} = 200 \pm 2.33 \times 0.67084 = 200 \pm 1.5630$
= 198.44 to 201.56
The 98% confidence interval is 198.44 to 201.56 pounds.

96% confidence:
z = 2.053749
$200 \pm 2.05 \times \frac{30}{\sqrt{2000}} = 200 \pm 2.05 \times \frac{30}{44.72} = 200 \pm 2.05 \times 0.67084 = 200 \pm 1.3752$
= 198.62 to 201.38
The 96% confidence interval is 198.62 to 201.38 pounds.
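The three intervals can be reproduced in Python. As in the hand calculation, this assumes the sample is large enough to use z rather than t. `z_interval` is a hypothetical helper, and `NormalDist().inv_cdf` plays the role of R's `qnorm`. Because the code uses unrounded z values, the margins shift slightly, but the intervals rounded to two decimals are the same.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, level):
    """Large-sample CI: mean +/- z * sd / sqrt(n), z = two-sided critical value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. the 0.97 quantile for 94%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

for level in (0.94, 0.98, 0.96):
    lo, hi = z_interval(200, 30, 2000, level)
    print(f"{level:.0%}: {lo:.2f} to {hi:.2f}")
# 94%: 198.74 to 201.26
# 98%: 198.44 to 201.56
# 96%: 198.62 to 201.38
```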

Q12) Below are the scores obtained by a student in tests:

34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56
1) Find the mean, median, variance and standard deviation.
Ans:
Mean:

$\mu = \frac{\sum X_i}{N} = \frac{34 + 36 + 36 + 38 + 38 + 39 + 39 + 40 + 40 + 41 + 41 + 41 + 41 + 42 + 42 + 45 + 49 + 56}{18} = \frac{738}{18} = 41$
Mean = 41
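Median:

X = 34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56
There are 18 values (an even number), so the median is the average of the two middle (9th and 10th) values, 40 and 41.
$\text{Median} = \frac{40 + 41}{2} = \frac{81}{2} = 40.5$
Median = 40.5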

Variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, with $\bar{x} = 41$ and n - 1 = 18 - 1 = 17

x       x - 41    (x - 41)^2
34      -7        49
36      -5        25
36      -5        25
38      -3        9
38      -3        9
39      -2        4
39      -2        4
40      -1        1
40      -1        1
41      0         0
41      0         0
41      0         0
41      0         0
42      1         1
42      1         1
45      4         16
49      8         64
56      15        225
Total             434

$s^2 = \frac{434}{17} = 25.53$
Variance = 25.53

Standard Deviation:
$s = \sqrt{s^2} = \sqrt{25.53} = 5.05$
Standard deviation = 5.05
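The hand calculations can be checked with Python's `statistics` module. Its `variance` and `stdev` use the n - 1 divisor, as in the table above.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

print(statistics.mean(scores))                # 41
print(statistics.median(scores))              # 40.5
print(round(statistics.variance(scores), 2))  # 25.53 (divisor n - 1 = 17)
print(round(statistics.stdev(scores), 2))     # 5.05
```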

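2) What can we say about the student's marks?


Ans:
- The mean (41) is slightly greater than the median (40.5), so the marks are slightly skewed to the right rather than perfectly symmetric.
- Most of the marks (16 of 18) lie between 34 and 45.
- Using R's default quartiles (Q1 = 38.25, Q3 = 41.75, IQR = 3.5), the upper fence is 41.75 + 1.5 × 3.5 = 47, so the marks 49 and 56 can be treated as outliers; 56 is the most extreme.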
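The outlier remark can be checked with the 1.5 × IQR rule. Below is a Python sketch of that check. Quartile conventions vary between tools. Here `method="inclusive"` interpolates the same way as R's default `quantile()` (type 7), and with other common conventions 49 and 56 still fall above the upper fence.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

# Quartiles by linear interpolation between order statistics
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower or x > upper]
print(q1, q3, iqr, lower, upper, outliers)  # 38.25 41.75 3.5 33.0 47.0 [49, 56]
```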

Q13) What is the nature of skewness when the mean and median of the data are equal?
Ans:
Mean = Median.
In this case there is no skew: skewness = 0.
The data are symmetric about the centre, as in a bell-shaped (normal) curve.
Q14) What is the nature of skewness when mean > median?
Ans:
Mean > Median.
Skewness = Positive.
Most of the data are concentrated on the left, with a long tail to the right.
This is called positively skewed or right-skewed data.
Q15) What is the nature of skewness when median > mean?
Ans:
Median > Mean.
Skewness = Negative.
Most of the data are concentrated on the right, with a long tail to the left.
This is called negatively skewed or left-skewed data.
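Q16) What does a positive kurtosis value indicate for the data?
Ans:
Positive (excess) kurtosis means the distribution is leptokurtic. It has heavier tails and usually a sharper, narrower central peak than a normal distribution, so extreme values are more likely. Kurtosis is scale-free, so it does not by itself indicate a smaller variance.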
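Q17) What does a negative kurtosis value indicate for the data?
Ans:
Negative (excess) kurtosis means the distribution is platykurtic. It has lighter tails and a flatter, wider central peak than a normal distribution. This describes shape, not a larger variance.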
Q18) Answer the questions below using the boxplot visualization.

What can we say about the distribution of the data?


Ans:
- The data are not symmetric.
- The data are more concentrated towards the right side (higher values).
What is the nature of skewness of the data?
Ans:
- In the given boxplot, the left whisker is longer than the right whisker.
- Skewness = Negative.
- So the data are left-skewed (negatively skewed).

What will be the IQR of the data (approximately)?


Solution:
Q1 = 10
Q3 = 18
Interquartile range (IQR) = Q3 - Q1 = 18 - 10 = 8
The IQR of the data is approximately 8.

Q19) Comment on the boxplot visualizations below.

Draw an inference from the distribution of data for Boxplot 1 with respect to
Boxplot 2.
Solution:
- Both boxplots have approximately the same median, about 262.5.
- In each boxplot the two whiskers are about equal in length, so both distributions are roughly symmetric.
- Both are consistent with a normal distribution (symmetric, with no visible skew).
- Neither dataset shows any outliers.
- Boxplot 2 has the larger range, as shown by the distance between the ends of its two whiskers.
- The box of Boxplot 2 is more than twice as long as the box of Boxplot 1.
- Boxplot 1 therefore has less variability (smaller variance, standard deviation and IQR) than Boxplot 2.
Q 20) Calculate probabilities from the given dataset for the cases below.
Data_set: Cars.csv
Calculate the probability of MPG of Cars for the cases below.
MPG <- Cars$MPG
a. P(MPG>38)
b. P(MPG<40)
c. P(20<MPG<50)
Ans: (re do)
Assuming MPG is normally distributed, with mean and standard deviation estimated from the sample:
> mean(Cars$MPG)
[1] 34.42208
> sd(Cars$MPG)
[1] 9.131445

P(MPG>38):
pnorm() returns the lower-tail probability P(MPG < x), as a Z-table does, so P(MPG>38) = 1 - P(MPG<38).
> pnorm(38,34.42,9.13)
[1] 0.652513
P(MPG>38) = 1 - 0.652513 = 0.347487 ≈ 0.35

P(MPG<40):
> pnorm(40,34.42,9.13)
[1] 0.7294571

P(20<MPG<50):
> pnorm(50,34.42,9.13)-pnorm(20,34.42,9.13)
[1] 0.8989178
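Here are the same normal-model probabilities in Python, where `NormalDist.cdf` corresponds to R's `pnorm`. Like the R answer, this assumes MPG is approximately normal with the rounded mean 34.42 and SD 9.13. It does not read Cars.csv.

```python
from statistics import NormalDist

# Normal model with the sample mean and SD of Cars$MPG reported above
mpg = NormalDist(mu=34.42, sigma=9.13)

p_gt_38 = 1 - mpg.cdf(38)             # a. P(MPG > 38)
p_lt_40 = mpg.cdf(40)                 # b. P(MPG < 40)
p_20_50 = mpg.cdf(50) - mpg.cdf(20)   # c. P(20 < MPG < 50)
print(round(p_gt_38, 4), round(p_lt_40, 4), round(p_20_50, 4))  # 0.3475 0.7295 0.8989
```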

Q 21) Check whether the data follow a normal distribution (re do)

a) Check whether the MPG of Cars follows a Normal Distribution


Dataset: Cars.csv
Ans:
Yes, the MPG of the cars follows a Normal Distribution.
b) Check whether the Adipose Tissue (AT) and Waist
Circumference (Waist) from the wc-at data set follow a Normal
Distribution
Dataset: wc-at.csv
Ans:
Waist and Adipose Tissue do not follow a Normal Distribution.
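The answers above do not show how normality was checked. One informal check is how straight the normal Q-Q plot is, measured by the correlation between the sorted data and standard-normal quantiles. Below is a Python sketch of that idea. `qq_correlation` is our own name, and there is no universal cut-off for the result. In R, `qqnorm()`/`qqline()` and `shapiro.test()` are the usual tools.

```python
import math
from statistics import NormalDist, fmean

def qq_correlation(values):
    """Pearson correlation between sorted data and standard-normal quantiles.

    Values very close to 1 mean the normal Q-Q plot is nearly a straight line.
    """
    xs = sorted(values)
    n = len(xs)
    # Plotting positions (i - 0.5) / n, one common choice for Q-Q plots
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)

# Usage (hypothetical): mpg = [...]  # MPG column from Cars.csv
# print(qq_correlation(mpg))
```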
Q 22) Calculate the Z scores for a 90% confidence interval, 94% confidence
interval and 60% confidence interval.
Ans:
For a two-sided interval at confidence level C, z = qnorm(1 - (1 - C)/2):
90%: > qnorm(0.95)
[1] 1.644854
94%: > qnorm(0.97)
[1] 1.880794
60%: > qnorm(0.8)
[1] 0.8416212
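In Python, `NormalDist().inv_cdf` gives the same quantiles as `qnorm`. `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical z for a confidence level, like R's qnorm(1 - alpha/2)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.94, 0.60):
    print(f"{level:.0%}: {z_critical(level):.6f}")
# 90%: 1.644854
# 94%: 1.880794
# 60%: 0.841621
```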

Q 23) Calculate the t scores for a 95% confidence interval, 96% confidence
interval and 99% confidence interval for a sample size of 25.
Ans:
Degrees of freedom = n - 1 = 24, and t = qt(1 - (1 - C)/2, 24):
95%: > qt(0.975,24)
[1] 2.063899
96%: > qt(0.98,24)
[1] 2.171545
99%: > qt(0.995,24)
[1] 2.79694
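Python's standard library has no t distribution. This sketch therefore computes the t CDF by numerically integrating the density with Simpson's rule, then inverts the CDF by bisection. It is an illustration only: in practice R's `qt` (or `scipy.stats.t.ppf`) is the tool to use. `t_pdf`, `t_cdf` and `t_ppf` are our own helper names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps even)."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df, lo=-50.0, hi=50.0, tol=1e-10):
    """Quantile of the t distribution: invert t_cdf by bisection (0 < p < 1)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 25
for level in (0.95, 0.96, 0.99):
    p = 1 - (1 - level) / 2  # upper quantile, e.g. 0.975 for 95%
    print(f"{level:.0%}: {t_ppf(p, n - 1):.6f}")
```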

Q 24) A Government company claims that an average light bulb lasts 270 days.
A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an
average of 260 days, with a standard deviation of 90 days. If the CEO's claim
were true, what is the probability that 18 randomly selected bulbs would have
an average life of no more than 260 days?
Hint:
R code: pt(tscore, df)
df: degrees of freedom
Ans:
µ = 270
$\bar{x} = 260$
s = 90
n = 18
df = n - 1 = 18 - 1 = 17
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{260 - 270}{90/\sqrt{18}} = \frac{-10}{21.21} = -0.47$
R code: pt(tscore, df)
> pt(-0.47,17)
[1] 0.3221639

Required probability ≈ 0.32 = 32%
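This Python check follows the same steps. It integrates the t density numerically with Simpson's rule, our own stand-in for R's `pt`. It uses the unrounded t-score (about -0.4714), so the probability comes out slightly below the 0.3222 from `pt(-0.47, 17)`, still about 0.32.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps even)."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu, xbar, s, n = 270, 260, 90, 18
t_stat = (xbar - mu) / (s / math.sqrt(n))  # t-score with df = n - 1 = 17
p = t_cdf(t_stat, n - 1)                   # P(sample mean <= 260) if mu = 270
print(round(t_stat, 4), round(p, 2))       # -0.4714 0.32
```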
