GROUP ASSIGNMENT COVER SHEET
STUDENT DETAILS
Student name: Huỳnh Dương Phương Anh Student ID number: 31221023804
Student name: Vũ Hoàng Khánh Linh Student ID number: 31221023321
Student name: Trần Đình Quân Student ID number: 31221026975
Student name: Hoàng Diễm Quỳnh Student ID number: 31221023102
UNIT AND TUTORIAL DETAILS
Unit name: Statistics for Business Unit number: MAT102
Tutorial/Lecture: Lecture Class day and time: Friday 12:00 p.m
Lecturer or Tutor name: Mr. Tran Minh Hoang
ASSIGNMENT DETAILS
Title: Group assignment
Length: Due date: 05/01/2023 Date submitted: 05/01/2023
DECLARATION
I hold a copy of this assignment if the original is lost or damaged.
I hereby certify that no part of this assignment or product has been copied from any other student’s work or
from any other source except where due acknowledgement is made in the assignment.
I hereby certify that no part of this assignment or product has been submitted by me in another
(previous or current) assessment, except where appropriately referenced, and with prior permission
from the Lecturer / Tutor / Unit Coordinator for this unit.
No part of the assignment/product has been written/ produced for me by any other person except
where collaboration has been authorised by the Lecturer / Tutor /Unit Coordinator concerned.
I am aware that this work may be reproduced and submitted to plagiarism detection software programs for
the purpose of detecting possible plagiarism (which may retain a copy on its database for future
plagiarism checking).
Student’s signature: Huynh Duong Phuong Anh
Student’s signature: Vu Hoang Khanh Linh
Student’s signature: Tran Dinh Quan
Student’s signature: Hoang Diem Quynh
Note: An examiner or lecturer / tutor has the right to not mark this assignment if the above declaration has not
been signed.
CONTRIBUTION
Member                    Problem            Data analysis
Vũ Hoàng Khánh Linh       Problems 1 & 2     Section d
Hoàng Diễm Quỳnh          Problems 3 & 4a    Section a; Section b (Diagram)
Huỳnh Dương Phương Anh    Problems 4b & 4c   Section c
Trần Đình Quân            Problem 5          Section b (Interpretation)
Note: The workload was assigned equally and fairly to each member from the
start. All members worked out their solutions and then discussed them together
to adjust and complete the answers. All members were active, dedicated, and
punctual in carrying out this group assignment.
1. Problem Solving
Problem 1
We use the law of total probability and conditional probability for this problem.
The probability that the man leaves his umbrella in any shop he visits is 1/5.
He can leave it in the second shop only if he did not leave it in the first, so:
P(left in the first shop) = 1/5
P(left in the second shop) = (1 - 1/5) × 1/5 = 4/5 × 1/5 = 4/25
The given condition is that he left his umbrella in one of the two shops:
P(left in one of the two shops) = 1/5 + 4/25 = 9/25
⇒ P(left in the second shop | left in one of the two shops) = (4/25) / (9/25) = 4/9
Therefore, the probability that the man left his umbrella in the second shop is
4/9, or about 0.444.
Problem 2
Assume the coin is flipped 100 times, with each of the two coins used for half
of the flips:
                                 Tails   Heads   Total
Normal fair coin                 25      25      50
Coin with heads on both sides    0       50      50
Total                            25      75      100
P(normal fair coin | result comes up heads) = 25/75 = 1/3
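The table count can also be checked with Bayes' rule directly. The sketch below assumes, as the 50/50 split in the table does, that each coin is equally likely to be the one flipped; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: each coin is equally likely to be picked (the 50/50 split in the table).
prior_fair = Fraction(1, 2)
prior_two_headed = Fraction(1, 2)

# Probability of heads with each coin.
p_heads_given_fair = Fraction(1, 2)
p_heads_given_two_headed = Fraction(1)

# Law of total probability for P(heads), then Bayes' rule for P(fair | heads).
p_heads = (prior_fair * p_heads_given_fair
           + prior_two_headed * p_heads_given_two_headed)
posterior_fair = prior_fair * p_heads_given_fair / p_heads

print(p_heads)         # 3/4
print(posterior_fair)  # 1/3
```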
Problem 3
a) The probabilities must sum to 1, as is true for any probability
distribution. That means we have:
⇒ 0.1 + p + q + 0.2 = 1 ⇒ p + q = 0.7 (1)
Given that E(X) = 1.5 and $E(X) = \sum_{i=1}^{N} x_i P(x_i)$:
⇒ 0×0.1 + 1×p + 2×q + 3×0.2 = 1.5
⇒ p + 2q + 0.6 = 1.5 ⇒ p + 2q = 0.9 (2)
From (1) and (2), we have:
p + q = 0.7
p + 2q = 0.9
Solving this set of equations (subtracting (1) from (2) gives q = 0.2), we find
that: p = 0.5 and q = 0.2
b) With μ = E(X) = 1.5:
$Var(X) = \sum_{i=1}^{N} [x_i - \mu]^2 P(x_i)$
= (0 - 1.5)² × 0.1 + (1 - 1.5)² × 0.5 + (2 - 1.5)² × 0.2 + (3 - 1.5)² × 0.2
= 0.85
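As a check on parts a) and b), the two conditions can be solved and the variance recomputed numerically. This is a minimal sketch, assuming X takes the values 0, 1, 2, 3 with P(X=0) = 0.1 and P(X=3) = 0.2 as used above; the variable names are illustrative only.

```python
# Known quantities from the problem: P(X=0), P(X=3) and the mean E(X).
p0, p3 = 0.1, 0.2
mean_target = 1.5

# Equation (1): p + q  = 1 - p0 - p3
# Equation (2): p + 2q = E(X) - 0*p0 - 3*p3
rhs1 = 1 - p0 - p3
rhs2 = mean_target - 3 * p3
q = rhs2 - rhs1  # subtract (1) from (2)
p = rhs1 - q

# Recompute the mean and the variance from the full distribution.
xs = [0, 1, 2, 3]
probs = [p0, p, q, p3]
mean = sum(x * pr for x, pr in zip(xs, probs))
variance = sum((x - mean) ** 2 * pr for x, pr in zip(xs, probs))

print(round(p, 4), round(q, 4), round(mean, 4), round(variance, 4))
```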
Problem 4
a) The probability that the demand is exactly 2 fans in any one week is:
$P(X = 2) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{3.2^2 e^{-3.2}}{2!} = 0.2087$
b) The probability that the demand for fans in that week cannot be satisfied is:
P(X>4) = 1 - P(X≤4)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]
= 1 - (0.0408 + 0.1304 + 0.2087 + 0.2226 + 0.1781)
= 0.2194
c) To find the least value of n for which the probability of his not being able
to satisfy the demand for fans in that week is less than 0.05, we test
successive values:
P(X>5) = 1 - P(X≤5)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)]
= 1 - (0.0408 + 0.1304 + 0.2087 + 0.2226 + 0.1781 + 0.1140)
= 0.1054 (>0.05, unacceptable)
P(X>6) = 1 - P(X≤6)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6)]
= 1 - (0.0408 + 0.1304 + 0.2087 + 0.2226 + 0.1781 + 0.1140 + 0.0608)
= 0.0446 (<0.05, acceptable)
⇒ n = 6
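The Poisson calculations in parts a) to c) can be reproduced with a short script. This is only a sketch, assuming weekly demand follows a Poisson distribution with λ = 3.2 as above; the helper names are illustrative. Small differences in the last decimal place compared with the hand calculation can come from rounding the individual terms.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_tail(n: int, lam: float) -> float:
    """P(X > n), i.e. the probability that demand exceeds a stock of n."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))

lam = 3.2

p_exactly_2 = poisson_pmf(2, lam)     # part a)
p_more_than_4 = poisson_tail(4, lam)  # part b)

# Part c): smallest n with P(X > n) < 0.05.
least_n = 0
while poisson_tail(least_n, lam) >= 0.05:
    least_n += 1

print(round(p_exactly_2, 4), round(p_more_than_4, 4), least_n)
```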
Problem 5
a)
X = length of the bar (cm)
We are given that:
P(X < 20.02) = 12%; P(X > 20.06) = 33%
Because X is normally distributed, we standardise with z = (x - μ)/σ, so:
P(z < a) = 12%; P(z < b) = 67% (since P(z > b) = 33%)
⇔ a = -1.18; b = 0.44
(according to Appendix C-2 of the textbook)
Here we have:
(20.02 - μ)/σ = -1.18
(20.06 - μ)/σ = 0.44
Solving this set of equations (subtracting the first from the second gives
0.04/σ = 1.62), we find that:
μ = Mean = 20.0491358
σ = Standard deviation = 2/81 ≈ 0.0247
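The two standardised equations form a linear system in μ and σ, which can be solved in a few lines. The sketch below takes the table values a = -1.18 and b = 0.44 as given; the variable names are illustrative only.

```python
# z-values read from the normal table (Appendix C-2), as quoted above.
a, b = -1.18, 0.44
x_low, x_high = 20.02, 20.06

# (x_low - mu) / sigma = a  and  (x_high - mu) / sigma = b
# Subtracting the first equation from the second: (x_high - x_low) / sigma = b - a
sigma = (x_high - x_low) / (b - a)
mu = x_low - a * sigma

print(round(mu, 7), round(sigma, 4))
```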
b)
Z-score formula: $z = \frac{x - \mu}{\sigma}$
In the case of x = 20.03, we have z = (20.03 - 20.0491358)/(2/81) ≈ -0.775
According to Appendix C-2 of the textbook, P(z < -0.77) = 22.06%, so the
proportion of steel bars which measure 20.03 cm or more is approximately
100% - 22.06% = 77.94%
c)
P(reject) = P(X<20.02) + P(X>20.08)
= P(z<-1.18) + P(z>1.25), where 1.25 = (20.08 - 20.0491358)/(2/81)
= P(z<-1.18) + (1 - P(z<1.25))
= 11.9% + (100% - 89.44%) (according to Appendix C-2 of the textbook)
= 22.46%
Conclusion: The percentage of bars rejected as being outside the
acceptable range is 22.46%
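Parts b) and c) can be cross-checked without a printed table by using the normal CDF from Python's standard library (`statistics.NormalDist`, available from Python 3.8). This is a sketch that takes μ = 20.0491358 and σ = 2/81 from part a) as given; because it does not round z to two decimals, the results can differ slightly from the table-based figures.

```python
from statistics import NormalDist

# Parameters found in part a) (taken as given here).
mu = 20.0491358
sigma = 2 / 81
length = NormalDist(mu, sigma)

# Part b): proportion of bars measuring 20.03 cm or more.
p_at_least_2003 = 1 - length.cdf(20.03)

# Part c): proportion rejected, i.e. shorter than 20.02 cm or longer than 20.08 cm.
p_reject = length.cdf(20.02) + (1 - length.cdf(20.08))

print(round(p_at_least_2003, 4), round(p_reject, 4))
```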
2. Data Analysis
a)
Descriptive statistics of the given data:
MEASURES OF CENTER               Women     Men       Both genders
Size                             500       500       1000
Mean                             163.25    176.46    169.86
Mode                             163.00    177.00    177.00
Median                           163.00    177.00    170.00
Min                              146.00    154.00    146.00
Max                              180.00    195.00    195.00
Mid Range                        163.00    174.50    170.50
MEASURES OF VARIABILITY          Women     Men       Both genders
Range                            34.00     41.00     49.00
Variance                         36.00     34.12     78.64
Standard deviation               6.00      5.84      8.87
Coefficient of Variation (CV)    3.68%     3.31%     5.22%
Q1                               159.00    172.00    163.00
Median                           163.00    177.00    170.00
Q3                               167.00    190.00    177.00
Interquartile range              8.00      18.00     14.00
Mean absolute deviation          4.81      4.63      7.41
Kurtosis                         -0.08     0.42      -0.70
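Several entries in the tables are derived from others (range, mid-range, interquartile range, and coefficient of variation), so they can be recomputed as a consistency check. The sketch below uses only the summary figures reported in the tables, not the raw data, and the dictionary layout is an illustrative choice.

```python
# Summary figures copied from the tables above (heights in cm).
summary = {
    "Women": {"mean": 163.25, "sd": 6.00, "min": 146.0, "max": 180.0, "q1": 159.0, "q3": 167.0},
    "Men": {"mean": 176.46, "sd": 5.84, "min": 154.0, "max": 195.0, "q1": 172.0, "q3": 190.0},
    "Both": {"mean": 169.86, "sd": 8.87, "min": 146.0, "max": 195.0, "q1": 163.0, "q3": 177.0},
}

derived = {}
for group, s in summary.items():
    derived[group] = {
        "range": s["max"] - s["min"],
        "mid_range": (s["max"] + s["min"]) / 2,
        "iqr": s["q3"] - s["q1"],
        "cv_percent": round(100 * s["sd"] / s["mean"], 2),
    }

for group, d in derived.items():
    print(group, d)
```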
And the histograms for each gender and for the whole data set:
b) The data represent the heights of 500 American men and 500 American
women. To compare and contrast the distributions of the two groups, we may
either use the descriptive statistics or interpret some suitable diagrams.
- From the descriptive statistics:
● The distributions of both data sets (men and women) are roughly
symmetric, because the mean, mode, and median are approximately equal.
However, that of men is slightly left-skewed (the mean is less than the
median) and that of women is slightly right-skewed (the mean is greater
than the median).
● The distribution of women's heights may contain few outliers because it
shows less variability in range and interquartile range (its variance
and standard deviation, however, are slightly larger than those of men).
● The interquartile range is an appropriate measure of the spread of a
distribution; here, the two IQRs are far apart (8.00 for women and
18.00 for men), which suggests that the distributions of the heights of
American women and men are quite different.
- From some specific diagrams:
The histogram:
● Both have a rough bell shape, so the heights spread approximately equally
to both sides of the centre.
● The distribution of women's heights appears to have fatter tails.
● On the left side, the men's heights have some strange outliers that are
isolated from the rest of the group. Similarly, the women's heights also
have such outliers on the left side.
The CDF: Since the blue line is above the red line, the distribution of
women's heights has fatter tails. Normally, fatter tails mean that extreme
events are more probable than under a normal distribution. In this case, the
measures of variability and the histogram indicate that the distribution of
men's heights contains more outliers than that of women's heights.
Therefore, extreme values of women's heights are likely to be more frequent
than those of men's heights, but they fall into the unusual group rather than
the outlier group.
The dotplot: It is similar to the histogram in showing the spread, but it also
reveals something interesting: there are “strange outliers” on the right side
of the distribution of men's heights as well. The dot plot also shows the
shape of the distribution more clearly than the histogram, because the sample
is large enough, and that shape is quite irregular in both groups. It is
neither clearly bell-shaped nor normally distributed: many observations do not
follow the pattern, and some clusters look as if they came from another
population. Because we are not told that the 500 men and 500 women were
randomly chosen, one possible explanation is that each group contains several
smaller subgroups with slightly different heights (this does not seem to be
due to sampling error, because the irregularities are pronounced). For
example, perhaps the 500 women were drawn from different regions of the US
with different average heights (and unequal sample sizes per region), or
perhaps the 500 men include a group of adolescents whose average height is
lower than that of the population.
c) We use a two-tailed test. The null hypothesis corresponds to the statement
being tested.
H0: μ = 178 cm (the average height of American men is 178 cm)
H1: μ ≠ 178 cm (the average height of American men is not 178 cm)
For α = 0.05 and d.f. = n - 1 = 500 - 1 = 499 degrees of freedom, the
two-tailed critical value is 1.96. It can be obtained from either Appendix C-2
or Appendix D: when the sample size is greater than 30, the z-score can
normally be used instead of the t-score even if σ is unknown, and with a sample
size of 500 we can be confident that either the z-score or the t-score test
gives an accurate answer.
We will reject H0 if t_calc > 1.96 or if t_calc < -1.96, as illustrated in the figure.
Calculate the test statistic, using the sample mean and standard deviation of
men's heights from part a):
$t_{calc} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{176.46 - 178}{5.84/\sqrt{500}} \approx -5.90$
Since the test statistic falls in the left tail of the rejection region, we
reject the null hypothesis H0: μ = 178 cm and conclude H1: μ ≠ 178 cm at the 5
percent level of significance.
Conclusion: There is enough evidence that the average height of American
men is not 178 cm at the 5% level of significance.
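The one-sample test in part c) can be recomputed from the summary statistics reported in part a). This is a sketch under that assumption: it uses the rounded sample mean (176.46) and standard deviation (5.84) for men rather than the raw data, and the variable names are illustrative only.

```python
import math

# Summary statistics for men's heights, taken from the table in part a).
n = 500
sample_mean = 176.46
sample_sd = 5.84
mu_0 = 178.0           # hypothesised mean under H0
critical_value = 1.96  # two-tailed, alpha = 0.05, large-sample value

# t statistic for a one-sample test of the mean.
standard_error = sample_sd / math.sqrt(n)
t_calc = (sample_mean - mu_0) / standard_error

reject_h0 = abs(t_calc) > critical_value
print(round(t_calc, 2), reject_h0)
```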
d) We will do a two-tailed test at α = 0.05, where μ1 is the mean height of men
and μ2 is the mean height of women. The hypotheses are:
H0: μ1 – μ2 = 14 cm (on average, American men are 14 cm taller than women)
H1: μ1 – μ2 ≠ 14 cm (the statement is false)
Because the population variances are unknown and the sample standard
deviations differ (6.00 for women's heights and 5.84 for men's heights),
we will assume that the population variances are unequal. Therefore, we will
apply the formula for the case “unknown variances, assumed unequal”.
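We will have the t statistic, using the sample means and variances from part a)
(1 = men, 2 = women):
$t_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - 14}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(176.46 - 163.25) - 14}{\sqrt{34.12/500 + 36.00/500}} \approx -2.11$
If we use the quick rule for degrees of freedom, we get d.f. =
min(n1 – 1, n2 – 1) = 499 and t.025 = 1.96. Since -2.11 < -1.96, the t
statistic falls in the left tail of the rejection region.
Conclusion: There is enough evidence that, on average, American men are
not 14 cm taller than women at the 5% level of significance.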
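The two-sample statistic in part d) can likewise be recomputed from the summary table. The sketch below uses the rounded sample means and variances from part a) rather than the raw data. It also evaluates the Welch-Satterthwaite degrees of freedom as an alternative to the quick rule; that extra step is an addition for comparison, not part of the solution above.

```python
import math

# Summary statistics from part a): group 1 = men, group 2 = women.
n1, mean1, var1 = 500, 176.46, 34.12
n2, mean2, var2 = 500, 163.25, 36.00
hypothesised_diff = 14.0
critical_value = 1.96  # two-tailed, alpha = 0.05, large-sample value

# t statistic for unknown variances, assumed unequal.
v1, v2 = var1 / n1, var2 / n2
t_calc = ((mean1 - mean2) - hypothesised_diff) / math.sqrt(v1 + v2)

# Degrees of freedom: quick rule versus Welch-Satterthwaite approximation.
df_quick = min(n1 - 1, n2 - 1)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

reject_h0 = abs(t_calc) > critical_value
print(round(t_calc, 2), df_quick, round(df_welch, 1), reject_h0)
```

With samples this large, both choices of degrees of freedom lead to essentially the same critical value, so the decision does not depend on which rule is used.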