QA All Solved Question Paper
May 2023
1. Answer the following. Marks 20
Q.1. Define Statistics and list the limitations of statistics. (Marks 5)
1) Statistics is defined as the collection, compilation, analysis and interpretation of numerical
data.
2) Statistics is the science of data.
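Q.4. Show that the sample variance ($S^2$) is an unbiased estimator of the population variance ($\sigma^2$).
Also illustrate with an example. (Marks 5)
Let $X_1, X_2, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then
$E(X_i^2) = \sigma^2 + \mu^2$
$E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$
Since $\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$,
$E\left[\sum (X_i - \bar{X})^2\right] = \sum E(X_i^2) - nE(\bar{X}^2)$
$= \sum (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)$
$= n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2$
$= n\sigma^2 - \sigma^2$
$= (n - 1)\sigma^2$
Let's prove $E(S^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n - 1}\right] = \sigma^2$:
$E(S^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n - 1}\right]$
$= \frac{1}{n - 1} E\left[\sum (X_i - \bar{X})^2\right]$
$= \frac{1}{n - 1} (n - 1)\sigma^2$
$= \sigma^2$
∴ Hence proved.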
Example: Suppose we have a population of 5 individuals with ages: 20, 35, 45, 50, 55
Take the first 3 values as the sample: 20, 35, 45.
∴ $S^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2$
∴ $n = 3$, $\bar{X} = \frac{\sum X}{n} = \frac{100}{3} = 33.33$
∴ $S^2 = \frac{(20 - 33.33)^2 + (35 - 33.33)^2 + (45 - 33.33)^2}{3 - 1}$
$= \frac{177.78 + 2.78 + 136.11}{2}$
$= \frac{316.67}{2}$
∴ $S^2 = 158.33$
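The unbiasedness claim can be checked numerically on this small population. The sketch below is only an illustration and assumes samples of size 3 are drawn with replacement, so that every ordered sample is equally likely; the function names are made up for this example. It averages $S^2$ over all such samples and compares the result with the population variance.

```python
from fractions import Fraction
from itertools import product

population = [20, 35, 45, 50, 55]


def sample_variance(xs):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Population variance with divisor N, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


# S^2 for the sample used in the worked example (first three ages)
s2_first_three = sample_variance(population[:3])

# Every ordered sample of size 3 drawn with replacement (assumption of this sketch)
all_samples = list(product(population, repeat=3))
mean_of_s2 = sum(sample_variance(s) for s in all_samples) / len(all_samples)

print(float(s2_first_three))
print(mean_of_s2, population_variance(population))
```

A single sample gives a value that differs from $\sigma^2$; it is the average over all possible samples that matches it, which is what unbiasedness means.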
$= \sqrt{\frac{1}{2} \times \frac{3}{2}}$
$= \sqrt{\frac{3}{4}}$
$= 0.8660$
∴ $r = 0.8660$
Q.2. The frequency distribution of scores obtained by 250 candidates in an entrance test is as
follows. Draw a less than and a more than frequency curve (ogive) to represent the given
data. Also, what is the significance of the point of intersection of the two ogive curves?
(Marks 10)
Scores Number of candidates
400 – 450 25
450 – 500 30
500 – 550 45
550 – 600 37
600 – 650 30
650 – 700 33
700 – 750 15
750 – 800 35
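Curves:
∴ The two ogives intersect at the median of the distribution, the score below which (and above
which) half of the 250 candidates fall. Read from the graph, the median is approximately 580;
interpolation in the median class 550 – 600 gives 550 + ((125 − 100) / 37) × 50 ≈ 583.8.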
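The points plotted for the two ogives, and the median at which they cross, can be tabulated with a short script. This is a minimal sketch, not part of the original solution; the variable and function names are illustrative, and the median uses the usual linear interpolation inside the median class.

```python
lower = [400, 450, 500, 550, 600, 650, 700, 750]  # lower class boundaries
freq = [25, 30, 45, 37, 30, 33, 15, 35]           # number of candidates
width = 50
total = sum(freq)

# Less-than ogive: (upper class boundary, candidates scoring below it)
less_than = []
running = 0
for lo, f in zip(lower, freq):
    running += f
    less_than.append((lo + width, running))

# More-than ogive: (lower class boundary, candidates scoring at or above it)
more_than = []
remaining = total
for lo, f in zip(lower, freq):
    more_than.append((lo, remaining))
    remaining -= f


def median_by_interpolation():
    """Locate the class containing the N/2-th candidate and interpolate in it."""
    half = total / 2
    below = 0
    for lo, f in zip(lower, freq):
        if below + f >= half:
            return lo + (half - below) / f * width
        below += f


median = median_by_interpolation()
print(less_than)
print(more_than)
print(round(median, 2))
```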
3. Solve the Following: Marks 20
Q.1. The following table gives the age of cars of a certain make and annual maintenance
costs. Obtain the regression equation for maintenance costs, taking age of the car as the
independent variable. Also, find the maintenance cost for age of the car = 5 years.
Age of Cars Maintenance Cost
(in Years) (In thousands of rupees)
2 10
4 20
6 25
8 30
(Marks 10)
𝑿 𝒀 𝑿𝟐 𝑿𝒀
2 10 4 20
4 20 16 80
6 25 36 150
8 30 64 240
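$\sum X = 20$, $\sum Y = 85$, $\sum X^2 = 120$, $\sum XY = 490$
$n = 4$, $\bar{X} = \frac{20}{4} = 5$, $\bar{Y} = \frac{85}{4} = 21.25$
∴ $b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{490 - 4 \times 5 \times 21.25}{120 - 4 \times 5^2}$
$= \frac{65}{20}$
$= 3.25$
∴ $a = \bar{Y} - b\bar{X} = 21.25 - 3.25 \times 5 = 5$
∴ $y = a + bx = 5 + 3.25x$
Given $x = 5$: $y = 5 + 3.25 \times 5 = 21.25$
∴ The maintenance cost for a 5-year-old car is 21.25 thousand rupees, i.e. Rs. 21,250.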
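The least-squares line for this table can be reproduced in a few lines. The sketch below uses the standard formulas $b = S_{xy}/S_{xx}$ and $a = \bar{Y} - b\bar{X}$; the helper name `fit_line` is only illustrative.

```python
def fit_line(xs, ys):
    """Return intercept a and slope b of the least-squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


age = [2, 4, 6, 8]        # age of car in years
cost = [10, 20, 25, 30]   # maintenance cost in thousands of rupees

a, b = fit_line(age, cost)
cost_at_5 = a + b * 5     # still in thousands of rupees
print(a, b, cost_at_5)
```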
Q.2. Explain with illustration the concept of Point Estimation. (Marks 10)
                      Sample                        Population
Size                  $n$                           $N$
Mean                  $\bar{X}$                     $\mu$
Standard deviation    S.D.                          $\sigma$
Variance              $S^2$ or $s^2$                $\sigma^2$
                      (capital S for biased,
                      small s for unbiased)
Proportion            $p$                           $\pi$
Standard error        $S_{\bar{X}} = S/\sqrt{n}$    −
1) Point estimators are functions of the sample data that are used to find a single approximate
value for an unknown population parameter.
2) In point estimation, we find the statistic which may be used to replace an unknown
parameter for all practical purposes.
3) A good estimator is one which is as close to the true value of the parameter as possible.
4) The sample data of a population is used to find a point estimate, that is, a statistic that can
act as the best estimate of an unknown population parameter.
5) The maximum likelihood method is a popularly used way to calculate point estimators.
This method uses differential calculus to find the parameter value that maximises the
likelihood of the observed sample.
6) Following are the four properties of a good point estimator:
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency
i) The normal equations are:
$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$ …………………… eq(1)
$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$ …………………… eq(2)
$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$ …………………… eq(3)
Y      X1     X2     X1Y     X2Y     X1²     X2²     X1X2
12     17     10     204     120     289     100     170
15     15     6      225     90      225     36      90
14     15     10     210     140     225     100     150
19     10     21     190     399     100     441     210
8      13     8      104     64      169     64      104
16     15     13     240     208     225     169     195
15     11     9      165     135     121     81      99
25     6      25     150     625     36      625     150
10     15     10     150     100     225     100     150
11     7      8      77      88      49      64      56
ΣY     ΣX1    ΣX2    ΣX1Y    ΣX2Y    ΣX1²    ΣX2²    ΣX1X2
= 145  = 124  = 120  = 1715  = 1969  = 1664  = 1780  = 1374
145 = 10a + 124b1 + 120b2 …………………… eq(1)
1715 = 124a + 1664b1 + 1374b2 …………………… eq(2)
1969 = 120a + 1374b1 + 1780b2 …………………… eq(3)
Solving the three equations:
∴ a = 7.5757
∴ b1 = −0.0705
∴ b2 = 0.6499
$\hat{Y} = 7.58 - 0.07X_1 + 0.65X_2$
ii) $R^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y})^2}{\sum (Y_i - \bar{Y})^2}$
$\bar{Y} = \frac{\sum Y}{n} = \frac{145}{10} = 14.5$
(Fitted values use the unrounded coefficients; entries are rounded to two decimals.)
Yi     Yi − Ȳ    Ŷ        Yi − Ŷ    (Yi − Ȳ)²    (Yi − Ŷ)²
12     -2.5      12.88    -0.88     6.25         0.77
15     0.5       10.42    4.58      0.25         21.00
14     -0.5      13.02    0.98      0.25         0.97
19     4.5       20.52    -1.52     20.25        2.30
8      -6.5      11.86    -3.86     42.25        14.89
16     1.5       14.97    1.03      2.25         1.07
15     0.5       12.65    2.35      0.25         5.53
25     10.5      23.40    1.60      110.25       2.56
10     -4.5      13.02    -3.02     20.25        9.10
11     -3.5      12.28    -1.28     12.25        1.64
ΣYi = 145, Σ(Yi − Ȳ) = 0, ΣŶ = 145, Σ(Yi − Ŷ) = 0, Σ(Yi − Ȳ)² = 214.5, Σ(Yi − Ŷ)² = 59.82
$R^2 = \frac{214.5 - 59.82}{214.5}$
$R^2 = 0.7211$
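The three normal equations and the $R^2$ formula can be evaluated directly from the ten observations tabulated above, which is a useful check on the hand-computed column totals. The following is a minimal sketch using exact fractions and a small Gauss-Jordan solver written for this illustration (the helper `solve_3x3` is not from any library).

```python
from fractions import Fraction

rows = [  # (Y, X1, X2) as listed in the data table
    (12, 17, 10), (15, 15, 6), (14, 15, 10), (19, 10, 21), (8, 13, 8),
    (16, 15, 13), (15, 11, 9), (25, 6, 25), (10, 15, 10), (11, 7, 8),
]


def solve_3x3(matrix, rhs):
    """Solve a non-singular 3x3 linear system by Gauss-Jordan elimination."""
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = next(r for r in range(col, 3) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][3] for r in range(3)]


n = len(rows)
sum_y = sum(y for y, _, _ in rows)
sum_x1 = sum(x1 for _, x1, _ in rows)
sum_x2 = sum(x2 for _, _, x2 in rows)
sum_x1y = sum(x1 * y for y, x1, _ in rows)
sum_x2y = sum(x2 * y for y, _, x2 in rows)
sum_x1x1 = sum(x1 * x1 for _, x1, _ in rows)
sum_x2x2 = sum(x2 * x2 for _, _, x2 in rows)
sum_x1x2 = sum(x1 * x2 for _, x1, x2 in rows)

# Normal equations eq(1) to eq(3) in matrix form
a, b1, b2 = solve_3x3(
    [[n, sum_x1, sum_x2],
     [sum_x1, sum_x1x1, sum_x1x2],
     [sum_x2, sum_x1x2, sum_x2x2]],
    [sum_y, sum_x1y, sum_x2y],
)

y_bar = Fraction(sum_y, n)
sst = sum((y - y_bar) ** 2 for y, _, _ in rows)
sse = sum((y - (a + b1 * x1 + b2 * x2)) ** 2 for y, x1, x2 in rows)
r_squared = (sst - sse) / sst

print(float(a), float(b1), float(b2), float(r_squared))
```

For a least-squares fit that includes an intercept, $R^2$ always lies between 0 and 1, so a negative value signals an arithmetic slip somewhere in the table.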
Q.2. Explain Primary data and Secondary data in detail. (Marks 10)
Parameter | Primary Data | Secondary Data
1. Meaning | Data collected first-hand by the researcher. | Data collected earlier by other people.
2. Originality | Original and unique information. | Not original or unique information.
3. Adjustment | Does not need adjustment; it is focused. | Needs adjustment to suit the actual aim.
4. Sources | Observations, surveys, experiments. | Internal records, government-published data, etc.
5. Type of Data | Qualitative data | Quantitative data
6. Methods | Observation, experiment, interview, etc. | Desk research, searching online, etc.
7. Reliability | More reliable | Less reliable
8. Capability | More capable of solving a problem. | Less capable of solving a problem.
9. Time consumed | More time consuming | Less time consuming
10. Cost-effectiveness | Costly | Economical
11. Suitability | More suitable | May or may not be suitable
12. Need of Investigators | Needs a team of trained investigators. | Does not need a team of investigators.
13. Collected when | Secondary data is inadequate. | Before primary data is collected.
5. Solve the Following Marks 20
Q.1. Given $r_{12} = 0.7$, $r_{13} = 0.61$ and $r_{23} = 0.4$. (Marks 10)
Compute:
i. $r_{23.1}$
ii. $r_{13.2}$
iii. $r_{12.3}$
1) $r_{23.1}$
$r_{23.1} = \frac{r_{23} - r_{21} r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}}$
$= \frac{0.4 - (0.7)(0.61)}{\sqrt{(1 - (0.7)^2)(1 - (0.61)^2)}}$
$= \frac{0.4 - 0.427}{\sqrt{(1 - 0.49)(1 - 0.3721)}}$
$= \frac{-0.027}{\sqrt{0.51 \times 0.6279}}$
$= \frac{-0.027}{0.566}$
∴ $r_{23.1} = -0.048$
2) $r_{13.2}$
$r_{13.2} = \frac{r_{13} - r_{12} r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}}$
$= \frac{0.61 - (0.7)(0.4)}{\sqrt{(1 - (0.7)^2)(1 - (0.4)^2)}}$
$= \frac{0.61 - 0.28}{\sqrt{(1 - 0.49)(1 - 0.16)}}$
$= \frac{0.33}{\sqrt{0.51 \times 0.84}}$
$= \frac{0.33}{0.654}$
∴ $r_{13.2} = 0.50$
3) $r_{12.3}$
$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$= \frac{0.7 - (0.61)(0.4)}{\sqrt{(1 - (0.61)^2)(1 - (0.4)^2)}}$
$= \frac{0.7 - 0.244}{\sqrt{(1 - 0.3721)(1 - 0.16)}}$
$= \frac{0.456}{\sqrt{0.6279 \times 0.84}}$
$= \frac{0.456}{0.726}$
∴ $r_{12.3} = 0.628$
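All three results follow from one formula with the indices permuted, so a single helper covers them. A minimal sketch (the function name `partial_corr` is illustrative):

```python
from math import sqrt


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c, holding variable c constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


r12, r13, r23 = 0.7, 0.61, 0.4

r23_1 = partial_corr(r23, r12, r13)  # variables 2 and 3, holding 1 constant
r13_2 = partial_corr(r13, r12, r23)  # variables 1 and 3, holding 2 constant
r12_3 = partial_corr(r12, r13, r23)  # variables 1 and 2, holding 3 constant

print(round(r23_1, 3), round(r13_2, 3), round(r12_3, 3))
```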
Q.2. Differentiate between the following pair of concepts: (Marks 10)
i. Critical Region and Region of acceptance (Marks 5)
Sr. no. | Critical Region | Region of acceptance
1. | Represents the range of values of the test statistic where the null hypothesis is rejected. | Represents the range of values of the test statistic where the null hypothesis is not rejected.
2. | Also known as the rejection region. | Also known as the non-rejection region.
5. | If the test statistic falls within this region, the null hypothesis is rejected. | If the test statistic falls within this region, the null hypothesis is retained.
6. | Its size is directly related to the chosen significance level. | Its size is complementary to the size of the critical region.
Q.2. A random sample of size 100 has a standard deviation of 5. What can you say about the maximum
error with 95% confidence? (The Z value at 95% confidence is 1.96.) (Marks 5)
∴ $n = 100$, $\sigma = 5$, $Z = 1.96$ (95% confidence).
Maximum error:
$E = Z \times \frac{\sigma}{\sqrt{n}}$
$= 1.96 \times \frac{5}{\sqrt{100}}$
$= 1.96 \times 0.5$
$= 0.98$
∴ With 95% confidence, the sample mean lies within plus or minus 0.98 of the population mean.
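The same calculation as a small function, given here as a sketch (the name `max_error` is illustrative) so that other sample sizes can be tried:

```python
from math import sqrt


def max_error(z, sigma, n):
    """Maximum error of the sample mean: z times the standard error sigma / sqrt(n)."""
    return z * sigma / sqrt(n)


error = max_error(1.96, 5, 100)
print(round(error, 2))
```

Because $n$ sits under a square root, quadrupling the sample size to 400 only halves the maximum error.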
Q.3. What are assumptions of Multiple Linear Regression? (Marks 5)
There are a number of assumptions that should be assessed before performing a multiple
regression analysis:
1) The dependent variable (the variable of interest) needs to be measured on a continuous
scale.
2) There are two or more independent variables. These can be measured on either
continuous or categorical scales.
3) The dependent variable should have a linear relationship with the independent
variables, which you can check by using a scatterplot.
4) The data should have homoscedasticity. In other words, the spread of the residuals
around the line of best fit stays roughly the same across the whole range of predicted
values. Homoscedasticity can be checked by plotting the standardised residuals
against the unstandardised predicted values.
5) The data should not have two or more independent variables that are highly
correlated. This is called multicollinearity, which can be checked using Variance
Inflation Factor (VIF) values. A high VIF indicates that the associated independent
variable is highly collinear with the other variables in the model.
6) There should be no spurious outliers.
7) The residuals (errors) should be approximately normally distributed. This can be
checked by a histogram of the standardised residuals (with a superimposed normal
curve) and by plotting the standardised residuals using either a P-P Plot or a Normal Q-Q Plot.
2) Researchers try to reject or disprove it. | Researchers try to accept or prove it.
8) Z-test | T-test
2. Answer the Following Marks 20
Q.1. Represent the following data by a percentage sub-divided bar diagram. (Marks 10)
Item of Expenditure Family A Family B
Income Rs 500 Income Rs 300
Food 150 150
Clothing 125 60
Education 25 50
Miscellaneous 190 70
Savings or Deficits +10 -30
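A percentage sub-divided bar diagram needs each item expressed as a percentage of the family's income, so that both bars have the same height of 100. The sketch below only prepares those percentages (drawing is left out); the dictionary and function names are illustrative.

```python
family_a = {"Food": 150, "Clothing": 125, "Education": 25,
            "Miscellaneous": 190, "Savings or Deficits": 10}
family_b = {"Food": 150, "Clothing": 60, "Education": 50,
            "Miscellaneous": 70, "Savings or Deficits": -30}


def to_percentages(items, income):
    """Express each item as a percentage of income, rounded to 2 decimals."""
    return {name: round(100 * amount / income, 2)
            for name, amount in items.items()}


pct_a = to_percentages(family_a, 500)
pct_b = to_percentages(family_b, 300)

for name in family_a:
    print(f"{name:22s} {pct_a[name]:7.2f} {pct_b[name]:7.2f}")
```

Family B's deficit comes out as a negative percentage, which is usually drawn below the base line of the bar.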
Q.2. Distinguish between primary data and secondary data. What precautions should be taken in
the use of secondary data? (Marks 10)
Parameter | Primary Data | Secondary Data
1. Meaning | Data collected first-hand by the researcher. | Data collected earlier by other people.
2. Originality | Original and unique information. | Not original or unique information.
3. Adjustment | Does not need adjustment; it is focused. | Needs adjustment to suit the actual aim.
4. Sources | Observations, surveys, experiments. | Internal records, government-published data, etc.
5. Type of Data | Qualitative data | Quantitative data
6. Methods | Observation, experiment, interview, etc. | Desk research, searching online, etc.
7. Reliability | More reliable | Less reliable
8. Capability | More capable of solving a problem. | Less capable of solving a problem.
9. Time consumed | More time consuming | Less time consuming
10. Cost-effectiveness | Costly | Economical
11. Suitability | More suitable | May or may not be suitable
12. Need of Investigators | Needs a team of trained investigators. | Does not need a team of investigators.
13. Collected when | Secondary data is inadequate. | Before primary data is collected.
The following precautions should be taken in the use of secondary data:
Suitable purpose of investigation.
Inadequate data.
Definition of units.
Degree of accuracy.
Time and condition of collection of facts.
Homogeneous conditions.
Comparison.
3. Answer the Following Marks 20
Q.1. The following table gives the frequency distribution of the weekly wages (in '00 Rs.) of
100 workers in a factory. Draw the histogram and frequency polygon of the distribution.
(Marks 10)
Weekly wages ('00 Rs.)   20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64  Total
No. of workers           4      5      12     23     31     10     8      5      2      100
Q.2. The equation of two lines of regression obtained in correlation analysis are given below:
𝟐𝑿 = 𝟖 − 𝟑𝒀 and 𝟐𝒀 = 𝟓 − 𝑿. Obtain the value of the correlation coefficient.
(Marks 10)
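Let the regression line of X on Y be
$2X = 8 - 3Y$
$X = \frac{8}{2} - \frac{3}{2}Y$
$X = -1.5Y + 4$
∴ $b_{xy} = -1.5$ or $-\frac{3}{2}$
Let the regression line of Y on X be
$2Y = 5 - X$
$Y = -0.5X + 2.5$
∴ $b_{yx} = -0.5$ or $-\frac{1}{2}$
$r = \pm\sqrt{b_{xy} \times b_{yx}}$
$= \pm\sqrt{\left(-\frac{3}{2}\right) \times \left(-\frac{1}{2}\right)}$
$= \pm 0.866$
Both regression coefficients are negative, so take the negative sign:
∴ $r = -0.866$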
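The rule used here, $r = \pm\sqrt{b_{xy} b_{yx}}$ with the sign shared by both coefficients, can be written as a small helper. This is a sketch with an illustrative function name; it rejects inputs whose signs disagree, because two valid regression coefficients of the same data always share a sign.

```python
from math import copysign, sqrt


def correlation_from_coefficients(b_xy, b_yx):
    """r is the geometric mean of the two regression coefficients, with their sign."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    return copysign(sqrt(product), b_xy)


r = correlation_from_coefficients(-1.5, -0.5)
print(round(r, 3))
```

The check that the product does not exceed 1 is also how one confirms that the two lines were assigned to "X on Y" and "Y on X" the right way round.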
Marks in Economics (X)    25  28  35  32  31  36  29  38  34  32
Marks in Statistics (Y)   43  46  49  41  36  32  31  30  33  39
∴ $\bar{X} = \frac{\sum X}{N} = \frac{320}{10} = 32$
∴ $\bar{Y} = \frac{\sum Y}{N} = \frac{380}{10} = 38$
X     Y     x = X − 32   y = Y − 38   x²    y²    xy
25    43    -7           5            49    25    -35
28    46    -4           8            16    64    -32
35    49    3            11           9     121   33
32    41    0            3            0     9     0
31    36    -1           -2           1     4     2
36    32    4            -6           16    36    -24
29    31    -3           -7           9     49    21
38    30    6            -8           36    64    -48
34    33    2            -5           4     25    -10
32    39    0            1            0     1     0
ΣX = 320, ΣY = 380, Σx = 0, Σy = 0, Σx² = 140, Σy² = 398, Σxy = −93
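2) Regression coefficient of Y on X:
$b_{yx} = \frac{N\sum xy - (\sum x)(\sum y)}{N\sum x^2 - (\sum x)^2}$
$= \frac{10 \times (-93) - 0 \times 0}{10 \times 140 - 0^2}$
$= \frac{-930}{1400}$
∴ $b_{yx} = -0.6643$
With $b_{xy} = \frac{-93}{398} = -0.2337$:
$r = \pm\sqrt{b_{xy} \times b_{yx}}$
$= \pm\sqrt{(-0.2337) \times (-0.6643)}$
$= \pm 0.3940$
∴ Both regression coefficients are negative, so take the negative sign:
$r = -0.3940$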
Q.2. Explain the following point Estimation Properties with Example (Marks 10)
i) Consistency:
1) It states that the estimator gets closer to the parameter's value as the sample size
increases.
2) Thus, a large sample size is required for the estimate to be reliably close to the parameter.
3) When the estimates converge towards the parameter's value as the sample grows, we state
that the estimator is consistent.
4) Example:
1. Suppose you're estimating the mean height of students in a school.
2. You take random samples of increasing sizes, say 10 students, 50 students, 100
students, and so on.
3. With each increase in sample size, the mean height calculated from the sample
should approach the true mean height of all students in the school.
4. If this happens, the estimator for the mean height is considered consistent.
ii) Unbiasedness:
1) An estimator is unbiased when its expected value equals the true value of the parameter,
that is, $E(\hat{\theta}) = \theta$.
2) Among estimators that are unbiased and consistent, the most efficient is the one with the
least variance.
3) The variance measures how dispersed the estimates are around their expected value, so the
estimator with the smallest variance deviates the least when different samples are drawn.
4) But, of course, this also depends on the distribution of the population.
5) Example:
1. Let's say you're estimating the average score of students in a class.
2. You take multiple random samples and calculate the mean score for each
sample.
3. If, on average, these sample means are equal to the true average score of the
entire class, then the estimator is unbiased.
4. This means that sometimes the estimate might be higher than the true value,
and sometimes it might be lower, but on average, it hits the mark.
i) The normal equations are:
$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$ …………………… eq(1)
$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$ …………………… eq(2)
$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$ …………………… eq(3)
Y      X1     X2     X1Y      X2Y      X1²      X2²      X1X2
100    17     19     1700     1900     289      361      323
79     50     54     3950     4266     2500     2916     2700
100    90     75     9000     7500     8100     5625     6750
129    30     36     3870     4644     900      1296     1080
158    15     16     2370     2528     225      256      240
106    20     25     2120     2650     400      625      500
58     20     24     1160     1392     400      576      480
78     50     53     3900     4134     2500     2809     2650
ΣY     ΣX1    ΣX2    ΣX1Y     ΣX2Y     ΣX1²     ΣX2²     ΣX1X2
= 808  = 292  = 302  = 28070  = 29014  = 15314  = 14464  = 14723
808 = 8a + 292b1 + 302b2 …………………… eq(1)
28070 = 292a + 15314b1 + 14723b2 …………………… eq(2)
29014 = 302a + 14723b1 + 14464b2 …………………… eq(3)
Solving the three equations:
∴ a = 137.55
∴ b1 = 2.00
∴ b2 = −2.91
$\hat{Y} = 137.55 + 2.00X_1 - 2.91X_2$
ii) $R^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y})^2}{\sum (Y_i - \bar{Y})^2}$
$\bar{Y} = \frac{\sum Y}{n} = \frac{808}{8} = 101$
(Fitted values use the unrounded coefficients; entries are rounded to two decimals.)
Yi     Yi − Ȳ    Ŷ         Yi − Ŷ     (Yi − Ȳ)²    (Yi − Ŷ)²
100    -1        116.41    -16.41     1            269.27
79     -22       80.83     -1.83      484          3.36
100    -1        99.96     0.04       1            0.00
129    28        93.06     35.94      784          1291.54
158    57        121.12    36.88      3249         1360.22
106    5         104.99    1.01       25           1.03
58     -43       107.89    -49.89     1849         2489.28
78     -23       83.74     -5.74      529          32.93
ΣYi = 808, Σ(Yi − Ȳ) = 0, ΣŶ = 808, Σ(Yi − Ŷ) = 0, Σ(Yi − Ȳ)² = 6922, Σ(Yi − Ŷ)² = 5447.63
∴ $R^2 = \frac{6922 - 5447.63}{6922}$
∴ $R^2 = 0.2130$
2) Regression Coefficients:
1. Regression coefficient of Y on X:
$x + 6y = 6$
$6y = -x + 6$
$y = -\frac{1}{6}x + 1$
∴ The regression coefficient of Y on X is $b_{yx} = -\frac{1}{6}$
2. Regression coefficient of X on Y:
$3x + 2y = 10$
$3x = 10 - 2y$
$x = \frac{1}{3}(10 - 2y)$
$x = \frac{10}{3} - \frac{2}{3}y$
∴ The regression coefficient of X on Y is $b_{xy} = -\frac{2}{3}$
$r = \pm\sqrt{b_{yx} \times b_{xy}}$
$= \pm\sqrt{\left(-\frac{1}{6}\right) \times \left(-\frac{2}{3}\right)}$
$= \pm\sqrt{\frac{1}{9}}$
$= \pm 0.3333$
Both regression coefficients are negative, so take the negative sign:
∴ $r = -0.3333$
c) In a certain trivariate distribution: $r_{12} = 0.7$, $r_{23} = 0.6$, $r_{31} = 0.6$. Find the partial
correlation coefficient $r_{12.3}$. (Marks 5)
$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$= \frac{0.7 - (0.6)(0.6)}{\sqrt{(1 - (0.6)^2)(1 - (0.6)^2)}}$
$= \frac{0.7 - 0.36}{\sqrt{(1 - 0.36)(1 - 0.36)}}$
$= \frac{0.34}{\sqrt{0.64 \times 0.64}}$
$= \frac{0.34}{0.64}$
$= 0.53125$
∴ $r_{12.3} = 0.53125$
d) A survey conducted over the last 25 years indicated that in 10 years the winter was mild, in
8 years it was cold and in the remaining 7 years it was very cold. A company sells 1000
woollen coats in a mild year, 1300 in a cold year and 2000 in a very cold year. You are
required to find the yearly expected profit of the company if a woollen coat costs Rs. 1730
and it is sold to stores for Rs. 2480. (Marks 5)
1) Find the profit per coat if a woollen coat costs Rs. 1730 and is sold to stores for
Rs. 2480:
Profit per coat = selling price to stores − cost of a woollen coat
= 2480 − 1730
= 750
∴ Profit per coat = Rs. 750
2) Calculate the total number of coats sold in each type of year:
Mild years: 10 × 1000 = 10000 coats.
Cold years: 8 × 1300 = 10400 coats.
Very cold years: 7 × 2000 = 14000 coats.
3) Calculate the total coats sold in 25 years:
Total coats: 10000 + 10400 + 14000 = 34400 coats
4) Calculate the total profit on the coats sold in 25 years:
Total profit = 34400 × 750 = Rs. 25800000
5) Calculate the expected profit for each year:
Per year profit = 25800000 ÷ 25 = Rs. 1032000
∴ So, the yearly expected profit of the company is Rs. 1032000.
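The same answer follows from treating the kind of winter as a random variable with probabilities 10/25, 8/25 and 7/25 and taking the expectation of the coats sold. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction

years = {"mild": 10, "cold": 8, "very cold": 7}          # out of 25 years
coats = {"mild": 1000, "cold": 1300, "very cold": 2000}  # coats sold per year
cost_price, selling_price = 1730, 2480

total_years = sum(years.values())
profit_per_coat = selling_price - cost_price

# E(coats) = sum over winter types of P(type) * coats sold in that type of year
expected_coats = sum(Fraction(years[w], total_years) * coats[w] for w in years)
expected_profit = expected_coats * profit_per_coat

print(expected_coats, expected_profit)
```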
Q.2. Solve the Following: Marks 20
a) Define the term “Statistics” and discuss its use in business and trade. Also point out its
limitations. (Marks 10)
1) Statistics is defined as the collection, compilation, analysis and interpretation of numerical
data.
2) Statistics is the science of data.
3) Statistics helps in gathering information about the appropriate quantitative data.
4) It depicts the complex data in graphical form, tabular form and in diagrammatic
representation to understand it easily.
5) It provides the exact description and a better understanding.
6) Use of statistics in a business or trade:
Business:
1. Market Research
2. Demand Forecasting
3. Financial Analysis
4. Quality Control
5. Risk Management
6. Performance Measurement
Trade:
1. Market Analysis
2. Risk Assessment
3. Technical Analysis
4. Algorithmic trading
5. Performance Evaluation
7) Limitations of Statistics:
= 35 + 2.5
= 37.5
∴ 𝑀𝑒𝑑𝑖𝑎𝑛 = 37.5
Let's find the mean deviation from the median:
$M.D._M = \frac{1}{N} \sum_{i=1}^{8} f_i |x_i - M|$
$= \frac{1}{1000} \times 8175$
$= 8.175$
∴ $M.D._M = 8.175$
b) A survey of 370 students from Commerce Faculty and 130 students from Science Faculty
revealed that 180 students were studying for only C.A. Examinations, 140 for only
Costing Examinations and 80 for both C.A. and Costing Examinations. The rest had
offered part-time Management Courses. Of those studying for Costing only, 13 were girls
and 90 boys belonged to Commerce Faculty. Out of 80 studying for both C.A. and Costing,
72 were from Commerce Faculty, amongst which 70 were boys. Amongst those who
offered part-time Management Courses, 50 boys were from Science Faculty and 30 boys
and 10 girls from Commerce Faculty. In all there were 110 boys in Science Faculty. Present
the above information in a tabular form. Find the number of students from Science
Faculty studying for part-time Management Courses. (Marks 10)
              Commerce             Science              Total
              Boys  Girls  Total   Boys  Girls  Total   Boys  Girls  Total
C.A.          130   25     155     21    4      25      151   29     180
Costing       90    13     103     35    2      37      125   15     140
Both          70    2      72      4     4      8       74    6      80
Management    30    10     40      50    10     60      80    20     100
Total         320   50     370     110   20     130     430   70     500
∴ The number of students from Science Faculty studying for part-time Management Courses is 60.
Q.4. Solve the Following: Marks 20
a) A department store gives in-service training to its salesmen which is followed by a test.
It is considering whether it should terminate the service of any salesman who does not
do well in the test. The following data give the test scores and sales made by nine
salesmen during a certain period:
Test scores       14  19  24  21  26  22  15  20  19
Sales ('00 Rs.)   31  36  48  37  50  45  33  41  39
Calculate the coefficient of correlation between the test scores and the sales. Does it
indicate that the termination of services of those with low test scores is justified? If the
firm wants a minimum sales volume of Rs. 30,000, what is the minimum test score that will
ensure continuation of service? Also estimate the most probable sales volume of a salesman
making a score of 28. (Marks 10)
∴ $\bar{X} = \frac{\sum X}{n} = \frac{180}{9} = 20$
∴ $\bar{Y} = \frac{\sum Y}{n} = \frac{360}{9} = 40$
X     Y     x = X − 20   y = Y − 40   x²    y²    xy
14    31    -6           -9           36    81    54
19    36    -1           -4           1     16    4
24    48    4            8            16    64    32
21    37    1            -3           1     9     -3
26    50    6            10           36    100   60
22    45    2            5            4     25    10
15    33    -5           -7           25    49    35
20    41    0            1            0     1     0
19    39    -1           -1           1     1     1
ΣX = 180, ΣY = 360, Σx = 0, Σy = 0, Σx² = 120, Σy² = 346, Σxy = 193
1) Regression Coefficients:
1. Regression coefficient of Y on X:
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$= \frac{9 \times 193 - 0 \times 0}{9 \times 120 - 0^2}$
$= \frac{1737}{1080}$
∴ $b_{yx} = 1.6083$
2. Regression coefficient of X on Y:
$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$
$= \frac{9 \times 193 - 0 \times 0}{9 \times 346 - 0^2}$
$= \frac{1737}{3114}$
∴ $b_{xy} = 0.5578$
2) Regression Equations:
1. Regression equation of X on Y:
∴ $X - \bar{X} = b_{xy}(Y - \bar{Y})$
$X - 20 = 0.5578(Y - 40)$
$X - 20 = 0.5578Y - 22.31$
$X = 0.5578Y - 2.31$
∴ The regression equation of X on Y is $X = 0.5578Y - 2.31$
2. Regression equation of Y on X:
∴ $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$Y - 40 = 1.6083(X - 20)$
$Y - 40 = 1.6083X - 32.17$
$Y = 1.6083X + 7.83$
∴ The regression equation of Y on X is $Y = 1.6083X + 7.83$
3) Coefficient of Correlation:
∴ $r = \pm\sqrt{b_{xy} \times b_{yx}}$
$= \pm\sqrt{0.5578 \times 1.6083}$
$= \pm\sqrt{0.8971}$
$= \pm 0.947$
Both regression coefficients are positive, so take the positive sign:
∴ $r = 0.947$
∴ Since test scores and sales are very highly positively correlated, the termination of
services for low test scores is justified.
5) When $X = 28$, $Y = ?$
$Y = 1.6083X + 7.83$
$Y = 1.6083(28) + 7.83$
∴ $Y = 52.87$ ('00 Rs.)
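Because it is easy to swap the two regression coefficients by hand, it helps to recompute them from the raw scores. The sketch below (variable names are illustrative) uses deviations from the means, with $b_{yx} = S_{xy}/S_{xx}$ for predicting sales from scores and $b_{xy} = S_{xy}/S_{yy}$ for the reverse direction.

```python
from math import sqrt

scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # X: test scores
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]   # Y: sales in '00 Rs.

n = len(scores)
x_bar = sum(scores) / n
y_bar = sum(sales) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, sales))
sxx = sum((x - x_bar) ** 2 for x in scores)
syy = sum((y - y_bar) ** 2 for y in sales)

b_yx = sxy / sxx            # slope of the regression of Y on X
b_xy = sxy / syy            # slope of the regression of X on Y
r = sxy / sqrt(sxx * syy)   # equals sqrt(b_yx * b_xy), with their common sign


def predict_sales(score):
    """Most probable sales ('00 Rs.) for a given test score, from Y on X."""
    return y_bar + b_yx * (score - x_bar)


print(round(b_yx, 4), round(b_xy, 4), round(r, 3), round(predict_sales(28), 2))
```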
b) Define a random variable and its mathematical expectation. (Marks 10)
1) The various outcomes of a random experiment are denoted with the help of a variable
which is called a random variable.
2) For example: In case of throwing a die, we may use a variable X for representing the
outcome of the throw. Thus, X will take the values 1, 2, 3, 4, 5 and 6.
3) But in some cases, the outcomes may be qualitative e.g. tossing of a coin which may
be head or tail, the colours of balls drawn from an urn may be red, yellow, white etc.
4) But for mathematical convenience the qualitative outcomes may be expressed in
quantitative forms. For example, in tossing of a coin we may denote the outcome
‘Head’ by 1 and ‘Tail’ by 0.
5) In this way each outcome of a random experiment, whether it is qualitative or
quantitative, can be expressed by a real number.
6) There are two types of random variables:
(a) Discrete random variable
(b) Continuous random variable
7) Mathematical Expectation:
Let us consider a discrete random variable $X$ which assumes the values $x_1, x_2, \dots, x_n$
with respective probabilities $p_1, p_2, \dots, p_n$, such that $\sum p_i = 1$. Then the
mathematical expectation of the random variable $X$ is given by the sum of the
products of the different values of $X$ with their corresponding probabilities. The
expectation of a random variable is generally denoted by $E(X)$.
Thus, $E(X) = \sum_{i=1}^{n} x_i P(X = x_i) = \sum_{i=1}^{n} p_i x_i$, where $\sum p_i = 1$.
In case the discrete random variable takes a countably infinite number of values, then
we have
$E(X) = \sum_{i=1}^{\infty} x_i P(X = x_i) = \sum_{i=1}^{\infty} p_i x_i$
provided the series is convergent.
If $X$ is a continuous random variable with probability density function $f(x)$,
$-\infty < x < \infty$, then the mathematical expectation of the random variable $X$ is given by
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and the integral for $E(X)$ converges.
The expectation of the random variable 𝑋 serves as the measure of central tendency
of the probability distribution of 𝑋.
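For the two discrete examples mentioned earlier in this answer (a fair die, and a coin coded as Head = 1, Tail = 0), the expectation can be computed directly from the definition. A minimal sketch, assuming both are fair; the function name is illustrative:

```python
from fractions import Fraction


def expectation(distribution):
    """E(X) = sum of x * P(X = x) for a finite discrete distribution."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die (assumed)
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}         # Head = 1, Tail = 0

print(expectation(die), expectation(coin))
```

The die's expectation is 7/2, a value the die can never show, which illustrates that $E(X)$ is a long-run average and not necessarily a possible outcome.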
4. In Normal Distribution:
3) Standard Normal Distribution: When the mean in a Normal Distribution is 0 and the
Standard Deviation is 1, then the Normal Distribution is called a Standard Normal
Distribution.
4) Types of Skewness:
Positive Skewness:
1. In positive skewness, the extreme data values are larger, which in turn
increases the mean value of the data set.
2. In Positive Skewness:
Mean > Median > Mode
Negative Skewness:
1. In negative skewness, the extreme data values are smaller, which decreases
the mean value of the dataset.
2. In Negative Skewness:
Mean < Median < Mode
5) Unlike the Normal Distribution (mean = median = mode), in positive and negative
skewness, the mean, median, and mode are all different.
ii) Unbiasedness
1) An estimator is unbiased when its expected value equals the true value of the
parameter, that is, $E(\hat{\theta}) = \theta$.
2) Among estimators that are unbiased and consistent, the most efficient is the one with
the least variance.
3) The variance measures how dispersed the estimates are around their expected value, so
the estimator with the smallest variance deviates the least when different samples are drawn.
4) But, of course, this also depends on the distribution of the population.
5) Example:
i. Let's say you're estimating the average score of students in a class.
ii. You take multiple random samples and calculate the mean score for each
sample.
iii. If, on average, these sample means are equal to the true average score of the
entire class, then the estimator is unbiased.
iv. This means that sometimes the estimate might be higher than the true value,
and sometimes it might be lower, but on average, it hits the mark.
Q.2. Perform simple linear regression, Determine slope and intercept. (Marks 10)
X 1 2 3 3 4 5
y 8 4 5 2 2 0
𝒙 𝒚 𝒙𝟐 𝒚𝟐 𝒙𝒚
1 8 1 64 8
2 4 4 16 8
3 5 9 25 15
3 2 9 4 6
4 2 16 4 8
5 0 25 0 0
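$\sum x = 18$, $\sum y = 21$, $\sum x^2 = 64$, $\sum y^2 = 113$, $\sum xy = 45$, $n = 6$
$b(\text{slope}) = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$= \frac{6 \times 45 - 18 \times 21}{6 \times 64 - 18^2}$
$= \frac{270 - 378}{384 - 324}$
$= \frac{-108}{60}$
$= -1.8$
∴ The slope is $b = -1.8$
$a(\text{intercept}) = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$
$= \frac{21 \times 64 - 18 \times 45}{6 \times 64 - 18^2}$
$= \frac{1344 - 810}{384 - 324}$
$= \frac{534}{60}$
$= 8.9$
∴ The intercept is $a = 8.9$
$y = a + bx$
∴ Slope $b = -1.8$ and intercept $a = 8.9$
∴ $y = 8.9 - 1.8x$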
Q.3. What do you mean by a questionnaire? What is the difference between a questionnaire and
a schedule? State the essential points to be remembered in drafting a questionnaire.
1) A questionnaire is a list of questions that ask respondents about themselves or others.
Q.6. The manufacturer of a certain make of electric bulbs claims that his bulbs have mean life of
25 months with standard deviation of 5 months. A random sample of 6 bulbs gave the
following value:
Life of bulbs in months: 24, 26, 30, 20, 20, 18
Is the manufacturer's claim valid at 1% level of significance? (Given that the table values of
the appropriate test statistics at said level are 4.032, 3.707, and 3.499 for 5, 6 and 7 degrees
of freedom respectively). (Marks 10)
∴ $n = 6$, $\mu = 25$, $\sigma = 5$, $\bar{X} = ?$
$\bar{X} = \frac{\sum X}{n} = \frac{24 + 26 + 30 + 20 + 20 + 18}{6} = \frac{138}{6} = 23$ ∴ $\bar{X} = 23$
Life of bulbs in months (X)   x = X − 23   x²
24                            1            1
26                            3            9
30                            7            49
20                            -3           9
20                            -3           9
18                            -5           25
ΣX = 138                      Σx = 0       Σx² = 102
$S = \sqrt{\frac{\sum x^2}{n}}$
$= \sqrt{\frac{102}{6}}$
∴ $S = 4.12$
1. Hypothesis:
$H_0$: Null Hypothesis
$\mu = 25$
The manufacturer's claim is valid.
$H_1$: Alternative Hypothesis
$\mu \neq 25$
The manufacturer's claim is not valid.
2. Computation of test statistic:
$S.E.(\bar{X}) = \frac{S}{\sqrt{n - 1}}$
$= \frac{4.12}{\sqrt{6 - 1}}$
$= \frac{4.12}{\sqrt{5}}$
$= 1.84$
$t = \frac{\bar{X} - \mu}{S.E.(\bar{X})}$
$= \frac{23 - 25}{1.84}$
$t = -1.08$
∴ $|t| = 1.08$
3. Level of significance:
$\alpha = 0.01$
4. Critical Value:
$t_\alpha$ at 1% level of significance for $n - 1 = 6 - 1 = 5$ degrees of freedom is 4.032
$t_\alpha = 4.032$
5. Decision:
∴ $|t| < t_\alpha$
∴ 1.08 < 4.032
∴ The null hypothesis is not rejected.
∴ The manufacturer's claim is valid.
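The test statistic can be recomputed from the six observations. The sketch below follows the convention used in this solution, with $S$ computed using divisor $n$ and the standard error taken as $S/\sqrt{n-1}$; the critical value 4.032 is the table value quoted in the question, and the variable names are illustrative.

```python
from math import sqrt

lifetimes = [24, 26, 30, 20, 20, 18]  # life of bulbs in months
mu0 = 25                              # mean life claimed by the manufacturer
critical_value = 4.032                # table value for 5 d.f. at the 1% level

n = len(lifetimes)
mean = sum(lifetimes) / n
s = sqrt(sum((x - mean) ** 2 for x in lifetimes) / n)  # divisor n, as in the text
standard_error = s / sqrt(n - 1)
t = (mean - mu0) / standard_error

reject_h0 = abs(t) > critical_value
print(round(s, 2), round(standard_error, 2), round(t, 2), reject_h0)
```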
Extra Questions:
Q.1. A random sample of 900 items is taken from a normal population whose mean and
variance are both 4. Can the sample with mean 4.5 be regarded as a truly random one at 1% level
of significance? (Table value at 1% is 2.58). (Marks 5)
∴ $n = 900$, $\bar{X} = 4.5$, $\mu = 4$, $\sigma^2 = 4$, LOS = 1%
∴ $\sigma = \sqrt{\sigma^2} = \sqrt{4} = 2$ ∴ $\sigma = 2$
1. Hypothesis:
$H_0$: Null Hypothesis
$\mu = 4$
$H_1$: Alternative Hypothesis
$\mu \neq 4$
2. Test statistic:
$S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
$= \frac{2}{\sqrt{900}}$
$= \frac{2}{30}$
$= 0.0667$
$z = \frac{\bar{X} - \mu}{S.E.(\bar{X})}$
$= \frac{4.5 - 4}{0.0667}$
∴ $|z| = 7.5$
3. Level of significance:
$\alpha = 0.01$
4. Critical Value:
$Z_\alpha$ at 1% level of significance is 2.58
$Z_\alpha = 2.58$
5. Decision:
∴ $|z| > Z_\alpha$
∴ 7.5 > 2.58
∴ The null hypothesis is rejected and the alternative hypothesis is accepted, so the sample
cannot be regarded as a truly random one from this population.
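Rounding the standard error too early changes the value of $z$ noticeably in this problem, so it is worth computing it without intermediate rounding. A minimal sketch with illustrative variable names:

```python
from math import sqrt

n = 900
sample_mean = 4.5
mu0 = 4
variance = 4
critical_value = 2.58  # table value at the 1% level (two-tailed)

sigma = sqrt(variance)
standard_error = sigma / sqrt(n)  # 2 / 30, kept unrounded
z = (sample_mean - mu0) / standard_error

reject_h0 = abs(z) > critical_value
print(round(standard_error, 4), round(z, 2), reject_h0)
```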
Q.2. The height of 10 children selected at random from a given locality had a mean of 63.2 cm and
variance 6.25 cm². Test at 5% level of significance the hypothesis that the children of the
given locality are on the average less than 65 cm in all. Given for 9 degrees of freedom
$P(t > 1.83) = 0.05$ (Marks 5)
∴ $n = 10$, $\bar{X} = 63.2$, $\mu = 65$, $s^2 = 6.25$, LOS = 5%
∴ $s = \sqrt{s^2} = \sqrt{6.25} = 2.5$ ∴ $s = 2.5$
1. Hypothesis:
$H_0$: Null Hypothesis
$\mu \geq 65$
$H_1$: Alternative Hypothesis
$\mu < 65$
2. Test statistic:
$S.E.(\bar{X}) = \frac{s}{\sqrt{n - 1}}$
$= \frac{2.5}{\sqrt{10 - 1}}$
$= \frac{2.5}{3}$
$= 0.833$
$t = \frac{\bar{X} - \mu}{S.E.(\bar{X})}$
$= \frac{63.2 - 65}{0.833}$
$t = -2.16$
∴ $|t| = 2.16$
3. Level of significance:
$\alpha = 0.05$
4. Critical Value:
$t_\alpha$ at 5% level of significance for $n - 1 = 10 - 1 = 9$ degrees of freedom is 1.83
$t_\alpha = 1.83$
5. Decision:
∴ $|t| > t_\alpha$
∴ 2.16 > 1.83
∴ The null hypothesis is rejected and the alternative hypothesis is accepted.
Q.3. Find $y$ when $x_1 = 3700$ kg and $x_2 = 260$ km from the least square regression equation of $y$ on
$x_1$ and $x_2$ for the following:
Y               160   112   69    90    123   186
x1 (1000 kg)    4.0   2.0   1.6   1.2   3.4   4.8
x2 (100 km)     1.5   2.2   1.0   2.0   0.8   1.6
∴ $\sum Y = 740$, $\sum x_1 = 17$, $\sum x_2 = 9.1$, $n = 6$
Let us solve:
∴ $\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2$ ………….eq(1)
∴ $\sum x_1 y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$ ………….eq(2)
∴ $\sum x_2 y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$ ………….eq(3)
Y      x1     x2     x1²     x2²     x1y      x2y      x1x2
160    4      1.5    16      2.25    640      240      6
112    2      2.2    4       4.84    224      246.4    4.4
69     1.6    1      2.56    1       110.4    69       1.6
90     1.2    2      1.44    4       108      180      2.4
123    3.4    0.8    11.56   0.64    418.2    98.4     2.72
186    4.8    1.6    23.04   2.56    892.8    297.6    7.68
ΣY     Σx1    Σx2    Σx1²    Σx2²    Σx1y     Σx2y     Σx1x2
= 740  = 17   = 9.1  = 58.6  = 15.29 = 2393.4 = 1131.4 = 24.8
Q.4.