
QA All Solved Question Paper

The document discusses quantitative analysis and contains four questions. Question 1 defines statistics and lists its limitations. Question 2 explains sampling and its purpose. Question 3 discusses the differences between regression analysis and correlation. Question 4 proves that sample variance is an unbiased estimator of population variance and provides an example.

Uploaded by

Shivam Ghodekar
Copyright
© © All Rights Reserved

Subject: Quantitative Analysis

May 2023
1. Answer the following. Marks 20
Q.1. Define Statistics and list the limitations of statistics. (Marks 5)
1) Statistics is defined as the collection, compilation, analysis and interpretation of numerical
data.
2) Statistics is the science of data.
3) Statistics helps in gathering information about the appropriate quantitative data.
4) It depicts complex data in graphical, tabular and diagrammatic form so that it can be
understood easily.
5) It provides an exact description and a better understanding of the data.
6) Limitations of Statistics:
• Statistical approaches are best suited to quantitative results.
• Statistics cannot be applied to heterogeneous data.
• If adequate care is not taken in gathering, analysing and interpreting the data,
statistical findings can be misleading.
• Statistical data can be treated effectively only by a person who has professional
knowledge of statistics.
• Statistical judgements are subject to certain errors; inferential statistics, in particular,
involves such errors.

Q.2. Explain sampling and purpose of sampling. (Marks 5)


1) Sampling is a statistical technique in which a fixed number of observations is taken from a larger population.
2) The behaviour or characteristics of this subset are used to estimate the characteristics of the
entire population.
3) Types of Sampling:
1. Random Sampling: every item in the population has an equal probability of being picked.
2. Block Sampling: takes a consecutive series of items within the population to use as
the sample.
3. Judgement Sampling: an auditor's judgement is used to select the sample out
of the population.
4. Systematic Sampling: begins at a random starting point within the population and then
uses a fixed, periodic interval to select items for the sample.
4) Purpose of Sampling:
a) To provide statistical information about the whole population by
examining just a few units.
b) It reduces the time, effort and cost involved.
c) It minimises the loss caused in case of any mishap or failure.
d) It is a scientific, observable method of testing a hypothesis.
e) There is a greater scope for flexibility and probability.
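The sampling types listed above can be sketched in a few lines of Python. This is only an illustration: the population of 100 item IDs, the seed and the function names are assumptions made for the example, and judgement sampling is left out because it depends on a person's choice rather than on a rule.

```python
import random


def random_sample(population, k, rng):
    """Simple random sampling: every item has an equal chance of selection."""
    return rng.sample(population, k)


def block_sample(population, k, start):
    """Block sampling: k consecutive items beginning at index `start`."""
    return population[start:start + k]


def systematic_sample(population, k, rng):
    """Systematic sampling: random starting point, then a fixed interval."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]


population = list(range(1, 101))  # hypothetical population of 100 item IDs
rng = random.Random(42)           # fixed seed so the run is repeatable

print(random_sample(population, 5, rng))
print(block_sample(population, 5, start=10))
print(systematic_sample(population, 5, rng))
```

In the systematic sample the gap between consecutive picks is always the same (here 100 / 5 = 20), which is what distinguishes it from the random sample.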
Q.3. What is regression analysis? How does it differ from correlation. (Marks 5)
1) Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent variables.
2) It can be utilized to assess the strength of the relationship between variables and for
modelling the future relationship between them.
3) Regression analysis has three types:
1. Linear Regression
2. Multiple Linear Regression
3. Non-linear Regression

Parameter | Correlation Analysis | Regression Analysis
1) Purpose | Measures the strength and direction of the relationship. | Predicts and models the relationship.
2) Variables | Two variables (equal roles). | Independent and dependent variables.
3) Calculation | Correlation coefficient (r). | Regression equation (y = mx + b).
4) Direction | Between +1 and −1 (positive, negative, no correlation). | Positive or negative (strength and direction).
5) Causation | Does not imply causation. | Can imply causation under controlled conditions.
6) Data representation | A single coefficient. | An equation representing the relationship.
7) Hypothesis testing | Limited to the significance of the correlation. | Tests each coefficient's importance in the model.

Q.4. Show the sample variance (𝑺𝟐 ) is an unbiased estimator of population variance (𝝈𝟐 ).
Also illustrate with an example. (Marks 5)
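For independent observations $X_1, \dots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$:
$E(X_i^2) = \sigma^2 + \mu^2$
$E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$
$E\left[\sum (X_i - \bar{X})^2\right] = \sum E(X_i^2) - nE(\bar{X}^2)$
$= n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)$
$= n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2$
$= n\sigma^2 - \sigma^2$
$= (n - 1)\sigma^2$
Let's prove $E(S^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n - 1}\right] = \sigma^2$:
$E(S^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n - 1}\right]$
$= \frac{1}{n - 1} E\left[\sum (X_i - \bar{X})^2\right]$
$= \frac{1}{n - 1} (n - 1)\sigma^2$
$= \sigma^2$
∴ Hence proved.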
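Example: Suppose we have a population of 5 individuals with ages 20, 35, 45, 50, 55.
Take the first 3 values as a sample.
$S^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2$
$n = 3,\ \bar{X} = \frac{\sum X}{n} = \frac{100}{3} = 33.33$
$S^2 = \frac{(20 - 33.33)^2 + (35 - 33.33)^2 + (45 - 33.33)^2}{3 - 1}$
$= \frac{177.78 + 2.78 + 136.11}{2}$
$= \frac{316.67}{2}$
∴ $S^2 = 158.33$

Take the whole population:
$\sigma^2 = \frac{1}{N} \sum (X_i - \mu)^2$
$N = 5,\ \mu = \frac{\sum X}{N} = \frac{205}{5} = 41$
$\sigma^2 = \frac{(20 - 41)^2 + (35 - 41)^2 + (45 - 41)^2 + (50 - 41)^2 + (55 - 41)^2}{5}$
$= \frac{441 + 36 + 16 + 81 + 196}{5}$
$= \frac{770}{5}$
∴ $\sigma^2 = 154$
This single sample gives $S^2 = 158.33$, which is close to but not equal to $\sigma^2 = 154$. Unbiasedness, $E(S^2) = \sigma^2$, is a statement about the average of $S^2$ over all possible samples (drawn independently, i.e. with replacement), not about any one sample.
Therefore, $S^2$ is an unbiased estimator of $\sigma^2$.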
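The claim E(S²) = σ² can be checked directly on the five ages by listing every possible sample. The sketch below assumes sampling with replacement (which matches the independence assumption of the proof) and uses exact fractions; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Population variance with divisor N, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


ages = [20, 35, 45, 50, 55]
sigma2 = population_variance(ages)

# All 5**3 = 125 equally likely ordered samples of size 3, with replacement.
samples = list(product(ages, repeat=3))
average_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print(float(sample_variance(ages[:3])))  # S^2 of the single sample 20, 35, 45
print(sigma2, average_s2)                # the two values agree
```

Individual samples give values of S² above or below σ², but their average over all 125 samples equals σ² exactly.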

2. Solve the Following: Marks 20


Q.1. In a laboratory experiment on correlation research study, the equations to the two
regression lines were found to be 𝟐𝒙 − 𝒚 + 𝟏 = 𝟎 and 𝟑𝒙 − 𝟐𝒚 + 𝟕 = 𝟎. Find the mean
of x and y. Also work out the values of regression coefficients and correlation coefficient
between the two variables x and y. (Marks 10)
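Both regression lines pass through the point of means $(\bar{X}, \bar{Y})$, so solving the two regression equations gives the mean values of X and Y:
2x − y = −1 …………………eq(1)
3x − 2y = −7 …………………eq(2)

From eq(1), y = 2x + 1. Substituting into eq(2): 3x − 2(2x + 1) = −7, so

x = 5, y = 11.

∴ The regression lines pass through the means $\bar{X} = 5$ and $\bar{Y} = 11$.

Take the regression equation of Y on X as 3x − 2y = −7.
2y = 3x + 7
$y = \frac{1}{2}(3x + 7)$
$y = \frac{3}{2}x + \frac{7}{2}$
∴ $b_{yx} = \frac{3}{2}$

Take the regression equation of X on Y as 2x − y = −1.
2x = y − 1
$x = \frac{1}{2}(y - 1)$
$x = \frac{1}{2}y - \frac{1}{2}$
∴ $b_{xy} = \frac{1}{2}$

This assignment is valid because $b_{yx} \cdot b_{xy} = \frac{3}{4} \le 1$; the opposite assignment would give $2 \times \frac{2}{3} = \frac{4}{3} > 1$, which is impossible.
The regression coefficients are positive, so r takes the positive sign.

$r = \pm\sqrt{b_{xy} \cdot b_{yx}} = \pm\sqrt{\frac{1}{2} \times \frac{3}{2}}$
$= \sqrt{\frac{3}{4}}$
$= 0.8660$
∴ r = 0.8660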
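The steps of this kind of problem (solve for the means, decide which line is Y on X, then take the signed square root) can be sketched in Python. The function name and the `(a, b, c)` encoding of a line `a*x + b*y + c = 0` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import sqrt


def analyse_regression_lines(line1, line2):
    """Each line is a tuple (a, b, c) meaning a*x + b*y + c = 0."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # The lines intersect at the point of means (Cramer's rule).
    det = a1 * b2 - a2 * b1
    x_mean = Fraction(b1 * c2 - b2 * c1, det)
    y_mean = Fraction(a2 * c1 - a1 * c2, det)

    # Try both assignments; the valid one has 0 <= b_yx * b_xy <= 1.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = Fraction(-y_on_x[0], y_on_x[1])  # slope of y as a function of x
        b_xy = Fraction(-x_on_y[1], x_on_y[0])  # slope of x as a function of y
        prod = b_yx * b_xy
        if 0 <= prod <= 1:
            sign = 1 if b_yx > 0 else -1        # r shares the coefficients' sign
            return x_mean, y_mean, b_yx, b_xy, sign * sqrt(prod)
    raise ValueError("no valid assignment of the two lines")


# 2x - y + 1 = 0 and 3x - 2y + 7 = 0
print(analyse_regression_lines((2, -1, 1), (3, -2, 7)))
```

The check on the product of the coefficients is what rules out the wrong assignment of the two lines.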

Q.2. The frequency distribution of scores obtained by 250 candidates in an entrance test is as
follows. Draw a less than and more than frequency curve(ogive) to represent the given
data. also, what is the significance of the point of intersection of the two ogive curves?
(Marks 10)
Scores Number of candidates
400 – 450 25
450 – 500 30
500 – 550 45
550 – 600 37
600 – 650 30
650 – 700 33
700 – 750 15
750 – 800 35

We make a less than or more than cumulative frequency table:


Scores Number of candidates Less than C.F More than C.F
400 – 450 25 25 225 + 25 = 250
450 – 500 30 25 + 30 = 55 195 + 30 = 225
500 – 550 45 55 + 45 = 100 150 + 45 = 195
550 – 600 37 100 + 37 = 137 113 + 37 = 150
600 – 650 30 137 + 30 = 167 83 + 30 = 113
650 – 700 33 167 + 33 = 200 50 + 33 = 83
700 – 750 15 200 + 15 = 215 35 + 15 = 50
750 – 800 35 215 + 35 = 250 35

Curves:

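∴ The two ogives intersect at the median of the distribution: the x-coordinate of the point of
intersection is the score below which (and above which) half of the candidates, i.e. 125, fall.
Here it is approximately 580 (by interpolation within the 550 – 600 class, 550 + (125 − 100)/37 × 50 ≈ 583.8).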
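The two cumulative-frequency columns, and the median read off at the intersection of the ogives, can be reproduced with a short Python sketch. The variable names are illustrative; the median uses the usual linear interpolation within the median class.

```python
from itertools import accumulate

lower_bounds = [400, 450, 500, 550, 600, 650, 700, 750]
width = 50
freq = [25, 30, 45, 37, 30, 33, 15, 35]

# "Less than" C.F. accumulates from the first class, "more than" from the last.
less_than = list(accumulate(freq))
more_than = list(accumulate(reversed(freq)))[::-1]

# The ogives cross where the cumulative frequency equals N / 2.
half = sum(freq) / 2
idx = next(i for i, cf in enumerate(less_than) if cf >= half)
cf_before = less_than[idx - 1] if idx > 0 else 0
median = lower_bounds[idx] + (half - cf_before) / freq[idx] * width

print(less_than)
print(more_than)
print(round(median, 2))
```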
3. Solve the Following: Marks 20
Q.1. The following table gives the age of cars of a certain make and annual maintenance
costs. Obtain the regression equation for maintenance costs, taking age of the car as the
independent variable. Also, find the maintenance cost for age of the car = 5 years.
Age of Cars Maintenance Cost
(in Years) (In thousands of rupees)
2 10
4 20
6 25
8 30
(Marks 10)

The regression equation is y = a + bx, where x is the age of the car and y is the maintenance cost.

$\bar{X} = \frac{\sum X}{n} = \frac{20}{4} = 5,\quad \bar{Y} = \frac{\sum Y}{n} = \frac{85}{4} = 21.25,\quad n = 4$
Slope: $b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$

X | Y | X² | XY
2 | 10 | 4 | 20
4 | 20 | 16 | 80
6 | 25 | 36 | 150
8 | 30 | 64 | 240
∑X = 20 | ∑Y = 85 | ∑X² = 120 | ∑XY = 490

∴ $b = \frac{490 - 4 \times 5 \times 21.25}{120 - 4 \times 5^2} = \frac{65}{20} = 3.25$

∴ $a = \bar{Y} - b\bar{X} = 21.25 - 3.25 \times 5 = 5$

∴ y = 5 + 3.25x

Maintenance cost for a 5-year-old car:

Given x = 5,

∴ y = 5 + 3.25 × 5 = 21.25 (in thousands of rupees) = 21.25 × 1000 = Rs. 21,250.

∴ The maintenance cost for a 5-year-old car is Rs. 21,250.
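As a cross-check, the slope and intercept can be computed from the same sums in Python. This is a minimal sketch using the formula quoted in the solution; `fit_line` is a name chosen for the example.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x from the raw sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


ages = [2, 4, 6, 8]         # age of car in years
costs = [10, 20, 25, 30]    # maintenance cost in thousands of rupees

a, b = fit_line(ages, costs)
predicted = a + b * 5       # cost for a 5-year-old car, in thousands of rupees
print(a, b, predicted)
```

Because the table gives costs in thousands of rupees, the predicted value is also in thousands of rupees.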
Q.2. Explain with illustration the concept of Point Estimation. (Marks 10)
Quantity | Sample | Population
Size | n | N
Mean | $\bar{X}$ | μ
Standard deviation | S.D. | σ
Variance | S² or s² (capital S for biased, small s for unbiased) | σ²
Proportion | p | π
Standard error | $S_{\bar{x}} = \frac{S}{\sqrt{n}}$ | −

1) A point estimator is a function of the sample data that gives a single value as an
approximation of an unknown population parameter.
2) In point estimation, we find the statistic which may be used to replace an unknown
parameter for all practical purposes.
3) A good estimator is one which is as close to the true value of the parameter as possible.
4) The sample data of a population is used to find a point estimate or a statistic that can
act as the best estimate of an unknown parameter of the population.
5) The maximum likelihood method is a popular way to obtain point estimators.
This method uses differential calculus to find the parameter value that maximises the
likelihood of the observed sample.
6) Following are the four characteristics of point estimation:
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency
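As an illustration of point estimation, the sketch below computes point estimates of the population mean, variance and standard error from a small sample. The five weights are hypothetical values invented for the example.

```python
from math import sqrt
from statistics import mean, variance

sample = [52, 55, 58, 60, 65]   # hypothetical sample of weights in kg
n = len(sample)

x_bar = mean(sample)            # point estimate of the population mean
s2 = variance(sample)           # unbiased point estimate of the variance (divisor n - 1)
std_error = sqrt(s2) / sqrt(n)  # estimated standard error of the mean

print(x_bar, s2, round(std_error, 4))
```

Each result is a single number standing in for an unknown parameter, which is what distinguishes a point estimate from an interval estimate.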

4. Solve the Following Marks 20


Q.1. Following is the data about the weights in Kgs of 10 Shipments(X 1), the distances they
were moved(X2) and the damage that was incurred (Y).
Shipment Damage Weights in Kgs Distance moved in Km
(thousands of RS) (X1) (X2)
(Y)
1 12 17 10
2 15 15 6
3 14 15 10
4 19 10 21
5 8 13 8
6 16 15 13
7 15 11 9
8 25 6 25
9 10 15 10
10 11 7 8
i. Fit the regression 𝒀̂ = 𝒂 + 𝒃𝟏 𝑿 𝟏 + 𝒃𝟐 𝑿 𝟐 (Marks 5)
ii. Find the coefficient of multiple determination (R 2). (Marks 2)
iii. Also test the significance of regression (Given the appropriate Table value,
F = 9.55, for a Significance level of 𝜶 = 𝟎. 𝟎𝟏) (Marks 3)

∴ ∑Y = 145 ∴ ∑X1 = 124 ∴ ∑X2 = 120 ∴ n = 10

i) The normal equations are:
$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$ …………………… eq(1)
$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$ …………………… eq(2)
$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$ …………………… eq(3)

Y | X1 | X2 | X1Y | X2Y | X1² | X2² | X1X2
12 | 17 | 10 | 204 | 120 | 289 | 100 | 170
15 | 15 | 6 | 225 | 90 | 225 | 36 | 90
14 | 15 | 10 | 210 | 140 | 225 | 100 | 150
19 | 10 | 21 | 190 | 399 | 100 | 441 | 210
8 | 13 | 8 | 104 | 64 | 169 | 64 | 104
16 | 15 | 13 | 240 | 208 | 225 | 169 | 195
15 | 11 | 9 | 165 | 135 | 121 | 81 | 99
25 | 6 | 25 | 150 | 625 | 36 | 625 | 150
10 | 15 | 10 | 150 | 100 | 225 | 100 | 150
11 | 7 | 8 | 77 | 88 | 49 | 64 | 56
∑Y = 145 | ∑X1 = 124 | ∑X2 = 120 | ∑X1Y = 1715 | ∑X2Y = 1969 | ∑X1² = 1664 | ∑X2² = 1780 | ∑X1X2 = 1374

145 = 10a + 124b1 + 120b2 …………………… eq(1)
1715 = 124a + 1664b1 + 1374b2 …………………… eq(2)
1969 = 120a + 1374b1 + 1780b2 …………………… eq(3)
Solving the three equations:
∴ a = 7.5757
∴ b1 = −0.0705
∴ b2 = 0.6499
$\hat{Y} = 7.58 - 0.07X_1 + 0.65X_2$

ii) $R^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
$\bar{Y} = \frac{\sum Y}{n} = \frac{145}{10} = 14.5$
(Fitted values use the unrounded coefficients; entries are rounded to two decimals.)
Yi | Yi − Ȳ | Ŷi | Yi − Ŷi | (Yi − Ȳ)² | (Yi − Ŷi)²
12 | −2.5 | 12.88 | −0.88 | 6.25 | 0.77
15 | 0.5 | 10.42 | 4.58 | 0.25 | 21.00
14 | −0.5 | 13.02 | 0.98 | 0.25 | 0.97
19 | 4.5 | 20.52 | −1.52 | 20.25 | 2.30
8 | −6.5 | 11.86 | −3.86 | 42.25 | 14.89
16 | 1.5 | 14.97 | 1.03 | 2.25 | 1.07
15 | 0.5 | 12.65 | 2.35 | 0.25 | 5.53
25 | 10.5 | 23.40 | 1.60 | 110.25 | 2.56
10 | −4.5 | 13.02 | −3.02 | 20.25 | 9.10
11 | −3.5 | 12.28 | −1.28 | 12.25 | 1.64
∑ = 145 | ∑ = 0 | ∑ = 145.00 | ∑ = 0.00 | ∑ = 214.5 | ∑ = 59.82
$R^2 = \frac{214.5 - 59.82}{214.5} = \frac{154.68}{214.5}$
$R^2 = 0.7211$

iii) $F_\alpha$ at the 1% level of significance is 9.55.

$F = \frac{\left[\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2\right] / p}{\sum (Y_i - \hat{Y}_i)^2 / (n - p - 1)}$
where p is the number of independent variables (b's), p = 2.
∴ $F = \frac{154.68 / 2}{59.82 / (10 - 2 - 1)} = \frac{77.34}{8.55} = 9.05$
∴ F = 9.05
∴ $F < F_\alpha$, 9.05 < 9.55, so the regression model is not significant at the 1% level.
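Because solving three normal equations by hand is error-prone, a short pure-Python cross-check is useful. The sketch below solves the two-predictor case through the centred sums of squares and products (an equivalent form of the normal equations) and then computes R² and F; the function name is an assumption for the example.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = a + b1*x1 + b2*x2, plus R^2 and the F statistic."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n

    # Centred sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    syy = sum((w - my) ** 2 for w in y)

    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2

    sse = sum((w - (a + b1 * u + b2 * v)) ** 2 for w, u, v in zip(y, x1, x2))
    p = 2                                   # number of independent variables
    r2 = 1 - sse / syy
    f = ((syy - sse) / p) / (sse / (n - p - 1))
    return a, b1, b2, r2, f


damage = [12, 15, 14, 19, 8, 16, 15, 25, 10, 11]
weight = [17, 15, 15, 10, 13, 15, 11, 6, 15, 7]
distance = [10, 6, 10, 21, 8, 13, 9, 25, 10, 8]

a, b1, b2, r2, f = fit_two_predictors(damage, weight, distance)
print(round(a, 4), round(b1, 4), round(b2, 4), round(r2, 4), round(f, 2))
```

A valid R² always lies between 0 and 1, so a negative value in a hand calculation signals an arithmetic slip somewhere upstream.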

Q.2. Explain Primary data and Secondary data in detail. (Marks 10)
Parameter | Primary Data | Secondary Data
1. Meaning | Data collected by the researcher themselves. | Data collected by other people.
2. Originality | Original and unique information. | Not original or unique information.
3. Adjustment | Does not need adjustment; it is focused. | Needs adjustment to suit the actual aim.
4. Sources | Observations, surveys, experiments. | Internal records, government-published data, etc.
5. Type of data | Qualitative data | Quantitative data
6. Methods | Observation, experiment, interview, etc. | Desk research, searching online, etc.
7. Reliability | More reliable | Less reliable
8. Capability | More capable of solving a problem. | Less capable of solving a problem.
9. Time consumed | More time consuming | Less time consuming
10. Cost-effectiveness | Costly | Economical
11. Suitability | More suitable | May or may not be suitable
12. Need of investigators | Needs a team of trained investigators. | Does not need a team of investigators.
13. Collected when | Secondary data is inadequate. | Before primary data is collected.
5. Solve the Following Marks 20
Q.1. Given 𝒓𝟏𝟐 = 𝟎. 𝟕, 𝒓𝟏𝟑 = 𝟎. 𝟔𝟏 and 𝒓𝟐𝟑 = 𝟎. 𝟒. (Marks 10)
Compute:
i. 𝒓𝟐𝟑.𝟏
ii. 𝒓𝟏𝟑.𝟐
iii. 𝒓𝟏𝟐.𝟑
1) $r_{23.1}$
$r_{23.1} = \frac{r_{23} - r_{21} r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}}$
$= \frac{0.4 - (0.7)(0.61)}{\sqrt{(1 - 0.7^2)(1 - 0.61^2)}}$
$= \frac{0.4 - 0.427}{\sqrt{(1 - 0.49)(1 - 0.3721)}}$
$= \frac{-0.027}{\sqrt{0.51 \times 0.6279}}$
$= \frac{-0.027}{0.566}$
∴ $r_{23.1} = -0.048$

2) $r_{13.2}$
$r_{13.2} = \frac{r_{13} - r_{12} r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}}$
$= \frac{0.61 - (0.7)(0.4)}{\sqrt{(1 - 0.7^2)(1 - 0.4^2)}}$
$= \frac{0.61 - 0.28}{\sqrt{(1 - 0.49)(1 - 0.16)}}$
$= \frac{0.33}{\sqrt{0.51 \times 0.84}}$
$= \frac{0.33}{0.6545}$
∴ $r_{13.2} = 0.50$

3) $r_{12.3}$
$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$= \frac{0.7 - (0.61)(0.4)}{\sqrt{(1 - 0.61^2)(1 - 0.4^2)}}$
$= \frac{0.7 - 0.244}{\sqrt{(1 - 0.3721)(1 - 0.16)}}$
$= \frac{0.456}{\sqrt{0.6279 \times 0.84}}$
$= \frac{0.456}{0.726}$
∴ $r_{12.3} = 0.628$
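All three partial correlations use the same formula with the indices permuted, so a single helper covers them. This is a minimal sketch; `partial_corr` and its argument names are chosen for the illustration.

```python
from math import sqrt


def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant (first-order partial)."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


r12, r13, r23 = 0.7, 0.61, 0.4

r23_1 = partial_corr(r23, r12, r13)  # variables 2 and 3, controlling for 1
r13_2 = partial_corr(r13, r12, r23)  # variables 1 and 3, controlling for 2
r12_3 = partial_corr(r12, r13, r23)  # variables 1 and 2, controlling for 3

print(round(r23_1, 4), round(r13_2, 4), round(r12_3, 4))
```

The first argument is always the correlation of the pair of interest; the other two are each variable's correlation with the controlled variable.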
Q.2. Differentiate between the following pair of concepts: (Marks 10)
i. Critical Region and Region of acceptance (Marks 5)
Sr. no. | Critical Region | Region of acceptance
1. | Represents the range of values of the test statistic where the null hypothesis is rejected. | Represents the range of values of the test statistic where the null hypothesis is not rejected.
2. | Also known as the rejection region. | Also known as the non-rejection region.
3. | Determined based on the chosen significance level (alpha) of the test. | Comprises all values not included in the critical region.
4. | Contains extreme outcomes that provide evidence against the null hypothesis. | Indicates that the data does not provide sufficient evidence to reject the null hypothesis.
5. | If the test statistic falls within this region, the null hypothesis is rejected. | If the test statistic falls within this region, the null hypothesis is retained.
6. | Its size is directly related to the chosen significance level. | Its size is complementary to the size of the critical region.
7. | Helps in identifying statistically significant results. | Helps identify cases where there is insufficient evidence to reject the null hypothesis.
8. | Typically depicted in the tail(s) of the sampling distribution. | Typically represents the bulk of the sampling distribution.

ii. Null Hypothesis and Alternative Hypothesis (Marks 5)


Sr. No. | Null Hypothesis | Alternative Hypothesis
1) | A statement that there is no relation between two variables. | A statement that there is some statistical relation between two variables.
2) | Researchers try to reject or disprove it. | Researchers try to find evidence in support of it.
3) | Indirect and implicit. | Direct and explicit.
4) | If the p-value is less than α (equivalently, |Z calculated| > |Z α|), the null hypothesis is rejected. | If the p-value is less than α, the alternative hypothesis is accepted.
5) | Denoted by $H_0$. | Denoted by $H_1$.
6) | Symbols (equality): =, ≥, ≤ | Symbols (inequality): ≠, >, <
Note: The choice of test depends on the sample size, not on the hypothesis: a Z-test is used for
large samples (n ≥ 30) and a t-test for small samples (n < 30).
6. Write short note on Marks 20
i. Pie chart and its advantages and Disadvantages (Marks 5)
1) Pie Chart is a pictorial representation of the data.
2) It uses a circle to represent the data and is hence also called a Circle Graph.
3) In a Pie Chart, we present the data by dividing the whole circle into smaller slices or
sectors, and each slice or sector represents specific data.
4) Advantages:
1. Pie chart is easily understood and comprehended.
2. Visual representation of data in a pie chart is done as a fractional part of a
whole.
3. Pie chart provides an effective mode of communication to all types of audiences.
4. Pie chart provides a better comparison of data for the audience.
5) Disadvantages:
1. With too many categories, a pie chart becomes less effective.
2. For multiple data sets, a series of pie charts is needed to compare them.
3. It is difficult for readers to analyse and assimilate exact values from a pie
chart.

ii. Method of moments (Marks 5)


1) The method of moments is a technique for estimating the parameters of a statistical
model.
2) It works by finding values of the parameters that result in a match between the sample
moments and the population moments.
3) The advantage of the method of moments is that it is quite easy to use.
4) However, the quality of the resulting estimates is often not very good.
5) Suppose a random variable X has density $f(x|\theta)$, understood as a
probability mass function when the random variable is discrete.
The k-th theoretical moment of this random variable is defined as
$\mu_k = E(X^k) = \int x^k f(x|\theta)\,dx$
or
$\mu_k = E(X^k) = \sum_x x^k f(x|\theta)$.
6) If $X_1, \dots, X_n$ are i.i.d. random variables from that distribution, the k-th sample
moment is defined as
$m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$,
thus $m_k$ can be viewed as an estimator of $\mu_k$.
By the law of large numbers, $m_k \to \mu_k$ in probability as $n \to \infty$.
If we equate $\mu_k$ to $m_k$, we usually get an equation in the unknown parameter.
7) Solving this equation gives the estimator of the unknown parameter.
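As a small worked instance, the sketch below applies the method of moments to a normal model, an assumption chosen for illustration: equating the first two theoretical moments (μ and σ² + μ²) to the first two sample moments gives estimates of the mean and variance. The data values are hypothetical.

```python
def sample_moment(data, k):
    """k-th sample moment m_k = (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in data) / len(data)


def normal_method_of_moments(data):
    """Solve mu = m1 and sigma^2 + mu^2 = m2 for (mu, sigma^2)."""
    m1 = sample_moment(data, 1)
    m2 = sample_moment(data, 2)
    return m1, m2 - m1 ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
mu_hat, sigma2_hat = normal_method_of_moments(data)
print(mu_hat, sigma2_hat)
```

Note that the variance estimate obtained this way uses divisor n, so it is the biased version of the sample variance.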
iii. Multiple Regression (Marks 5)
1) Multiple regression is a statistical technique that can be used to analyze the
relationship between a single dependent variable and several independent variables.
2) The objective of multiple regression analysis is to use the independent variables,
whose values are known, to predict the value of the single dependent variable.
3) Each predictor is weighted, the weights denoting its relative contribution to
the overall prediction:
$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$
4) Here Y is the dependent variable, and $X_1, \dots, X_n$ are the n independent variables.
5) In calculating the weights, 𝑎, 𝑏1 , … , 𝑏𝑛 , regression analysis ensures maximal prediction
of the dependent variable from the set of independent variables.
6) This is usually done by least squares estimation.
7) This approach can be applied to analyze multivariate time series data when one of the
variables is dependent on a set of other variables.
8) We can model the dependent variable Y on the set of independent variables.

iv. Neyman Pearson Lemma (Marks 5)


1) The Neyman-Pearson Lemma gives strong guidance about how to choose hypothesis
tests.
2) The Neyman-Pearson Lemma is an important result that gives conditions for a
hypothesis test to be uniformly most powerful.
3) That is, among all tests with false positive rate α, the test has the highest probability
of rejecting the null hypothesis when the alternative is true.
4) More formally, consider testing two simple hypotheses:
$H_0: \theta = \theta_0$
$H_1: \theta = \theta_1$
5) The Neyman-Pearson Lemma says a test is the uniformly most powerful test among
α-level tests if it rejects $H_0$ if and only if
$\frac{f_X(x; \theta_1)}{f_X(x; \theta_0)} > k$
for some $k \in \mathbb{R}$, where
$\alpha = P_{\theta_0}\left[\frac{f_X(x; \theta_1)}{f_X(x; \theta_0)} > k\right]$
December 2023
1. Answer the Following Marks 20
Q.1. Define “Statistics”. Explain Uses and Limitations of Statistics. (Marks 5)
1) Statistics is defined as the collection, compilation, analysis and interpretation of numerical
data.
2) Statistics is the science of data.
3) Statistics helps in gathering information about the appropriate quantitative data.
4) It depicts complex data in graphical, tabular and diagrammatic form so that it can be
understood easily.
5) It provides an exact description and a better understanding of the data.
6) Uses:
1. Forecasting
2. Financial Analysis
3. Government
4. Designing Surveys
5. Economics
6. Quality Control
7. Health Care
8. Data Analysis
9. Sports
10. Probability
11. Politics
12. Research
7) Limitations of Statistics:
• Statistical approaches are best suited to quantitative results.
• Statistics cannot be applied to heterogeneous data.
• If adequate care is not taken in gathering, analysing and interpreting the data,
statistical findings can be misleading.
• Statistical data can be treated effectively only by a person who has professional
knowledge of statistics.
• Statistical judgements are subject to certain errors.

Q.2. A random sample of size 100 has a standard deviation of 5. What can you say about the maximum
error with 95% confidence (Z = 1.96)? (Marks 5)
∴ n = 100, σ = 5, Z at 95% confidence = 1.96.
Maximum error:
$E = Z \times \frac{\sigma}{\sqrt{n}}$
$= 1.96 \times \frac{5}{\sqrt{100}}$
$= 1.96 \times 0.5$
$= 0.98$
∴ With 95% confidence, the maximum error is 0.98, i.e. the sample mean lies within plus or minus 0.98 of the population mean.
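The same calculation can be sketched in Python, taking the Z value from the standard normal distribution instead of a table. The function name `max_error` is an assumption for the example.

```python
from math import sqrt
from statistics import NormalDist


def max_error(sigma, n, confidence):
    """Maximum error of the mean: z * sigma / sqrt(n) for a two-sided interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sigma / sqrt(n)


error = max_error(sigma=5, n=100, confidence=0.95)
print(round(error, 2))
```

Quadrupling the sample size halves the maximum error, since the error shrinks with the square root of n.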
Q.3. What are assumptions of Multiple Linear Regression? (Marks 5)
There are a number of assumptions that should be assessed before performing a multiple
regression analysis:
1) The dependent variable (the variable of interest) needs to be measured on a continuous
scale.
2) There are two or more independent variables. These can be measured using either
continuous or categorical means.
3) The three or more variables of interest should have a linear relationship, which
you can check by using a scatterplot.
4) The data should have homoscedasticity. In other words, the spread of the residuals
around the line of best fit stays roughly constant across the range of predicted
values. Homoscedasticity can be checked by plotting the standardised residuals
against the unstandardised predicted values.
5) The data should not have two or more independent variables that are highly
correlated. This is called multicollinearity, which can be checked using
variance inflation factor (VIF) values. A high VIF indicates that the associated independent
variable is highly collinear with the other variables in the model.
6) There should be no spurious outliers.
7) The residuals (errors) should be approximately normally distributed. This can be
checked by a histogram (with a superimposed normal curve) and by plotting the
standardised residuals using either a P-P plot or a normal Q-Q plot.

Q.4. Distinguish between Null and Alternative hypothesis. (Marks 5)


Sr. No. | Null Hypothesis | Alternative Hypothesis
1) | A statement that there is no relation between two variables. | A statement that there is some statistical relation between two variables.
2) | Researchers try to reject or disprove it. | Researchers try to find evidence in support of it.
3) | Indirect and implicit. | Direct and explicit.
4) | If the p-value is less than α (equivalently, |Z calculated| > |Z α|), the null hypothesis is rejected. | If the p-value is less than α, the alternative hypothesis is accepted.
5) | Denoted by $H_0$. | Denoted by $H_1$.
6) | Symbols (equality): =, ≥, ≤ | Symbols (inequality): ≠, >, <
Note: The choice of test depends on the sample size, not on the hypothesis: a Z-test is used for
large samples (n ≥ 30) and a t-test for small samples (n < 30).
2. Answer the Following Marks 20
Q.1. Represent the following data by a percentage sub-divided bar diagram. (Marks 10)
Item of Expenditure Family A Family B
Income Rs 500 Income Rs 300
Food 150 150
Clothing 125 60
Education 25 50
Miscellaneous 190 70
Savings or Deficits +10 -30
Q.2. Distinguish between primary data and secondary. What precautions should be taken in
the use of secondary data. (Marks 10)
Parameter | Primary Data | Secondary Data
1. Meaning | Data collected by the researcher themselves. | Data collected by other people.
2. Originality | Original and unique information. | Not original or unique information.
3. Adjustment | Does not need adjustment; it is focused. | Needs adjustment to suit the actual aim.
4. Sources | Observations, surveys, experiments. | Internal records, government-published data, etc.
5. Type of data | Qualitative data | Quantitative data
6. Methods | Observation, experiment, interview, etc. | Desk research, searching online, etc.
7. Reliability | More reliable | Less reliable
8. Capability | More capable of solving a problem. | Less capable of solving a problem.
9. Time consumed | More time consuming | Less time consuming
10. Cost-effectiveness | Costly | Economical
11. Suitability | More suitable | May or may not be suitable
12. Need of investigators | Needs a team of trained investigators. | Does not need a team of investigators.
13. Collected when | Secondary data is inadequate. | Before primary data is collected.
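The following precautions should be taken in the use of secondary data:
• Suitability: check that the data suit the purpose of the investigation.
• Adequacy: check that the data are adequate for the study.
• Definition of units: check how the units were defined.
• Degree of accuracy: check the degree of accuracy of the data.
• Time and conditions: check when and under what conditions the facts were collected.
• Homogeneous conditions: check that the data were collected under homogeneous conditions.
• Comparison: check that the data are comparable.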
3. Answer the Following Marks 20
Q.1. The following Table gives the frequency distribution of the weekly wages (in ‘00RS.) of
100 workers in factory. Draw the Histogram and frequency polygon of the distribution.
(Marks 10)
Weekly wages ('00 Rs.) | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 | Total
No. of workers | 4 | 5 | 12 | 23 | 31 | 10 | 8 | 5 | 2 | 100
Q.2. The equation of two lines of regression obtained in correlation analysis are given below:
𝟐𝑿 = 𝟖 − 𝟑𝒀 and 𝟐𝒀 = 𝟓 − 𝑿. Obtain the value of the correlation coefficient.
(Marks 10)
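Let the regression line of X on Y be
2X = 8 − 3Y
$X = \frac{8}{2} - \frac{3}{2}Y$
X = −1.5Y + 4
∴ $b_{xy} = -1.5$, i.e. $-\frac{3}{2}$

Let the regression line of Y on X be
2Y = 5 − X
$Y = \frac{5}{2} - \frac{1}{2}X$
Y = −0.5X + 2.5
∴ $b_{yx} = -0.5$, i.e. $-\frac{1}{2}$
This assignment is valid because $b_{xy} \times b_{yx} = 0.75 \le 1$.
Both regression coefficients are negative, so the correlation coefficient takes the negative sign:
$r = \pm\sqrt{b_{xy} \times b_{yx}}$
$= \pm\sqrt{\left(-\frac{3}{2}\right) \times \left(-\frac{1}{2}\right)}$
$= \pm 0.866$
∴ r = −0.866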

4. Answer the Following Marks 20


Q.1. From the data given below find: (Marks 10)
a) The Two regression coefficients
b) The Two regression equations
c) The coefficient of correlation between the marks in Economics and Statistics
d) The most likely marks in Statistics if marks in Economics are 30.

Marks in 25 28 35 32 31 36 29 38 34 32
Economics
Marks in 43 46 49 41 36 32 31 30 33 39
Statistics

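∴ $\bar{X} = \frac{\sum X}{N} = \frac{320}{10} = 32$
∴ $\bar{Y} = \frac{\sum Y}{N} = \frac{380}{10} = 38$
X | Y | x = X − 32 | y = Y − 38 | x² | y² | xy
25 | 43 | −7 | 5 | 49 | 25 | −35
28 | 46 | −4 | 8 | 16 | 64 | −32
35 | 49 | 3 | 11 | 9 | 121 | 33
32 | 41 | 0 | 3 | 0 | 9 | 0
31 | 36 | −1 | −2 | 1 | 4 | 2
36 | 32 | 4 | −6 | 16 | 36 | −24
29 | 31 | −3 | −7 | 9 | 49 | 21
38 | 30 | 6 | −8 | 36 | 64 | −48
34 | 33 | 2 | −5 | 4 | 25 | −10
32 | 39 | 0 | 1 | 0 | 1 | 0
∑X = 320 | ∑Y = 380 | ∑x = 0 | ∑y = 0 | ∑x² = 140 | ∑y² = 398 | ∑xy = −93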

a) Two regression coefficients:


1) Regression coefficient of X on Y:
$b_{xy} = \frac{N \sum xy - (\sum x)(\sum y)}{N \sum y^2 - (\sum y)^2}$
$= \frac{10 \times (-93) - 0 \times 0}{10 \times 398 - 0^2}$
$= \frac{-930}{3980}$
∴ $b_{xy} = -0.2336$

2) Regression coefficient of Y on X:
$b_{yx} = \frac{N \sum xy - (\sum x)(\sum y)}{N \sum x^2 - (\sum x)^2}$
$= \frac{10 \times (-93) - 0 \times 0}{10 \times 140 - 0^2}$
$= \frac{-930}{1400}$
∴ $b_{yx} = -0.6642$

b) Two regression equations:


1) Regression equation of X on Y:
𝑋 − 𝑋̅ = 𝑏𝑥𝑦 (𝑌 − 𝑌̅)
𝑋 − 32 = −0.2336(𝑌 − 38)
𝑋 − 32 = −0.2336𝑌 + 8.8768
𝑋 = −0.2336𝑌 + 40.8768
∴ The regression equation of X on Y is 𝑋 = −0.2336𝑌 + 40.8768
2) Regression equation of Y on X:
𝑌 − 𝑌̅ = 𝑏𝑦𝑥 (𝑋 − 𝑋̅)
𝑌 − 38 = −0.6642(𝑋 − 32)
𝑌 − 38 = −0.6642𝑋 + 21.2544
𝑌 = −0.6642𝑋 + 59.2544
∴ The regression equation of Y on X is 𝑌 = −0.6642𝑋 + 59.2544

c) coefficient of correlation between the marks in Economics and Statistics:


𝑟 = ±√𝑏𝑥𝑦 × 𝑏𝑦𝑥

= ±√−0.2336 × −0.6642
= ±0.3938
∴ Both regression coefficients are negative, so take negative sign,
𝑟 = −0.3938

d) most likely marks in Statistics if marks in Economics are 30:


𝑋 = 30, 𝑌 =?
𝑌 = −0.6642𝑋 + 59.2544
= −0.6642(30) + 59.2544
= −19.929 + 59.2544
∴ 𝑌 = 39.3254
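Parts (a) to (d) can be cross-checked with a short Python sketch based on deviations from the means. Values computed this way are unrounded, so they can differ from a hand calculation in the last decimal place; the variable names are chosen for the illustration.

```python
from math import sqrt

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]          # X
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # Y

n = len(economics)
x_bar = sum(economics) / n
y_bar = sum(statistics_marks) / n

# Sums of squares and products of deviations from the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(economics, statistics_marks))
sxx = sum((x - x_bar) ** 2 for x in economics)
syy = sum((y - y_bar) ** 2 for y in statistics_marks)

b_yx = sxy / sxx                  # regression coefficient of Y on X
b_xy = sxy / syy                  # regression coefficient of X on Y
r = sxy / sqrt(sxx * syy)         # same magnitude as sqrt(b_xy * b_yx), sign included
predicted = y_bar + b_yx * (30 - x_bar)   # Statistics marks when Economics = 30

print(round(b_xy, 4), round(b_yx, 4), round(r, 4), round(predicted, 2))
```

Computing r directly from the deviation sums gives its sign automatically, so no separate sign rule is needed.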

Q.2. Explain the following point Estimation Properties with Example (Marks 10)
i) Consistency:
1) Consistency means that the estimator gets closer to the parameter's value as the sample
size increases.
2) Thus, a large sample size is required for the estimate to be reliably close to the parameter.
3) When the estimator converges towards the parameter's value as the sample size grows, we
state that the estimator is consistent.
4) Example:
1. Suppose you're estimating the mean height of students in a school.
2. You take random samples of increasing sizes, say 10 students, 50 students, 100
students, and so on.
3. With each increase in sample size, the mean height calculated from the sample
should approach the true mean height of all students in the school.
4. If this happens, the estimator for the mean height is considered consistent.
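The example above can be simulated. The sketch below assumes, purely for illustration, that heights are normally distributed with a true mean of 160 cm and a standard deviation of 10 cm, and shows the sample mean for increasing sample sizes.

```python
import random
from statistics import mean

TRUE_MEAN, SD = 160.0, 10.0   # hypothetical population values


def sample_mean(n, rng):
    """Mean height of a random sample of n simulated students."""
    return mean(rng.gauss(TRUE_MEAN, SD) for _ in range(n))


rng = random.Random(0)        # fixed seed so the run is repeatable
for n in (10, 50, 100, 10_000):
    print(n, round(sample_mean(n, rng), 2))
```

Small samples can land a few centimetres away from the true mean, while the largest sample is expected to fall very close to it.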

ii) Unbiasedness:
1) An estimator is unbiased when its expected value equals the true value of the
parameter, i.e. $E(\hat{\theta}) = \theta$.
2) The variance of an estimator measures how dispersed its values are around that expected value.
3) Among unbiased and consistent estimators, the one with the smallest variance is considered
the most efficient, since it deviates the least when different samples are drawn.
4) But, of course, this also depends on the distribution of the population.
5) Example:
1. Let's say you're estimating the average score of students in a class.
2. You take multiple random samples and calculate the mean score for each
sample.
3. If, on average, these sample means are equal to the true average score of the
entire class, then the estimator is unbiased.
4. This means that sometimes the estimate might be higher than the true value,
and sometimes it might be lower, but on average, it hits the mark.

5. Answer the Following Marks 20


Q.1. The data with regard to the cost of production of 8 different drugs and cost of ingredients
and packaging cost, are as given below: (Marks 10)
Sr. no. Cost of production Cost of ingredients Packaging Cost (Rs.)
(Rs.) (In thousands of Rs.) (X2)
(Y) (X1)
1 100 17 19
2 79 50 54
3 100 90 75
4 129 30 36
5 158 15 16
6 106 20 25
7 58 20 24
8 78 50 53
̂
a) Fit the regression 𝒀 = 𝒂 + 𝒃𝟏 𝑿𝟏 + 𝒃𝟐 𝑿𝟐 (Marks 5)
b) Find the coefficient of multiple determination (R 2). (Marks 2)
c) Also test the significance of regression (Given F = 5.786, for a Significance level of
𝜶 = 𝟎. 𝟎𝟓) (Marks 3)

∴ ∑Y = 808 ∴ ∑X1 = 292 ∴ ∑X2 = 302 ∴ n = 8

i) The normal equations are:
$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$ …………………… eq(1)
$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$ …………………… eq(2)
$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$ …………………… eq(3)

Y | X1 | X2 | X1Y | X2Y | X2² | X1² | X1X2
100 | 17 | 19 | 1700 | 1900 | 361 | 289 | 323
79 | 50 | 54 | 3950 | 4266 | 2916 | 2500 | 2700
100 | 90 | 75 | 9000 | 7500 | 5625 | 8100 | 6750
129 | 30 | 36 | 3870 | 4644 | 1296 | 900 | 1080
158 | 15 | 16 | 2370 | 2528 | 256 | 225 | 240
106 | 20 | 25 | 2120 | 2650 | 625 | 400 | 500
58 | 20 | 24 | 1160 | 1392 | 576 | 400 | 480
78 | 50 | 53 | 3900 | 4134 | 2809 | 2500 | 2650
∑Y = 808 | ∑X1 = 292 | ∑X2 = 302 | ∑X1Y = 28070 | ∑X2Y = 29014 | ∑X2² = 14464 | ∑X1² = 15314 | ∑X1X2 = 14723

808 = 8a + 292b1 + 302b2 …………………… eq(1)
28070 = 292a + 15314b1 + 14723b2 …………………… eq(2)
29014 = 302a + 14723b1 + 14464b2 …………………… eq(3)
Solving the three equations:
∴ a = 137.55
∴ b1 = 2.0035
∴ b2 = −2.9054
$\hat{Y} = 137.55 + 2.00X_1 - 2.91X_2$

ii) $R^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
$\bar{Y} = \frac{\sum Y}{n} = \frac{808}{8} = 101$
(Fitted values use the unrounded coefficients; entries are rounded to two decimals.)
Yi | Yi − Ȳ | Ŷi | Yi − Ŷi | (Yi − Ȳ)² | (Yi − Ŷi)²
100 | −1 | 116.41 | −16.41 | 1 | 269.27
79 | −22 | 80.83 | −1.83 | 484 | 3.36
100 | −1 | 99.96 | 0.04 | 1 | 0.00
129 | 28 | 93.06 | 35.94 | 784 | 1291.54
158 | 57 | 121.12 | 36.88 | 3249 | 1360.22
106 | 5 | 104.99 | 1.01 | 25 | 1.03
58 | −43 | 107.89 | −49.89 | 1849 | 2489.27
78 | −23 | 83.74 | −5.74 | 529 | 32.93
∑ = 808 | ∑ = 0 | ∑ = 808.00 | ∑ = 0.00 | ∑ = 6922 | ∑ = 5447.62

∴ $R^2 = \frac{6922 - 5447.62}{6922} = \frac{1474.38}{6922}$
∴ $R^2 = 0.2130$

iii) $F_\alpha$ at the 0.05 level of significance is 5.786.

$F = \frac{\left[\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2\right] / p}{\sum (Y_i - \hat{Y}_i)^2 / (n - p - 1)}$
where p is the number of independent variables (b's), p = 2.
∴ $F = \frac{1474.38 / 2}{5447.62 / (8 - 2 - 1)} = \frac{737.19}{1089.52} = 0.68$
∴ F = 0.68

∴ $F < F_\alpha$, 0.68 < 5.786, so the regression model is not significant.


Q.2. What is hypothesis testing? (Marks 10)
i) Z-Test for Single Mean
ii) Z-Test for Difference of Mean
1) Hypothesis testing is a type of statistical analysis in which an assumption about a
population parameter is put to the test using sample data.
2) It is used to decide whether the sample provides enough evidence to reject the null
hypothesis at a chosen level of significance.
3) Z-Test for single Mean:
i) The z-test is a statistical test used to determine if a sample mean is significantly
different from a known population mean.
ii) It is used when the population standard deviation is known.
iii) The formula for the single mean z-test is:
$|Z| = \dfrac{|\bar{X} - \mu|}{\mathrm{S.E.}(\bar{X})} = \dfrac{|\bar{X} - \mu|}{\sigma / \sqrt{n}}$, with $\mathrm{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$
Where,
$|Z|$ = The Z-Statistic
$n$ = The Sample Size
$\bar{X}$ = The Sample Mean
$\mu$ = The Population Mean
$\sigma$ = The Population Standard Deviation
$\mathrm{S.E.}$ = The Standard Error
4) Z-Test for Difference of Mean:
i) A z-test is a statistical test to determine whether two population means are
different, or to compare one mean to a hypothesized value, when the variances
are known and the sample size is large.
ii) A z-test is a hypothesis test for data that follows a normal distribution.
iii) The formula for the z-test for the difference of two means is:
$|Z| = \dfrac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Where,
$|Z|$ = The Z-Statistic for the two groups
$\bar{X}_1, \bar{X}_2$ = The sample means of the two groups
$n_1, n_2$ = The sample sizes of the two groups
$\sigma_1, \sigma_2$ = The population standard deviations of the two groups
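The two formulas above can be sketched as small Python functions. The function names and the numbers used in the calls are hypothetical and chosen only for illustration; they do not come from the question paper. The cut-off 1.96 is the usual two-sided critical value at the 5% level.

```python
import math


def z_single_mean(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for one mean: (X-bar - mu) / (sigma / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


def z_two_means(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for the difference of two means with known standard deviations."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical inputs: sample mean 52 against a claimed mean of 50,
# sigma = 10, n = 100; and two such groups compared with each other.
z1 = z_single_mean(52.0, 50.0, 10.0, 100)
z2 = z_two_means(52.0, 50.0, 10.0, 10.0, 100, 100)

CRITICAL_5_PERCENT = 1.96  # two-sided critical value at the 5% level
print(abs(z1) > CRITICAL_5_PERCENT, abs(z2) > CRITICAL_5_PERCENT)
```

The null hypothesis is rejected when $|Z|$ exceeds the critical value, so in this illustration the single-mean test rejects while the two-mean comparison does not.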

6. Answer the Following Marks 20


Q.1. Explain the method of maximum likelihood estimation. (Marks 10)
1) Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the
parameters of a probability distribution.
2) Here's how the method of maximum likelihood estimation works:
1. Formulate the Likelihood Function:
1) Given a statistical model with parameters 𝜃, the likelihood function, denoted
as 𝐿 (𝜃 | 𝑥),measures the probability of observing the given sample data 𝑥 for
different values of the parameters 𝜃.
2) For independent and identically distributed (i.i.d.) data, the likelihood
function is often represented as the product of the probability density
functions (pdf) or probability mass functions (pmf) of the individual data
points:
𝐿(𝜃 | 𝑥) = 𝑓 (𝑥1 , 𝜃) × 𝑓 (𝑥2 , 𝜃) × … … … … × 𝑓(𝑥𝑛 , 𝜃)
3) Alternatively, it can be expressed as the joint probability density function (pdf)
or probability mass function (pmf) of the entire sample 𝑥:
𝐿 (𝜃 | 𝑥) = 𝑓 (𝑥, 𝜃)

2. Maximize the Likelihood Function:


1) The goal of MLE is to find the values of the parameters 𝜃, that maximize the
likelihood function 𝐿(𝜃 | 𝑥).
2) This is typically done by taking the derivative of the likelihood function with
respect to each parameter, setting the derivatives equal to zero, and solving
for the parameter values.
3) In some cases, it might be more convenient to maximize the log-likelihood
function $\ln L(\theta \mid x)$ instead, as it turns the product into a sum, simplifies
the computations and does not change the location of the maximum.

3. Estimate the Parameters:


1) Once the maximum likelihood estimates of the parameters are obtained, they
are used as point estimates for the true parameter values.
2) These estimates are denoted as 𝜃̂ 𝑀𝐿𝐸 and are often accompanied by
standard errors or confidence intervals to quantify the uncertainty associated
with the estimates.

4. Assess the Model:


1) After obtaining the parameter estimates, it's essential to assess the goodness-
of-fit of the model to the data.
2) This can be done using various diagnostic tools, such as residual analysis,
goodness-of-fit tests, and graphical methods.
5. Interpretation:
1) The maximum likelihood estimates provide the parameter values that make
the observed data the most likely under the assumed statistical model.
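2) Under standard regularity conditions these estimates are consistent, meaning that as
the sample size increases they approach the true parameter values with high
probability, and asymptotically efficient, meaning that in large samples their
variance approaches the smallest attainable value.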
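As a small illustration of steps 1 to 3, the sketch below estimates the success probability p of a Bernoulli (coin-toss) model. The sample is hypothetical and the names are illustrative. Setting the derivative of the log-likelihood to zero gives the closed form p̂ = sample mean, and a simple grid search over candidate values is used as a numerical check.

```python
import math

# Hypothetical sample of coin tosses (1 = head, 0 = tail); not from the source.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]


def log_likelihood(p, xs):
    """ln L(p | x) for i.i.d. Bernoulli(p) data: the sum of ln f(x_i; p)."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in xs)


# Closed form from d/dp ln L = 0: the estimate is the sample proportion of heads.
p_hat = sum(sample) / len(sample)

# Numerical check: evaluate the log-likelihood on a grid and keep the maximiser.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, sample))

print(p_hat, p_grid)
```

Working with the log-likelihood avoids multiplying many small probabilities together, which is the practical reason step 2 recommends it.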

Q.2. Explain the Neyman Pearson Lemma. (Marks 10)


1) The Neyman-Pearson Lemma gives strong guidance about how to choose hypothesis
tests.
2) The Neyman-Pearson Lemma is an important result that gives conditions for a
hypothesis test to be most powerful.
3) That is, among all tests whose false positive rate (probability of a Type I error) is at
most $\alpha$, the test has the highest probability of rejecting the null hypothesis when
the alternative is true.
4) More formally, consider testing two simple hypotheses:
$H_0 : \theta = \theta_0$
$H_1 : \theta = \theta_1$
5) The Neyman-Pearson Lemma says a test is the most powerful test among α-level
tests if it rejects $H_0$ if and only if
$\dfrac{f_X(x; \theta_1)}{f_X(x; \theta_0)} > k$
for some constant $k \ge 0$, where
$\alpha = P_{\theta_0}\left[\dfrac{f_X(x; \theta_1)}{f_X(x; \theta_0)} > k\right]$
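A standard worked example may make the lemma concrete. Assume, for illustration only, a sample $x_1, \dots, x_n$ from a normal population with known variance $\sigma^2$, and $\theta_1 > \theta_0$. The likelihood ratio then depends on the data only through the sample mean, so the most powerful test rejects for large $\bar{x}$:

```latex
\begin{align*}
\frac{f(x;\theta_1)}{f(x;\theta_0)}
  &= \exp\left\{ \frac{\theta_1-\theta_0}{\sigma^2}\sum_{i=1}^{n} x_i
       - \frac{n(\theta_1^2-\theta_0^2)}{2\sigma^2} \right\} > k \\
  &\iff \bar{x} > c,
  \qquad c = \theta_0 + z_\alpha \frac{\sigma}{\sqrt{n}}
\end{align*}
```

Here $z_\alpha$ denotes the upper $\alpha$ point of the standard normal distribution, so that $P_{\theta_0}(\bar{x} > c) = \alpha$. This is the one-sided z-test for a single mean.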
December 2022
Q.1. Solve the Following: Marks 20
a) Explain Bar chart with following Example: (Marks 5)
Ex. The following table shows the number of books on different subjects in a library.

| Subject | Phy | Chem | Bio | Hist | Geo | Eng | Math | Comp |
|---|---|---|---|---|---|---|---|---|
| No. of Books | 100 | 125 | 75 | 75 | 50 | 200 | 250 | 175 |

1) A bar chart or bar graph is a chart or graph that presents categorical data with
rectangular bars with heights or lengths proportional to the values that they represent.
The bars can be plotted vertically or horizontally.
2) The types of bar chart or graph are:
• Simple bar diagram
• Percentage bar diagram
• Multiple bar diagram
• Subdivided/Component bar diagram
• Deviation bar diagram
• Broken bar diagram
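For the library example, a simple (horizontal) bar diagram can be sketched in text form. This is only an illustration of bars with lengths proportional to the values: the function name, the scale of one `#` per 25 books, and the label `Geo` for the fifth subject are assumptions made here.

```python
# Number of books per subject, taken from the table in the question.
books = {"Phy": 100, "Chem": 125, "Bio": 75, "Hist": 75,
         "Geo": 50, "Eng": 200, "Math": 250, "Comp": 175}


def bar_chart(values, unit=25):
    """Return a text bar chart in which each '#' stands for `unit` books."""
    lines = []
    for label, count in values.items():
        bar = "#" * (count // unit)
        lines.append(f"{label:<5}| {bar} {count}")
    return "\n".join(lines)


chart = bar_chart(books)
print(chart)
```

The longest bar belongs to Math (250 books) and the shortest to Geo (50 books), which is the comparison a simple bar diagram is meant to make visible.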

b) Equations of the two lines of regression are: 𝒙 + 𝟔𝒚 = 𝟔 and 𝟑𝒙 + 𝟐𝒚 = 𝟏𝟎


Find: (Marks 5)
i) Mean of X and Mean of Y
ii) Regression coefficients 𝒃𝒚𝒙 and 𝒃𝒙𝒚
iii) Correlation coefficient between X and Y
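1) Mean of X and Y:
Both regression lines pass through the point $(\bar{X}, \bar{Y})$, so the means are found by
solving the two equations simultaneously.
$x + 6y = 6$ …….. eq(1)
$3x + 2y = 10$ …….. eq(2)
Solving equations 1 and 2, we get,
∴ $x = 3$ and $y = \dfrac{1}{2}$
∴ The means of X and Y are $\bar{X} = 3$, $\bar{Y} = \dfrac{1}{2}$

2) Regression Coefficients:
1. Regression coefficient of Y on X (taking eq(1) as the line of Y on X):
$x + 6y = 6$
$6y = -x + 6$
$y = -\dfrac{1}{6}x + 1$
∴ The regression coefficient of Y on X is $b_{yx} = -\dfrac{1}{6}$

2. Regression coefficient of X on Y (taking eq(2) as the line of X on Y):
$3x + 2y = 10$
$3x = 10 - 2y$
$x = \dfrac{10}{3} - \dfrac{2}{3}y$
∴ The regression coefficient of X on Y is $b_{xy} = -\dfrac{2}{3}$
Check: $b_{yx} \times b_{xy} = \dfrac{1}{9} \le 1$, so this choice of lines is admissible.

3) Correlation coefficient between X and Y:
$r = \pm\sqrt{b_{yx} \times b_{xy}} = \pm\sqrt{\left(-\dfrac{1}{6}\right) \times \left(-\dfrac{2}{3}\right)} = \pm\sqrt{\dfrac{1}{9}} = \pm\dfrac{1}{3} = \pm 0.3333$
Both regression coefficients are negative, so take the negative sign:
∴ $r = -0.3333$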
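The same steps can be sketched with exact fractions. Each line is stored as coefficients (A, B, C) of A·x + B·y = C; the tuple layout and the names are illustrative assumptions, and the first line is taken as Y on X and the second as X on Y.

```python
import math
from fractions import Fraction

# Regression lines written as A*x + B*y = C.
line_y_on_x = (1, 6, 6)     # x + 6y = 6
line_x_on_y = (3, 2, 10)    # 3x + 2y = 10


def intersection(l1, l2):
    """Solve the two lines simultaneously (Cramer's rule); the solution is (mean X, mean Y)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)


mean_x, mean_y = intersection(line_y_on_x, line_x_on_y)

# Slope of y in terms of x for the Y-on-X line, and of x in terms of y for the X-on-Y line.
b_yx = Fraction(-line_y_on_x[0], line_y_on_x[1])
b_xy = Fraction(-line_x_on_y[1], line_x_on_y[0])

product = b_yx * b_xy
assert 0 <= product <= 1, "swap the roles of the two lines"

# r takes the common sign of the two regression coefficients.
sign = -1 if b_yx < 0 else 1
r = sign * math.sqrt(product)

print(mean_x, mean_y, b_yx, b_xy, round(r, 4))
```

The admissibility check on the product is what decides which equation is the line of Y on X: with the roles swapped, the product would be 9, which is impossible for a squared correlation.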
c) In a certain trivariate distribution: $r_{12} = 0.7$, $r_{23} = 0.6$, $r_{31} = 0.6$. Find the partial
correlation coefficient $r_{12.3}$. (Marks 5)
$r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$= \dfrac{0.7 - (0.6)(0.6)}{\sqrt{(1 - 0.6^2)(1 - 0.6^2)}}$
$= \dfrac{0.7 - 0.36}{\sqrt{(1 - 0.36)(1 - 0.36)}}$
$= \dfrac{0.34}{\sqrt{0.64 \times 0.64}}$
$= \dfrac{0.34}{0.64}$
$= 0.53125$
∴ $r_{12.3} = 0.53125$
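The formula translates directly into a short function. The function name is an illustrative choice; the inputs are the three correlations given in the question.

```python
import math


def partial_correlation(r12, r13, r23):
    """Partial correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


r12_3 = partial_correlation(0.7, 0.6, 0.6)
print(round(r12_3, 5))
```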

d) A survey conducted over the last 25 years indicated that in 10 years the winter was mild, in
8 years it was cold and in the remaining 7 years it was very cold. A company sells 1000
woollen coats in a mild year, 1300 in a cold year and 2000 in a very cold year. You are
required to find the yearly expected profit of the company if a woollen coat costs Rs. 1730
and it is sold to stores for Rs. 2480. (Marks 5)
1) Find the profit per coat, given that a woollen coat costs Rs. 1730 and is sold to stores
for Rs. 2480:
Profit per coat = selling price − cost
= 2480 − 1730
= 750
∴ Profit per coat = Rs. 750
2) Calculate the total number of coats sold in each type of year:
• Mild years: 10 × 1000 = 10000 coats.
• Cold years: 8 × 1300 = 10400 coats.
• Very cold years: 7 × 2000 = 14000 coats.
3) Calculate the total number of coats sold in 25 years:
Total coats = 10000 + 10400 + 14000 = 34400 coats
4) Calculate the total profit on the coats sold in 25 years:
Total profit = 34400 × 750 = Rs. 25800000
5) Calculate the expected profit per year:
Profit per year = 25800000 ÷ 25 = Rs. 1032000
∴ The yearly expected profit of the company is Rs. 1032000.
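The same answer follows from treating the relative frequencies 10/25, 8/25 and 7/25 as probabilities and taking the expected value of the yearly sales. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

years = {"mild": 10, "cold": 8, "very cold": 7}             # years observed out of 25
coats_sold = {"mild": 1000, "cold": 1300, "very cold": 2000}
profit_per_coat = 2480 - 1730                                # selling price minus cost

total_years = sum(years.values())

# E(coats) = sum over winter types of P(type) * coats sold in that type of year.
expected_coats = sum(
    Fraction(years[kind], total_years) * coats_sold[kind] for kind in years
)
expected_profit = expected_coats * profit_per_coat

print(expected_coats, expected_profit)
```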
Q.2. Solve the Following: Marks 20
a) Define the term “Statistics” and discuss its use in business and trade. Also point out its
limitations. (Marks 10)
1) Statistics is defined as the collection, compilation, analysis and interpretation of
numerical data.
2) Statistics is the science of data.
3) Statistics helps in gathering information about the appropriate quantitative data.
4) It depicts complex data in graphical, tabular and diagrammatic form so that it can be
understood easily.
5) It provides an exact description and a better understanding of the data.
6) Use of statistics in business and trade:
Business:
1. Market Research
2. Demand Forecasting
3. Financial Analysis
4. Quality Control
5. Risk Management
6. Performance Measurement

Trade:
1. Market Analysis
2. Risk Assessment
3. Technical Analysis
4. Algorithmic Trading
5. Performance Evaluation

7) Limitations of Statistics:
• Statistical methods are best suited to quantitative data; qualitative characteristics
cannot be studied directly.
• Statistics cannot be applied to heterogeneous data.
• If adequate care is not taken in gathering, analyzing, and interpreting the data,
statistical findings can be misleading.
• Statistical data can be treated effectively only by a person who has professional
knowledge of statistics.
• Certain errors are possible in statistical judgments.
b) What are the various methods of collecting statistical data? Which of these is most
reliable and why? (Marks 10)
Statistical data can be collected either as primary data or as secondary data:

| Parameter | Primary Data | Secondary Data |
|---|---|---|
| 1. Meaning | Data collected by the researcher himself or herself. | Data collected by other people. |
| 2. Originality | Original and unique information. | Not original and unique information. |
| 3. Adjustment | Does not need adjustment; it is focused. | Needs adjustment to suit the actual aim. |
| 4. Sources | Observations, surveys, experiments. | Internal records, government published data, etc. |
| 5. Type of Data | Qualitative data | Quantitative data |
| 6. Methods | Observation, experiment, interview, etc. | Desk research method, searching online, etc. |
| 7. Reliability | More reliable | Less reliable |
| 8. Capability | More capable of solving a problem. | Less capable of solving a problem. |
| 9. Time consumed | More time consuming | Less time consuming |
| 10. Cost-effectiveness | Costly | Economical |
| 11. Suitability | More suitable | May or may not be suitable |
| 12. Need of Investigators | Needs a team of trained investigators. | Does not need a team of investigators. |
| 13. Collected when | Secondary data is inadequate. | Before primary data is collected. |

Of the two, primary data is the more reliable, because it is original, collected first-hand by
the researcher and focused on the actual aim of the study.

Q.3. Solve the Following: Marks 20


a) Find the Mean Deviation from the Median for the following data. (Marks 10)
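| Age of Workers | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50 | 50-55 | 55-60 |
|---|---|---|---|---|---|---|---|---|
| No. of Workers | 120 | 125 | 175 | 160 | 150 | 140 | 100 | 30 |

In the table, $M$ is the median (37.5, as computed after the table).

| Age of Workers | No. of Workers $f_i$ | Cumulative Frequency (c.f.) | Midpoint $x_i$ | $\lvert x_i - M \rvert$ | $f_i \cdot \lvert x_i - M \rvert$ |
|---|---|---|---|---|---|
| 20-25 | 120 | 120 | 22.5 | 15 | 1800 |
| 25-30 | 125 | 245 | 27.5 | 10 | 1250 |
| 30-35 | 175 | 420 | 32.5 | 5 | 875 |
| 35-40 | 160 | 580 | 37.5 | 0 | 0 |
| 40-45 | 150 | 730 | 42.5 | 5 | 750 |
| 45-50 | 140 | 870 | 47.5 | 10 | 1400 |
| 50-55 | 100 | 970 | 52.5 | 15 | 1500 |
| 55-60 | 30 | 1000 | 57.5 | 20 | 600 |
| Total | 1000 | | | | 8175 |

∴ $N = 1000$
∴ $\dfrac{N}{2} = \dfrac{1000}{2} = 500$
∴ Therefore, 35 – 40 is the median class (the first class whose cumulative frequency reaches 500).
∴ $\text{Median} = l + \dfrac{\frac{N}{2} - C}{f} \times h$
Here,
$N = 1000$
$l = 35$ (lower limit of the median class)
$C = 420$ (cumulative frequency of the class preceding the median class)
$f = 160$ (frequency of the median class)
$h = 5$ (class width)
∴ $\text{Median} = 35 + \dfrac{\frac{1000}{2} - 420}{160} \times 5$
$= 35 + \dfrac{500 - 420}{160} \times 5$
$= 35 + 2.5$
$= 37.5$
∴ Median = 37.5
Now find the mean deviation from the median:
$\text{M.D.}(M) = \dfrac{1}{N} \sum_{i=1}^{8} f_i \lvert x_i - M \rvert$
$= \dfrac{1}{1000} \times 8175$
$= 8.175$
∴ M.D.(M) = 8.175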
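The calculation can be sketched in a few lines of Python. The class limits and frequencies are those of the question; the function and variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each age class in the question.
classes = [
    (20, 25, 120), (25, 30, 125), (30, 35, 175), (35, 40, 160),
    (40, 45, 150), (45, 50, 140), (50, 55, 100), (55, 60, 30),
]

total = sum(freq for _, _, freq in classes)


def grouped_median(classes, total):
    """Median = l + ((N/2 - C) / f) * h for the first class whose c.f. reaches N/2."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= total / 2:
            return lower + (total / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no classes given")


median = grouped_median(classes, total)

# Mean deviation from the median, using class midpoints as x_i.
mean_deviation = sum(
    freq * abs((lower + upper) / 2 - median) for lower, upper, freq in classes
) / total

print(median, mean_deviation)
```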
b) A survey of 370 students from Commerce Faculty and 130 students from Science Faculty
revealed that 180 students were studying for only C.A. Examinations, 140 for only
Costing Examinations and 80 for both C.A. and Costing Examinations. The rest had
offered part-time Management Courses. Of those studying for Costing only, 13 were girls
and 90 boys belonged to Commerce Faculty. Out of 80 studying for both C.A. and Costing,
72 were from Commerce Faculty, amongst which 70 were boys. Amongst those who
offered part-time Management Courses, 50 boys were from Science Faculty and 30 boys
and 10 girls from Commerce Faculty. In all there were 110 boys in Science Faculty. Present
the above information in a tabular form. Find the number of students from Science
Faculty studying for part-time Management Courses. (Marks 10)

| Course | Commerce Boys | Commerce Girls | Commerce Total | Science Boys | Science Girls | Science Total | Total Boys | Total Girls | Total |
|---|---|---|---|---|---|---|---|---|---|
| C.A. | 130 | 25 | 155 | 21 | 4 | 25 | 151 | 29 | 180 |
| Costing | 90 | 13 | 103 | 35 | 2 | 37 | 125 | 15 | 140 |
| Both | 70 | 2 | 72 | 4 | 4 | 8 | 74 | 6 | 80 |
| Management | 30 | 10 | 40 | 50 | 10 | 60 | 80 | 20 | 100 |
| Total | 320 | 50 | 370 | 110 | 20 | 130 | 430 | 70 | 500 |

∴ The number of students from Science Faculty studying for part-time Management Courses
is 60 (50 boys and 10 girls).
Q.4. Solve the Following: Marks 20
a) A department store gives in-service training to its salesmen which is followed by a test.
It is considering whether it should terminate the service of any salesmen who does not
do well in the test. The following data give the test scores and sales made by nine
salesmen during a certain period:
| Test Scores | 14 | 19 | 24 | 21 | 26 | 22 | 15 | 20 | 19 |
|---|---|---|---|---|---|---|---|---|---|
| Sales ('00 Rs.) | 31 | 36 | 48 | 37 | 50 | 45 | 33 | 41 | 39 |
Calculate the coefficient of correlation between the test scores and the sales. Does it
indicate that the termination of services of low-test scores is justified? If the firm wants
a minimum sales volume of Rs. 30,000, what is the minimum test score that will ensure
continuation of service? Also estimate that most probable sales volume of a salesmen
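making a score of 28. (Marks 10)
∴ $\bar{X} = \dfrac{\sum X}{n} = \dfrac{180}{9} = 20$ ∴ $\bar{X} = 20$
∴ $\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{360}{9} = 40$ ∴ $\bar{Y} = 40$

| $X$ | $Y$ | $x = X - 20$ | $y = Y - 40$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|---|
| 14 | 31 | -6 | -9 | 36 | 81 | 54 |
| 19 | 36 | -1 | -4 | 1 | 16 | 4 |
| 24 | 48 | 4 | 8 | 16 | 64 | 32 |
| 21 | 37 | 1 | -3 | 1 | 9 | -3 |
| 26 | 50 | 6 | 10 | 36 | 100 | 60 |
| 22 | 45 | 2 | 5 | 4 | 25 | 10 |
| 15 | 33 | -5 | -7 | 25 | 49 | 35 |
| 20 | 41 | 0 | 1 | 0 | 1 | 0 |
| 19 | 39 | -1 | -1 | 1 | 1 | 1 |
| $\sum X = 180$ | $\sum Y = 360$ | $\sum x = 0$ | $\sum y = 0$ | $\sum x^2 = 120$ | $\sum y^2 = 346$ | $\sum xy = 193$ |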

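1) Regression Coefficients:
Here X is the test score, Y is the sales, and $x$, $y$ are deviations from the means, so $\sum x = \sum y = 0$.
1. Regression coefficient of Y on X:
$b_{yx} = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \dfrac{9 \times 193 - 0 \times 0}{9 \times 120 - 0^2} = \dfrac{1737}{1080} = 1.6083$
∴ $b_{yx} = 1.6083$

2. Regression coefficient of X on Y:
$b_{xy} = \dfrac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2} = \dfrac{9 \times 193 - 0 \times 0}{9 \times 346 - 0^2} = \dfrac{1737}{3114} = 0.5578$
∴ $b_{xy} = 0.5578$

2) Regression Equations:
1. Regression equation of Y on X:
∴ $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$Y - 40 = 1.6083(X - 20)$
$Y - 40 = 1.6083X - 32.17$
$Y = 1.6083X + 7.83$
∴ The regression equation of Y on X is $Y = 1.6083X + 7.83$

2. Regression equation of X on Y:
∴ $X - \bar{X} = b_{xy}(Y - \bar{Y})$
$X - 20 = 0.5578(Y - 40)$
$X - 20 = 0.5578Y - 22.31$
$X = 0.5578Y - 2.31$
∴ The regression equation of X on Y is $X = 0.5578Y - 2.31$

3) Coefficient of Correlation:
∴ $r = \pm\sqrt{b_{xy} \times b_{yx}}$
$= \pm\sqrt{0.5578 \times 1.6083}$
$= \pm\sqrt{0.8971}$
$= \pm 0.947$
Both regression coefficients are positive, so take the positive sign:
∴ $r = 0.947$
∴ There is a strong positive correlation between test scores and sales. Hence, the
termination of services of salesmen with low test scores is justified.

4) Find the minimum test score:
The table gives the sales unit as '00 Rs.; on that scale Rs. 30,000 would correspond to
$Y = 300$, far outside the observed range of 31 to 50. The sales figures are therefore read
here as thousands of rupees, so that Rs. 30,000 corresponds to $Y = 30$.
Using the regression equation of X on Y:
$X = 20 + 0.5578(30 - 40)$
$= 20 - 5.58$
$= 14.42$
∴ Minimum test score ≈ 14.4, i.e. a score of at least 15.

5) When $X = 28$, $Y = ?$
Using the regression equation of Y on X:
$Y = 40 + 1.6083(28 - 20)$
$= 40 + 12.87$
∴ $Y = 52.87$, i.e. estimated sales of about Rs. 52,870 on the same reading of the units.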
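A short sketch that recomputes the correlation coefficient, both regression coefficients and the two predictions directly from the nine pairs of observations. The names are illustrative, and the call `predict_score(30)` assumes that Rs. 30,000 corresponds to Y = 30, i.e. that the sales figures are read in thousands of rupees.

```python
import math

scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]   # X: test scores
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]    # Y: sales figures from the table

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(sales) / n

# Sums of squares and products of deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in scores)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, sales))

b_yx = sxy / sxx                  # regression coefficient of Y on X
b_xy = sxy / syy                  # regression coefficient of X on Y
r = sxy / math.sqrt(sxx * syy)    # equals +sqrt(b_yx * b_xy) here


def predict_sales(score):
    """Regression of Y on X: expected sales for a given test score."""
    return mean_y + b_yx * (score - mean_x)


def predict_score(sale):
    """Regression of X on Y: test score associated with a given sales level."""
    return mean_x + b_xy * (sale - mean_y)


print(round(r, 3), round(predict_score(30), 2), round(predict_sales(28), 2))
```

Note that the line of Y on X is used to estimate sales from a score, and the line of X on Y to estimate a score from a sales target; the two lines are not interchangeable.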
b) Define a random variable and its mathematical expectation. (Marks 10)
1) The various outcomes of a random experiment are denoted with the help of a variable,
which is called a random variable.
2) For example: In case of throwing a die, we may use a variable X for representing the
outcome of the throw. Thus, X will take the values 1, 2, 3, 4, 5 and 6.
3) But in some cases, the outcomes may be qualitative e.g. tossing of a coin which may
be head or tail, the colours of balls drawn from an urn may be red, yellow, white etc.
4) But for mathematical convenience the qualitative outcomes may be expressed in
quantitative forms. For example, in tossing of a coin we may denote the outcome
‘Head’ by 1 and ‘Tail’ by 0.
5) In this way each outcome of a random experiment, whether it is qualitative or
quantitative, can be expressed by a real number.
6) There are two types of random variables:
(a) Discrete random variable
(b) Continuous random variable
7) Mathematical Expectation:
Let us consider a discrete random variable $X$ which assumes the values $x_1, x_2, \dots, x_n$
with respective probabilities $p_1, p_2, \dots, p_n$, such that $\sum p_i = 1$. Then the
mathematical expectation of the random variable $X$ is given by the sum of the
products of the different values of $X$ with their corresponding probabilities. The
expectation of a random variable is generally denoted by $E(X)$.
Thus, $E(X) = \sum_{i=1}^{n} x_i \, P(X = x_i) = \sum_{i=1}^{n} p_i x_i$.
In case the discrete random variable takes a countably infinite number of values, we have
$E(X) = \sum_{i=1}^{\infty} x_i \, P(X = x_i) = \sum_{i=1}^{\infty} p_i x_i$,
provided the series is absolutely convergent and $\sum p_i = 1$.
If $X$ is a continuous random variable with probability density function $f(x)$,
$-\infty < x < \infty$, then the mathematical expectation of the random variable $X$ is given by
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, provided $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and the integral converges.
The expectation of the random variable $X$ serves as the measure of central tendency
of the probability distribution of $X$.
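For the two discrete examples mentioned above (a fair die, and a coin with Head coded as 1 and Tail as 0), the expectation can be computed exactly. The sketch below assumes fair outcomes; the function name is illustrative.

```python
from fractions import Fraction


def expectation(distribution):
    """E(X) = sum of x_i * p_i for a discrete distribution given as {value: probability}."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}    # fair die: X = 1, ..., 6
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}           # fair coin: Head = 1, Tail = 0

print(expectation(die), expectation(coin))
```

The expected value of the die, 7/2, is not itself a possible outcome, which shows that $E(X)$ is a long-run average rather than a value the variable must take.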

Q.5. Solve the Following: Marks 20


a) Write a detailed note on least square regression. (Marks 10)
1) Least squares regression is a technique for drawing the line of best fit through a set
of data points.
2) The line is called the least squares regression line; it summarises how the y (response)
variable changes with the corresponding x (explanatory) variable.
3) It is chosen so that the sum of the squared vertical deviations of the observed points
from the line is as small as possible.
4) The line that we draw through the scatter plot does not have to pass through all the
plotted points; it does so only when there is a perfect linear relationship between the
variables.
5) Equation of the least squares regression line: $\hat{y} = a + bx$
b) What is the test of skewness? (Marks 10)
1) Skewness is a measure of lack of symmetry, i.e. it measures the deviation of the given
distribution of a random variable from a symmetric distribution.
2) Normal Distribution:
1. A Normal Distribution is a probability distribution that is symmetric about the
mean.
2. It is also known as a Gaussian Distribution.
3. The distribution appears as a Bell-shaped curve, which means the mean is the
most frequent data in the given data set.

4. In Normal Distribution:

Mean = Median = Mode

3) Standard Normal Distribution: When the mean in a Normal Distribution is 0 and the
Standard Deviation is 1, then the Normal Distribution is called a Standard Normal
Distribution.
4) Types of Skewness:
• Positive Skewness:
1. In positive skewness, the extreme data values are larger, which in turn
increases the mean value of the data set.
2. In Positive Skewness:
Mean > Median > Mode
• Negative Skewness:
1. In negative skewness, the extreme data values are smaller, which decreases
the mean value of the data set.
2. In Negative Skewness:
Mean < Median < Mode

5) Unlike the Normal Distribution (mean = median = mode), in positive and negative
skewness, the mean, median, and mode are all different.

Q.6. Solve the Following: Marks 20


a) Explain the following point Estimation Properties with example. (Marks 10)
i) Consistency
1) It states that the estimator gets closer to the parameter’s value as the sample
size increases.
2) Thus, a large sample size is required for the estimate to be reliably close to the parameter.
3) When the estimator converges (in probability) to the parameter’s value as the sample
grows, we state that the estimator is consistent.
4) Example:
i. Suppose you're estimating the mean height of students in a school.
ii. You take random samples of increasing sizes, say 10 students, 50 students, 100
students, and so on.
iii. With each increase in sample size, the mean height calculated from the sample
should approach the true mean height of all students in the school.
iv. If this happens, the estimator for the mean height is considered consistent.

ii) Unbiasedness
1) An estimator is unbiased if its expected value equals the true value of the
parameter, i.e. $E(\hat{\theta}) = \theta$.
2) Unbiasedness concerns the average of the estimates over repeated samples, not
any single estimate.
3) Among unbiased estimators, the one with the smallest variance is considered the
most efficient; its estimates deviate the least when different samples are taken.
4) But, of course, this also depends on the distribution of the population.
5) Example:
i. Let's say you're estimating the average score of students in a class.
ii. You take multiple random samples and calculate the mean score for each
sample.
iii. If, on average, these sample means are equal to the true average score of the
entire class, then the estimator is unbiased.
iv. This means that sometimes the estimate might be higher than the true value,
and sometimes it might be lower, but on average, it hits the mark.
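Unbiasedness can be checked exactly on a tiny example by listing every possible sample. The sketch below uses a hypothetical population {1, 2, 3} and all samples of size 2 drawn with replacement; it compares the average of the sample means with the population mean, and the average of the sample variances (with divisors n − 1 and n) with the population variance. All names and the population are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, not from the source
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

# Every equally likely sample of size 2 drawn with replacement.
samples = list(product(population, repeat=2))


def mean(sample):
    return Fraction(sum(sample), len(sample))


def variance(sample, divisor):
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


avg_of_means = sum(mean(s) for s in samples) / len(samples)
avg_var_n_minus_1 = sum(variance(s, len(s) - 1) for s in samples) / len(samples)
avg_var_n = sum(variance(s, len(s)) for s in samples) / len(samples)

print(avg_of_means == mu)             # sample mean is unbiased for mu
print(avg_var_n_minus_1 == sigma2)    # divisor n - 1 gives an unbiased variance estimator
print(avg_var_n == sigma2)            # divisor n underestimates the variance on average
```

Individual samples give means above or below 2, but their average over all nine samples equals the population mean exactly, which is what "on average, it hits the mark" means.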

b) What is Hypothesis testing? For large samples explain (Marks 10)


i) Test of Significance for a single mean
ii) Test of Significance of difference between two means
1) Hypothesis testing is a type of statistical analysis in which an assumption
about a population parameter is put to the test using sample data.
2) It is used to decide whether the sample provides enough evidence to reject the
null hypothesis at a chosen level of significance.
3) Z-Test for Single Mean:
1. The z-test is a statistical test used to determine if a sample mean is
significantly different from a known population mean.
2. It is used when the population standard deviation is known.
3. The formula for the single mean z-test is:
$|Z| = \dfrac{|\bar{X} - \mu|}{\mathrm{S.E.}(\bar{X})} = \dfrac{|\bar{X} - \mu|}{\sigma / \sqrt{n}}$, with $\mathrm{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$
Where,
$|Z|$ = The Z-Statistic
$n$ = The Sample Size
$\bar{X}$ = The Sample Mean
$\mu$ = The Population Mean
$\sigma$ = The Population Standard Deviation
$\mathrm{S.E.}$ = The Standard Error
4) Z-Test for Difference of Means:
1. A z-test is a statistical test to determine whether two population means are
different, or to compare one mean to a hypothesized value, when the
variances are known and the sample size is large.
2. A z-test is a hypothesis test for data that follows a normal distribution.
3. The formula for the z-test for the difference of two means is:
$|Z| = \dfrac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Where,
$|Z|$ = The Z-Statistic for the two groups
$\bar{X}_1, \bar{X}_2$ = The sample means of the two groups
$n_1, n_2$ = The sample sizes of the two groups
$\sigma_1, \sigma_2$ = The population standard deviations of the two groups
May 2022
Q.1. In a simple study about coffee habits in two towns A and B, the following information is given.
Town A: Females were 40%, total coffee drinkers were 45% and female non coffee drinkers
were 20%.
Town B: Males were 55%, male non coffee drinkers were 30% and female coffee drinkers
were 15%.
Present the data in a table format.

| | Town A Male | Town A Female | Town A Total | Town B Male | Town B Female | Town B Total | Total |
|---|---|---|---|---|---|---|---|
| Coffee drinkers | 25 | 20 | 45 | 25 | 15 | 40 | 85 |
| Non coffee drinkers | 35 | 20 | 55 | 30 | 30 | 60 | 115 |
| Total | 60 | 40 | 100 | 55 | 45 | 100 | 200 |

Q.2. Perform simple linear regression, Determine slope and intercept. (Marks 10)
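| x | 1 | 2 | 3 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| y | 8 | 4 | 5 | 2 | 2 | 0 |

| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 1 | 8 | 1 | 64 | 8 |
| 2 | 4 | 4 | 16 | 8 |
| 3 | 5 | 9 | 25 | 15 |
| 3 | 2 | 9 | 4 | 6 |
| 4 | 2 | 16 | 4 | 8 |
| 5 | 0 | 25 | 0 | 0 |
| $\sum x = 18$ | $\sum y = 21$ | $\sum x^2 = 64$ | $\sum y^2 = 113$ | $\sum xy = 45$ |

To find the slope:
$b\,(\text{slope}) = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$= \dfrac{6 \times 45 - 18 \times 21}{6 \times 64 - 18^2}$
$= \dfrac{270 - 378}{384 - 324}$
$= \dfrac{-108}{60}$
$= -1.8$
∴ The slope is $b = -1.8$.

To find the intercept:
$a\,(\text{intercept}) = \dfrac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2}$
$= \dfrac{21 \times 64 - 18 \times 45}{6 \times 64 - 18^2}$
$= \dfrac{1344 - 810}{384 - 324}$
$= \dfrac{534}{60}$
$= 8.9$
∴ The intercept is $a = 8.9$.

The simple linear regression equation is given by
$y = a + bx$
∴ Slope $b = -1.8$ and intercept $a = 8.9$
∴ The simple linear regression is $y = 8.9 - 1.8x$.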
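The slope and intercept formulas can be checked with a few lines of Python. The variable names are illustrative; the data are the six (x, y) pairs from the question.

```python
x = [1, 2, 3, 3, 4, 5]
y = [8, 4, 5, 2, 2, 0]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(v * v for v in x)
sum_xy = sum(u * v for u, v in zip(x, y))

# Both formulas share the same denominator: n * sum(x^2) - (sum x)^2.
denominator = n * sum_xx - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denominator
intercept = (sum_y * sum_xx - sum_x * sum_xy) / denominator

print(f"y = {intercept:.1f} + ({slope:.1f})x")
```

As a cross-check, the fitted line passes through the point of means: with x̄ = 3 and ȳ = 3.5, the intercept equals ȳ − b·x̄ = 3.5 + 1.8 × 3 = 8.9.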

Q.3. What do you mean by a questionnaire? What is the difference between a questionnaire and
a schedule? State the essential points to be remembered in drafting a questionnaire.
1) A questionnaire is a list of questions that ask respondents about themselves or others.

| Basis | Questionnaire | Schedule |
|---|---|---|
| Meaning | A questionnaire is a research instrument used by a researcher as a tool to collect data or gather information on a subject of interest from the respondents. | A schedule is a formalized set of questions, statements, and spaces for replies, given to the enumerators who put the questions to the respondents and note down the responses. |
| Filled by | A questionnaire is filled in by the respondents. | A schedule is filled in by an enumerator. |
| Response Rate | The response rate of a questionnaire is low. | The response rate of a schedule is high. |
| Cost | It is economical in terms of time, effort, and money. | It is expensive in terms of time, effort, and money. |
| Coverage | A large area can be covered through a questionnaire. | Comparatively small areas can be covered through a schedule. |
| Respondent’s Identity | The identity of the respondent is unknown. | As the enumerator visits the informant personally, his identity is known. |
| Dependency of Success | The success of a questionnaire depends upon its quality. | The success of a schedule depends upon the honesty and competence of the enumerator. |
| Usage | A questionnaire is used only when the people are literate and cooperative. | A schedule can be used whether the people are literate or illiterate. |

The following are a few essential points to be remembered in drafting a questionnaire:
• Make sure the questionnaire flows logically.
• Keep it short and simple.
• Be neutral in questioning.
• Avoid double-barrelled questions.
• Don't assume respondents are experts.
• Avoid negatives or double negatives.
Q.4. What is Stratified sampling? Explain the merits and limitations of Stratified sampling.
1) Stratified sampling is a method of collecting data that involves dividing a large population
into smaller subgroups (strata) and drawing a sample from each subgroup.
2) It’s commonly used when conducting surveys or gathering statistical data.
3) It allows people to survey a large population but in a more manageable way.
4) For example, if you're surveying a university population about how satisfying their school
experience has been, you may use this method to divide the students up by program.
5) This not only provides more in-depth data, but it also makes the large task easier to do.
6) Merits of Stratified Sampling:
1. More accurate data:
This method of data collection can allow for more accurate information.
2. More diverse data:
Having multiple subgroups within your sample population also allows you to collect a
more diverse range of data.
3. More manageable:
Having subgroups within your population can also make the data, and the work of
collecting the data, more manageable.
4. More cost-effective:
Stratified sampling can be a more cost-effective method of conducting a
survey.
5. Prevents sample bias:
Stratified sampling allows researchers to examine their sample and build groups of
participants that are free of bias.
7) Limitations of Stratified Sampling:
a) Lacks versatility:
This method only works for studies that use sample populations and surveys.
b) Difficult data analysis:
With more and more subgroups, there also comes a larger input of information. The
information can be specific and intricate, and it can take a long time to analyze.
c) Requires more planning:
Creating a study using a stratified sampling method can require a significant amount of
planning.

Q.5. What is diagrammatic representation of data? Explain its advantages.


1) Diagrammatic presentation is the visual form of presentation of data in which facts are
highlighted in the language of diagrams.
2) It consists in presenting statistical material in interesting and attractive geometrical figures
(Bars, Circle, Rectangle, Squares, Graphs, etc.), pictures, maps and charts etc.
3) It will attract the attention of a large number of persons.
4) It facilitates comparison between two or more sets of data.
5) Advantages of Diagrammatic Representation:
1. Diagrams are attractive and impressive
2. Diagrams facilitate comparison
3. Diagrams simplify data
4. Universal applicability
5. Easy to remember
6. Diagrams save time for understanding
7. Diagrams provide more information

Q.6. The manufacturer of a certain make of electric bulbs claims that his bulbs have mean life of
25 months with standard deviation of 5 months. A random sample of 6 bulbs gave the
following value:
Life of bulbs in months: 24, 26, 30, 20, 20, 18
Is the manufacturer’s claim valid at 1% level of significance? (Given that the table values of
the appropriate test statistics at said level are 4.032, 3.707, and 3.499 for 5, 6 and 7 degrees
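of freedom respectively). (Marks 10)
∴ $n = 6$, $\mu = 25$, $\sigma = 5$, $\bar{X} = ?$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{24 + 26 + 30 + 20 + 20 + 18}{6} = \dfrac{138}{6} = 23$ ∴ $\bar{X} = 23$

| Life of bulbs in months ($X$) | $x = X - 23$ | $x^2$ |
|---|---|---|
| 24 | 1 | 1 |
| 26 | 3 | 9 |
| 30 | 7 | 49 |
| 20 | -3 | 9 |
| 20 | -3 | 9 |
| 18 | -5 | 25 |
| $\sum X = 138$ | $\sum x = 0$ | $\sum x^2 = 102$ |

$S = \sqrt{\dfrac{\sum x^2}{n}} = \sqrt{\dfrac{102}{6}} = \sqrt{17}$
∴ $S = 4.123$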
1. Hypothesis:
$H_0$ (Null Hypothesis): $\mu = 25$, i.e. the manufacturer’s claim is valid.
$H_1$ (Alternative Hypothesis): $\mu \ne 25$, i.e. the manufacturer’s claim is not valid.
2. Computation of the test statistic:
$\mathrm{S.E.}(\bar{X}) = \dfrac{S}{\sqrt{n - 1}} = \dfrac{4.123}{\sqrt{6 - 1}} = \dfrac{4.123}{\sqrt{5}} = 1.844$
$t = \dfrac{\bar{X} - \mu}{\mathrm{S.E.}(\bar{X})} = \dfrac{23 - 25}{1.844} = -1.08$
∴ $|t| = 1.08$
3. Level of significance:
$\alpha = 0.01$
4. Critical Value:
$t_\alpha$ at the 1% level of significance and $n - 1 = 6 - 1 = 5$ degrees of freedom is 4.032.
$t_\alpha = 4.032$
5. Decision:
∴ $|t| < t_\alpha$
∴ $1.08 < 4.032$
∴ The null hypothesis is not rejected.
∴ The manufacturer’s claim is valid at the 1% level of significance.
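The test can be reproduced step by step in Python. The sketch follows the method of the solution (standard deviation with divisor n, standard error with n − 1) and uses the tabulated critical value 4.032 given in the question; the variable names are illustrative.

```python
import math

lifetimes = [24, 26, 30, 20, 20, 18]   # life of bulbs in months
claimed_mean = 25
critical_value = 4.032                 # table value for 5 degrees of freedom at the 1% level

n = len(lifetimes)
sample_mean = sum(lifetimes) / n

# Sample standard deviation with divisor n, as in the solution above.
s = math.sqrt(sum((x - sample_mean) ** 2 for x in lifetimes) / n)

standard_error = s / math.sqrt(n - 1)
t = (sample_mean - claimed_mean) / standard_error

reject_null = abs(t) > critical_value
print(round(t, 3), reject_null)
```

Dividing by n inside the standard deviation and by n − 1 in the standard error is equivalent to using the divisor n − 1 in the standard deviation and dividing by √n in the standard error.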
Extra Questions:

Q.1. A random sample of 900 items is taken from a normal population whose mean and
variance are both 4. Can the sample with mean 4.5 be regarded as a truly random one at the 1% level
of significance? (Table value at 1% is 2.58). (Marks 5)
∴ $n = 900$, $\bar{X} = 4.5$, $\mu = 4$, $\sigma^2 = 4$, $LOS = 1\%$
∴ $\sigma = \sqrt{\sigma^2} = \sqrt{4} = 2$ ∴ $\sigma = 2$
1. Hypothesis:
$H_0$ (Null Hypothesis): $\mu = 4$
$H_1$ (Alternative Hypothesis): $\mu \ne 4$
2. Test statistic:
$\mathrm{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2}{\sqrt{900}} = \dfrac{2}{30} = 0.0667$
$|z| = \dfrac{|\bar{X} - \mu|}{\mathrm{S.E.}(\bar{X})} = \dfrac{4.5 - 4}{0.0667} = 7.5$
∴ $|z| = 7.5$
3. Level of significance:
$\alpha = 0.01$
4. Critical Value:
$z_\alpha$ at the 1% level of significance is 2.58.
$z_\alpha = 2.58$
5. Decision:
∴ $|z| > z_\alpha$
∴ $7.5 > 2.58$
∴ The null hypothesis is rejected and the alternative hypothesis is accepted.
∴ The sample cannot be regarded as a truly random one from this population.

Q.2. The heights of 10 children selected at random from a given locality had a mean of 63.2 cm and
a variance of 6.25 cm². Test at the 5% level of significance the hypothesis that the children of the
given locality are on average less than 65 cm in height. Given: for 9 degrees of freedom,
$P(t > 1.83) = 0.05$. (Marks 5)
∴ $n = 10$, $\bar{X} = 63.2$, $\mu = 65$, $s^2 = 6.25$, $LOS = 5\%$
∴ $s = \sqrt{s^2} = \sqrt{6.25} = 2.5$ ∴ $s = 2.5$
1. Hypothesis:
$H_0$ (Null Hypothesis): $\mu \ge 65$
$H_1$ (Alternative Hypothesis): $\mu < 65$
2. Test statistic:
$\mathrm{S.E.}(\bar{X}) = \dfrac{s}{\sqrt{n - 1}} = \dfrac{2.5}{\sqrt{10 - 1}} = \dfrac{2.5}{3} = 0.833$
$t = \dfrac{\bar{X} - \mu}{\mathrm{S.E.}(\bar{X})} = \dfrac{63.2 - 65}{0.833} = -2.16$
∴ $|t| = 2.16$
3. Level of significance:
$\alpha = 0.05$
4. Critical Value:
$t_\alpha$ at the 5% level of significance and $n - 1 = 10 - 1 = 9$ degrees of freedom is 1.83.
$t_\alpha = 1.83$
5. Decision:
∴ $|t| > t_\alpha$
∴ $2.16 > 1.83$
∴ The null hypothesis is rejected and the alternative hypothesis is accepted.
∴ The children of the given locality are on average less than 65 cm in height.

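Q.3. Find $y$ when $x_1 = 3700$ kg and $x_2 = 260$ km from the least squares regression equation of $y$ on
$x_1$ and $x_2$ for the following:

| $Y$ | 160 | 112 | 69 | 90 | 123 | 186 |
|---|---|---|---|---|---|---|
| $x_1$ (1000 kg) | 4.0 | 2.0 | 1.6 | 1.2 | 3.4 | 4.8 |
| $x_2$ (100 km) | 1.5 | 2.2 | 1.0 | 2.0 | 0.8 | 1.6 |

∴ $\sum Y = 740$, $\sum x_1 = 17$, $\sum x_2 = 9.1$, $n = 6$
The normal equations are:
∴ $\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2$ …………. eq(1)
∴ $\sum x_1 Y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$ …………. eq(2)
∴ $\sum x_2 Y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$ …………. eq(3)

| $Y$ | $x_1$ | $x_2$ | $x_1^2$ | $x_2^2$ | $x_1 Y$ | $x_2 Y$ | $x_1 x_2$ |
|---|---|---|---|---|---|---|---|
| 160 | 4 | 1.5 | 16 | 2.25 | 640 | 240 | 6 |
| 112 | 2 | 2.2 | 4 | 4.84 | 224 | 246.4 | 4.4 |
| 69 | 1.6 | 1 | 2.56 | 1 | 110.4 | 69 | 1.6 |
| 90 | 1.2 | 2 | 1.44 | 4 | 108 | 180 | 2.4 |
| 123 | 3.4 | 0.8 | 11.56 | 0.64 | 418.2 | 98.4 | 2.72 |
| 186 | 4.8 | 1.6 | 23.04 | 2.56 | 892.8 | 297.6 | 7.68 |
| $\sum Y = 740$ | $\sum x_1 = 17$ | $\sum x_2 = 9.1$ | $\sum x_1^2 = 58.6$ | $\sum x_2^2 = 15.29$ | $\sum x_1 Y = 2393.4$ | $\sum x_2 Y = 1131.4$ | $\sum x_1 x_2 = 24.8$ |

∴ $740 = 6a + 17b_1 + 9.1b_2$ …………. eq(1)
∴ $2393.4 = 17a + 58.6b_1 + 24.8b_2$ …………. eq(2)
∴ $1131.4 = 9.1a + 24.8b_1 + 15.29b_2$ …………. eq(3)
∴ $a = -4.57$
∴ $b_1 = 30.94$
∴ $b_2 = 26.53$
∴ The regression equation is $Y = -4.57 + 30.94x_1 + 26.53x_2$

When $x_1 = 3700$ kg and $x_2 = 260$ km, $Y = ?$
Since $x_1$ is measured in units of 1000 kg and $x_2$ in units of 100 km, these values
correspond to $x_1 = 3.7$ and $x_2 = 2.6$.
$Y = -4.57 + 30.94x_1 + 26.53x_2$
$= -4.57 + 30.94(3.7) + 26.53(2.6)$
$= -4.57 + 114.478 + 68.978$
$= 178.89$
∴ $Y = 178.89$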

Q.4.
