Jhansi Kokkiligadda
PMRF Research Scholar
Department of Metallurgical and Materials Engineering
IIT Madras, Chennai, TN
NPTEL Course: Introduction to Research
Role: Conduct Problem Solving Sessions
Course Id: noc23-ge05
Course offered by: Prof. Edamana Prasad, Prof. Prathap Haridoss | IIT Madras
COURSE LAYOUT
Week 1 : A group discussion on what is research; Overview of research;
Week 2 : Literature survey, Experimental skills;
Week 3 : Data analysis, Modelling skills;
Week 4 : Technical writing; Technical Presentations; Creativity in Research
Week 5 : Creativity in Research; Group discussion on Ethics in Research
Week 6 : Intellectual Property
Week 7 : Design of Experiments
Week 8 : Department specific research discussions
Week 7: Sample Problems
Note: This assignment is for practice purposes only. The following questions may have
more than one correct answer.
1) The scope of hypothesis testing is concerned with-
a) Parameters of probability distribution of population
b) Parameters of probability distribution of sample
c) It relies on data contained in the sample taken from population of interest
d) All of the above
1) Answers: a,c
Explanation: Hypothesis testing is a fundamental concept in statistics used to make inferences
about populations based on sample data. Here's an explanation regarding the options provided:
a) Parameters of probability distribution of population: Hypothesis testing primarily concerns
itself with making inferences about the parameters of a population's probability distribution, like
the population mean or population proportion. These parameters are characteristics of the
population that we're trying to estimate or test.
c) It relies on data contained in the sample taken from population of interest: In practice, it's
often impossible to collect data from an entire population, so a sample is used instead.
Hypothesis testing relies on the data contained in this sample to make inferences about the
population parameters. The sample data provide the evidence used to test a hypothesis about the
population.
Option b) Parameters of probability distribution of sample is not typically the focus of hypothesis
testing. Instead, sample statistics are used to make inferences about the population parameters,
which is why it's not listed as part of the answer.
2) A normal probability distribution function is symmetric about-
a) Standard Deviation (SD)
b) Mean
c) SD*Mean
d) SD-Mean
2) Answers: b
Explanation: The normal probability distribution, also known as the Gaussian distribution, is
symmetric about its mean (μ). The mean value is the central value of the distribution, which
implies that the distribution is mirror-imaged along the vertical line that passes through the
mean. This symmetry signifies that half of the values are to the left of the mean and the other
half to the right of the mean. Moreover, in a normal distribution, the mean, median, and mode are
all equal and located at the center of the distribution. Hence, the correct answer is b) Mean.
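As a quick numerical check, the sketch below (with arbitrarily chosen μ = 5 and σ = 2) evaluates the Gaussian density at equal offsets on either side of the mean and confirms the values match:

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density function."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Illustrative parameters (chosen arbitrarily for the check)
mu, sigma = 5.0, 2.0

# Symmetry about the mean: f(mu + d) == f(mu - d) for any offset d
for d in [0.5, 1.0, 2.5]:
    assert math.isclose(normal_pdf(mu + d, mu, sigma),
                        normal_pdf(mu - d, mu, sigma))
```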
3) Coefficient of Determination is said to be unity when -
a) Error Sum of Squares is zero
b) Total Sum of Squares is zero
c) Regression Sum of Squares is zero
d) Regression Sum of Squares is same as Total Sum of Squares
3) Answers: a,d
Explanation: The Coefficient of Determination, denoted as R2, is a statistic that provides a
measure of how well the observed outcomes are replicated by the model, based on the proportion
of total variation of outcomes explained by the model.
a) Error Sum of Squares is zero: When the Error Sum of Squares (SSE) is zero, it means that the
model has perfectly predicted all the data points; there are no errors between the predicted values
and observed values. When SSE is zero, R2 will be equal to 1 (or 100%), indicating a perfect fit.
d) Regression Sum of Squares is same as Total Sum of Squares: The Total Sum of Squares
(SST) is partitioned into the Regression Sum of Squares (SSR) and the Error Sum of Squares
(SSE). If SSR is equal to SST, it means that all the variability in the data has been explained by
the regression model, leaving no error variance. In such a scenario, R2 would also be equal to
1, indicating a perfect fit.
Both scenarios reflect a situation where the model has perfectly captured the underlying
relationship in the data, hence leading to a Coefficient of Determination of unity.
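The relationship R2 = 1 − SSE/SST can be illustrated with a small Python sketch; the observed and predicted values below are hypothetical and chosen so that the fit is perfect (SSE = 0):

```python
# Hypothetical observed values and model predictions
y_obs  = [2.0, 4.1, 6.0, 8.2]
y_pred = [2.0, 4.1, 6.0, 8.2]   # a perfect fit: predictions equal observations

y_mean = sum(y_obs) / len(y_obs)
sst = sum((y - y_mean) ** 2 for y in y_obs)             # Total Sum of Squares
sse = sum((y - f) ** 2 for y, f in zip(y_obs, y_pred))  # Error Sum of Squares
ssr = sst - sse                                         # Regression Sum of Squares

r_squared = 1 - sse / sst
# With SSE = 0 (equivalently SSR = SST), R2 is exactly 1
```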
4) The probability density function describes the following-
a) Probability of a phenomenon occurring in macro scale
b) Probability of a phenomenon occurring in micro scale
c) Distribution of probabilities in the continuous random variable domain
d) Distribution of probabilities in some particular set of undefined variables
4) Answers: c
Explanation: The probability density function (PDF) is a statistical expression that defines a
continuous probability distribution. Here's an elaboration concerning option c:
c) Distribution of probabilities in the continuous random variable domain: The PDF is utilized to
specify the probability of a continuous random variable falling within a particular range of
values. Unlike discrete random variables, which have distinct, separate values, continuous
random variables can take on an infinite number of values within a given range. The PDF
provides a way to model the distribution of probabilities across the continuum of possible values
that the random variable can take on. The area under the curve of a PDF over an interval gives
the probability that the random variable falls within that interval. Hence, the PDF describes the
distribution of probabilities in the domain of a continuous random variable, aligning with option
c.
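The "area under the curve" interpretation can be sketched numerically; the code below uses a simple trapezoidal rule (an illustrative choice) to estimate P(−1 ≤ X ≤ 1) for a standard normal variable:

```python
import math

def std_normal_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area(f, a, b, n=10000):
    """Trapezoidal estimate of the area under f between a and b."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# P(-1 <= X <= 1) for a standard normal variable: about 0.6827
p = area(std_normal_pdf, -1.0, 1.0)
```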
5) During Response Surface Methodology’s Canonical Analysis involving two factors,
maximum response is said to be achieved when the eigen values of the B matrix are
a) all positive
b) all negative
c) all zero
d) such that one is positive and the other is negative
5) Answers: b
Explanation: The Response Surface Methodology (RSM) is a technique used for optimizing
processes with several independent variables. In the context of RSM, Canonical Analysis is
employed to examine the nature of the stationary point of the response surface, which could be a
maximum, a minimum, or a saddle point.
The eigenvalues of the B matrix, which is derived from the second-order model, play a crucial
role in understanding the nature of the response surface around the stationary point.
b) all negative: When all the eigenvalues of the B matrix are negative, it implies that the response
surface has a local maximum at the stationary point. In other words, in a Canonical Analysis
involving two factors, if both eigenvalues are negative, it indicates a maximum response
situation.
The eigenvalues provide insight into the curvature of the response surface in the direction of the
principal axes. Negative eigenvalues signify a downward curvature, hence indicating a maximum
response. Therefore, the correct answer to this question is b) all negative, as it points to the
scenario where the maximum response is achieved.
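A minimal sketch of this classification for the two-factor case is shown below; the B matrix entries are hypothetical, and the eigenvalues of a symmetric 2x2 matrix are computed directly from the quadratic formula:

```python
import math

def eigenvalues_2x2_symmetric(b11, b12, b22):
    """Eigenvalues of [[b11, b12], [b12, b22]] via the quadratic formula."""
    tr, det = b11 + b22, b11 * b22 - b12 * b12
    root = math.sqrt(tr * tr / 4 - det)   # always real for a symmetric matrix
    return tr / 2 - root, tr / 2 + root

# Hypothetical B matrix from a fitted second-order model
lam1, lam2 = eigenvalues_2x2_symmetric(-3.0, 0.5, -2.0)

if lam1 < 0 and lam2 < 0:
    nature = "maximum"        # downward curvature along both principal axes
elif lam1 > 0 and lam2 > 0:
    nature = "minimum"
else:
    nature = "saddle point"
```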
6) Experiments are conducted in laboratories to
a) verify scientific predictions
b) obtain trends when theory is not known
c) Find effect of different variables on the desired response
d) All of the above
6) Answers: d
Explanation: Experiments are conducted for a variety of reasons in scientific and engineering
contexts. Here's a breakdown of the provided options:
a) Verify scientific predictions: Experiments can be conducted to test hypotheses or verify
predictions made by theories. By comparing experimental results with theoretical predictions,
scientists can validate or challenge existing theories.
b) Obtain trends when theory is not known: When there isn't a well-established theory,
experiments can help in identifying trends, behaviors, or relationships among variables. This
exploratory approach helps in generating data that might lead to the development of new theories
or models.
c) Find the effect of different variables on the desired response: Experiments are pivotal in
understanding how different variables affect a particular outcome or response. Through
controlled experimentation, researchers can isolate the effects of individual variables, allowing
for a better understanding of the system under study.
d) All of the above: Each of the above points represents a valid reason for conducting
experiments in laboratories. Therefore, option d) encapsulates the multifaceted purposes of
experimentation, from verifying scientific predictions, exploring unknown phenomena, to
analyzing the effects of different variables on a desired response. Hence, the answer is indeed d)
All of the above.
7) Despite meticulous planning and execution, repeated experiments with accurately
calibrated instruments often yield varying results due to:
a) Systematic influences from environmental factors
b) Uncontrollable random effects
c) Extreme fluctuations in system properties
d) None of the above
7) Answers: b
Explanation: Uncontrollable random effects (Option b) often contribute to the variation observed
in repeated experiments, even when they are well-designed and executed with properly
calibrated instruments. These random effects could be due to inherent variability in the materials
being studied, minute changes in environmental conditions, or other unpredictable factors.
Unlike systematic effects, which can often be identified and corrected for, random effects are
inherent to the experimental process and contribute to the variability observed in experimental
results.
8) When you do experiments repeatedly you should analyse
a) the average value of the responses only
b) the variability in the responses only
c) neither the average nor variability
d) both average of the responses and the variability in them
8) Answers: d
Explanation: When conducting repeated experiments, it's crucial to analyze both the average
value of the responses and the variability in them. Here's a brief explanation of option d:
d) Both average of the responses and the variability in them: Analyzing both the average and
variability provides a comprehensive understanding of the data generated from the experiments.
1. Average of the Responses: The average value gives you a central tendency of your data,
which is a crucial aspect to understand the general behavior or outcome of the
experiment. It provides a single value representation of the data which is useful for
comparison purposes.
2. Variability in the Responses: The variability, on the other hand, gives insight into the
spread or dispersion of the data points. This is essential for understanding the consistency
and reliability of the results. It helps in assessing the precision and the repeatability of the
experiments.
Both of these aspects together provide a fuller picture of the experimental outcomes and their
implications, making option d) the appropriate choice for a thorough analysis of repeated
experiments.
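As an illustration, the sketch below (with made-up repeat measurements) computes both the average and the sample standard deviation:

```python
import statistics

# Hypothetical responses from five repeats of the same experiment
responses = [9.8, 10.1, 10.0, 9.9, 10.2]

avg = statistics.mean(responses)    # central tendency of the responses
s   = statistics.stdev(responses)   # sample standard deviation (spread)
```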
9) The average estimated from a data collection is 50 cm. The quantity (or quantities) which
has the same units as the data is (are)
a) Standard deviation
b) median
c) mode
d) All of the above
9) Answer: d
Explanation: The units of measurement in a dataset remain consistent across various statistical
measures such as the mean, median, mode, and standard deviation. Here's a brief explanation of
option d:
d) All of the above:
1. Standard Deviation (a): The standard deviation is a measure of the amount of variation
or dispersion of a set of values. It is expressed in the same units as the data because it is
derived from the square root of the variance (which is calculated using the original units
of the data).
2. Median (b): The median is the value separating the higher half from the lower half of a
data sample. Since it's a value from the dataset or an average of two values from the
dataset, it is expressed in the same units as the data.
3. Mode (c): The mode is the value that appears most frequently in a data set. Like the
median, it's a value from the dataset, and so it is also expressed in the same units as the
data.
Hence, all these statistical measures - standard deviation, median, and mode - are expressed in
the same units as the data, which in this case is centimetres. Therefore, the answer is d) All of the
above.
10) Sample collected from an instrument giving binary data has the following values
{1,-1,-1,1,1,-1}
Mean of the above data set is
a) -1
b) 0
c) 1
d) 3
10) Answer: b
Explanation: The mean (average) of a dataset is calculated by summing all the values in the
dataset and then dividing by the number of values in the dataset.
For the given dataset {1, -1, -1, 1, 1, -1}, the calculation of the mean is as follows:
Mean = 0
Therefore, the correct answer is b) 0.
11) Sample collected from an instrument giving binary data has the following values
{1,-1,-1,1,1,-1}
Sample standard deviation (s) of the above data set is
a) -1
b) 0
c) 1.095
d) 1.200
11) Answers: c
Explanation: The sample mean of the data set is 0, so the squared deviation of every point is 1,
giving a sum of squared deviations of 6. The sample variance divides this sum by n − 1 = 5,
giving 6/5 = 1.2, and the sample standard deviation is the square root of the variance:
s = √1.2 ≈ 1.095. Therefore, the correct answer is c) 1.095.
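A quick check with Python's statistics module (whose sample statistics use the n − 1 divisor) reproduces this value:

```python
import statistics

data = [1, -1, -1, 1, 1, -1]

mean = statistics.mean(data)       # 0
s2   = statistics.variance(data)   # sample variance, divisor n - 1: 6/5 = 1.2
s    = statistics.stdev(data)      # sqrt(1.2), approximately 1.095
```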
12) Sample collected from an instrument giving binary data has the following values {1,-1,-
1,1,1,-1} median of the above data set is
a) 0
b) 1
c) -1
d) 2
12) Answers: a
Explanation: To find the median of a data set, you should first arrange the data in numerical
order, from least to greatest:
−1,−1,−1,1,1,1
Since the data set has an even number of observations (6), the median will be the average of the
3rd and 4th values in this sorted list.
Median = (−1 + 1) / 2 = 0
So, the median of the data set is 0.
13) Sample collected from an instrument giving binary data has the following values
{1,−1,−1,1,1,−1,1} median of the above data set is
a) 0
b) 1
c) -1
d) 2
13) Answers: b
Explanation: To find the median of the data set {1,−1,−1,1,1,−1,1}, we first sort the data
in ascending order:
-1,-1,-1,1,1,1,1
Since the data set has an odd number of observations (7), the median is the value at the middle
position, which is the 4th value in this sorted list.
Therefore, the median of the data set is 1.
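Both cases (the even-length data set of question 12 and the odd-length data set of question 13) can be verified with Python's statistics module:

```python
import statistics

even = [1, -1, -1, 1, 1, -1]       # 6 values: median is the mean of the middle two
odd  = [1, -1, -1, 1, 1, -1, 1]    # 7 values: median is the middle value

m_even = statistics.median(even)   # (-1 + 1) / 2 = 0.0
m_odd  = statistics.median(odd)    # 4th value of the sorted list = 1
```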
14) Sample collected from an instrument giving binary data has the following values {1,-1,-
1,1,1,-1}
Variance of the above data set is
a) 1
b) 1.2
c) 1.5
d) 1.8
14) Answers: b
Explanation: The mean of the data set is 0, so each squared deviation is 1 and the sum of
squared deviations is 6. The sample variance divides this sum by n − 1 = 5, giving 6/5 = 1.2.
This is consistent with the sample standard deviation of √1.2 ≈ 1.095 found in question 11.
Therefore, the correct answer is b) 1.2.
15) When two random variables X and Y are independent, then the expected value for
E(X) may be found from the expected values E(Y) and E(XY) in the following manner
a) E(X) = E(XY) - E(Y)
b) E(X) =E(XY)/E(Y)
c) E(X)= E(XY) +E(Y)
d) E(X) =E(XY) . E(Y)
15) Accepted Answers: b) E(X) =E(XY)/E(Y)
Explanation: The statement given in option b) is correct when X and Y are independent random
variables. The relationship between the expected values of X, Y, and XY can be expressed as
follows:
E(XY)=E(X)E(Y)
Now, if we want to solve for E(X), we can rearrange the formula to isolate E(X) on one side of
the equation:
E(X)=E(XY)/E(Y)
This equation shows that the expected value of X can be found by dividing the expected value of
XY by the expected value of Y, which corresponds to option b) E(X)=E(XY)/E(Y)
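A Monte Carlo sketch can illustrate this identity; the uniform(0, 1) distributions and sample size below are arbitrary choices:

```python
import random

rng = random.Random(42)
n = 100_000

# Two independent uniform(0, 1) samples (illustrative choice)
xs = [rng.random() for _ in range(n)]
ys = [rng.random() for _ in range(n)]

e_x  = sum(xs) / n
e_y  = sum(ys) / n
e_xy = sum(x * y for x, y in zip(xs, ys)) / n

# For independent X and Y: E(XY) is close to E(X) * E(Y),
# so E(X) can be recovered as E(XY) / E(Y)
```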
16) In a normal distribution, the following property is valid
a) the mean of the distribution is always 0
b) it approaches the t-distribution as the limits of the normal distribution are widened
c) the distribution is symmetric about its mean
d) None of the above
16) Accepted Answers: c) the distribution is symmetric about its mean
Explanation: In a normal distribution, the distribution is symmetric about its mean. This is a
defining characteristic of the normal distribution. The mean, median, and mode of a normal
distribution are all equal and are located at the center of the distribution. This is reflected in
option c) "the distribution is symmetric about its mean," which is the accepted answer for
question 16. The symmetry implies that the probability of obtaining a value above the mean is
equal to the probability of obtaining a value below the mean. This property is fundamental to
many statistical methods and analyses which assume normality.
17) In a factorial design, the 2^4 design strategy was chosen. This means
a) Number of experiments is 16
b) Each of the two variables is varied at four levels
c) After the test, ANOVA will NOT be able to detect ternary interaction effects between the
variables
d) None of the above
17) Accepted Answers: a
Number of experiments is 16
Explanation: In a 2^4 factorial design, there are two levels for each of the four factors, resulting
in 2^4=16 different experimental conditions or runs. This type of design allows for the
investigation of the main effects of four factors as well as all possible interactions between these
factors. So, option a) "Number of experiments is 16" is the accurate description of a 2^4 factorial
design, which is why it's the accepted answer for question 17. This design strategy is quite
common and powerful in experimental research as it allows for a comprehensive exploration of
the effects of multiple factors and their interactions.
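The 16 runs of a 2^4 design can be enumerated directly; the factor names and the ±1 level coding below are conventional but illustrative:

```python
from itertools import product

factors = ["A", "B", "C", "D"]   # four factors, two levels each
runs = list(product([-1, +1], repeat=len(factors)))

# A 2^4 full factorial design has 2**4 = 16 distinct runs
```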
18) In a statistical design of experiments study, the effects of two factors A and B were
analysed at two levels. It was found that when A was changed from its low to high setting at
the fixed lower level of B, the change in response was an increase of 10 units. Now when A
is changed from its low setting to high setting at the fixed higher level of B, the change in
response was a decrease of 10 units. This means that
a) there is no interaction between A and B
b)A interacts with B but B does not interact with A
c) B interacts with A but A does not interact with B
d) there is negative interaction between factors A and B
18) Accepted Answers: d) there is negative interaction between factors A and B
Explanation: The scenario described suggests that the effect of factor A on the response variable
changes depending on the level of factor B. Specifically, when B is at its lower level, increasing
A leads to an increase in the response, but when B is at its higher level, increasing A leads to a
decrease in the response. This change in the effect of factor A depending on the level of factor B
is indicative of an interaction between factors A and B.
The interaction is termed as negative since the effect of factor A is opposite at different levels of
factor B. Therefore, option d) "there is negative interaction between factors A and B" accurately
describes the relationship between factors A and B, making it the accepted answer for question
18. Understanding such interactions is crucial in a Design of Experiments (DOE) framework as it
helps in understanding how different factors jointly affect the response variable.
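The sketch below uses hypothetical response values consistent with the question (an increase of 10 units at low B, a decrease of 10 units at high B) and computes the AB interaction effect in the usual two-level convention:

```python
# Hypothetical responses consistent with the question statement
y_lowA_lowB,  y_highA_lowB  = 20.0, 30.0
y_lowA_highB, y_highA_highB = 40.0, 30.0

effect_A_at_lowB  = y_highA_lowB  - y_lowA_lowB    # +10 units
effect_A_at_highB = y_highA_highB - y_lowA_highB   # -10 units

# AB interaction: half the difference between the two conditional effects of A
ab_interaction = (effect_A_at_highB - effect_A_at_lowB) / 2   # negative
```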
19) In hypothesis testing,
a) wrongly rejecting the null hypothesis is a more serious mistake than wrongly accepting it
b) wrongly accepting the null hypothesis is a more serious mistake than wrongly rejecting it
c) wrongly accepting a null hypothesis is equivalent to accepting the alternate hypothesis
d) correctly accepting the null hypothesis means that the alternate hypothesis is incorrectly
rejected
19) Accepted Answers: a) wrongly rejecting the null hypothesis is a more serious mistake
than wrongly accepting it
Explanation: In hypothesis testing, Type I error occurs when you wrongly reject the null
hypothesis, while Type II error occurs when you wrongly accept the null hypothesis. Type I error
(wrongly rejecting the null hypothesis) is typically considered more serious because it leads to
the conclusion that there is a significant effect or difference when there isn't one, potentially
leading to costly and unnecessary actions or decisions. Type II error (wrongly accepting the null
hypothesis) is still important to consider but is generally considered less serious in many
practical applications.
20) In a one-tailed hypothesis test, the level of significance α is set to 0.05. In order
for a particular effect to be significant, the p-value of the test should be
a) exactly 0.05
b) above 0.05
c) below 0.05
d) exactly 0.025
20) Accepted Answers: c) below 0.05
Explanation: In a one-tailed hypothesis test with a significance level (α) set to 0.05, for the test
to be considered significant, the p-value should be below 0.05. This means that the observed data
should provide strong enough evidence to reject the null hypothesis at the 0.05 significance level.
If the p-value is below 0.05, you would reject the null hypothesis in favor of the alternative
hypothesis.
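As a sketch, the code below computes an upper-tail p-value from a hypothetical standard-normal test statistic using the error function:

```python
import math

def p_value_one_tailed(z):
    """Upper-tail p-value for a standard-normal test statistic z."""
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return 1 - phi

alpha = 0.05
p = p_value_one_tailed(1.7)   # hypothetical test statistic

significant = p < alpha       # True: p is about 0.045, below 0.05
```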
21) When data are fitted to a model then increasing number of model parameters is
penalized by the
a) R2 criterion
b) adjusted R2 criterion
c) Sum of squares of the residuals
d) Sum of absolute residuals
21) Accepted Answers: b) adjusted R2 criterion
Explanation: When fitting data to a model, increasing the number of model parameters can lead
to overfitting, where the model fits the noise in the data rather than capturing the underlying
patterns. The adjusted R2 criterion is a statistical measure that penalizes the inclusion of
excessive model parameters. It adjusts the R-squared (R2) value for the number of predictors in
the model, helping to prevent overfitting. The adjusted R2 increases only if the additional
parameter improves the model fit significantly, and it decreases if adding more parameters
doesn't contribute enough to justify their inclusion. This makes it a useful criterion for model
selection when dealing with multiple predictors in regression analysis.
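The penalty can be seen in a small sketch; the R2 values, sample size, and predictor counts below are hypothetical:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 20
r2_small = adjusted_r2(0.80, n, p=2)   # two predictors
r2_big   = adjusted_r2(0.81, n, p=5)   # five predictors, slightly higher raw R2

# The model with more parameters is penalized: its adjusted R2 is lower
# even though its raw R2 is higher
```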
22) In a typical central composite design, the repeats are performed
a) to obtain an estimate of the random experimental error
b) at the centre of the experimental design space
c) to help in identifying model curvature
d) All of the above
22) Accepted Answers: d) All of the above
Explanation: In a typical central composite design (CCD), repeats or replicates are performed for
several reasons:
a) To obtain an estimate of the random experimental error: Replicates help in assessing the
variability and randomness in the experimental data, which is crucial for drawing reliable
conclusions from the experiment.
b) At the center of the experimental design space: CCD includes a set of experimental runs at the
center point of the design space to estimate the linear effects and to provide a baseline for
comparing the effects of factors at extreme settings.
c) To help in identifying model curvature: Replicates at different points within the design space,
including the center and possibly other intermediate points, help in detecting and quantifying any
curvature or nonlinearity in the response surface, which may be essential for building an accurate
model of the system.
So, all of the options (a), (b), and (c) are correct, making the answer d) "All of the above" the
correct choice.
23) In a central composite design, involving two factors A and B, the response surface is
described by a quadratic model. If the effect of interaction between A and B is only absent,
the relevant significant effects are from
a) A only
b) A and B only
c) A, B, A2 and B2
d) A, B, A2, B2, A3 and B3
23) Accepted Answers: c) A, B, A2 and B2
Explanation: In a central composite design (CCD) involving two factors A and B, where the
response surface is described by a quadratic model, if the effect of interaction between A and B
is absent (meaning there is no AB term in the model), then the relevant significant effects are
typically from:
c) A, B, A^2, and B^2
In a quadratic model, you consider the main effects of A and B (A and B), as well as their
squared terms (A^2 and B^2) to account for curvature in the response surface. Since there's no
interaction term (AB), you don't need to consider it when determining the relevant significant
effects in this case.
24) In a normal distribution with parameters μ and σ^2, the
a) mean ≠ median ≠ mode
b) distribution is not symmetric about the mean
c) mean is always zero
d) none of the above
24) Accepted Answers: d) None of the above
Explanation: In a normal distribution with parameters μ (mean) and σ (standard deviation):
a) The mean (μ) is equal to the median, and both are equal to the mode. In a normal distribution,
the mean, median, and mode are all the same value, which is μ.
b) The normal distribution is symmetric about its mean. It is a bell-shaped curve that is perfectly
symmetric, with the mean (μ) as the central point of symmetry.
c) The mean of a normal distribution is not always zero. The mean (μ) can take any value, and it
is not necessarily zero.
So, none of the provided options (a), (b), or (c) is correct.
25)
a) ∫₀^∞ f(θ) dθ = 0
b) ∫₀^∞ f(θ) dθ = ∞
c) ∫₀^∞ f(θ) dθ = 1
d) ∫₀^∞ f(θ) dθ = τ
25) Answers: c) ∫₀^∞ f(θ) dθ = 1
Explanation: A probability density function must integrate to 1 over its entire domain. For a
density f(θ) defined on (0, ∞), this normalization condition is ∫₀^∞ f(θ) dθ = 1, which
corresponds to option c.
26) The mean value of the distribution is
a) τ
b) 0
c) ∞
d) τ/2
26) Answer: a) τ