Statistics in Research
Research Aptitude
Copyright © 2014-2023 TestBook Edu Solutions Pvt. Ltd.: All rights reserved
Download Testbook App
Parametric Tests
Used for Quantitative Data
Used for continuous variables
Used when data are measured on an approximately interval or ratio scale of measurement.
Data should follow a normal distribution
Parametric Tests
1. t-test (n<30)
SUBJECT | Research Aptitude
2. ANOVA (Analysis of Variance)
3. Pearson's r Correlation
4. Z test for large samples (n>30)
Parametric tests
STUDENT'S T-TEST
Developed by W. S. Gosset in 1908, who published statistical papers under the pen name 'Student'. Thus the test is known as Student's 't' test.
Indications for the test:
1. When samples are small
2. Population variance is not known.
Uses
1. Two means of small independent samples
2. Sample mean and population mean
Assumptions made in the use of 't' test
1. Samples are randomly selected
2. Data utilised is Quantitative
3. The variable follows a normal distribution
4. Sample variances are mostly the same in both the groups under the study
5. Samples are small, mostly lower than 30
A t-test compares the difference between two means of different groups to determine whether that difference is statistically significant.
Student's 't' test for different purposes
't' test for one sample
't' test for unpaired two samples
't' test for paired two samples
ONE SAMPLE T-TEST
When comparing the mean of a single group of observations with a specified value
In one sample t-test, we know the population mean. We draw a random sample from the population and then compare the sample mean with the po
Calculation
$$t = \frac{\bar{x} - \mu}{SE}$$
Where $\bar{x}$ = Sample mean
µ = population mean
SE = Standard error, $SE = \frac{s}{\sqrt{n}}$ with $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Where x = element of sample
n – 1 = degrees of freedom
Now we compare calculated value with table value at a certain level of significance (generally 5% or 1%)
If the absolute value of 't' obtained is greater than the table value then reject the null hypothesis and if it is less than the table value, the null hypothesis
EXAMPLE
Research Problem: Comparison of mean dietary intake of a particular group of individuals with the recommended daily intake.
DATA: Average daily energy intake (ADEI) over 10 days of 11 healthy women
Mean ADEI value = 6753.6
SD ADEI value = 1142.1
What can we say about the energy intake of these women in relation to a recommended daily intake of 7725 kJ?
Research Hypothesis
State null hypothesis and alternative hypothesis:
H0: there is no difference between the population mean and the sample mean
OR
H0: µ = 7725 kJ
H1: there is a difference between the population mean and the sample mean
OR
H1: µ ≠ 7725 kJ
Set the level of significance α = .05, .01 or .001
Calculate the value of the proper statistic
State the rule for rejecting the null hypothesis:
Reject H0 if t ≥ +ve tabulated value
OR
Reject H0 if t ≤ -ve tabulated value
Or we can say that p < .05
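In the above example, t = (6753.6 − 7725) / (1142.1 / √11) ≈ −2.82 with 10 degrees of freedom. Its absolute value is greater than the tabulated value of 2.23 at α = .05, so H0 is rejected.
The P-value (p < .05) suggests that the dietary intake of these women was significantly less than the recommended level (7725 kJ)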
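The one-sample calculation can be checked with a few lines of Python. This is a minimal sketch that works from the summary values quoted in the example (mean, SD, n); the function name `one_sample_t` is illustrative, and the tabulated value 2.228 for 10 degrees of freedom at α = .05 (two-tailed) is an assumption taken from a standard t table. Evaluated this way, t comes out at about −2.82.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)   # SE = s / sqrt(n)
    t = (sample_mean - mu0) / standard_error    # t = (x-bar - mu) / SE
    return t, n - 1

# Summary values from the ADEI example: mean 6753.6, SD 1142.1, n = 11,
# compared with the recommended intake of 7725 kJ.
t_value, df = one_sample_t(6753.6, 1142.1, 11, 7725.0)

# Assumption: two-tailed tabulated t for df = 10 at alpha = .05 (rounded to 2.23 in the text).
T_CRITICAL = 2.228
reject_h0 = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.3f}, df = {df}, reject H0: {reject_h0}")
```

Working from summary statistics keeps the sketch faithful to the example, which does not list the 11 individual intakes.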
Two Sample 't' test
A. Unpaired Two sample 't'- test
The unpaired t-test is used when we wish to compare two means
Used when the two independent random samples come from normal populations having unknown but equal variances
We test the null hypothesis that the two population means are the same, i.e. μ1 = μ2, against an appropriate one-sided or two-sided alternative hypoth
Assumptions
The samples are random & independent of each other
The distribution of the dependent variables is normal.
The variances are equal in both the groups
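FORMULA
The test statistic is given by
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Where $S_1$ and $S_2$ are respectively the SDs of the first and second group, $\bar{x}_1$ and $\bar{x}_2$ are the group means, and $n_1$ and $n_2$ are the group sizes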
Research Problem
A study was conducted to compare the birth weights of children born to 15 non-smoking with those of children born to 14 heavy smoking mothers.
Non-smoking mothers (n = 15)    Heavy smoking mothers (n = 14)
3.99                            3.18
3.79                            2.84
3.60                            2.90
3.73                            3.27
3.21                            3.85
3.60                            3.52
4.08                            3.23
3.61                            2.76
3.83                            3.60
3.31                            3.75
4.13                            3.59
3.26                            3.63
3.54                            2.38
3.51                            2.34
2.71
Research Hypothesis: State null hypothesis and alternative hypothesis
H0 = there is no difference between the birth weights of children born to non-smoking and heavy smoking mothers
H1 = there is a difference between the birth weights of children born to non-smoking and smoking mothers
Set the level of significance α =.05, .01 or .001
Calculate the value of proper statistic
State the rule for rejecting the null hypothesis
If tcal > ttab, we can say that P < .05; we then reject the null hypothesis and accept the alternative hypothesis.
Decision
If we reject the null hypothesis, we can say that children born to non-smokers are heavier than children born to heavy smokers.
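The birth-weight comparison can be worked through as follows. This is a sketch of the pooled-variance (equal variances assumed) unpaired t statistic, using the 15 and 14 birth weights listed in the table; the function name `unpaired_t` is illustrative, and the tabulated value 2.052 for 27 degrees of freedom at α = .05 (two-tailed) is an assumption taken from a standard t table.

```python
import math
from statistics import mean, variance

non_smoking = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smoking = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23,
                 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]

def unpaired_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (divisor n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2

t_value, df = unpaired_t(non_smoking, heavy_smoking)

# Assumption: two-tailed tabulated t for df = 27 at alpha = .05.
T_CRITICAL = 2.052
reject_h0 = abs(t_value) > T_CRITICAL

print(f"mean difference = {mean(non_smoking) - mean(heavy_smoking):.3f}")
print(f"t = {t_value:.2f}, df = {df}, reject H0: {reject_h0}")
```

The pooled form is used because the assumptions listed for this test state that the variances are equal in both groups.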
PAIRED TWO-SAMPLE T-TEST
Used when we have paired observations from one sample only, i.e. when each individual gives a pair of observations.
Same individuals are studied more than once in different circumstances- measurements made on the same people before and after interventions
Assumptions
The outcome variable should be continuous
The difference between pre-post measurements should be normally distributed
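A paired test reduces to a one-sample test on the within-person differences. The sketch below assumes the usual statistic t = d̄ / (s_d / √n) with n − 1 degrees of freedom, where d̄ and s_d are the mean and SD of the differences; the before/after values are hypothetical and only illustrate the mechanics.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic computed on the per-individual differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)   # SD of differences / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical measurements on the same six people before and after an intervention.
before = [72, 80, 68, 75, 90, 66]
after = [75, 84, 70, 75, 95, 70]

t_value, df = paired_t(before, after)
print(f"t = {t_value:.3f}, df = {df}")
```

Only the differences enter the calculation, which is why the normality assumption applies to the pre-post differences rather than to the raw measurements.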
Z Test
This test is used for testing the significance of the difference between two means (n > 30).
Assumptions to apply Z test
The sample must be randomly selected
Data must be quantitative
Samples should be larger than 30
Data should follow normal distribution
Sample variances should be almost the same in both the groups of study
If the SD of the populations is known, a Z test can be applied even if the sample is smaller than 30
Indications for Z Test
To compare sample mean with population mean
To compare two sample means
Steps
1. Define the problem
2. State the null hypothesis (H0) & alternate hypothesis (H1)
3. Find Z value
4. Fix the level of significance
5. Compare the calculated Z value with the value in the Z table at the corresponding significance level.
If the observed Z value is greater than theoretical Z value, Z is significant, reject null hypothesis and accept alternate hypothesis
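The steps above can be sketched for the case of comparing a sample mean with a population mean. The code assumes the usual statistic Z = (x̄ − µ) / (σ / √n) with a known population SD; the function name `z_test_one_sample` and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_one_sample(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed Z test of a sample mean against a population mean with known SD."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    # Theoretical (table) value for a two-tailed test at the chosen significance level.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_critical, abs(z) > z_critical

# Hypothetical data: 100 observations with mean 52, population mean 50, population SD 10.
z, z_critical, reject_h0 = z_test_one_sample(52.0, 50.0, 10.0, 100)
print(f"Z = {z:.2f}, critical Z = {z_critical:.2f}, reject H0: {reject_h0}")
```

`NormalDist().inv_cdf` stands in for looking up the Z table, so changing `alpha` to 0.01 moves the threshold accordingly.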
One tailed and Two tailed Z tests
Z values on each side of mean are calculated as +Z or as -Z.
A result larger than the mean gives +Z, and a result smaller than the mean gives -Z.
E.g. for two-tailed:
In a test of significance, when one wants to determine whether the mean IQ of malnourished children is different from that of well nourished and do
Conclusion
Tests of significance play an important role in conveying the results of any research & thus the choice of an appropriate statistical test is very impor
Hence the emphasis placed on tests of significance in clinical research must be tempered with an understanding that they are tools for analyzing dat
Analysis of Variance(ANOVA)
Given by Sir Ronald Fisher
The principal aim of statistical models is to explain the variation in measurements.
The statistical model involving a test of significance of the difference in mean values of the variable between two groups is the Student's 't' test. If there
Assumptions for ANOVA
1. The sample population can be approximated by a normal distribution.
2. All populations have the same Standard Deviation.
3. Individuals in the population are selected randomly.
4. Independent samples
ANOVA compares variances by means of a simple ratio, called the F-Ratio
F = (Variance between groups) / (Variance within groups)
The resulting F statistic is then compared with a critical value of F (F crit), obtained from F tables in much the same way as was done with 't'
If the calculated value exceeds the critical value for the appropriate level of α, the null hypothesis will be rejected.
An F test is therefore a test of the ratio of variances. F tests can also be used on their own, independently of the ANOVA technique, to test hypotheses a
In ANOVA, the F test is used to establish whether a statistically significant difference exists in the data being tested.
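The F-ratio for a one-way layout can be computed directly from its definition. This sketch assumes the standard decomposition in which the between-groups variance has k − 1 degrees of freedom and the within-groups variance has n − k; the three small groups are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    variance_between = ss_between / (k - 1)
    variance_within = ss_within / (n_total - k)
    return variance_between / variance_within, k - 1, n_total - k

# Hypothetical measurements for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F would then be compared with the tabulated critical F for the same pair of degrees of freedom.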
ANOVA can be
One Way ANOVA
If the various experimental groups differ in terms of only one factor at a time, a One Way ANOVA is used
e.g. A study to assess the effectiveness of four different antibiotics on S. sanguis
Two Way ANOVA
If the various groups differ in terms of two factors at a time, then a Two Way ANOVA is performed
e.g. A study to assess the effectiveness of four different antibiotics on S. sanguis in three different age groups
Pearson's Correlation Coefficient
Karl Pearson's method is the most popular and widely used; it measures correlation quantitatively, within specified limitations, through an ideal measure of covariance. The coe
indicates no correlation at all. It is popularly called Karl Pearson’s Coefficient of correlation or Pearsonian Correlation. The formulas used under this meth
By Direct method (Actual mean)
$$\gamma = \frac{\sum xy}{n\,\sigma_1 \sigma_2}$$
Where: γ = Karl Pearson’s Coefficient of Correlation (also written r)
x and y = Deviations of individual items of the series from their mean
n = The number of terms of a series
σ1 and σ2 = Standard deviations of the first and second series
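The direct method can be sketched in code using the quantities defined above: deviations from the mean, the number of terms n, and the two standard deviations, combined as Σxy / (n σ1 σ2). The standard deviations here use the divisor n so that the n in the denominator is consistent; the two series are hypothetical.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Karl Pearson's coefficient by the direct (actual mean) method."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    dev_x = [x - mean_x for x in xs]            # deviations of the first series
    dev_y = [y - mean_y for y in ys]            # deviations of the second series
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sigma1 = math.sqrt(sum(a * a for a in dev_x) / n)   # SD with divisor n
    sigma2 = math.sqrt(sum(b * b for b in dev_y) / n)
    return sum_xy / (n * sigma1 * sigma2)

# Hypothetical paired series.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

A perfectly linear increasing pair of series gives a coefficient of +1, which is a quick sanity check on any implementation.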
The Kruskal-Wallis H Test
The Kruskal-Wallis H Test is a non-parametric procedure that can be used to compare more than two populations in a completely randomized desi
All n = n1 + n2 + ... + nk measurements are jointly ranked (i.e. treat as one large sample).
We use the sums of the ranks of the k samples to compare the distributions.
Rank the total measurements in all k samples from 1 to n. Tied observations are assigned the average of the ranks they would have received if not tied.
Calculate
$T_i$ = rank sum for the i-th sample, i = 1, 2, ..., k
And the test statistic
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$$
H0: the k distributions are identical versus
Ha : at least one distribution is different
Test statistic: Kruskal-Wallis H
When H0 is true, the test statistic H has an approximate chi-square distribution with df = k - 1.
Use a right-tailed rejection region or p-value based on the Chi-square distribution.
Example
Four groups of students were randomly assigned to be taught with four different techniques, and their achievement test scores were recorded. Are the dis
Teaching Methods
1        2        3        4
65       75       59       94
87       69       78       89
73       83       67       80
79       81       62       88
Teaching Methods, with the joint rank of each score in parentheses
1        2        3        4
65 (3)   75 (7)   59 (1)   94 (16)
87 (13)  69 (5)   78 (8)   89 (15)
73 (6)   83 (12)  67 (4)   80 (10)
79 (9)   81 (11)  62 (2)   88 (14)
Ti  31   35       15       55
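The rank sums and the H statistic for the teaching-methods data can be reproduced as follows. This sketch assumes the usual form H = 12 / (n(n + 1)) · Σ Ti² / ni − 3(n + 1) without a correction for ties (there are none in these scores); the helper names are illustrative, and the tabulated chi-square value 7.815 for 3 degrees of freedom at α = .05 is an assumption taken from a standard table.

```python
def average_ranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    return ranks

def kruskal_wallis_h(samples):
    """Return (H, rank sums) after jointly ranking all observations."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums = [sum(ranks[x] for x in s) for s in samples]
    h = 12 / (n * (n + 1)) * sum(t ** 2 / len(s) for t, s in zip(rank_sums, samples)) - 3 * (n + 1)
    return h, rank_sums

# Achievement scores for the four teaching methods (one list per method).
methods = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]
h, rank_sums = kruskal_wallis_h(methods)

# Assumption: tabulated chi-square for df = k - 1 = 3 at alpha = .05.
CHI2_CRITICAL = 7.815
print(f"rank sums = {rank_sums}, H = {h:.3f}, reject H0: {h > CHI2_CRITICAL}")
```

The rank sums it produces match the Ti row of the table (31, 35, 15, 55), and H comes out at about 8.96.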
Key Concepts
1. Nonparametric Methods
These methods can be used when the data cannot be measured on a quantitative scale, or when
The numerical scale of measurement is arbitrarily set by the researcher, or when
The parametric assumptions such as normality or constant variance are seriously violated.
Key Concepts
Kruskal-Wallis H Test: Completely Randomized Design
1. Jointly rank all the observations in the k samples (treat as one large sample of size n say). Calculate the rank sums, Ti, = rank sum of sample i. and the test
2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in a one-tailed test
3. For sample sizes of five or greater, the rejection region for H is based on the chi-square distribution with (k – 1) degrees of freedom.
Mann Whitney U test:
nonparametric equivalent of a t test for two independent samples
Use when:
• Data does not support means (ordinal)
• Data is not normally distributed.
1) Rank all data.
2) Evaluate if ranks tend to cluster within a group.
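Mann Whitney U test:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
Where: n1 = Size of Sample one
n2 = Size of Sample two
R1, R2 = Sums of the ranks in Sample one and Sample two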
Evaluation of Mann Whitney U
1) Choose the smaller of the two U values.
2) Find the critical value (Mann Whitney table)
3) When a computed value is smaller than the critical value the outcome is significant!
group 1 group 2
24 28
18 42
45 63
57 57
12 90
30 68
Step One: Rank all data across groups (tied values share the average of their ranks)
group 1   rank    group 2   rank
24        3       28        4
18        2       42        6
45        7       63        10
57        8.5     57        8.5
12        1       90        12
30        5       68        11
Step Two: Sum the ranks for each group
R1 = 3 + 2 + 7 + 8.5 + 1 + 5 = 26.5
R2 = 4 + 6 + 10 + 8.5 + 12 + 11 = 51.5
Check the rankings: R1 + R2 = 26.5 + 51.5 = 78 = n(n + 1)/2 = (12 × 13)/2
Step Three: Compute U1
U1 = 36 + 21– 26.5
U1 = 30.5
Step Four: Compute U2
U2 = 36+21–51.5
U2 = 5.5
Step Five: Compare U1 to U2
U1 = 30.5
U2 = 5.5
5.5 < 30.5
U = 5.5
Critical Value = 5
This is a nonsignificant outcome
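The five steps can be reproduced in a short script. This is a sketch assuming U = n1·n2 + n(n + 1)/2 − R for each group, which matches the arithmetic shown in Steps Three and Four; the helper names are illustrative, and the critical value 5 is the one quoted in the worked example.

```python
def average_ranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, rank sum 1, rank sum 2) after ranking both samples jointly."""
    ranks = average_ranks(sample1 + sample2)
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(ranks[x] for x in sample1)
    r2 = sum(ranks[x] for x in sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, r1, r2

group1 = [24, 18, 45, 57, 12, 30]
group2 = [28, 42, 63, 57, 90, 68]

u1, u2, r1, r2 = mann_whitney_u(group1, group2)
u = min(u1, u2)            # the smaller U is compared with the table value
CRITICAL_VALUE = 5         # value quoted in the worked example
significant = u <= CRITICAL_VALUE   # tables are usually read as U <= critical value

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}, significant: {significant}")
```

A useful check on any hand calculation is that U1 + U2 always equals n1 × n2, here 30.5 + 5.5 = 36.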
Chi-square Test
Chi-square is a test statistic used to test a hypothesis that provides a set of theoretical frequencies with which observed frequencies are compared.
Chi-square, symbolically written as χ², enables us to test and compare whether more than two population proportions can be considered equal.
Hence, it is a non-parametric test of statistical significance, which compares observed data with expected data and tests the null hypothesis, which states t
The Chi-square (χ²) is computed by using the following formula.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O represents the observed frequency and E represents the expected frequency.
Whether or not a calculated value of χ² is significant can be ascertained by looking at the tabulated values of χ² for a given degree of freedom at a certa
between the observed and expected frequencies is taken as significant, but if the table value is more than the calculated value of χ², then the difference is
ignored.
Area of Application of Chi-square Test
The Chi-square test technique is used in a number of problems. Some of them are
As a Test of Goodness of Fit: Karl Pearson developed a test for significance called the chi-square test of goodness of fit, which is used to test whether o
deviations, if any, between the observed and estimated values can be because of a chance or some other inadequacies.
As a Test of Homogeneity: The χ² test helps us in stating whether different samples come from the same universe. Through this test, we can also explain whe
results fail to support the given hypothesis.
As a Test of Population Variance: Chi-square is also used to test the significance of population variance through confidence intervals, especially in the case of s
Conditions for the Applicability of Test
The following conditions should be satisfied before the test can be applied
• Observations are recorded and collected on a random basis.
• All the members in the sample must be independent
• No group should contain very few items.
• The overall number of items must be reasonably large.
• The constraints must be linear. Constraints, which involve linear equations in the cell frequencies of a contingency table are known as linear constraints
Steps involved in Finding the Value of Chi-square
The process of computing the χ² value involves the following steps
1. Set-up null hypothesis and alternative hypothesis.
2. List-up the observed frequencies.
3. Calculate the expected frequencies, if the data followed a given theoretical distribution.
4. Obtain the difference between the observed and corresponding expected frequencies.
5. Express the square of the difference as a fraction of the corresponding expected frequencies.
6. Now add all the fractions obtained.
7. Then compare the value with the appropriate χ² value from the tables at the predetermined level of significance.
8. Accept the null hypothesis if the value thus computed for the given degrees of freedom and level of significance is less than the tabulated value; othe
Illustration: The following table depicts the expected sales (E) and actual sales (O) of television sets for a company. Test whether there is a substantial differe
Actual and Expected Sales of Television Sets
Actual Sales (O) Expected Sales (E)
57 59
69 76
51 55
83 75
44 39
48 53
35 30
37 48
Solution
Computation of Test Statistic
O     E     O – E    (O – E)²    (O – E)² / E
57    59    –2       4           0.068
69    76    –7       49          0.645
51    55    –4       16          0.291
83    75    8        64          0.853
44    39    5        25          0.641
48    53    –5       25          0.472
35    30    5        25          0.833
37    48    –11      121         2.521
Total                            6.324
The critical value of Chi-square for (8 – 1) = 7 degrees of freedom at the 0.05 level of significance is 14.067.
The calculated value is χ² = 6.324.
Since the calculated value of χ² is less than the critical value, it does not fall within the critical region and the null hypothesis cannot be rejected. That is, there is no significant difference between the actual values of sales and
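The computation of the test statistic can be checked by evaluating Σ (O − E)² / E directly on the actual and expected sales from the table. In this sketch the upper-tail tabulated value 14.067 for 7 degrees of freedom at the 0.05 level is an assumption taken from a standard chi-square table; evaluated this way the statistic is about 6.32, which lies below that value.

```python
observed = [57, 69, 51, 83, 44, 48, 35, 37]   # actual sales (O)
expected = [59, 76, 55, 75, 39, 53, 30, 48]   # expected sales (E)

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square(observed, expected)
df = len(observed) - 1

# Assumption: upper-tail tabulated chi-square for df = 7 at the 0.05 level.
CHI2_CRITICAL = 14.067
reject_h0 = chi2 > CHI2_CRITICAL

# Per-row contributions, matching the last column of the computation table.
for o, e in zip(observed, expected):
    print(f"{o:>3} {e:>3} {o - e:>4} {(o - e) ** 2:>4} {(o - e) ** 2 / e:.3f}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Printing the per-row contributions makes it easy to compare each line against a hand-computed table.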