Reliability, the second characteristic of assessment methods,
is the extent to which a test is consistent and dependable. In
other words, the test agrees with itself. It is concerned with the
consistency of responses from moment to moment: even if a
student takes the same test twice, a reliable test yields the same
results. However, a reliable test may not always be valid.
For instance, Student D took a Mathematics test twice. His
answer to item 9, “How many sides are there in a heptagon?”, was
eight (8). In the second administration of the test, his answer to
the same question remained eight (8). His response is reliable
because it is consistent, but it is not valid because the correct
answer is seven (7). Hence, a reliable test may not always be
valid.
Techniques in Testing the Reliability of Assessment Methods
1. Test-retest Method
In the test-retest method, the same test is administered twice to
the same group of students and the correlation coefficient between
the two sets of scores is determined. The disadvantages of this
method are:
a. when the time interval is short, the respondents may recall their
previous responses, which tends to make the correlation
coefficient high;
b. when the time interval is long, factors such as unlearning and
forgetting, among others, may occur and may result in a low
correlation; and
c. regardless of the time interval separating the two administrations,
varying environmental conditions such as noise, temperature,
lighting, and other factors may affect the correlation coefficient of
the test.
The Spearman rank correlation coefficient, or Spearman rho, is
the statistical tool used to measure the relationship between paired
ranks assigned to individual scores on two variables, X and Y, here
the first administration (X) and the second administration (Y). To
obtain the value of Spearman rho (rs), consider Formula 2.1 below.

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} \qquad \text{(Formula 2.1)} $$

where rs stands for Spearman rho, ∑D² for the sum of the squared
differences between ranks, and N for the number of cases.
To apply Formula 2.1, the steps are as follows:
Step 1. Rank the scores of the respondents from highest to lowest in
        the first administration (X) and mark this rank as Rx.
        The highest score receives rank 1; the second highest, 2;
        the third highest, 3; and so on. Tied scores share the
        average of the ranks they occupy.
Step 2. Rank the scores in the second administration (Y) in the
        same manner as in Step 1 and mark this rank as Ry.
Step 3. Find the difference (D) between the ranks Rx and Ry.
Step 4. Multiply the difference by itself to get D².
Step 5. Add the D² values to get ∑D².
Step 6. Compute Spearman rho (rs) by using Formula 2.1.
Assessment in Learning 1 Prepared by: Frederick M. Manuel, PhD
For instance, fifteen students serve as a pilot sample to test
the reliability of an achievement test in Mathematics using the
test-retest technique. Table 2.1 presents the students’ scores in
the two administrations and the computation of the reliability
coefficient using Spearman rho (rs).
Table 2.1. Spearman rho Computation of First and Second
           Administrations of Achievement Test Using the
           Test-retest Technique in Mathematics

                 Scores         Ranks         Differences
Students     X      Y       Rx      Ry        D       D²
 1          50     51      3.0     1.5       1.5     2.25
 2          43     42     10.5    10.5       0       0
 3          48     48      5.0     4.5       0.5     0.25
 4          45     44      8.0     8.0       0       0
 5          40     41     13.0    12.5       0.5     0.25
 6          47     47      6.0     6.0       0       0
 7          52     51      1.0     1.5       0.5     0.25
 8          39     38     14.5    15.0       0.5     0.25
 9          44     43      9.0     9.0       0       0
10          43     42     10.5    10.5       0       0
11          41     41     12.0    12.5       0.5     0.25
12          46     45      7.0     7.0       0       0
13          39     39     14.5    14.0       0.5     0.25
14          51     50      2.0     3.0       1.0     1.00
15          49     48      4.0     4.5       0.5     0.25
Total                                                5.00
Given: ∑D² = 5, N = 15

$$
\begin{aligned}
r_s &= 1 - \frac{6\sum D^2}{N^3 - N} \\
    &= 1 - \frac{6(5)}{15^3 - 15} \\
    &= 1 - \frac{30}{3360} \\
    &= 1 - 0.00892857 \\
    &= 0.9910714 \text{ or } 0.99 \text{ (very high relationship)}
\end{aligned}
$$
The Spearman rho value obtained is 0.99, which denotes a very
high relationship. This means that students who got very high scores
in the Mathematics achievement test in the first administration also
got very high scores in the second administration, and those who got
very low scores in the first administration also got very low scores
in the second administration. Hence, the test is reliable.
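The ranking and substitution steps can also be checked with a short program. The following is a minimal Python sketch, not part of the original handout; the function names `average_ranks` and `spearman_rho` are illustrative choices, and tied scores are assumed to share the average of the ranks they occupy, as in Table 2.1.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    ordered = sorted(scores, reverse=True)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1        # first position held by this score
        count = ordered.count(value)            # how many students share the score
        ranks[value] = first + (count - 1) / 2  # average of the tied positions
    return [ranks[s] for s in scores]


def spearman_rho(x, y):
    """Return (sum of squared rank differences, Spearman rho) using Formula 2.1."""
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return sum_d2, 1 - (6 * sum_d2) / (n ** 3 - n)


# Scores from Table 2.1 (first and second administrations)
first = [50, 43, 48, 45, 40, 47, 52, 39, 44, 43, 41, 46, 39, 51, 49]
second = [51, 42, 48, 44, 41, 47, 51, 38, 43, 42, 41, 45, 39, 50, 48]

sum_d2, rho = spearman_rho(first, second)
print(sum_d2, round(rho, 4))  # prints 5.0 0.9911
```

The sketch applies Formula 2.1 exactly as written, so it reproduces the hand computation. Statistical packages may handle ties with a different formula and give a slightly different value.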
Interpretation of Correlation Value
An r of 0.00 denotes zero correlation
An r from + 0.01 to + 0.20 denotes negligible correlation
An r from + 0.21 to + 0.40 denotes low or slight correlation
An r from + 0.41 to + 0.70 denotes moderate relationship
An r from + 0.71 to + 0.90 denotes high relationship
An r from + 0.91 to + 0.99 denotes very high correlation
An r of + 1.0 means perfect correlation
Bear in mind that a perfect correlation is 1.0. If the computed
correlation is more than 1.0, there is something wrong in the
computation.
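The scale can be expressed as a small lookup. The following is a minimal Python sketch under stated assumptions: the function name `interpret_correlation` is hypothetical, negative coefficients are assumed to use the same bands on their absolute value, and a value that falls between two printed bands (for example 0.205) is assumed to belong to the higher band.

```python
def interpret_correlation(r, tol=1e-9):
    """Return the verbal label that the scale assigns to a correlation r."""
    size = abs(r)  # assumption: the bands apply to the magnitude of r
    if size > 1.0 + tol:
        raise ValueError("correlation exceeds 1.0; recheck the computation")
    if size >= 1.0 - tol:
        return "perfect correlation"
    if size <= tol:
        return "zero correlation"
    bands = [
        (0.20, "negligible correlation"),
        (0.40, "low or slight correlation"),
        (0.70, "moderate relationship"),
        (0.90, "high relationship"),
    ]
    for upper, label in bands:
        if size <= upper:
            return label
    return "very high correlation"


print(interpret_correlation(0.99))  # prints very high correlation
```

A computed value above 1.0 raises an error instead of returning a label, which mirrors the warning that such a result signals a mistake in the computation.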
2. Parallel Forms Method
In the parallel forms method, two parallel or equivalent forms of a
test are administered to a group of students and the paired
observations are correlated. The two forms must be constructed so
that the content, type of test item, difficulty, and instructions
for administration are similar but not identical. The Pearson
product-moment correlation coefficient is the statistical tool used
to determine the correlation of parallel forms.
For instance, a Form A item reads, “How many meters are there in
8 kilometers?” and the corresponding Form B item reads, “How many
kilometers are there in 8,000 meters?” The two forms have the same
mean and variability of scores. Suppose there are forty students in
a Mathematics class. In constructing parallel forms for this class,
twenty test papers are prepared in Form A and twenty in Form B, and
the two forms are administered simultaneously. Table 2.2 shows a
sample correlation computation of parallel forms in a Mathematics
achievement test.
ILLUSTRATION
Table 2.2. Sample Correlation Computation of Parallel Forms in
           70-item Mathematics Achievement Test

               Scores
Students     X       Y       X²       Y²       XY
 1          68      67     4624     4489     4556
 2          69      68     4761     4624     4692
 3          66      66     4356     4356     4356
 4          65      64     4225     4096     4160
 5          70      69     4900     4761     4830
 6          55      53     3025     2809     2915
 7          50      51     2500     2601     2550
 8          48      50     2304     2500     2400
 9          40      40     1600     1600     1600
10          52      51     2704     2601     2652
11          60      59     3600     3481     3540
12          42      40     1764     1600     1680
13          50      49     2500     2401     2450
14          45      44     2025     1936     1980
15          40      41     1600     1681     1640
16          38      38     1444     1444     1444
17          35      34     1225     1156     1190
18          63      63     3969     3969     3969
19          47      48     2209     2304     2256
20          42      43     1764     1849     1806
Total     1045    1038    57099    56258    56666
Given: N = 20, ∑XY = 56666, ∑X = 1045, ∑Y = 1038,
       ∑X² = 57099, ∑Y² = 56258

$$
\begin{aligned}
r_{xy} &= \frac{N\sum XY - (\sum X)(\sum Y)}
               {\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \\
       &= \frac{20(56666) - (1045)(1038)}
               {\sqrt{\left[20(57099) - (1045)^2\right]\left[20(56258) - (1038)^2\right]}} \\
       &= \frac{1133320 - 1084710}
               {\sqrt{(1141980 - 1092025)(1125160 - 1077444)}} \\
       &= \frac{48610}{\sqrt{(49955)(47716)}} \\
       &= \frac{48610}{48822.6667} \\
       &= 0.9956 \text{, truncated to } 0.99 \text{ (very high relationship)}
\end{aligned}
$$
The correlation value obtained is 0.99, which denotes a very high
relationship. This means that the Form A and Form B achievement
tests are reliable.
In general, the correlation between the scores obtained on paired
observations of the two forms represents the reliability coefficient
of the test. If the correlation value (r) obtained is high, the test
is reliable.
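The raw-score Pearson formula can be verified with a few lines of code. The following is a minimal Python sketch that uses only the standard library; the function name `pearson_r` and the variable names are illustrative, and the scores are those of Table 2.2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation from the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Scores from Table 2.2 (Form A as X, Form B as Y)
form_a = [68, 69, 66, 65, 70, 55, 50, 48, 40, 52,
          60, 42, 50, 45, 40, 38, 35, 63, 47, 42]
form_b = [67, 68, 66, 64, 69, 53, 51, 50, 40, 51,
          59, 40, 49, 44, 41, 38, 34, 63, 48, 43]

r = pearson_r(form_a, form_b)
print(round(r, 4))  # prints 0.9956
```

Note that the two bracketed terms in the denominator are multiplied before the square root is taken.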
Correlation with the Use of Computer
It is easier and faster to compute the correlation with a computer,
because the results are obtained in just a few seconds.
Using the same data in Table 2.2, the steps in computing the
correlation with a computer are as follows:
Step 1. Switch on the computer.
Step 2. Wait until the Start menu appears.
Step 3. Hold the mouse, click the Start menu, click Programs, and
        click Microsoft Excel.
Step 4. Wait until the computer displays Microsoft Excel.
Step 5. Type the data as follows:
        Row    Column A    Column B
         1        68          67
         2        69          68
         3        66          66
         4        65          64
         5        70          69
         6        55          53
         7        50          51
         8        48          50
         9        40          40
        10        52          51
        11        60          59
        12        42          40
        13        50          49
        14        45          44
        15        40          41
        16        38          38
        17        35          34
        18        63          63
        19        47          48
        20        42          43
Step 6. Highlight the data. Click the Tools menu. Click Data
        Analysis. The computer displays Analysis Tools.
Step 7. Click Correlation under Analysis Tools. Click OK.
Step 8. The computer displays Input.
Step 9. In the Input Range, type $A$1:$B$20. Click OK.
Step 10. The computer displays the following:
                     Column 1    Column 2
        Column 1     1
        Column 2     0.995644    1
3. Split-Half Method
In the split-half method, the test is administered once, but the
test items are divided into two halves. The common procedure is to
divide the test into odd and even items. The two halves of the test
must be similar but not identical in content, number of items,
difficulty, means, and standard deviations. Each student obtains two
scores, one on the odd items and the other on the even items of the
same test. The scores obtained in the two halves are correlated. The
result is the reliability coefficient for a half test. Since this
reliability holds only for a half test, the reliability coefficient
for the whole test is estimated by using the Spearman-Brown formula.
The Spearman-Brown formula is as follows:

$$ r_{wt} = \frac{2\,r_{ht}}{1 + r_{ht}} \qquad \text{(Formula 2.3)} $$

where
rwt = reliability of the whole test
rht = reliability of the half test
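The step-up from half-test to whole-test reliability is a single expression. The following is a minimal Python sketch; the function name `spearman_brown` is an illustrative choice, and the input value is an arbitrary example rather than a figure from the handout.

```python
def spearman_brown(r_half):
    """Estimate whole-test reliability from the half-test correlation."""
    if r_half <= -1:
        raise ValueError("half-test correlation must be greater than -1")
    return (2 * r_half) / (1 + r_half)


# An arbitrary half-test reliability of 0.80 steps up to about 0.89
print(round(spearman_brown(0.80), 2))  # prints 0.89
```

Because the whole test has twice as many items as each half, the estimate for the whole test is higher than the half-test correlation whenever that correlation lies between 0 and 1.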
ILLUSTRATION
For instance, a test is administered to fourteen students from
different institutions as a pilot sample to determine the
reliability coefficient of the odd and even items by using the
Spearman-Brown formula. To apply the Spearman-Brown formula (2.3),
consider Table 2.3 below.
Table 2.3. Computation of Reliability Coefficient of Odd and Even
           Items Using Spearman-Brown Formula (Artificial Data)

                  Scores              Ranks        Difference
Students    X (odd)   Y (even)     Rx      Ry       D       D²
 1            40        41         8.0     7.0     1.0     1.00
 2            45        46         4.0     3.5     0.5     0.25
 3            39        40        10.5     9.5     1.0     1.00
 4            48        50         1.0     1.0     0       0
 5            46        48         3.0     2.0     1.0     1.00
 6            40        42         8.0     5.0     3.0     9.00
 7            38        38        12.0    11.5     0.5     0.25
 8            35        31        13.0    13.5     0.5     0.25
 9            30        31        14.0    13.5     0.5     0.25
10            39        41        10.5     7.0     3.5    12.25
11            42        40         6.0     9.5     3.5    12.25
12            47        46         2.0     3.5     1.5     2.25
13            43        41         5.0     7.0     2.0     4.00
14            40        38         8.0    11.5     3.5    12.25
Total                                                     56.00

Given: ∑D² = 56, N = 14

$$
\begin{aligned}
r_{ht} &= 1 - \frac{6\sum D^2}{N^3 - N} \\
       &= 1 - \frac{6(56)}{14^3 - 14} \\
       &= 1 - \frac{336}{2744 - 14} \\
       &= 1 - \frac{336}{2730} \\
       &= 1 - 0.12307692 \\
       &= 0.88 \text{ (reliability of half test)}
\end{aligned}
$$

Computation of Reliability of Whole Test (rwt)
Given: rht = 0.88

$$
\begin{aligned}
r_{wt} &= \frac{2\,r_{ht}}{1 + r_{ht}} \\
       &= \frac{2(0.88)}{1 + 0.88} \\
       &= \frac{1.76}{1.88} \\
       &= 0.94 \text{ (highly reliable)}
\end{aligned}
$$

The reliability of the half test is 0.88 and the reliability of the
whole test is 0.94, which denotes a very high relationship. This
means that the achievement test, split into odd and even items, is
highly reliable.
4. Internal Consistency Method
This method is used in psychological tests that consist of
dichotomously scored items. The examinee either passes or fails an
item. A rating of 1 (one) is assigned for a correct answer and 0
(zero) for an incorrect response. The reliability coefficient is
obtained by using Kuder-Richardson Formula 20. The formula is as
follows:

$$ r_{xx} = \frac{N}{N-1}\left[\frac{SD^2 - \sum p_i q_i}{SD^2}\right] $$

where
N = number of items
SD² = variance of the test scores
piqi = product of the proportions passing and failing item i
The proportion of individuals passing item i is denoted by the
symbol pi and the proportion failing by qi, where qi = 1 − pi. The
steps in applying Kuder-Richardson Formula 20 are as follows:
Step 1. Compute the variance (SD²) of the test scores for the
        whole group.
Step 2. Find the proportion passing (pi) and the proportion
        failing (qi) each item. For instance, if 9 of the 12
        students got the correct answer in item 1, then
        pi = 9/12 = 0.75 and qi = 1 − 0.75 = 0.25, the proportion
        who got the incorrect answer.
Step 3. Multiply pi and qi for each item (for instance,
        0.75 × 0.25 = 0.1875) and add the products for all the
        items to get ∑piqi.
Step 4. Substitute the calculated values in Kuder-Richardson
        Formula 20.
ILLUSTRATION
For illustration purposes, consider Table 2.4, in which a test of
15 items is administered to 12 students and the reliability is
computed using Kuder-Richardson Formula 20.
Table 2.4. Computation of Kuder-Richardson Formula 20

                       Students
Item     1  2  3  4  5  6  7  8  9 10 11 12     f    pi    qi    piqi
 1       1  1  1  1  1  1  1  1  1  0  0  0     9   .75   .25   .1875
 2       1  1  1  1  1  1  1  1  0  0  0  0     8   .67   .33   .2211
 3       1  1  1  1  1  1  1  1  0  0  0  0     8   .67   .33   .2211
 4       1  1  1  1  1  1  1  1  0  0  0  0     8   .67   .33   .2211
 5       1  1  1  1  1  1  1  0  0  0  0  0     7   .58   .42   .2436
 6       1  1  1  1  1  1  1  0  0  0  0  0     7   .58   .42   .2436
 7       1  1  1  1  1  1  1  0  0  0  0  0     7   .58   .42   .2436
 8       1  1  1  1  1  1  1  0  0  0  0  0     7   .58   .42   .2436
 9       1  1  1  1  1  1  0  0  0  0  0  0     6   .50   .50   .25
10       1  1  1  1  1  1  0  0  0  0  0  0     6   .50   .50   .25
11       1  1  1  1  1  1  0  0  0  1  1  0     8   .67   .33   .2211
12       1  1  1  1  1  1  0  0  1  0  0  0     7   .58   .42   .2436
13       1  1  1  1  1  1  0  0  0  0  0  0     6   .50   .50   .25
14       0  1  1  1  1  0  0  0  0  1  0  0     5   .42   .58   .2436
15       1  1  0  1  0  0  1  0  0  0  0  0     4   .33   .67   .2211
Total   14 15 14 15 14 13  9  4  2  2  1  0                    3.5046
Variance (SD²) Computation
Students    Score (X)    X − x̄     (X − x̄)²
 1             14         5.42      29.3764
 2             15         6.42      41.2164
 3             14         5.42      29.3764
 4             15         6.42      41.2164
 5             14         5.42      29.3764
 6             13         4.42      19.5364
 7              9         0.42       0.1764
 8              4        −4.58      20.9764
 9              2        −6.58      43.2964
10              2        −6.58      43.2964
11              1        −7.58      57.4564
12              0        −8.58      73.6164
Total         103                  428.9168

Mean and Variance Computation (N = 12 students)

$$
\bar{x} = \frac{\sum X}{N} = \frac{103}{12} = 8.58
$$

$$
SD^2 = \frac{\sum (X - \bar{x})^2}{N - 1} = \frac{428.9168}{12 - 1} = 38.9924 \text{ or } 39
$$
Variance and Mean with the Use of Computer
It is easier and faster to compute the variance with a computer,
because the needed result is obtained in a few seconds.
The steps in computing the variance with a computer are as
follows:
Step 1. Switch on the computer
Step 2. Wait until Start menu appears.
Step 3. Hold the mouse, click Start menu, click Programs, click
Microsoft Excel.
Step 4. Wait until the computer displays Microsoft Excel Program
Step 5. Type the data as follows:
        Row    Column A
         1        14
         2        15
         3        14
         4        15
         5        14
         6        13
         7         9
         8         4
         9         2
        10         2
        11         1
        12         0
Step 6. Highlight the data. Click the Tools menu. Click Data
        Analysis.
Step 7. The computer displays Analysis Tools.
Step 8. Click Descriptive Statistics. Click OK.
Step 9. The computer displays Descriptive Statistics. In Input
        Range, type $A$1:$A$12. Click Summary Statistics. Click
        OK.
Step 10. The computer displays the following:
                            Column 1
        Mean                8.583333
        Median              11
        Mode                14
        Sample Variance     38.9924
        Range               15
        Minimum             0
        Maximum             15
        Sum                 103
        Count               12
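Kuder-Richardson Formula 20 Computation
Given: N = 15 items, SD² = 39, ∑piqi = 3.5046

$$
\begin{aligned}
r_{xx} &= \frac{N}{N-1}\left[\frac{SD^2 - \sum p_i q_i}{SD^2}\right] \\
       &= \frac{15}{15-1}\left[\frac{39 - 3.5046}{39}\right] \\
       &= (1.0714)(0.9101) \\
       &= 0.9751 \text{, truncated to } 0.97
\end{aligned}
$$

The reliability coefficient (rxx) obtained is 0.97, which denotes a
very high relationship. This means that the test is highly reliable.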
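The whole Kuder-Richardson procedure can be run directly on the item responses. The following is a minimal Python sketch under stated assumptions: the function name `kr20` and the string encoding of the rows are illustrative choices, the responses are those of Table 2.4 (one row per item, one digit per student), and the score variance uses the N − 1 divisor of the hand computation.

```python
from statistics import variance

# Item responses from Table 2.4: each string is one item, each digit one student
rows = [
    "111111111000", "111111110000", "111111110000", "111111110000",
    "111111100000", "111111100000", "111111100000", "111111100000",
    "111111000000", "111111000000", "111111000110", "111111001000",
    "111111000000", "011110000100", "110100100000",
]
responses = [[int(ch) for ch in row] for row in rows]


def kr20(responses):
    """Kuder-Richardson Formula 20 for a 0/1 matrix of items by students."""
    n_items = len(responses)
    n_students = len(responses[0])
    sum_pq = 0.0
    for item in responses:
        p = sum(item) / n_students   # proportion passing the item
        sum_pq += p * (1 - p)        # p times q, with q = 1 - p
    scores = [sum(col) for col in zip(*responses)]  # total score per student
    sd2 = variance(scores)           # sample variance (divisor N - 1)
    return (n_items / (n_items - 1)) * ((sd2 - sum_pq) / sd2)


print(round(kr20(responses), 3))  # prints 0.975
```

Because the sketch keeps pi and qi unrounded, its ∑piqi is about 3.507 instead of 3.5046, yet the coefficient still comes out at about 0.975, in line with the value reported above.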
LEARNING ACTIVITY
Problem Solving
1. Using the data below, determine whether the test is reliable
using the test-retest method. The test was administered to students
as a pilot sample.
Students    First Administration    Second Administration
 1                  70                       69
 2                  72                       71
 3                  63                       60
 4                  55                       50
 5                  78                       76
 6                  48                       45
 7                  52                       50
 8                  71                       71
 9                  58                       57
10                  60                       59
11                  75                       74
12                  65                       63
13                  73                       73
14                  79                       77