Module 2 Characteristics of a Good Test
Introduction
This module deals with one of the main characteristics of a good test:
reliability. It emphasizes the different ways of establishing this
characteristic. Factors affecting the reliability of a test are also
covered in this module.
Student Learning Objectives (SLO)
At the end of this module, you will be able to:
1. enumerate the different ways of establishing the reliability of
different assessment tools;
2. identify the different factors affecting the reliability of a test; and
3. compute and interpret the reliability coefficient.
Content
Reliability
Another characteristic of a good test is reliability. Reliability refers to the
consistency of test scores, which may vary under different conditions. The
reliability of test scores is usually reported as a reliability coefficient, which
is a correlation coefficient.
There are several ways of establishing the reliability of a test: the test-retest
method, the equivalent-form method, the test-retest with equivalent forms method,
the split-half method, and the Kuder-Richardson method.
Factors Affecting Reliability
1. Length of the test
2. Item difficulty
3. Objective scoring
4. Heterogeneity of the students
5. Limited time
Reliability Coefficient
The reliability coefficient is a measure of the amount of error associated with the
test scores.
Description of Reliability Coefficient
a. The reliability coefficient ranges from 0 to 1.0.
b. The acceptable value is 0.60 or higher.
c. The higher the value of the reliability coefficient, the more reliable the overall
test score is.
Interpretation of Reliability Coefficient
a. Group variability affects the size of the reliability coefficient. Heterogeneous
groups yield higher coefficients than homogeneous groups. As group variability
increases, reliability goes up.
b. Scoring reliability limits test score reliability. If tests are scored unreliably, error
is introduced. This will limit the reliability of the test scores.
c. Test length affects test score reliability. As the length increases, the test’s
reliability tends to go up.
d. Item difficulty affects test score reliability. As test items become very easy or
very difficult, the test’s reliability goes down.
Level of Reliability Coefficient
Reliability Coefficient   Interpretation
0.91 – 1.00               Excellent reliability. Ideal for a classroom test.
0.81 – 0.90               Very high reliability. Very good for a classroom test.
0.71 – 0.80               High reliability. Good for a classroom test. A few items
                          probably need to be improved.
0.61 – 0.70               Moderate reliability. The test needs to be supplemented
                          by other measures (more tests) to determine grades.
0.51 – 0.60               Low reliability. The test needs revision unless it is
                          quite short (ten or fewer items). It needs to be
                          supplemented by other measures to determine grades.
0.50 and below            Questionable reliability. The test should not contribute
                          heavily to the course grade, and it needs revision.
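The levels above can be read as a simple lookup. The following is a minimal Python sketch, not part of the original module: the function name `interpret_reliability` is hypothetical, and rounding the coefficient to two decimal places before the lookup is an assumption made so that every value falls into exactly one band.

```python
def interpret_reliability(coefficient):
    """Return the reliability level for a coefficient, following the table above."""
    r = round(coefficient, 2)  # assumption: bands are read at two decimal places
    if r >= 0.91:
        return "Excellent reliability"
    if r >= 0.81:
        return "Very high reliability"
    if r >= 0.71:
        return "High reliability"
    if r >= 0.61:
        return "Moderate reliability"
    if r >= 0.51:
        return "Low reliability"
    return "Questionable reliability"


for value in (0.95, 0.76, 0.50):
    print(value, "->", interpret_reliability(value))
```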
TEST-RETEST METHOD
In this method, the same test is administered twice to the same group of
students, with any time interval between the two administrations. The two sets of
test scores are correlated using the Pearson product-moment correlation coefficient
($r_{xy}$) or the Spearman rho formula ($r_s$), and this correlation provides a
measure of stability. It indicates how stable or consistent the test results are
over a period of time. The formulas are:
Pearson r ($r_{xy}$) Formula
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} $$
where: x = first set of scores
y = second set of scores
n = the number of cases
Spearman rho ($r_s$) Formula (used when the scores are ranked)
$$ r_s = 1 - \frac{6\sum D^2}{n^3 - n} $$
where: $r_s$ = Spearman rho
D = difference between ranks (rank of x – rank of y)
$\sum D^2$ = sum of the squared differences between ranks
n = number of cases
Examples
Example 1: Prof. Alcantara administered a test to his 10 students in an Elementary
Statistics class twice, with a one-day interval. The test given after one day is
exactly the same as the test given the first time. The scores below were gathered in
the first test (X) and the second test (Y). Using the test-retest method, is the test
reliable? Show the complete solution using the Pearson r formula.
Student   First Test (X)   Second Test (Y)   XY     X²     Y²
1         36               38                1368   1296   1444
2         26               34                884    676    1156
3         38               38                1444   1444   1444
4         15               27                405    225    729
5         17               25                425    289    625
6         28               26                728    784    676
7         32               35                1120   1024   1225
8         35               36                1260   1225   1296
9         12               19                228    144    361
10        35               38                1330   1225   1444
n = 10    ∑X = 274         ∑Y = 316          ∑XY = 9192   ∑X² = 8332   ∑Y² = 10400
Solution:
$$
\begin{aligned}
r_{xy} &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
&= \frac{10(9192) - (274)(316)}{\sqrt{[10(8332) - (274)^2][10(10400) - (316)^2]}} \\
&= \frac{91920 - 86584}{\sqrt{[83320 - 75076][104000 - 99856]}} \\
&= \frac{5336}{\sqrt{[8244][4144]}} \\
&= \frac{5336}{\sqrt{34163136}} \\
&= 0.91
\end{aligned}
$$
Analysis: The reliability coefficient using Pearson r is 0.91, which indicates
excellent reliability. The scores of the ten students on the two administrations,
given one day apart, are consistent. Therefore, the test is ideal for a classroom test.
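The computation in Example 1 can be checked with a short script. This is a minimal Python sketch of the raw-score Pearson r formula used in this module; the function name `pearson_r` and the variable names are illustrative and not part of the original material.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from raw scores, using the sums shown in the worked table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Scores from Example 1 (test-retest, one-day interval)
first_test = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
second_test = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

print(f"{pearson_r(first_test, second_test):.2f}")  # 0.91
```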
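Example 2: Compute the reliability coefficient of the data in Example 1 using
Spearman rho. Is the test reliable?
Note: Rank the scores in the first test (X), then the scores in the second test (Y),
from highest (rank 1) to lowest.
In case of tied scores, each tied score receives the average of the ranks the tied
scores occupy:
rank = sum of the ranks / number of tied scores; for example, (1 + 2)/2 = 1.5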
Student   First Test (X)   Second Test (Y)   Rank of X (Rx)   Rank of Y (Ry)   D      D²
1         36               38                2                2                0      0
2         26               34                7                6                1      1
3         38               38                1                2                -1     1
4         15               27                9                7                2      4
5         17               25                8                9                -1     1
6         28               26                6                8                -2     4
7         32               35                5                5                0      0
8         35               36                3.5              4                -0.5   0.25
9         12               19                10               10               0      0
10        35               38                3.5              2                1.5    2.25
n = 10                                                                         ∑D² = 13.5
$$
\begin{aligned}
r_s &= 1 - \frac{6\sum D^2}{n^3 - n} \\
&= 1 - \frac{6(13.5)}{10^3 - 10} \\
&= 1 - \frac{81}{1000 - 10} \\
&= 1 - \frac{81}{990} \\
&= 1 - 0.0818 \\
&= 0.92
\end{aligned}
$$
Analysis: The reliability coefficient using Spearman rho is 0.92, which indicates
excellent reliability. The scores of the ten students on the two administrations,
given one day apart, are consistent. Therefore, the test is ideal for a classroom test.
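The ranking and the Spearman rho computation in Example 2 can be sketched in Python as follows. The helper names `average_ranks` and `spearman_rho` are illustrative. The sketch applies the module's formula directly to average ranks; library routines that handle ties differently may return a slightly different value.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first_position = ordered.index(value) + 1  # 1-based rank of the first tie
        ties = ordered.count(value)
        rank_of[value] = first_position + (ties - 1) / 2
    return [rank_of[s] for s in scores]


def spearman_rho(x, y):
    """Spearman rho using 1 - 6*sum(D^2) / (n^3 - n)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Scores from Example 1 (test-retest, one-day interval)
first_test = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
second_test = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

print(average_ranks(first_test))   # ranks of X, as in the table above
print(f"{spearman_rho(first_test, second_test):.2f}")  # 0.92
```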
EQUIVALENT FORM
It is also known as the PARALLEL or ALTERNATE forms method. In this method, two
different but equivalent forms of the test are administered to the same group of
students within a close time interval. The two forms must be constructed so that the
content, type of test item, difficulty, and instructions for administration are
similar but not identical. For example, a Form A item may ask, “How many meters are
there in 8 kilometers?” while the corresponding Form B item asks, “How many
kilometers are there in 8,000 meters?” The two sets of test scores are correlated
using the Pearson product-moment correlation coefficient (r) or Spearman rho, and
this correlation provides a measure of the equivalence of the tests.
Example 1: Prof. Alvarez administered a test to her 10 students in a Biology class
twice, with a one-week interval. The test given after one week is the parallel form
of the test given the first time. The scores below were gathered in the first test (X)
and the second or parallel test (Y). Using the equivalent or parallel form method,
is the test reliable? Show your complete solution using Pearson r.
Student   First Test (X)   Second Test (Y)   XY     X²     Y²
1         12               20                240    144    400
2         20               22                440    400    484
3         19               23                437    361    529
4         17               20                340    289    400
5         25               25                625    625    625
6         22               20                440    484    400
7         15               19                285    225    361
8         16               18                288    256    324
9         23               25                575    529    625
10        21               24                504    441    576
n = 10    ∑X = 190         ∑Y = 216          ∑XY = 4174   ∑X² = 3754   ∑Y² = 4724
$$
\begin{aligned}
r_{xy} &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
&= \frac{10(4174) - (190)(216)}{\sqrt{[10(3754) - (190)^2][10(4724) - (216)^2]}} \\
&= \frac{41740 - 41040}{\sqrt{[37540 - 36100][47240 - 46656]}} \\
&= \frac{700}{\sqrt{[1440][584]}} \\
&= \frac{700}{\sqrt{840960}} \\
&= 0.76
\end{aligned}
$$
Analysis: The reliability coefficient using Pearson r is 0.76, which indicates high
reliability. The scores of the ten students on the two forms, given one week apart,
are consistent. Therefore, the test is good for a classroom test, but a few items
probably need to be improved.
Example 2: Compute the reliability coefficient of the data in Example 1 using
Spearman rho. Is the test reliable?
Note: Rank the scores in the first test (X), then the scores in the second test (Y),
from highest (rank 1) to lowest.
In case of tied scores, each tied score receives the average of the ranks the tied
scores occupy:
rank = sum of the ranks / number of tied scores; for example, (1 + 2)/2 = 1.5
Student   First Test (X)   Second Test (Y)   Rank of X (Rx)   Rank of Y (Ry)   D      D²
1         12               20                10               7                3      9
2         20               22                5                5                0      0
3         19               23                6                4                2      4
4         17               20                7                7                0      0
5         25               25                1                1.5              -0.5   0.25
6         22               20                3                7                -4     16
7         15               19                9                9                0      0
8         16               18                8                10               -2     4
9         23               25                2                1.5              0.5    0.25
10        21               24                4                3                1      1
n = 10                                                                         ∑D² = 34.5
$$
\begin{aligned}
r_s &= 1 - \frac{6\sum D^2}{n^3 - n} \\
&= 1 - \frac{6(34.5)}{10^3 - 10} \\
&= 1 - \frac{207}{1000 - 10} \\
&= 1 - \frac{207}{990} \\
&= 1 - 0.2091 \\
&= 0.79
\end{aligned}
$$
Analysis: The reliability coefficient using Spearman rho is 0.79, which indicates
high reliability. The scores of the ten students on the two forms, given one week
apart, are consistent. Therefore, the test is good for a classroom test, but a few
items probably need to be improved.
TEST-RETEST WITH EQUIVALENT FORMS METHOD
This method is carried out by giving equivalent forms of the test with an increased
time interval between forms. The two sets of test scores are correlated using
Pearson r or Spearman rho, and this correlation provides a measure of both the
stability and the equivalence of the tests.
SPLIT-HALF METHOD
In this method, the test is administered once and two equivalent halves of the test
are scored. The common procedure is to divide the test into odd-numbered and
even-numbered items. The two halves of the test must be similar but not identical in
content, number of items, and difficulty. This provides two scores for each student.
The scores obtained in the two halves are correlated using Pearson r. The result is
the reliability coefficient for a half test. Since this coefficient holds only for a
half test, the reliability coefficient for the whole test is estimated using the
Spearman-Brown formula. The Spearman-Brown formula is as follows:
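$$ r_{wt} = \frac{2\,r_{ht}}{1 + r_{ht}} $$
where: $r_{wt}$ = reliability of the whole test
$r_{ht}$ = reliability of the half test
This correlation coefficient ($r_{wt}$) provides a measure of internal consistency.
It indicates the degree to which consistent results are obtained from the two halves
of the test.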
Example: Prof. Andal administered a test to her 10 students in a Filipino class. The
test was given only once. The scores of the students in the odd-numbered (O) and
even-numbered (E) items were gathered and are shown below. Using the split-half
method, is the test reliable?
Step 1: Compute the reliability of half test using Pearson r.
Student   Odd (X)    Even (Y)   XY     X²     Y²
1         15         20         300    225    400
2         19         17         323    361    289
3         20         24         480    400    576
4         25         21         525    625    441
5         20         23         460    400    529
6         18         22         396    324    484
7         19         25         475    361    625
8         26         24         624    676    576
9         20         18         360    400    324
10        18         17         306    324    289
n = 10    ∑X = 200   ∑Y = 211   ∑XY = 4249   ∑X² = 4096   ∑Y² = 4533
$$
\begin{aligned}
r_{ht} &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
&= \frac{10(4249) - (200)(211)}{\sqrt{[10(4096) - (200)^2][10(4533) - (211)^2]}} \\
&= \frac{42490 - 42200}{\sqrt{[40960 - 40000][45330 - 44521]}} \\
&= \frac{290}{\sqrt{[960][809]}} \\
&= \frac{290}{\sqrt{776640}} \\
&= 0.33
\end{aligned}
$$
Step 2: Get the reliability of the whole test using Spearman-Brown formula.
$$
\begin{aligned}
r_{wt} &= \frac{2\,r_{ht}}{1 + r_{ht}} \\
&= \frac{2(0.33)}{1 + 0.33} \\
&= \frac{0.66}{1.33} \\
&= 0.50
\end{aligned}
$$
Analysis: The reliability coefficient using the Spearman-Brown formula is 0.50, which
indicates questionable reliability. Therefore, the test items should be revised.
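The two steps of the split-half example (Pearson r on the two halves, then the Spearman-Brown adjustment) can be sketched in Python as follows. The function names `pearson_r` and `spearman_brown` are illustrative and not part of the original module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from raw scores (Step 1: reliability of the half test)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = sqrt(
        (n * sum(a * a for a in x) - sum(x) ** 2)
        * (n * sum(b * b for b in y) - sum(y) ** 2)
    )
    return numerator / denominator


def spearman_brown(r_half):
    """Step 2: estimate whole-test reliability from half-test reliability."""
    return 2 * r_half / (1 + r_half)


# Odd-item and even-item scores from the example
odd_scores = [15, 19, 20, 25, 20, 18, 19, 26, 20, 18]
even_scores = [20, 17, 24, 21, 23, 22, 25, 24, 18, 17]

r_ht = pearson_r(odd_scores, even_scores)
r_wt = spearman_brown(r_ht)
print(f"r_ht = {r_ht:.2f}")  # r_ht = 0.33
print(f"r_wt = {r_wt:.2f}")  # r_wt = 0.50
```

The sketch carries the unrounded half-test coefficient into the second step; the result still rounds to the same 0.50 obtained by hand.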
KUDER-RICHARDSON METHOD
In this method, the test is administered once, the total test is scored, and then the
proportions of students passing and not passing each item are correlated. It has
two types: KR-20 and KR-21.
Kuder-Richardson 20 (KR-20) is applicable only in situations where students’
responses are scored dichotomously; it is therefore most useful with traditional
test items that are scored as right or wrong, true or false, or yes or no. It uses
the formula:
$$ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^2}\right) $$
where: k = number of items
p = proportion of the students who answered the item correctly (score/N)
q = 1 – p
$s^2$ = variance of the total score
$$ s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} $$
Kuder-Richardson 21 (KR-21) is not limited to test items that are scored
dichotomously. It uses the formula:
$$ KR_{21} = \frac{k}{k-1}\left(1 - \frac{\bar{x}(k - \bar{x})}{k s^2}\right) $$
where: k = number of items
$\bar{x}$ = mean value
$s^2$ = variance of the total score
Example 1: Prof. Coronel administered a 40-item test in English to her first-year
college students. Below are the scores of 15 students. Find the reliability using
the KR-21 formula.
Student   X          X²
1         16         256
2         25         625
3         35         1225
4         39         1521
5         25         625
6         18         324
7         19         361
8         22         484
9         33         1089
10        36         1296
11        20         400
12        17         289
13        26         676
14        35         1225
15        39         1521
n = 15    ∑X = 405   ∑X² = 11917
Step 1: Solve for the variance.
$$
\begin{aligned}
s^2 &= \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} \\
&= \frac{15(11917) - (405)^2}{15(14)} \\
&= \frac{178755 - 164025}{210} \\
&= \frac{14730}{210} \\
&= 70.14
\end{aligned}
$$
Step 2: Solve for the mean.
$$ \bar{x} = \frac{\sum X}{n} = \frac{405}{15} = 27 $$
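Step 3: Solve for the reliability coefficient using the KR-21 formula.
$$
\begin{aligned}
KR_{21} &= \frac{k}{k-1}\left(1 - \frac{\bar{x}(k - \bar{x})}{k s^2}\right) \\
&= \frac{40}{40-1}\left(1 - \frac{27(40 - 27)}{40(70.14)}\right) \\
&= \frac{40}{39}\left(1 - \frac{27(13)}{2805.60}\right) \\
&= \frac{40}{39}\left(1 - \frac{351}{2805.60}\right) \\
&= 1.03(1 - 0.1251) \\
&= 1.03(0.8749) \\
&= 0.90
\end{aligned}
$$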
Analysis: The reliability using the KR-21 formula is 0.90, which means that the test
has very high reliability. The test is therefore very good for a classroom test.
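The three steps of the KR-21 example can be sketched in Python as follows. The function names `sample_variance` and `kr21` are illustrative. The sketch keeps full precision between steps rather than rounding the variance to 70.14 and the fraction 40/39 to 1.03; the result still rounds to 0.90.

```python
def sample_variance(scores):
    """Variance using the module's formula: (n*sum(x^2) - (sum x)^2) / (n*(n-1))."""
    n = len(scores)
    return (n * sum(x * x for x in scores) - sum(scores) ** 2) / (n * (n - 1))


def kr21(scores, k):
    """KR-21 reliability for total scores on a test with k items."""
    mean = sum(scores) / len(scores)
    variance = sample_variance(scores)
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))


# Scores of the 15 students on the 40-item test
scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]

print(f"variance = {sample_variance(scores):.2f}")  # variance = 70.14
print(f"mean = {sum(scores) / len(scores):.0f}")    # mean = 27
print(f"KR-21 = {kr21(scores, k=40):.2f}")          # KR-21 = 0.90
```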
Example 2: Ms. Gonzaga administered a 20-item true-or-false test to her Grade VIII
students in XYZ National High School. Below are the scores of 40 students. Find the
reliability using the KR-20 formula.
Item No.   Score (X)   p       q       pq         X²
1          25          0.625   0.375   0.234375   625
2          36          0.9     0.1     0.09       1296
3          28          0.7     0.3     0.21       784
4          23          0.575   0.425   0.244375   529
5          25          0.625   0.375   0.234375   625
6          33          0.825   0.175   0.144375   1089
7          38          0.95    0.05    0.0475     1444
8          15          0.375   0.625   0.234375   225
9          23          0.575   0.425   0.244375   529
10         25          0.625   0.375   0.234375   625
11         36          0.9     0.1     0.09       1296
12         35          0.875   0.125   0.109375   1225
13         19          0.475   0.525   0.249375   361
14         39          0.975   0.025   0.024375   1521
15         28          0.7     0.3     0.21       784
16         33          0.825   0.175   0.144375   1089
17         19          0.475   0.525   0.249375   361
18         37          0.925   0.075   0.069375   1369
19         36          0.9     0.1     0.09       1296
20         25          0.625   0.375   0.234375   625
Total      578                         3.38875    17698
Note: Do not round off the values.
Step 1: Solve for the variance.
$$
\begin{aligned}
s^2 &= \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} \\
&= \frac{20(17698) - (578)^2}{20(19)} \\
&= \frac{353960 - 334084}{380} \\
&= \frac{19876}{380} \\
&= 52.31
\end{aligned}
$$
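Step 2: Solve for the reliability using the KR-20 formula.
$$
\begin{aligned}
KR_{20} &= \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^2}\right) \\
&= \frac{20}{20-1}\left(1 - \frac{3.38875}{52.31}\right) \\
&= 0.98
\end{aligned}
$$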
Analysis: The reliability using the KR-20 formula is 0.98, which means that the test
has excellent reliability. The test is therefore ideal for a classroom test.
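The KR-20 example can be sketched in Python as follows. The function names `sample_variance` and `kr20` are illustrative. Following the worked example, the variance passed to `kr20` is computed from the Score (X) column; this is an assumption that mirrors the example, and a variance obtained from students' total scores could be passed in the same way.

```python
def sample_variance(values):
    """Variance using the module's formula: (n*sum(x^2) - (sum x)^2) / (n*(n-1))."""
    n = len(values)
    return (n * sum(v * v for v in values) - sum(values) ** 2) / (n * (n - 1))


def kr20(correct_counts, n_students, variance):
    """KR-20 from the number of students answering each item correctly."""
    k = len(correct_counts)  # number of items
    sum_pq = 0.0
    for count in correct_counts:
        p = count / n_students  # proportion answering the item correctly
        sum_pq += p * (1 - p)   # q = 1 - p
    return k / (k - 1) * (1 - sum_pq / variance)


# Score (X) column of the example: correct answers per item, 40 students
item_scores = [25, 36, 28, 23, 25, 33, 38, 15, 23, 25,
               36, 35, 19, 39, 28, 33, 19, 37, 36, 25]

s2 = sample_variance(item_scores)  # variance as computed in Step 1 of the example
print(f"variance = {s2:.2f}")                          # variance = 52.31
print(f"KR-20 = {kr20(item_scores, 40, s2):.2f}")      # KR-20 = 0.98
```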
Assessment and Evaluation
Directions: Do as indicated.
1. Dr. Magpantay administered a test to his 15 students in Mathematics twice, with a
three-day interval. The test given after three days is exactly the same as the
test given the first time. The scores below were gathered in the first test (X)
and the second test (Y). Using the test-retest method, is the test reliable?
Show your complete solution using Pearson r and Spearman rho. (20 pts.)
Students (X) (Y)
1 33 34
2 33 25
3 35 29
4 40 38
5 27 25
6 20 23
7 35 32
8 33 36
9 20 25
10 25 30
11 32 30
12 20 22
13 34 35
14 26 20
15 38 29
2. Mrs. Espina administered a test to her 12 students in a Biology class twice, with
a two-week interval. The test given after two weeks is the parallel form of the
test given the first time. The scores below were gathered in the first test (X)
and the second or parallel test (Y). Using the equivalent form method, is the
test reliable? Show the complete solution using Pearson r and Spearman rho.
(20 pts.)
Students X Y
1 12 13
2 21 20
3 23 24
4 20 25
5 24 26
6 20 18
7 19 22
8 25 20
9 19 23
10 28 30
11 29 30
12 17 20
3. Mrs. Axalan administered a test to her English class of 10 students. The test
was given only once. The scores of the students in the odd-numbered (O) and
even-numbered (E) items were gathered and are shown below. Using the split-half
method, is the test reliable? Show the complete solution. (20 pts.)
Students Odd (X) Even (Y)
1 16 18
2 25 26
3 24 26
4 23 25
5 23 25
6 21 20
7 20 23
8 22 20
9 18 20
10 20 27
4. Using the scores in the first test (X) in number 1, solve for the values of KR-21
and KR-20. Interpret the results. (30 pts.)
Students (X)
1 33
2 33
3 35
4 40
5 27
6 20
7 35
8 33
9 20
10 25
11 32
12 20
13 34
14 26
15 38
Prepared:
Mrs. JENNIFER A. REYES
Instructor