and the low performing group of students. A high discrimination index for an item indicates that the group of students who answered the item correctly is the group with the highest performance, indicated by its Mean Ability in the output (see below), or, as Lietz (1995, p. 166) put it, that there is a strong link between the item and the scale. A discrimination index close to zero or with a negative sign indicates that there is no clear pattern of linkage between the item and the scale.
Thorndike (1982, p. 72) has reported that if the distribution in the total group is
normal, the point biserial can never go beyond 0.80, while Lietz (1995)
contends that the critical low value for the point biserial coefficient should be set
to 0.20. This means that a satisfactory point biserial coefficient for an item
should be within the range of 0.20 to 0.80.
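As an illustration of how this index is obtained, the point biserial coefficient for an item can be computed as the Pearson correlation between the dichotomously scored item (0 = wrong, 1 = right) and the total test score. The short sketch below shows this calculation in Python; the data and variable names are purely illustrative and are not taken from the QUEST output.

    import numpy as np

    def point_biserial(item_scores, total_scores):
        # Point biserial coefficient: the Pearson correlation between a
        # dichotomously scored item and the total test score.
        item_scores = np.asarray(item_scores, dtype=float)
        total_scores = np.asarray(total_scores, dtype=float)
        return np.corrcoef(item_scores, total_scores)[0, 1]

    # Illustrative data: eight students' scores on one item and on the whole test.
    item = [1, 0, 1, 1, 0, 0, 1, 0]
    total = [25, 12, 22, 19, 10, 14, 23, 11]
    print(round(point_biserial(item, total), 2))
    # A value between 0.20 and 0.80 would be regarded as satisfactory.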
P-value is used as an index to indicate the significance level of the point
biserial correlation.
Mean Ability is the estimated performance of the group of students who responded to the test. The estimate of the mean ability is obtained by applying the Rasch procedure.
Item Threshold is the difficulty level of an item using the Rasch model. In
the Rasch model, also known as the one-parameter logistic model, Gustafsson
(1979, p. 3) has argued that the probability of a correct answer to an item is a
function of two parameters, one representing the difficulty of the item and one
representing the ability of the person. Adams and Khoo (1993, p. 86) state that a
threshold for an item is the ability level that is required for an individual to have
a 50 per cent chance of passing.
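A minimal sketch of this relationship, assuming the usual logistic form of the Rasch model, is given below; it is an illustration only and is not part of the QUEST program. It shows that when a person's ability equals the item threshold, the probability of a correct answer is exactly 0.50.

    import math

    def rasch_probability(ability, difficulty):
        # Probability of a correct answer under the Rasch (one-parameter logistic) model.
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    # At the item threshold (ability equal to difficulty) the chance of passing is .50.
    print(rasch_probability(ability=-0.12, difficulty=-0.12))   # 0.5
    print(rasch_probability(ability=1.00, difficulty=-0.12))    # above 0.5 for a more able person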
Infit mean square is the fit statistic in the Rasch procedure which can be used, as Adams and Khoo (1993, p. 86) claim, to consider the compatibility of the model and the data.
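For a dichotomous item the infit mean square can be illustrated as an information-weighted average of the squared differences between the observed and the expected responses. The sketch below is a simplified illustration of that idea under the Rasch model; it does not reproduce the exact computation used by QUEST.

    import numpy as np

    def infit_mean_square(responses, abilities, difficulty):
        # Information-weighted (infit) mean square for one dichotomous item.
        # responses: observed item scores (0 or 1); abilities: Rasch ability
        # estimates of the same persons; difficulty: the item threshold.
        responses = np.asarray(responses, dtype=float)
        abilities = np.asarray(abilities, dtype=float)
        expected = 1.0 / (1.0 + np.exp(-(abilities - difficulty)))   # model probabilities
        variance = expected * (1.0 - expected)                       # binomial information
        return np.sum((responses - expected) ** 2) / np.sum(variance)

    # Values close to 1.0 indicate that the responses are compatible with the model.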
The output of the item analysis is very helpful for examining the properties of each item in a test. In a multiple-choice test, like the one used in this study, the point biserial coefficient is widely used for this purpose.
For example, if the point biserial coefficient for an item is close to zero or has a negative sign, there might be something wrong with the item. The first possibility is that the key entered for the item in the QUEST command line is wrong; alternatively, there may be no correct answer among the options given in the item, the item may be so difficult that even the most able group of students cannot give the right answer to the question, or there may be other problems related to the instructions associated with the item.
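A simple screening rule along these lines can be applied to the item analysis output: flag any item whose point biserial for the keyed option falls below the critical value of 0.20 and check its key, its options and its instructions by hand. The sketch below illustrates such a rule; the data structure is hypothetical and is not a QUEST output format.

    # Hypothetical per-item summaries: point biserials for the keyed option.
    items = {
        "item 7":  0.36,
        "item 74": -0.03,
    }

    CRITICAL_LOW = 0.20   # critical low value suggested by Lietz (1995)

    for name, pt_biserial in items.items():
        if pt_biserial < CRITICAL_LOW:
            # Candidate for checking the key, the options and the instructions,
            # and possibly for deletion from further analysis.
            print(f"{name}: point biserial {pt_biserial:.2f} -- review this item")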
Examples of the item analysis output from the QUEST program are presented in Figures 7.1 and 7.2. Figure 7.1 gives examples of good and poor discrimination indices for mathematics items, and Figure 7.2 provides examples of good and poor discrimination indices for science items.
Data in Figure 7.1 indicate that the discrimination index for Item 7 is 0.36, which is satisfactory, while the discrimination index for Item 74 is -0.03, which is very poor. Further examination of the data indicates that the correct answer for Item 7 is Category 4 (Option d), indicated by a star sign (4*), while the correct answer for Item 74 is Category 1 (Option a), indicated by a star sign (1*). For Item 7, Category 4 as the correct answer is chosen by 39.6 per cent of the group, with a mean ability of -0.22, which is the highest mean ability compared to the other groups who answered Category 1, 2, 3, 5, or 0 (Category 0 in these data means no response, scored as wrong; the missing category means that there are no data for this category, since the data are arranged for concurrent equating and these data are ignored in the analysis).
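The Mean Ability column of the output is simply the average Rasch ability of the students who chose each response category. A minimal sketch of that calculation, on invented data, is given below; in a well behaved item the keyed category should show the highest mean ability.

    import numpy as np

    # Invented data: each student's chosen option (0 = no response) and ability estimate.
    choices   = np.array([4, 3, 4, 1, 4, 2, 0, 3, 4, 1])
    abilities = np.array([0.2, -0.8, 0.1, -1.0, -0.3, -0.9, -1.1, -0.7, 0.4, -0.6])

    for category in sorted(set(choices)):
        group = abilities[choices == category]
        print(f"Category {category}: n = {group.size}, mean ability = {group.mean():.2f}")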
Figure 7.1  Examples of good and poor discrimination indices for mathematics items (Items 7 and 74)

CONCURRENT EQUATING FOR TEST 1 TO TEST 8
--------------------------------------------------------------------------------
Item Analysis Results for Observed Responses                      1/ 7/98 13:40
all on math (N =7864 L =125 Probability Level= .50)
--------------------------------------------------------------------------------
Item  7: item 7                                             Infit MNSQ =  .99
                                                                  Disc =  .36

Categories          0        1        2        3       4*        5   missing
Count              29      662      259     1396     1541        4      3973
Percent (%)        .7     17.0      6.7     35.9     39.6       .1
Pt-Biserial      -.05     -.13     -.10     -.21      .36      .00
p-value          .001     .000     .000     .000     .000     .430
Mean Ability    -1.03     -.80     -.88     -.80     -.22     -.52      -.54

Step Labels

Thresholds       -.12
Error             .03
................................................................................
Item 74: item 74

Categories          0       1*        2        3        4        5   missing
Count              36     1392      594      285      638        0      4919
Percent (%)       1.2     47.3     20.2      9.7     21.7       .0
Pt-Biserial      -.05     -.03     -.02     -.07      .12       NA
p-value          .003     .060     .132     .000     .000       NA
Mean Ability    -1.00     -.60     -.62     -.78     -.39       NA      -.55

Step Labels

Thresholds       -.47
Error             .04
................................................................................
Mean test score          12.64
Standard deviation        5.16
Internal Consistency       .62
================================================================================
Figure 7.2  Examples of good and poor discrimination indices for science items (Items 1 and 82)

Item  1: item 1

Categories          0        1        2       3*        4        5   missing
Count              33     1041      244     2345      227        1      3973
Percent (%)        .8     26.8      6.3     60.3      5.8       .0
Pt-Biserial      -.05     -.10     -.12      .24     -.18      .00
p-value          .001     .000     .000     .000     .000     .468
Mean Ability     -.69     -.41     -.62     -.16     -.79     -.48      -.31

Step Labels

Thresholds       -.75
Error             .03
................................................................................
Item 82: item 82

Categories          0        1        2        3       4*        5   missing
Count              31      637      781      758      763        1      4893
Percent (%)       1.0     21.4     26.3     25.5     25.7       .0
Pt-Biserial      -.02      .06     -.04      .05     -.07     -.03
p-value          .156     .000     .017     .004     .000     .041
Mean Ability     -.40     -.22     -.35     -.24     -.36    -1.38      -.31

Step Labels

Thresholds        .85
Error             .04
................................................................................
Mean test score          13.89
Standard deviation        4.17
Internal Consistency       .42
================================================================================
For Item 74, Category 1 as the correct answer is chosen by 47.3 per cent of the group, with a mean ability of -0.60, which is not the highest ability level compared to the other groups. These results indicate that for Item 74 some students in the higher ability group chose a wrong option, while many students in the lower ability group chose the correct answer, hence there is poor discrimination between the high performing group and the low performing group, or in this case a low or poor discrimination index.
Data in Figure 7.2 for science items indicate that the discrimination index
for Item 1 is 0.24, which can be regarded as satisfactory, while the discrimination
index for Item 82 is -0.07, which is very poor. Further examination of the data
indicates that the correct answer for Item 1 is Category 3 (Option c), indicated by
a star sign (3*), while the correct answer for Item 82 is Category 4 (Option d),
also indicated by a star sign (4*). For Item 1, Category 3 as the correct answer is
chosen by 60.3 per cent of the group, with the mean ability of -0.16, which
indicates a higher level of ability compared to other groups who answered
Category 1, 2, 4, 5, or 0. For Item 82, Category 4 as the correct answer is chosen
by 25.7 per cent of the group, with the mean ability of -0.36, which is not the
highest ability level compared to the other groups who answered Category 1, 2, 3, 5
or 0. These results indicate that for Item 82, some students in the higher ability
group chose the wrong option, while in the lower ability group many students
chose the correct answer, hence there is poor discrimination between the high
performing group and the low performing group, or in this case a small or poor
discrimination index.
The investigation of item properties using the output of the item analysis
results in the deletion of several items from further analysis. The types of errors
found in the items used in this study have been discussed in Chapter 5, and a
description of the items with their errors has been presented in Appendix 5.2.
Item Analysis Using the Rasch Procedure
The next type of item analysis that should be carried out before further analysis
of the data is the selection of items based on the results of the Rasch procedure.
There are two command lines that can be used for this purpose. The first one is
the show command line, and the second one is the show items command line.
The show command line provides output about the summary tables for
items and cases, an item map for all cases on the items, and an item fit map for
the items with all cases. The section that follows discusses the item fit map as
used in this study. An example extracted from the output using this command
line for the mathematics and science tests employed in this study is presented in
Figure 7.3. It can be observed from the figure that any item, represented by a star (*) sign, that lies between the dotted lines falls within the acceptable range. The measure used to decide whether an item is within the acceptable range is the infit mean square index, plotted on the horizontal scale from 0.63 to 1.60 shown in the figure. The infit mean square is the fit statistic in the Rasch procedure, and it gives greater weight to those responses near the steepest segment of the item characteristic curve. The acceptable range for the infit mean square, or in other words the criterion for accepting an item as conforming to the Rasch model, is set by the QUEST program to lie within the range of 0.77 to 1.30, indicated by the dotted lines in the figure.
Any item outside this range is considered as not conforming to the Rasch model,
hence that item should be deleted from further analysis. By using this criterion,
all the mathematics and science items shown in the figure conform to the Rasch
model because they are all within the range of 0.77 to 1.30, including Item 74 for
mathematics and Item 82 for science. These two items were found to have very
poor item discrimination as presented in Figure 7.1 and Figure 7.2 respectively.
These two examples of item analysis, using classical test theory and the Rasch procedure respectively, show the importance of using both types of analysis in the selection of items for further analysis, because using only the Rasch procedure for item analysis might result in poorly constructed items being included in further analysis.
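The selection rule that follows from this discussion can be stated directly: retain an item only if its infit mean square lies between 0.77 and 1.30 and its discrimination index reaches the critical value of 0.20. The sketch below applies both checks to the statistics reported for Items 7, 74 and 82 in the figures; it illustrates the selection logic only and is not QUEST code.

    # Item statistics taken from Figures 7.1, 7.2, 7.4 and 7.5.
    items = [
        {"name": "item 7",  "disc": 0.36,  "infit_mnsq": 0.99},
        {"name": "item 74", "disc": -0.03, "infit_mnsq": 1.27},
        {"name": "item 82", "disc": -0.07, "infit_mnsq": 1.20},
    ]

    def keep(item, disc_min=0.20, infit_range=(0.77, 1.30)):
        # Retain an item only if it passes both the Rasch fit check and the
        # classical discrimination check.
        fits = infit_range[0] <= item["infit_mnsq"] <= infit_range[1]
        return fits and item["disc"] >= disc_min

    print([it["name"] for it in items if keep(it)])
    # Only item 7 survives: Items 74 and 82 fit the Rasch model but discriminate poorly.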
Figure 7.3  Extract of the item fit map for the mathematics test produced by the show command line

CONCURRENT EQUATING FOR TEST 1 TO TEST 8
--------------------------------------------------------------------------------
Item Fit                                                          1/ 7/98 21:31
all on math (N =7864 L =125 Probability Level= .50)
--------------------------------------------------------------------------------
[Item fit map: each item in the extract (Items 7 to 10, 74 to 76, 83 to 87 and
221 to 226) is plotted as a star (*) against the INFIT MNSQ scale that runs
horizontally from .63 to 1.60; the dotted vertical lines mark the limits of the
acceptable range of the infit mean square.]
================================================================================
Figure 7.4  Extract of the item estimates (thresholds) for the mathematics test

CONCURRENT EQUATING FOR TEST 1 TO TEST 8
--------------------------------------------------------------------------------
Item Estimates (Thresholds) In input Order                        1/ 7/98 21:32
all on math (N =7864 L =125 Probability Level= .50)
--------------------------------------------------------------------------------
ITEM NAME       | SCORE MAXSCR | THRSH (ERROR) | INFT   OUTFT   INFT   OUTFT
                |              |               | MNSQ   MNSQ    t      t
--------------------------------------------------------------------------------
  7 item 7      |  1540   3891 |  -.12  (.03)  |  .99    .99    -.8    -.5
  8 item 8      |  2314   3891 | -1.02  (.04)  | 1.00   1.03    -.2    1.1
  9 item 9      |  1259   3891 |   .24  (.04)  | 1.11   1.15    6.5    5.2
 10 item 10     |   446   3891 |  1.70  (.05)  |  .94    .92   -1.6   -1.5
  .
  .
 74 item 74     |  1392   2945 |  -.47  (.04)  | 1.27   1.46   21.1   16.1
 75 item 75     |  1825   2945 | -1.14  (.04)  |  .97    .95   -2.6   -1.7
 76 item 76     |  1085   2945 |   .01  (.04)  | 1.05   1.06    2.8    2.2
 83 item 83     |   666   2971 |   .84  (.05)  |  .94    .95   -2.2   -1.2
 84 item 84     |   701   2971 |   .76  (.05)  | 1.20   1.34    7.5    8.0
 85 item 85     |  1114   2971 |   .02  (.04)  |  .99   1.00    -.4     .0
 86 item 86     |   549   2971 |  1.11  (.05)  | 1.14   1.26    4.6    5.3
 87 item 87     |   751   2971 |   .66  (.05)  |  .96    .98   -1.9    -.7
  .
  .
221 item 221    |   200    922 |   .85  (.08)  | 1.04   1.04     .9     .6
222 item 222    |   360    922 |  -.09  (.07)  | 1.14   1.27    5.5    5.2
223 item 223    |   562    922 | -1.08  (.07)  |  .92    .88   -3.8   -2.5
224 item 224    |   448    922 |  -.53  (.07)  |  .93    .91   -3.7   -2.1
225 item 225    |   366    922 |  -.12  (.07)  | 1.06   1.09    2.5    1.9
226 item 226    |   370    922 |  -.14  (.07)  |  .94    .95   -2.2   -1.1
--------------------------------------------------------------------------------
Mean            |              |   .00         | 1.01   1.02    -.1     .2
SD              |              |   .72         |  .09    .13    4.1    3.0
================================================================================
Figure 7.5  Extract of the item estimates (thresholds) for the science test

ITEM NAME       | SCORE MAXSCR | THRSH (ERROR) | INFT   OUTFT   INFT   OUTFT
                |              |               | MNSQ   MNSQ    t      t
--------------------------------------------------------------------------------
 81 item 81     |  1856   2971 |  -.85  (.04)  |  .93    .91   -5.3   -3.8
 82 item 82     |   763   2971 |   .85  (.04)  | 1.20   1.33    8.9    8.8
 89 item 89     |   182    993 |  1.27  (.08)  | 1.03   1.07     .5    1.0
 90 item 90     |   209    993 |  1.09  (.08)  | 1.09   1.21    2.0    2.9
 91 item 91     |   355    993 |   .31  (.07)  | 1.00   1.00     .2     .1
 92 item 92     |   518    993 |  -.42  (.07)  |  .99    .99    -.4    -.3
 93 item 93     |   154    993 |  1.48  (.09)  |  .98   1.00    -.3     .1
  .
  .
                |   603    922 |  -.97  (.07)  | 1.00   1.01     .0     .2
207 item 207    |   357    922 |   .22  (.07)  | 1.01   1.01     .5     .2
215 item 215    |   320    922 |   .41  (.07)  |  .92    .91   -3.1   -1.9
216 item 216    |   398    922 |   .02  (.07)  | 1.05   1.06    2.5    1.3
--------------------------------------------------------------------------------
Mean            |              |   .00         |  .99   1.01    -.3     .0
SD              |              |   .97         |  .06    .10    3.4    2.6
================================================================================
The show items command line provides a table containing item parameters such as the difficulty level of the items (item threshold), and the infit mean square as well as the outfit mean square of the items. Examples extracted from the output produced by this command line for the mathematics and science tests employed in this study are presented in Figure 7.4 and Figure 7.5 respectively.

The use of this output for the purpose of item selection is basically the same as that of the output produced by the show command line. However, more information is provided in the output from the show items command line. The first column in Figures 7.4 and 7.5 contains the item name. The purpose of this column is to give a name or label to each item in the tests by specifying it in a separate file. However, in these data the item names were not specified, so the program uses the default labels item 1, item 2 and so on (the numbers on the left side give the numerical order of the items in the tests). The second column in the figures, score maxscr, provides information about the number of cases who answered each item correctly (score) and the total number of cases who attempted each item (maxscr). In these data, the total number of cases who attempted each item, shown in the maxscr column, is not the same for all items, because the items are distributed across eight different booklets, so that the number of cases who attempted each item depends on the distribution of the items across the booklets. The third column, thrsh, is the item threshold or the difficulty index of the item. The fourth column provides information about the fit indices for each item.
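Because the score and maxscr columns give the number of correct answers and the number of attempts, a classical difficulty index (the proportion of correct answers) can be read off alongside the Rasch threshold. The sketch below does this for a few rows transcribed from Figure 7.4; it is an illustration, not a parser for the QUEST output file.

    # Rows transcribed from Figure 7.4: (item name, score, maxscr, threshold).
    rows = [
        ("item 7",  1540, 3891, -0.12),
        ("item 9",  1259, 3891,  0.24),
        ("item 10",  446, 3891,  1.70),
    ]

    for name, score, maxscr, threshold in rows:
        proportion_correct = score / maxscr   # classical difficulty (facility) index
        # Harder items have lower proportions correct and higher Rasch thresholds.
        print(f"{name}: proportion correct = {proportion_correct:.2f}, threshold = {threshold:+.2f}")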
Scoring Procedure
It should be noted that the scoring procedure should be preceded by the
calibration procedure. The Rasch procedure requires that the tests to be calibrated
must measure one dimension. The QUEST program using the Rasch procedure
could be used to provide a test of unidimensionality and the test of fit to the
Rasch model. Those items and persons that did not fit a unidimensionality model
would need to be eliminated from the next analysis in order to get a better model.
The ultimate goal of the use of QUEST program in this study is to obtain
the scores in mathematics and science for each individual taking the tests. The
scores for each individual in mathematics and science are used as outcome
measures or the dependent variable for subsequent analysis. The score for each
individual is produced in the QUEST program by the show cases command line.
These scores are also known as the case estimates or the ability of the persons
taking the tests. Before employing the scoring procedure, a decision has to be
made about the items that should be deleted in the analysis based on the results of
the item analysis, using classical test theory as well as the Rasch procedure. A decision also has to be made about items that a person omits or does not answer: whether to score them as wrong or to ignore them. In this study, it was decided that any omitted item would be scored as wrong.
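The decision to score omitted items as wrong can be implemented as a simple recoding step before the case estimates are produced. The sketch below illustrates it on a toy response matrix, using 9 as a purely hypothetical code for an omitted item.

    import numpy as np

    MISSING = 9   # hypothetical code for an omitted item in the scored data

    # Toy scored responses: rows are students, columns are items (1 = right, 0 = wrong).
    responses = np.array([
        [1, 0, MISSING, 1],
        [0, MISSING, MISSING, 1],
    ])

    # Score every omitted item as wrong, as decided for this study.
    scored = np.where(responses == MISSING, 0, responses)
    print(scored)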
Figure 7.6 presents an example of the results of the scoring procedure after the deletion of several mathematics and science items which had poor discrimination indices or did not conform to the Rasch model. The first column in the figure (name) contains two numbers. The first number, on the left, is the numerical order of the cases, running from 1 to 7864, the number of students who took the tests. The second number, on the right, depends on the first piece of information provided in the format command line of the QUEST program, which defines the content of particular columns of the data file. In the data file for this study, the first piece of information defined in the format command line is the booklet, so that the second number in the first column is the booklet number, which runs from 1 to 8. The second column in the figure provides information about the number of items answered correctly by an individual (score) and the number of items attempted by the individual, that is, the maximum score that can be achieved (maxscr). The third column provides information about the case estimate or the ability of an individual (estimate) and the standard error of that estimate (error).
Figure 7.6  Examples of case estimates for the mathematics and science tests

CONCURRENT EQUATING FOR TEST 1 TO TEST 8
--------------------------------------------------------------------------------
Case Estimates In input Order                                     1/ 7/98 23:32
all on math (N =7864 L =116 Probability Level= .50)
--------------------------------------------------------------------------------
NAME      | SCORE MAXSCR | ESTIMATE  ERROR | INFIT  OUTFT   INFT   OUTFT
          |              |                 | MNSQ   MNSQ    t      t
--------------------------------------------------------------------------------
   1 1    |   17    31   |    .21     .38  |  1.31   1.40   2.43   1.61
   2 1    |   22    31   |   1.00     .42  |   .85    .78   -.73   -.68
   3 1    |   23    31   |   1.18     .43  |  1.14   1.47    .70   1.34
   4 1    |   22    31   |   1.00     .42  |   .78    .88  -1.16   -.32
   5 1    |   25    31   |   1.59     .48  |   .85    .78   -.46   -.42
   6 1    |   16    31   |    .07     .38  |  1.03   1.03    .32    .21
   7 1    |   22    31   |   1.00     .42  |   .98   1.01   -.04    .12
   8 1    |   20    31   |    .67     .40  |   .84    .80  -1.03   -.74
   9 1    |   16    31   |    .07     .38  |  1.03   1.03    .30    .22
  10 1    |   26    31   |   1.83     .51  |   .99    .93    .08    .01
   .
   .
7855 8    |   12    37   |   -.85     .37  |  1.07   1.23    .48    .88
7856 8    |   17    37   |   -.20     .35  |   .85    .79  -1.45   -.95
7857 8    |   20    37   |    .16     .35  |   .94    .89   -.52   -.45
7858 8    |   15    37   |   -.45     .36  |   .91   1.00   -.76    .09
7859 8    |   21    37   |    .29     .35  |   .89    .83   -.94   -.74
7860 8    |   18    37   |   -.08     .35  |   .92    .89   -.69   -.46
7861 8    |   14    37   |   -.58     .36  |   .93    .87   -.50   -.49
7862 8    |    7    37   |  -1.64     .44  |  1.01    .88    .12   -.16
7863 8    |   14    37   |   -.58     .36  |  1.06   1.01    .48    .13
7864 8    |   11    37   |   -.99     .38  |   .88    .82   -.67   -.54
--------------------------------------------------------------------------------
Mean      |              |   -.51          |  1.00   1.01    .01    .06
SD        |              |    .86          |   .12    .21    .78    .61
================================================================================
CONCURRENT EQUATING FOR TEST 1 TO TEST 8
--------------------------------------------------------------------------------
Case Estimates In input Order                                     1/ 7/98 23:32
all on science (N =7864 L = 96 Probability Level= .50)
--------------------------------------------------------------------------------
NAME      | SCORE MAXSCR | ESTIMATE  ERROR | INFIT  OUTFT   INFT   OUTFT
          |              |                 | MNSQ   MNSQ    t      t
--------------------------------------------------------------------------------
   1 1    |   19    31   |    .37     .40  |   .86    .90   -.84   -.28
   2 1    |   23    31   |   1.08     .44  |   .94    .80   -.22   -.41
   3 1    |   23    31   |   1.08     .44  |  1.27   1.90   1.23   1.95
   4 1    |   16    31   |   -.11     .40  |   .78    .71  -1.53  -1.15
   5 1    |   22    31   |    .89     .43  |  1.17   1.34    .89    .97
   6 1    |   20    31   |    .54     .41  |   .89    .94   -.62   -.09
   7 1    |   20    31   |    .54     .41  |   .77    .71  -1.40   -.95
   8 1    |   20    31   |    .54     .41  |  1.22   1.23   1.23    .82
   9 1    |   18    31   |    .21     .40  |  1.00    .97    .02   -.03
  10 1    |   25    31   |   1.51     .49  |  1.19   1.28    .74    .69
   .
   .
7855 8    |   11    18   |    .45     .54  |  1.15   1.09    .70    .34
7856 8    |   13    18   |   1.07     .58  |  1.56   1.85   1.78   1.50
7857 8    |   13    18   |   1.07     .58  |  1.32   1.36   1.13    .80
7858 8    |    8    18   |   -.38     .53  |   .76    .68  -1.17   -.85
7859 8    |   11    18   |    .45     .54  |   .64    .55  -1.73  -1.24
7860 8    |   10    18   |    .17     .53  |  1.30   1.32   1.36    .91
7861 8    |    9    18   |   -.11     .52  |   .76    .69  -1.17   -.83
7862 8    |   10    18   |    .17     .53  |  1.06   1.02    .33    .18
7863 8    |    9    18   |   -.11     .52  |   .79    .70  -1.01   -.81
7864 8    |    6    18   |   -.96     .55  |   .73    .60  -1.22   -.86
--------------------------------------------------------------------------------
Mean      |              |   -.27          |  1.00   1.01   -.01    .06
SD        |              |    .72          |   .18    .32    .96    .78
================================================================================
It has been noted earlier that the scoring procedure should be preceded by
the calibration procedure and the Rasch procedure requires that the tests to be
calibrated must measure one dimension. It has also been noted that the QUEST
program using the Rasch procedure could be used to provide a test of
unidimensionality and the test of fit to the Rasch model. Those items and persons
that did not fit a unidimensionality model would need to be eliminated from the
next analysis in order to satisfy the requirements of the model.
In addition, those persons who have a zero score or a perfect score must be removed from the calibration because they do not provide information on the relative
difficulty of items. However, in the scoring procedure, those students who do not
fit the model because they have been erratic in their responses must ultimately
have a score calculated for them, including the persons with zero and perfect
scores. Consequently, such erratic students together with those who had a perfect
or zero score must be removed for the calibration stage of analysis, but should
where possible be reinstated for the scoring stage of analysis.
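This two-stage treatment can be summarized as follows: remove the zero and perfect scorers (and any badly misfitting persons) before calibrating the items, but give every person a score afterwards. The sketch below mimics that logic on a toy data set; the actual estimation is carried out by the QUEST program itself.

    import numpy as np

    # Toy scored responses: rows are persons, columns are items (1 = right, 0 = wrong).
    responses = np.array([
        [1, 1, 1, 1],   # perfect score: excluded from calibration
        [0, 0, 0, 0],   # zero score: excluded from calibration
        [1, 0, 1, 0],
        [0, 1, 1, 0],
    ])

    totals = responses.sum(axis=1)
    max_score = responses.shape[1]

    # Calibration stage: drop zero and perfect scorers, since they carry no
    # information about the relative difficulty of the items.
    calibration_sample = responses[(totals > 0) & (totals < max_score)]

    # Scoring stage: every person, including those removed above, receives a score.
    scoring_sample = responses

    print(len(calibration_sample), "persons calibrated;", len(scoring_sample), "persons scored")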