Measuring Knowledge to Optimize Cognitive Load Factors During Instruction

Slava Kalyuga and John Sweller
University of New South Wales
The expertise reversal effect occurs when a learning procedure that is effective for novices becomes
ineffective for more knowledgeable learners. The authors consider how to match instructional presen-
tations to levels of learner knowledge. Experiments 1–2 were designed to develop a schema-based rapid
method of measuring learners’ knowledge in a specific area. Experimental data using algebra and
geometry materials for students in Grades 9–10 indicated a highly significant correlation (up to .92)
between performance on the rapid measure and traditional measures of knowledge, with test times
reduced by factors of 4.9 and 2.5, respectively. Experiments 3–4 used this method to monitor learners’
cognitive performance to determine which instructional design should be used for given levels of
expertise.
Author Note

Slava Kalyuga, Educational Testing Centre, University of New South Wales, Sydney, New South Wales, Australia; John Sweller, School of Education, University of New South Wales, Sydney, New South Wales, Australia.

The research reported in this article was supported by grants from the Australian Research Council to John Sweller and Slava Kalyuga.

Correspondence concerning this article should be addressed to Slava Kalyuga, Educational Testing Centre, University of New South Wales, 12–22 Rothschild Avenue, Roseberry, New South Wales 2018, Australia. E-mail: [Link]@[Link]

The expertise reversal effect (see Kalyuga, Ayres, Chandler, & Sweller, 2003) occurs when an instructional procedure that is relatively effective for novices becomes ineffective for more knowledgeable learners. A consequence of the effect is that an instructor must be able to accurately estimate the knowledge levels of learners to determine an appropriate instructional design for them. Frequently, knowledge levels of learners need to be assessed and monitored continuously during instructional episodes to dynamically determine the design of further instruction. Accordingly, it is critical to have a simple, rapid measure of expertise, especially in computer- and Web-based learning. Current measurement and test procedures may not be adequate for this purpose. The aim of the current work was to devise a rapid test of levels of expertise based on our knowledge of human cognitive architecture and then to use the test as a means of determining instructional procedures.

The expertise reversal effect is an example of an aptitude–treatment interaction (e.g., see Cronbach & Snow, 1977; Lohman, 1986; Snow, 1989) or, more specifically, a disordinal interaction between person characteristics and educational treatment such that if instructional design A is superior to B for novices, B is superior to A for experts. In our research, the expertise reversal effect was derived from longitudinal studies of the effectiveness of different instructional formats and procedures with changing levels of learner expertise and explained using cognitive load theory (see Paas, Renkl, & Sweller, 2003; Sweller, 1999; and Sweller, Van Merrienboer, & Paas, 1998, for reviews), a theory based on the assumption that the processing limitations of working memory might be a major factor influencing the effectiveness of instructional presentations. Working memory capacity is overloaded if more than a few chunks of information are processed simultaneously (see, e.g., Baddeley, 1986; Miller, 1956). To overcome the limitations of working memory, hierarchically organized, domain-specific long-term memory knowledge structures, or schemas, allow people to categorize multiple elements of information as a single higher level element (see Chi, Glaser, & Rees, 1982; Larkin, McDermott, Simon, & Simon, 1980). Because a schema is treated as a single element or chunk, such high-level elements require less working memory capacity for processing than the multiple, lower level elements they contain, making the working memory load more manageable.

Cognitive load imposed by processing instructional material may depend on levels of learner knowledge. For example, textual material initially essential for understanding diagrams may, with increasing levels of knowledge, become redundant. Experts who have acquired considerable high-level schemas in their area of expertise may not require any additional textual explanations. If explanations, nevertheless, are provided, processing this redundant information may increase the load on limited-capacity working memory. (For work on the redundancy effect, see Bobis, Sweller, & Cooper, 1993; Chandler & Sweller, 1991, 1996; Craig, Gholson, & Driscoll, 2002; Mayer, Bove, Bryman, Mars, & Tapangco, 1996; Mayer, Heiser, & Lonn, 2001; Reder & Anderson, 1980, 1982; Sweller & Chandler, 1994.) Kalyuga, Chandler, and Sweller (1998, 2000, 2001), Kalyuga, Chandler, Tuovinen, and Sweller (2001), and Tuovinen and Sweller (1999) found that procedures and techniques designed to reduce working memory overload, such as integrating textual explanations into diagrams to minimize split attention, replacing visual text with auditory narration, or using worked examples to increase levels of instructional guidance, were most efficient for less knowledgeable learners. With the development of knowledge in a domain, such procedures and techniques often became redundant, resulting in negative rather than positive or neutral effects. These redundant sources of information were hypothesized to have imposed an additional cognitive load. Data on subjective rating measures of cognitive load supported these hypotheses. Knowledgeable learners with acquired schemas in a specific area who try to learn relatively new information in the same area find it more difficult to process diagrams with explanations than diagram-only formats because of the additional, unnecessary information that they must attend to. (See also McNamara, Kintsch, Songer, & Kintsch, 1996, who obtained clear evidence of the expertise reversal effect although they did not interpret their results in a cognitive load framework.)

Why does presenting more experienced learners with well-guided instructions result in a deterioration in performance compared with reduced guidance? Constructing integrated mental representations of a current task is likely to require a considerable share of working memory resources. This activity may be supported either by available schema-based knowledge structures from long-term memory or by external instructional guidance. The relative weight of each component depends on the level of a learner’s knowledge in a domain. For novices dealing with novel units of information, instruction may be the only available guidance. For experts dealing with a previously learned familiar domain, appropriate schema-based knowledge can carry out necessary control and regulation functions for the task. Human cognitive architecture dramatically alters the manner in which information is processed as that information increases in familiarity (Sweller, 2003). If more knowledgeable learners are presented with instruction intended for schema construction purposes, that redundant instruction may conflict with currently held schemas, resulting in the redundancy and expertise reversal effects. Thus, the optimization of cognitive load in instruction assumes not only the presentation of appropriate information at the appropriate time but also timely removal of inefficient, redundant information as learner levels of knowledge increase.

Assessing Levels of Expertise

A question of considerable practical interest is how to use the results of studies on the expertise reversal effect and the suggested theoretical explanation of the effect in the design of instructional presentations with an optimized cognitive load. Such instructional designs should be based on user-adapted instructional procedures by matching instructional presentations to levels of learner knowledge in a specific task domain. To achieve this aim, a method of measuring learners’ domain-specific knowledge that both is based on knowledge of human cognitive architecture and is quick and reliable is required. Many traditional methods of evaluating learners’ knowledge depend on tests involving the solution of a series of problems representing a given task domain. Such methods are usually time consuming and hardly suitable for real-time evaluation of learner progress during instruction required for online adaptation of instructional procedures to changing levels of expertise.

The need for new approaches to assessment based on cognitive theories has been clearly established, and some promising attempts to link cognitive science and psychometrics have been undertaken (Embertson, 1993; Mislevy, 1996; Pellegrino, Baxter, & Glaser, 1999; Snow & Lohman, 1989; Tatsuoka, 1990). Sophisticated statistical models have been developed and applied to the assessment of multiple cognitive skills involved in the performance of complex tasks (Adams & Wilson, 1996; Embertson, 1991; Martin & VanLehn, 1995; Mislevy, 1994). A schema-based approach to assessment of knowledge (Marshall, 1995; Singley & Bennett, 2002) has not yet resulted in widely usable testing methods.

Most current cognitive assessment theories are aimed primarily at developing new statistical models to apply to the data. However, equally important is the availability of efficient means of gathering evidence of learners’ cognitive attributes on which to base statistical inferences (Lohman, 2000). Traditional and newly developed measurement theories may not sufficiently take into account contemporary knowledge of human cognitive architecture and the nature of cognition and instruction. If a major aim of instruction is the construction and storing of schemas in long-term memory and if those schemas alter the characteristics of working memory, then, paradoxically, tests of working memory content may provide a de facto measure of levels of expertise. In effect, such tests should be designed to assess the extent to which working memory limits have been altered by information in long-term memory.

Schemas held in long-term memory, when brought into working memory, effectively form a new memory structure called long-term working memory, which is durable, interference proof, and virtually unlimited in capacity (Ericsson & Kintsch, 1995). When assessing knowledge-based cognitive performance, researchers need to test the content of a student’s long-term working memory by using appropriate knowledge-driven tasks. Analyzing the content of long-term working memory during students’ cognitive activities can be a powerful way of obtaining valuable evidence about underlying cognitive structures.

Our first approach to developing a rapid cognitive diagnostic test assumed that experts are able to remember more essential elements of the task they encounter than less knowledgeable learners, as demonstrated by the studies of De Groot (1946/1965) and Chase and Simon (1973) on chess expertise. They found that professional grand masters were better able than weekend players to reproduce briefly presented chess positions taken from real games. Schematic knowledge of large numbers of different game configurations held in long-term memory dramatically altered the characteristics of working memory. However, our preliminary studies using coordinate geometry tasks found no significant correlation between learners’ ability to reproduce diagrams with a task statement to which they were exposed for several seconds and traditional measures of knowledge based on solving a series of test problems from the same domain. The reproduction task may not have relied sufficiently on cognitive processes associated with problem-solution schemas but rather on images of the diagrams in visual working memory. Isolating the relevant elements of images from the irrelevant elements and then integrating the relevant elements into a solution move may require far larger working memory resources than simply reproducing the diagram.

If knowledge of solution moves associated with a problem state reduces working memory load more than knowledge of the elements of the problem state, then a test of appropriate solution moves may provide a more valid test of expertise than a test that simply emphasizes the elements of the problem state. Such a test could be realized by presenting learners with incomplete intermediate stages of the task solution and asking them to indicate just an immediate next step toward solution, instead of providing all of the solution steps. Because schema-based problem solving requires learners to recognize both problem states and the moves associated with those states, it might be expected that such a measure of expertise might be superior to simply being able to reproduce problem states. More knowledgeable learners presumably should be better able to recognize intermediate problem states and retrieve appropriate solution steps than less knowledgeable learners.
For example, assume the task of solving the algebraic equation (3x - 5)/2 = 5. The sequence of main intermediate steps in the solution procedure, corresponding to the subgoal structure of the task, is:

    Multiply both sides of the equation by 2: 3x - 5 = 10.
    Add 5 to both sides of the equation: 3x = 15.
    Divide both sides by 3: x = 15/3 = 5.

To be able to solve the original equation, a learner should apply her or his schematic knowledge of solution procedures to all the subtasks. Lack of knowledge of any subtask would interfere with the entire solution procedure. On the other hand, knowing a first move for each subtask leads directly to the next subtask. The testing procedure might be accelerated if, instead of requiring learners to provide complete solutions for all tasks in a test, researchers presented learners with a multilevel series of subtasks for a limited time (e.g., a few seconds for each subtask) with the requirement to indicate the immediate next step toward the complete solution of each subtask. Because the immediate next step for each task level takes a learner to the next task level and because that task level is represented by another task in the series, this rapid testing method may be equivalent to the complete problem-solution alternative.
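To make the first-step testing procedure concrete, the following sketch scores one four-level series of equations. It is a minimal illustration under our own assumptions, not the authors' implementation (the actual tests were administered on paper and marked by hand): a response is credited when it is an algebraically valid state of the equation, anticipating the lenient rule used in Experiments 1 and 2, where any later step toward the solution also counted as correct. The function names and the sympy-based equivalence check are ours.

# Hypothetical scorer for a four-level rapid test; the sympy-based
# equivalence check and all names are illustrative assumptions.
from sympy import Eq, solveset, symbols
from sympy.parsing.sympy_parser import parse_expr

x = symbols("x")

def parse_equation(text):
    # Turn a string such as "3*x - 5 = 10" into a sympy equation.
    lhs, rhs = text.split("=")
    return Eq(parse_expr(lhs), parse_expr(rhs))

def is_valid_step(task, response):
    # Credit any response equation with the same solution set as the task.
    return solveset(parse_equation(task), x) == solveset(parse_equation(response), x)

# One item per level of the series; the rapid-test score is the number of
# credited first steps.
items = ["2*(2*x - 5)/3 = 5", "3*(x + 1) = 5", "4*x - 5 = 3", "5*x = 3"]
responses = ["2*x - 5 = 15/2", "3*x + 3 = 5", "4*x = 8", "x = 3/5"]
print(sum(is_valid_step(t, r) for t, r in zip(items, responses)))  # 4

A check based on solution-set equivalence is deliberately lenient: it accepts any correct transformation, including ones that skip several steps, which is how off-path but correct answers were treated in the experiments reported below.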
The following experiments were aimed at developing alternative rapid methods of cognitive diagnosis that emphasized schematic knowledge of solution moves. Experiments 1 and 2 were designed to evaluate the external validity of tests that implemented the above first-step approach. Experiments 3 and 4 tested possible ways of using the methods resulting from Experiments 1 and 2 for building learner-adapted instructional packages based on monitoring learner performance before and during instruction.

Experiment 1

Experiment 1 was designed to investigate if a rapid test of learner knowledge in a domain based on a schema theory view of knowledge held in long-term memory could be validated by correlating highly with a multilevel traditional test of knowledge. During the rapid test, students were presented with a set of stand-alone algebraic equations that represented sequential levels of solution of top-level equations (for example, 2(2x - 5)/3 = 5; 3(x + 1) = 5; 4x - 5 = 3; 5x = 3) and asked to indicate an immediate next step toward solution. More knowledgeable learners presumably should have been better able to recognize problem states and retrieve appropriate moves than less knowledgeable learners. In the traditional test, the same students were required to provide complete solutions of similar multilevel equations. To determine actual time reductions associated with rapid testing in comparison with traditional tests, we used self-paced tasks in both tests.

Method

Participants

One class of 24 Year 9 (equivalent to Grade 9; 14 girls and 10 boys, approximate age of 15 years) advanced-level mathematics students and one class of 21 Year 10 (equivalent to Grade 10; 10 girls and 11 boys, approximate age of 16 years) intermediate-level mathematics students from a Sydney, Australia, public school located in a lower-middle-class suburb participated in the experiment. Advanced and intermediate mathematics courses differ in their content, levels of difficulty, and allocated study time. The distribution of students between the courses is based on an evaluation of their performance in previous years. By the time of the experiment, all students had completed the sections of the mathematics course necessary for solving the linear equations included in the test.

Materials and Procedure

The experiment was conducted in a realistic class environment. All participants were tested simultaneously, and all tests were conducted in a single session of about 15 min. The experiment consisted of two tests. The first (traditional) test included a set of 12 equations similar to those described above (three equations at each of four levels). Three equations at the same level were located on each of four pages (e.g., -4x = 5, 5x = -4, and -4x = -6 on the first page; 3x - 7 = -3, 5 - 7x = 4, and -4x + 3 = -3 on the second page; etc.) with the statement "Solve for x" preceding each equation. Students were required to provide complete solutions for all the equations as quickly and accurately as they could and to let the teacher (or experimenter) know as soon as they finished the test. Time taken to complete the test was recorded for each participant. The pages were distributed to students facedown so that all students could start the test simultaneously by turning the pages over. Students' solutions for each equation were assessed as the number of correct steps. Omitted operations (e.g., multiplying out the denominator before canceling out common factors) were counted as separate steps. A total score out of 58 was allocated to each student for the first test.

During the second test, the learners were presented a page containing 12 equations similar to those used in the first test. The equations were arranged in four rows with 3 equations of the same level (with levels defined as before) in each row, starting with the simplest equations in the top row. The distance between rows (about 5 cm) was enough to write answers under each equation. The following statement was placed at the top of the page: "For each of the following equations, indicate the first step (operation) towards the solution." Students were instructed to write down their answers immediately and to let the teacher (or experimenter) know when they finished the test. Time taken to complete the test was recorded for each student. The speed of response was accentuated so that the students' actions were more likely to reflect immediate traces of the content of memory rather than remote results of cognitive processes. The pages were distributed to students facedown so that all students could start the test simultaneously by turning the pages over. Students' answers for each equation were judged as either correct or incorrect, providing a total score out of 12 for the second test. If an answer was not an immediate next step but one of the following steps to the solution, it was counted as a correct answer. It should be noted that the sequence of the test administration was not counterbalanced in this study. Counterbalancing would have eliminated practice effects but would have complicated the interpretation of the correlations between performance scores in the two conditions, which were the main focus of this experiment. Also, counterbalancing would inevitably have increased experimental intrusion into the normally scheduled class learning process.

Results and Discussion

The variables under analysis were traditional test time (time in seconds each learner spent on solving all 12 test tasks; M = 574.11, SD = 164.12), rapid test time (time in seconds each learner spent on solving all 12 test tasks; M = 118.22, SD = 46.35), test scores for the traditional test (M = 33.53, SD = 17.61; 58% correct, actual range of test performance scores from 0 to 58), and test scores for the rapid test (M = 8.64, SD = 3.0; 72% correct, actual range of test performance scores from 0 to 12).

There was a significant difference between test times for the traditional and rapid tests, t(44) = 20.86, p < .01, Cohen's f effect-size index = 2.23, which was expected considering the
much smaller number of solution steps that students had to record in the rapid test compared with the traditional test. (For each of the four-level series of equations, the number of steps was reduced from 4 + 3 + 2 + 1 = 10 to just 4.) Students took on average about 10 s per solution step in both conditions, suggesting that practice on the prior, traditional test had minimal influence on the rapid test.

A Pearson product–moment correlation, r(44) = .92, p < .01, between scores for the traditional and rapid tests was obtained, with a 95% confidence interval extending from .86 to .96, suggesting a very high degree of concurrent validity for the rapid test. Estimates of Cronbach's coefficient alpha were .78 for the traditional test and .63 for the rapid test. The reliability of a longer test is higher than that of a shorter test. Cronbach's alpha shows how reliably a test measures a single unidimensional latent construct. Considering that our rapid test was designed to measure four separate and distinctive cognitive constructs (associated with schemas for multiplying out the denominator of a fraction, expanding the grouping symbols, adding/subtracting the same number to/from both sides of an equation, and dividing both sides of an equation by the same number), the relatively low (well below the traditionally acceptable 0.8) value of Cronbach's alpha is not surprising. When data have a multidimensional structure, a low Cronbach's alpha is expected. In the traditional test, item scores aggregated different dimensions and thus were less sensitive to the underlying multidimensionality of knowledge structures. A low number of items in both tests might also have contributed to low alphas. However, evidence that the reliability of the rapid test is sufficiently high to allow validity comes from the very high correlation between the two tests.
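For readers who wish to reproduce these statistics, the sketch below computes a Pearson correlation with a Fisher-z 95% confidence interval and Cronbach's coefficient alpha. The score arrays are random placeholders standing in for the two test-score vectors and the per-item rapid-test results; none of this is the authors' code or data.

# Placeholder reproduction of the reported statistics (not the study data).
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, conf=0.95):
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)                       # Fisher z-transform of r
    half = stats.norm.ppf(0.5 + conf / 2) / np.sqrt(len(x) - 3)
    return r, p, np.tanh(z - half), np.tanh(z + half)

def cronbach_alpha(items):
    # items: participants x items matrix of item scores.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(0)
traditional = rng.integers(0, 59, size=45)         # stand-in scores out of 58
rapid_items = rng.integers(0, 2, size=(45, 12))    # stand-in 0/1 item scores
print(pearson_with_ci(traditional, rapid_items.sum(axis=1)))
print(cronbach_alpha(rapid_items))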
The results of Experiment 1 indicated a highly significant correlation between learners' performance on the rapid test tasks and traditional measures of learners' knowledge. Furthermore, test time for the rapid method was reduced by a factor of 4.9 in comparison with the time for the traditional test. To further validate this technique, we applied it to a different set of instructional materials in Experiment 2.

Experiment 2

On the basis of Experiment 1, the rapid test appeared to be a viable candidate to measure levels of expertise for instructional purposes. Experiment 2 was designed to replicate the results of Experiment 1 using coordinate geometry materials.

Method

Participants

One class of 20 (10 girls and 10 boys) Year 9 advanced-level mathematics students from a Sydney public school (the same class was used in Experiment 1 about 2 months earlier) participated in the experiment. Prior to the experiment, students had been taught sufficient coordinate plane and coordinates-of-a-point geometry to solve the tasks included in the test.

Materials and Procedure

The first (traditional) test included a set of 12 tasks in which two points, A and B, were presented on a coordinate plane (see Figure 1). Lines AC and BC were parallel to the x- and y-axes, respectively. The task was to find the length of AC and BC. The tasks were sequenced according to the level of knowledge they tested (with three tasks for each of four levels). For the highest level tasks, no additional details were provided (see Figure 1, top diagram). For each of the lower levels, progressively more details (indications of coordinates on axes, lines projecting coordinates of the points, etc.) or partial solutions were provided. The bottom diagram in Figure 1 is an example of the lowest level. Three tasks of the same level were located on each of four pages, with the highest level tasks presented on the first page. Students were required to provide complete solutions for all the tasks as quickly as they could and to let the teacher (or experimenter) know as soon as they finished the test. The time taken to complete the test (test time) was recorded for each student. The pages were distributed to students facedown so that all students could start the test simultaneously by turning the pages over. Students' solutions for each task were assessed as the number of correct steps in the solution, giving a total score out of 30 to be allocated to each student for the first test.

During the second test, the learners were presented four pages containing 12 tasks similar to those used in the first test and sequenced in the same way. Students were required to indicate only the first step toward finding the length of AC and BC. We explained to participants that their answer might include, for example, writing a number or drawing a line on the diagram. Students were instructed to do this test as quickly as they could and to let the teacher (or experimenter) know immediately when they finished the test. Time taken to complete the test was recorded for each student. The pages were distributed to students facedown so that all students could start the test simultaneously by turning the pages over. Students' answers for each task were judged as either correct or incorrect, providing a total score out of 12 for the second test.

In contrast with the algebra equations used in Experiment 1, it was impossible to diagrammatically represent an intermediate state of a geometry task without depicting details of the previous solution steps. The tasks in this experiment could be effectively considered as a sequence of partially worked examples with gradually increasing levels of detail provided to learners. Because the solution details were provided to learners in both the traditional and rapid test tasks, this procedure should not have decreased validity measures. The test sequence was not counterbalanced in this study for the same reasons as in Experiment 1.
Results and Discussion

The variables under analysis were traditional test time (M = 306.20 s, SD = 104.97), rapid test time (M = 124.30 s, SD = 24.85), test scores for the traditional test (M = 14.20, SD = 10.62; 47% correct, actual range of test performance scores from 0 to 30), and test scores for the rapid test (M = 4.95, SD = 3.47; 41% correct, actual range of test performance scores from 1 to 12).

There was a significant difference between test times for the traditional and rapid tests, t(19) = 8.15, p < .01, Cohen's f effect-size index = 1.33. A Pearson product–moment correlation, r(19) = .85, p < .01, was obtained between scores for the traditional and rapid tests, with a 95% confidence interval extending from .65 to .94. Estimates of reliability using Cronbach's coefficient alpha were .68 for the traditional test and .57 for the rapid test.

The results of Experiment 2 indicated a high correlation between performance on the rapid test tasks and traditional measures of knowledge requiring complete solutions of corresponding tasks. The test time for the rapid method was reduced by a factor of 2.5 in comparison with the traditional test time. Similar to Experiment 1, students took an average of about 10 s per solution step in both conditions. The same average rate of performance across conditions in both experiments provided us with an indication of the magnitude of response time that could be expected in rapid tests (including retrieval of a solution schema from long-term memory, applying the schema, and recording the result).

Experiments 1 and 2 provided results indicating that basing an achievement test on cognitive theory can generate a test that can be completed very rapidly but that has a high degree of concurrent validity. Experiment 3 was designed to apply that test to predicting which instructional design procedures should be used for students with differing levels of expertise.

Experiment 3

The next step was to evaluate the ability of the rapid testing procedure to detect experimental effects that had been previously observed using traditional testing methods. One such effect is an interaction between levels of learners' knowledge in a domain and levels of instructional guidance (the expertise reversal effect; see Kalyuga et al., 2003). For example, less knowledgeable learners usually benefit from more guided instructional procedures such as worked examples. By contrast, minimal guidance formats (such as solving problems) might be more beneficial for more knowledgeable learners (Kalyuga, Chandler, et al., 2001). The aim of Experiment 3 was to see if we could replicate the expertise reversal effect using the rapid testing technique to assess levels of learners' knowledge in a domain (coordinate geometry) before experimental treatment and to measure levels of learners' performance after their exposure to different instructional formats.

Method

Participants

Two classes (an advanced-level mathematics class and an intermediate-level mathematics class) of 42 Year 9 students from a Sydney Catholic girls' school located in a lower-middle-class suburb participated in the experiment. By the time of the experiment, all students had been taught a basic introduction to the coordinate plane and to the coordinates of a point, the geometry necessary for solving the tasks included in the test. The advanced mathematics students had also been taught how to calculate the midpoint of an interval and the distance between two points on a coordinate plane. Those tasks might have provided the learners with some experience in calculating projections of an interval on the coordinate axes, which was the essence of the experimental tasks in Experiment 3. However, the students had not previously encountered tasks formulated using the current format.

Materials and Procedure

The experiment was conducted in a realistic class environment. All participants were tested simultaneously, with the experiment conducted in two sessions separated by 1 week.

During the first session (about 5 min long), all participants were presented a rapid test with the purpose of evaluating the initial level of their knowledge in the domain. The test included a set of eight tasks similar to those used in Experiment 2. The following instructions were presented at the top of the first page:

    In each of the figures below, A and B are two points on a coordinate plane. Lines AC and BC are parallel to the coordinate axes. Assume you need to find the lengths of AC and BC.
    Some additional details (lines, coordinates) or partial solutions are provided on most figures. For each figure, spend no more than a few seconds to indicate your immediate next step towards solution of the task.
    Remember, you do not have to solve the whole task. All you have to do for each figure is to just show the next step towards the solution (for example, it might be just writing a number or drawing a line on the diagram). If you don't know your answer, proceed to the next page.
    Do not spend more than a few seconds for each figure and do not go back to pages you have already inspected.

Similar to Experiment 2, the problems were ordered according to the level of knowledge that was required to solve them. In Experiment 3, the levels were more fine-grained than in the previous studies, and instead of four, we used eight levels, with one task corresponding to each level. For the highest level task, no additional details were provided. For each of the lower level tasks, progressively more additional details (indications of coordinates on axes, lines projecting coordinates of the points, etc.) or partial solutions were provided. Each task was presented on a separate page. The test was experimenter paced. After about 10 s on a page, students were instructed to proceed to the next page. Thus, time taken to complete the test was the same for all students (around 90 s).

In Experiment 3, we used a different test-scoring technique from the previous experiments. In the rapid diagnostic tests described above, if an answer was not an immediate next step expected in the fully worked-out, detailed solution but was one of the following steps toward the solution (or even the final step of the solution), it was counted as a correct answer. The same score of 1 was allocated for such an answer as for an answer indicating the immediate next step. However, skipping some intermediate stages of the solution procedure is possible if the learner has corresponding operations automated or is able to perform these operations mentally without writing them down. The ability to skip steps reflects a higher level of knowledge in comparison with the level of knowledge of a learner who can indicate the immediate next step (Blessing & Anderson, 1996; Koedinger & Anderson, 1990; Sweller, Mawer, & Ward, 1983). Knowledge of immediate schematic solution procedures for each separate subtask might not necessarily guarantee the solution of the whole task. Omitting some intermediate steps indicates the student's ability to integrate separate steps in the solution procedure. The rapid diagnostic test scoring method was modified to take into account such differences in learners' knowledge.

In the modified method, if a learner omitted some intermediate stages while trying to find the length of either side of a rectangle, he or she was allocated an additional score for each skipped step. For example, if a
participant indicated the final answer for the length of AC on the very first page (skipping three steps), a score of 4 was allocated for this question. An answer consisting of the final step for the length of AC on the second page qualified for a score of 3, and so on. Thus, if a learner was knowledgeable enough to indicate the correct final answers for the length of AC on each of the first four pages and the correct final answers for the length of BC on the following four pages, the allocated (maximum) score was 4 + 3 + 2 + 1 + 4 + 3 + 2 + 1 = 20.
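Read as an algorithm, the modified rule awards one point for a correct immediate next step plus one point for every validly skipped step. The sketch below (our encoding of the rule, not the authors' implementation) reproduces the maximum of 20.

# Skip-step scoring for an eight-page test: four pages for AC, four for BC.
def item_score(page_in_series, steps_advanced):
    # page_in_series: 1..4; steps_advanced: solution steps jumped past the
    # immediate next step (0 when only the next step is given).
    remaining = 4 - page_in_series
    return 1 + min(steps_advanced, remaining)

# Final answers given on every page of both series yield the maximum score.
maximum = 2 * sum(item_score(p, 4 - p) for p in range(1, 5))
print(maximum)  # (4 + 3 + 2 + 1) * 2 = 20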
It might be noted that analyses of students' performance in Experiments 1 and 2 showed that many learners skipped steps during the traditional tests. However, for the analogous rapid tests, the same students often indicated just the immediate next steps without skipping steps. During our pretest instructions to students in Experiments 1 and 2, we did not emphasize that "your immediate next step" was not necessarily the step as determined by a fully detailed, worked-out solution sequence. In Experiment 3, we deliberately explained this point to participants before they commenced the test. Because this explanation was not provided in Experiments 1 and 2, the modified procedure could not be used in those experiments.

On the basis of scores obtained in the rapid test, participants were divided into two groups: more knowledgeable learners (upper median group) and less knowledgeable learners (lower median group). It should be noted that the division did not correspond exactly to the division of advanced and regular mathematics classes. Some higher performers in the regular class received higher scores than lower performers in the advanced class, indicating that using the existing division of classes could not replace the initial rapid diagnostic test for purposes of distributing students between experimental groups. Students in each of these two groups were further randomly allocated to two subgroups according to their performance rank (those with even or uneven performance rank numbers). In the second stage of the experiment, one of these two groups was given worked-examples-based instruction, and the other group was given a problem-solving-based instructional format.

The second stage of the experiment took place 1 week later. Students were assigned to four experimental groups: (a) high knowledge/worked examples (10 students), (b) high knowledge/problem solving (11 students), (c) low knowledge/worked examples (10 students), and (d) low knowledge/problem solving (11 students). Group numbers were not equal because some students who had participated in Stage 1 were absent during Stage 2.

Participants in the problem-solving groups were presented a series of eight problems to solve. The problems were similar to those used in the rapid test in Stage 1 ("A and B are two points on a coordinate plane. Lines AC and BC are parallel to the coordinate axes. Find the lengths of AC and BC"), except that points A and B were located not only in the right upper quarter of the coordinate plane with all the coordinates being positive but could be located in different parts of the coordinate plane with some coordinate numbers being negative.

The worked-examples condition contained a series of four fully worked-out procedures for calculating the lengths of AC and BC. Participants were requested to follow all steps in each example according to a numbered sequence from 1 to 6. Each example was followed by a problem-solving task. The eight tasks used in this condition were identical to the tasks used in the problem-solving condition. The tasks with even numbers were problem-solving tasks identical in both treatments, whereas the tasks with uneven numbers were presented as worked examples in the worked-example condition. Thus, participants in the worked-example condition studied four examples and attempted four problems, whereas participants in the problem-solving condition attempted eight problems. To avoid a split-attention effect, we embedded explanations of procedural steps in worked examples into diagrams as close as possible to the corresponding diagrammatic elements, with arrows used to limit search. All participants in each group were given sufficient time to complete the tasks. It took 12 min to complete this phase of the experiment (those students who finished earlier were encouraged to revise the examples or to check their solutions).

Postinstruction performance levels were again measured using the rapid testing method. The procedure was identical to that used in Stage 1, except that tasks at this stage had points A and B located not only in the right upper quarter of the coordinate plane but in different parts of the coordinate plane, with some coordinate numbers being negative, similar to the instructional condition.

Results and Discussion

A 2 (instructional procedure) × 2 (level of knowledge) analysis of variance (ANOVA) was conducted using the data from Experiment 3. The dependent variable under analysis was postinstruction performance level as determined by the rapid test scores; independent variables were levels of knowledge (high/low) and format of instruction (problems/worked examples).

An analysis of the low-knowledge/high-knowledge main effect produced a significant difference, F(1, 38) = 25.01, MSE = 15.17, p < .01, Cohen's f effect-size index = 1.58. As could be expected, high-knowledge learners (M = 9.10, SD = 5.47; 46% correct, actual range of test performance scores from 0 to 20) performed significantly better than low-knowledge learners (M = 2.91, SD = 2.41; 15% correct, actual range of test performance scores from 0 to 7). No main effect of experimental formats was found (for the worked-examples group, M = 6.00, SD = 3.69; 30% correct, actual range of test performance scores from 0 to 15; and for the problem-solving group, M = 6.00, SD = 6.39; 30% correct, actual range of test performance scores from 0 to 20). It should be noted that according to the expertise measurement scale used in this study, a score of 6 out of 20 does not indicate low knowledge or inability to solve tasks in this domain. Rather, it means a lack of well-learned or automated solution procedures that are typical of the experts in the domain. For example, a person who correctly solves all the test's tasks by consistently using one step at a time without skipping any intermediate operations could only score 8.

Because one of the main purposes of this experiment was to study the change in effectiveness of instructional formats with knowledge, we were primarily interested in the interaction effect between knowledge and instructional procedures. We expected that a difference in relative knowledge in the domain would produce a change in the effectiveness of different methods of instruction. In accordance with this prediction, the interaction data of the 2 × 2 ANOVA were of major interest in this study. There was a significant knowledge–instructional format disordinal interaction for the performance indicator measured by the rapid testing method, F(1, 38) = 9.04, MSE = 15.17, p < .01, Cohen's f effect-size index = 0.96, suggesting that the most efficient mode of instruction depends on the level of learners' knowledge.

Following the significant interaction, simple effect tests indicated that for more knowledgeable learners, the problem-solving format produced better results (M = 10.82, SD = 5.79; 54% correct, actual range of test performance scores from 3 to 20) than worked examples (M = 7.20, SD = 4.64; 36% correct, actual range of test performance scores from 0 to 15). Although this effect was not statistically significant, F(1, 19) = 2.46, MSE = 27.86, a Cohen's f index of 0.36 indicated a medium to large effect size. For less knowledgeable learners, the worked-examples group (M = 4.80, SD = 1.99; 24% correct, actual range of test performance scores from 2 to 8) performed significantly better than the problem-solving group (M = 1.18, SD = 1.08; 6% correct, actual range of test performance scores from 0 to 3), F(1, 19) = 27.58, MSE = 2.49, p < .01, Cohen's f effect-size index = 1.20.
Thus, as the level of knowledge was raised, the performance of the problem-solving group improved more than the performance of the worked-examples group. Less knowledgeable learners performed significantly better after studying worked examples. For more knowledgeable learners, there was some indication of problem-solving benefits compared with studying worked examples. A possible floor effect due to the measurement scale used in the study (with an overall mean of 6.0 out of 20) could have reduced the levels of statistical significance of the mean differences. Nevertheless, the results demonstrate a strong expertise reversal effect, with levels of expertise determined by the new, rapid test of learners' knowledge.

In Experiment 3, the rapid testing method was used to initially diagnose levels of learners' knowledge in the domain to subdivide the learners into two groups of relative experts and novices. The instructional procedures (worked examples and problem solving) were the same for both novices and experts and were not adapted to the individual levels of expertise. For the next step, we tested the usability of the rapid test as a means of applying real-time individualized adaptation of instructional procedures to current levels of learners' knowledge in a domain.

Experiment 4

…of an expression was typed incorrectly, a learner was invited to correct the error and try again.

The experimental procedure included an initial rapid diagnostic test, an adaptive training session for the experimental group with yoked participants in the control group, and a final rapid diagnostic test. A flowchart of the adaptive procedure for the experimental training session is provided in Figure 2.

Initial rapid diagnostic test. After learners completed their exercises in typing algebraic expressions on the computer, they were presented with the initial rapid diagnostic test designed to evaluate their initial level of knowledge in the task domain. The following task statement preceded the test:

    On each of the following three pages, you will see an equation. For each equation, you have to type a single one-line step that you would normally do first when solving the equation on paper.
    For example, when asked to solve the equation 2(3x - 1) = 1, some people would first write 2 * 3x - 2 * 1 = 1, others could start from 6x - 2 = 1 or 6x = 3, and some might even write the final answer (x = 1/2) as the first step.
    If, when you are given an equation, you do not know how to solve it, click the button "Don't know". You will be allowed no more than one minute to type your answer.
Figure 2. Flowchart of the adaptive procedure for the experimental training session.
…diagnostic test for this level. If he or she scored 1, indicating knowledge of the procedure but not enough knowledge to skip the intermediate step, only the last (reduced) four examples were presented. When the learner scored 2 on the rapid test, he or she was allowed to proceed to the next stage of the training session.

The second stage of the training session contained two faded worked examples, each followed by a corresponding problem-solving exercise. In both faded examples, the explanation of the last procedural step (corresponding to the solution of an equation of the type 2x = 5) was eliminated, and learners were asked to complete the solution themselves and to type in their final answer. If a learner could not solve the remaining equation in 1 min, the correct solution was provided. In problem-solving exercises, similar to the first stage, if learners' attempts within the 3-min limit were unsuccessful, learners were presented with a fully worked-out solution. At the end of the second stage, a rapid diagnostic test similar to the second question of the initial diagnostic test was used. The procedure followed was very similar to the procedure for the first stage.

The third stage of the training session was similar to the second stage except for a lower level of instructional guidance provided to learners (in faded examples, explanations of the two final procedural steps were eliminated) and a higher level of the rapid test at the end of this stage (similar to the third equation of the initial diagnostic test). The fourth and final stage of the training session contained four problem-solving exercises. If learners' attempts to solve each problem within the 3-min limit were unsuccessful, they were presented with fully worked-out solutions.

Thus, in the learner-adapted format, learners who scored 0 or 1 for the first equation of the initial diagnostic test went through all four stages of the training session. How long they stayed at each stage depended on their performance on diagnostic tests during the session. Learners who scored 2 for the first equation of the initial diagnostic test but still scored 0 or 1 for the second equation (no matter what their scores on the third equation were) started the training session from the second stage. Similarly, learners who scored at least 2 for the first and second equations of the initial diagnostic test but scored 0 or 1 for the third equation started the training
session from the third stage. Finally, learners who managed to score at least 2 for all three equations of the initial diagnostic test started the training session from the fourth stage, which included only problem-solving exercises.
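Read as an algorithm, the placement rule maps the three initial diagnostic scores to a starting stage. A sketch of our reading of the rule follows; the original system was computer based, and these names are hypothetical:

# Stage placement from the three initial diagnostic scores.
def starting_stage(scores):
    # scores: [level-1, level-2, level-3] rapid-test scores, in order.
    for level, score in enumerate(scores, start=1):
        if score < 2:          # first insufficiently mastered level
            return level       # training begins at the matching stage
    return 4                   # all three mastered: problem solving only

assert starting_stage([1, 0, 0]) == 1
assert starting_stage([2, 1, 2]) == 2  # later scores are ignored, as described
assert starting_stage([2, 2, 0]) == 3
assert starting_stage([2, 2, 2]) == 4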
In contrast, in the non-learner-adapted format, allocation of learners to different stages of the training session was random rather than being based on the results of the initial rapid diagnostic test. To equalize experimental conditions in both groups, each learner in the nonadapted-format group started the training session from exactly the same stage as the previous learner in the learner-adapted-format group. Thus, learners in both groups went through similar stages of the training session. The difference was that in the learner-adapted-format group, the instructional sequence was based on the learner's actual performance on the rapid diagnostic tests. In the nonadapted-format group, the procedure was random in relation to a learner's knowledge. The learner's progress through the training session was not monitored using a rapid diagnostic technique similar to that used in the learner-adapted-format group. Each learner had to study all worked examples and perform all problem exercises that were included in the corresponding stages of the training session, with the same time limits and feedback experienced by the learner's yoked participant.

Final rapid diagnostic test. After learners completed the training session, they were presented with the final rapid diagnostic test designed to evaluate their posttraining level of knowledge in the task domain. The test and evaluation procedures were exactly the same as in the initial rapid diagnostic test.

Results and Discussion

The independent variable was the format of the training session (learner adapted or randomly assigned). The dependent variables under analysis were the differences between the sum of the three test scores for the final rapid test and the sum of the three test scores for the initial rapid test, providing indicators of learners' knowledge gains due to the training session, and training-session time.

There was a significant difference between groups for knowledge gains, t(24) = 2.26, p < .05, Cohen's f effect size = 0.46. The learner-adapted-format group (M = 3.23, SD = 2.77) performed significantly better than the randomly assigned-format group (M = .77, SD = 2.77). There were no significant differences for training-session time (M = 990.62, SD = 353.04, for the learner-adapted format, and M = 907.15, SD = 426.88, for the randomly assigned format), t(24) = .54, Cohen's f effect size = 0.11. The training-session-time results were expected because of the paired equalization procedure. The significantly higher knowledge gains for the learner-adapted instructional format than for the randomly assigned format of training provide strong evidence that the suggested rapid measure of expertise, based on knowledge of human cognitive processes, can be successfully used to enhance learning outcomes by adapting instruction to learners' knowledge levels based on the expertise reversal effect.
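As a quick arithmetic check on the reported effect size, Cohen's d and f can be recovered from the published means and standard deviation (f = d/2 for two groups is Cohen's convention; the small gap to the reported 0.46 presumably reflects rounding or a pooling detail):

# Effect size recomputed from the reported group statistics.
mean_adapted, mean_random, sd = 3.23, 0.77, 2.77
d = (mean_adapted - mean_random) / sd   # Cohen's d = 0.888...
f = d / 2                               # Cohen's f for two groups
print(round(d, 2), round(f, 2))         # 0.89 0.44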
Electronic records indicated that students' progress through the learner-adapted instruction depended considerably on their individual performance. For example, 10 (out of 13) participants proceeded through the Stage 1 diagnostic test more than once, 7 participants went through this test more than twice, and 4 participants repeated it more than three times. Ten students proceeded through the Stage 2 diagnostic test more than once, 3 students went through this test more than twice, and 1 student repeated it more than three times. Five participants proceeded through the Stage 3 test more than once, 4 participants went through this test more than twice, and 1 participant repeated it more than three times. Judging by the number of students reattempting tests at different steps, the first operation (dividing both parts of an equation by the same number) was the most difficult one. During training sessions, only 2 students completed all problem exercises and faded worked-examples tasks without errors and repeated attempts; 1 participant reattempted solutions at each of four stages, 3 participants made repeated attempts during at least three stages, 5 students reattempted solutions at two stages, and 2 participants reattempted solutions at one stage. Combined with the higher knowledge gains for the learner-adapted group, these records indicate that the suggested tailoring method did individualize instructional procedures as intended.

General Discussion

If instructional formats and procedures need to change radically with alterations in expertise, a question of considerable practical interest is how to match instructional presentations to levels of learner knowledge. In this article, we have suggested using a rapid method of measuring learner levels of knowledge in a specific area. Students were presented with intermediate stages of a task solution and asked to indicate their next step toward solution for each stage instead of providing a complete solution. Our rationale was that more knowledgeable learners would be able to use their schemas to recognize intermediate problem states and retrieve appropriate solution steps depending on their level of knowledge in the domain. The procedure can be generally described as follows: (a) for a specific task area, establish a sequence of main intermediate steps in the solution procedure corresponding to the subgoal structure of the task; (b) for each step, design representative subtasks, then arrange them in a properly ordered series; and (c) present the series of subtasks to learners for a limited time (e.g., a few seconds for each subtask) with the requirement to quickly indicate the next step toward a complete solution of each task.

Experimental data using algebra and coordinate geometry materials for Year 9 and Year 10 students indicated significant correlations (up to .92) between performance on these tasks and traditional measures of knowledge that required complete solutions of corresponding tasks. Moreover, test times were reduced by factors of 4.9 (for algebra materials) and 2.5 (for coordinate geometry materials) in comparison with traditional test times.

Although correlations between tests were high, measured reliability levels were relatively low. These low values were an artifact of the number and type of test items used. Test reliability increases with increased numbers of items. The essence of effective rapid tests of knowledge is that they have few items but still correlate highly with traditional tests. Nevertheless, low numbers of items result in low reliability indices on normal reliability measures. As well, the possible heterogeneity of content might have contributed to the relatively low internal consistency estimates for the tests in Experiments 1 and 2.
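The observation that reliability increases with the number of items can be made precise with the Spearman-Brown prophecy formula, a standard psychometric result that the article leaves implicit. If a test with reliability \rho is lengthened by a factor k using comparable items, the expected reliability is

    \rho_k = \frac{k \rho}{1 + (k - 1)\rho}.

For example, quadrupling the 12-item rapid test of Experiment 1 (alpha = .63) would be expected to raise reliability to 4(.63)/(1 + 3(.63)) ≈ .87, which is why few-item rapid tests are better judged by their correlation with an external criterion than by internal consistency alone.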
This study has been limited to two narrow domains associated with well-defined tasks and predictable sequences of solution steps (linear algebraic equations and simple coordinate geometry). In such areas, the application of the rapid assessment method is straightforward. Establishing the generality of the suggested approach and finding the limits of its usability are important research questions. In more complex domains involving multiple-step problems, students might be able to take many different routes to problem solutions. If all those routes are identifiable, the method still could be used in both paper-based and electronic formats. If the number
of routes is too large, a limited number of steps representing are critically dependent on researchers’ being ably to quickly and
different levels of expert-type solutions could be selected for a test. accurately measure learners’ levels of expertise. However, no
The levels of expertise then could be assessed by, for example, appropriate, cognitively oriented expertise assessment procedures
requiring learners to rapidly verify the correctness of each sug- are available to be used in conjunction with the new instructional
gested step. To establish the predictive validity of the rapid mea- designs that are rapidly appearing. This article is intended as a first
sures of expertise in other domains, researchers need to test them step in remedying this deficiency.
in other areas of mathematics and science, as well as in less
well-structured domains such as text comprehension or second
language learning. References
Concerning the practical application of the rapid diagnostic procedure in instruction, Experiment 3 confirmed that the suggested testing technique can be used for evaluating levels of learners' knowledge in a realistic learning environment. The technique allowed us to divide learners into appropriate instructional groups and to predict the expertise reversal effect. Experiment 4 indicated that the test could be used as a means of matching different instructional formats to levels of learner knowledge. We used the rapid test to build learner-adapted, computer-based instructional procedures based on online monitoring of learner performance before and during instruction. This approach proved to be superior to more traditional ones. The high values of most effect sizes in Experiments 3 and 4 strongly support the effectiveness of the suggested rapid diagnostic method.
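The learner-adapted procedure just summarized can be pictured as a simple control loop: after each rapid diagnostic test, the learner either advances to the next stage or restudies the current stage's material. The Python sketch below shows this loop; the threshold values and function names are illustrative assumptions rather than the exact implementation used in Experiments 3 and 4.

def run_adaptive_session(learner, rapid_test, give_instruction, thresholds):
    """Cycle through the diagnostic stages in order; a stage's instruction
    is repeated until the learner's rapid test score reaches that stage's
    pass mark, mirroring the reattempt pattern reported above."""
    for stage, pass_mark in sorted(thresholds.items()):
        while rapid_test(learner, stage) < pass_mark:
            # Below threshold: (re)study this stage's material; more
            # knowledgeable learners would receive a reduced-guidance format.
            give_instruction(learner, stage)

# Hypothetical usage with three stages and a pass mark of 6 per stage:
# run_adaptive_session(student, rapid_test, give_instruction, {1: 6, 2: 6, 3: 6})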
We suggest not only that the rapid diagnostic technique can be used to determine instructional procedures because it is completed rapidly but also that it provides a highly valid cognitive diagnosis because it is designed to directly capture students' schematic knowledge of solution steps. The sequence and format of instruction should be guided by cognitive diagnostic methods of this kind; the advantages of doing so can be seen in the results of Experiments 3 and 4.
In our previous studies (Kalyuga, Chandler, & Sweller, 1998, 2000, 2001; Kalyuga, Chandler, et al., 2001), by using subjective ratings of mental load, we obtained evidence for the assumption that redundant information may consume considerable cognitive resources. In future research, we intend to monitor cognitive load during learner-adapted instruction (using, e.g., subjective ratings or a dual-task approach; see Brünken, Plass, & Leutner, 2003; Paas, Tuovinen, Tabbers, & Van Gerven, 2003) to track changes in working memory load during transitions between instructional procedures. By doing so, we may be able to verify the cognitively optimal status of learner-adapted instructional techniques. In this way, learner-adapted instructional systems could be tailored more efficiently to changing levels of expertise by combining rapid tests of knowledge with measures of cognitive load.
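One established way of combining performance scores with subjective load ratings, discussed in the cognitive load literature cited above (e.g., Paas, Tuovinen, Tabbers, & Van Gerven, 2003), is the instructional efficiency index computed from standardized performance (\(z_{P}\)) and standardized mental effort ratings (\(z_{R}\)):

\[
E = \frac{z_{P} - z_{R}}{\sqrt{2}}.
\]

High performance attained with low reported effort yields high efficiency. We note this as one candidate indicator for tracking the cognitively optimal status of learner-adapted instruction, not as a measure applied in the present experiments.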
In cognitive diagnostics, the aim of the test is to assess learners' individual cognitive structures rather than their overall level of performance in a domain, as occurs with traditional tests. This procedure is an example of the use of multidimensional assessment tasks, and appropriate multidimensional measurement models need to be applied to the data to make valid statistical inferences. Exploring different multidimensional measurement approaches (e.g., based on multidimensional item-response theories or Bayesian inference networks) in conjunction with rapid testing techniques should be another important direction of future research.
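For orientation, the simplest member of the item-response family, the unidimensional Rasch model, gives the probability that learner p answers item i correctly as

\[
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)},
\]

where \(\theta_p\) is the learner's ability and \(b_i\) the item's difficulty. Multidimensional formulations such as the mixed coefficients multinomial logit model (Adams & Wilson, 1996) replace the scalar \(\theta_p\) with a vector of abilities, which is what diagnosing several schema-based knowledge components at once would require. We present the equation only as background; no such model was fitted in this article.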
Knowledge of human cognitive architecture and processes has advanced substantially over the past 3 or 4 decades. Although that knowledge has been used by, for example, cognitive load theory to devise novel instructional procedures, some of those procedures are critically dependent on researchers' being able to quickly and accurately measure learners' levels of expertise. However, no appropriate, cognitively oriented expertise assessment procedures are available to be used in conjunction with the new instructional designs that are rapidly appearing. This article is intended as a first step in remedying this deficiency.

References

Adams, R. A., & Wilson, M. (1996). Formulating the Rasch model as a mixed coefficients multinomial logit. In G. Engelhard, Jr., & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143–166). Norwood, NJ: Ablex Publishing.
Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. W. (2000). Learning from examples: Instructional principles from the worked example research. Review of Educational Research, 70, 181–214.
Authorware Professional 3.0 [Computer software]. (1995). San Francisco: Macromedia.
Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.
Blessing, S. B., & Anderson, J. R. (1996). How people learn to skip steps. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 576–598.
Bobis, J., Sweller, J., & Cooper, M. (1993). Cognitive load effects in a primary school geometry task. Learning and Instruction, 3, 1–21.
Brünken, R., Plass, J., & Leutner, D. (2003). Direct measurement of cognitive load in multimedia learning. Educational Psychologist, 38, 53–61.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8, 293–332.
Chandler, P., & Sweller, J. (1996). Cognitive load while learning to use a computer program. Applied Cognitive Psychology, 10, 1–20.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.
Chi, M., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (pp. 7–75). Hillsdale, NJ: Erlbaum.
Craig, S., Gholson, B., & Driscoll, D. (2002). Animated pedagogical agents in multimedia educational environments: Effects of agent properties, picture features, and redundancy. Journal of Educational Psychology, 94, 428–434.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interaction. New York: Irvington Publishers.
De Groot, A. D. (1965). Thought and choice in chess. The Hague, the Netherlands: Mouton. (Original work published 1946)
Embretson, S. E. (1991). A multidimensional latent trait model for measuring learning and change. Psychometrika, 56, 495–516.
Embretson, S. (1993). Psychometric models for learning and cognitive processes. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 125–150). Mahwah, NJ: Erlbaum.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211–245.
Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38, 23–31.
Kalyuga, S., Chandler, P., & Sweller, J. (1998). Levels of expertise and instructional design. Human Factors, 40, 1–17.
Kalyuga, S., Chandler, P., & Sweller, J. (2000). Incorporating learner experience into the design of multimedia instruction. Journal of Educational Psychology, 92, 126–136.
Kalyuga, S., Chandler, P., & Sweller, J. (2001). Learner experience and efficiency of instructional guidance. Educational Psychology, 21, 5–23.
Kalyuga, S., Chandler, P., Tuovinen, J., & Sweller, J. (2001). When problem solving is superior to studying worked examples. Journal of Educational Psychology, 93, 579–588.
Koedinger, K. R., & Anderson, J. R. (1990). Abstract planning and perceptual chunks: Elements of expertise in geometry. Cognitive Science, 14, 511–550.
Larkin, J., McDermott, J., Simon, D., & Simon, H. (1980). Models of competence in solving physics problems. Cognitive Science, 4, 317–348.
Lohman, D. F. (1986). Predicting mathemathanic effects in the teaching of higher-order thinking skills. Educational Psychologist, 21, 191–208.
Lohman, D. F. (2000). Complex information processing and intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (2nd ed., pp. 285–340). Cambridge, MA: Cambridge University Press.
Marshall, S. (1995). Schemas in problem solving. New York: Cambridge University Press.
Martin, J., & VanLehn, K. (1995). A Bayesian approach to cognitive assessment. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 141–165). Hillsdale, NJ: Erlbaum.
Mayer, R., Bove, W., Bryman, A., Mars, R., & Tapangco, L. (1996). When less is more: Meaningful learning from visual and verbal summaries of science textbook lessons. Journal of Educational Psychology, 88, 64–73.
Mayer, R., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93, 187–198.
McNamara, D., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Mislevy, R. J. (1994). Evidence and inference in educational assessment. Psychometrika, 59, 439–483.
Mislevy, R. J. (1996). Test theory reconceived. Journal of Educational Measurement, 33, 379–416.
Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design: Recent developments. Educational Psychologist, 38, 1–4.
Paas, F., Tuovinen, J., Tabbers, H., & Van Gerven, P. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38, 63–71.
Pellegrino, J. W., Baxter, G. P., & Glaser, R. (1999). Addressing the "two disciplines" problem: Linking theories of cognition and learning with assessment and instructional practice. In A. Iran-Nejad & P. D. Pearson (Eds.), Review of research in education (Vol. 24, pp. 307–353). Washington, DC: American Educational Research Association.
Reder, L., & Anderson, J. R. (1980). A comparison of texts and their summaries: Memorial consequences. Journal of Verbal Learning and Verbal Behavior, 19, 121–134.
Reder, L., & Anderson, J. R. (1982). Effects of spacing and embellishment on memory for main points of a text. Memory & Cognition, 10, 97–102.
Renkl, A., & Atkinson, R. K. (2003). Structuring the transition from example study to problem solving in cognitive skills acquisition: A cognitive load perspective. Educational Psychologist, 38, 15–22.
Renkl, A., Atkinson, R. K., Maier, U. H., & Staley, R. (2002). From example study to problem solving: Smooth transitions help learning. Journal of Experimental Education, 70, 293–315.
Singley, M. K., & Bennett, R. E. (2002). Item generation and beyond: Applications of schema theory to mathematics assessment. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 361–384). Mahwah, NJ: Erlbaum.
Snow, R. E. (1989). Aptitude-treatment interaction as a framework for research on individual differences in learning. In P. L. Ackerman, R. J. Sternberg, & R. Glaser (Eds.), Learning and individual differences: Advances in theory and research (pp. 13–59). New York: Freeman.
Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational measurement (pp. 263–331). New York: Macmillan.
Sweller, J. (1999). Instructional design. Melbourne, Australia: Australian Council for Educational Research.
Sweller, J. (2003). Evolution of human cognitive architecture. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 43, pp. 215–266). San Diego, CA: Academic Press.
Sweller, J., & Chandler, P. (1994). Why some material is difficult to learn. Cognition and Instruction, 12, 185–233.
Sweller, J., Mawer, R., & Ward, M. (1983). Development of expertise in mathematical problem solving. Journal of Experimental Psychology: General, 112, 639–661.
Sweller, J., Van Merrienboer, J. J. G., & Paas, F. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10, 251–296.
Tatsuoka, K. (1990). Toward an integration of item-response theory and cognitive error diagnosis. In N. Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 453–487). Hillsdale, NJ: Erlbaum.
Tuovinen, J., & Sweller, J. (1999). A comparison of cognitive load associated with discovery learning and worked examples. Journal of Educational Psychology, 91, 334–341.
Van Merrienboer, J. J. G. (1990). Strategies for programming instruction in high school: Program completion vs. program generation. Journal of Educational Computing Research, 6, 265–287.
Van Merrienboer, J. J. G., Kirschner, P. A., & Kester, L. (2003). Taking the load off a learner's mind: Instructional design principles for complex learning. Educational Psychologist, 38, 5–13.

Received October 13, 2003
Revision received March 1, 2004
Accepted March 31, 2004