Session F3G
Work in Progress - Work Sampling of Behavioral
Observations for Process-Oriented Outcomes
Mary Besterfield-Sacre, Elaine Newcome, Matt Tokorcheck, Larry Shuman, and Harvey Wolfe,
University of Pittsburgh, Department of Industrial Engineering
Pittsburgh, PA 15261
Abstract - Recent accreditation practices are migrating towards direct measures of student achievement of the eleven enunciated engineering outcomes. Though common examinations are capable of measuring certain outcomes, they are not fully capable of assessing many of the more process-oriented outcomes such as teamwork, problem solving, design, etc. Rich, in-depth assessment methods such as behavioral observation are desirable because they enable us to investigate student learning outcomes and thus evaluate the students’ ability to function in the higher level learning domains. Our best current method for doing this (100% observation) requires considerable time and resources. Industry has learned that activities can be assessed using statistical methods that “sample” the observable environment. Work sampling and related methods use probability theory to reduce, without loss of information, the amount of time necessary to observe events or activities that do not occur in a systematic manner. We are bridging this gap between educational assessment and industry practices by extending these methods to the observation of intervals that capture the cognitive, behavioral and affective domains of student learning outcomes. This paper describes the research involved in developing sampling intervals and provides preliminary results for one process-oriented outcome, that of teamwork.

Index Terms - behavioral observation, work sampling

INTRODUCTION

How do we assess the processes by which students work on teams, communicate effectively, problem solve, design, behave in a professional manner, etc., especially when many of these aspects of work (hereafter “outcomes”) are non-sequential? Assessing certain of these outcomes often requires examining behavior at various levels of the cognitive and affective domains (e.g., analysis, synthesis, evaluation, and valuation). While such a tool has been needed for professional work, the recent movement towards outcomes-based assessment in education has highlighted the need for valid measures for similar student outcomes.
Time studies and work sampling are commonly used methodologies of work measurement for physical tasks primarily because they are low cost, time efficient, and effective. Though beneficial in certain areas, work sampling lacks an effective and accurate methodological application to measure cognitive and affective aspects of work. Because of this limitation, work sampling has not been used to assess the “thinking” and process-oriented aspects of work. Our best current method for observing student teams (100 percent observation) requires considerable time and resources. It is our intention to bridge the gap between common work sampling practices in industry and assessment in education by extending sampling theories to the observation of intervals that can capture the cognitive, behavioral and affective domains. Specifically, we are developing and validating work sampling as an alternative, cost-effective evaluation tool that takes advantage of probability theory to “sample” the observable environment, in hopes that it will significantly reduce the time and costs necessary without the loss of information.
This work in progress paper describes the research involved in developing sampling intervals and provides preliminary results for one process-oriented outcome in a particular environment.

SCIENTIFIC AND EDUCATIONAL RELEVANCE

The reduction in the human and financial resources necessary for assessment of student teams will allow educators to more effectively evaluate students involved in team-oriented learning environments. Work sampling evaluation techniques will better assess students' ability to function on teams;

OUTCOMES AND ATTRIBUTES

Four behavioral, process-oriented outcomes are of interest to the research: Teamwork, Design, Problem Solving, and Ethics. It is our first goal to formally outline distinct observable attributes (or “categories”) of each outcome. For the purposes of work sampling, it is necessary that these attributes are both mutually exclusive, so that no two attributes overlap, and exhaustive, so that the entire range of observable behavior can be documented.
We have developed a list of behavioral categories for the Teamwork outcome [1]. These categories were first gleaned from existing literature, and then were refined through an iterative process that trimmed the categories so that they effectively described all reasonably detectable behavior.
Currently, we are implementing the same iterative process to construct observable attributes for the Design outcome. The original, unrefined list of Design attributes was taken from a recent meta-analysis of over 40 journal articles of empirical studies that characterized design [2]. It is currently
0-7803-8552-7/04/$20.00 © 2004 IEEE October 20 – 23, 2004, Savannah, GA
34th ASEE/IEEE Frontiers in Education Conference
F3G-20
being refined to an observable, mutually exclusive, and exhaustive attributes list.

EVALUATION OF BEHAVIORAL OBSERVATION

By viewing two unique videotapes, two teams of three observers each documented Teamwork attributes through 100% observation. Each observer recorded students' actions, and the percentage of time that the student exhibited each of the observable behavior categories was calculated. An Analysis of Variance test indicated that the observers were statistically reliable, thus yielding the same percentages of categorical attribute data. This allowed us to compile averages over the entire length of each tape, giving us "target" percentages to which work sampling data could be compared.
Intervals between work sampling data points were calculated based on the length of the tape, and random numbers were subsequently generated according to the calculated intervals for the entirety of the tapes.

DETERMINATION OF WORK SAMPLING INTERVAL SIZE

In addition to collecting the work sampling data points themselves, it was necessary to determine the optimal duration of each work sampling interval. It is important to note here that the duration of each work sampling data observation refers to the actual time an observer spends watching a sample before he or she is forced to make a decision and record a category of behavior, not the interval between observations, which is determined by the length of the tape.
We tested intervals of 10 seconds, 20 seconds, and a floating interval. Individual t-tests were performed, comparing work sampling data of each interval type with the 100% behavioral observation “target” values for each teamwork category. The floating interval type was statistically more accurate than the two fixed intervals when comparing work sampling data with the target data (α = 0.05). Although the fixed interval data sets resembled the target data, they did not meet the statistical accuracy of the floating intervals. Anecdotally, the observers tended to prefer the floating interval, noting that some observations required only a few seconds of observation to determine a behavioral category, while others took 30 or more seconds. In contrast to the floating intervals, fixed 10 and 20 second intervals required observers to watch their videotape for exactly 10 or 20 seconds, and the observer was required to record the behavior category that comprised the majority of the time interval. In general, as the fixed length of the intervals increased, accuracy decreased, which can be attributed in part to the nature of the less prominent attributes.

PRELIMINARY RESULTS FOR TEAMWORK

Based on our preliminary data for the outcome Teamwork, the following inferences may be plausible.
• Confirming our proposition, due to the drastic reduction in time required for data collection, attribute percentage data for work sampling was more easily obtained than for 100% observation.
• Observers (raters) produced statistically similar 100% behavioral observation data.
• Work sampling data was statistically similar to the corresponding 100% behavioral observation data for all categories.
• Floating data mirrored the target data set within a 95% confidence interval.
• Floating intervals were more accurate than fixed intervals.

CURRENT AND FUTURE WORK

We intend to test work sampling of all four process-oriented outcomes within four different environments: (1) single lab, (2) short project (~3 hours), (3) long project (~30 hours), and (4) capstone project (semester long). Two tapes have been collected for each of the environments to assess Teamwork, as well as for the first three environments to assess Design. Problem Solving and Ethics will be treated with the same procedures as Teamwork and Design.
Success has been achieved in work sampling for Teamwork in the first environment, and behavioral observation is currently being conducted on the third environment. Design attributes are being tested in the second environment, and will be followed by behavioral observation. Given the success of the first work sampling efforts for Teamwork, we hope to find consistency across all environments, as well as with the process-oriented outcomes of Design, Problem Solving, and Ethics.
Because time studies and work sampling are successful methodologies of work measurement for physical tasks, utilizing these methods to minimize the time needed and funds required to carry out 100% observation is a reasonable and logical step in the efforts to measure cognitive and behavioral activities. If successful, work sampling will save valuable resources, while enabling us to better assess students' abilities through process-oriented outcomes.

ACKNOWLEDGMENTS

This research is sponsored by a grant from the Department of Education’s FIPSE program: Measuring Process-Oriented Student Learning Outcomes: A Work Sampling Approach to Behavioral Observation (P116B020716). Thanks to Dr. Matthew Mehalik, Learning Research and Development Center, University of Pittsburgh, for his assistance in developing the attributes for the Design outcome.

REFERENCES
[1] Besterfield-Sacre, M., E. Newcome, L. Shuman, and H. Wolfe, “Extending Work Sampling to Behavioral and Cognitive Concepts,” Industrial Engineering Research Conference, Houston, TX, May 16 – 18, 2004 (CD-ROM - 6 pgs.).
[2] Mehalik, M.M. and C.D. Schunn, “What Constitutes Good Design? A Review of Empirical Studies of Design Processes,” working paper, December 2003.