National health goals include an increase in the physical activity and physical fitness of school-age children by the year 2000. To assess current fitness levels in the state of Maine, more than 8,000 public school students, ages five through nine, were assessed using a nationally known (American Alliance for Health, Physical Education, Recreation and Dance) health-related physical fitness test. Maine students were then compared with a national norm group on (1) the one-mile walk/run (minutes:seconds), (2) skinfold thickness (centimeters), (3) one-minute timed sit-ups (number performed correctly), and (4) the sit and reach test for flexibility (centimeters). Generally, Maine boys and girls scored higher than the norms on the sit-up, sit and reach, and one-mile walk/run; however, they had significantly larger skinfold thicknesses. Implications for assessment of health-related fitness in this age group were discussed.
This chapter explores the challenges and implications of designing authentic assessments for accountability purposes. Educational accountability takes many forms, from student-level certification for high school graduation to district-level accreditation to ensure that students are being provided with appropriate learning opportunities. Accountability systems are designed to instantiate policy values to lead to certain ends (e.g., 100 percent of students proficient in reading and math). Essentially all accountability systems involve collecting data, analyzing those data according to specific rules to transform them into accountability indicators, classifying the indicators into various levels of performance (e.g., exemplary to failing), and attributing the results to appropriate individuals or organizations. Many current educational accountability systems have stated goals of promoting deeper learning for students for a variety of reasons, including, among other goals, improving college and career readiness. This chapter takes the position that performance-based and related assessment approaches must be meaningfully incorporated into accountability systems, serving as at least one key source of "input" data, if we are to do more than pay lip service to these policy goals. The chapter focuses on design considerations for performance-based assessments for use in accountability systems. This is a broad topic; in order to provide more than a superficial discussion, we devote much of the chapter to the use of performance-based assessments in educator evaluation systems, although the chapter includes brief discussions of school and student accountability systems as well.
We do not focus on combining the results of performance-based or other open-response tasks with more traditional selected-response items into a single assessment score for use in any of these accountability determinations. This is clearly an important issue, but it is beyond the scope of this chapter. We first provide an exposition of the various terms (authentic, direct, alternative, performance, and portfolio) and describe key design features of each. We contend that "performance-based" should serve as the umbrella term, as long as certain design principles are met, and then describe how performance-based assessment designs may differ depending on the specific accountability use. Next, we offer a rationale for using performance-based assessments in accountability systems and discuss how they can be, and are being, incorporated into a variety of accountability systems. We then articulate both general and specific design principles that must be considered when incorporating authentic assessments into educator accountability systems. We focus on key technical criteria, including construct validity, generalizability, and comparability, from the standpoints of both design and evaluation. We conclude by addressing the potential corruptibility of authentic assessments in accountability systems. Our discussion of performance assessments as part of teacher accountability systems highlights the truism that all test design, especially for accountability purposes, is an exercise in optimization under constraints. In other words, test designers must consider technical, political, fiscal, and capacity constraints when trying to craft an assessment that best meets the design goals.
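The generic four-step accountability pipeline described in this abstract (collect data, transform the data into indicators, classify indicators into performance levels, attribute results) can be sketched as a toy computation. All names, scores, cut scores, and performance-band labels below are hypothetical illustrations, not any state's actual business rules:

```python
# Toy sketch of the generic accountability pipeline described above:
# collect -> transform into an indicator -> classify -> attribute.
# All thresholds and labels are hypothetical, for illustration only.

def proficiency_indicator(scores, cut_score):
    """Transform raw scores into an indicator: percent at or above the cut."""
    proficient = sum(1 for s in scores if s >= cut_score)
    return 100.0 * proficient / len(scores)

def classify(indicator, bands):
    """Map an indicator onto ordered performance levels (e.g., exemplary to failing)."""
    for threshold, label in bands:
        if indicator >= threshold:
            return label
    return bands[-1][1]  # lowest band catches everything else

# Hypothetical data: raw test scores collected for one school.
scores = [310, 287, 342, 298, 275, 330, 301, 289]
indicator = proficiency_indicator(scores, cut_score=300)  # 50.0 percent proficient
bands = [(90, "exemplary"), (70, "proficient"), (50, "approaching"), (0, "failing")]
rating = classify(indicator, bands)  # attributed to the school: "approaching"
```

The real design questions the chapter takes up (which data feed the indicator, how the rules are set, who the results are attributed to) all live inside these four steps.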
Educational Measurement: Issues and Practice, 2018
Creating balanced assessment systems is really hard. There are few examples of well-functioning assessment systems, other than a limited number of research-practice partnerships. Mark Wilson's recent presidential address to the National Council on Measurement in Education (Wilson, 2018) and Shepard, Penuel, and Pellegrino (2018) extend our understanding of assessment systems by building on conceptualizations from Knowing What Students Know (NRC, 2001). Both papers make the case that high-quality classroom assessments should be situated within balanced assessment systems located at district or state levels. I first briefly discuss the varied orientations toward classroom assessment expressed in these two papers and then focus most of my efforts on practical suggestions for moving productively forward on the development and implementation of balanced assessment systems.
This report reviews the current status, empirical findings, theoretical issues, and practical considerations related to state-level minimum competency testing programs. It finds that, although two-thirds of current testing programs now use direct writing prompts to assess writing achievement, essentially all programs rely on multiple-choice tests to measure knowledge in the other subject areas. It also concludes that empirical evidence regarding the effectiveness of minimum competency testing programs is mixed. It reports improved achievement in basic reading and mathematics skills, especially when the curriculum focuses on the same basic skill items found on the tests. However, the report also finds evidence of unintended negative effects of minimum competency testing programs, including lack of transfer to higher-order skills, increased dropout rates (especially for minority and low-achieving students), a narrowing of the curriculum to test content, corruptibility of high-stakes tests, and testing time taken away from teaching. Overall, the report finds a conflict between minimum competency and standards-based assessment systems, since competency testing essentially contradicts current mandates for having students learn rigorous content standards. It recommends against mandating a state-level minimum competency program. (Contains 28 references.) Reproductions supplied by EDRS are the best that can be made from the original document.
Measurement: Interdisciplinary Research and Perspectives, 2015
The measurement industry is in crisis. The public outcry against “over-testing” and the opt-out movement are symptoms of a larger sociopolitical battle being fought over the Common Core, teacher evaluation, federal intrusion, and a host of other issues, but much of the vitriol is directed at the tests and the testing industry. If we, as measurement professionals, think that these critics just don’t understand the complexities and challenges of measuring the new expectations for deeper learning, we could very well end up being seen as even more scientifically aloof than we already are. What’s worse, we could end up being irrelevant to the larger policy conversations. I argue that at least part of what is driving the over-testing/opt-out movement is that too many stakeholders see too little value in the results of our large-scale assessments. How could that be? We put so much effort into carefully designing our assessments to make sure they meet rigorous psychometric criteria to accurately measure the target constructs. So why isn’t the public more appreciative? Clearly the accountability uses (and abuses) are conflated with perceptions of the tests themselves, but if users were able to get a clear picture of what students actually know and are able to do, I think they would be less likely to want to opt out of receiving such valuable information. Briggs and Peck (this issue) put forth an approach for extracting more understandable and meaningful information from large-scale test scores. They focus specifically on tests that are vertically scaled across grades, but it is important to consider the more general application of their ideas to within-grade test score scales as well.
This study assessed the level of scientific and natural resource knowledge that fourth-, eighth-, and eleventh-grade students in Maine possess concerning acidic deposition. A representative sample of public school students (N = 175) was interviewed on twelve concept principles considered critical to a full understanding of the acidic deposition problem. These included geological, meteorological, ecological, political, and economic concepts. Student knowledge was rated for each concept principle on a scale of complete, high partial, low partial, or no understanding. Common misconceptions were also noted. Generalized correct concept statements of current student knowledge are reported, as well as generalized missing concepts. Our conclusions have implications for teaching about acidic deposition and the design of environmental education curriculum materials based upon student knowledge. This information can help teachers better instruct students about current environmental problems and thus help learners gain an appreciation for the complex and multidisciplinary nature of science and the environment.
The proliferation of local assessment systems, often called benchmark or formative, has become a concern for some measurement professionals. Many of these assessment systems being marketed as formative are not at all similar to the types of assessments and strategies studied by Black and Wiliam (1998). Instead these assessments, which we call interim, can be an important piece of a comprehensive assessment system that includes formative, interim, and summative assessments. Interim assessments are given on a larger scale than formative assessments, have less flexibility, and can be aggregated to the school or district level to help inform policy. Interim assessments are driven by their purpose, which can be instructional, evaluative, or predictive. Our intent is to define these "interim assessments" and develop a framework so that district and state leaders can better evaluate these systems for purchase or development. The discussion lays out some concerns with the current ...
Educational Measurement: Issues and Practice, 2007
This article presents findings from two projects designed to improve evaluations of technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of the alternate assessments following the traditions
This paper and its "Executive Summary," separately published, are intended primarily for Chief State School Officers and their immediate staff members involved in statewide accountability policy development and implementation. The paper addresses certain key issues and contains a full exploration of the related technical aspects of validity and reliability in Adequate Yearly Progress (AYP) determinations by states responding to the requirements of the No Child Left Behind Act (NCLB). The paper also considers unique issues that arise in designing accountability systems under NCLB and the critical variables that relate to decisions states must make in financing these systems. The key issues that must be considered are: (1) multiple, separate indicators that may be needed for NCLB and current state programs; (2) the definition of "proficient" for each purpose; (3) selecting assessments and other indicators; (4) starting points and goals; (5) the minimum number per student group for AYP determinations; (6) the inclusion of all students and schools; and (7) the need for, and advantages of, multiple years of data. The passage of NCLB marked a shift in federal educational policy from an emphasis on standards and assessment to an emphasis on accountability at school, district, and state levels, so that all students reach, at a minimum, proficiency on the state's academic achievement standards and state academic assessments. This document is intended to help state educational leaders understand and work with the new accountability requirements. Four appendixes contain supplemental information, including an excerpt from the NCLB and a glossary. (Contains 5 tables, 9 figures, and 16 references.)
A major concern of science educators is the lack of talented females selecting science careers. In spite of attempts during the past few decades to create equality in all facets of life in the United States, many sexual stereotypes persist. The research presented here is designed to examine the relationship of sex to the relevant factors influencing the decision to choose a college major in the technical sciences. Data from students in the High School and Beyond sophomore cohort who had participated in surveys in 1980, 1982, 1984, and 1986 (n = 188) were analyzed and a path model was constructed. While noticeable gender differences did emerge in this study, there was a very small direct effect of gender on the dependent measure of science major. Many of the strong paths showed similar results for both sexes, for example, the path leading from family background through ability, achievement, and advanced courses to college major. Separating the analyses by sex helped to point out some interesting path differences. The most interesting difference involved the path for females from self-efficacy through science and mathematics attitudes, to advanced courses, and then on to college major. The same path was virtually absent for males. A list of 16 references is included.
A pilot program in New Hampshire models innovative ways of creating and applying state assessments and educator accountability. A study of New Hampshire's new system, which has already received approval from the U.S. Department of Education under a waiver from NCLB, finds some positive results and also suggests challenges states might face in putting these new systems in place, in New Hampshire and elsewhere.
This paper examines differences in educational outcomes among 1,792 students from small (fewer than 300 pupils), average (400-700 pupils), and large (900-1,200 pupils) rural high schools, and among 1,084 students from small high schools in urban,
Affective and academic outcomes of grade retention were studied using the High School and Beyond (HSB) data set, a large, nationally representative sample of students. More specifically, the study: compared the academic (achievement and educational attainment) and affective (educational aspirations) outcomes of retained and non-retained students; compared the academic and affective (including self-efficacy) outcomes of early- and late-retained students; assessed potential factors influencing the success stories in retention; examined the contribution of sex and socioeconomic status (SES); and identified areas for more detailed inquiry. The cohort of high school sophomores was assessed in 1980 and reassessed in 1982, 1984, and 1986. A total of 1,015 schools were selected for the sample, and 36 seniors and 35 sophomores were randomly selected in each school. In those schools with fewer than 36 seniors and/or sophomores, all eligible students were included in the sample. Participants in all four waves of the survey included 13,425 students, of whom 1,469 had been retained at some point in their scholastic careers. The Statistical Package for the Social Sciences (SPSSX) was used to conduct the descriptive and inferential statistical procedures used to analyze the data. Variables were either items drawn directly from the HSB data or composite variables. Results indicate that there are some success stories for retained students. Slightly fewer than half of the higher-SES students who were retained scored above the median on the sophomore achievement measure, whereas fewer than 20% of the lower-SES retained students scored above the median on the achievement composite. Gender also played a part in retention decisions and outcomes. Five data tables are included.
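The kind of descriptive comparison reported above, the share of retained students scoring above the full-sample median within each SES group, can be sketched in a few lines. The records, scores, and median below are invented for illustration and do not come from the HSB data:

```python
# Hedged sketch of a median-split comparison like the one reported above:
# fraction of retained students scoring above the full-sample median on an
# achievement composite, broken out by SES group. All data are invented.

def share_above_median(records, median):
    """Fraction of records whose achievement composite exceeds the median."""
    above = sum(1 for r in records if r["achievement"] > median)
    return above / len(records)

# Hypothetical retained-student records with an SES flag.
retained = [
    {"ses": "high", "achievement": 55}, {"ses": "high", "achievement": 48},
    {"ses": "high", "achievement": 61}, {"ses": "high", "achievement": 44},
    {"ses": "low", "achievement": 41}, {"ses": "low", "achievement": 38},
    {"ses": "low", "achievement": 52}, {"ses": "low", "achievement": 35},
]
median = 50  # hypothetical full-sample median on the achievement composite

by_ses = {}
for group in ("high", "low"):
    subset = [r for r in retained if r["ses"] == group]
    by_ses[group] = share_above_median(subset, median)

# by_ses -> {"high": 0.5, "low": 0.25}
```

In the actual study this computation was carried out in SPSSX on the HSB composites; the sketch only illustrates the structure of the comparison.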
This document contains findings of a year-long evaluation of the Mathematics and Education Reform (MER) Forum, a voluntary association targeting the academic mathematics community in four-year colleges and universities. Specifically, the evaluation sought to assess the extent to which MER influenced its members' involvement in mathematics-education reform at both postsecondary and K-12 levels. Since its inception in 1988, MER has expanded from a network targeted at individuals to include a departmental network directed toward mathematics departments of research universities. Data were obtained through a survey of the entire national population of MER participants (n = 730), which elicited a 32 percent response rate, site visits to four university departments, participant observation at MER functions, and interviews with department personnel. Findings indicate that MER provided support to mathematicians interested in improving their own teaching, leadership to mathematics departments, and legitimization of educational interests. MER also facilitated faculty participation in the reform of undergraduate mathematics education and, to a lesser extent, reform of K-12 mathematics education. Although mathematicians generally could not attribute changes in their teaching directly to MER, they attributed at least an indirect effect to MER. Although the majority of MER's impact was at the individual level, the program to some extent also facilitated change at broader levels, particularly within mathematics departments. Suggestions for best portraying MER's program are included. Appendices contain a workshop-evaluation questionnaire and copies of survey instruments.
The effects of performance assessments on student learning were examined in a year-long project to help teachers in 13 third-grade classrooms begin to use performance assessments as part of regular instruction in reading and mathematics. The essential research question was whether students learned more or developed qualitatively different understandings because performance assessments were introduced. Achievement results for the approximately 335 students were compared to the performance of third-grade students in the same schools the year before and to third-grade performance in matched control schools. Researchers worked with the teachers throughout the year to help them develop performance assessments congruent with their own instructional goals. Standardized achievement tests and some items from the Maryland State Department of Education's performance assessment were used to measure achievement. Overall, the predominant finding was one of no difference or no gains in student learning following the year-long effort to introduce performance assessment. Researchers indicated that they saw qualitative changes in performance, but it must be acknowledged that any demonstrated benefits in achievement were small and ephemeral. (Contains 10 tables, 4 figures, and 18 references.)
Understanding the Relationship Between Student Achievement and the Quality of Educational Facilities: Evidence From Wyoming. Lawrence O. Picus, Rossier School of Education, University of Southern California; Scott ...
Increasing numbers of schools and districts have expressed interest in interim assessment systems to prepare for summative assessments and to improve teaching and learning. However, with so many commercial interim assessments available, schools and districts are struggling to determine which interim assessment is most appropriate to their needs. Unfortunately, there is little research-based guidance to help schools and districts make the right choice about how to spend their money. Because we realize the urgency of developing criteria that can describe or evaluate the quality of interim assessments, this article presents the results of an initial attempt to create an instrument that school and district educators could use to evaluate the quality and usefulness of interim assessments. The instrument is designed for use by state and district leaders to help them select an appropriate interim assessment system for their needs, but it could also be used by test vendors looking to evaluate and improve their own systems and by researchers engaged in studies of interim assessment use. The standards-based reform movement has resulted in the widespread use of assessments designed to measure students' performance at specific points in time, generally at the end of the school year, and to help instantiate learning targets. In spite of the hopes and efforts of policymakers and test developers, these end-of-year tests provide very little useful information to improve the instruction and learning of current students (e.g., Stecher et al., 2008). This is not because there is something "wrong" with these summative accountability tests, but rather that they were not designed to meet instructional purposes.
Recognizing the inherent limitations of summative assessment for classroom use, educators are looking for additional assessments to inform and monitor student learning during the year in which they are actually instructing the students. Many vendors are now selling what they call "benchmark," "diagnostic," "formative," and/or "predictive" assessments with promises of improving student performance. These systems often
National health goals include an increase in the physical activity and physical fitness of school... more National health goals include an increase in the physical activity and physical fitness of school-age children by the year 2000. To assess current fitness levels in the state of Maine, more than 8,000 public school students, ages five through nine, were assessed using a nationally known (American Alliance for Health, Physical Education, Recreation and Dance) health-related physical fitness test. Maine students were then compared with a national norm group on (1) the one-mile walk/run (minutes:seconds), (2) skinfold thickness (centimeters), (3) one-minute timed sit-ups (number performed correctly), and (4) the sit and reach test for flexibility (centimeters). Generally, Maine boys and girls scored higher than the norms on the sit-up, sit and reach, and one-mile walk/run; however, they had significantly larger skinfold thicknesses. Implications for assessment of health-related fitness in this age group were discussed.
This chapter explores the challenges and implications of designing authentic assessments for acco... more This chapter explores the challenges and implications of designing authentic assessments for accountability purposes. Educational accountability takes many forms, from student-level certification for high school graduation to district-level accreditation to ensure that students are being provided with appropriate learning opportunities. Accountability systems are designed to instantiate policy values to lead to certain ends (e.g., 100 percent of students proficient in reading and math). Essentially all accountability systems involve collecting data, analyzing those data according to specific rules to transform the data into accountability indicators, classifying the indicators into various levels of performance (e.g., exemplary to failing), and attributing the results to appropriate individuals or organizations. Many current educational accountability systems have stated goals of promoting deeper learning for students for a variety of reasons, including, among other goals, improving college and career readiness. This chapter takes the position that performance-based and-related assessment approaches must be meaningfully incorporated into accountability systems-by serving as at least one key source of "input" data-if we are to do more than pay lip service to these policy goals. This chapter focuses on design considerations for performance-based assessments for use in accountability systems. This is a broad topic. In order to provide more than a superficial discussion, we focus much of the chapter on the use of performance-based assessments in educator evaluation systems, although the chapter includes brief discussions of school and student accountability systems as well. 
We do not focus on combining the results of performance-based or other open-response tasks with more traditional selected-response items into a single assessment score for use in any of these accountability determinations. This is clearly an important issue, but beyond the scope of this chapter. We first provide an exposition of the various terms-authentic, direct, alternative, performance, and portfolio-and describe key design features of each. We contend that "performance-based" assessments should be used as the umbrella term as long as certain design principles are met and then describe how performance-based assessment designs may differ depending on the specific accountability use. Next, we offer a rationale for using performance-based assessments in accountability systems and discuss how they can and are incorporated into a variety of accountability systems. We then articulate both general and specific design principles that must be considered when incorporating authentic assessments into educator accountability systems. We focus on key technical criteria, including construct validity, generalizability, and comparability, from the standpoints of both design and evaluation. We conclude by addressing the potential for corruptibility of authentic assessments in accountability systems. Our discussion of performance assessments as part of teacher accountability systems highlights the truism that all test design, especially for accountability purposes, is an exercise in optimization under constraints. In other words, test designers must consider technical, political, fiscal, and capacity constraints when trying to craft an assessment that best meets the design goals. Incorporating
Educational Measurement: Issues and Practice, 2018
C reating balanced assessment systems is really hard. There are few examples of well-functioning ... more C reating balanced assessment systems is really hard. There are few examples of well-functioning assessment systems, other than a limited number of research-practice partnerships. Marc Wilson's recent presidential address to the National Council of Measurement in Education (Wilson, 2018) and Shepard, Penuel, and Pellegrino (2018) extend our understanding of assessment systems by building on conceptualizations from Knowing What Students Know (NRC, 2001). Both papers make the case that high-quality classroom assessments should be situated within balanced assessment systems located at district or state levels. I first discuss slightly the varied orientations toward classroom assessment expressed in these two papers and then focus most of my efforts on practical suggestions for moving productively forward on the development and implementation of balanced assessment systems.
This report reviews the current status, empirical findings, theoretical issues, and practical con... more This report reviews the current status, empirical findings, theoretical issues, and practical considerations related to state-level minimum competency testing programs. It finds that, although two-thirds of current testing programs now use direct writing prompts to assess writing achievement, essentially all programs rely on multiple choice tests to measure knowledge in the other subject areas. It also concludes that empirical evidence regarding the effectiveness of minimum competency testing programs is mixed. It reports improved achievement in basic reading and mathematics skills, especially when curriculum focuses on the same basic skill items found on the tests. However, the report also finds evidence of unintended negative effects of minimum competency testing programs, including lack of transfer to higher order skills, increased dropout rates (especially for minority and low achieving students), a narrowing of the curriculum to test content, corruptibility of high stakes tests, and testing time as time taken from teaching. Overall, the report finds a conflict between minimum competency and standards-based assessment systems since competency testing essentially contradicts current mandates for having students learn rigorous content standards. It recommends against mandating a state-level minimum competency program. (Contains 28 references.) (DB) Reproductions supplied by EDRS are the best that can be made from the original document.
Measurement: Interdisciplinary Research and Perspectives, 2015
The measurement industry is in crisis. The public outcry against “over testing” and the optout mo... more The measurement industry is in crisis. The public outcry against “over testing” and the optout movement are symptoms of a larger sociopolitical battle being fought over Common Core, teacher evaluation, federal intrusion, and a host of other issues, but much of the vitriol is directed at the tests and the testing industry. If we, as measurement professionals, think that these critics just don’t understand the complexities and challenges of measuring the new expectations for deeper learning, we could very well end up being seen as even more scientifically aloof than we already are. What’s worse, we could end up being irrelevant to the larger policy conversations. I argue that at least part of the reason driving the over testing/opt-out movement is that too many stakeholders see too little value in the results of our large-scale assessments. How could that be? We put so much effort into carefully designing our assessments to make sure they meet rigorous psychometric criteria to accurately measure the target constructs. So why isn’t the public more appreciative? Clearly the accountability uses (abuses) are conflated with the perceptions of the tests themselves, but if users were able to get a clear picture of what students actually know and are able to do, I think they would be less likely to want to opt out of receiving such valuable information. Briggs and Peck (this issue) are putting forth an approach for trying to extract more understandable and meaningful information from large-scale-test scores. They focus specifically on doing so when tests are vertically scaled across grades, but it is important to consider the more general application of their ideas to within-grade test score scales as well.
This study assessed the level of scientific and natural resource knowledge that fourth-, eighth-, and eleventh-grade students in Maine possess concerning acidic deposition. A representative sample of public school students (N = 175) was interviewed on twelve concept principles considered critical to a full understanding of the acidic deposition problem. These included geological, meteorological, ecological, political, and economic concepts. Student knowledge was rated for each concept principle on a scale of complete, high partial, low partial, or no understanding. Common misconceptions were also noted. Generalized correct concept statements of current student knowledge are reported, as well as generalized missing concepts. Our conclusions have implications for teaching about acidic deposition and the design of environmental education curriculum materials based upon student knowledge. This information can help teachers better instruct students about current environmental problems and thus help learners gain an appreciation for the complex and multidisciplinary nature of science and the environment.
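The four-level rating scheme described above lends itself to a simple tally per concept principle. The sketch below is our own illustration; the function name and the "at or above high partial" summary statistic are assumptions, not part of the study's actual protocol.

```python
from collections import Counter

# Rating levels from the interview protocol described in the abstract;
# the tally layout and summary statistic below are illustrative only.
LEVELS = ["complete", "high partial", "low partial", "none"]

def summarize_ratings(ratings):
    """Tally one concept principle's interview ratings and report the
    proportion of students at or above 'high partial' understanding."""
    counts = Counter(ratings)
    tallies = {level: counts.get(level, 0) for level in LEVELS}
    share_high = (tallies["complete"] + tallies["high partial"]) / len(ratings)
    return tallies, share_high
```

Repeating such a tally across all twelve concept principles would yield the kind of concept-by-concept profile the abstract reports.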
The proliferation of local assessment systems, often called benchmark or formative, has become a concern for some measurement professionals. Many of these assessment systems being marketed as formative are not at all similar to the types of assessments and strategies studied by Black and Wiliam (1998). Instead these assessments, which we call interim, can be an important piece of a comprehensive assessment system that includes formative, interim, and summative assessments. Interim assessments are given on a larger scale than formative assessments, have less flexibility, and can be aggregated to the school or district level to help inform policy. Interim assessments are driven by their purpose, which can be instructional, evaluative, or predictive. Our intent is to define these "interim assessments" and develop a framework so that district and state leaders can better evaluate these systems for purchase or development. The discussion lays out some concerns with the current ...
Educational Measurement: Issues and Practice, 2007
This article presents findings from two projects designed to improve evaluations of technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of the alternate assessments following the traditions
This paper and its "Executive Summary," separately published, are intended primarily for Chief State School Officers and their immediate staff members involved in statewide accountability policy development and implementation. The paper addresses certain key issues and contains a full exploration of the related technical aspects of validity and reliability in Adequate Yearly Progress (AYP) determinations by states responding to the requirements of the No Child Left Behind Act (NCLB). The paper also considers unique issues that arise in designing accountability systems under NCLB and the critical variables that relate to decisions states must make in financing these systems. The key issues that must be considered are: (1) multiple, separate indicators that may be needed for NCLB and current state programs; (2) the definition of "proficient" for each purpose; (3) selecting assessments and other indicators; (4) starting points and goals; (5) the minimum number of students per student group for AYP determinations; (6) the inclusion of all students and schools; and (7) the need to use, and advantages of, multiple years of data. The passage of NCLB marked a shift in federal educational policy from an emphasis on standards and assessment to an emphasis on accountability at school, district, and state levels so that all students reach, at a minimum, proficiency on the state's academic achievement standards and state academic assessments. This document is intended to help state educational leaders understand and work with the new accountability requirements. Four appendixes contain supplemental information, including an excerpt from the NCLB and a glossary. (Contains 5 tables, 9 figures, and 16 references.)
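Two of the key issues listed above, the minimum number of students per group (item 5) and the use of multiple years of data (item 7), can be sketched together in a few lines. The thresholds, names, and pooling rule below are hypothetical illustrations only; actual AYP rules varied by state and are not reproduced here.

```python
# Hypothetical AYP-style subgroup check: pool up to three years of data,
# apply a minimum-n rule, then compare the proficiency rate to a target.

MIN_N = 30      # illustrative minimum subgroup size for reporting
TARGET = 0.60   # illustrative annual measurable objective (proportion proficient)

def subgroup_ayp(proficient_counts, total_counts):
    """Return 'met', 'not met', or 'too few students' for one subgroup,
    given per-year proficient counts and per-year totals."""
    pooled_n = sum(total_counts)
    if pooled_n < MIN_N:
        return "too few students"  # subgroup not evaluated this cycle
    rate = sum(proficient_counts) / pooled_n
    return "met" if rate >= TARGET else "not met"
```

Pooling years in this way is one reason the paper argues for multiple years of data: it smooths the year-to-year volatility that small subgroups would otherwise show.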
A major concern of science educators is the lack of talented females selecting science careers. In spite of attempts during the past few decades to create equality in all facets of life in the United States, many sexual stereotypes persist. The research presented here is designed to examine the relationship of sex to the relevant factors influencing the decision to choose a college major in the technical sciences. Data from students in the High School and Beyond sophomore cohort who had participated in surveys in 1980, 1982, 1984, and 1986 (n=188) were analyzed, and a path model was constructed. While noticeable gender differences did emerge in this study, there was a very small direct effect of gender on the dependent measure of science major. Many of the strong paths showed similar results for both sexes, for example, the path leading from family background through ability, achievement, and advanced courses, to college major. Separating the analyses by sex helped to point out some interesting path differences. The most interesting difference involved the path for females, from self-efficacy through science and mathematics attitudes, to advanced courses, and then on to college major. The same path was virtually absent for males. A list of 16 references is included.
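For readers unfamiliar with path models: the indirect effect along a chain such as self-efficacy → attitudes → advanced courses is the product of the coefficients on each link. A minimal sketch, assuming standardized variables so that each single-link coefficient reduces to a Pearson correlation (full path models instead fit simultaneous regressions, as the study presumably did):

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation; for standardized variables this equals the
    simple-regression coefficient for one link in a path chain."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def indirect_effect(x, mediator, y):
    """Product of the two link coefficients in the chain x -> mediator -> y."""
    return pearson_r(x, mediator) * pearson_r(mediator, y)
```

Comparing such indirect effects computed separately for females and males is the kind of by-sex contrast the abstract describes.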
A pilot program in New Hampshire models innovative ways of creating and applying state assessments and educator accountability systems. A study of New Hampshire’s new system, which has already received approval from the U.S. Department of Education under a waiver from NCLB, finds some positive results and also suggests challenges states might face in putting these new systems in place, in New Hampshire and elsewhere.
This paper examines differences in educational outcomes among 1,792 students from small (fewer than 300 pupils), average (400-700 pupils), and large (900-1,200 pupils) rural high schools, and among 1,084 students from small high schools in urban,
Affective and academic outcomes of grade retention were studied using the High School and Beyond (HSB) data set, a large, nationally representative sample of students. More specifically, the study: compared the academic (achievement and educational attainment) and affective (educational aspirations) outcomes of retained and non-retained students; compared the academic and affective (including self-efficacy) outcomes of early and late retained students; assessed potential factors influencing the success stories in retention; examined the contribution of sex and socioeconomic status (SES); and identified areas for more detailed inquiry. The cohort of high school sophomores was assessed in 1980 and reassessed in 1982, 1984, and 1986. A total of 1,015 schools were selected for the sample, and 36 seniors and 35 sophomores were randomly selected in each school. In those schools with fewer than 36 seniors and/or sophomores, all eligible students were included in the sample. Participants in all 4 waves of the survey included 13,425 students, of whom 1,469 had been retained at some point in their scholastic careers. The Statistical Package for the Social Sciences (SPSSX) was used to conduct the descriptive and inferential statistical procedures used to analyze the data. Variables were either items drawn directly from the HSB data or composite variables. Results indicate that there are some success stories for retained students. Slightly fewer than half of the higher-SES students who were retained scored above the median on the sophomore achievement measure, whereas fewer than 20% of the lower-SES retained students scored above the median on the achievement composite. Gender also played a part in retention decisions and outcomes. Five data tables are included.
This document contains findings of a year-long evaluation of the Mathematics and Education Reform (MER) Forum, a voluntary association targeting the academic mathematics community in four-year colleges and universities. Specifically, the evaluation sought to assess the extent to which MER influenced its members' involvement in mathematics-education reform at both postsecondary and K-12 levels. Since its inception in 1988, MER has expanded from a network targeted at individuals to include a departmental network directed toward mathematics departments of research universities. Data were obtained through a survey of the entire national population of MER participants (n=730), which elicited a 32 percent response rate, site visits to four university departments, participant observation at MER functions, and interviews with department personnel. Findings indicate that MER provided support to mathematicians interested in improving their own teaching, leadership to mathematics departments, and legitimization of educational interests. MER also facilitated faculty participation in the reform of undergraduate mathematics education and, to a lesser extent, reform of K-12 mathematics education. Although mathematicians generally could not attribute changes in their teaching directly to MER, they attributed at least an indirect effect to MER. Although the majority of MER's impact was at the individual level, the program to some extent also facilitated change at broader levels, particularly within mathematics departments. Suggestions for best portraying MER's program are included. Appendices contain a workshop-evaluation questionnaire and copies of survey instruments.
The effects of performance assessments on student learning were examined in a year-long project to help teachers in 13 third-grade classrooms begin to use performance assessments as part of regular instruction in reading and mathematics. The essential research question was whether students learned more or developed qualitatively different understandings because performance assessments were introduced. Achievement results for the approximately 335 students were compared to the performance of third-grade students in the same schools the year before and to third-grade performance in matched control schools. Researchers worked with the teachers throughout the year to help them develop performance assessments congruent with their own instructional goals. Standardized achievement tests and some items from the Maryland State Department of Education's performance assessment were used to measure achievement. Overall, the predominant finding was one of no difference or no gains in student learning following the year-long effort to introduce performance assessment. Researchers indicated that they saw qualitative changes in performance, but it must be acknowledged that any demonstrated benefits in achievement were small and ephemeral. (Contains 10 tables, 4 figures, and 18 references.)
Understanding the Relationship Between Student Achievement and the Quality of Educational Facilities: Evidence From Wyoming. Lawrence O. Picus, Rossier School of Education, University of Southern California; Scott ...
Increasing numbers of schools and districts have expressed interest in interim assessment systems to prepare for summative assessments and to improve teaching and learning. However, with so many commercial interim assessments available, schools and districts are struggling to determine which interim assessment is most appropriate to their needs. Unfortunately, there is little research-based guidance to help schools and districts make the right choice about how to spend their money. Because we realize the urgency of developing criteria that can describe or evaluate the quality of interim assessments, this article presents the results of an initial attempt to create an instrument that school and district educators could use to evaluate the quality and usefulness of interim assessments. The instrument is designed for use by state and district leaders to help them select an appropriate interim assessment system for their needs, but it could also be used by test vendors looking to evaluate and improve their own systems and by researchers engaged in studies of interim assessment use. The standards-based reform movement has resulted in the widespread use of assessments designed to measure students' performance at specific points in time, generally at the end of the school year, and to help instantiate learning targets. In spite of the hopes and efforts of policymakers and test developers, these end-of-year tests provide very little useful information to improve the instruction and learning of current students (e.g., Stecher et al., 2008). This is not because there is something "wrong" with these summative accountability tests, but rather because they were not designed to serve instructional purposes.
Recognizing the inherent limitations of summative assessment for classroom use, educators are looking for additional assessments to inform and monitor student learning during the year in which they are actually instructing the students. Many vendors are now selling what they call "benchmark," "diagnostic," "formative," and/or "predictive" assessments with promises of improving student performance. These systems often
Multiple needs and responses have driven the move to personalized and competency-based learning systems, including the desire to enhance the learning outcomes for all students by creating contexts for students to engage in and take control of their own learning. Some might immediately think that such personalization would be a barrier to assessment design. For an extreme example, imagine if every student were pursuing a different learning path. While it might not be very efficient to create a slightly or radically different assessment for each student, as long as the learning targets are explicit, measurement specialists should be able to design appropriate assessments to document what students have learned related to what they were expected to learn. We are able to draw on well-established procedures for assessment design that would not necessarily change for the more personalized case. The real challenge arises, however, when there is a desire or need to include the results from such assessments in school or educator accountability systems.
Ensuring that the rules used to produce accountability results are fair to the various types of individuals or entities subject to the accountability system is a key tenet of accountability system design. Fairness is manifested, among other ways, by holding people (or schools) to comparable achievement standards using the same or similar types of data, transformations of the data into indicators, and criteria to judge the value of the indicators. More succinctly, “comparability” is viewed by many as an important aspect of designing fair accountability systems. At first glance, personalization and comparability appear to be at odds and, if each is taken strictly, they are. However, in most cases, personalization, at least in K-12 public schools, does not mean complete freedom to choose any possible learning path. In almost all cases, it is the pace of learning that is personalized, with somewhat less freedom to choose the content to be studied. Nevertheless, even this more limited degree of personalization may pose significant comparability challenges. This paper addresses the range of personalized learning that we might expect in current K-12 systems, but focuses primarily on understanding the ways in which comparability may be considered to help bridge the apparent divide between fair accountability systems and personalized learning. For example, if comparability is defined as strict psychometric interchangeability, it is doubtful that such personalized systems could meet that threshold. On the other hand, if comparability is evaluated by the ways in which the assessment results predict or support similarly rigorous outcomes, the door may be open to incorporating the results of personalized and competency-based assessments into accountability systems. This paper presents a conceptualization of comparability that is less stringent than interchangeability of student scores.
We then present the results of applying such a perspective to a competency-based pilot project where the school-based results must still be used in school accountability determinations. The paper concludes with a discussion of the challenges and opportunities associated with trying to use the results of personalized and competency-based learning systems in large-scale accountability systems.
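The predictive notion of comparability described above can be made concrete: two assessment approaches might be judged comparable if their scores relate to a common external outcome about equally well. A minimal sketch follows; the function names, the correlation-based criterion, and the tolerance value are our own illustrative assumptions, not the pilot project's actual method.

```python
from statistics import mean, pstdev

def correlation(a, b):
    """Pearson correlation between two equal-length score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))

def comparable_predictors(scores_a, scores_b, outcome, tolerance=0.10):
    """Treat two score sets as 'comparable' when they predict a common
    outcome about equally well; the tolerance is a policy choice, not
    a psychometric standard."""
    r_a = correlation(scores_a, outcome)
    r_b = correlation(scores_b, outcome)
    return abs(r_a - r_b) <= tolerance, r_a, r_b
```

Note that this criterion is deliberately weaker than score interchangeability: two assessments could pass it while assigning quite different scores to the same student.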
Papers by Scott Marion