Papers by Richard P Phelps
Presentation before the Governor's Council on Common Core Review, State Capitol, Little Rock, May 2015.
- Ordinary citizens seem to have more leverage at the state level.
- US public debate on education testing is now totally one-sided.
- What do most of our successful international competitors do? Multi-level, multi-target “grade span” high-stakes testing.
- Effect of testing with stakes? Roughly two grade levels of increased achievement.
SSRN Electronic Journal, 2017
Social Science Research Network, 2013
Whereas innovation is a holy commandment for the US education professoriate, critics charge that it leads to a continuous cycle of fad after fad after fad. After all, if innovation is always good, then any program that has been around for a while must be bad, no matter how successful it might be in improving student achievement. Moreover, if the pace of today’s-innovation-replacing-yesterday’s-innovation proceeds fast enough, evaluation reports are finished well after one program has been replaced by another, become irrelevant before they are published, and end up unread. Ultimately, in a rapidly innovating environment, we learn nothing about what works. Some critics of the radical constructivists suspect that that chaotic, swirling maelstrom may be their desired equilibrium state.
Nonpartisan Education Review, 2016
Weekly from Madison, Wisconsin, Jim Zellmer emails a selection of links to fifty or so education-related news stories, essays, blog posts, and other relevant sources. It's my favorite, and most edifying, source of education information. I wanted to learn more about Mr. Zellmer and his web site schoolinfosystem.org and so requested an interview. Here it is.

The OECD’s Review on Evaluation and Assessment Frameworks for Improving School Outcomes (REAFISO) relies on staff generalists and itinerant workers to compose its most essential reports. I suspect that the REAFISO writers started out unknowing, trusted the research work they found most easily, and followed in the direction those researchers pointed them. Ultimately, they relied on the most easily and inexpensively gathered document sources. I believe that REAFISO got caught in a one-way trap or, as others might term it: a bubble, echo chamber, infinite (feedback) loop, or myopia. They began their study with the work of celebrity researchers, dismissive reviewers who ignore (or declare nonexistent) those researchers and that research that contradicts their own (Phelps, 2012a), and never found their way out. Dismissive reviewers blow bubbles, construct echo chambers, and program infinite loops by acknowledging only that research and those researchers they like or agree with.

There are two education establishments. On one side are the stand-pat public-school vested interests that resist any encroachment on their power and control but, ironically, often portray themselves as innovative and democratic. They consolidated control over education school hiring and ideology, and consequently education research and teacher training, more than a quarter century ago. Yet they somehow manage to convince journalists that they have had nothing to do with running our public schools lately, or with the simultaneous deterioration of US public school quality. Others, such as allegedly nefarious corporate interests and school-bashing politicians, must be at fault. So long as the education establishment can get away with this, playing their progressive education fiddle while our public schools burn and having the casualties blamed on others, they can maintain the conceit of continually wanting to fix things through degrading, incessant “innovation”. It’s cynical, but...
Nonpartisan Education Review, 2018
no abstract
A “leading practice” in the terms of the CCSSO and ATP is not a “practice” at all; it is a plan for practice. That is, it is not about behavior or action, it is about a plan for behavior or action. And even the character of the plan is left to the discretion of the local school or district. Any local school or district with a test security plan in its files can claim that it is following leading practices. As model test security plans are routinely provided by test developers as part of their contract, every local school or district can be a leading test security practitioner by default.
PsycEXTRA Dataset
no abstract

Common Core proponents have managed to convince most journalists, policymakers, and other opinion leaders that the Common Core standards are higher, deeper, tougher, more challenging, and more rigorous than their antecedents. This is, arguably, their greatest accomplishment. Ask those journalists, policymakers, and other opinion leaders to identify the aspects of the Common Core standards that make them superior, however, and one is likely to hear only more marketing doublespeak about "problem solving", "deeper learning", "critical thinking", or the like. Most supporters of the Common Core do not understand how the Common Core standards or tests might be better. They simply assume that they must be because they have been told so often that they are.

Large sums from private foundations and the U.S. Education Department have been employed to sell Common Core to the U.S. public. It is unfortunate that funds were not directed toward educating the public about how standards actually work to raise student academic achievement. Their two-part nature, comprising both content and performance, is most fundamental for such an understanding. The Common Core State Standards (CCSS) document itself comprises only the pretend-content part, listing topics in math and skills in English language arts that teachers should cover or develop over the course of a student's school career. By themselves, however, these and most other sets of content standards amount to little more than a plan. Indeed, absent any sort of monitoring or evaluation, teachers may feel free to ignore them. The second part of the structure, the performance standards, or the tests based on the content standards, is essential for standards to be effective. Performance standards tell us how well students master the content via letter grades, test scores, or other types of evaluative feedback.
Academic Questions, 2020
What happens to the research evidence in a scientific field when the professionals in that field do not like it?

SSRN Electronic Journal, 2020
The highly-praised and influential study, “Getting Tough?,” was published in 2001. Briefly, while controlling for a host of student, school, state, and educator background variables, the study regressed 1988-to-1992 student-level achievement score gains onto a dummy variable for the presence (or not) of a high school graduation test at the student’s school. The 1992-1988 difference in scores on the cognitive test embedded in a US Education Department longitudinal survey comprised the gain scores.

The study was praised for its methodology, controlling for multiple baseline variables which previous researchers allegedly had not, and by some opposed to high-stakes standardized testing for its finding of no achievement gains. Indeed, some characterized the work as so far superior in method that it justified dismissing all previous work on the topic. Moreover, the article was timely, its appearance in print coincident with congressional consideration of the No Child Left Behind Act (2001) and its new federal mandate requiring annual testing in seven grades and three subjects in all U.S. public schools. The article also served as the foundation for a string of ensuing studies nominally showing that graduation exams bore few benefits and outsized costs (e.g., in dropout rates). Graduation exam opponents would employ these critical studies as evidence to political effect. From a high of more than thirty states around the turn of the millennium, graduation tests are now administered in only seven or eight states.

The multivariate analysis in “Getting Tough?” should have had the advantage of authenticity: an analysis of a phenomenon studied in its actual context. But that should mean that the context is understood and specified in the analysis, not ignored as if it couldn’t matter. And it could have been understood and specified. Most of the relevant information left out of “Getting Tough?”, specific values for other factors that tend to affect test performance or student achievement, was available from the three contemporary surveys, and the rest could have been obtained from a more detailed evidence-gathering effort. The study could have been more insightful had it been done differently, perhaps with less emphasis on “more sophisticated” and “more rigorous” mathematical analysis, and more emphasis on understanding and specifying the context: how testing programs are organized, how tests are administered, the effective differences among the wide variety and forms of tests and how students respond differently to each, the legal context of testing in the late 1980s and early 1990s, and so on.
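To make the study's basic design concrete, here is a minimal sketch of a gain-score regression of this general kind; the variable names, control set, and synthetic data are illustrative assumptions, not the study's actual code or data.

```python
# Sketch of a "Getting Tough?"-style analysis: 1992-1988 score gains
# regressed on a graduation-exam dummy plus background controls.
# All data below are synthetic and for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "grad_exam": rng.integers(0, 2, n),       # 1 = student's school has a graduation test
    "ses": rng.normal(0.0, 1.0, n),           # socioeconomic status control (assumed)
    "score_1988": rng.normal(50.0, 10.0, n),  # baseline achievement score
})
# Synthetic gains constructed with no true graduation-exam effect
df["gain"] = 0.5 * df["ses"] + rng.normal(0.0, 5.0, n)

model = smf.ols("gain ~ grad_exam + ses + score_1988", data=df).fit()
print(model.params)  # the grad_exam coefficient estimates the exam "effect"
```

Phelps's critique, in these terms, is not that the regression is computed incorrectly, but that the dummy variable and control set omit most of what actually distinguishes one testing context from another.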
SSRN Electronic Journal, 2020

Imagine this scenario if you can: A new surgical technique has been developed by medical surgeons that, it is estimated, will provide health and longevity benefits to U.S. citizens on the scale of tens of billions of dollars. Over a thousand controlled studies have been conducted to date, and the aggregate results are overwhelmingly positive. Moreover, many of the studies, and their meta-analyses, have been conducted by some of the world’s most respected medical professors and surgeons. The U.S. Department of Health and Human Services, which would have to pay for the new surgical procedure if it were to be approved for reimbursement under Medicare, wishes to conduct one final evaluation of the efficacy of the new technique. So, they contract with the scientific “court of last resort”, the National Research Council (NRC), to evaluate it. The NRC agrees and, as usual, sets about recruiting experts to serve on a committee that will conduct the study and produce a final evaluative report. But, the...

Nonpartisan Education Review, 2015
Ironically, the same industry insider who warned me against revealing the contents of the revised Standards draft has himself publicly asserted its colossal social and legal impact on US society. Yet, he defends both the secrecy and insularity of the current drafting process. Power to draft the Standards as they see fit has been divvied up among the chosen few on the “Joint Committee”, a baker’s dozen of education professors and industry insiders. This is an extraordinarily small number of people to essentially be writing our country’s testing law. In the case of chapter 13, at most “2-3 persons” craft our country’s testing policy. Not only is this a tiny group in number, these particular persons represent a biased and extreme point of view. Read the revised Standards, though, and theirs is the only point of view you will be allowed to know. As they have for a few decades now, these folks arrogantly declare a cornucopia of contrary opinion and evidence nonexistent.
Nonpartisan Education Review, 2020
Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). My tenure coincided with Michelle Rhee’s last nine months as Chancellor. I departed shortly after Vincent Gray defeated Adrian Fenty in the September 2010 DC mayoral primary.

SSRN Electronic Journal, 2010
Ofqual seeks to determine the prevalence and character of measurement uncertainty reporting for high-stakes tests in the United States. The research questions might be phrased as: Is the reporting of measurement error (i.e., score imprecision) common or typical, or is it uncommon or atypical? And, if it is common or typical, how is it commonly or typically done?

We conducted Web searches (and followed up where needed with telephone calls) and contacted key researchers at relevant entities involved in reporting test results in the United States. We sought to learn: the prevalence among our sample respondents of the reporting of measurement uncertainty in high-stakes tests, and the degree of ease or difficulty with which ordinary citizens may access such information.

The degree of transparency with measurement uncertainty issues varies. Transparency seems to be greater for education than for licensure tests, for mostly objective than for mostly essay tests, for larger programs than for smaller programs, and, perhaps ironically, where test contractors play a greater role and state government a smaller one. With educational tests, many of the states highlight imprecision along with the student scores on the parent/student reports. (More states now are reporting score bands.) But all states prepare technical manuals, and just about all technical manuals are readily available to those who want them. With licensure exams, the situation is mixed. Some provide information about uncertainty on the candidate report itself, and more reliability information in a yearly technical document. Others make available various technical reports and papers summarizing reliability information. Still others produce reports with substantial detail that are not released to the public.

Is the totality of uncertainty reported to all stakeholders in U.S. educational and licensure testing programs? No. It would be difficult for the average parent to find a full range of measurement uncertainty statistics for their children’s tests, for example. But, then, the average parent would not be looking. And that is why technical manuals are not found front and center on the home page of testing program Web sites. Documents that better respond to the typical consumer’s needs are placed front and center, and the technical manuals are placed a few to several clicks behind. But they are not hidden. There seems not to be any effort to hide information; the level of dissemination appears to respond well to the demand for it.
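As background for readers unfamiliar with score bands: a band is typically built from the standard error of measurement (SEM). A minimal sketch follows; the reliability, scale, and score values are assumed for illustration, not figures drawn from the report.

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sd = 15.0           # score scale standard deviation (assumed)
reliability = 0.91  # reliability coefficient (assumed)
sem = sd * math.sqrt(1.0 - reliability)

observed = 104.0    # one student's observed score (assumed)
z = 1.96            # multiplier for an approximate 95% band
low, high = observed - z * sem, observed + z * sem
print(f"SEM = {sem:.2f}; 95% score band: {low:.1f} to {high:.1f}")
```

A parent/student report showing "104 (95 to 113)" rather than a bare "104" is the kind of uncertainty disclosure the study looked for.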

Education News, 2012
Ironically, the same industry insider who warned me against revealing the contents of the revised Standards draft has himself publicly asserted its colossal social and legal impact on US society. Yet, he defends both the secrecy and insularity of the current drafting process. Power to draft the Standards as they see fit has been divvied up among the chosen few on the “Joint Committee”, a baker’s dozen of education professors and industry insiders. This is an extraordinarily small number of people to essentially be writing our country’s testing law. In the case of chapter 13, at most “2-3 persons” craft our country’s testing policy. Not only is this a tiny group in number, these particular persons represent a biased and extreme point of view. Read the revised Standards, though, and theirs is the only point of view you will be allowed to know. As they have for a few decades now, these folks arrogantly declare a cornucopia of contrary opinion and evidence nonexistent.
Starting in the late 1980s, two teams of researchers, well known for their criticism of standardized tests on equity and validity grounds, began attacking standardized testing on efficiency grounds as well, using cost-benefit analysis to do it. Their analyses are reviewed, and their conclusions discussed. The first team, Lorrie...
This presentation chronicles a half-century of relatively successful efforts to suppress much of the research on the uses and beneficial effects of educational testing.
This presentation: chronicles, with several examples from literature reviews spanning the past 80 years, the parallel trends of a decline in the volume of research cited and a rise in the volume of research-dismissal claims; describes the methods used to dismiss and suppress research; and includes excerpts from invited testimony before US Congressional committees.
Only a skewed subset of the relevant research literature was consulted in crafting highly consequential US educational testing policies.
Harms caused by belief in the myth include:
• diverting attention from a widespread problem (at least in the US) of lax security in standardized test administration;
• encouraging ineffective and detrimental test preparation procedures (e.g., excessive drilling on format, learning “tricks” based on format in lieu of learning subject matter) and supporting an exploitive, predatory test preparation industry;
• encouraging teachers to teach to “a broader domain” (“away from the test”), that is, content different from the publicly mandated standards they are legally and ethically obligated to teach;
• encouraging numerous “wild goose chase” research studies using an unreliable low-stakes test score trend to “audit” a high-stakes test score trend;
• repeated declarations that a past (and contradictory) research literature does not exist; and
• justifying the use of value-added measures, calculated from student low-stakes test score trends, to judge teacher performance.
This presentation responds to these questions, recognizing that there is no single correct answer. An impressive body of research evidence will inform the talk; some of the most informative, from cognitive psychologists, is fairly recent. Topics will include cognitive load theory; the interplay between stakes and security, and stakes and motivation; retrieval, spacing, and other cognitive science concepts; the role of format (selected response, constructed response, authentic, etc.); and, more generally, the role of assessment in students’ intellectual development.
Three developments are chronicled:
first, the No Child Left Behind (NCLB) Act and federal imposition of an idiosyncratic and ineffectual testing program;
second, the “big bang” reorganization of the US education testing industry from a stable, cooperative oligopoly run by psychometricians to a commercially competitive free-for-all with more opportunist and customer-pleasing ambitions; and
third, the Common Core standards, which mandated homogeneous lower content standards onto the still-required NCLB testing structure.
Billions from the federal government and wealthy foundations have transformed many once-independent national education organizations into “cargo cult” dependents and promoters of the new order, intolerant of divergent points of view. The research and policy brain trust responsible comprised an alliance of convenience between two “citation cartels” of establishment and reform scholars and politicos, and an astonishingly cooperative and unskeptical group of journalists. It succeeded in focusing attention on their work, while diverting attention away from a much larger universe of others’ work (by ignoring, dismissing, or demeaning it) that included a century’s worth of mostly experimental scholarship in the fields of psychology and program evaluation.
References and appendices may be found online at https://nonpartisaneducation.org/MalfunctionAppendices.htm
In their technical communications, measurement specialists are generally positive about the worth of standardized testing. Meanwhile, those who engage in public debate, such as journalists and certain special interest groups, tend to be less scientifically informed and more negative about the value of testing. The contributors to this volume contend that most criticisms ignore readily accessible scientific evidence and have the unfortunate effect of discrediting the entire testing enterprise.
Standardized testing bears the twin burden of controversy and complexity and is difficult for many to understand either dispassionately or technically. In response to this reality, Richard P. Phelps and a team of well-noted measurement specialists present this book as a platform where they:
• describe the current state of public debate about testing across fields;
• explain and refute the primary criticisms of testing;
• acknowledge the limitations and undesirable consequences of testing;
• provide suggestions for improving testing practices; and
• present a vigorous defense of testing and a practical vision for its promise and future.
Those who are charged with translating the science of testing into public information and policy—including administrators, social scientists, test publishers, professors, and journalists who specialize in education and psychology—should find a wealth of usable information here with which to balance the debate.
Yet, when the results of these tests are not as good as one would like, there has been an increasing tendency by some to blame the test. In other words, if you don’t like the results, it may be easier to kill the messenger than to fix the underlying problem the test revealed.
The Association of American Publishers’ Test Committee created this brochure, an executive summary of Richard Phelps’s book Kill the Messenger, to help policymakers and the public better understand the growing debate about the use of standardized tests in our nation’s schools.
In today’s world of global transportation, communication, and trade, if a worker's skills are no better than those of poorly educated, low-paid workers in less-developed countries, that worker is likely to face tough economic pressure.
The purpose of this report is to provide a review of higher education systems in selected developed countries and to compare higher education in the United States and other countries.
• explore the OECD indicators methodology;
• establish a mechanism whereby participating countries could agree on how to make common policy concerns amenable to comparative quantitative assessment;
• seek agreement on a small but critical mass of indicators that genuinely indicate educational performance relative to policy objectives and measure the current state of education in an internationally valid, efficient and timely manner;
• review methods and data collection instruments in order to develop these indicators; and
• determine the directions for further developmental work and analysis beyond the initial set of indicators.
Since then, participating countries have contributed in many ways to conceptual and developmental work, have applied the data collection instruments and methodology at the national level in collaboration with the OECD and UNESCO, have co-operated in national, regional and international meetings of experts, and have worked jointly on the development of the indicators. Egypt, Morocco, Paraguay, Sri Lanka, Tunisia, Uruguay and Zimbabwe joined the programme during its second year.
This report provides an initial analysis of the data collected through this programme, bringing together data from the countries participating in the WEI programme with comparable data from OECD countries. Chapter 1 provides a brief profile for each country that highlights central government priorities in the development of education policy, identifies what the government perceives to be the major challenges facing the education system over the next decade, and explains reform efforts under way to meet these challenges. These profiles, which were contributed by participating countries, also provide the background for interpreting the international comparisons presented subsequently.
Chapters 2 and 3 analyse, within an international comparative framework, how countries have responded to rising demands for education and how effective they have been in mobilising the necessary resources. Chapter 2 starts with an examination of patterns of demand, then looks at progression and completion, and finally examines patterns of participation by type of school and programme. Chapter 3 analyses aggregate spending, examines priorities within education budgets (such as spending by level of education, private provision and services targeted to specific target populations), and finally looks at spending choices within the classroom (teachers’ salaries, teachers’ qualifications, hours of instruction and class size). The Annex provides the indicators underlying the analysis, the classification of national education programmes used for the comparisons and other relevant technical information.
This is the first report from the WEI programme. The indicators presented should not be considered final but have been, and continue to be, subject to a process of constant development, consolidation and refinement. Furthermore, while it has been possible to provide for comparisons in educational enrolment and spending patterns, comparative information on the quality of educational outcomes in WEI countries is only beginning to emerge. New comparative indicators will be needed in a wider range of educational domains in order to reflect the continuing shift in governmental and public concern, away from control over inputs and content towards a focus on educational outcomes.
International comparative assessments of achievement already figure prominently in national policy debates and in educational practice in WEI and OECD countries alike. To the extent that they can now be successfully integrated into the WEI programme during its next phase, they will be able to provide a new basis for policy dialogue and for collaboration in defining and operationalising educational goals – in ways that reflect judgements about the skills that are relevant to adult life.
They will provide an opportunity for WEI countries to identify and assess gaps in national curricula, and provide information for benchmarking, the setting of standards and evaluation. They will also convey insights into the range of factors which contribute to the development of knowledge and skills, and into the similarities and differences between the ways in which these factors operate in the various countries. Ultimately, they can help countries to bring about improvements in schooling and better preparation for young people as they enter an adult life of rapid change and increasing global interdependence.
Deliberations at the first education summit led to the subsequent adoption of the first six National Education Goals and the formation of the National Education Goals Panel. As some state governors themselves might say, it is significant that these products of the education summit bore the word "national" rather than "federal" in their titles. The meeting and its products were at once an assertion that education in the United States is a national concern, but still primarily a state and local responsibility.
A common education indicator called "Sources of funds for education" supports this contention. When revenues for public elementary and secondary education are traced to the original source of the funds, one finds that state governments contribute, on average, about the same percentage as local governments. Combined, state and local governments account for 93 percent of public education funding nationwide.
At the higher education level, state government's role is relatively even more substantial, contributing 37 percent of revenues, while federal and local governments contribute 11 and 4 percent, respectively. (The remainder comes from tuition and fees, endowments and other private contributions, and sales and services.)
Since the Charlottesville summit, Americans have seen continued activity on education policy between the separate branches and levels of government. The Goals Panel, for example, has included members from the Congress, the White House, the U.S. Department of Education, and the ranks of governors and state legislators. The Goals Panel continues to produce a report every year which measures our country's and each state's progress toward the Goals.
Early in 1996, forty-three of the nation's governors met in a second "education summit" in Palisades, New York, along with corporate chief executives from their states, and other invited guests. The meeting was sponsored by two organizations run by U.S. state governors—the Education Commission of the States and the National Governors’ Association—and the International Business Machines Corporation (IBM), which served as host. The second summit's governors agreed to develop and establish within two years internationally competitive standards, assessments to measure progress toward meeting them, and accountability systems.
By joining efforts with the Federal government in some of these activities over the past ten years, the governors have acknowledged that the Federal government has an important role to play in the collection and dissemination of some of the comparative data needed to manage the quality of American education.
In 1988, the U.S. Congress authorized the establishment of a Special Study Panel on Education Indicators for the U.S. Department of Education's National Center for Education Statistics (NCES). This panel was chartered in July 1989 and directed to prepare a report, published in 1991, Education Counts: An Indicator System to Monitor the Nation's Educational Health. The Panel's report recommended a variety of ways in which NCES should increase its collection and presentation of indicator data. Among the many recommendations, the report urged NCES to: strengthen its national role in data collection and provide technical assistance to the states; improve its capacity to collect international data; and develop a "mixed model" of indicators — international and national indicators, state and local indicators, and a subset of indicators held in common.
Two of NCES's primary indicator projects are The Condition of Education and the National Assessment of Educational Progress (NAEP). The Condition is an annual compendium of statistical information on American education, including trends over time, international comparisons, and some comparisons among various groups (by sex, ethnicity, socioeconomic status, and others). However, the Condition contains very few state-by-state comparisons.
The National Assessment of Educational Progress (NAEP) is a congressionally mandated assessment of the academic achievement of American students. Begun in the late 1960s, NAEP has been reporting assessment results state-by-state, on a trial basis, only since 1990. In that year, 37 states, the District of Columbia, and two territories participated in a trial state assessment program in eighth-grade mathematics. In the 1992 fourth-grade reading and mathematics and eighth-grade mathematics trial state assessments, voluntary participation increased to 41 states, the District of Columbia, and two territories. The same number of jurisdictions participated in the 1994 Trial State Assessment of fourth-grade reading. Forty-three states participated in the 1996 Trial State Assessment of fourth- and eighth-grade mathematics.
NCES's Digest of Education Statistics is, perhaps, the most comprehensive source of education statistics in the United States. Published annually or biennially since 1962, it provides national and state statistics for all levels of American public and private education. Using both government and private sources, with particular emphasis upon surveys and projects conducted by NCES, the publication reports on the number of education institutions, teachers, enrollments, and graduates; educational attainment; finances; government funding; and outcomes of education. Background information on population trends, public attitudes toward education, education characteristics of the labor force, government finances, and economic trends is also presented. Most of the data is presented in over 400 tables, but some graphics are also included. Many of the tables contain state-by-state data.
For some time, NCES has also compiled similar volumes of education statistics focused on the U.S. states. These publications, two volumes of Historical Trends: State Education Facts and one volume of State Projections for Public Elementary and Secondary Enrollment, Graduates, and Teachers were compiled every few years, largely in order to present historical trends or future projections in state education statistics.
An NCES state indicator report published a year ago, State Comparisons of Education Statistics: 1969–70 to 1993–94, expanded on these earlier efforts with much new material, aggregated at the state level for the first time. But State Comparisons also presents time series of NCES's most frequently requested state-level statistics. About thirty graphics (bar charts and maps) and a considerable amount of explanatory text are also included.
This volume, State Indicators in Education 1997, is a logical extension of these earlier efforts. This report does not attempt, however, to include the total volume of data that the Digest or State Comparisons presents, mostly in tabular form. Rather, the emphasis in this report veers toward explaining and presenting certain patterns and relationships in the data. While there are fewer data, there is more text and there are more graphics. State Indicators in Education, then, is perhaps more like a state-level version of NCES's indicator report, The Condition of Education, and less like a state-level version of NCES's comprehensive data volume, the Digest of Education Statistics.
Education Indicators: An International Perspective expands on the traditional interest in student achievement and education finance by including a broad range of indicators, such as Gender differences in earnings, Time spent on homework, and Home and school language, among others. The indicators focus primarily upon comparisons between the United States and other industrialized nations with large economies, particularly those that most closely resemble the United States in terms of size and are viewed as our major economic competitors.
Among a multitude of sources used in this report, the most comprehensive is Education at a Glance (1995), the international education indicators report produced by the Organization for Economic Cooperation and Development (OECD). Other data sources include the International Assessment of Educational Progress, the International Association for the Evaluation of Educational Achievement, and the International Assessment of Adult Literacy.
The importance of Education Indicators: An International Perspective lies in its ability to provide a comprehensive selection of international indicators geared toward a U.S. audience. This particular set of indicators is presented together for the first time and much of the data are derived from sources not readily accessible to U.S. readers. The publication, then, contributes to the continuing effort to make comparative information accessible and useful to U.S. leaders.
Education in States and Nations reflects two realities: increasing globalization and the centrality of the states in American education. In Education in States and Nations, indicators provide international benchmarks for assessing the condition of education in the U.S. states and in the United States as a whole by comparison with many other industrialized countries for which data are available. On six sets of education indicators (background; participation; processes and institutions; achievement and attainment; labor market outcomes; and finance), country-level and state-level measures are arrayed side-by-side in order to facilitate that comparison.
The country-level data come from a variety of sources, but two sources are most prominent: the second edition of the Organization for Economic Co-operation and Development's (OECD) international education indicators report, Education at a Glance; and the International Assessment of Educational Progress, which administered a mathematics test to 13-year-olds in about 20 countries and surveyed them and their school administrators about various aspects of the education process. The indicators in Education in States and Nations correspond to as many of the international indicators as state-level data were both applicable and available for.
This report is the second effort of its kind; the first edition, produced in 1993, was based on state and country data from the late 1980s. This edition, using data primarily from the early 1990s, is much larger than its predecessor. This reflects both a greater availability of suitable international indicators and state-level data and a greater effort to find relevant indicators, both domestic and international.
Compare it to the entire breadth of the research literature on testing, however, and it makes no sense at all.
1. Instead of referencing a wide range of relevant research, Fordham references only friends from inside their echo chamber and others paid by the Common Core’s wealthy benefactors. Yet they imply that they have covered a relevant and adequately wide range of sources.
2. Instead of evaluating tests according to the industry standard Standards for Educational and Psychological Testing, or any of dozens of other freely-available and well-vetted test evaluation standards, guidelines, or protocols used around the world by testing experts, they employ “a brand new methodology” specifically developed for Common Core, for the owners of the Common Core, and paid for by Common Core’s funders.
3. Instead of suggesting as fact only that which has been rigorously evaluated and accepted as fact by skeptics, the authors continue the practice of Common Core salespeople of attributing benefits to their tests for which no evidence exists.
4. Instead of addressing any of the many sincere, profound critiques of their work, as confident and responsible researchers would do, the Fordham authors tell their critics to go away—“If you don’t care for the standards…you should probably ignore this study” (p. 4).
5. Instead of writing in neutral language as real researchers do, the authors adopt the practice of coloring their language as so many Common Core salespeople do, attaching nice-sounding adjectives and adverbs to what serves their interest, and bad-sounding words to what does not.
The stated mandates of these organizations are to objectively review all the research available; instead they promote their own and declare most of the rest nonexistent. They are mandated to serve the public interest; instead they serve their own.
Currently, too few people have too much influence over those who control the education research purse strings. And, those who control the purse strings have too much influence over policy decisions. Until folk at the Bill and Melinda Gates Foundation and the US Education Department—to mention just a couple of consistent funders of education policy debacles—broaden their networks, expand their reading lists, and open their minds to more intellectual diversity, they will continue to produce education policy failure.
It would help if they would fund a wider pool of education researchers, evidence, and information. In recent years, they have, instead, encouraged the converse—funding a saturating dissemination of a narrow pool of information—thereby contributing to US education policy’s number 1 problem: pervasive misinformation.
Scores on high-stakes tests—tests that have serious consequences for students or teachers—often become severely inflated. That is, gains in scores on these tests are often far larger than true gains in students’ learning. Worse, this inflation is highly variable and unpredictable, so one cannot tell which school’s scores are inflated and which are legitimate. (p. 131)
Thus, Koretz, a long-time associate of the federally funded Center for Research on Evaluation, Standards, & Student Testing (CRESST), provides the many educators predisposed to dislike high-stakes tests anyway a seemingly scientific (and seemingly not self-serving or ideological) argument for opposing them. Meanwhile, he provides policymakers a conundrum: if scores on high-stakes tests improve, likely they are meaningless—leaving them no external measure for school improvement. So they might just as well do nothing as bother doing anything.
Measuring Up supports this theory by ridiculing straw men, declaring a pittance of flawed supporting evidence sufficient (pp. 11, 59, 63, 132, and chapter 10) and a superabundance of contrary evidence nonexistent, and mostly by repeatedly insisting that he is right. (See, for example, chapter 1, pp. 131–133, and pp. 231–236.) He also shows little patience for those who choose to disagree with him. They want “simple answers,” speak “nonsense,” assert “hogwash,” employ “logical sleight[s] of hand,” write “polemics,” or are “social scientists who ought to know better.”
• The author often misuses education statistics.
• He charges two of the world’s most expert and responsible statistical agencies—the U.S. Census Bureau and the National Center for Education Statistics—with incompetence, neglect, and willfully misleading the public, without making any effort to learn their side of the story.
• His proposed solutions are illogical: he advocates increasing rigor for students who are unable to meet current standards, and at the same time he shames schools for course repetition and grade retention. The inevitable result will be lower, not higher, standards.
There is no single best method for calculating graduation rates or completion ratios. There are several, each of them valid and useful in different contexts. Ironically, Wise proves this point himself by (unknowingly) employing various, and sometimes quite different, graduation measures throughout his book. Only the semantics are constant in Raising the Grade: each quite different measure is consistently identified as the graduation rate.
This hefty mass represents an enormous expenditure of time, money, and effort to, essentially, get it all wrong.
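To illustrate why several measures can each be valid yet disagree, here is a minimal sketch of two common graduation measures computed from the same hypothetical state; all figures below are invented for illustration and are not drawn from the book.

```python
# Two legitimate graduation measures that can yield quite different
# numbers for the same state in the same year (illustrative figures only).
diplomas = 90_000               # regular diplomas awarded this year (assumed)
ninth_grade_4yr_ago = 120_000   # 9th-grade enrollment four years earlier (assumed)
population_grad_age = 110_000   # resident population of typical graduation age (assumed)

cohort_rate = diplomas / ninth_grade_4yr_ago      # cohort-style graduation rate
completion_ratio = diplomas / population_grad_age # completion ratio against age cohort

print(f"cohort graduation rate: {cohort_rate:.1%}")      # 75.0%
print(f"completion ratio:       {completion_ratio:.1%}") # 81.8%
```

Both numbers are defensible; the error is calling whichever one happens to be at hand "the graduation rate" without qualification.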
With the REAFISO project, the OECD has taken sides, but appears to have done so in a cowardly manner. REAFISO staff have not described evidence and sources on multiple sides of topics, weighed them in the balance, and then justified their preference. Rather, on each controversial topic they broach, they present only one side of the story. On some topics, huge research literatures, several hundred studies large, are completely ignored.
I believe that REAFISO got caught in a one-way trap or, as others might term it: a bubble, echo chamber, infinite (feedback) loop, or myopia. They began their study with the work of celebrity researchers—dismissive reviewers—researchers who ignore (or declare nonexistent) those researchers and that research that contradicts their own (Phelps, 2012a)—and never found their way out. Dismissive reviewers blow bubbles, construct echo chambers, and program infinite loops by acknowledging only that research and those researchers they like or agree with.
. . . extend and add value to the existing body of international work on evaluation and assessment policies. (p. 5)

Synthesise research-based evidence on the impact of evaluation and assessment strategies and disseminate this knowledge among countries. Identify policy options for policy makers to consider. (p. 4)
. . . take stock of the existing knowledge base within the OECD and member countries as well as academic research on the relationship between assessment and evaluation procedures and performance of students, teachers and schools. It will look at the quantitative and qualitative evidence available on the different approaches used to evaluate and assess educational practice and performance. (p. 16)
To the contrary, REAFISO has not synthesized the existing body of research-based evidence on evaluation and assessment policies, much less extended it. By telling the world about only a small proportion of that evidence, REAFISO has instead hidden from the world most of the useful and relevant information (or implied that it is not worth considering).
The ordinary Citizen Joe knows that one shouldn’t trust everything one finds on the Internet, nor assume that Internet search engines rank documents according to their accuracy. So naturally, scholarly researchers who are trained to be skeptical, systematic, thorough, aware of biases, and facile with statistical sampling methods would be too. After all, scholarly researchers have spent several more years in school, often prestigious schools. They should “know how to know” as well or better than the average citizen.
Yet REAFISO’s reviews repeatedly offer one or a few examples of research from their favored sources to summarize topics, even though thorough reviews of dozens, hundreds, or thousands of sources were to be found had they simply looked widely enough. In some cases, REAFISO writers base a policy recommendation on one or a few studies, when a reading of the whole of the research literature on the topic would suggest exactly the opposite policy.
In its document, Evaluation and Assessment Frameworks for Improving School Outcomes: Common Policy Challenges (2011), written two years after the Design and Implementation Plan, REAFISO claims to have completed “a thorough analysis of the evidence on evaluation and assessment.”
One person claims that the Committee was deliberately set up to be a hostile committee. I think the odds are strong that that claim is correct.
Another person claims that the Committee considered only one personnel testing study from among hundreds in existence, yet made claims that implied they had considered all of them. I believe this assertion is also true.
The third claims that the Committee refused to consider some of the most basic and relevant evidence pertaining to personnel testing issues, such as: the ways in which the Hunter and Schmidt estimates of utility underestimated the benefits of testing; the true magnitude of the effect of range restriction on the utility estimates (for which the Committee refused to correct); the true value of average interrater reliability of ratings of .50 (they assumed .80, thus undercorrecting for criterion unreliability); and (pertaining to the NRC assertion that Hunter and Schmidt did not adjust their estimates for the time value of money, incremental validity, or what have you) the substantial research in personnel psychology that has explicitly considered all those issues (and found little difference in the direction or magnitude of the resulting utility estimates).
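The interrater reliability point can be made concrete with the standard psychometric correction for attenuation due to criterion unreliability: corrected validity = observed validity / sqrt(criterion reliability). A minimal sketch follows; the observed correlation is an assumed figure for illustration, not a value from the Committee's report.

```python
import math

# Correction for attenuation due to criterion unreliability:
# true validity = observed validity / sqrt(criterion reliability)
r_observed = 0.25  # observed test-criterion correlation (assumed for illustration)

# Interrater reliability of ratings: the value the Committee assumed (.80)
# vs. the research-literature value cited in the critique (.50)
for r_yy in (0.80, 0.50):
    r_corrected = r_observed / math.sqrt(r_yy)
    print(f"criterion reliability {r_yy:.2f} -> corrected validity {r_corrected:.3f}")
```

Because the correction divides by the square root of the reliability, assuming .80 when the true value is nearer .50 yields a smaller corrected validity, and thus a smaller utility estimate, which is the undercorrection the critique describes.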
This is a serious charge, that those at the National Research Council responsible for the evaluation of testing issues were (and remain) biased. Yet, I believe it to be true, and I believe that any fair-minded person who looked at the evidence would agree.
The National Research Council is supposed to represent the pinnacle of objectivity, the "court of last resort" on controversial research issues. Alas, I believe, it represents neither on testing issues. It seems biased—biased in conformity with an "education establishment" perspective.
High achieving students who cannot leave a school district where academic achievement is undervalued face varied pressures that impede them: pressure to fit in and be popular; to excel at sports; to work at low-pay, dead-end jobs to earn money for cars and parties; and so on. If they study hard and excel at academics, they will be taunted; disliked; called “nerd,” “geek,” “dork”; or be accused of “acting White.”
The report spends considerable effort worrying about the feelings of students who might fail high-stakes tests, but little if any effort worrying about the social fallout of abandoning high academic standards. High-achieving students among our poor should be considered our country’s most precious human resources.
For a variety of reasons, our society very badly needs these students to prosper; so their gifts and ambitions should be nurtured, not discouraged. The report, however, in the effect of its recommendations, would have these students treated as pariahs and have them feel guilty for wanting to work hard and succeed. After all, if these students work hard and succeed, won’t that make other students who do not want to work hard look and feel bad?
Abandoning the enforcement of high academic standards will not eliminate pressures and hurt feelings among our youth, however. Pressure and hurt feelings are facts of life. Abandoning academics just means the pressures will come from and the hurt feelings will be caused by nonacademic aspects of these students’ lives.
Is that really what we want? In radical egalitarian bliss, there will be no high-stakes tests, no academic standards enforced in any meaningful way, and no academic tracking. Academic progress in every school and for every student will be slowed to the preferred pace of the least motivated student.