Papers by Richard P Phelps
Presentation before the Governor's Council on Common Core Review, State Capitol, Little Rock, May 2015.
- Ordinary citizens seem to have more leverage at the state level.
- US public debate on education testing is now totally one-sided.
- What do most of our successful international competitors do? Multi-level, multi-target “grade span” high-stakes testing.
- Effect of testing with stakes? Roughly two grade levels of increased achievement.
SSRN Electronic Journal, 2017
Social Science Research Network, 2013
Whereas innovation is a holy commandment for the US education professoriate, critics charge that it leads to a continuous cycle of fad after fad after fad. After all, if innovation is always good, then any program that has been around for a while must be bad, no matter how successful it might be in improving student achievement. Moreover, if the pace of today’s-innovation-replacing-yesterday’s-innovation proceeds fast enough, evaluation reports are finished well after one program has been replaced by another, become irrelevant before they are published, and end up unread. Ultimately, in a rapidly innovating environment, we learn nothing about what works. Some critics of the radical constructivists suspect that that chaotic, swirling maelstrom may be their desired equilibrium state.
Nonpartisan Education Review, 2016
Weekly from Madison, Wisconsin, Jim Zellmer emails a selection of links to fifty or so education-related news stories, essays, blog posts, and other relevant sources. It's my favorite, and most edifying, source of education information. I wanted to learn more about Mr. Zellmer and his web site schoolinfosystem.org and so requested an interview. Here it is.

The OECD’s Review on Evaluation and Assessment Frameworks for Improving School Outcomes (REAFISO) relies on staff generalists and itinerant workers to compose its most essential reports. I suspect that the REAFISO writers started out unknowing, trusted the research work they found most easily, and followed in the direction those researchers pointed them. Ultimately, they relied on the most easily and inexpensively gathered document sources. I believe that REAFISO got caught in a one-way trap or, as others might term it: a bubble, echo chamber, infinite (feedback) loop, or myopia. They began their study with the work of celebrity researchers, dismissive reviewers who ignore (or declare nonexistent) those researchers and that research that contradicts their own (Phelps, 2012a), and never found their way out. Dismissive reviewers blow bubbles, construct echo chambers, and program infinite loops by acknowledging only that research and those researchers they like or agree with.

There are two education establishments. On one side are the stand-pat public-school vested interests that resist any encroachment on their power and control but, ironically, often portray themselves as innovative and democratic. They consolidated control over education school hiring and ideology, and consequently education research and teacher training, more than a quarter century ago. Yet they somehow manage to convince journalists that they have had nothing to do with running our public schools lately, or with the simultaneous deterioration of US public school quality. Others, such as allegedly nefarious corporate interests and school-bashing politicians, must be at fault. So long as the education establishment can get away with this, playing their progressive education fiddle while our public schools burn and having the casualties blamed on others, they can maintain the conceit of continually wanting to fix things through degrading, incessant “innovation”. It’s cynical, but...
Nonpartisan Education Review, 2018
no abstract
A “leading practice” in the terms of the CCSSO and ATP is not a “practice” at all; it is a plan for practice. That is, it is not about behavior or action, it is about a plan for behavior or action. And even the character of the plan is left to the discretion of the local school or district. Any local school or district with a test security plan in its files can claim that it is following leading practices. As model test security plans are routinely provided by test developers as part of their contract, every local school or district can be a leading test security practitioner by default.
PsycEXTRA Dataset
no abstract

Common Core proponents have managed to convince most journalists, policymakers, and other opinion leaders that the Common Core standards are higher, deeper, tougher, more challenging, and more rigorous than their antecedents. This is, arguably, their greatest accomplishment. Ask those journalists, policymakers, and other opinion leaders to identify the aspects of the Common Core standards that make them superior, however, and one is likely to hear only more marketing doublespeak about "problem solving", "deeper learning", "critical thinking", or the like. Most supporters of the Common Core do not understand how the Common Core standards or tests might be better. They simply assume that they must be because they have been told so often that they are.

Large sums from private foundations and the U.S. Education Department have been employed to sell Common Core to the U.S. public. It is unfortunate that funds were not directed toward educating the public about how standards actually work to raise student academic achievement. Their two-part nature, comprising both content and performance, is most fundamental for such an understanding. The Common Core State Standards (CCSS) document itself comprises only the pretend-content part, listing topics in math and skills in English language arts that teachers should cover or develop over the course of a student's school career. By themselves, however, these and most other sets of content standards amount to little more than a plan. Indeed, absent any sort of monitoring or evaluation, teachers may feel free to ignore them. The second part of the structure, the performance standards, or the tests based on the content standards, is essential for standards to be effective. Performance standards tell us how well students master the content via letter grades, test scores, or other types of evaluative feedback.
Academic Questions, 2020
What happens to the research evidence in a scientific field when the professionals in that field do not like it?

SSRN Electronic Journal, 2020
The highly-praised and influential study, “Getting Tough?,” was published in 2001. Briefly, while controlling for a host of student, school, state, and educator background variables, the study regressed 1988-to-1992 student-level achievement score gains onto a dummy variable for the presence (or not) of a high school graduation test at the student’s school. The 1992-1988 difference in scores on the cognitive test embedded in a US Education Department longitudinal survey comprised the gain scores.

The study was praised for its methodology, controlling for multiple baseline variables which previous researchers allegedly had not, and by some opposed to high-stakes standardized testing for its finding of no achievement gains. Indeed, some characterized the work as so far superior in method that it justified dismissing all previous work on the topic. Moreover, the article was timely, its appearance in print coincident with congressional consideration of the No Child Left Behind Act (2001) and its new federal mandate requiring annual testing in seven grades and three subjects in all U.S. public schools. The article also served as the foundation for a string of ensuing studies nominally showing that graduation exams bore few benefits and outsized costs (e.g., in dropout rates). Graduation exam opponents would employ these critical studies as evidence to political effect. From a high of more than thirty states around the turn of the millennium, graduation tests are now administered in only seven or eight states.

The multivariate analysis in “Getting Tough?” should have had the advantage of authenticity: an analysis of a phenomenon studied in its actual context. But that should mean that the context is understood and specified in the analysis, not ignored as if it couldn’t matter. And it could have been understood and specified. Most of the relevant information left out of “Getting Tough?”, specific values for other factors that tend to affect test performance or student achievement, was available from the three contemporary surveys, and the rest could have been obtained from a more detailed evidence-gathering effort. The study could have been more insightful had it been done differently, perhaps with less emphasis on “more sophisticated” and “more rigorous” mathematical analysis, and more emphasis on understanding and specifying the context: how testing programs are organized, how tests are administered, the effective differences among the wide variety and forms of tests and how students respond differently to each, the legal context of testing in the late 1980s and early 1990s, and so on.
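To make the study's basic design concrete, here is a minimal sketch of a gain-score regression of this general kind; the variable names, control set, and synthetic data are illustrative assumptions, not the study's actual code or data.

```python
# Sketch of a "Getting Tough?"-style analysis: 1992-1988 score gains
# regressed on a graduation-exam dummy plus background controls.
# All data below are synthetic and for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "grad_exam": rng.integers(0, 2, n),       # 1 = student's school has a graduation test
    "ses": rng.normal(0.0, 1.0, n),           # socioeconomic status control (assumed)
    "score_1988": rng.normal(50.0, 10.0, n),  # baseline achievement score
})
# Synthetic gains constructed with no true graduation-exam effect
df["gain"] = 0.5 * df["ses"] + rng.normal(0.0, 5.0, n)

model = smf.ols("gain ~ grad_exam + ses + score_1988", data=df).fit()
print(model.params)  # the grad_exam coefficient estimates the exam "effect"
```

Phelps's critique, in these terms, is not that the regression is computed incorrectly, but that the dummy variable and control set omit most of what actually distinguishes one testing context from another.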
SSRN Electronic Journal, 2020

Imagine this scenario if you can: A new surgical technique has been developed by medical surgeons that, it is estimated, will provide health and longevity benefits to U.S. citizens on the scale of tens of billions of dollars. Over a thousand controlled studies have been conducted to date, and the aggregate results are overwhelmingly positive. Moreover, many of the studies, and their meta-analyses, have been conducted by some of the world’s most respected medical professors and surgeons. The U.S. Department of Health and Human Services, which would have to pay for the new surgical procedure if it were to be approved for reimbursement under Medicare, wishes to conduct one final evaluation of the efficacy of the new technique. So, they contract with the scientific “court of last resort”, the National Research Council (NRC), to evaluate it. The NRC agrees and, as usual, sets about recruiting experts to serve on a committee that will conduct the study and produce a final evaluative report. But, the...

Nonpartisan Education Review, 2015
Ironically, the same industry insider who warned me against revealing the contents of the revised Standards draft has himself publicly asserted its colossal social and legal impact on US society. Yet, he defends both the secrecy and insularity of the current drafting process. Power to draft the Standards as they see fit has been divvied up among the chosen few on the “Joint Committee”, a baker’s dozen of education professors and industry insiders. This is an extraordinarily small number of people to essentially be writing our country’s testing law. In the case of chapter 13, at most “2-3 persons” craft our country’s testing policy. Not only is this a tiny group in number, these particular persons represent a biased and extreme point of view. Read the revised Standards, though, and theirs is the only point of view you will be allowed to know. As they have for a few decades now, these folks arrogantly declare a cornucopia of contrary opinion and evidence nonexistent.
Nonpartisan Education Review, 2020
Ten years ago, I worked as the Director of Assessments for the District of Columbia Public Schools (DCPS). My tenure coincided with Michelle Rhee’s last nine months as Chancellor. I departed shortly after Vincent Gray defeated Adrian Fenty in the September 2010 DC mayoral primary.

SSRN Electronic Journal, 2010
Ofqual seeks to determine the prevalence and character of measurement uncertainty reporting for high-stakes tests in the United States. The research questions might be phrased as: Is the reporting of measurement error (i.e., score imprecision) common or typical, or is it uncommon or atypical? And, if it is common or typical, how is it commonly or typically done?

We conducted Web searches (and followed up where needed with telephone calls) and contacted key researchers at relevant entities involved in reporting test results in the United States. We sought to learn: the prevalence among our sample respondents of the reporting of measurement uncertainty in high-stakes tests, and the degree of ease or difficulty with which ordinary citizens may access such information.

The degree of transparency with measurement uncertainty issues varies. Transparency seems to be greater for education than for licensure tests, for mostly objective than for mostly essay tests, for larger programs than for smaller programs, and, perhaps ironically, where test contractors play a greater role and state government a smaller one. With educational tests, many of the states highlight imprecision along with the student scores on the parent/student reports. (More states now are reporting score bands.) But all states prepare technical manuals, and just about all technical manuals are readily available to those who want them. With licensure exams, the situation is mixed. Some provide information about uncertainty on the candidate report itself, and more reliability information in a yearly technical document. Others make available various technical reports and papers summarizing reliability information. Still others produce reports with substantial detail that are not released to the public.

Is the totality of uncertainty reported to all stakeholders in U.S. educational and licensure testing programs? No. It would be difficult for the average parent to find a full range of measurement uncertainty statistics for their children’s tests, for example. But, then, the average parent would not be looking. And that is why technical manuals are not found front and center on the home page of testing program Web sites. Documents that better respond to the typical consumer’s needs are placed front and center, and the technical manuals are placed a few to several clicks behind. But they are not hidden. There seems not to be any effort to hide information; the level of dissemination appears to respond well to the demand for it.
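As background for readers unfamiliar with score bands: a band is typically built from the standard error of measurement (SEM). A minimal sketch follows; the reliability, scale, and score values are assumed for illustration, not figures drawn from the report.

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sd = 15.0           # score scale standard deviation (assumed)
reliability = 0.91  # reliability coefficient (assumed)
sem = sd * math.sqrt(1.0 - reliability)

observed = 104.0    # one student's observed score (assumed)
z = 1.96            # multiplier for an approximate 95% band
low, high = observed - z * sem, observed + z * sem
print(f"SEM = {sem:.2f}; 95% score band: {low:.1f} to {high:.1f}")
```

A parent/student report showing "104 (95 to 113)" rather than a bare "104" is the kind of uncertainty disclosure the study looked for.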

Education News, 2012
Ironically, the same industry insider who warned me against revealing the contents of the revised Standards draft has himself publicly asserted its colossal social and legal impact on US society. Yet, he defends both the secrecy and insularity of the current drafting process. Power to draft the Standards as they see fit has been divvied up among the chosen few on the “Joint Committee”, a baker’s dozen of education professors and industry insiders. This is an extraordinarily small number of people to essentially be writing our country’s testing law. In the case of chapter 13, at most “2-3 persons” craft our country’s testing policy. Not only is this a tiny group in number, these particular persons represent a biased and extreme point of view. Read the revised Standards, though, and theirs is the only point of view you will be allowed to know. As they have for a few decades now, these folks arrogantly declare a cornucopia of contrary opinion and evidence nonexistent.
Starting in the late 1980s, two teams of researchers, well known for their criticism of standardized tests on equity and validity grounds, began attacking standardized testing on efficiency grounds as well, using cost-benefit analysis to do it. Their analyses are reviewed, and their conclusions discussed. The first team, Lorrie...
This presentation chronicles a half-century of relatively successful efforts to suppress much of the research on the uses and beneficial effects of educational testing.
This presentation: chronicles, with several examples from literature reviews spanning the past 80 years, the parallel trends of a decline in the volume of research cited and a rise in the volume of research-dismissal claims; describes the methods used to dismiss and suppress research; and includes excerpts from invited testimony before US Congressional committees.
Only a skewed subset of the relevant research literature was consulted in crafting highly consequential US educational testing policies.
Harms caused by belief in the myth include:
• diverting attention from a widespread problem (at least in the US) of lax security in standardized test administration;
• encouraging ineffective and detrimental test preparation procedures (e.g., excessive drilling on format, learning “tricks” based on format in lieu of learning subject matter) and supporting an exploitive, predatory test preparation industry;
• encouraging teachers to teach to “a broader domain” (“away from the test”), that is, content different from the publicly mandated standards they are legally and ethically obligated to teach;
• encouraging numerous “wild goose chase” research studies using an unreliable low-stakes test score trend to “audit” a high-stakes test score trend;
• repeated declarations that a past (and contradictory) research literature does not exist; and
• justifying the use of value-added measures, calculated from student low-stakes test score trends, to judge teacher performance.
This presentation responds to these questions, recognizing that there is no single correct answer. An impressive body of research evidence will inform the talk; some of the most informative, from cognitive psychologists, is fairly recent. Topics will include cognitive load theory; the interplay between stakes and security, and stakes and motivation; retrieval, spacing, and other cognitive science concepts; the role of format (selected response, constructed response, authentic, etc.); and, more generally, the role of assessment in students’ intellectual development.
Three developments are chronicled:
first, the No Child Left Behind (NCLB) Act and federal imposition of an idiosyncratic and ineffectual testing program;
second, the “big bang” reorganization of the US education testing industry from a stable, cooperative oligopoly run by psychometricians to a commercially competitive free-for-all with more opportunist and customer-pleasing ambitions; and
third, the Common Core standards, which mandated homogeneous lower content standards onto the still-required NCLB testing structure.
Billions from the federal government and wealthy foundations have transformed many once-independent national education organizations into “cargo cult” dependents and promoters of the new order, intolerant of divergent points of view. The research and policy brain trust responsible comprised an alliance of convenience between two “citation cartels” of establishment and reform scholars and politicos, and an astonishingly cooperative and unskeptical group of journalists. It succeeded in focusing attention on their work, while diverting attention away from a much larger universe of others’ work (by ignoring, dismissing, or demeaning it) that included a century’s worth of mostly experimental scholarship in the fields of psychology and program evaluation.
References and appendices may be found online at https://nonpartisaneducation.org/MalfunctionAppendices.htm
In their technical communications, measurement specialists are generally positive about the worth of standardized testing. Meanwhile, those who engage in public debate, such as journalists and certain special interest groups, tend to be less scientifically informed and more negative about the value of testing. The contributors to this volume contend that most criticisms ignore readily accessible scientific evidence and have the unfortunate effect of discrediting the entire testing enterprise.
Standardized testing bears the twin burden of controversy and complexity and is difficult for many to understand either dispassionately or technically. In response to this reality, Richard P. Phelps and a team of well-noted measurement specialists present this book as a platform where they:
• describe the current state of public debate about testing across fields;
• explain and refute the primary criticisms of testing;
• acknowledge the limitations and undesirable consequences of testing;
• provide suggestions for improving testing practices; and
• present a vigorous defense of testing and a practical vision for its promise and future.
Those who are charged with translating the science of testing into public information and policy—including administrators, social scientists, test publishers, professors, and journalists who specialize in education and psychology—should find a wealth of usable information here with which to balance the debate.
Yet, when the results of these tests are not as good as one would like, there has been an increasing tendency by some to blame the test. In other words, if you don’t like the results, it may be easier to kill the messenger than to fix the underlying problem the test revealed.
The Association of American Publishers’ Test Committee created this brochure, an executive summary of Richard Phelps’s book Kill the Messenger, to help policymakers and the public better understand the growing debate about the use of standardized tests in our nation’s schools.
In today’s world of global transportation, communication, and trade, if a worker's skills are no better than those of poorly educated, low-paid workers in less-developed countries, that worker is likely to face tough economic pressure.
The purpose of this report is to provide a review of higher education systems in selected developed countries and to compare higher education in the United States and other countries.
• explore the OECD indicators methodology;
• establish a mechanism whereby participating countries could agree on how to make common policy concerns amenable to comparative quantitative assessment;
• seek agreement on a small but critical mass of indicators that genuinely indicate educational performance relative to policy objectives and measure the current state of education in an internationally valid, efficient and timely manner;
• review methods and data collection instruments in order to develop these indicators; and
• determine the directions for further developmental work and analysis beyond the initial set of indicators.
Since then, participating countries have contributed in many ways to conceptual and developmental work, have applied the data collection instruments and methodology at the national level in collaboration with the OECD and UNESCO, have co-operated in national, regional and international meetings of experts, and have worked jointly on the development of the indicators. Egypt, Morocco, Paraguay, Sri Lanka, Tunisia, Uruguay and Zimbabwe joined the programme during its second year.
This report provides an initial analysis of the data collected through this programme, bringing together data from the countries participating in the WEI programme with comparable data from OECD countries. Chapter 1 provides a brief profile for each country that highlights central government priorities in the development of education policy, identifies what the government perceives to be the major challenges facing the education system over the next decade, and explains reform efforts under way to meet these challenges. These profiles, which were contributed by participating countries, also provide the background for interpreting the international comparisons presented subsequently.
Chapters 2 and 3 analyse, within an international comparative framework, how countries have responded to rising demands for education and how effective they have been in mobilising the necessary resources. Chapter 2 starts with an examination of patterns of demand, then looks at progression and completion, and finally examines patterns of participation by type of school and programme. Chapter 3 analyses aggregate spending, examines priorities within education budgets (such as spending by level of education, private provision and services targeted to specific target populations), and finally looks at spending choices within the classroom (teachers’ salaries, teachers’ qualifications, hours of instruction and class size). The Annex provides the indicators underlying the analysis, the classification of national education programmes used for the comparisons and other relevant technical information.
This is the first report from the WEI programme. The indicators presented should not be considered final but have been, and continue to be, subject to a process of constant development, consolidation and refinement. Furthermore, while it has been possible to provide for comparisons in educational enrolment and spending patterns, comparative information on the quality of educational outcomes in WEI countries is only beginning to emerge. New comparative indicators will be needed in a wider range of educational domains in order to reflect the continuing shift in governmental and public concern, away from control over inputs and content towards a focus on educational outcomes.
International comparative assessments of achievement already figure prominently in national policy debates and in educational practice in WEI and OECD countries alike. To the extent that they can now be successfully integrated into the WEI programme during its next phase, they will be able to provide a new basis for policy dialogue and for collaboration in defining and operationalising educational goals – in ways that reflect judgements about the skills that are relevant to adult life.
They will provide an opportunity for WEI countries to identify and assess gaps in national curricula, and provide information for benchmarking, the setting of standards and evaluation. They will also convey insights into the range of factors which contribute to the development of knowledge and skills, and into the similarities and differences between the ways in which these factors operate in the various countries. Ultimately, they can help countries to bring about improvements in schooling and better preparation for young people as they enter an adult life of rapid change and increasing global interdependence.
Deliberations at the first education summit led to the subsequent adoption of the first six National Education Goals and the formation of the National Education Goals Panel. As some state governors themselves might say, it is significant that these products of the education summit bore the word "national" rather than "federal" in their titles. The meeting and its products were at once an assertion that education in the United States is a national concern, but still primarily a state and local responsibility.
A common education indicator called "Sources of funds for education" supports this contention. When revenues for public elementary and secondary education are traced to the original source of the funds, one finds that state governments contribute, on average, about the same percentage as local governments. Combined, state and local governments account for 93 percent of public education funding nationwide.
At the higher education level, state government's role is relatively even more substantial, contributing 37 percent of revenues, while federal and local governments contribute 11 and 4 percent, respectively. (The remainder comes from tuition and fees, endowments and other private contributions, and sales and services.)
Since the Charlottesville summit, Americans have seen continued activity on education policy between the separate branches and levels of government. The Goals Panel, for example, has included members from the Congress, the White House, the U.S. Department of Education, and the ranks of governors and state legislators. The Goals Panel continues to produce a report every year which measures our country's and each state's progress toward the Goals.
Early in 1996, forty-three of the nation's governors met in a second "education summit" in Palisades, New York, along with corporate chief executives from their states, and other invited guests. The meeting was sponsored by two organizations run by U.S. state governors—the Education Commission of the States and the National Governors’ Association—and the International Business Machines Corporation (IBM), which served as host. The second summit's governors agreed to develop and establish within two years internationally competitive standards, assessments to measure progress toward meeting them, and accountability systems.
By joining efforts with the Federal government in some of these activities over the past ten years, the governors have acknowledged that the Federal government has an important role to play in the collection and dissemination of some of the comparative data needed to manage the quality of American education.
In 1988, the U.S. Congress authorized the establishment of a Special Study Panel on Education Indicators for the U.S. Department of Education's National Center for Education Statistics (NCES). This panel was chartered in July 1989 and directed to prepare a report, published in 1991, Education Counts: An Indicator System to Monitor the Nation's Educational Health. The Panel's report recommended a variety of ways in which NCES should increase its collection and presentation of indicator data. Among the many recommendations, the report urged NCES to: strengthen its national role in data collection and provide technical assistance to the states; improve its capacity to collect international data; and develop a "mixed model" of indicators — international and national indicators, state and local indicators, and a subset of indicators held in common.
Two of NCES's primary indicator projects are The Condition of Education and the National Assessment of Educational Progress (NAEP). The Condition is an annual compendium of statistical information on American education, including trends over time, international comparisons, and some comparisons among various groups (by sex, ethnicity, socioeconomic status, and others). However, the Condition contains very few state-by-state comparisons.
The National Assessment of Educational Progress (NAEP) is a congressionally mandated assessment of the academic achievement of American students. Begun in the late 1960s, NAEP has been reporting assessment results state-by-state, on a trial basis, only since 1990. In that year, 37 states, the District of Columbia, and two territories participated in a trial state assessment program in eighth-grade mathematics. In the 1992 fourth-grade reading and mathematics and eighth-grade mathematics trial state assessments, voluntary participation increased to 41 states, the District of Columbia, and two territories. The same number of jurisdictions participated in the 1994 Trial State Assessment of fourth-grade reading. Forty-three states participated in the 1996 Trial State Assessment of fourth- and eighth-grade mathematics.
NCES's Digest of Education Statistics is, perhaps, the most comprehensive source of education statistics in the United States. Published annually or biennially since 1962, it provides national and state statistics for all levels of American public and private education. Using both government and private sources, with particular emphasis upon surveys and projects conducted by NCES, the publication reports on the number of education institutions, teachers, enrollments, and graduates; educational attainment; finances; government funding; and outcomes of education. Background information on population trends, public attitudes toward education, education characteristics of the labor force, government finances, and economic trends is also presented. Most of the data is presented in over 400 tables, but some graphics are also included. Many of the tables contain state-by-state data.
For some time, NCES has also compiled similar volumes of education statistics focused on the U.S. states. These publications, two volumes of Historical Trends: State Education Facts and one volume of State Projections for Public Elementary and Secondary Enrollment, Graduates, and Teachers were compiled every few years, largely in order to present historical trends or future projections in state education statistics.
An NCES state indicator report published a year ago, State Comparisons of Education Statistics: 1969–70 to 1993–94, expanded on these earlier efforts with much new material, aggregated at the state level for the first time. But State Comparisons also presents time series of NCES's most frequently requested state-level statistics. About thirty graphics (bar charts and maps) and a considerable amount of explanatory text are also included.
This volume, State Indicators in Education 1997, is a logical extension of these earlier efforts. This report does not attempt, however, to include the total volume of data that the Digest or State Comparisons presents, mostly in tabular form. Rather, the emphasis in this report veers toward explaining and presenting certain patterns and relationships in the data. While there are fewer data, there is more text and there are more graphics. State Indicators in Education, then, is perhaps more like a state-level version of NCES's indicator report, The Condition of Education, and less like a state-level version of NCES's comprehensive data volume, the Digest of Education Statistics.
Education Indicators: An International Perspective expands on the traditional interest in student achievement and education finance by including a broad range of indicators, such as Gender differences in earnings, Time spent on homework, and Home and school language, among others. The indicators focus primarily upon comparisons between the United States and other industrialized nations with large economies, particularly those that most closely resemble the United States in terms of size and are viewed as our major economic competitors.
Among a multitude of sources used in this report, the most comprehensive is Education at a Glance (1995), the international education indicators report produced by the Organization for Economic Cooperation and Development (OECD). Other data sources include the International Assessment of Educational Progress, the International Association for the Evaluation of Educational Achievement, and the International Assessment of Adult Literacy.
The importance of Education Indicators: An International Perspective lies in its ability to provide a comprehensive selection of international indicators geared toward a U.S. audience. This particular set of indicators is presented together for the first time and much of the data are derived from sources not readily accessible to U.S. readers. The publication, then, contributes to the continuing effort to make comparative information accessible and useful to U.S. leaders.
Education in States and Nations reflects two realities: increasing globalization and the centrality of the states in American education. In Education in States and Nations, indicators provide international benchmarks for assessing the condition of education in the U.S. states and in the United States as a whole by comparison with many other industrialized countries for which data are available. On six sets of education indicators (background; participation; processes and institutions; achievement and attainment; labor market outcomes; and finance), country-level and state-level measures are arrayed side-by-side in order to facilitate that comparison.
The country-level data come from a variety of sources, but two sources are most prominent: the second edition of the Organization for Economic Co-operation and Development's (OECD) international education indicators report, Education at a Glance; and the International Assessment of Educational Progress, which administered a mathematics test to 13-year-olds in about 20 countries and surveyed them and their school administrators about various aspects of the education process. The indicators in Education in States and Nations correspond to as many of the international indicators as state-level data were both applicable and available for.
This report is the second effort of its kind; the first edition, produced in 1993, was based on state and country data from the late 1980s. This edition, using data primarily from the early 1990s, is much larger than its predecessor. This reflects both a greater availability of suitable international indicators and state-level data and a greater effort to find relevant indicators, both domestic and international.
Compare it to the entire breadth of the research literature on testing, however, and it makes no sense at all.
1. Instead of referencing a wide range of relevant research, Fordham references only friends from inside their echo chamber and others paid by the Common Core’s wealthy benefactors. Yet they imply that they have covered a relevant and adequately wide range of sources.
2. Instead of evaluating tests according to the industry standard Standards for Educational and Psychological Testing, or any of dozens of other freely-available and well-vetted test evaluation standards, guidelines, or protocols used around the world by testing experts, they employ “a brand new methodology” specifically developed for Common Core, for the owners of the Common Core, and paid for by Common Core’s funders.
3. Instead of suggesting as fact only that which has been rigorously evaluated and accepted as fact by skeptics, the authors continue the practice of Common Core salespeople of attributing benefits to their tests for which no evidence exists.
4. Instead of addressing any of the many sincere, profound critiques of their work, as confident and responsible researchers would do, the Fordham authors tell their critics to go away—“If you don’t care for the standards…you should probably ignore this study” (p. 4).
5. Instead of writing in neutral language as real researchers do, the authors adopt the practice of coloring their language as so many Common Core salespeople do, attaching nice-sounding adjectives and adverbs to what serves their interest, and bad-sounding words to what does not.
The stated mandates of these organizations are to objectively review all the research available; instead they promote their own and declare most of the rest nonexistent. They are mandated to serve the public interest; instead they serve their own.
Currently, too few people have too much influence over those who control the education research purse strings. And, those who control the purse strings have too much influence over policy decisions. Until folk at the Bill and Melinda Gates Foundation and the US Education Department—to mention just a couple of consistent funders of education policy debacles—broaden their networks, expand their reading lists, and open their minds to more intellectual diversity, they will continue to produce education policy failure.
It would help if they would fund a wider pool of education researchers, evidence, and information. In recent years, they have, instead, encouraged the converse—funding a saturating dissemination of a narrow pool of information—thereby contributing to US education policy’s number 1 problem: pervasive misinformation.
Scores on high-stakes tests—tests that have serious consequences for students or teachers—often become severely inflated. That is, gains in scores on these tests are often far larger than true gains in students’ learning. Worse, this inflation is highly variable and unpredictable, so one cannot tell which school’s scores are inflated and which are legitimate. (p. 131)
Thus, Koretz, a long-time associate of the federally funded Center for Research on Evaluation, Standards, & Student Testing (CRESST), provides the many educators predisposed to dislike high-stakes tests anyway a seemingly scientific (and seemingly not self-serving or ideological) argument for opposing them. Meanwhile, he provides policymakers a conundrum: if scores on high-stakes tests improve, likely they are meaningless—leaving them no external measure for school improvement. So they might just as well do nothing as bother doing anything.
Measuring Up supports this theory by ridiculing straw men, declaring a pittance of flawed supporting evidence sufficient (pp. 11, 59, 63, 132, and chapter 10) and a superabundance of contrary evidence nonexistent, and mostly by repeatedly insisting that he is right. (See, for example, chapter 1, pp. 131–133, and pp. 231–236.) He also shows little patience for those who choose to disagree with him. They want “simple answers,” speak “nonsense,” assert “hogwash,” employ “logical sleight[s] of hand,” write “polemics,” or are “social scientists who ought to know better.”
• The author often misuses education statistics.
• He charges two of the world’s most expert and responsible statistical agencies—the U.S. Census Bureau and the National Center for Education Statistics—with incompetence, neglect, and willfully misleading the public, without making any effort to learn their side of the story.
• His proposed solutions are illogical: he advocates increasing rigor for students who are unable to meet current standards, and at the same time he shames schools for course repetition and grade retention. The inevitable result will be lower, not higher, standards.
There is no single best method for calculating graduation rates or completion ratios. There are several, each of them valid and useful in different contexts. Ironically, Wise proves this point himself by (unknowingly) employing various, and sometimes quite different, graduation measures throughout his book. Only the semantics are constant in Raising the Grade: each quite different measure is consistently identified as the graduation rate.
This hefty mass represents an enormous expenditure of time, money, and effort to, essentially, get it all wrong.
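To illustrate why several measures can each be valid yet disagree, here is a minimal sketch of two common graduation measures computed from the same hypothetical state; all figures below are invented for illustration and are not drawn from the book.

```python
# Two legitimate graduation measures that can yield quite different
# numbers for the same state in the same year (illustrative figures only).
diplomas = 90_000               # regular diplomas awarded this year (assumed)
ninth_grade_4yr_ago = 120_000   # 9th-grade enrollment four years earlier (assumed)
population_grad_age = 110_000   # resident population of typical graduation age (assumed)

cohort_rate = diplomas / ninth_grade_4yr_ago      # cohort-style graduation rate
completion_ratio = diplomas / population_grad_age # completion ratio against age cohort

print(f"cohort graduation rate: {cohort_rate:.1%}")      # 75.0%
print(f"completion ratio:       {completion_ratio:.1%}") # 81.8%
```

Both numbers are defensible; the error is calling whichever one happens to be at hand "the graduation rate" without qualification.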
With the REAFISO project, the OECD has taken sides, but appears to have done so in a cowardly manner. REAFISO staff have not described evidence and sources on multiple sides of topics, weighed them in the balance, and then justified their preference. Rather, on each controversial topic they broach, they present only one side of the story. On some topics, huge research literatures, several hundred studies large, are completely ignored.
I believe that REAFISO got caught in a one-way trap or, as others might term it: a bubble, echo chamber, infinite (feedback) loop, or myopia. They began their study with the work of celebrity researchers—dismissive reviewers—researchers who ignore (or declare nonexistent) those researchers and that research that contradicts their own (Phelps, 2012a)—and never found their way out. Dismissive reviewers blow bubbles, construct echo chambers, and program infinite loops by acknowledging only that research and those researchers they like or agree with.
. . . extend and add value to the existing body of international work on evaluation and assessment policies. (p. 5)

Synthesise research-based evidence on the impact of evaluation and assessment strategies and disseminate this knowledge among countries. Identify policy options for policy makers to consider. (p. 4)
. . . take stock of the existing knowledge base within the OECD and member countries as well as academic research on the relationship between assessment and evaluation procedures and performance of students, teachers and schools. It will look at the quantitative and qualitative evidence available on the different approaches used to evaluate and assess educational practice and performance. (p. 16)
To the contrary, REAFISO has not synthesized the existing body of research-based evidence on evaluation and assessment policies, much less extended it. By telling the world about only a small proportion of that evidence, REAFISO has instead hidden from the world most of the useful and relevant information (or implied that it is not worth considering).
The ordinary Citizen Joe knows that one shouldn’t trust everything one finds on the Internet, nor assume that Internet search engines rank documents according to their accuracy. So naturally, scholarly researchers who are trained to be skeptical, systematic, thorough, aware of biases, and facile with statistical sampling methods would be too. After all, scholarly researchers have spent several more years in school, often prestigious schools. They should “know how to know” as well or better than the average citizen.
Yet REAFISO’s reviews repeatedly offer one or a few examples of research from their favored sources to summarize topics, even though thorough reviews of dozens, hundreds, or thousands of sources were to be found had they simply looked widely enough. In some cases, REAFISO writers base a policy recommendation on one or a few studies, when a reading of the whole of the research literature on the topic would suggest exactly the opposite policy.
In its document, Evaluation and Assessment Frameworks for Improving School Outcomes: Common Policy Challenges (2011), written two years after the Design and Implementation Plan, REAFISO claims to have completed “a thorough analysis of the evidence on evaluation and assessment.”
One person claims that the Committee was deliberately set up to be a hostile committee. I think the odds are strong that that claim is correct.
Another person claims that the Committee considered only one personnel testing study from among hundreds in existence, yet made claims that implied they had considered all of them. I believe this assertion is also true.
The third claims that the Committee refused to consider some of the most basic and relevant evidence pertaining to personnel testing issues, such as: the ways in which the Hunter and Schmidt estimates of utility underestimated the benefits of testing; the true magnitude of the effect of range restriction on the utility estimates (for which the Committee refused to correct); the true value of average interrater reliability of ratings of .50 (they assumed .80, thus undercorrecting for criterion unreliability); and (pertaining to the NRC assertion that Hunter and Schmidt did not adjust their estimates for the time value of money, incremental validity, or what have you) the substantial research in personnel psychology that has explicitly considered all those issues (and found little difference in the direction or magnitude of the resulting utility estimates).
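The interrater reliability point can be made concrete with the standard psychometric correction for attenuation due to criterion unreliability: corrected validity = observed validity / sqrt(criterion reliability). A minimal sketch follows; the observed correlation is an assumed figure for illustration, not a value from the Committee's report.

```python
import math

# Correction for attenuation due to criterion unreliability:
# true validity = observed validity / sqrt(criterion reliability)
r_observed = 0.25  # observed test-criterion correlation (assumed for illustration)

# Interrater reliability of ratings: the value the Committee assumed (.80)
# vs. the research-literature value cited in the critique (.50)
for r_yy in (0.80, 0.50):
    r_corrected = r_observed / math.sqrt(r_yy)
    print(f"criterion reliability {r_yy:.2f} -> corrected validity {r_corrected:.3f}")
```

Because the correction divides by the square root of the reliability, assuming .80 when the true value is nearer .50 yields a smaller corrected validity, and thus a smaller utility estimate, which is the undercorrection the critique describes.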
This is a serious charge, that those at the National Research Council responsible for the evaluation of testing issues were (and remain) biased. Yet, I believe it to be true, and I believe that any fair-minded person who looked at the evidence would agree.
The National Research Council is supposed to represent the pinnacle of objectivity, the "court of last resort" on controversial research issues. Alas, I believe, it represents neither on testing issues. It seems biased—biased in conformity with an "education establishment" perspective.
High achieving students who cannot leave a school district where academic achievement is undervalued face varied pressures that impede them: pressure to fit in and be popular; to excel at sports; to work at low-pay, dead-end jobs to earn money for cars and parties; and so on. If they study hard and excel at academics, they will be taunted; disliked; called “nerd,” “geek,” “dork”; or be accused of “acting White.”
The report spends considerable effort worrying about the feelings of students who might fail high-stakes tests, but little if any effort worrying about the social fallout of abandoning high academic standards. High-achieving students among our poor should be considered our country’s most precious human resources.
For a variety of reasons, our society very badly needs these students to prosper; so their gifts and ambitions should be nurtured, not discouraged. The report, however, in the effect of its recommendations, would have these students treated as pariahs and have them feel guilty for wanting to work hard and succeed. After all, if these students work hard and succeed, won’t that make other students who do not want to work hard look and feel bad?
Abandoning the enforcement of high academic standards will not eliminate pressures and hurt feelings among our youth, however. Pressure and hurt feelings are facts of life. Abandoning academics just means the pressures will come from and the hurt feelings will be caused by nonacademic aspects of these students’ lives.
Is that really what we want? In radical egalitarian bliss, there will be no high-stakes tests, no academic standards enforced in any meaningful way, and no academic tracking. Academic progress in every school and for every student will be slowed to the preferred pace of the least motivated student.