Estimating the Costs and Benefits of Educational Testing Programs

By Richard Phelps, Ph.D.
Benefit-cost analysis is embedded in all studies that ask the essential question of an activity, “Is it worth doing?” Benefit-cost analysis is a set of techniques, philosophy, and logic that can impose an order and rigor on the process used to answer the essential question.

The logic of benefit-cost analysis is that of the accountant’s spreadsheet. Indeed, one could accurately describe it as the economists’ accounting method. The essential idea is to capture all relevant costs and benefits, broadly considered, on one sheet of paper and weigh them in the balance. If the enterprise or project shows more benefit than cost (i.e., net benefits are positive), it can be said to be economically worthwhile. It is assumed that the researcher will do an honest and responsible job of trying to capture all the relevant benefits and costs. If they can’t be estimated with any precision, the researcher should at least enumerate them and leave it to the reader to estimate their value.
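The spreadsheet logic described above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize`, the line items, and every dollar figure are hypothetical and do not come from any study. Items that cannot be estimated are entered as `None`, so they are enumerated but not summed.

```python
def summarize(benefits, costs):
    """Return (net benefit of quantified items, names of unquantified items)."""
    total_benefits = sum(v for v in benefits.values() if v is not None)
    total_costs = sum(v for v in costs.values() if v is not None)
    unquantified = sorted(
        name
        for ledger in (benefits, costs)
        for name, value in ledger.items()
        if value is None
    )
    return total_benefits - total_costs, unquantified


# Hypothetical entries in dollars per student; None marks an item that is
# listed but could not be estimated with any precision.
benefits = {"diagnostic information": 40.0, "motivation": 25.0, "goodwill": None}
costs = {"test materials": 10.0, "scoring": 5.0}

net, unquantified = summarize(benefits, costs)
worthwhile = net > 0  # positive net benefits: economically worthwhile
print(f"net benefit: {net:.2f}, worthwhile: {worthwhile}")
print("left for the reader to value:", unquantified)
```

Keeping the unquantified items in the ledger, rather than dropping them, mirrors the obligation to enumerate what cannot be priced.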

What one person considers a benefit, however, another person may not. Indeed, what one person considers a benefit, another person may regard as a cost. The details of benefit-cost analyses, then, are often subject to debate. It is, however, considered incumbent upon the researcher to properly identify what perspective she is adopting. Ideally, a benefit-cost analysis calculates the benefits and costs as they accrue to all of society – such is the nature of a social benefit-cost analysis. Anything less – an analysis that calculates benefits and costs for a sub-group – is a private benefit-cost analysis, and the researcher is obligated to explicitly declare it as such.

Benefit-cost analysis should be most welcome in education research. Benefit-cost analysis imposes a structure in which “the whole picture” gets considered. It provides a framework that can impose rigor and honesty onto evaluations that could otherwise be sloppy.

By the same token, most readers are probably also well aware of how benefit-cost analysis can be misused. A researcher can make unreasonable or dishonest estimates, ignore some relevant benefits or costs, include some irrelevant ones, or double count. There can be a tendency among advocates to exclude or include benefits or costs according to their preferences.

What costs and benefits are relevant? Generally, they are the marginal costs or benefits that are attributable to the activity in question and not to another activity. When someone argues that the cost of a test is X, the appropriate cost to cite is the marginal cost of the test, the cost that can be attributed to the existence of the test and not to any other activity. Looked at another way, a marginal cost of a test is a cost that is caused by the test, one that doesn’t exist without the test. A heuristic one can use to determine whether an activity or object is a marginal cost of a test: take the test away and see if the activity or object disappears.
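The “take the test away” heuristic amounts to comparing two budgets, one with the test and one without. The following sketch assumes hypothetical budget lines and amounts (dollars per student); the function name `marginal_costs` is likewise an illustration, not an established method.

```python
def marginal_costs(costs_with_test, costs_without_test):
    """Keep only the costs that shrink or vanish when the test is removed."""
    marginal = {}
    for item, amount in costs_with_test.items():
        difference = amount - costs_without_test.get(item, 0.0)
        if difference > 0:
            marginal[item] = difference
    return marginal


# Hypothetical budgets, in dollars per student.
with_test = {
    "test booklets": 4.0,
    "scoring": 6.0,
    "building overhead": 30.0,
    "teacher salaries": 200.0,
}
without_test = {
    "building overhead": 30.0,  # unchanged when the test is removed
    "teacher salaries": 200.0,  # unchanged when the test is removed
}

marginal = marginal_costs(with_test, without_test)
print(marginal)
print("marginal cost per student:", sum(marginal.values()))
```

In this made-up example, only the booklets and the scoring disappear along with the test, so only they count as costs of the test.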

It turns out that the costs of standardized testing are minuscule by comparison with the huge potential benefits. This fact is little known among educators, as few mainstream education researchers are trained to attempt such studies, which are more common among economists, and the few who have attempted them have produced bungled or biased results.

In the early 1990s, the Center for the Study of Testing, Evaluation, and Educational Policy (CSTEEP), at Boston College, calculated a “high” estimate of $22.7 billion spent on standardized testing per year. U.S. schools, the CSTEEP report claimed, suffer from “too much standardized testing” that amounts to “a complete and utter waste of resources.” Their estimate breaks down to about $575 per student per year.

A report from the federally funded Center for Research on Evaluation, Standards, and Student Testing (CRESST) counted cost components in much the same way as the CSTEEP study and estimated the costs of a certain state test at between $848 and $1,792 per student tested ($1,320 would be mid-range).

Testing critics exaggerate their cost estimates by counting the costs of any activities “related to” a test as costs of a test. In the CRESST study of Kentucky’s performance-based testing program, for example, teachers were asked to count the number of hours they spent “preparing materials related to the assessment program for classroom use.” In an instructional program like Kentucky’s, with the intention of unifying all instruction and assessment into a “seamless” web, where the curriculum and the test mutually determine each other, all instruction throughout the entire school year will be “related to” the assessment.

The CSTEEP study counted even more cost items, such as student time. The CSTEEP researchers assumed that there is no instructional value whatsoever to student time spent preparing for or taking a test (i.e., students learn absolutely nothing while preparing for or taking tests). Then they calculated the present discounted value of that “lost” learning time against future earnings, assuming all future earnings to be the direct outcome of school instruction. The CSTEEP researchers also counted building overhead (maintenance and capital costs) for the amount of time spent testing, even though those costs are constant (i.e., “sunk”) and not affected by the existence of a test. In sum, CSTEEP counted any and all costs incurred at the same time as tests, not just those that are caused by testing and would not exist without it.
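For readers unfamiliar with the term, a present discounted value converts a future dollar amount into its equivalent today. The sketch below shows only the standard textbook formula; the amount, discount rate, and time horizon are hypothetical, and nothing here reproduces the CSTEEP researchers’ actual formula or parameters, which are not given above.

```python
def present_value(future_amount, annual_rate, years):
    """Discount a single future amount back to today's dollars."""
    return future_amount / (1 + annual_rate) ** years


# Hypothetical: $100 of earnings received 10 years from now,
# discounted at an assumed 5 percent per year.
pv = present_value(100.0, 0.05, 10)
print(f"present value: ${pv:.2f}")
```

The result depends entirely on the assumptions fed into it: if the “lost” learning time is in fact worth something, the future amount being discounted shrinks accordingly.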

In stark contrast to these incredible estimates are the actual prices charged for tests such as the ACT, SAT, and AP exams, ranging from $20 to $70 a student. The makers of these tests must cover all their costs, or they would go out of business.

The bipartisan U.S. General Accounting Office (GAO) also conducted a survey of state and local testing directors and administrators to learn the costs of statewide and districtwide tests. The GAO estimate of $15 to $33 per student contrasts markedly with the CSTEEP and CRESST estimates of $575 and $1,320, respectively. Moreover, the GAO estimates counted all relevant costs, including the cost of teacher time used in administering tests. The GAO estimate for the total national cost of systemwide testing, about $500 million, contrasts with a CSTEEP estimate 45 times higher.
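The comparisons among these estimates are simple arithmetic and can be checked directly. The sketch below uses only figures quoted above; the variable names are illustrative, and the per-student ratios are an added convenience rather than numbers reported by any of the studies.

```python
# Figures as quoted in the text.
csteep_national = 22.7e9            # CSTEEP "high" national estimate, dollars per year
gao_national = 500e6                # GAO national estimate, dollars per year
csteep_per_student = 575
cresst_low, cresst_high = 848, 1792
gao_low, gao_high = 15, 33

national_ratio = csteep_national / gao_national
cresst_midrange = (cresst_low + cresst_high) / 2

# How many times larger each critic's estimate is than the GAO upper bound.
csteep_vs_gao_high = csteep_per_student / gao_high
cresst_vs_gao_high = cresst_midrange / gao_high

print(f"CSTEEP national estimate is {national_ratio:.1f} times the GAO estimate")
print(f"CRESST mid-range: ${cresst_midrange:,.0f} per student")
print(f"CSTEEP per student is {csteep_vs_gao_high:.1f} times the GAO upper bound")
print(f"CRESST mid-range is {cresst_vs_gao_high:.1f} times the GAO upper bound")
```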

The GAO estimated all-inclusive, stand-alone marginal costs of large-scale, systemwide tests, costs that would pertain in a situation where the tests had to be administered independently of any school system structure or schedule, say during the summer months and by hired personnel. The independent SAT, ACT, and AP exams are administered this way.

The GAO study’s estimates can be recalculated under two reasonable assumptions: (1) that the tests, as is usually the case, would be administered during the regular school year, using regular school personnel, and would be integral parts of the school system’s curricular and instructional plan; and (2) that the tests would be used in many school districts to replace, rather than supplement, some preexisting test. With these adjustments, marginal costs become $2 per student for multiple-choice tests and $11 per student for performance tests.

Far from being the hugely expensive enterprise that some testing critics claim for it, standardized testing is not very expensive by most standards. Even under the rather unrealistic assumptions of the GAO study’s upper-bound estimates, systemwide tests impose a time and cost burden, as one state testing director put it, “on a par with field trips.”

Distilled to the most rudimentary elements, the main benefits of standardized testing are four – information, motivation, organizational clarity, and goodwill. But that amounts to quite a thorough distillation. The information benefits alone can manifest themselves in several different forms, to several different audiences. Test results can tell us about the performance of an individual student. They can provide information about a teacher, a curriculum, a textbook, a school, a program, a district, or a state policy. Moreover, the information provided by test results can inform one or more among many parties – parents, voters, employers, higher education institutions, other schools, state departments of education, and so on.

Perhaps the simplest, and least disputed, benefit of standardized tests is in diagnosis. Test results can pinpoint a student’s academic strengths and weaknesses, areas that need work, and areas where help is needed. Test scores provide a measurement tool that can be used to judge the effectiveness of preexisting or proposed school programs. Test results can inform teachers, schools, and school systems about their curricular and instructional strengths and weaknesses. That may lead to a better alignment of curriculum with instruction, a benefit often enumerated by teachers and administrators in evaluations of testing programs. Teachers have also reported that they learn more about their students, their own teaching, and other teachers’ methods from high-stakes external tests.

Information can also be used for accountability purposes. Higher-level school system administrators can use information to make judgments about performance at the school or school district level and to increase efficiency. In an environment of school choice (e.g., school districts with open enrollment), information about school performance can help parent-student school shoppers to make a better-informed selection.

Finally, information benefits can consist of signaling, screening, and credentialing effects. College admissions counselors and employers can make a more informed decision about applicants’ academic achievement with test scores than they can without. Colleges, for example, use measures of predictive validity (the correlation coefficient of entrance test score with college achievement) to justify requiring applicants to submit scores from college admissions tests (ACT or SAT). Allocative efficiency (the efficient sorting of applicants to organizations) is more difficult to measure, but it is a relevant benefit as well.

Of the four main categories of benefits listed above, information is arguably the only one common to educational tests whether or not they have “stakes,” and whether or not they are conducted “internally” or “externally.” The other categories of benefits – motivation, organizational clarity and efficiency, and goodwill – are unlikely to occur when tests “do not count.”

Motivation may not be an end in itself, but it can lead to desirable behaviors, such as a student paying greater attention in class and studying more – activities that, in turn, lead to the accumulation of more knowledge and understanding. Like information benefits, motivation can affect many different parties to the educational enterprise and provide benefits to many different sectors of our society. Motivational effects are manifest when rewards or punishments are provided to (or imposed upon) students, teachers, administrators, schools, districts, programs, service providers, politicians, or even parents. The beneficial effects of motivated efforts accrue to all of the parties above, as well as to employers, higher education institutions, and society in general.

Just one example of the organizational clarity or efficiency benefit of standardized testing is provided by the testimony of teachers in many states, provinces, and countries who participate in test development, administration, and scoring. Overwhelmingly, they assert that the experience helps them as instructors. After struggling, along with other teachers and testing experts, to design and score assessments fairly, they understand better how their students might misunderstand concepts and how they might better explain the concepts. Moreover, they can much more efficiently align their own instructional program with state standards after undergoing a deep immersion into the state standards.

The final general category of benefit cited above – goodwill – is certainly the most often overlooked, and is the most difficult to measure, but may be the most important. The public pays for the public schools and hands over responsibility for its children’s welfare to the public school authorities for substantial periods of time. The public has a right to objective, impartial information about the performance of the public schools’ main function – the academic achievement of their children. Classroom grades are unreliable and often invalid sources of such information. Standardized tests, when they are used validly, provide far more reliable and trustworthy information.

Examples of goodwill, then, include: renewed public confidence in the school system; public faith that the schools really are working to uphold standards; and the peace of mind that teachers and school administrators might gain in the wake of the new parental and public trust. Students have also reported in some surveys feelings of genuine achievement and accomplishment when they pass important, meaningful tests.

Even the four categories of benefits mentioned above, in all their varied manifestations, do not cover the gamut. Still other benefits probably exist, but may be more difficult to pin down, more hypothetical, or more difficult to measure. The economist John Bishop, for example, argues that it is illogical and counterproductive to insist that a teacher be both a “coach” and a “judge.” The teacher is a coach when she helps a student to succeed; a judge when she grades a student’s test and decides that the student should not be promoted to the next grade or level of education. By Bishop’s theory, this dual role puts the teacher in a moral dilemma that is often resolved through social promotion. Most teachers would rather be coaches than judges and, so, promote students to the next level even though they are not ready. After a few years of social promotion, of course, students may be so far behind that they cannot possibly succeed by any objective standard. They may become disillusioned, give up trying, and drop out. Bishop argues for external high-stakes testing as a means to free each teacher to be a coach the student can trust to help him meet the challenge of the examination, which is “external” to both of them.

We may have reached the point in the United States where standardized tests provide the only pure measure of subject-matter mastery. For some time now, education schools have encouraged teachers to grade students using a cornucopia of criteria that include perceived persistence or effort, perceived level of handicap due to background, participation or enthusiasm, and perceived need. Subject-matter mastery is just one factor, and usually not the most important, considered in calculating a student’s course grade. In addition to the missionary directive of the education schools, Bishop’s theory of the irreconcilability of the coach and judge roles may also explain the degradation of grades. But, regardless of the reason, if standardized tests are, indeed, the only trustworthy measure of academic achievement, can our society afford not to use them? External standardized tests may be the only reliable source of information on education performance not controlled by groups with an incentive to corrupt or suppress it.

Even among teachers who desire to grade their students only on the basis of academic achievement, few have training in testing and measurement. Those who criticize standardized tests for their alleged imperfections of structure and content seldom mention that standardized tests are written, tested, and retested by large groups of Ph.D.s with highly technical training in testing and measurement. By contrast, the typical classroom teacher has had no training in testing and measurement.

The full effect of all the benefits mentioned above, however, numerous as they are, cannot be observed so long as standardized tests are in use. “External” measures, such as systemwide standardized test scores, serve as a check on other measures of performance, so wherever tests already exist, their restraining influence is already built into what we observe (psychometricians generally label this phenomenon “restriction of range”). To fully appreciate the benefits of external standardized testing, one must imagine a society without standardized testing. What would happen to grade inflation if there were no standardized test scores to which one could compare the grades? How much effort would students, teachers, and administrators make to improve achievement if there were no standardized tests with which to check their progress?

Economic studies that have focused primarily on the motivational, or incentive, effects of high-stakes testing programs estimate average benefits to students over their lifetimes of around $13,000 per subject area tested. That is, students in jurisdictions with high-stakes testing programs tend to learn more, and that increased amount of knowledge and skill is rewarded throughout their lives, through higher wages and greater job security.

Psychologists have conducted many studies – in excess of a thousand, actually – on the predictive validity and allocative efficiency of tests. Education professors have attacked the dollar estimates based on such studies, but even they concede benefits on the order of $5,000 to $8,000 per student lifetime.

Total testing benefits vastly outweigh the costs, by a benefit-to-cost ratio that probably exceeds a thousand. The benefits can be so high because they affect a large number of people and they produce lasting and cumulative effects. Meanwhile, the testing costs are low and incurred only once or a few times.
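A rough check of the order of magnitude can be made from figures quoted earlier: lifetime benefits of around $13,000 per student per subject area tested, and adjusted marginal costs of $2 (multiple-choice) and $11 (performance) per student. Pairing these particular figures is an assumption made here for illustration; the text does not say which inputs underlie the ratio, and the sketch further assumes one test administration in one subject area.

```python
def benefit_cost_ratio(benefit, cost):
    """Benefit per dollar of cost."""
    return benefit / cost


# Figures quoted in the text; pairing them is an illustrative assumption.
lifetime_benefit = 13_000  # dollars per student, per subject area tested
marginal_cost = {"multiple-choice": 2, "performance": 11}  # dollars per student

ratios = {
    kind: benefit_cost_ratio(lifetime_benefit, cost)
    for kind, cost in marginal_cost.items()
}
for kind, ratio in ratios.items():
    print(f"{kind}: about {ratio:,.0f} to 1")
```

Under these assumptions both ratios come out above a thousand. Several administrations over a school career would lower them proportionally, while benefits in additional subject areas would raise them.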


