Inflated Test Scores Caused By the "No Child Left Behind" Act

Professor Daniel Koretz is a faculty member in the Administration, Planning, and Social Policy area at the Harvard Graduate School of Education. Here, in an excerpt from a presentation at HGSE, he discusses inflated test scores, a problem exacerbated by the No Child Left Behind Act.
Even though this bill is 1100 pages long, it really is a very simple bill. The model of improvement it presupposes is this: you assess student performance using measures that you think are sufficient to summarize what kids have learned over a long period of time; you set very ambitious targets for improvements in scores on those tests; you require continual improvement; and then you reward and punish.
This is precedent-setting at the federal level, but it is nothing new. It's the culmination of a 30-year trend in assessment policy. Much of what's in the bill reflects developments at the state level over the past 10 to 12 years: there is virtually nothing in the bill's assessment provisions that has not been tried before by at least one state.
What that means is we actually know something about how these things will work, or at least how they have worked. We have not only experience, but also some hard research. That experience and those studies suggest that there are some very serious assessment-related problems with this model. For that reason, I'm very pessimistic the bill will do what its proponents expect it to do.
Test Scores—The Illusion of Progress
Taking just one of the many assessment issues, let's look at the inflation of test scores, a problem I've been working on for almost a decade and a half now.
Inflation of test scores refers to increases in scores that are markedly bigger than the improvements in performance they're supposed to denote. It's problematic not only because it creates an illusion of progress, but also because it means that real problems go unnoticed. It covers things up, allowing serious problems to fester.
Why does this happen? We know actually quite a bit about the mechanisms that underlie score inflation, but I'll just give you four points:
All of these occur because of one simple problem: excessive attention to the indicator, rather than to what it's supposed to indicate. I'm going to give you just one example, from the first study of it I ever did, in the late 1980s when pressure to raise scores was far, far lower than it is now. By today's standards, this was a low-stakes testing program, but it was high enough stakes that teachers worried about scores.

Results from the study
The scale of the results is in grade equivalence, which is years and academic months, ten months in a year. This is the end of third grade in mathematics; an average school would have been a 3.7, three years, seven months. You can see the blue diamond in the top left is the last year that the district used one of the five big, standard commercial tests. You can see that, at that point, they were reporting to their parents that the kids in this district—which was a high-minority, high-poverty district—were half an academic year above average.
The state then bought a new test, which is the red data points. Scores immediately dropped to average, and four years later were back again where they had been, a half a year above average. That much was not new. People in my field knew to expect that.
What was new was the blue diamond in the lower right. We went in in 1990, took a large sample of classrooms, and randomly assigned them to five different testing conditions, one of which was the exact same test the district had last used in 1986.
You can see that when we administered the test, four years after the kids had been told to worry about that particular test, they were, lo and behold, average.
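To make the grade-equivalent arithmetic concrete, here is a minimal sketch in Python. It assumes the scale described above (years and academic months, ten months to a year) and uses only the figures stated in the talk: an average of 3.7 at the end of third grade, and a district reported as half an academic year above average. The 4.2 value is derived from those two statements rather than read off the chart, and the function and variable names are illustrative only.

```python
MONTHS_PER_YEAR = 10  # academic months per school year, as stated in the talk


def ge_to_months(ge: str) -> int:
    """Convert a grade equivalent such as "3.7" (3 years, 7 months) to months."""
    years, months = ge.split(".")
    return int(years) * MONTHS_PER_YEAR + int(months)


def months_to_ge(total_months: int) -> str:
    """Convert a count of academic months back to years.months notation."""
    years, months = divmod(total_months, MONTHS_PER_YEAR)
    return f"{years}.{months}"


# End-of-third-grade average, as given in the talk.
average = "3.7"

# "Half an academic year above average" is 5 academic months (derived value).
reported_on_familiar_test = months_to_ge(ge_to_months(average) + 5)

# On the test the district had stopped using, the same kids scored at average.
score_on_unfamiliar_test = average

# Inflation: the part of the reported score that does not carry over
# to a test the students were not prepared for.
inflation_months = (
    ge_to_months(reported_on_familiar_test)
    - ge_to_months(score_on_unfamiliar_test)
)

print(f"Reported on familiar test: {reported_on_familiar_test}")
print(f"Score on unfamiliar test:  {score_on_unfamiliar_test}")
print(f"Apparent inflation:        {inflation_months} academic months")
```

Grade equivalents are kept as strings here so that "3.7" is parsed as three years and seven months rather than as a decimal fraction of a year.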
This study came out in 1991, and I've had the opportunity to ask hundreds of people which number you should give parents; are these kids average, or are they half a year above average? Almost everyone says, "Clearly, they're not really a half a year above average." But the premise that underlies the No Child Left Behind bill is that you should give people the top number. Because what we're going to get is that red line, but at a steeper slope, at a steeper angle.
Making Things Better? Or Worse?
What can we do about this? We are not going to get out of this problem by using different styles of tests. In the early 1990s, people claimed that the problem was that this kind of thing occurred with multiple-choice tests. So the thinking became, "We'll use tests worth teaching to, which involve writing, group performances, all manner of things other than multiple choice."
But when I went to Kentucky, which offered the archetype of that kind of reform, and looked at what happened with the scores on their test that was worth teaching to, it was the same thing, but worse. Because it's still a small sample of performance, and teachers are being paid in that case—just as they will under the federal law—to improve performance on that small measure.
Nor are we going to solve the problem with the current trend in education policy toward tests that are aligned with standards. That makes no difference. In fact, it could make things worse.
So what should we do? Some suggestions: