“Why do good policymakers use bad indicators?”

Here we go again on testing – or at least the more general category of high stakes measures. This one by Larry Cuban. You know who Larry Cuban is – former high school social studies teacher (14 years, including in the District), seven years district superintendent in Arlington, VA, and professor emeritus of education at Stanford University for the past 20.

An interesting way for Cuban to close. Perhaps he was making a political argument in an attempt to reach the single measure aficionados. The research is pretty clear that the use of multiple measures – particularly the highly error-prone and inaccurate ones we have (mincomp tests and ratings – actually the greater problem is the consistency of the raters) – simply produces corruption of all of them.



“Why Do Good Policy Makers Use Bad Indicators?”*

Larry Cuban

January 29, 2012


Test scores are the coin of the educational realm in the U.S. In No Child Left Behind, they are used to reward and punish districts, schools, and teachers for how well or poorly students score on state tests. In pursuit of federal dollars, The Race To The Top competition has shoved state after state into legislating that teacher evaluations include student test scores as part of judging teacher effectiveness.

Numbers glued to high stakes consequences, however, corrupt performance. Since the mid-1970s, social scientists have documented the untoward results of attaching high stakes to quantitative indicators not only for education but also across numerous institutions. They have pointed out that those who implement policies using specific quantitative measures will change their practices to ensure better numbers.

The work of social scientist Donald T. Campbell and others about the perverse outcomes of incentives was available and known to many but went ignored. In Assessing the Impact of Planned Social Change, Campbell wrote:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (p. 49).

Campbell drew on instances of distorted behavior: police officials using clearance rates in solving crimes, the Soviets setting numerical goals for farming and industry, and the U.S. military using “body counts” in Vietnam as evidence of winning the war.

That was nearly forty years ago. In the past decade, medical researchers have found similar patterns when health insurers and Medicare have used quantitative indicators to measure physician performance. For example, Medicare requires, as a quality measure, that doctors administer antibiotics to a pneumonia patient within six hours of arriving at the hospital. As one physician said: “The trouble is that doctors often cannot diagnose pneumonia that quickly. You have to talk to and examine the patient and wait for blood tests, chest X-rays and so on.” So what happens is that “more and more antibiotics are being used in emergency rooms today, despite all-too-evident dangers like antibiotic-resistant bacteria and antibiotic-associated infections.” He and other doctors also know that surgeons have been known to pick reasonably healthy patients for heart bypass operations and ignore elderly ones who have 3-5 chronic ailments to ensure that results look good.

More examples.

TV stations charge for advertising on the basis of how many viewers they have during “sweep” months (November, February, May, and July). The Nielsen company has boxes in two million homes (representative of the nation’s viewership) that register whether the TV is on and what families are watching during those months. They also have viewers fill out diaries. Nielsen assumes that what the station shows in those months represents programming for the entire year (see 2011-2012-Sweeps-Dates). Nope. During those “sweeps,” TV networks and cable companies program new shows, films, extravaganzas, and sports that will draw viewers so they can charge higher advertising rates. They game the system and corrupt the measure (see p. 80).

And just this week, ripped from the headlines of the daily paper, online vendors secretly ask purchasers of their products to write reviews and rate the products with five stars in exchange for a kickback of the price the customer paid. Another corrupted measure.

Of course, educational researchers also have documented the link between standardized test scores and narrowed instruction to prepare students for test items, instances of state policymakers fiddling with cut-off scores on tests, increased dropouts, and straight-out cheating by a few administrators (see Dan Koretz, Measuring Up).

What Donald Campbell said in 1976 about “highly corruptible indicators” applies not only to education but also to many different institutions.

So why do good policy makers use bad indicators? The answer is that numbers are highly prized in the culture because they are easy to grasp and use in making decisions. The simpler the number – wins/losses, products sold, profits made, test scores – the easier to judge worth. When numbers have high stakes attached to them, they then become incentives (either as a carrot or a stick) to make the numbers look good. And that is where indicators turn as bad as sour milk whose expiration date has long passed.

The best policymakers, not merely good ones, know that multiple measures for a worthy goal reduce the possibility of reporting false performance.


*Steven Glazerman and Liz Potamites, “False Performance Gains: A Critique of Successive Cohort Indicators,” Working Paper, Mathematica Policy Research, December 2011, p. 13.

“After 10 years of NCLB, we should have seen dramatic progress on the National Assessment of Educational Progress, but we have not. By now, we should be able to point to sharp reductions of the achievement gaps between children of different racial and ethnic groups and children from different income groups, but we cannot. As I said in a recent speech, many children continue to be left behind, and we know who those children are: They are the same children who were left behind 10 years ago.”


“Congress, in its wisdom, will eventually reauthorize the Elementary and Secondary Education Act. I hope that in doing so, they recognize the negative consequences of NCLB and abandon the strategies that have borne such bitter fruit for our nation’s education system. NCLB cannot be fixed. It has failed. It has imposed a sterile and mean-spirited regime on the schools. It represents the dead hand of conformity and regulation from afar. It is time to abandon the status quo of test-based accountability and seek fresh and innovative thinking to support and strengthen our nation’s schools.”

One of the strongest pieces “celebrating” NCLB’s decade, and it has had extensive national coverage, is from the folks at FairTest: NCLB’s Lost Decade for Educational Progress: What Can We Learn from this Policy Failure? by Lisa Guisbond with Monty Neill and Bob Schaeffer. Monty has been a friend almost as long as Bracey. We crossed paths when he came to Virginia’s state board of ed meetings to caution against their early minimum competency insanity. Even back in those naive days of mine working for the state DOE, what he had to say made a lot of sense, and marked the beginning of my typically fruitless battle against the insidious destructiveness of minimum competency testing.

The report, introduced at http://www.fairtest.org/NCLB-lost-decade-report-home, argues that

  • NCLB failed to significantly increase average academic performance or to significantly narrow achievement gaps, as measured by the NAEP. U.S. students made greater gains before NCLB became law than after it was implemented.  
  • NCLB damaged educational quality and equity by narrowing the curriculum in many schools and focusing attention on the limited skills standardized tests measure. These negative effects fell most severely on classrooms serving low-income and minority children.
  • So-called “reforms” to NCLB, such as “Race to the Top,” Obama Administration waivers and the Senate Education Committee’s Elementary and Secondary Education Act (ESEA) reauthorization bill, fail to address many of the law’s fundamental flaws and in some cases intensify them.

Other than that  – – – –

Nine Myths about Public Schools, by Gerald W. Bracey

Those of you who have been around me for more than five or ten minutes know that one of my favorite people in the world was Jerry Bracey, whom I knew for nigh on 30 years. For those of you unfamiliar with his work, he was the outspoken critic of poorly done educational research and the endless misinformation spewed forth by detractors of public education. A strong supporter of public education, he felt a duty to take to task both pundits and public figures when they wrote or spoke in error or misinterpreted data. He had no fear taking to task the powerful, and if, with his Stanford psychology Ph.D., he had chosen to keep his mouth shut he could have been the consummate public official or university professor. Instead, he chose to speak truth as he saw it. Education writers, such as the Post’s Jay Mathews and our AJC’s Maureen Downey, respected and liked him even though, and perhaps because, he never hesitated to skewer them when he thought it justified. Perhaps his best known writing appeared in the Kappan as his monthly research column and the annual Rotten Apples awards.

This piece seems germane to the blog given the still rampant misinformation and misdirection flying around about education during an election year, and likely during the legislative session. I do have a certain fondness for this one as it’s the only time he ever asked me for an edit before he published a piece (given my composition skillset falls far short of what his was). Jerry died less than a month after he wrote this. Probability approximates unity that I’ll suffer you more of his work from time to time. 

Nine Myths about Public Schools


September 25, 2009

None of this will likely strike you as particularly new, but it might be good to have a bunch of myths lined up and debunked all in one place.

1.  The schools were to blame for letting the Russians get into space first. Granddaddy of all slanders and a great illustration of the absolute nuttiness with which people talk about education.

Sputnik, the first man-made satellite to orbit the earth, launched on October 4, 1957. On September 20, 1956, Wernher von Braun’s Army Ballistic Missile Agency launched a 4-stage Jupiter C rocket from Cape Canaveral. After the first 3 stages fired, the rocket was 832 miles in the air and traveling at 13,000 miles an hour. The 4th stage could have easily bumped something into orbit. The 4th stage was filled with sand. There were a number of reasons for this, including the fact that the Eisenhower administration was determined to keep its weapons rocket program and its space exploration project separate, and von Braun’s rocket was clearly a weapon. Its primary intent was to incinerate Russian cities with nuclear warheads. Ike worried how the Russians might react. His Assistant Defense Secretary Donald Quarles actually said “the Russians did us a favor” because they established the precedent that deep space was free and international.

Most US engineers in the space program in 1957 would have graduated high school in the 1930s, but in the media, the schools of the 1950s took the hit for Sputnik. Ike was quite puzzled by this.

2.  Schools alone can close the achievement gap. This is codified in the disaster known as No Child Left Behind. Most of the differences come from family and community variables and many out-of-school factors, especially summer loss. Some studies have found that poor children enter school behind their middle class peers, learn as much during the year and then lose it over the summer. They fall farther and farther behind and schools are blamed. Middle class and affluent kids do not show summer loss.

3.  Money doesn’t matter. Tell this to wealthy districts. Money clearly affects changes in achievement although levels of achievement are more influenced by the variables just mentioned. Most studies are short term and look only at test scores, a very foolish mistake. Economists David Card and Alan Krueger also found investments in school show a payoff in terms of long-term earnings of graduates.

4.  The United States is losing its competitive edge. China and India ARE Rising. As economies collapsed all around it, China’s economy grew a remarkable 7% last year. On just humanitarian grounds, we should not wish China and India to remain poor forever, but the more they grow the more money they have to buy stuff from us. As China and India prosper, we prosper. The World Economic Forum and the Institute for Management Development have consistently ranked the U. S. economy as the most competitive in the world. Education is only one part of multi-factor systems in rankings. WEF is especially keen on innovation. Our obsession with testing makes testing a great instrument for destroying creativity.

5.  The U. S. has a shortage of scientists, mathematicians and engineers. This was a myth started oddly enough by the National Science Foundation in the 1980s in a study with assumptions so absurd the study was never published, but the myth lingers on. In fact, Hal Salzman of the Urban Institute and Lindsay Lowell of Georgetown University found that we have three newly minted scientists and engineers who are permanent residents or native citizens for every newly minted job. Within 2 years, 65% of them were no longer in scientific or engineering fields. That proportion might have fallen during the current debacle when people are more likely to hang on to a job even if they hate it. An article in the September 18 Wall Street Journal reported that before the economy collapsed, 30% of the graduates of MIT–MIT–headed directly into finance.

6.  Merit pay for teachers will improve performance. See Bebchuk & Fried, Pay Without Performance, and Adams, Heywood & Rothstein, Teachers, Performance Pay, and Accountability. Bonus pay is concentrated in finance, insurance, and real estate. In most of the private sector, performance is hard to determine, and bonus pay often leads to corruption and gaming the system. Campbell’s Law: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort the social processes it is intended to monitor.”

7.  The fastest growing jobs are all high-tech and require postsecondary education. “Postsecondary education” is a weasel word. A majority of the fastest growing jobs do, in fact, require some kind of postsecondary training. But, according to the Bureau of Labor Statistics, they account for very few jobs. It’s the Walmarts and McDonald’s of America that generate the jobs. According to the BLS, the job of retail sales accounts for more jobs than the top ten fastest growing jobs combined.

8.  Test scores are related to economic competitiveness. We do well on international comparisons of reading, pretty good on one international comparison of math and science, and not so good on another math/science comparison. But these comparisons are based on the countries’ average scores, and average scores don’t mean much. The Organization for Economic Cooperation and Development, the producer of the math/science comparison in which we do worst, has pointed out that in science the U. S. has 25% of all the highest scoring students in the entire world, at least the world as defined by the 60 countries that participate in the tests. Finland might have the highest scores, but that only gives them 2,000 warm bodies compared to the U. S. figure of 67,000. It’s the high scorers who are most likely to become leaders and innovators. Only four nations have a higher proportion of researchers per 1000 full-time employees: Sweden, Finland, New Zealand and Japan. Only Finland is much above the U. S.

Consider Japan, the economic juggernaut of the 1980’s. Its kids scored well on tests, and people made a causal link between scores and Japan’s economy. But Japan’s economy has been in the doldrums for almost a whole generation. Its kids still ace tests.

9.  Education itself produces jobs. President Obama and Secretary of Education Duncan have both linked any economic recovery to school improvement. This is nonsense. There are parts of India where thousands of educated people compete for a single relatively low-level white-collar job. Some of you might recall that in the 1970’s many sociologists and commentators worried that Americans were becoming TOO educated, that they would be bored by the work available.

Why educators (and others) behaved badly under NCLB

Maureen Downey (Get Schooled, www.ajc.com) was kind enough to post my ramblings on the Milgram research of the 60’s. I’d been bothered for a long time by the self-righteous responses of so many posters on Maureen’s various reports on the educators accused of cheating at the two investigated school systems in the state. Certainly those who transgressed will get their just due, but Milgram (and many others who replicated that research) showed quite conclusively that most of us, when pressured by authority, will in fact follow orders.

Here’s the piece:

This inferential statistician asks a probability question: Who among you thinks that two school systems in Georgia were the only ones in the nation that engaged in unauthorized test data manipulation (“cheating”) under NCLB?

I have watched the Georgia events unfold since questions arose about test results more than a decade ago. This saga has reminded me frequently of Stanley Milgram’s research in the 1960’s. An overview is at http://psychology.about.com/od/historyofpsychology/a/milgram.htm.  Milgram wondered whether Adolf Eichmann could have “just” followed orders as he testified during his trial. In Milgram’s studies, participants readily administered what they were told were potentially lethal electric shocks to others after simply being told to do so. (The “recipients” actually just acted as if they received shock.) Numerous other studies have confirmed Milgram’s findings (a review of them was published by Thomas Blass in 1999). In his 1974 book Obedience to Authority, Milgram asked, “Could it be that Eichmann and his million accomplices in the Holocaust were just following orders? Could we call them all accomplices?” Generalizing his findings beyond questions about the Holocaust, he concluded that “ordinary people, simply doing their jobs, and without any particular hostility on their part, can become agents in a terrible destructive process. Moreover, even when the destructive effects of their work become patently clear, and they are asked to carry out actions incompatible with fundamental standards of morality, relatively few people have the resources needed to resist authority.”

Of course, the Holocaust was infinitely worse than any amount of student test results manipulation, yet if the Milgram study illustrates how readily so many will shock others, and if the Holocaust illustrates how readily so many will send others to their deaths, it’s not at all difficult to imagine that some educators might manipulate test scores if pressured by higher authorities. That’s not to say manipulating test scores (or shocking participants in an experiment) is excusable; it’s simply to suggest that current national accountability policy creates an environment in which we should not be surprised that some people behaved badly.  Perhaps we should be surprised, pleased, and perhaps even awed that the vast majority remained steadfast to their core educational beliefs and focused on doing what they knew was best for their students.

Given we’re so incessantly disposed to finger pointing, who in relation to NCLB would you choose as the equivalent to Hitler and Eichmann? Far more importantly, how might you suggest the-beatings-will-continue-until-morale-improves-prone policymakers rethink education policy so that we might begin making public education better rather than continuing to tear it apart? Will “Race to the Top” correct the mistakes of NCLB or is it just working around the edges of the same underlying approach?

I find this lesson from Milgram’s later work of interest: When a peer, told privately to refuse to administer high shock, was “planted” in the room, almost all of the participants also refused to administer high shock. Unfortunately, teachers who objected to cheating or refused to cheat were frequently threatened, punished or fired, and others learned that lesson. Perhaps if teachers were treated as respected professionals rather than as serfs (and scapegoats), they might have been heard when they spoke and we never would have had the sad tragedy of Georgia’s cheating scandals.  But then if teachers were treated as respected professionals, perhaps we would never have had the inexcusable travesty of NCLB in the first place.

