Tracing the test-cheating scandal back to its roots

May 9, 2012

GAERA "Reform", Assessment Issues, Cheating, Politics, Teacher Evaluation

I thought Jay’s piece in this morning’s print version was superb. I’ve been ranting about these issues for several decades, only to have it fall on deaf ears, and perhaps his nicely done words will also fail. However, I think the tide is turning, albeit slowly – even with the scandals (with inescapably more to come), we as a whole are slowly recognizing that “it’s the leadership, stupid.”

There are teachers, as I was reminded today, who were single mothers facing no other option but welfare should they lose their jobs. Their choice was to put their babies at risk or erase bubbles on answer sheets under the orders of superiors. What would you do? I already know. See my piece here of January 8 on the Milgram research.

As Errol Davis has noted, education is the only profession in which the blame is placed on the workers. I don’t think he’s entirely correct (the elders among us will remember our frustration with automobile quality control).

My favorite turn of phrase from Jay, of many: “. . . we place a burden on testing that it is too fragile to bear.”

Disparate thoughts, but all touched on in Jay’s far more coherent piece below.

Jer

Tracing the test-cheating scandal back to its roots

10:20 am May 9, 2012, by Jay Bookman, Atlanta Journal Constitution

http://blogs.ajc.com/jay-bookman-blog/2012/05/09/tracing-the-test-cheating-scandal-back-to-its-roots/

For weeks, teachers and administrators implicated in the Atlanta Public Schools cheating scandal have been appearing one by one in front of a tribunal, telling their stories in hopes that they will be allowed to retain their jobs and careers.

The process — guaranteed to them by law — is meant to ensure that if fired and stripped of the right to teach, they will be fired and decertified for good cause and after they have had the chance to defend themselves. Frustrating as it might be to some who want a quicker, cheaper resolution of the controversy, that’s important.

However, the public nature of the process and testimony has also produced an important side benefit: Taxpayers, parents and citizens in general are getting a more complete and in many ways more human picture of the internal culture of the Atlanta school system and how that culture contributed to the scandal. It is possible in at least some cases to sympathize with the individuals involved and the pressure they experienced, even if that sympathy does not mean excusing what they did.

In fact, while each educator implicated in the controversy has had a unique story to tell, in the end they leave me circling back to the same basic question:

Where was Beverly Hall?

Whatever mistakes were made by individual educators, the atmosphere of fear and casual corruption within the school system was Hall’s creation as longtime superintendent. The absence of safeguards and indeed the total lack of concern about potential cheating was Hall’s responsibility. The institution’s reluctance and even aggressive refusal to support district employees who knew something was wrong and who tried to protest is a direct consequence of her leadership style and priorities.

Hall has retired and left the district, and so far has played no role in the tribunal proceedings. And while investigations continue, there is no indication that she will be held officially accountable in any way.

In her rare public utterances, she has portrayed herself as a victim of employees who failed to do their duty, but in the end she failed them, not the other way around. In fact, Hall bears a significant degree of responsibility for every career that is being ended and every future that is being compromised.

However, it’s important not to leave the issue there, because in some ways Hall herself is as much a symptom as a cause. As AJC investigations have established, cheating on standardized tests has become a nationwide problem, with high-profile schools all over the country producing wildly implausible claims of improvement in student performance. Confronted with that evidence, public officials in too many cases have retreated into the same pattern of denial that has become familiar to Atlanta residents.

When the same problems occur on such a large scale, in so many different communities and school systems in more than 30 states, it is no longer possible to dismiss it as the actions of an unethical few, or of a corrupted bureaucracy here or there. Something deeper is driving the phenomenon.

There is no question that standardized tests are an essential diagnostic tool. They can tell us which students, teachers and schools are performing well and which require attention. But when we take it a step farther and use those same test results to dictate fates, we place a burden on testing that it is too fragile to bear. When that happens, the tests themselves become a form of cheating, a means of producing misleadingly easy answers to what are really hard questions.

It’s also deeply confusing. In recent years, education reform has been dominated by two themes that are directly contradictory yet are often espoused by the very same people. And that contradiction is almost never acknowledged.

Here in Georgia, for example, state leaders have insisted that standardized testing be used as the educational equivalent of an industrial quality-control system. They produce a standardized model, and the tests determine how closely students conform to that model as they come off the assembly line.

Yet at the same time, we are told, the one-size-fits-all public-school industrial model must be dynamited to make way for a more experimental, let-a-thousand-flowers-bloom approach to education via charter schools and even vouchers. There’s a fundamental incoherence between those two messages that leads me to suspect that we really don’t know what we’re doing, and in fact are using schools as a battlefield in a deeper social struggle that we do not wish to acknowledge.

– Jay Bookman

“Why do good policymakers use bad indicators?”

January 31, 2012

GAERA Assessment Issues, Cheating

Here we go again on testing – or at least the more general category of high stakes measures. This one by Larry Cuban. You know who Larry Cuban is – former high school social studies teacher (14 years, including in the District), seven years district superintendent in Arlington, VA, and professor emeritus of education at Stanford University for the past 20.

An interesting way for Cuban to close. Perhaps he was making a political argument in an attempt to reach the single measure aficionados. The research is pretty clear that the use of multiple measures – particularly the highly error-prone and inaccurate ones we have (minimum-competency tests and ratings, where the greater problem is actually the consistency of the raters) – simply produces corruption of all of them.

“Why Do Good Policy Makers Use Bad Indicators?”*

Larry Cuban

January 29, 2012

http://larrycuban.wordpress.com/

Test scores are the coin of the educational realm in the U.S. In No Child Left Behind, they are used to reward and punish districts, schools, and teachers for how well or poorly students score on state tests. In pursuit of federal dollars, The Race To The Top competition has shoved state after state into legislating that teacher evaluations include student test scores as part of judging teacher effectiveness.

Numbers glued to high stakes consequences, however, corrupt performance. Since the mid-1970s, social scientists have documented the untoward results of attaching high stakes to quantitative indicators not only for education but also across numerous institutions. They have pointed out that those who implement policies using specific quantitative measures will change their practices to ensure better numbers.

The work of social scientist Donald T. Campbell and others about the perverse outcomes of incentives was available and known to many but went ignored. In Assessing the Impact of Planned Social Change, Campbell wrote:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (p. 49).

Campbell drew instances of distorted behavior from cases in which police officials used clearance rates in solving crimes, the Soviets set numerical goals for farming and industry, and the U.S. military used “body counts” in Vietnam as evidence of winning the war.

That was nearly forty years ago. In the past decade, medical researchers have found similar patterns when health insurers and Medicare have used quantitative indicators to measure physician performance. For example, Medicare requires, as a quality measure, that doctors administer antibiotics to a pneumonia patient within six hours of arriving at the hospital. As one physician said: “The trouble is that doctors often cannot diagnose pneumonia that quickly. You have to talk to and examine the patient and wait for blood tests, chest X-rays and so on.” So what happens is that “more and more antibiotics are being used in emergency rooms today, despite all-too-evident dangers like antibiotic-resistant bacteria and antibiotic-associated infections.” He and other doctors also know that surgeons have been known to pick reasonably healthy patients for heart bypass operations and ignore elderly ones who have 3-5 chronic ailments to ensure that results look good.

More examples.

TV stations charge for advertising on the basis of how many viewers they have during “sweep” months (November, February, May, and July). Nielsen company has boxes in two million homes (representative of the nation’s viewership) that register whether the TV is on and what families are watching during those months. They also have viewers fill out diaries. Nielsen assumes that what the station shows in those months represents programming for the entire year (see 2011-2012-Sweeps-Dates). Nope. What TV networks and cable companies do is that during those “sweeps” they program new shows, films, extravaganzas, and sports that will draw viewers so they can charge higher advertising rates. They game the system and corrupt the measure (see p. 80).

And just this week, ripped from the headlines of the daily paper, online vendors secretly ask purchasers of their products to write reviews and rate it with five stars in exchange for a kickback of the price the customer paid. Another corrupted measure.

Of course, educational researchers also have documented the link between standardized test scores and narrowed instruction to prepare students for test items, instances of state policymakers fiddling with cut-off scores on tests, increased dropouts, and straight out cheating by a few administrators. (see Dan Koretz, Measuring Up).

What Donald Campbell had said in 1976 about “highly corruptible indicators” applies not only in education but also to many different institutions.

So why do good policy makers use bad indicators? The answer is that numbers are highly prized in the culture because they are easy to grasp and use in making decisions. The simpler the number (wins/losses, products sold, profits made, test scores), the easier to judge worth. When numbers have high stakes attached to them, they then become incentives (either as a carrot or a stick) to make the numbers look good. And that is where indicators turn bad as sour milk whose expiration date has long passed.

The best policymakers, not merely good ones, know that multiple measures for a worthy goal reduce the possibility of reporting false performance.

——————————————————————————–

*Steven Glazerman and Liz Potamites, “False Performance Gains: A Critique of Successive Cohort Indicators,” Working Paper, Mathematica Policy Research, December 2011, p. 13.

Nine Myths about Public Schools, by Gerald W. Bracey

January 9, 2012

GAERA Assessment Issues, Cheating, Federal Education Policies, Politics

Those of you who have been around me for more than five or ten minutes know that one of my favorite people in the world was Jerry Bracey, whom I knew for nigh on 30 years. For those of you unfamiliar with his work, he was the outspoken critic of poorly done educational research and the endless misinformation spewed forth by detractors of public education. A strong supporter of public education, he felt a duty to take to task both pundits and public figures when they wrote or spoke in error or misinterpreted data. He had no fear taking to task the powerful, and if, with his Stanford psychology Ph.D., he had chosen to keep his mouth shut he could have been the consummate public official or university professor. Instead, he chose to speak truth as he saw it. Education writers, such as the Post’s Jay Mathews and our AJC’s Maureen Downey, respected and liked him even though, and perhaps because, he never hesitated to skewer them when he thought it justified. Perhaps his best known writing appeared in the Kappan as his monthly research column and the annual Rotten Apples awards.

This piece seems germane to the blog given the still rampant misinformation and misdirection flying around about education during an election year, and likely during the legislative session. I do have a certain fondness for this one as it’s the only time he ever asked me for an edit before he published a piece (given my composition skillset falls far short of what his was). Jerry died less than a month after he wrote this. Probability approximates unity that I’ll suffer you more of his work from time to time.

Nine Myths about Public Schools

http://www.huffingtonpost.com/gerald-bracey/nine-myths-about-public-s_b_298664.html

September 25, 2009

None of this will likely strike you as particularly new, but it might be good to have a bunch of myths lined up and debunked all in one place.

1. The schools were to blame for letting the Russians get into space first. Granddaddy of all slanders and a great illustration of the absolute nuttiness with which people talk about education.

Sputnik, the first man-made satellite to orbit the earth, launched on October 4, 1957. On September 20, 1956, Wernher von Braun’s Army Ballistic Missile Agency launched a 4-stage Jupiter C rocket from Cape Canaveral. After the first 3 stages fired, the rocket was 832 miles in the air and traveling at 13,000 miles an hour. The 4th stage could have easily bumped something into orbit. The 4th stage was filled with sand. There were a number of reasons for this including the fact that the Eisenhower administration was determined to keep its weapons rocket program and its space exploration project separate and von Braun’s rocket was clearly a weapon. Its primary intent was to incinerate Russian cities with nuclear warheads. Ike worried how the Russians might react. His Assistant Defense Secretary Donald Quarles actually said “the Russians did us a favor” because they established the precedent that deep space was free and international.

Most US engineers in the space program in 1957 would have graduated high school in the 1930s, but in the media, the schools of the 1950s took the hit for Sputnik. Ike was quite puzzled by this.

2. Schools alone can close the achievement gap. This is codified in the disaster known as No Child Left Behind. Most of the differences come from family and community variables and many out-of-school factors, especially summer loss. Some studies have found that poor children enter school behind their middle class peers, learn as much during the year and then lose it over the summer. They fall farther and farther behind and schools are blamed. Middle class and affluent kids do not show summer loss.

3. Money doesn’t matter. Tell this to wealthy districts. Money clearly affects changes in achievement although levels of achievement are more influenced by the variables just mentioned. Most studies are short term and look only at test scores, a very foolish mistake. Economists David Card and Alan Krueger also found investments in school show a payoff in terms of long-term earnings of graduates.

4. The United States is losing its competitive edge. China and India ARE Rising. As economies collapsed all around it, China’s economy grew a remarkable 7% last year. On just humanitarian grounds, we should not wish China and India to remain poor forever, but the more they grow the more money they have to buy stuff from us. As China and India prosper, we prosper. The World Economic Forum and the Institute for Management Development have consistently ranked the U. S. economy as the most competitive in the world. Education is only one part of multi-factor systems in rankings. WEF is especially keen on innovation. Our obsession with testing makes testing a great instrument for destroying creativity.

5. The U. S. has a shortage of scientists, mathematicians and engineers. This was a myth started oddly enough by the National Science Foundation in the 1980s in a study with assumptions so absurd the study was never published, but the myth lingers on. In fact, Hal Salzman of the Urban Institute and Lindsay Lowell of Georgetown University found that we have three newly minted scientists and engineers who are permanent residents or native citizens for every newly minted job. Within 2 years, 65% of them were no longer in scientific or engineering fields. That proportion might have fallen during the current debacle when people are more likely to hang on to a job even if they hate it. An article in the September 18 Wall Street Journal reported that before the economy collapsed, 30% of the graduates of MIT–MIT–headed directly into finance.

6. Merit pay for teachers will improve performance. See Bebchuk & Fried, Pay Without Performance, and Adams, Heywood & Rothstein, Teachers, Performance Pay, and Accountability. Bonus pay is concentrated in finance, insurance, and real estate. In most of the private sector, performance is hard to determine, and bonus pay often leads to corruption and gaming the system. Campbell’s Law: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort the social processes it is intended to monitor.”

7. The fastest growing jobs are all high-tech and require postsecondary education. “Postsecondary education” is a weasel word. A majority of the fastest growing jobs do, in fact, require some kind of postsecondary training. But, according to the Bureau of Labor Statistics, they account for very few jobs. It’s the Walmarts and McDonald’s of America that generate the jobs. According to the BLS, the job of retail sales accounts for more jobs than the top ten fastest growing jobs combined.

8. Test scores are related to economic competitiveness. We do well on international comparisons of reading, pretty good on one international comparison of math and science, and not so good on another math/science comparison. But these comparisons are based on the countries’ average scores and average scores don’t mean much. The Organization for Economic Cooperation and Development, the producer of the math/science comparison in which we do worst, has pointed out that in science the U. S. has 25% of all the highest scoring students in the entire world, at least the world as defined by the 60 countries that participate in the tests. Finland might have the highest scores, but that only gives them 2,000 warm bodies compared to the U. S. figure of 67,000. It’s the high scorers who are most likely to become leaders and innovators. Only four nations have a higher proportion of researchers per 1000 fulltime employees: Sweden, Finland, New Zealand and Japan. Only Finland is much above the U. S.

Consider Japan, the economic juggernaut of the 1980’s. Its kids score well on tests and people made a causal link between scores and Japan’s economy. But Japan’s economy has been in the doldrums for almost a whole generation. Its kids still ace tests.

9. Education itself produces jobs. President Obama and Secretary of Education Duncan have both linked any economic recovery to school improvement. This is nonsense. There are parts of India where thousands of educated people compete for a single relatively low-level white-collar job. Some of you might recall that in the 1970’s many sociologists and commentators worried that America was becoming TOO educated, that workers would be bored by the work available.

Why educators (and others) behaved badly under NCLB

January 8, 2012

GAERA Cheating, Federal Education Policies

Maureen Downey (Get Schooled, www.ajc.com) was kind enough to post my ramblings on the Milgram research of the 60’s. I’d been bothered for a long time at the self-righteous responses of so many posters on Maureen’s various reports on the educators accused of cheating at the two investigated school systems in the state. Certainly those who transgressed will get their just due, but Milgram (and many others who replicated that research) showed quite conclusively that most of us when pressured by authority will in fact follow orders.

Here’s the piece:

This inferential statistician asks a probability question: Who among you thinks that two school systems in Georgia were the only ones in the nation that engaged in unauthorized test data manipulation (“cheating”) under NCLB?

I have watched the Georgia events unfold since questions arose about test results more than a decade ago. This saga has reminded me frequently of Stanley Milgram’s research in the 1960’s. An overview is at http://psychology.about.com/od/historyofpsychology/a/milgram.htm. Milgram wondered whether Adolf Eichmann could have “just” followed orders as he testified during his trial. In Milgram’s studies, participants readily administered what they were told were potentially lethal electric shocks to others after simply being told to do so. (The “recipients” actually just acted as if they received shock.) Numerous other studies have confirmed Milgram’s findings (a review of them was published by Thomas Blass in 1999). In his 1974 book Obedience to Authority, Milgram asked, “Could it be that Eichmann and his million accomplices in the Holocaust were just following orders? Could we call them all accomplices?” Generalizing his findings beyond questions about the Holocaust, he concluded that “ordinary people, simply doing their jobs, and without any particular hostility on their part, can become agents in a terrible destructive process. Moreover, even when the destructive effects of their work become patently clear, and they are asked to carry out actions incompatible with fundamental standards of morality, relatively few people have the resources needed to resist authority.”

Of course, the Holocaust was infinitely worse than any amount of student test results manipulation, yet if the Milgram study illustrates how readily so many will shock others, and if the Holocaust illustrates how readily so many will send others to their deaths, it’s not at all difficult to imagine that some educators might manipulate test scores if pressured by higher authorities. That’s not to say manipulating test scores (or shocking participants in an experiment) is excusable; it’s simply to suggest that current national accountability policy creates an environment in which we should not be surprised that some people behaved badly. Perhaps we should be surprised, pleased, and perhaps even awed that the vast majority remained steadfast to their core educational beliefs and focused on doing what they knew was best for their students.

Given we’re so incessantly disposed to finger pointing, who in relation to NCLB would you choose as the equivalent to Hitler and Eichmann? Far more importantly, how might you suggest the-beatings-will-continue-until-morale-improves-prone policymakers rethink education policy so that we might begin making public education better rather than continuing to tear it apart? Will “Race to the Top” correct the mistakes of NCLB or is it just working around the edges of the same underlying approach?

I find this lesson from Milgram’s later work of interest: When a peer, told privately to refuse to administer high shock, was “planted” in the room, almost all of the participants also refused to administer high shock. Unfortunately, teachers who objected to cheating or refused to cheat were frequently threatened, punished or fired, and others learned that lesson. Perhaps if teachers were treated as respected professionals rather than as serfs (and scapegoats), they might have been heard when they spoke and we never would have had the sad tragedy of Georgia’s cheating scandals. But then if teachers were treated as respected professionals, perhaps we would never have had the inexcusable travesty of NCLB in the first place.
