Measuring

High-stakes failure

It should by now be apparent to any rational observer that high-stakes corporate education policies are failing catastrophically. Where data and tests were once used to inform educators and provide diagnostic feedback, they are increasingly being used to rank, grade, and even punish.

This is producing the perverse behaviors that inevitably arise when such systems are created - whether at energy companies such as Enron, in the accounting scandals at Tyco International, Adelphia, Peregrine Systems, and WorldCom, or in the more recent scandals involving Lehman Brothers, JPM, and Barclays bank.

Here's another example, in news from Pennsylvania:

After authorities imposed unprecedented security measures on the 2012 statewide exams, test scores tumbled across Pennsylvania, The Inquirer has learned.

At some schools, Pennsylvania Secretary of Education Ronald Tomalis said, the drops are "noticeable" - 25 percent or more.

In some school systems, investigators have found evidence of outright doctoring of previous years' tests - and systemic fraud that took place across multiple grades and subjects.

In Philadelphia and elsewhere, some educators have already confessed to cheating, and investigators have found violations ranging from "overcoaching" to pausing a test to reteach material covered in the exam, according to people familiar with the investigations.

When trillions of dollars of the world's money is at stake, investing in tight oversight and regulation is imperative, but when it comes to evaluating the progress of a 3rd grader, do we really want to spend valuable education dollars measuring the measurers?

The question becomes even more pertinent when one considers that the efficacy of many of the measures is questionable at best. Article after article, study after study, raises significant questions about the claims of value-added proponents, and now a new study calls into question the tests themselves:

Now, in studies that threaten to shake the foundation of high-stakes test-based accountability, Mr. Stroup and two other researchers said they believe they have found the reason: a glitch embedded in the DNA of the state exams that, as a result of a statistical method used to assemble them, suggests they are virtually useless at measuring the effects of classroom instruction.

Pearson, which has a five-year, $468 million contract to create the state’s tests through 2015, uses “item response theory” to devise standardized exams, as other testing companies do. Using I.R.T., developers select questions based on a model that correlates students’ ability with the probability that they will get a question right.

That produces a test that Mr. Stroup said is more sensitive to how it ranks students than to measuring what they have learned. That design flaw also explains why Richardson students’ scores on the previous year’s TAKS test were a better predictor of performance on the next year’s TAKS test than the benchmark exams were, he said. The benchmark exams were developed by the district, the TAKS by the testing company.
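To see why an I.R.T.-built test can end up better at ranking students than at detecting instruction, consider how the model works. The sketch below uses the common two-parameter logistic form of item response theory; this is an illustration of the general technique, not Pearson's actual implementation, and the function names are my own:

```python
import math

def p_correct(ability, difficulty, discrimination=1.0):
    """2PL item response model: probability that a student with the
    given ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def item_information(ability, difficulty, discrimination=1.0):
    """How much an item tells us about a student at a given ability
    level. It peaks where ability equals difficulty, i.e. where the
    item best separates students from one another."""
    p = p_correct(ability, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)
```

Because test assemblers favor items with high information - items that spread students out along the ability scale - the resulting exam is, by construction, tuned to discriminate among students. That is exactly the rank-sensitivity Mr. Stroup describes; whether it also captures what a particular year of classroom instruction added is a separate question the model does not answer.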

We have built a high stakes system on questionable tests, measured using questionable statistical models, subject to gaming and cheating, and further goosed by the scrubbing of other student data. We've seen widespread evidence of it in New York, California, Washington DC, Georgia, Tennessee, Pennsylvania, and now Ohio.

Policymakers will either have to spend more and more money developing better tests, better models, tighter security, and more bureaucratic data-handling policies, or return to thinking about the core mission of providing a quality education to all students. Either way, when the State Superintendent is talking about criminal charges within the corporate education system, things have obviously gone seriously awry.

State Superintendent Stan Heffner, who leads the department, has launched his own investigation and has said the probe could lead to criminal charges against educators who committed fraud.

National Research Council Gives High-Stakes Testing an F

The long experiment with incentives and test-based accountability has so far failed to boost student achievement.

That’s the conclusion of a comprehensive examination of education research by the National Research Council, an arm of the National Academies of Science.

“The available evidence does not give strong support for the use of test-based incentives to improve education,” the NRC concluded. The benefits of these incentives, the group said, have been “small or nonexistent.”

The NRC report is the latest of a long series of research summaries by eminent, mainstream test experts concluding that there is no scientific basis for the current heavy reliance on high-stakes tests for measuring student achievement, teacher quality, and school performance.

The full report can be read here.

[readon2 url="http://neatoday.org/2011/07/18/national-research-council-gives-high-stakes-testing-an-f/"]Continue reading...[/readon2]