It might be becoming apparent to any rational observer that high-stakes corporate education policies are failing catastrophically. Where various data and tests were once used to inform educators and provide diagnostic feedback, they are increasingly being used to rank, grade, and even punish.
This is leading to the behaviors that inevitably appear whenever such systems are created, from Enron in the energy sector and the accounting scandals at Tyco International, Adelphia, Peregrine Systems and WorldCom to the more recent scandals involving Lehman Brothers, JPMorgan and Barclays.
Here's another example, in news from Pennsylvania:
After authorities imposed unprecedented security measures on the 2012 statewide exams, test scores tumbled across Pennsylvania, The Inquirer has learned.
At some schools, Pennsylvania Secretary of Education Ronald Tomalis said, the drops are "noticeable" - 25 percent or more.
In some school systems, investigators have found evidence of outright doctoring of previous years' tests - and systemic fraud that took place across multiple grades and subjects.
In Philadelphia and elsewhere, some educators have already confessed to cheating, and investigators have found violations ranging from "overcoaching" to pausing a test to reteach material covered in the exam, according to people familiar with the investigations.
When trillions of dollars of the world's money are at stake, investing in tight oversight and regulation is imperative. But when it comes to evaluating the progress of a third grader, do we really want to spend valuable education dollars measuring the measurers?
The question becomes even more pertinent when one considers that the efficacy of many of the measures is questionable at best. Article after article, study after study, raises significant questions for value-added proponents, and now a new study raises questions about the tests themselves:
Now, in studies that threaten to shake the foundation of high-stakes test-based accountability, Mr. Stroup and two other researchers said they believe they have found the reason: a glitch embedded in the DNA of the state exams that, as a result of a statistical method used to assemble them, suggests they are virtually useless at measuring the effects of classroom instruction.
Pearson, which has a five-year, $468 million contract to create the state’s tests through 2015, uses “item response theory” to devise standardized exams, as other testing companies do. Using I.R.T., developers select questions based on a model that correlates students’ ability with the probability that they will get a question right.
That produces a test that Mr. Stroup said is more sensitive to how it ranks students than to measuring what they have learned. That design flaw also explains why Richardson students’ scores on the previous year’s TAKS test were a better predictor of performance on the next year’s TAKS test than the benchmark exams were, he said. The benchmark exams were developed by the district, the TAKS by the testing company.
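For readers unfamiliar with item response theory, the idea of a model that links a student's ability to the probability of a correct answer can be sketched as follows. This is only an illustration using the common two-parameter logistic form; it is not the model Pearson or the researchers actually used, and the function name and parameter names are hypothetical.

```python
import math

def prob_correct(ability, difficulty, discrimination=1.0):
    """Illustrative two-parameter logistic item response function.

    ability, difficulty and discrimination are hypothetical names for
    the usual IRT parameters; they do not come from the quoted article.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A student whose ability equals the item's difficulty has an even chance.
print(prob_correct(0.0, 0.0))  # prints 0.5

# On the same item, a higher-ability student is more likely to answer correctly.
print(prob_correct(1.0, 0.0) > prob_correct(-1.0, 0.0))  # prints True
```

In a model of this kind, a larger discrimination value makes the probability change more sharply around the item's difficulty, so such an item separates students just above that level from students just below it more cleanly.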
We have built a high-stakes system on questionable tests, measured using questionable statistical models, subject to gaming and cheating, and further goosed by the scrubbing of other student data. We've seen widespread evidence of it in New York, California, Washington, D.C., Georgia, Tennessee, Pennsylvania, and now Ohio.
Policymakers are going to have to either spend more and more money developing better tests, better models, tighter security and more bureaucratic data-handling policies, or return to thinking about the core mission of providing a quality education to all students. Either way, when you have reached the point where the State Superintendent talks of criminalizing the corporate education system, things have obviously gone seriously awry.
State Superintendent Stan Heffner, who leads the department, has launched his own investigation and has said the probe could lead to criminal charges against educators who committed fraud.