The science fiction of corporate education reform is having an unintended(?) science-fact effect.

If VAM scores are at all accurate, there ought to be a significant correlation between a teacher's score in one year and the next. In other words, good teachers should have somewhat consistently higher scores, and poor teachers ought to remain poor. He created a scatter plot that put the ratings from 2009 on one axis and the ratings from 2010 on the other. What should we expect here? If there is a correlation, we should see the points cluster around some sort of upward-sloping line.
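The check described above comes down to one number: the Pearson correlation between each teacher's rating in one year and that same teacher's rating the next. Below is a minimal Python sketch, assuming the two years of ratings are already matched teacher by teacher. The function name `year_to_year_correlation` and the five sample ratings are hypothetical, invented for illustration rather than taken from the New York City data.

```python
from math import sqrt


def year_to_year_correlation(ratings_2009, ratings_2010):
    """Pearson correlation between two years of ratings for the same teachers.

    Assumes ratings_2009[i] and ratings_2010[i] belong to the same teacher.
    """
    if len(ratings_2009) != len(ratings_2010) or len(ratings_2009) < 2:
        raise ValueError("need two equal-length lists with at least two teachers")
    n = len(ratings_2009)
    mean_x = sum(ratings_2009) / n
    mean_y = sum(ratings_2010) / n
    # How the two years move together, and how much each year varies on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ratings_2009, ratings_2010))
    var_x = sum((x - mean_x) ** 2 for x in ratings_2009)
    var_y = sum((y - mean_y) ** 2 for y in ratings_2010)
    if var_x == 0 or var_y == 0:
        raise ValueError("ratings must vary within each year")
    return cov / sqrt(var_x * var_y)


# Hypothetical percentile ratings for five teachers (illustrative only).
stable_2009 = [10, 30, 50, 70, 90]
stable_2010 = [20, 35, 45, 75, 85]   # teachers keep roughly the same rank
erratic_2010 = [60, 20, 90, 30, 55]  # ranks reshuffled from one year to the next

print(round(year_to_year_correlation(stable_2009, stable_2010), 2))   # 0.98
print(round(year_to_year_correlation(stable_2009, erratic_2010), 2))  # 0.0
```

Points hugging an upward-sloping line on the scatter plot correspond to a correlation near 1; a shapeless cloud corresponds to one near 0. On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.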

There is one huge takeaway from all this: VAM ratings are not an accurate reflection of a teacher's performance, even on the narrow indicators they focus on. If an indicator is unreliable, it is a farce to call it "objective."

This travesty has the effect of discrediting the whole idea of using test score data to drive reform. What does it say about "reformers" when they are willing to base a large part of teacher and principal evaluations on such an indicator?

That travesty is now playing out in real, personal terms.

In 2009, 96 percent of their fifth graders were proficient in English, 89 percent in math. When the New York City Education Department released its numerical ratings recently, it seemed a sure bet that the P.S. 146 teachers would be at the very top.

Actually, they were near the very bottom.
[...]
Though 89 percent of P.S. 146 fifth graders were rated proficient in math in 2009, the year before, as fourth graders, 97 percent were rated as proficient. This resulted in the worst thing that can happen to a teacher in America today: negative value was added.

The difference between 89 percent and 97 percent proficiency at P.S. 146 is the result of three children scoring a 2 out of 4 instead of a 3 out of 4.

While Ms. Allanbrook does not believe in lots of test prep, her fourth-grade teachers do more of it than the rest of the school.

In New York City, fourth-grade test results can determine where a child will go to middle school. Fifth-grade scores have never mattered much, so teachers have been free to focus on project-based learning. While that may be good for a child’s intellectual development, it is hard on a teacher’s value-added score.
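The three-children figure quoted above is easy to check. In a grade of n students, each child moves the proficiency rate by 100/n percentage points, so an eight-point swing caused by three children implies a small grade. The excerpt does not give the cohort size, so the Python sketch below, built around a hypothetical helper `consistent_cohort_sizes`, simply searches for the sizes consistent with both rounded percentages. It assumes the same number of students was tested in both years and that rates are rounded to whole percents.

```python
def consistent_cohort_sizes(before_pct, after_pct, children_changed, max_size=200):
    """Return cohort sizes n for which `children_changed` fewer proficient
    students turn a rounded rate of `before_pct` into `after_pct`.

    Assumes the same number of students was tested in both years and that
    rates are rounded to whole percentages.
    """
    sizes = []
    for n in range(children_changed, max_size + 1):
        # k = number of proficient students in the earlier year.
        for k in range(children_changed, n + 1):
            before = round(100 * k / n)
            after = round(100 * (k - children_changed) / n)
            if before == before_pct and after == after_pct:
                sizes.append(n)
                break
    return sizes


sizes = consistent_cohort_sizes(97, 89, 3)
print(sizes)  # [35, 36, 37, 38]
for n in sizes:
    # Percentage points that a single child is worth in a cohort of n.
    print(n, round(100 / n, 1))
```

Under those assumptions only grades of 35 to 38 students fit, so each child is worth roughly 2.6 to 2.9 percentage points, which is why a handful of children can move a school's rate by several points.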

These teachers are not the only ones.

Bill Turque tells the story of teacher Sarah Wysocki, who was let go by D.C. public schools because her students got low standardized test scores, even though she received stellar personal evaluations as a teacher.

She was evaluated under the D.C. teacher evaluation system, called IMPACT, a so-called “value-added” method of assessing teachers that uses complicated mathematical formulas that purport to tell how much “value” a teacher adds to how much a student learns.
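For readers wondering what such formulas do, here is a deliberately simplified Python sketch of the general value-added idea: predict each student's score from the prior year's score, then credit each teacher with the average amount by which their students beat or missed that prediction. This toy model, including the names `fit_line` and `toy_value_added` and the sample scores, is hypothetical; it is not IMPACT's formula or New York City's, both of which are considerably more elaborate.

```python
from collections import defaultdict


def fit_line(prior, current):
    """Ordinary least-squares fit of current = intercept + slope * prior."""
    n = len(prior)
    mean_p = sum(prior) / n
    mean_c = sum(current) / n
    sxx = sum((p - mean_p) ** 2 for p in prior)
    sxy = sum((p - mean_p) * (c - mean_c) for p, c in zip(prior, current))
    slope = sxy / sxx
    return mean_c - slope * mean_p, slope


def toy_value_added(students):
    """Average residual per teacher under a one-predictor regression.

    `students` is a list of (teacher, prior_score, current_score) tuples.
    Positive means the teacher's students beat the prediction made from
    their prior-year scores; negative means they fell short of it.
    """
    prior = [p for _, p, _ in students]
    current = [c for _, _, c in students]
    intercept, slope = fit_line(prior, current)
    residuals = defaultdict(list)
    for teacher, p, c in students:
        predicted = intercept + slope * p
        residuals[teacher].append(c - predicted)
    return {t: sum(r) / len(r) for t, r in residuals.items()}


# Hypothetical scores on a 1-4 scale (illustrative, not real data).
students = [
    ("A", 4, 4), ("A", 4, 3), ("A", 4, 3), ("A", 4, 3),  # top scorers, most slip to 3
    ("B", 1, 2), ("B", 2, 3), ("B", 2, 3), ("B", 3, 3),  # lower scorers, most move up
]
print(toy_value_added(students))
```

With these made-up numbers, teacher A comes out slightly negative and teacher B slightly positive, even though all of A's students remain at level 3 or above. A's students started at the top of the scale, so any slip counts against the teacher.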

As more data is demanded, more analysis can be done to show how unreliable it is for these purposes, and we are guaranteed to read more stories of good teachers becoming victims of bad measurements. It's unfortunate that we will have to go through all this to arrive at that understanding.