Corporate education reform science fiction is having an unintended(?) science fact effect.
First, the science.
If VAM scores are at all accurate, there ought to be a significant correlation between a teacher's score in one year and the next. In other words, good teachers should have somewhat consistently higher scores, and poor teachers ought to remain poor. He created a scatter plot that put the ratings from 2009 on one axis and the ratings from 2010 on the other. What should we expect here? If there is a correlation, we should see some sort of upward-sloping line.
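To make that check concrete, here is a minimal sketch of the exercise using made-up numbers, not the actual NYC ratings data: generate each teacher's rating in two consecutive years and compute the correlation. When the year-to-year noise is large relative to any real differences between teachers, the upward-sloping line never materializes.

```python
# Sketch of the year-over-year reliability check, with invented data.
# If VAM were a stable measure, a teacher's 2009 and 2010 ratings would
# correlate strongly; noisy scores give a shapeless cloud and a small r.
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 500
true_quality = rng.normal(size=n_teachers)        # hypothetical "real" effectiveness
noise_2009 = rng.normal(scale=2.0, size=n_teachers)
noise_2010 = rng.normal(scale=2.0, size=n_teachers)

vam_2009 = true_quality + noise_2009              # each year's rating = signal + noise
vam_2010 = true_quality + noise_2010

r = np.corrcoef(vam_2009, vam_2010)[0, 1]
print(f"year-over-year correlation: {r:.2f}")     # with noise this large, r stays low
```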
There is one huge takeaway from all this: VAM ratings are not an accurate reflection of a teacher's performance, even on the narrow indicators on which they focus. If an indicator is unreliable, it is a farce to call it "objective."
This travesty has the effect of discrediting the whole idea of using test score data to drive reform. What does it say about "reformers" when they are willing to base a large part of teacher and principal evaluations on such an indicator?
That travesty is now manifesting itself in real personal terms.
In 2009, 96 percent of their fifth graders were proficient in English, 89 percent in math. When the New York City Education Department released its numerical ratings recently, it seemed a sure bet that the P.S. 146 teachers would be at the very top.
Actually, they were near the very bottom.
[...]
Though 89 percent of P.S. 146 fifth graders were rated proficient in math in 2009, the year before, as fourth graders, 97 percent were rated as proficient. This resulted in the worst thing that can happen to a teacher in America today: negative value was added.
The difference between 89 percent and 97 percent proficiency at P.S. 146 is the result of three children scoring a 2 out of 4 instead of a 3 out of 4.
While Ms. Allanbrook does not believe in lots of test prep, her fourth-grade teachers do more of it than the rest of the school.
In New York City, fourth-grade test results can determine where a child will go to middle school. Fifth-grade scores have never mattered much, so teachers have been free to focus on project-based learning. While that may be good for a child’s intellectual development, it is hard on a teacher’s value-added score.
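A back-of-the-envelope check shows just how little it takes to produce that swing. The cohort size below is an assumption inferred from the quoted percentages, not a figure reported in the article, but it illustrates how three children moving from a 3 to a 2 turns a 97 into an 89.

```python
# Rough check of the swing described above; cohort size is assumed.
cohort = 38                                   # hypothetical number of fifth graders
proficient_before = round(0.97 * cohort)      # ~37 of 38 proficient as fourth graders
proficient_after = proficient_before - 3      # three students slip from a 3 to a 2

print(proficient_before / cohort)  # ~0.97
print(proficient_after / cohort)   # ~0.89
```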
These teachers are not the only ones.
Bill Turque tells the story of teacher Sarah Wysocki, who was let go by D.C. public schools because her students got low standardized test scores, even though she received stellar personal evaluations as a teacher.
She was evaluated under the D.C. teacher evaluation system, called IMPACT, a so-called “value-added” method of assessing teachers that uses complicated mathematical formulas purporting to tell how much “value” a teacher adds to a student's learning.
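For readers unfamiliar with the mechanics, here is a rough sketch of the general value-added idea. This is a generic illustration with invented data, not D.C.'s actual IMPACT formula: predict each student's score from the prior year's score, then average the prediction errors for a teacher's students and call that the teacher's "value added." With only a classroom's worth of students, that average is dominated by noise.

```python
# Generic illustration of a value-added calculation: regress current
# scores on prior scores, then average each teacher's residuals.
# This is NOT the IMPACT formula; it is a simplified stand-in.
import numpy as np

rng = np.random.default_rng(1)

n_students = 2500
prior = rng.normal(70, 10, n_students)                  # prior-year test scores
current = 0.8 * prior + rng.normal(14, 8, n_students)   # current = mostly prior + noise
teacher = rng.integers(0, 100, n_students)              # 100 teachers, ~25 students each

# Fit the prediction line (ordinary least squares on prior score).
slope, intercept = np.polyfit(prior, current, 1)
residual = current - (slope * prior + intercept)

# A teacher's "value added" is the mean residual of their students.
value_added = np.array([residual[teacher == t].mean() for t in range(100)])
print(value_added.round(2))   # spreads widely even though no teacher effect was simulated
```

In this simulation no teacher effect exists at all, yet the per-teacher averages still spread out, which is exactly how a teacher with stellar evaluations can land at the bottom of such a ranking.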
As more of this data is demanded, more analysis can be done to demonstrate how unreliable it is for these purposes, and consequently we are guaranteed to read more stories of good teachers becoming victims of bad measurements. It's unfortunate that we will have to go through all of this to arrive at that understanding.