Shame, errors, and demoralization: just some of the rhetoric that has emerged since the NYT and other publications went ahead and published teacher-level value-added scores. A great number of articles have been written decrying the move.
Perhaps most surprising of all was Bill Gates, in a piece titled "Shame Is Not the Solution". In it, Gates argues:
Putting sophisticated personnel systems in place is going to take a serious commitment. Those who believe we can do it on the cheap — by doing things like making individual teachers’ performance reports public — are underestimating the level of resources needed to spur real improvement.
[...]
Developing a systematic way to help teachers get better is the most powerful idea in education today. The surest way to weaken it is to twist it into a capricious exercise in public shaming. Let’s focus on creating a personnel system that truly helps teachers improve.
Following that, Matthew Di Carlo at the Shanker Institute took a deeper look at the data and the error margins inherent in using it:
[...]
This can be illustrated by taking a look at the categories that the city (and the Journal) uses to label teachers (or, in the case of the Times, schools).
Here’s how teachers are rated: low (0-4th percentile); below average (5-24); average (25-74); above average (75-94); and high (95-99).
To understand the rocky relationship between value-added margins of error and these categories, first take a look at the Times’ “sample graph” below.
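The overlap Di Carlo describes is easy to demonstrate. Here's a minimal sketch in Python, using a hypothetical point estimate and margin of error (the published margins vary by teacher and subject), of how a single confidence interval can straddle several of the rating bands quoted above:

```python
def categories_spanned(point, margin):
    """Return every rating band the interval [point - margin, point + margin] touches."""
    bands = [
        ("low", 0, 4),
        ("below average", 5, 24),
        ("average", 25, 74),
        ("above average", 75, 94),
        ("high", 95, 99),
    ]
    lo, hi = max(0, point - margin), min(99, point + margin)
    return [name for name, start, end in bands if lo <= end and hi >= start]

# A hypothetical teacher estimated at the 70th percentile with a
# +/-30-point margin is consistent with three of the five labels:
print(categories_spanned(70, 30))  # ['average', 'above average', 'high']
```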
That level of error in each measurement renders the teacher grades virtually useless. But that was just the start of the problems, as David Cohen notes in a piece titled "Big Apple’s Rotten Ratings":
First of all, as I’ve repeated every chance I get, the three leading professional organizations for educational research and measurement (AERA, NCME, APA) agree that you cannot draw valid inferences about teaching from a test that was designed and validated to measure learning; they are not the same thing. No one using value-added measurement EVER has an answer for that.
Then, I thought of a set of objections that had already been articulated on DiCarlo’s blog by a commenter. Harris Zwerling called for answers to the following questions if we’re to believe in value-added ratings:
1. Does the VAM used to calculate the results plausibly meet its required assumptions? Did the contractor test this? (See Harris, Sass, and Semykina, “Value-Added Models and the Measurement of Teacher Productivity” Calder Working Paper No. 54.)
2. Was the VAM properly specified? (e.g., Did the VAM control for summer learning, tutoring, test for various interactions, e.g., between class size and behavioral disabilities?)
3. What specification tests were performed? How did they affect the categorization of teachers as effective or ineffective?
4. How was missing data handled?
5. How did the contractors handle team teaching or other forms of joint teaching for the purposes of attributing the test score results?
6. Did they use appropriate statistical methods to analyze the test scores? (For example, did the VAM provider use regression techniques if the math and reading tests were not plausibly scored at an interval level?)
7. When referring back to the original tests, particularly ELA, does the range of teacher effects detected cover an educationally meaningful range of test performance?
8. To what degree would the test results differ if different outcome tests were used?
9. Did the VAM provider test for sorting bias?
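None of these questions can be answered from the published scores alone, but even a toy model shows how much is riding on them. Here is a deliberately minimal sketch (not the city's actual model, which controls for many more covariates) of the simplest kind of value-added regression, current score on prior score plus teacher indicators, run on simulated data; every number is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, per_class = 20, 25
n = n_teachers * per_class

teacher = np.repeat(np.arange(n_teachers), per_class)
true_effect = rng.normal(0, 3, n_teachers)     # the "real" teacher effects
prior = rng.normal(650, 40, n)                 # last year's scale score
noise = rng.normal(0, 25, n)                   # test noise dwarfs the effects
score = 0.8 * prior + true_effect[teacher] + noise

# Design matrix: intercept, prior score, teacher dummies (first dropped).
X = np.column_stack([
    np.ones(n),
    prior,
    (teacher[:, None] == np.arange(1, n_teachers)).astype(float),
])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
estimated = np.concatenate([[0.0], beta[2:]])  # effects relative to teacher 0

# With only ~25 students per teacher, the estimates line up only loosely
# with the truth -- and this toy ignores every complication raised in
# questions 1-9 (missing data, team teaching, summer learning, ...).
print(np.corrcoef(true_effect, estimated)[0, 1])
```

Even in this best case, where the model is correctly specified and nothing is missing, noisy tests and small classes keep the estimated effects from tracking the true ones closely; the real reports face all of Zwerling's complications on top of that.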
Today, education historian Diane Ravitch published a piece titled "How to Demoralize Teachers", which draws all these problems together to highlight how counterproductive the effort is becoming:
[...]
Interesting that teaching is the only profession where job ratings, no matter how inaccurate, are published in the news media. Will we soon see similar evaluations of police officers and firefighters, legislators and reporters? Interesting, too, that no other nation does this to its teachers. Of course, when teachers are graded on a curve, 50 percent will be in the bottom half, and 25 percent in the bottom quartile.
Is this just another ploy to undermine public confidence in public education?
It's hard not to conclude that, for some, that might very well be the goal.