There is a growing body of research demonstrating that "value-added" measures (VAM) are simply unreliable as a stand-alone measure of teacher effectiveness. When the legislature inserted language into HB 555, with no hearings, public input, or news coverage, to eliminate the possibility of using multiple measures of student performance for teachers with value-added scores, it moved in a direction utterly lacking in scientific support. The new language calls for teachers to be evaluated by a methodology that, by its very design, cannot measure the true quality of the interaction between teacher and students in the classroom. This has serious implications for students and teachers alike.
The Governor is advocating for expanded use of student test scores not only for teacher evaluation, but also for decisions involving teacher hiring, layoffs and pay. There simply is no credible expert testimony that supports such a move. Value-added measures are influenced by far too many variables beyond the control of the teacher to be used in such high-stakes decisions.
In other parts of the country where similar evaluation systems have been implemented, stories of great teachers who were branded as ineffective because of aberrations in student test data abound. (See, for example, the story of New York City 8th grade math teacher Carolyn Abbott or Washington, DC, 5th grade teacher Sarah Wysocki.) This isn't just a theoretical policy debate. Decisions made by our elected officials have real human consequences.
What follows is a summary of the current scientific knowledge about the use of VAM in teacher evaluations.
Value Added in Evaluation
Many policy makers are enthusiastic about using value-added measures (VAM) for teacher evaluation, and many states have incorporated them into their evaluation systems. Their use, however, is problematic because of concerns about accuracy, fairness, and the potentially harmful incentives they create for teachers and students.
VAM has serious limitations in determining teacher effectiveness
A teacher can be ranked in the top quartile one year and sink to the middle, or even the bottom, the next, independent of any change in their own instructional practice.
A paper written for the Carnegie Knowledge Network examining this issue cited a study finding that only half of the teachers in the top fifth of performance remained there the following year, while 20% of them fell to the lowest two quintiles. This defies reason: how could one fifth of the teachers identified as top performers in one year rank among the worst the next?
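This kind of churn is exactly what statistics predicts when a noisy measure is used to rank people. The simulation below is an illustrative sketch, not a model of any actual VAM: teacher count, noise level, and the assumption that each teacher's "true" effect is perfectly stable are all hypothetical. Even so, measurement noise alone reproduces roughly the instability the Carnegie paper describes.

```python
import random

random.seed(42)
N = 1000  # hypothetical pool of teachers

# Each teacher gets a stable "true" effect; a single-year score adds
# measurement noise. Making the noise as large as the spread of true
# effects is an illustrative assumption, not an estimate.
true_effect = [random.gauss(0, 1) for _ in range(N)]

def yearly_scores():
    return [t + random.gauss(0, 1) for t in true_effect]

def quintiles(scores):
    """Map each teacher to a quintile: 0 = bottom fifth, 4 = top fifth."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    q = [0] * len(scores)
    for rank, idx in enumerate(order):
        q[idx] = rank * 5 // len(scores)
    return q

q1 = quintiles(yearly_scores())  # year 1 rankings
q2 = quintiles(yearly_scores())  # year 2 rankings, same true effects

top_year1 = [i for i in range(N) if q1[i] == 4]
stay = sum(1 for i in top_year1 if q2[i] == 4) / len(top_year1)
fall = sum(1 for i in top_year1 if q2[i] <= 1) / len(top_year1)
print(f"Top-fifth teachers still in the top fifth next year: {stay:.0%}")
print(f"Top-fifth teachers fallen to the bottom two fifths: {fall:.0%}")
```

No teacher's underlying quality changes between the two simulated years; the reshuffling comes entirely from the noise term.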
There are many reasons for this: VAM doesn't account for school effects, students don't grow at the linear pace the models assume, students aren't randomly assigned to classrooms, and VAM appears to be less accurate for teachers of students with limited English proficiency. According to a RAND Corporation study, VAM scores also varied depending on which test was used.
Many Researchers Caution Against Use of VAM in Teacher Evaluations as a Sole Measure
The Brookings Institution supports the use of VAM but cautions that the error ranges in measurement are so wide that one cannot draw precise distinctions between levels of teacher effectiveness. The RAND study mentioned above made a similar recommendation.
Jesse Rothstein of UC Berkeley found that the non-random assignment of students causes the models to credit a teacher with student growth that occurred in the year before those students were in the teacher's class, an impossibility that indicates bias in the model.
A synthesis of available research conducted by Marzano found that teachers account for only about 13 percent of the variance in student achievement.
Student variables (including home environment, student motivation, and prior knowledge) account for 80 percent of the variance. VAM does not necessarily isolate the teacher’s contribution to student achievement growth.
Eric Hanushek, on whom the Ohio General Assembly relies for policy advice, also cautions against over-reliance on value added for high-stakes decisions about teachers:
“The bigger set of issues, however, relates to the use of teacher value-added estimates in compensation, employment, promotion, or assignment decisions. The possibility of introducing performance pay based on value-added estimates motivates much of the prior analysis of the properties of these estimates, but movement in this direction has so far been limited.”

“Despite the strength of the research findings, concerns about accuracy, fairness, and potential adverse effects of incentives based on a limited set of outcomes raise worries about the use of value added estimates in education personnel and policy decisions. Many of the possible drawbacks are related to the measurement and estimation issues discussed above, but there are also concerns about incentives to cheat, adopt teaching methods that teach narrowly to tests, and ignore non-tested subjects.”
And…
“Although researchers can mitigate the effects of sampling error on estimates of teacher quality, such error would inevitably lead some successful teachers to receive low ratings and some unsuccessful teachers to receive high ratings.”
And, finally, it may have an adverse effect on students:
“In terms of fairness, any failure to account for sorting on unobservable characteristics would potentially penalize teachers given … more difficult classrooms and reward teachers given … less difficult classrooms. This could discourage educationally beneficial decisions including the assignment of more difficult or disruptive students to higher quality teachers.”
Hanushek suggests that these problems could be mitigated by combining value-added measures with subjective observations. Hanushek's paper may be found here.
HB 555 Magnifies the Problematic Nature of Over-reliance on VAM to Evaluate Teachers
HB 156 and SB 316 set forth the framework for the Ohio Teacher Evaluation System (OTES), requiring that student achievement growth account for 50% of a teacher's evaluation. The law mandated that VAM, when available, be part of the student growth calculation but did not specify to what degree. The Ohio Department of Education, in creating the OTES framework, mandated that student growth be calculated using multiple measures and that VAM, when available, account for at least 10% of the whole evaluation. Presumably ODE constructed the model this way in recognition of VAM's limitations as a primary determinant of teacher effectiveness.
HB 555 changes the framework to require that, when VAM is available for a teacher, it be weighted in proportion to the share of the teacher's schedule devoted to VAM-covered subjects. In other words, a middle school math teacher who teaches a full day of 7th and 8th grade math would have the 50% growth measure determined solely by VAM.
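The arithmetic of the proportionality rule can be made concrete. The sketch below assumes a six-period day for illustration; the function and period counts are hypothetical, but the calculation follows the rule as described above.

```python
GROWTH_SHARE = 0.50  # student-growth portion of the total evaluation

def vam_share_of_evaluation(vam_periods, total_periods):
    """Fraction of the *total* evaluation determined by VAM alone
    under the HB 555 proportionality rule (illustrative sketch)."""
    return GROWTH_SHARE * (vam_periods / total_periods)

# Full-day 7th/8th grade math teacher: all 6 of 6 periods are in
# VAM-covered subjects, so the entire 50% growth measure is VAM.
print(vam_share_of_evaluation(6, 6))

# A teacher with only 2 of 6 periods in VAM-covered subjects.
print(vam_share_of_evaluation(2, 6))
```

For the full-day math teacher the VAM share hits the 50% ceiling, which is the scenario the paragraph above describes.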
The OTES model has an embedded bias that overvalues student growth. For instance, a teacher with a poor student growth measure can be rated no higher than “Developing” (the second-lowest category), no matter how the evaluator rated the teacher's classroom performance. (See figure below.)
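The capping behavior can be expressed as a simple lookup. The cell values below are illustrative placeholders, not the official OTES matrix; only the property described above is assumed: a growth measure in the lowest band caps the final rating at "Developing" regardless of the observation rating.

```python
# Hypothetical rating matrix: (observation rating, growth band) -> final.
# Illustrative values only; the one assumed property is that a "Below"
# growth band never yields a final rating above "Developing".
MATRIX = {
    ("Accomplished", "Above"):   "Accomplished",
    ("Accomplished", "Average"): "Skilled",
    ("Accomplished", "Below"):   "Developing",
    ("Skilled",      "Above"):   "Skilled",
    ("Skilled",      "Average"): "Skilled",
    ("Skilled",      "Below"):   "Developing",
    ("Developing",   "Above"):   "Developing",
    ("Developing",   "Average"): "Developing",
    ("Developing",   "Below"):   "Ineffective",
    ("Ineffective",  "Above"):   "Developing",
    ("Ineffective",  "Average"): "Ineffective",
    ("Ineffective",  "Below"):   "Ineffective",
}

# A teacher the observer rated top-tier, but with a low growth score:
print(MATRIX[("Accomplished", "Below")])
```

The asymmetry is the point: a top observation rating cannot rescue a low growth score, but a low observation rating readily drags down a high one.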
Because the OTES teacher rating matrix overvalues student growth, HB 555 magnifies the random errors in VAM arising from selection bias, non-school factors, and the effects of other teachers and of the school itself, all of which are beyond the teacher's control. When VAM constitutes a full 50% of a teacher's evaluation and is weighted so heavily that it essentially trumps any rating from classroom observations, the inevitable errors will cause teachers to be unfairly placed in the lowest two categories, putting them at risk of dismissal or of being first in line for layoff through reduction in force.
Simply put, we do not believe that an element of randomness should determine a teacher's career risk.
Using VAM to De-Select Teachers May Have Adverse School and Labor Market Effects
If teachers believe that their VAM score can cost them their jobs, they will be much more likely to hoard information and teaching methods from their colleagues. They will also resist the assignment of difficult students to their classes, believing that the very students who need the most help may bring them adverse career consequences.
If teachers are being asked to assume a greater amount of career risk without a commensurate rise in pay, it is less than clear that there will be a willing pool of candidates waiting to fill positions of deselected teachers. This is especially problematic in the mathematics field, where there are already shortages of willing and qualified candidates. This situation will likely be exacerbated if teachers believe that the evaluation system is inherently unfair.
There will likely be an adverse effect on students as well. Schools and teachers will choose to narrow the curriculum and in-class instruction to only that which will be tested. Such narrowing of the curriculum will strip away the enjoyable aspects of school from students’ lives.
Alternatives to the Current System
This is not to say that there is no place for VAM in a comprehensive teacher evaluation system. There are alternatives to the current system in which VAM is a prominent part of the teacher evaluation but not the primary determinant of quality, leaving a sufficient margin for error.
Several states have a student growth component lower than 50%; DC's IMPACT system (the prototypical model for OTES) was recently revised to reduce the role of VAM in response to concerns about its accuracy.
Teacher resistance to VAM is not monolithic: teachers are much less likely to resist OTES if VAM is a much smaller component than the currently mandated level. Furthermore, there is evidence that multiple observations and VAM can work in concert to successfully identify both top performers and laggards.
Policy Recommendations
- Reverse the VAM requirement put forth in HB 555
- Reduce the overall proportion of student growth required in the teacher evaluation
- Maintain flexibility to refine the evaluation system as needed, since it is mostly new and unproven
- Systematically solicit and incorporate large-scale teacher input; efforts in this area have been inadequate at best
Some Value Added Research Resources from ASCD:
Using Value-Added Measures to Evaluate Teachers
Use Caution with Value-Added Measures