
Do Different Value-Added Models Tell Us the Same Things?


Highlights

  • Statistical models that evaluate teachers based on growth in student achievement differ in how they account for student backgrounds and for school and classroom resources. They also differ in whether they compare teachers across a district (or state) or only within schools.
  • Statistical models that do not account for student background factors produce estimates of teacher quality that are highly correlated with estimates from value-added models that do control for student backgrounds, as long as each includes a measure of prior student achievement.
  • Even when correlations between models are high, different models will categorize many teachers differently.
  • Teachers of advantaged students benefit from models that do not control for student background factors, while teachers of disadvantaged students benefit from models that do.
  • The type of teacher comparisons, whether within or between schools, generally has a larger effect on teacher rankings than statistical adjustments for differences in student backgrounds across classrooms.

Introduction

There are good reasons for re-thinking teacher evaluation. Evaluation systems in most school districts are far from rigorous: a recent study showed that more than 99 percent of teachers in a number of districts were rated “satisfactory,” which does not comport with empirical evidence that teachers differ substantially from one another in their effectiveness. Likewise, the ratings do not reflect the assessment of the teacher workforce by administrators, other teachers, or students.

Evaluation systems that fail to recognize the true differences that we know exist among teachers greatly hamper the ability of school leaders and policymakers to make informed decisions about such matters as which teachers to hire, which teachers to help, which teachers to promote, and which teachers to dismiss. Thus it is encouraging that policymakers are developing more rigorous evaluation systems, many of which are partly based on student test scores.

Yet while the idea of using student test scores for teacher evaluations may be conceptually appealing, there is no universally accepted methodology for translating student growth into a measure of teacher performance. In this brief, we review what is known about how measures that use student growth align with one another, and what that agreement or disagreement might mean for policy.
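To make the disagreement between models concrete, here is a small, purely hypothetical simulation; the teacher counts, coefficients, and residual-averaging estimator are illustrative assumptions, not the specifications compared in the brief. It fits two simple value-added models to the same simulated classrooms, one controlling only for prior achievement and one adding a student-background measure, and counts how many teachers land in a different quartile even when the two sets of estimates correlate.

```python
# Hypothetical simulation: two simple value-added models on the same data.
import random
from statistics import mean

random.seed(7)
N_TEACHERS, CLASS_SIZE = 40, 25

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = [[1.0] + row for row in X]
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(A, b)

# Simulate: teachers differ in true effectiveness, and advantaged students
# (higher `ses`) are sorted toward some teachers' classrooms.
true_effect = [random.gauss(0, 2) for _ in range(N_TEACHERS)]
rows = []
for t in range(N_TEACHERS):
    ses_class = random.gauss(0, 1)            # classroom-level advantage
    for _ in range(CLASS_SIZE):
        ses = ses_class + random.gauss(0, 0.5)
        prior = 50 + 8 * ses + random.gauss(0, 6)
        post = 0.8 * prior + 3 * ses + true_effect[t] + random.gauss(0, 5)
        rows.append((t, prior, ses, post))

def teacher_effects(controls):
    """VAM sketch: regress post on controls, average residuals by teacher."""
    X = [[r[1]] if controls == "prior" else [r[1], r[2]] for r in rows]
    y = [r[3] for r in rows]
    beta = ols(X, y)
    resid = [yi - beta[0] - sum(b * x for b, x in zip(beta[1:], xi))
             for xi, yi in zip(X, y)]
    return [mean(resid[i] for i, r in enumerate(rows) if r[0] == t)
            for t in range(N_TEACHERS)]

m1 = teacher_effects("prior")          # prior score only
m2 = teacher_effects("prior+ses")      # prior score + background measure

def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def quartile(v, vals):
    return sum(v > w for w in vals) * 4 // len(vals)

r = corr(m1, m2)
moved = sum(quartile(m1[t], m1) != quartile(m2[t], m2)
            for t in range(N_TEACHERS))
print(f"correlation between models: {r:.2f}; teachers changing quartile: {moved}")
```

In runs like this the two sets of estimates typically correlate strongly, yet some teachers still change quartile, which is the pattern the highlights above describe: high agreement on average, meaningful disagreement for individual teachers.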

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/different-growth-models/"]Continue reading...[/readon2]

Do Value-Added Methods Level the Playing Field for Teachers?


Highlights

  • Value-added measures partially level the playing field by controlling for many student characteristics. But if they don't fully adjust for all the factors that influence achievement and that consistently differ among classrooms, they may be distorted, or confounded. (An estimate of a teacher’s effect is said to be confounded when her contribution cannot be separated from other factors outside of her control, namely the students in her classroom.)
  • Simple value-added models that control for just a few test scores (or only one score) and no other variables produce measures that underestimate teachers with low-achieving students and overestimate teachers with high-achieving students.
  • The evidence, while inconclusive, generally suggests that confounding is weak. But it would not be prudent to conclude that confounding is not a problem for all teachers. In particular, the evidence on comparing teachers across schools is limited.
  • Studies assess general patterns of confounding. They do not examine confounding for individual teachers, and they can't rule out the possibility that some teachers consistently teach students who are distinct enough to cause confounding.
  • Value-added models often control for variables such as average prior achievement for a classroom or school, but this practice could introduce errors into value-added estimates.
  • Confounding might lead school systems to draw erroneous conclusions about their teachers – conclusions that carry heavy costs to both teachers and society.

Introduction

Value-added models have caught the interest of policymakers because, unlike other ways of using student test scores for accountability, they purport to "level the playing field." That is, they supposedly reflect only a teacher's effectiveness, not whether she teaches high- or low-income students, for instance, or students in accelerated or standard classes. Yet many people are concerned that a teacher's value-added estimate will be sensitive to the characteristics of her students. More specifically, they believe that teachers of low-income, minority, or special education students will have lower value-added scores than equally effective teachers who teach students outside these populations. Other people worry that the opposite might be true: that some value-added models might cause teachers of low-income, minority, or special education students to have higher value-added scores than equally effective teachers who work with higher-achieving, less risky populations.

In this brief, we discuss what is and is not known about how well value-added measures level the playing field for teachers by controlling for student characteristics. We first discuss the results of empirical explorations. We then address outstanding questions and the challenges to answering them with empirical data. Finally, we discuss the implications of these findings for teacher evaluations and the actions that may be based on them.
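A stylized two-classroom example can show what confounding looks like in the simplest case. Everything below is invented for illustration, not taken from the brief: the `support` variable, the coefficients, and the naive growth measure are all assumptions. The two teachers have identical true effects, but an unmeasured out-of-school factor differs systematically between their classrooms.

```python
# Hypothetical two-classroom example: an unmeasured out-of-school factor
# (`support`) differs between the classrooms, so a growth measure that
# cannot see it confounds the factor with teaching quality.
import random
from statistics import mean

random.seed(1)
TRUE_EFFECT = 2.0            # both teachers are equally effective

def classroom(support_level, n=200):
    scores = []
    for _ in range(n):
        prior = random.gauss(50, 10)
        support = random.gauss(support_level, 1)   # invisible to the model
        post = 0.9 * prior + TRUE_EFFECT + 2 * support + random.gauss(0, 4)
        scores.append((prior, post))
    return scores

def naive_vam(scores):
    """A naive growth measure: mean(post) - 0.9 * mean(prior). The model is
    even granted the true persistence of prior scores, but not `support`."""
    return mean(p2 - 0.9 * p1 for p1, p2 in scores)

high = naive_vam(classroom(support_level=+1))   # advantaged classroom
low = naive_vam(classroom(support_level=-1))    # disadvantaged classroom
print(f"estimated effect, advantaged classroom: {high:.1f}")
print(f"estimated effect, disadvantaged classroom: {low:.1f}")
# The gap between the two estimates is pure confounding: the teachers
# are identical by construction.
```

Because `support` never enters the model, its influence is absorbed into the teachers' estimates, and collecting more students per classroom does not shrink the gap; only measuring or adjusting for the factor would.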

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/level-playing-field/"]Continue reading...[/readon2]

New Research Uncovers Fresh Trouble for VAM Evaluations

As more and more schools implement various forms of value-added model (VAM) evaluation systems, we are learning some disturbing things about how reliable these methods are.

Education Week's Stephen Sawchuk, in "'Value-Added' Measures at Secondary Level Questioned," explains that value-added statistical modeling was once limited to analyzing large sets of data. These statistical models projected students' test score growth based on their past performance and thus estimated a growth target. But now that 30 states require teacher evaluations to use student performance, the use of these algorithms for high-stakes purposes has expanded. Value-added estimates are now being applied in secondary schools, even though the vast majority of research on their use has been limited to elementary schools.

Sawchuk reports on two major studies that should slow this rush to evaluate all teachers with experimental models. This month, Douglas Harris will be presenting "Bias of Public Sector Worker Performance Monitoring," based on six years of Florida middle school data covering 1.3 million math students.

Harris divides classes into three types: remedial, midlevel, and advanced. After controlling for tracking, he finds that between 30 and 70 percent of teachers would be placed in the wrong category by normative value-added models. Moreover, Harris finds that teachers who taught more remedial classes tended to have lower value-added scores than teachers who taught mainly higher-level classes. "That phenomenon was not due to the best teachers' disproportionately teaching the more-rigorous classes, as is often asserted. Instead, the paper shows, even those teachers who taught courses at more than one level of rigor did better when their performance teaching the upper-level classes was compared against that from the lower-level classes."
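The within-teacher pattern can be illustrated with a deliberately toy simulation; the track effects, noise levels, and gain-score measure below are invented for illustration and are not Harris's model. One teacher, equally effective in both of her sections, looks stronger in the advanced class whenever track-related growth is not fully removed.

```python
# Toy illustration: the same teacher teaches a remedial and an advanced
# section. A track-correlated growth component that a simple gain-score
# measure does not remove makes her look stronger in the advanced class.
import random
from statistics import mean

random.seed(3)
TEACHER_EFFECT = 1.0   # identical true effectiveness in both sections

def section(track_growth, n=60):
    """Per-student gains: shared teacher effect + track effect + noise."""
    return [TEACHER_EFFECT + track_growth + random.gauss(0, 3)
            for _ in range(n)]

remedial = section(track_growth=-2)   # e.g. weaker test-course alignment
advanced = section(track_growth=+2)
print(f"apparent effect, remedial section: {mean(remedial):.1f}")
print(f"apparent effect, advanced section: {mean(advanced):.1f}")
```

Any ranking built on these apparent effects would classify the same teacher differently depending on which section she happened to teach, which is the shape of the bias Harris reports.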

[readon2 url="http://blogs.edweek.org/teachers/living-in-dialogue/2012/11/john_thompson_new_research_unc.html"]Continue reading...[/readon2]

Opportunity Knocks

Here's a new one for the ol' Reformy Thesaurus: the "Opportunity Culture" in education.

Sure sounds good, doesn't it? Who doesn't want our American kids to have more opportunities in life? Except--oops--this campaign, rolled out by Public Impact, is actually about opportunities for "teacherpreneurs" to make more money by teaching oversized classes--and of course, for school districts to seize that same opportunity to save money through "innovative" staffing models.

How did this exciting window of opportunity emerge? Public Impact explains:

Only 25 percent of classes are taught by excellent teachers. With an excellent teacher versus an average teacher, students make about an extra half-year of progress every year--closing achievement gaps fast, leaping ahead to become honors students, and surging forward like top international peers.

That's a whole lot of leaping and surging. Unfortunately, it's based on a faux statistic, sitting triumphantly on a pyramid of dubious research, prettied up with some post-modern infographics. Like other overhyped blah-blah of "reform"--the "three great teachers in a row" myth, for example, or nearly every "fact" in Waiting for Superman--it's a triumph of slick media slogans over substance. A quick look at the Opportunity Culture Advisory Team tells you what the real purpose of the OC is: cutting teachers, privatizing services, plugging charters and cultivating a little astroturf to cover the scars.

The Opportunity Culture's bold plan begins with a policy recommendation: Schools should be required by law to identify the top 25% of their teachers. Then, once that simple task is completed, OC suggests ten exciting new models for staffing schools, beginning with giving these excellent teachers a lot more students (plus a merit pay carrot) and ending with enlisting "accountable remote teachers down the street or across the nation" who would "provide live, but not in-person instruction while on-site teammates manage administrative duties and develop the whole child."

[readon2 url="http://blogs.edweek.org/teachers/teacher_in_a_strange_land/2012/05/heres_a_new_one_for.html"]Continue Reading...[/readon2]

Misconceptions and Realities about Teacher Evaluations

A letter signed by 88 educational researchers from 16 universities was recently sent to the Mayor of Chicago regarding his plans to implement a teacher evaluation system. Because of the similarities between the Chicago plan and Ohio's, we thought we would reprint the letter here.

In what follows, we draw on research to describe three significant concerns with this plan.

Concern #1: CPS is not ready to implement a teacher-evaluation system that is based on significant use of “student growth.” For Type I or Type II assessments, CPS must identify the assessments to be used, decide how to measure student growth on those assessments, and translate student growth into teacher-evaluation ratings. They must determine how certain student characteristics such as placement in special education, limited English-language proficiency, and residence in low-income households will be taken into consideration. They have to make sure that the necessary technology is available and usable, guarantee that they can correctly match teachers to their actual students, and determine that the tests are aligned to the new Common Core State Standards (CCSS).

In addition, teachers, principals, and other school administrators have to be trained on the use of student assessments for teacher evaluation. This training is on top of training already planned about CCSS and the Charlotte Danielson Framework for Teaching, used for the “teacher practice” part of evaluation.

For most teachers, a Type I or II assessment does not exist for their subject or grade level, so most teachers will need a Type III assessment. While work is being done nationally to develop what are commonly called assessments for “non-tested” subjects, this work is in its infancy. CPS must identify at least one Type III assessment for every grade and every subject, determine how student growth will be measured on these assessments, and translate the student growth from these different assessments into teacher-evaluation ratings in an equitable manner.

If CPS insists on implementing a teacher-evaluation system that incorporates student growth in September 2012, we can expect to see a widely flawed system that overwhelms principals and teachers and causes students to suffer.

Concern #2: Educational research and researchers strongly caution against teacher-evaluation approaches that use Value-Added Models (VAMs).

Chicago already uses a VAM statistical model to determine which schools are put on probation, closed, or turned around. For the new teacher-evaluation system, student growth on Type I or Type II assessments will be measured with VAMs or similar models. Yet, ten prominent researchers of assessment, teaching, and learning recently wrote an open letter that included some of the following concerns about using student test scores to evaluate educators[1]:

a. Value-added models (VAMs) of teacher effectiveness do not produce stable ratings of teachers. For example, different statistical models (all based on reasonable assumptions) can yield different effectiveness scores. [2] Researchers have found that how a teacher is rated changes from class to class, from year to year, and even from test to test. [3]

b. There is no evidence that evaluation systems that incorporate student test scores produce gains in student achievement. In order to determine if there is a relationship, researchers recommend small-scale pilot testing of such systems. Student test scores have not been found to be a strong predictor of the quality of teaching as measured by other instruments or approaches. [4]

c. Assessments designed to evaluate student learning are not necessarily valid for measuring teacher effectiveness or student learning growth. [5] Using them to measure teacher effectiveness is akin to using a meter stick to weigh a person: you might be able to develop a formula that links height and weight, but there will be plenty of error in your calculations.

Concern #3: Students will be adversely affected by the implementation of this new teacher-evaluation system.

When a teacher’s livelihood is directly impacted by his or her students’ scores on an end-of-year examination, test scores take front and center. The nurturing relationship between teacher and student changes for the worse, including in the following ways:

a. With a focus on end-of-year testing, there inevitably will be a narrowing of the curriculum as teachers focus more on test preparation and skill-and-drill teaching. [6] Enrichment activities in the arts, music, civics, and other non-tested areas will diminish.

b. Teachers will subtly but surely be incentivized to avoid students with health issues, students with disabilities, students who are English Language Learners, or students suffering from emotional issues. Research has shown that no model yet developed can adequately account for all of these ongoing factors. [7]

c. The dynamic between students and teacher will change. Instead of “teacher and student versus the exam,” it will be “teacher versus students’ performance on the exam.”

d. Collaboration among teachers will be replaced by competition. With a “value-added” system, a 5th grade teacher has little incentive to make sure that his or her incoming students score well on the 4th grade exams, because incoming students with high scores would make his or her job more challenging.

e. When competition replaces collaboration, every student loses.

You can read the whole letter below.

Misconceptions and Realities about Teacher and Principal Evaluation