
Shame on the PD and NPR

When the Cleveland Plain Dealer and NPR decided to publish the names of 4,200 Ohio teachers along with their value-added grades, their reasoning was specious and self-serving. Worst of all, the decision is damaging to the teaching profession in Ohio.

Despite pointing out all the flaws, caveats, and controversies with the use of value-added as a means to evaluate teachers, both publications decided to go ahead and shame these 4,200 teachers anyway. The publication of teachers' names and scores isn't new. It was first done by the LA Times, and was a factor in the suicide of one teacher. The LA Times findings and analysis were later discredited:

The research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Using the same L.A. Unified School District data and the same methods as the Times, this study probes deeper and finds the earlier research to have serious weaknesses.

DUE DILIGENCE AND THE EVALUATION OF TEACHERS by National Education Policy Center

The Plain Dealer analysis is weaker than the LA Times', relying on just two years' worth of data rather than seven. In fact, the Plain Dealer and NPR stated they published only 4,200 teachers' scores, and not the 12,000 scores they had data for, because most of the rest had only one year's worth of data. That is a serious error, as value-added is known to be highly unreliable and subject to massive variance.
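To see why so few years of data is a problem, here is a toy simulation. It is our own illustration with made-up noise levels, not Ohio's actual model: every teacher gets a fixed "true" effect, every annual estimate adds classroom-level noise, and we then check how well ratings from two independent periods agree.

```python
import random

random.seed(1)

# Toy simulation (our own illustration, NOT Ohio's actual model):
# each teacher has a fixed "true" effect, but every annual value-added
# estimate adds classroom-level noise. The noise level below is an
# assumption, chosen to be comparable to the spread of true effects.
N = 5000
TRUE_SD = 1.0    # spread of true teacher effects (assumed)
NOISE_SD = 1.5   # year-to-year estimation noise (assumed)

true_effects = [random.gauss(0, TRUE_SD) for _ in range(N)]

def ratings(avg_years):
    """One rating per teacher: the mean of `avg_years` noisy annual scores."""
    return [t + sum(random.gauss(0, NOISE_SD) for _ in range(avg_years)) / avg_years
            for t in true_effects]

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# How well do ratings from two independent periods agree?
for k in (1, 2, 3):
    print(f"{k}-year ratings, two independent periods: r = {corr(ratings(k), ratings(k)):.2f}")
```

Under these assumed noise levels, single-year ratings from two independent periods correlate only weakly, and averaging three years roughly doubles the agreement, which is consistent with the vendor's advice that scores are most reliable with three years of data.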

Beyond the questionable statistical analysis, the publication of teachers' names and value-added scores has been criticized by a great number of people, including corporate education reformer Bill Gates, in a New York Times op-ed titled "Shame Is Not the Solution":

LAST week, the New York State Court of Appeals ruled that teachers’ individual performance assessments could be made public. I have no opinion on the ruling as a matter of law, but as a harbinger of education policy in the United States, it is a big mistake.

I am a strong proponent of measuring teachers’ effectiveness, and my foundation works with many schools to help make sure that such evaluations improve the overall quality of teaching. But publicly ranking teachers by name will not help them get better at their jobs or improve student learning. On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.

Gates isn't the only high-profile corporate education reformer who is critical of such shaming. Wendy Kopp, CEO of Teach for America, has also spoken out against the practice:

Kopp is not shy about saying what she'd do differently as New York City schools chancellor. While the Bloomberg administration is fighting the United Federation of Teachers in court for the right to release to the news media individual teachers' "value added" ratings—an estimate of how effective a teacher is at improving his or her students' standardized test scores—Kopp says she finds the idea "baffling" and believes doing so would undermine trust among teachers and between teachers and administrators.

"The principals of very high performing schools would all say their No. 1 strategy is to build extraordinary teams," Kopp said. "I can't imagine it's a good organizational strategy to go publish the names of teachers and one data point about whether they are effective or not in the newspaper."

Indeed, if the editors of the Plain Dealer and NPR had read their own reporting, they would have realized the public release of this information was unsound, unfair and damaging. Let's look at the warning signs in their own reporting:

...scores can vary from year to year.

Yet they relied upon only one year's worth of data for much of their analysis, and just two years for the teachers whose names they published.

...decided it was more important to provide information — even if flawed.

How can it be useful to the layperson to be provided with flawed information? Why would a newspaper knowingly publish flawed information?

...these scores are only a part of the criteria necessary for full and accurate evaluation of an individual teacher.

And yet they published 4,200 teachers' scores based solely on value-added, which at best makes up only 35% of a teacher's evaluation. Laypeople will not understand that these scores are only a partial measurement of a teacher's effectiveness, and a poor one at that.

...There are a lot of questions still about the particular formula Ohio.

Indeed, so many questions that one would best be advised to wait until those questions are answered before publicly shaming teachers who were part of a pilot program being used to answer those questions.

...variables beyond a teacher’s control need to be considered in arriving at a fair and accurate formula.

Yet none of these reporters considered any of these factors in publishing teachers' names, and readers will wholly miss that necessary context.

...The company that calculates value-added for Ohio says scores are most reliable with three years of data.

Again, the data is unreliable, especially with less than three years' worth of data, yet the Plain Dealer and NPR decided they should shame teachers using just two years' worth.

...Ohio’s value-added ratings do not account for the socioeconomic backgrounds of students, as they do in some other states.

How many "ineffective" teachers are really just working in classrooms serving economically depressed communities? The reporters seem not to care, and published the names anyway.

...Value-added scores are not a teacher’s full rating.

Nowhere in the publication of these names are the teachers' full ratings indicated. This again leaves laypeople and site visitors to think these flawed value-added scores are the final reflection of a teacher's quality.

...ratings are still something of an experiment.

How absurd is the decision to publish now seeming? Shaming people on the basis of the results of an experiment! By their very nature, experiments can demonstrate that something is wrong, not right.

...The details of how the scores are calculated aren’t public.

We don't even know if the value-added scores are correct and accurate, because the formula is secret. How can it be fair for the results of a secret formula to be made public? Did that not raise any alarm bells for the Plain Dealer and NPR?

...The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works.

But somehow NPR listeners and Cleveland Plain Dealer readers are supposed to understand the complexities, and read the necessary context into the publication of individual teacher scores?

...StateImpact/Plain Dealer analysis of initial state data suggests.

"Initial", "Suggests". They have decided to shame teachers without properly vetting the data and their own analysis - exactly the same problem the LA Times ran into that we highlighted at the top of this article.

It doesn't take a lot of "analysis" to understand that a failing newspaper needed controversy and eyeballs, and that the decision to shame teachers was made in the publications' own economic interest, not that of the public good. In the end, then, the real shame falls not on teachers who are working hard every day, often in difficult situations made worse by draconian budget cuts, endless political meddling, and student poverty - but on the editors of these two publications for putting their own narrow self-interest above that of Ohio's children.

It's a disgrace that they ought to make 4,200 apologies for.

Value-added: How Ohio is destroying a profession

We ended last week with a post titled "The 'fun' begins soon", which took a look at the imminent changes to education policy in Ohio. We planned on detailing each of these issues over the next few weeks.

Little did we know that the 'fun' would begin that weekend. It came in the manner of the Cleveland Plain Dealer and NPR publishing a story on the changing landscape of teacher evaluations titled "Grading the Teachers: How Ohio is Measuring Teacher Quality by the Numbers".

It's a solid, long piece, worth the time taken to read it. It covers some, though not all, of the problems of using value-added measurements to evaluate teachers:

Those ratings are still something of an experiment. Only reading and math teachers in grades four to eight get value-added ratings now. But the state is exploring how to expand value-added to other grades and subjects.

Among some teachers, there’s confusion about how these measures are calculated and what they mean.

“We just know they have to do better than they did last year,” Beachwood fourth-grade teacher Alesha Trudell said.

Some of the confusion may be due to a lack of transparency around the value-added model.

The details of how the scores are calculated aren’t public. The Ohio Education Department will pay a North Carolina-based company, SAS Institute Inc., $2.3 million this year to do value-added calculations for teachers and schools. The company has released some information on its value-added model but declined to release key details about how Ohio teachers’ value-added scores are calculated.

The Education Department doesn’t have a copy of the full model and data rules either.

The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works. He said that’s not a problem.

Evaluating a teacher on a secret formula isn't a practice that can be sustained, supported or defended. The article further details a common theme we hear over and over again:

But many teachers believe Ohio’s value-added model is essentially unfair. They say it doesn’t account for forces that are out of their control. They also echo a common complaint about standardized tests: that too much is riding on these exams.

“It’s hard for me to think that my evaluation and possibly some day my pay could be in a 13-year-old’s hands who might be falling asleep during the test or might have other things on their mind,” said Zielke, the Columbus middle school teacher.

The article also performs analysis on several thousand value-added scores, and that analysis demonstrates what we have long reported - that value-added is a poor indicator of teacher quality, with too many external factors affecting the score:

A StateImpact/Plain Dealer analysis of initial state data suggests that teachers with high value-added ratings are more likely to work in schools with fewer poor students: A top-rated teacher is almost twice as likely to work at a school where most students are not from low-income families as in a school where most students are from low-income families.
[…]
Teachers say they’ve seen their value-added scores drop when they’ve had larger classes. Or classes with more students who have special needs. Or more students who are struggling to read.

Teachers who switch from one grade to another are more likely to see their value-added ratings change than teachers who teach the same grade year after year, the StateImpact/Plain Dealer analysis shows. But their ratings went down at about the same rate as teachers who taught the same grade level from one year to the next and saw their ratings change.

What are we measuring here? Surely not teacher quality, but rather socioeconomic factors and budget conditions of the schools and their students.

Teachers are intelligent people, and they are going to adapt to this knowledge in lots of unfortunate ways. It will become progressively harder for districts with poor students to recruit and retain the best teachers. But perhaps the most pernicious effect is captured at the end of the article:

Stephon says the idea of Plecnik being an ineffective teacher is “outrageous.”

But Plecnik is through. She’s quitting her job at the end of this school year to go back to school and train to be a counselor — in the community, not in schools.

Plecnik was already frustrated by the focus on testing, mandatory meetings and piles of paperwork. She developed medical problems from the stress of her job, she said. But receiving the news that despite her hard work and the praise of her students and peers the state thought she was Least Effective pushed her out the door.

“That’s when I said I can’t do it anymore,” she said. “For my own sanity, I had to leave.”

The Cleveland Plain Dealer and NPR then decided to add to this stress by publishing individual teachers' value-added scores - a matter we will address in our next post.

Charter School Authorization And Growth

If you ask a charter school supporter why charter schools tend to exhibit inconsistency in their measured test-based impact, there’s a good chance they’ll talk about authorizing. That is, they will tell you that the quality of authorization laws and practices — the guidelines by which charters are granted, renewed and revoked — drives much and perhaps even most of the variation in the performance of charters relative to comparable district schools, and that strengthening these laws is the key to improving performance.

Accordingly, a recently-announced campaign by the National Association of Charter School Authorizers aims to step up the rate at which charter authorizers close “low-performing schools” and are more selective in allowing new schools to open. In addition, a recent CREDO study found (among other things) that charter middle and high schools’ performance during their first few years is more predictive of future performance than many people may have thought, thus lending support to the idea of opening and closing schools as an improvement strategy.

Below are a few quick points about the authorization issue, which lead up to a question about the relationship between selectivity and charter sector growth.

The reasonable expectation is that authorization matters, but its impact is moderate. Although there has been some research on authorizer type and related factors, there is, as yet, scant evidence as to the influence of authorization laws/practices on charter performance. In part, this is because such effects are difficult to examine empirically. However, without some kind of evidence, the “authorization theory” may seem a bit tautological: There are bad charters because authorizers allow bad charters to open, and fail to close them.

That said, the criteria and processes by which charters are granted/renewed almost certainly have a meaningful effect on performance, and this is an important area for policy research. On the other hand, it’s a big stretch to believe that these policies can explain a large share of the variation in charter effects. There’s a reasonable middle ground for speculation here: Authorization has an important but moderate impact, and, thus, improving these laws and practices is definitely worthwhile, but seems unlikely to alter radically the comparative performance landscape in the short- and medium-term (more on this below).

Strong authorization policies are a good idea regardless of the evidence. Just to be clear, even if future studies find no connection between improved authorization practices and outcomes, test-based or otherwise, it’s impossible to think of any credible argument against them. If you’re looking to open a new school (or you’re deciding whether or not to renew an existing one), there should be strong, well-defined criteria for being allowed to do so. Anything less serves nobody, regardless of their views on charter schools.

[readon2 url="http://shankerblog.org/?p=8510"]Continue reading...[/readon2]

Why Test Scores CAN'T Evaluate Teachers

From the National Education Policy Center. The entire post is well worth a read; here's the synopsis:

The key element here that distinguishes Student Growth Percentiles from some of the other things that people have used in research is the use of percentiles. It's there in the title, so you'd expect it to have something to do with percentiles. What does that mean? It means that these measures are scale-free. They get away from psychometric scaling in a way that many researchers - not all, but many - say is important.

Now these researchers are not psychometricians, and they aren't arguing against the scale. The psychometricians who create our tests create a scale, and they use scientific formulae and theories and models to come up with that scale. It's like on the SAT, you can get between 200 and 800. And the idea there is that the difference in the learning or achievement between a 200 and a 300 is the same as between a 700 and an 800.

There is no proof that that is true. There is no proof that that is true. There can't be any proof that is true. But, if you believe their model, then you would agree that that's a good estimate to make. There are a lot of people who argue... they don't trust those scales. And they'd rather use percentiles because it gets them away from the scale.

Let's state this another way so we're absolutely clear: there is, according to Jonah Rockoff, no proof that a gain on a state test like the NJASK from 150 to 160 represents the same amount of "growth" in learning as a gain from 250 to 260. If two students have the same numeric growth but start at different places, there is no proof that their "growth" is equivalent.

Now there's a corollary to this, and it's important: you also can't say that two students who have different numeric levels of "growth" are actually equivalent. I mean, if we don't know whether the same numerical gain at different points on the scale are really equivalent, how can we know whether one is actually "better" or "worse"? And if that's true, how can we possibly compare different numerical gains?
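To make the percentile idea concrete, here is a minimal sketch in Python. It is our own toy illustration, not the actual SGP methodology (which uses quantile regression over several prior years): a student's "growth" is the percentile rank of their current score among peers who started from the same prior score, so the spacing of the score scale never enters the calculation.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy sketch of a growth percentile (our illustration, NOT the real SGP
# model): a student's growth is their current score's percentile rank
# among peers with the SAME prior score. Only rank order matters --
# the point-spacing of the scale never enters the calculation.
def growth_percentiles(records):
    """records: list of (prior_score, current_score) pairs."""
    peers = defaultdict(list)
    for prior, current in records:
        peers[prior].append(current)
    for scores in peers.values():
        scores.sort()
    result = []
    for prior, current in records:
        group = peers[prior]
        below = bisect_left(group, current)      # peers who scored lower
        result.append(round(100 * below / len(group)))
    return result

# Two students both gain 10 raw points, but from different starting points:
students = [(150, 160), (250, 260),
            (150, 140), (150, 155), (150, 170),   # peers who started at 150
            (250, 265), (250, 270), (250, 275)]   # peers who started at 250
pcts = growth_percentiles(students)
print(pcts[0], pcts[1])  # prints "50 0": same raw gain, different percentiles
```

The first two students post identical 10-point raw gains, yet land at very different growth percentiles, because what matters is their standing relative to academic peers, not the size of the raw gain.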

[readon2 url="http://nepc.colorado.edu/blog/why-test-scores-cant-evaluate-teachers"]Continue reading...[/readon2]

The "fun" begins soon

A lot of changes have been legislated in education in recent years, and many of those changes' due dates are almost upon us. Here is just a sample of what we can expect and when, from Common Core and report cards to teacher evaluations.


click for large version

Next week we will begin to take a look at each of these and assess their merits and readiness.

Why Join the Future exists

This piece provides as good a rationale as we have read for why Join the Future exists:

"The public common school is the greatest discovery made by man," said Horace Mann. There is a direct link between public education and this nation's civility among citizens, its standard of living and its pivotal position in the world community. Multiple forces - such as the greed associated with for-profit education ventures, and the philosophy that education is primarily a private benefit instead of a common good and that sectarian and other private purposes should be supported from the public largess - are unraveling the public common school system. A massive, focused, committed offensive is required to preserve the public common school.

During the common school movement in the early and middle 19th century, there were strong forces that opposed the implementation of a tax-supported education system, free to all the children of all the people. Some of the opponents wanted tax money for their sectarian and private purposes. Others opposed tax funds used for the education of the children of others but tolerated public education so long as the public cost was minimal.

The common school movement was successful in spite of strong opposition because the proponents were united and totally committed to the concept that quality educational opportunities should be provided for all children via the public common school system. Thus, the constitutional provision for a thorough and efficient system of common schools was adopted by Ohioans in the mid-19th century. This provision prodded state officials in every generation to enhance educational opportunities within the state system.

The priority for improving the public system changed in Ohio in the early 1990s when the governor joined forces with a current for-profit charter school kingpin to start the voucher and charter school programs. These programs began as "pilots" and thus most local public school personnel and advocates tended to ignore these public policy changes. "This isn't my problem since it doesn't affect me" seemed to be the view. Now that charters and vouchers have grown to the point of extracting about $1 billion from school districts this year, some are becoming concerned that the public common school is unraveling; however, many within the public school community have withdrawn by indicating, "This isn't my problem. 'They' will have to fix it."

HB 59 is a prime exhibit that it is not being fixed. The entire public K-12 common school community must become involved in fixing the problem. "They" are not going to rid HB 59 of its mischief, but "we" will, if we unite in the cause of the public common school and determine to do so.

The expansion of choice and short-changing of public K-12 school districts, inherent in HB 59, demand a collective response, immediately.

William Phillis
Ohio E & A