
Is Ohio ready for computer testing?

The Cincinnati Enquirer has a report on how Ohio schools are not going to be ready for the new online PARCC tests that are scheduled to be deployed next year.

Ohio public schools appear to be far short of having enough computers to have all their students take new state-mandated tests within a four-week period beginning in the 2014-15 school year.

“With all the reductions in education funds over the last several years and the downturn in the economy, districts have struggled to be able to bring their (computer technology) up to the level that would be needed for this,” said Barbara Shaner, associate executive director of the Ohio Association of School Business Officials.

Districts could seek state permission to deliver the new tests on paper if they can’t round up enough computers, tablets and gadgets to go around, Jim Wright, director of curriculum and assessment for the Ohio Department of Education, said. A student taking a paper test could be at a disadvantage, though. While the paper tests won’t have substantially different questions, a student taking the test online will have the benefit of audio and visual prompts as well as online tasks that show their work on computer, said Chad Colby, a spokesman for the Partnership for Assessment of Readiness for College and Careers.

The state really does need to step up and help districts fund this costly mandate that has been foisted upon them. Compounding the problem, the computer industry is going through significant changes as more and more people move away from traditional desktops and laptops in favor of simpler, more portable tablets. School districts could find themselves having to make costly investments again in the near future if they pick the wrong technologies.

The article notes the possibility that paper-based test takers could be at a disadvantage compared with those taking the computer-based tests. There has been a significant amount of research on this over the years, and the results seem to indicate the opposite effect: that computer-based test takers score lower than paper-based test takers.

The comparability of test scores based on online versus paper testing has been studied for more than 20 years. Reviews of the comparability literature research were reported by Mazzeo and Harvey (1988), who reported mixed results, and Drasgow (1993), who concluded that there were essentially no differences in examinee scores by mode-of-administration for power tests. Paek (2005) provided a summary of more recent comparability research and concluded that, in general, computer and paper versions of traditional multiple-choice tests are comparable across grades and academic subjects. However, when tests are timed, differential speededness can lead to mode effects. For example, a recent study by Ito and Sykes (2004) reported significantly lower performance on timed web-based norm-referenced tests at grades 4-12 compared with paper versions. These differences seemed to occur because students needed more time on the web-based test than they did on the paper test. Pommerich (2004) reported evidence of mode differences due to differential speededness in tests given at grades 11 and 12, but in her study online performance on questions near the end of several tests was higher than paper performance on these same items. She hypothesized that students who are rushed for time might actually benefit from testing online because the computer makes it easier to respond and move quickly from item to item.

A number of studies have suggested that no mode differences can be expected when individual test items can be presented within a single screen (Poggio, Glassnapp, Yang, & Poggio, 2005; Hetter, Segall, & Bloxom, 1997; Bergstrom, 1992). However, when items are associated with text that requires scrolling, such as is typically the case with reading tests, studies have indicated lower performance for students testing online (O’Malley, 2005; Pommerich, 2004; Bridgeman, Lennon, & Jackenthal, 2003; Choi & Tinkler, 2002; Bergstrom, 1992).

Do Different Value-Added Models Tell Us the Same Things?

Via

Highlights

  • Statistical models that evaluate teachers based on growth in student achievement differ in how they account for student backgrounds, school, and classroom resources. They also differ by whether they compare teachers across a district (or state) or just within schools.
  • Statistical models that do not account for student background factors produce estimates of teacher quality that are highly correlated with estimates from value-added models that do control for student backgrounds, as long as each includes a measure of prior student achievement.
  • Even when correlations between models are high, different models will categorize many teachers differently.
  • Teachers of advantaged students benefit from models that do not control for student background factors, while teachers of disadvantaged students benefit from models that do.
  • The type of teacher comparisons, whether within or between schools, generally has a larger effect on teacher rankings than statistical adjustments for differences in student backgrounds across classrooms.

Introduction

There are good reasons for re-thinking teacher evaluation. As we know, evaluation systems in most school districts appear to be far from rigorous. A recent study showed that more than 99 percent of teachers in a number of districts were rated “satisfactory,” which does not comport with empirical evidence that teachers differ substantially from each other in terms of their effectiveness. Likewise, the ratings do not reflect the assessment of the teacher workforce by administrators, other teachers, or students.

Evaluation systems that fail to recognize the true differences that we know exist among teachers greatly hamper the ability of school leaders and policymakers to make informed decisions about such matters as which teachers to hire, which teachers to help, which teachers to promote, and which teachers to dismiss. Thus it is encouraging that policymakers are developing more rigorous evaluation systems, many of which are partly based on student test scores.

Yet while the idea of using student test scores for teacher evaluations may be conceptually appealing, there is no universally accepted methodology for translating student growth into a measure of teacher performance. In this brief, we review what is known about how measures that use student growth align with one another, and what that agreement or disagreement might mean for policy.
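
To make the distinctions above concrete, here is a minimal sketch of two stylized specifications of the kind the brief contrasts. The notation is mine and purely illustrative, not the exact models the researchers compared: Model A conditions only on prior achievement, while Model B adds student background controls and a school fixed effect, which turns the estimate into a within-school comparison.

```latex
% Illustrative only (my notation, not the brief's).
% y: test score of student i, taught by teacher j (in school s) in year t.
\begin{align}
  \text{Model A:}\quad y_{ijt}  &= \beta\, y_{ij,t-1} + \tau_j + \varepsilon_{ijt} \\
  \text{Model B:}\quad y_{ijst} &= \beta\, y_{ijs,t-1} + X_i\gamma + \tau_j + \mu_s + \varepsilon_{ijst}
\end{align}
% X_i: student background characteristics; tau_j: the teacher effect being
% estimated; mu_s: a school fixed effect, so in Model B a teacher is
% effectively compared only with colleagues in the same school.
```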

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/different-growth-models/"]Continue reading...[/readon2]

How Stable Are Value-Added Estimates?

Via

Highlights:

  • A teacher’s value-added score in one year is partially but not fully predictive of her performance in the next.
  • Value-added is unstable because true teacher performance varies and because value-added measures are subject to error.
  • Two years of data does a meaningfully better job of predicting value added than does just one. A teacher’s value added in one subject is only partially predictive of her value added in another, and a teacher’s value added for one group of students is only partially predictive of her value added for others.
  • The variation of a teacher’s value added across time, subject, and student population depends in part on the model with which it is measured and the source of the data that is used.
  • Year-to-year instability suggests caution when using value-added measures to make decisions for which there are no mechanisms for re-evaluation and no other sources of information.

Introduction

Value-added models measure teacher performance by the test score gains of their students, adjusted for a variety of factors such as the performance of students when they enter the class. The measures are based on desired student outcomes such as math and reading scores, but they have a number of potential drawbacks. One of them is the inconsistency in estimates for the same teacher when value added is measured in a different year, or for different subjects, or for different groups of students.

Some of the differences in value added from year to year result from true differences in a teacher’s performance. Differences can also arise from classroom peer effects; the students themselves contribute to the quality of classroom life, and this contribution changes from year to year. Other differences come from the tests on which the value-added measures are based; because test scores are not perfectly accurate measures of student knowledge, it follows that they are not perfectly accurate gauges of teacher performance.
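
One way to see why the estimates bounce around is a simple decomposition, written here in my own notation as a sketch rather than anything taken from the brief. Assuming the year-specific terms are uncorrelated across years and with the stable component, the noise terms alone cap the year-to-year correlation well below one.

```latex
% Stylized decomposition of a teacher's estimated value-added in year t.
\begin{align}
  \widehat{VA}_{jt} &= \theta_j + u_{jt} + e_{jt} \\
  \operatorname{corr}\!\bigl(\widehat{VA}_{jt},\, \widehat{VA}_{j,t+1}\bigr)
    &= \frac{\sigma^2_{\theta}}{\sigma^2_{\theta} + \sigma^2_{u} + \sigma^2_{e}}
\end{align}
% theta_j: persistent effectiveness; u_jt: true year-to-year variation
% (e.g., the particular mix of students); e_jt: estimation/test error.
% The correlation formula assumes u and e are independent across years
% and of theta. Even if theta_j never changed, sigma^2_u and sigma^2_e
% alone would keep the observed year-to-year correlation well below one.
```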

In this brief, we describe how value-added measures for individual teachers vary across time, subject, and student populations. We discuss how additional research could help educators use these measures more effectively, and we pose new questions, the answers to which depend not on empirical investigation but on human judgment. Finally, we consider how the current body of knowledge, and the gaps in that knowledge, can guide decisions about how to use value-added measures in evaluations of teacher effectiveness.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/value-added-stability/"]Continue reading...[/readon2]

Recruiting the best?

Interesting

This goal – recruiting and retaining talented people into teaching – is shared by most everyone, but it is among the most central emphases of the diverse group that might be called market-based reformers. Their idea is to change compensation structures, performance evaluations and other systems in order to create the kind of environment that will be appealing to high-achieving, less risk-averse people, as well as to ensure that those who aren’t cut out for the job are compelled to leave. This will, so the argument goes, create a “dynamic profession” more in line with the high risk, high reward model common among the private sector firms competing for the same pool of young workers.

No matter your feelings on TFA, it’s more than fair to say that their corps members fit this profile perfectly. On paper, they aren’t just “top third,” but top third of the top third. TFA cohorts enter the labor market having been among the highest achievers in the best colleges and universities in the nation. Getting accepted to the program is very, very difficult. Those who make it are not only service-oriented, but also smart, hard-working and ambitious. They are exactly the kind of worker that employers crave, and market-based reformers have made it among their central purposes to attract to the profession.

Yet, at least by the standard of test-based productivity, TFA teachers really don’t do better, on average, than their peers, and when there are demonstrated differences, they are often relatively small and concentrated in math (the latter, by the way, might suggest the role of unobserved differences in content knowledge). Now, again, there is some variation in the findings, and the number and scope of these analyses are limited – we’re nowhere near some kind of research consensus on these comparisons of test-based productivity, to say nothing of other sorts of student outcomes.
[...]
But, to me, one of the big, underdiscussed lessons of TFA is less about the program itself than what the test-based empirical research on its corps members suggests about the larger issue of teacher recruitment. Namely, it indicates that “talent” as typically gauged in the private sector may not make much of a difference in the classroom, at least not by itself. This doesn’t necessarily mean that market-based policies won’t lure great teachers, but it does suggest that, if we’re going to enact massive changes in personnel policy to attract a certain “type” of person to teaching, we might reexamine our assumptions on who we’re trying to attract and what they want.

Via.

Research doesn’t back up key ed reforms

Via the Washington Post

There is no solid evidence supporting many of the positions on teachers and teacher evaluation taken by some school reformers today, according to a new assessment of research on the subject.

The Education Writers Association released a new brief that draws on more than 40 research studies or research syntheses, as well as interviews with scholars who work in this field.

You can read the entire brief (written by Education Week assistant editor Stephen Sawchuk), but here are the bottom-line conclusions of each section:

Q) Are teachers the most important factor affecting student achievement?

A) Research has shown that the variation in student achievement is predominantly a product of individual and family background characteristics. Of the school factors that have been isolated for study, teachers are probably the most important determinants of how students will perform on standardized tests.

Q) Are value-added estimations reliable or stable?

A) Value-added models appear to pick up some differences in teacher quality, but they can be influenced by a number of factors, such as the statistical controls selected. They may also be affected by the characteristics of schools and peers. The impact of unmeasured factors in schools, such as principals and choice of curriculum, is less clear.

Q) What are the differences in achievement between students who have effective or ineffective teachers for several years in a row?

A) Some teachers produce stronger achievement gains among their students than others do. However, estimates of an individual teacher’s effectiveness can vary from year to year, and the impact of an effective teacher seems to decrease with time. The cumulative effect on students’ learning from having a succession of strong teachers is not clear.

Q) Do teacher characteristics such as academic achievement, years of experience, and certification affect student test scores?

A) Teachers improve in effectiveness at least over their first few years on the job. Characteristics such as board certification and content knowledge in math are sometimes linked with student achievement. Still, these factors don’t explain much of the differences in teacher effectiveness overall.

Q) Does merit pay for teachers produce better student achievement or retain more-effective teachers?

A) In the United States, merit pay exclusively focused on rewarding teachers whose students produce gains has not been shown to improve student achievement, though some international studies show positive effects. Research has been mixed on comprehensive pay models that incorporate other elements, such as professional development. Scholars are still examining whether such programs might work over time by attracting more effective teachers.

Q) Do students in unionized states do better than students in states without unions?

A) Students tend to do well in some heavily unionized states, but it isn’t possible to conclude that it is the presence or absence of unions that causes that achievement.

What Studies Say About Teacher Effectiveness

Analysis shows charters underperform in Ohio's Big 8

Every year, the state of Ohio releases an enormous amount of district- and school-level performance data. Since Ohio has among the largest charter school populations in the nation, the data provide an opportunity to examine performance differences between charters and regular public schools in the state.

Ohio’s charters are concentrated largely in the urban “Ohio 8” districts (sometimes called the “Big 8”): Akron; Canton; Cincinnati; Cleveland; Columbus; Dayton; Toledo; and Youngstown. Charter coverage varies considerably between the “Ohio 8” districts, but it is, on average, about 20 percent, compared with roughly five percent across the whole state. I will therefore limit my quick analysis to these districts.
[...]
In short, there are significant differences between charters and regular public schools in the likelihood that they receive different ratings, even controlling for the student characteristics mentioned above. To make things simpler, let’s take a look at how “being a charter school” affects the predicted probability of receiving ratings using three different “cutoff” points: The odds of schools receiving the rating of “continuous improvement” or better; “effective” or better; and “excellent” or better. The graph below represents the change in probability for charter schools.

The difference between the two types of schools in the probability of receiving “excellent” or better (-0.02, or two percent) is small and not statistically significant. The other two differences, on the other hand, are both large and significant. Charter schools are 13 percent less likely to receive a rating of “effective” or better, and they are 22 percent less likely to receive “continuous improvement” or better.
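
For readers who want to see the mechanics, here is a rough sketch of how a cutoff-based comparison like this can be run. It is not the Shanker Blog author's code; the file name, column names, and the particular student controls are placeholders assumed for illustration.

```python
# Hypothetical sketch of a cutoff-based comparison like the one described
# above. File and column names are placeholders, not the actual data or
# code behind the quoted analysis. 'charter' is assumed to be a 0/1 flag.
import pandas as pd
import statsmodels.formula.api as smf

# Ohio's school ratings, ordered from lowest to highest.
RATING_ORDER = [
    "Academic Emergency", "Academic Watch", "Continuous Improvement",
    "Effective", "Excellent", "Excellent with Distinction",
]
RANK = {r: i for i, r in enumerate(RATING_ORDER)}

df = pd.read_csv("ohio8_schools.csv")         # assumed layout
df["rating_rank"] = df["rating"].map(RANK)    # 0 (lowest) .. 5 (highest)

def charter_probability_gap(data, cutoff_rank):
    """Fit a logit for 'rated at or above the cutoff' with some student
    controls, then return the average difference in predicted probability
    between charter and regular public schools."""
    data = data.assign(at_or_above=(data["rating_rank"] >= cutoff_rank).astype(int))
    model = smf.logit(
        "at_or_above ~ charter + pct_free_lunch + pct_minority + enrollment",
        data=data,
    ).fit(disp=False)
    p_charter = model.predict(data.assign(charter=1)).mean()
    p_regular = model.predict(data.assign(charter=0)).mean()
    return p_charter - p_regular

for label, rank in [("Continuous Improvement or better", RANK["Continuous Improvement"]),
                    ("Effective or better", RANK["Effective"]),
                    ("Excellent or better", RANK["Excellent"])]:
    print(f"{label}: {charter_probability_gap(df, rank):+.2f}")
```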

[readon2 url="http://shankerblog.org/?p=3554"]Continue reading...[/readon2]