Monday, March 21, 2011

My copy of the Harvard Education Letter arrived with an excerpt from a new book on value-added measurement. I have a question about this passage:
Misconception 2: Value-added scores are inaccurate because they are based on poorly designed tests. Most standardized tests are indeed flawed, but this is not a problem created or worsened by value-added.
by Douglas N. Harris
Harvard Education Letter - March/April 2011
p. 8
Is that correct?

As I understand it, New York's state tests can't be used for value-added purposes. The tests are shorter in some years, longer in others, and somehow don't correspond to a one-year measurement of learning. Or so we were told. Certainly they don't provide any sense of where a child might be within a year's worth of content. New York tests are scored 1 to 4, so if your child scores a middling 3, what does that mean on a scale of 10 months? Nobody knows.

I had been assuming that in order to use a standardized test as a value-added measurement, the tests had to be normed month-by-month as the Iowa Test of Basic Skills is normed:
The grade equivalent is a number that describes a student's location on an achievement continuum. The continuum is a number line that describes the lowest level of knowledge or skill on one end (lowest numbers) and the highest level of development on the other end (highest numbers). The GE is a decimal number that describes performance in terms of grade level and months. For example, if a sixth-grade student obtains a GE of 8.4 on the Vocabulary test, his score is like the one a typical student finishing the fourth month of eighth grade would likely get on the Vocabulary test. The GE of a given raw score on any test indicates the grade level at which the typical student makes this raw score. The digits to the left of the decimal point represent the grade and those to the right represent the month within that grade.
When your child takes the ITBS from one year to the next, it's simple to see whether he's made a year's progress in a year's time. If, at the end of grade 3, he scored a 3.10 on computation (grade 3, month 10), he should score a 4.10 on computation at the end of 4th grade.
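The arithmetic behind that comparison is simple enough to sketch. Here's a small illustration (my own, not from the ITBS materials): treat a GE like "3.10" as grade 3, month 10, convert to total months on a 10-month school year, and check whether a year's worth of growth (10 months) happened between two testings.

```python
# A sketch of grade-equivalent (GE) arithmetic: a GE is (grade, month),
# and a school year is 10 months, so growth can be measured in months.

def ge_to_months(grade, month):
    """Convert a (grade, month) grade equivalent to total months of schooling."""
    return grade * 10 + month

def months_of_progress(start, end):
    """Months of growth between two (grade, month) grade equivalents."""
    return ge_to_months(*end) - ge_to_months(*start)

# End of grade 3 (3.10) to end of grade 4 (4.10): one year's progress.
print(months_of_progress((3, 10), (4, 10)))  # 10 months = one school year
```

With New York's 1-to-4 scale there is nothing like this to compute, which is the whole problem.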

But how would you make that determination using the New York tests?

Or is there some other comparison you make from year to year?

Anonymous said...

If you are willing to make the simplifying assumption that the average skills of the teachers for each grade are roughly comparable (e.g. on average the 3rd grade teachers are no better and no worse than the 6th grade teachers) ... then you don't need to care what the actual numbers are year-to-year.

You can do quite well by simply sorting the year-to-year scores for each grade level, after having adjusted for the things you wish to adjust for (e.g. IQ). This tells you who the better and worse teachers are, without needing to know if a "good" teacher had kids learn more than one year's worth of material in a school year.

The last bit that should be addressed is to have some confidence that the year-to-year scoring is *linear*, even if you can't compare 3rd grade teachers to 6th grade teachers.

If the scoring isn't linear, then you either need to adjust for it, or you will give an advantage to teachers with poorer (or better) students.

You still, of course, need to understand the expected random variation so that you don't punish/reward based on noise. A large baseline helps with this. As does understanding/measuring the standard deviation of the scores.
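Mark's procedure can be sketched in a few lines. This is hypothetical data and a deliberately crude summary (a real value-added model would adjust for much more than this): rank teachers within a grade by their students' mean score gains, and report a standard error alongside each mean so you don't read a ranking off of noise.

```python
# Within-grade teacher ranking by mean student score gain, with a
# standard error to gauge whether the differences exceed the noise.
# The gains below are invented for illustration.
import statistics

gains = {  # hypothetical year-to-year score gains per student, by teacher
    "Brown": [12, 9, 14, 11, 10, 13],
    "Gray":  [8, 10, 7, 11, 9, 8],
    "White": [5, 7, 4, 6, 8, 5],
}

def summarize(scores):
    """Mean gain and its standard error for one teacher's students."""
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, se

ranked = sorted(gains, key=lambda t: summarize(gains[t])[0], reverse=True)
for teacher in ranked:
    mean, se = summarize(gains[teacher])
    print(f"{teacher}: mean gain {mean:.1f} ± {se:.1f}")
```

Note that nothing here requires the test scale to mean the same thing from one grade to the next, which is exactly Mark's point: the comparison is only among teachers of the same grade in the same year.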

I think.

Is there a statistician in the house?

-Mark Roulo

Catherine Johnson said...

Here's a question: you're talking only about measuring a teacher's averages --- can you use tests that aren't consistent from year to year to know whether an individual child has made one year's progress in one year's time?

Anonymous said...

"can you use tests that aren't consistent from year to year to know whether an individual child has made one year's progress in one year's time?"

Probably not, but my point was that I don't think you *NEED* to know this.

You just want to know who the good and bad 6th grade teachers are for your measurement period, and whether the differences are significant.

You don't need year-to-year consistency for this.

Think of it this way: I want to be able to rank the 6th grade teachers (Ms. Brown is better than Ms. Gray, and Ms. Gray is better than Ms. White), but I don't need to be able to say how they compare to the 3rd grade teachers.

And the only reason I want to be able to measure the variation within a single grade is to know if the differences are significant or not.

If I wasn't super concerned with fairness and being nice, I'd even be willing to fire the bottom 1-2% each year without knowing how much worse they are. I sorta don't need to care since I expect that the replacement will be average.

You *do* want to verify consistency, so if Ms. Brown is better than Ms. White this year, but they switch places next year, then you want to be more cautious about firing. My guess, though, is that the bottom 10% or so will tend to stay in the bottom 20%...
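One standard way to run that consistency check (my suggestion, with invented numbers) is a Spearman rank correlation between the two years' teacher rankings: stable rankings give a rho near 1, and teachers shuffling places gives a rho near 0.

```python
# Spearman rank correlation between two years of teacher results.
# The mean-gain figures below are hypothetical.

def spearman(xs, ys):
    """Spearman's rho for two equal-length lists with no tied values."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

year1 = [11.5, 8.8, 5.8, 9.9, 7.2]  # mean gains, year one (hypothetical)
year2 = [10.9, 9.1, 6.3, 9.5, 7.0]  # same teachers, year two
print(spearman(year1, year2))  # identical ordering both years -> 1.0
```

A low rho across years would be a warning that the rankings are mostly noise, and firing from the bottom of them would be unfair.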

-Mark Roulo

Catherine Johnson said...

right

so this is another one of those things that lets you evaluate teachers --- but not children

not individual children

very frustrating

Not at that job anymore said...

Let's throw in that Ms. Brown has a special relationship with the principal and takes as many of the middle of the road kids as she can get. Ms. White gets the main behavioral problems and Ms. Gray has some low end scorers and some very high end scorers.

I *think* that in that scenario Ms. Gray may get the shortest end of the stick, depending on how value-added is done. Her high achievers may not get the advanced material they need at the pace they need to get a year's worth of growth, since they may already be at that grade level or beyond. Ms. Gray does have to bring up those low scorers, who will count most heavily for/against her on the state test that goes from 1 to 4, though. If she can move her 1s to 2s and 3s, she's golden, as long as her 4s don't actually have knowledge fall out of their heads.

And speaking as someone who had to wake a student 3-5 times during each of three 1.5 hour testing sessions to beg her to get the test done, I'm not sure what that did to my "value added." Maybe I should have brought her Red Bull each day? Or gone to her house each night to make sure she got some quality sleep in her very crowded and small house with several babies in it?

Sigh. She should have been a 4 scorer, too. Instead, by the end of the year she'd missed so much school that she was retained in the grade.

lgm said...

I don't think you can evaluate the elementary teachers...full inclusion means more than one teacher per subject per child. You're now evaluating the head teacher, the inclusion or sped teacher, and the RtI/resource teacher. On top of that, a curriculum coordinator or dept chair is telling them what to teach; the younger ones won't be closing the door and teaching the children on their own - they teach the curriculum and depend on the help to teach the children as they circulate.