The Lab School (Lab School at US News) has selective admissions, and Ms. Isaacson’s students have excelled. Her first year teaching, 65 of 66 scored proficient on the state language arts test, meaning they got 3’s or 4’s; only one scored below grade level with a 2. More than two dozen students from her first two years teaching have gone on to Stuyvesant High School or Bronx High School of Science, the city’s most competitive high schools.
“Definitely one of a kind,” said Isabelle St. Clair, now a sophomore at Bard, another selective high school. “I’ve had lots of good teachers, but she stood out — I learned so much from her.”
You would think the Department of Education would want to replicate Ms. Isaacson — who has degrees from the University of Pennsylvania and Columbia — and sprinkle Ms. Isaacsons all over town. Instead, the department’s accountability experts have developed a complex formula to calculate how much academic progress a teacher’s students make in a year — the teacher’s value-added score — and that formula indicates that Ms. Isaacson is one of the city’s worst teachers.
According to the formula, Ms. Isaacson ranks in the 7th percentile among her teaching peers — meaning 93 per cent are better.
This may seem disconnected from reality, but it has real ramifications. Because of her 7th percentile, Ms. Isaacson was told in February that it was virtually certain that she would not be getting tenure this year. “My principal said that given the opportunity, she would advocate for me,” Ms. Isaacson said. “But she said don’t get your hopes up, with a 7th percentile, there wasn’t much she could do.”
That’s not the only problem Ms. Isaacson’s 7th percentile has caused. If the mayor and governor have their way, and layoffs are no longer based on seniority but instead are based on the city’s formulas that scientifically identify good teachers, Ms. Isaacson is pretty sure she’d be cooked.
She may leave anyway. She is 33 and had a successful career in advertising and finance before taking the teaching job, at half the pay.
“I love teaching,” she said. “I love my principal, I feel so lucky to work for her. But the people at the Department of Education — you feel demoralized.”
How could this happen to Ms. Isaacson? It took a lot of hard work by the accountability experts.
Everyone who teaches math or English has received a teacher data report. On the surface the report seems straightforward. Ms. Isaacson’s students had a prior proficiency score of 3.57. Her students were predicted to get a 3.69 — based on the scores of comparable students around the city. Her students actually scored 3.63. So Ms. Isaacson’s value added is 3.63-3.69.
The calculation for Ms. Isaacson’s 3.69 predicted score is even more daunting. It is based on 32 variables — including whether a student was “retained in grade before pretest year” and whether a student is “new to city in pretest or post-test year.”
In plain English, Ms. Isaacson’s best guess about what the department is trying to tell her is: Even though 65 of her 66 students scored proficient on the state test, more of her 3s should have been 4s.
But that is only a guess.
Moreover, as the city indicates on the data reports, there is a large margin of error. So Ms. Isaacson’s 7th percentile could actually be as low as zero or as high as the 52nd percentile — a score that could have earned her tenure.
Evaluating New York Teachers, Perhaps the Numbers Do Lie
By MICHAEL WINERIP
Published: March 6, 2011
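The arithmetic in the quoted report is simple enough to write down. Here is a minimal sketch that uses only the figures quoted above; the variable names are mine, not the department's, and this is not the 32-variable formula itself, just the final subtraction and the reported margin of error.

```python
# Figures as quoted from the article; names are illustrative, not the city's.
prior_score = 3.57      # students' prior proficiency score
predicted_score = 3.69  # predicted from "comparable students around the city"
actual_score = 3.63     # what her students actually scored

# Value added is the actual score minus the predicted score.
value_added = actual_score - predicted_score
print(f"value added: {value_added:+.2f}")  # -0.06

# Reported percentile and the range allowed by the margin of error.
reported_percentile = 7
low_percentile, high_percentile = 0, 52

# If the range contains the 50th percentile, the report cannot rule out
# that this teacher is exactly average.
includes_median = low_percentile <= 50 <= high_percentile
print(f"range includes the 50th percentile: {includes_median}")  # True
```

Note that her students' score rose from 3.57 to 3.63; the negative value added comes entirely from the prediction being higher still.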
I have four reactions.
A 32-variable teacher evaluation scheme does not sit right with me, if only because it lacks transparency. This teacher has no idea why her score falls in the bottom 7% of all teachers in NYC, and neither does anyone else, including her principal and her students.
Is this teacher running afoul of a ceiling effect? Her students were already scoring well above average coming into her class -- isn't it harder to bring above-average students further up than it is to bring below-average students to average? Working on SAT math with C., I'm convinced that the jump from 550 to 600 is a shorter leap than the one from 600 to 650. Whether or not that's true for the SAT specifically, I'm pretty sure people have shown it to be true with other tests.
Yes. It's a well-known effect.*
I flatly reject the assumption that New York state tests are capable of distinguishing between a group of students earning 3.57 on average and a group of students earning 3.69 on average. A few years back, when C., who is a fantastically good reader,** scored a 3 on reading, I got in touch with our then-curriculum director, who told me that NY state tests in some grades have essentially no range of scores in the 4 category at all. That is, if you get 37 or 38 out of 38 items correct, say, you earn a 4; get 36 and you're a 3. I checked the test and, sure enough, she was right: there was no range at all for the 4. I don't know whether David Steiner has changed the tests in the year he's been in office, but even if he has, I reject the idea that the tests are now valid and can accurately assess what the gap between a 3.57 and a 3.69 means (if anything) and whether it is equivalent to the gap between a 3.01 and a 3.13.***
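To get a feel for how little it might take to produce these gaps, here is a rough sketch. It rests on two assumptions of mine that the article does not confirm: that a class score is a plain average of students' 1-4 levels (the city may well compute it differently), and that the 4 band starts at 37 of 38 items, as in the "say" example above. The function and variable names are hypothetical.

```python
# Assumed cutoff, taken from the illustrative "37 or 38 out of 38" example.
FOUR_CUTOFF = 37

def level(items_correct):
    """Return 4 or 3 for a raw score; lower bands are not modeled here."""
    return 4 if items_correct >= FOUR_CUTOFF else 3

# One missed item is the whole difference between a 4 and a 3.
print(level(37), level(36))  # 4 3

class_size = 66  # Ms. Isaacson's first-year class, per the article

# If class scores were plain averages of levels, each student who drops from
# a 4 to a 3 lowers the average by 1 / class_size.
gap_prior_to_predicted = 3.69 - 3.57
shortfall_vs_predicted = 3.69 - 3.63

students_for_gap = gap_prior_to_predicted * class_size
students_for_shortfall = shortfall_vs_predicted * class_size

print(f"{students_for_gap:.1f} students account for the 0.12 gap")
print(f"{students_for_shortfall:.1f} students account for the 0.06 shortfall")
```

Under those assumptions, the 0.06 shortfall amounts to roughly four students out of 66 landing on a 3 instead of a 4, which could be as little as one missed item apiece.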
On the other hand, suppose the 7% ranking is right. What might account for that?
One possibility: the Lab School is a constructivist enterprise (here's the Math Department), and this teacher was trained at Columbia Teachers College. She is teaching English and social studies to 7th graders. New York state requires that teachers have a Bachelor's degree in their field of specialty beginning in 7th grade, which means that most 7th grade teachers are teaching English or social studies, not both. One of her students is quoted in the article: “I really liked how she’d incorporate what we were doing in history with what we did in English,” Marya said. “It was much easier to learn.”
Interdisciplinary teaching at the middle school level tends to be shallow because students aren't expert in any of the fields being blurred together (and teachers are expert in just one field), and the only commonalities you can find between disciplines tend to be obvious and current-eventsy. For example, back when one of our middle school principals explained to us that henceforth character education would be 'embedded' in all subject matter, the best example he could come up with was that the father in The Miracle Worker is an angry patriarchal male who is abusive towards his handicapped child. That "interdisciplinary" reading of The Miracle Worker is anachronistic, simplistic, and wrong.
It's impossible to say whether these scores mean anything.
But if they do, they suggest to me that a 7th grade teacher needs to focus all of her efforts on English or on history, not on both.
English literature and history are very different disciplines.
I wonder how other teachers in the school fared, and, if they did better, how true they were to the middle school model?
* The article doesn't tell us whether the city's statisticians correct for ceiling effects and regression to the mean.
** C. typically missed just 1 or 2 items on SAT reading & writing tests.
*** Of course, given the very wide range for 3s, perhaps a .12 difference is significant. It's impossible to know -- and that's the problem.