
Tuesday, January 8, 2013

Educational malpractice for the sake of Reform Math

A couple of weeks ago, James Milgram, an emeritus Professor of Mathematics at Stanford University, updated me on some recent developments in the controversy over Jo Boaler's "Railside Study." It was only after I reviewed the various critiques, accusations, and rebuttals that I remembered just how enormously consequential a case of educational malpractice is afoot here--one that deserves much wider attention than it has gotten so far.

Professor Milgram is known in the education world for his comprehensive critique of a study done by Jo Boaler, an education professor at Stanford, and Megan Staples, then an education professor at Purdue. Boaler and Staples' paper, preprinted in 2005 and published in 2008, is entitled Transforming Students' Lives through an Equitable Mathematics Approach: The Case of Railside School. Focusing on three California schools, it compares cohorts of students who used either a traditional algebra curriculum or the Reform Math algebra curriculum College Preparatory Mathematics (CPM). According to Boaler and Staples' paper, the Reform Math cohort achieved substantially greater mathematical success than the traditional math cohorts.

In early 2005 a high-ranking official from the U.S. Department of Education asked Professor Milgram to evaluate Boaler and Staples' study. The reason for her request? She was concerned that, if Boaler and Staples' conclusions were correct, the U.S. Department of Education would be obliged, in Milgram's words, "to begin to reconsider much if not all of what they were doing in mathematics education." This would entail an even stronger push by the U.S. educational establishment to implement the Constructivist Reform Math curricula throughout K-12 education.

Milgram's evaluation of Boaler and Staples' study resulted in a paper, co-authored with mathematician Wayne Bishop and statistician Paul Clopton, entitled A close examination of Jo Boaler's Railside Report. The paper was accepted for publication in the peer-reviewed journal Education Next, but statements made to Milgram by some of his math education colleagues caused him to become concerned that the paper's publication would, in Milgram's words, make it "impossible for me to work with the community of math educators in this country"--involved as he then was in a number of other math education-related projects. Milgram instead posted the paper to his Stanford website.

This past October a bullet-point response to Milgram's paper, entitled "When Academic Disagreement Becomes Harassment and Persecution," appeared on Boaler's Stanford website. A month ago, Milgram posted his response and alerted me to it. I have his permission to share parts of it here.

Entitled Private Data - The Real Story: A Huge Problem with Education Research, this second paper reviews Milgram et al.'s earlier critiques and adds several compelling updates. Together, the two papers make a series of highly significant points, all of them backed up with transparent references to data of the sort that Boaler and Staples' own paper completely lacks.

Indeed, among Milgram et al.'s points is precisely this lack of transparency. Boaler and Staples refuse to divulge their data, in particular data regarding which schools they studied, claiming that agreements with the schools and FERPA (Family Educational Rights and Privacy Act) rules disallow this. But FERPA protects only the school records of individual students, not those of whole schools. More importantly, refusals to divulge such data violate the federal Freedom of Information Act. Boaler's refusal also violates the policies of Stanford University, specifically its stated "commitment to openness in research" and its prohibitions of secrecy, "including limitations on publishability of results."

Second, Milgram et al.'s examination of the actual data, once they were able to track it down via California's education records, shows that it was distorted in multiple ways.

1. Boaler and Staples' chosen cohorts aren't comparable:
It appears, from state data, that the cohort at Railside [the pseudonym of the Reform Math school] was comprised of students in the top half of the class in mathematics. For Greendale, it appears that the students were grouped between the 35th and 70th percentiles, and that the students at Hilltop were grouped between the 40th and 80th percentiles. [Excerpted from Milgram; boldface mine]
2. Boaler and Staples' testing instruments are flawed:
Our analysis shows that they contain numerous mathematical errors, even more serious imprecisions, and also that the two most important post-tests were at least 3 years below their expected grade levels.  [Excerpted from Milgram; boldface mine]
3. The data comparing test scores on California's standardized tests (STAR) comes from students who were not involved in Boaler and Staples' study:
The students in the cohorts Boaler was studying should have been in 11th grade, not ninth in 2003! So [this] is not data for the population studied in [Boaler and Staple's paper]. This 2003 ninth grade algebra data is the only time where the Railside students clearly outperformed the students at the other two schools during this period. There is a possibility that they picked the unique data that might strengthen their assertions, rather than make use of the data relevant to their treatment groups.   [Excerpted from Milgram; boldface mine]
4. The most relevant actual data yields the opposite conclusion about the Reform Math cohort's mathematical success relative to that of the traditional math cohorts:
o The most telling data we find is that the mathematics remediation rate for the cohort of Railside students that Boaler was following who entered the California State University system was 61%
o This was much higher than the state average of 37%
o Greendale's remediation rate was 35%
o and Hilltop's was 29%.
5. School officials at "Railside" report that the results of the Reform Math curriculum are even worse than Milgram et al. had originally indicated:
A high official in the district where Railside is located called and updated me on the situation there in May, 2010. One of that person's remarks is especially relevant. It was stated that as bad as [Milgram et al's original paper] indicated the situation was at Railside, the school district's internal data actually showed it was even worse. Consequently, they had to step in and change the math curriculum at Railside to a more traditional approach.

Changing the curriculum seems to have had some effect. This year (2012) there was a very large (27 point) increase in Railside's API score and an even larger (28 point) increase for socioeconomically disadvantaged students, where the target had been 7 points in each case.
6. Boaler's responses to Milgram et al. provide no substantiated refutations of any of their key points:

In response to comments on an article about Boaler's critique of Milgram, Boaler states:
"I see in some of the comments people criticizing me for not addressing the detailed criticisms from Milgram/Bishop. I am more than happy to this. [...] I will write my detailed response today and post it to my site."
However, as Milgram notes in his December paper:
As I write this, nearly two months have passed since Boaler's rebuttal was promised, but it has not appeared. Nor is it likely to. The basic reason is that there is every reason to believe [Milgram et al's paper] is not only accurate but, in fact, understates the situation at "Railside" from 2000 - 2005.
In a nutshell: under the mantle of purported FERPA protection, we have hidden and distorted data supporting a continued revolution in K-12 math education--a revolution that actual data show to be resulting, among other things, in substantially increased mathematics remediation rates among college students. Ever lower mathematical preparedness; ever greater college debt. Just what our country needs.

Nor is Boaler's Reform Math-supporting "research" unique in its lack of transparency, in its lack of independent verification, and in its unwarranted impact on K-12 math practices. As Milgram notes,
This seems to be a very common occurrence within education circles.

For example, the results of a number of papers with enormous effects on curriculum and teaching, such as [Diane Briars and Lauren Resnick's paper "Standards, assessments -- and what else? The essential elements of Standards-based school improvement"] and [J. Riordan and P. Noyce's paper, "The impact of two standards-based mathematics curricula on student achievement in Massachusetts"] have never been independently verified.

Yet, [Briars and Resnick's paper] was the only independent research that demonstrated significant positive results for the Everyday Math program for a number of years. During this period district curriculum developers relied on [Briars and Resnick's paper] to justify choosing the program, and, today, EM is used by almost 20% of our students. Likewise [Riordan and Noyce's paper] was the only research accepted by [the U.S. Department of Education's] What Works Clearinghouse in their initial reports that showed positive effects for the elementary school program "Investigations in Number, Data, and Space," which today is used by almost 10% of our students.
As Milgram notes:
Between one quarter and 30% of our elementary school students is a huge data set. Consequently, if these programs were capable of significantly improving our K-12 student outcomes, we would surely have seen evidence by now.
And to pretend that such evidence exists when it doesn't is nothing short of educational malpractice.

7 comments:

SteveH said...

Thanks for the update.

I had the What Works Clearinghouse data for EM thrown in my face long ago. I replied that the results were marginal at best. Even if the school didn't have the WWC data, they would have kept EM. This was the school where a number of bright fifth graders still didn't know the times table. The teacher had to NOT trust the spiral to remediate. She didn't cover 35% of the EM material by the end of the year but sent a letter home claiming victory over critical thinking and problem solving.

Whatever.

Laura in AZ said...

Nothing like having the results decided on before your data is even collected. *rolls eyes* This is probably only the tip of the iceberg, unfortunately.

When my daughter's parochial school switched math curricula to one of these "fuzzy" math ones, that, along with several other issues, was one of the contributing reasons for pulling her out and starting to homeschool her. We started with Saxon and it worked for about 1 1/2 years, but then we had to go to a more regular, though still traditional, math curriculum (Abeka), which stays on one subject longer but still has lots of review. Though I still do the Saxon drills with her.

Student of History/Robin said...

Katherine-

Couple of points. The first is that the references to "Standards" are not content standards but the Standards for Teaching and Learning created in Chicago in the early 90s. Standards became a euphemism for outcomes after OBE became controversial. Keep the practices. Change the name to make the heat die down.

This matters a lot because President Obama has said on multiple occasions that those Chicago standards, which he was involved in as part of the Chicago Annenberg Challenge, are the real Common Core. The Hewlett Foundation has also corroborated that.

I did not realize that Jo was now at Stanford as well. The Ed school there has basically put together a team to implement the Effective Schools research and get around all the obstacles, turned up in that 1973-1978 Rand Change Agent research study, that varied implementation at the school and classroom level.

Third, on Common Core, which incorporates the CPM work in the assessments that are to be so much a part of the implementation: it's the assessments that are to drive the classroom. That's why what is going on in Texas and Virginia mirrors the other states. STAAR, for example, is based on Norman Webb's OBE-based Depth of Knowledge. McTighe and Wiggins and their Understanding by Design is the model for the kind of performance assessments to be used. That's performance as in activities and behaviors and tasks. Mostly collaborative. UbD drives most of the teacher training now in Texas.

The tip in your post, Katherine, that tells me where to look additionally is the Resnick link. Lauren is forever ground zero for what is really going on. Has been for decades.

What is educational malpractice to us is just Dewey's Social Reconstruction and a different purpose for education if you are Boaler or Resnick.

Remember that CPM was created at Michigan State. Same place that conducted the Effective Schools research in the first place.

SteveH said...

Math curricula can always come up with statistics to make them look good over the average of all schools. I was just looking at some of CPM's data. There is also the issue of showing relatively better results, but ignoring huge absolute or systemic problems.

CPM has a curriculum that potentially gets a student into a calculus course that "Covers all content required for the AP Calculus Test – both AB and BC." AP at least keeps them honest at the top end, but what happens in the lower grades before the big math track split? How many more kids does it get to the pre-calc or calculus level? How many of the kids on the upper track get help at home or with tutors?

Many studies focus on whether more students get over minimal state proficiencies. You can always slow down and do better at the lower end no matter whether you talk about understanding or not. Our lower schools improved their math test scores (relatively speaking) long ago with MathLand just by trying harder. There is a lot of low hanging fruit that can be obtained just by focusing on teaching competence across grades.

There are so many assumptions and issues. Many know that proficiency on state tests is nowhere near good enough. I've ignored all of my son's state test results. Some talk about STEM and CCSS in the same sentence, but they really need to define a year-by-year STEM proficiency level that will lead to getting a good grade in (at least) pre-calc in high school. David Coleman wants to tie the SAT to the CCSS standards, but even an SAT math score of 600 may not be good enough for success in a STEM major.

Now that common standardized testing has become the norm, it will be interesting to see how they calibrate CCSS tests with the ACT and SAT. In the past, with so many different state tests, this was never done. They will have to show how many kids make the nonlinear leap from proficient scores on the state test to specific SAT/ACT scores. Then, all they have to do is ask parents how they helped their kids make that transition.

momof4 said...

They won't ask parents how their kids made that leap; they deliberately choose not to know. As long as "enough" (varies by school/district) kids are able to make the leap, schools do not want to hear how it was done. They don't want to know about parents, paid tutors, Kumon, online or other resources used. They simply point to the fact that "enough" kids get there and use that as a justification for what the school is doing. I'd love to see the local PTA do a survey about the use of outside resources (not limited to math) and post it for all to see.

Catherine Johnson said...

momof4- I have LONG wished the PTSA would conduct such surveys, but it seems hopeless.

I think I've mentioned before that last June, the night before C's high school graduation, our incoming president of the PTSA actually sent police to my house to question me (political intimidation by cop).

That episode is now long enough in the past that it's begun to seem funny, BUT ..... while our current PTSA president is not at all the norm, the fact that my six years of efforts to reform the curriculum here resulted in a PTSA-sponsored visit from the police does not give me hope re: PTSA-sponsored surveys!

(In case anyone is wondering, the police were told that I had made threats against a PTSA speaker who was to address parents on the subject of cyberbullying and that I was expected to "disrupt" the proceedings. That, too, is **starting** to be funny. Yes, indeed, if you're going to suspect a person of making threats against a PTSA speaker, you definitely want to pick a late-middle-aged mom of three who didn't attend the speech. The police came to my house after the speech was over.)

momof4 said...

That's awful. That kind of action should be unbelievable, but the edworld (among others) doesn't give up its stranglehold easily. Our old district was apparently required to send out parent surveys to find out if the HS cluster wished to continue the 7-8 JHS format or move to the MCPS-proposed 6-7-8 MS format. The vote was something over 90% wishing to keep the JHS format, but we got the MS anyway. Apparently, there was no requirement that anyone read the results or follow the parent wishes. All of the best things (academics) about the JHS while my older kids were there had been lost, and all of the worst things (artsy-crafty, touchy-feely, non-academic) came from ES to the MS. My younger kids hated it. Sigh.