A couple of weeks ago, James Milgram, an emeritus professor of mathematics at Stanford University, updated me on some recent developments in the controversy over Jo Boaler's "Railside Study." It was only after I reviewed the various critiques, accusations, and rebuttals that I was reminded of what an enormously consequential case of educational malpractice is afoot here--one that deserves much wider attention than it has gotten so far.
Professor Milgram is known in the education world for his comprehensive critique of a study done by Jo Boaler, an education professor at Stanford, and Megan Staples, then an education professor at Purdue. Boaler and Staples' paper, preprinted in 2005 and published in 2008, is entitled Transforming Students' Lives through an Equitable Mathematics Approach: The Case of Railside School. Focusing on three California schools, it compares cohorts of students who used either a traditional algebra curriculum or the Reform Math algebra curriculum College Preparatory Mathematics (CPM). According to Boaler and Staples' paper, the Reform Math cohort achieved substantially greater mathematical success than the traditional math cohorts.
In early 2005, a high-ranking official from the U.S. Department of Education asked Professor Milgram to evaluate Boaler and Staples' study. The reason for her request? She was concerned that, if Boaler and Staples' conclusions were correct, the U.S. Department of Education would be obliged, in Milgram's words, "to begin to reconsider much if not all of what they were doing in mathematics education." This would entail an even stronger push by the U.S. educational establishment to implement Constructivist Reform Math curricula throughout K-12 education.
Milgram's evaluation of Boaler and Staples' study resulted in a paper, co-authored with mathematician Wayne Bishop and statistician Paul Clopton, entitled A close examination of Jo Boaler's Railside Report. The paper was accepted for publication in the peer-reviewed journal Education Next, but statements made to Milgram by some of his math education colleagues caused him to become concerned that the paper's publication would, in his words, make it "impossible for me to work with the community of math educators in this country"--involved as he then was in a number of other math education-related projects. Milgram instead posted the paper to his Stanford website.
This past October a bullet-point response to Milgram's paper, entitled "When Academic Disagreement Becomes Harassment and Persecution," appeared on Boaler's
Stanford website. A month ago, Milgram posted his
response and alerted me to it. I have his permission to share parts of it here.
Entitled Private Data - The Real Story: A Huge Problem with Education Research, this second paper reviews Milgram et al.'s earlier critiques and adds several compelling updates. Together, the two papers make a series of highly significant points, all of them backed up with transparent references to data of the sort that Boaler and Staples' own paper completely lacks.
Indeed, among Milgram et al.'s points is precisely this lack of transparency. Boaler and Staples refuse to divulge their data, in particular data regarding which schools they studied, claiming that agreements with the schools and FERPA (Family Educational Rights and Privacy Act) rules disallow this. But FERPA only involves protecting the school records of individual students, not those of whole schools. More importantly, refusals to divulge such data violate the federal Freedom of Information Act. Boaler's refusal also violates the policies of Stanford University, specifically its stated "commitment to openness in research" and its prohibitions of secrecy, "including limitations on publishability of results."
Second, Milgram et al.'s examination of the actual data, once they were able to track it down via California's education records, shows that it was distorted in multiple ways.
1. Boaler and Staples' chosen cohorts aren't comparable:
It appears, from state data, that the cohort at Railside [the pseudonym of the Reform Math school] was comprised of students in the top half of the class in mathematics. For Greendale, it appears that the students were grouped between the 35th and 70th percentiles, and that the students at Hilltop were grouped between the 40th and 80th percentiles. [Excerpted from Milgram; boldface mine]
2. Boaler and Staples' testing instruments are flawed:
Our analysis shows that they contain numerous mathematical errors, even more serious imprecisions, and also that the two most important post-tests were at least 3 years below their expected grade levels. [Excerpted from Milgram; boldface mine]
3. The comparison of scores on California's standardized tests (STAR) uses data from students not involved in Boaler and Staples' study:
The students in the cohorts Boaler was studying should have been in 11th grade, not ninth in 2003! So [this] is not data for the population studied in [Boaler and Staples' paper]. This 2003 ninth grade algebra data is the only time where the Railside students clearly outperformed the students at the other two schools during this period. There is a possibility that they picked the unique data that might strengthen their assertions, rather than make use of the data relevant to their treatment groups. [Excerpted from Milgram; boldface mine]
4. The most relevant actual data yields the opposite conclusion about the Reform Math cohort's mathematical success relative to that of the traditional math cohorts:
o The most telling data we find is that the mathematics remediation rate for the cohort of Railside students that Boaler was following who entered the California State University system was 61%
o This was much higher than the state average of 37%
o Greendale's remediation rate was 35%
o and Hilltop's was 29%.
5. School officials at "Railside" report that the results of the reform math curriculum are even worse than Milgram et al. had originally indicated:
A high official in the district where Railside is located called and updated me on the situation there in May 2010. One of that person's remarks is especially relevant. It was stated that as bad as [Milgram et al.'s original paper] indicated the situation was at Railside, the school district's internal data actually showed it was even worse. Consequently, they had to step in and change the math curriculum at Railside to a more traditional approach.
Changing the curriculum seems to have had some effect. This year (2012) there was a very large (27-point) increase in Railside's API score and an even larger (28-point) increase for socioeconomically disadvantaged students, where the target had been 7 points in each case.
6. Boaler's responses to Milgram et al. provide no substantiated refutations of any of their key points.
In response to comments on an article about her critique of Milgram, Boaler states:
"I see in some of the comments people criticizing me for not addressing the detailed criticisms from Milgram/Bishop. I am more than happy to this. [...] I will write my detailed response today and post it to my site."
However, as Milgram notes in his December paper:
As I write this, nearly two months have passed since Boaler's rebuttal was promised, but it has not appeared. Nor is it likely to. The basic reason is that there is every reason to believe [Milgram et al.'s paper] is not only accurate but, in fact, understates the situation at "Railside" from 2000-2005.
In a nutshell: under the mantle of purported FERPA protection, we have hidden and distorted data supporting a continued revolution in K-12 math education--a revolution that actual data show to be resulting, among other things, in substantially increased mathematics remediation rates among college students. Ever lower mathematical preparedness; ever greater college debt. Just what our country needs.
Nor is Boaler's Reform Math-supporting "research" unique in its lack of transparency, in its lack of independent verification, and in its unwarranted impact on K-12 math practices. As Milgram notes,
This seems to be a very common occurrence within education circles.
For example, the results of a number of papers with enormous effects on curriculum and teaching, such as [Diane Briars and Lauren Resnick's paper "Standards, assessments -- and what else? The essential elements of Standards-based school improvement"] and [J. Riordan and P. Noyce's paper, "The impact of two standards-based mathematics curricula on student achievement in Massachusetts"] have never been independently verified.
Yet, [Briars and Resnick's paper] was the only independent research that demonstrated significant positive results for the Everyday Math program for a number of years. During this period district curriculum developers relied on [Briars and Resnick's paper] to justify choosing the program, and, today, EM is used by almost 20% of our students. Likewise [Riordan and Noyce's paper] was the only research accepted by [the U.S. Department of Education's] What Works Clearinghouse in their initial reports that showed positive effects for the elementary school program "Investigations in Number, Data, and Space," which today is used by almost 10% of our students.
As Milgram further notes:
Between one quarter and 30% of our elementary school students is a huge data set. Consequently, if these programs were capable of significantly improving our K-12 student outcomes, we would surely have seen evidence by now.
And to pretend that such evidence exists when it doesn't is nothing short of educational malpractice.