Turns out I was right.
Here's Andrew Gelman (and here, too).
Brain imaging studies under fire (naturenews)
Interview: Have the Results of Some Brain Scanning Experiments Been Overstated? (Scientific American)
Of course, now I'm wondering whether there is anything I think I know about brain & behavior that is not based on non-independent analysis.
LEHRER: What is a "voodoo correlation"?
VUL: We use that term as a humorous way to describe mysteriously high correlations produced by complicated statistical methods (which usually were never clearly described in the scientific papers we examined)—and which turn out unfortunately to yield some very misleading results. The specific issue we focus on, which is responsible for a great many mysterious correlations, is something we call “non-independent” testing and measurement of correlations. Basically, this involves inadvertently cherry-picking data and it results in inflated estimates of correlations.
To go into a bit more detail:
An fMRI scan produces lots of data: a 3-D picture of the head, which is divided into many little regions, called voxels. In a high-resolution fMRI scan, there will be hundreds of thousands of these voxels in the 3-D picture.
When researchers want to determine which parts of the brain are correlated with a certain aspect of behavior, they must somehow choose a subset of these thousands of voxels. One tempting strategy is to choose voxels that show a high correlation with this behavior. So far this strategy is fine.
The problem arises when researchers then go on to provide their readers with a quantitative measure of the correlation magnitude measured just within the voxels they have pre-selected for having a high correlation. This two-step procedure is circular: it chooses voxels that have a high correlation, and then estimates a high average correlation. This practice inflates the correlation measurement because it selects those voxels that have benefited from chance, as well as any real underlying correlation, pushing up the numbers.
One can see closely analogous phenomena in many areas of life. Suppose we pick out the investment analysts whose stock picks for April 2005 did best for that month. These people will probably tend to have talent going for them, but they will also have had unusual luck (and some finance experts, such as Nassim Taleb, actually say the luck will probably be the bigger element). But even assuming they are more talented than average—as we suspect they would be—if we ask them to predict again, for some later month, we will invariably find that as a group, they cannot duplicate the performance they showed in April. The reason is that next time, luck will help some of them and hurt some of them—whereas in April, they all had luck on their side or they wouldn’t have gotten into the top group. So their average performance in April is an overestimate of their true ability—the performance they can be expected to duplicate on the average month.
It is exactly the same with fMRI data and voxels. If researchers select only highly correlated voxels, they select voxels that "got lucky," as well as having some underlying correlation. So if you take the correlations you used to pick out the voxels as a measure of the true correlation for these voxels, you will get a very misleading overestimate.
This, then, is what we think is at the root of the voodoo correlations: the analysis inadvertently capitalized on chance, resulting in inflated measurements of correlation. The tricky part, which I can’t go into here, was that investigators were actually trying to take account of the fact they were checking so many different brain areas—but their precautions made the problem that I am describing worse, not better!
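To make the circularity concrete, here is a minimal simulation sketch. It is not taken from Vul's paper: every simulated voxel is pure noise, so its true correlation with behavior is zero, and the sizes (16 subjects, 5,000 voxels, a selection cutoff of r > 0.5) and function names are hypothetical. Selecting voxels on their correlation and then reporting that same correlation produces a mean above the cutoff by construction, which is the stock-picker effect in miniature: only the voxels that got lucky make the cut.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def circular_estimate(n_subjects=16, n_voxels=5000, threshold=0.5, seed=0):
    """Hypothetical helper: select voxels whose correlation with behavior
    exceeds `threshold`, then report the mean correlation of those voxels
    computed on the SAME data (the non-independent, two-step procedure)."""
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    selected = []
    for _ in range(n_voxels):
        # Pure noise: every voxel's true correlation with behavior is 0.
        voxel = [rng.gauss(0, 1) for _ in range(n_subjects)]
        r = pearson(voxel, behavior)
        if r > threshold:
            selected.append(r)
    return len(selected), (mean(selected) if selected else float("nan"))


if __name__ == "__main__":
    count, mean_r = circular_estimate()
    print(f"{count} of 5000 noise voxels selected; reported mean r = {mean_r:.2f}")
```

Raising the cutoff only raises the reported correlation further, even though nothing in the simulated data is related to behavior.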
I've talked to two neuroscience friends who are angry at this paper; they think the underlying issue is really a political war between various camps.
That said, I immediately believed it, because when I was working as a physicist trying to detect explosives in luggage, we made exactly the same mistakes in our algorithms (back then I hadn't gone to CS grad school, and no one in our company knew anything about algorithms and precious little about probability). EXACTLY the same mistake. We used the voxels to select the set of voxels from which we then made our detections, and then reported how great we were at finding our contraband.
It took a lot of fixing to create independent data sets: time-consuming, miserable, painful work that no one wanted to do.
I believe it.
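The fix the commenter describes, independent data sets, can be sketched as a split-half analysis: choose voxels using one half of the subjects, then measure their correlation on the other half, which played no part in the selection. This is an assumed design for illustration, not necessarily what the commenter's team did, and the sample sizes, cutoff, and function names are hypothetical. As before, every simulated voxel is pure noise.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_estimates(n_subjects=32, n_voxels=5000, threshold=0.5, seed=0):
    """Hypothetical helper: select voxels on the first half of the subjects,
    then report their mean correlation two ways, on the same half (circular)
    and on the held-out half (independent)."""
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    half = n_subjects // 2
    circular, independent = [], []
    for _ in range(n_voxels):
        # Pure noise: every voxel's true correlation with behavior is 0.
        voxel = [rng.gauss(0, 1) for _ in range(n_subjects)]
        r_select = pearson(voxel[:half], behavior[:half])
        if r_select > threshold:
            circular.append(r_select)
            # The held-out subjects were not used to pick this voxel.
            independent.append(pearson(voxel[half:], behavior[half:]))
    return mean(circular), mean(independent)


if __name__ == "__main__":
    circ, indep = compare_estimates()
    print(f"circular mean r = {circ:.2f}; held-out mean r = {indep:.2f}")
```

On pure noise the held-out estimate should hover near zero while the circular one stays above the cutoff. With real signal, the held-out number estimates the selected voxels' correlation without the luck that got them selected, at the cost of halving the data used for each step.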
I have no idea whether there is also a war between two camps.
Brain scan data has always seemed oversold to me, though I'm not in a particularly strong position to reach that conclusion.
Nevertheless, when I was at NAAR, scientists routinely told me they thought brain scans were bunk.
Now, that was several years ago, and I've been told that scans are much better than they were. On the other hand, I'm wondering whether what people told me was that the big improvement came with fMRI.
I don't remember.
I'm always suspicious of gizmo idolatry.
This kid Vul is only a graduate student, and not even from within the neuroimaging community. He has no expertise on this issue, and his article has been panned by the real experts in related fields who collectively have hundreds more years of experience and understanding.
My personal feeling is that he is an immature scientist who greatly overstepped his bounds in trying to generate flashy press-friendly claims.