Turns out I was right.
Here's Andrew Gelman (and here, too).
Brain imaging studies under fire (naturenews)
interview: Have the Results of Some Brain Scanning Experiments Been Overstated? (Scientific American)
LEHRER: What is a "voodoo correlation"?Of course, now I'm wondering whether there is anything I think I know about brain & behavior that is not based on non-independent analysis.
VUL: We use that term as a humorous way to describe mysteriously high correlations produced by complicated statistical methods (which usually were never clearly described in the scientific papers we examined)—and which turn out unfortunately to yield some very misleading results. The specific issue we focus on, which is responsible for a great many mysterious correlations, is something we call “non-independent” testing and measurement of correlations. Basically, this involves inadvertently cherry-picking data and it results in inflated estimates of correlations.
To go into a bit more detail:
An fMRI scan produces lots of data: a 3-D picture of the head, which is divided into many little regions, called voxels. In a high-resolution fMRI scan, there will be hundreds of thousands of these voxels in the 3-D picture.
When researchers want to determine which parts of the brain are correlated with a certain aspect of behavior, they must somehow choose a subset of these thousands of voxels. One tempting strategy is to choose voxels that show a high correlation with this behavior. So far this strategy is fine.
The problem arises when researchers then go on to provide their readers with a quantative measure of the correlation magnitude measured just within the voxels they have pre-selected for having a high correlation. This two-step procedure is circular: it chooses voxels that have a high correlation, and then estimates a high average correlation. This practice inflates the correlation measurement because it selects those voxels that have benefited from chance, as well as any real underlying correlation, pushing up the numbers.
One can see closely analogous phenomena in many areas of life. Suppose we pick out the investment analysts whose stock picks for April 2005 did best for that month. These people will probably tend to have talent going for them, but they will also have had unusual luck (and some finance experts, such as Nassim Taleb, actually say the luck will probably be the bigger element). But even assuming they are more talented than average—as we suspect they would be—if we ask them to predict again, for some later month, we will invariably find that as a group, they cannot duplicate the performance they showed in April. The reason is that next time, luck will help some of them and hurt some of them—whereas in April, they all had luck on their side or they wouldn’t have gotten into the top group. So their average performance in April is an overestimate of their true ability—the performance they can be expected to duplicate on the average month.
It is exactly the same with fMRI data and voxels. If researchers select only highly correlated voxels, they select voxels that "got lucky," as well as having some underlying correlation. So if you take the correlations you used to pick out the voxels as a measure of the true correlation for these voxels, you will get a very misleading overestimate.
This, then, is what we think is at the root of the voodoo correlations: the analysis inadvertently capitalized on chance, resulting in inflated measurements of correlation. The tricky part, which I can’t go into here, was that investigators were actually trying to take account of the fact they were checking so many different brain areas—but their precautions made the problem that I am describing worse, not better!