Multiple Comparisons - Large-scale Multiple Testing

Large-scale Multiple Testing

Traditional methods for multiple comparisons adjustments focus on correcting for modest numbers of comparisons, often in an analysis of variance. A different set of techniques has been developed for "large-scale multiple testing," in which thousands of tests, or even more, are performed. For example, in genomics, when using technologies such as microarrays, expression levels of tens of thousands of genes can be measured, and genotypes for millions of genetic markers can be measured. Particularly in the field of genetic association studies, there has been a serious problem with non-replication: a result that is strongly statistically significant in one study fails to be replicated in a follow-up study. Such non-replication can have many causes, but failure to fully account for the consequences of making multiple comparisons is widely considered to be one of them.

Different branches of science handle multiple testing in different ways. It has been argued that if statistical tests are performed only when there is a strong basis for expecting the result to be true, multiple comparisons adjustments are not necessary. It has also been argued that multiple testing corrections are an inefficient way to perform empirical research, since they control false positives at the potential expense of many more false negatives. On the other hand, it has been argued that advances in measurement and information technology have made it far easier to generate large datasets for exploratory analysis, often leading to the testing of large numbers of hypotheses with no prior basis for expecting many of them to be true. In this situation, very high false positive rates are expected unless multiple comparisons adjustments are made.
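The scale of the problem can be illustrated with a little arithmetic. The sketch below is not taken from the text above; it assumes that every null hypothesis is true and that the tests are independent, which real studies rarely satisfy exactly. The function names and the per-test level of 0.05 are illustrative choices.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false positives when every null hypothesis is true.

    Each test rejects with probability alpha, so the expectation is
    num_tests * alpha (this part does not require independence).
    """
    return num_tests * alpha


def chance_of_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across all tests.

    Assumes all nulls are true AND the tests are independent, so the
    probability of no false positive at all is (1 - alpha) ** num_tests.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 20, 100, 10_000):
        print(m, expected_false_positives(m), chance_of_any_false_positive(m))
```

Under these assumptions, 20 unadjusted tests at the 0.05 level already give roughly a 64% chance of at least one false positive, and 10,000 tests are expected to produce about 500 false positives.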

For large-scale testing problems where the goal is to provide definitive results, the familywise error rate (the probability of making at least one false positive among all the tests performed) remains the most accepted parameter for ascribing significance levels to statistical tests. Alternatively, if a study is viewed as exploratory, or if significant results can be easily re-tested in an independent study, control of the false discovery rate (FDR) is often preferred. The FDR, defined as the expected proportion of false positives among all significant tests, allows researchers to identify a set of "candidate positives," of which a high proportion are likely to be true. The false positives within the candidate set can then be identified in a follow-up study.
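To make the contrast between the two error rates concrete, here is a minimal sketch of one well-known procedure for each. The text above does not name specific procedures, so the choice is an assumption made for illustration: the Bonferroni correction for the familywise error rate, and the Benjamini-Hochberg step-up procedure for the FDR (whose standard guarantee assumes independent or positively dependent tests). The function names and the example p-values are hypothetical.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni correction: reject when p <= alpha / m.

    Controls the familywise error rate at level alpha.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure for FDR control at level alpha.

    Sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the LARGEST rank that passes

    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values, purely for illustration.
    p = [0.04, 0.001, 0.03, 0.02]
    print("Bonferroni:        ", bonferroni_reject(p))
    print("Benjamini-Hochberg:", benjamini_hochberg_reject(p))
```

On the same p-values, the FDR procedure never rejects fewer hypotheses than Bonferroni at the same level, which matches the idea of a larger "candidate" set that tolerates some false positives in exchange for fewer false negatives.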
