Large-scale Multiple Testing

Traditional methods for multiple comparisons adjustments focus on correcting for modest numbers of comparisons, often in an analysis of variance. A different set of techniques has been developed for "large-scale multiple testing," in which thousands or even greater numbers of tests are performed. For example, in genomics, technologies such as microarrays can measure the expression levels of tens of thousands of genes, and genotypes can be measured for millions of genetic markers. Particularly in the field of genetic association studies, there has been a serious problem with non-replication: a result that is strongly statistically significant in one study fails to be replicated in a follow-up study. Such non-replication can have many causes, but failure to fully account for the consequences of making multiple comparisons is widely considered to be one of them.

In different branches of science, multiple testing is handled in different ways. It has been argued that if statistical tests are only performed when there is a strong basis for expecting the result to be true, multiple comparisons adjustments are not necessary. It has also been argued that use of multiple testing corrections is an inefficient way to perform empirical research, since multiple testing adjustments control false positives at the potential expense of many more false negatives. On the other hand, it has been argued that advances in measurement and information technology have made it far easier to generate large datasets for exploratory analysis, often leading to the testing of large numbers of hypotheses with no prior basis for expecting many of the hypotheses to be true. In this situation, very high false positive rates are expected unless multiple comparisons adjustments are made.
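The last point can be made concrete with a little arithmetic. The sketch below assumes that every tested hypothesis is actually null, so that each valid p-value is uniform on [0, 1], and (for the familywise formula only) that the tests are independent. The function names are hypothetical, chosen for illustration; this is not a method described in the text above.

```python
import random


def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of rejections when all m null hypotheses are true.

    Assumption: each p-value is uniform on [0, 1] under its null, so each
    unadjusted test rejects with probability alpha. By linearity of
    expectation this holds whether or not the tests are independent.
    """
    return m * alpha


def familywise_error_rate(m: int, alpha: float) -> float:
    """P(at least one false positive) for m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_false_positives(m: int, alpha: float, seed: int = 0) -> int:
    """Count unadjusted rejections among m simulated true-null p-values."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so each draw falls below alpha
    # with probability alpha, mimicking a true-null p-value.
    return sum(1 for _ in range(m) if rng.random() < alpha)


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 20, 20_000):
        print(
            f"m={m:>6}: E[false positives]={expected_false_positives(m, alpha):8.2f}, "
            f"FWER={familywise_error_rate(m, alpha):.4f}, "
            f"simulated={simulate_false_positives(m, alpha)}"
        )
```

At a per-test level of 0.05, twenty independent null tests already give roughly a 64% chance of at least one false positive, and 20,000 null tests (a modest genomics screen) yield about 1,000 expected false positives if no adjustment is made.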

For large-scale testing problems where the goal is to provide definitive results, the familywise error rate remains the most accepted parameter for ascribing significance levels to statistical tests. Alternatively, if a study is viewed as exploratory, or if significant results can be easily re-tested in an independent study, control of the false discovery rate (FDR) is often preferred. The FDR, defined as the expected proportion of false positives among all significant tests, allows researchers to identify a set of "candidate positives," of which a high proportion are likely to be true. The false positives within the candidate set can then be identified in a follow-up study.
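To illustrate the difference between the two error criteria, write V for the number of false positives and R for the total number of rejections; then FWER = P(V ≥ 1), while FDR = E[V / max(R, 1)]. The text above does not name specific procedures, so the sketch below uses two common representatives as an assumption: the Bonferroni correction (controls the FWER) and the Benjamini–Hochberg step-up procedure (controls the FDR when the tests are independent or positively dependent). The p-values in the example are made up for illustration.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m; controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure at target FDR level q."""
    m = len(pvalues)
    # Indices of the p-values, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values from ten tests, in arbitrary order.
    pvals = [0.041, 0.001, 0.50, 0.012, 0.90, 0.008, 0.35, 0.030, 0.74, 0.20]
    print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))  # 1 3
```

On these ten p-values Bonferroni rejects only the smallest (0.001 ≤ 0.05/10), while Benjamini–Hochberg also admits 0.008 and 0.012 into the "candidate positive" set. This reflects the trade-off described above: a larger set of discoveries, at the cost of tolerating a controlled proportion of false positives that a follow-up study would need to weed out.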
