Statistical Hypothesis Testing

Origins

Hypothesis testing is largely the product of Ronald Fisher, Jerzy Neyman, Karl Pearson and his son Egon Pearson. Fisher was an agricultural statistician who emphasized rigorous experimental design and methods for extracting a result from few samples, assuming Gaussian distributions. Neyman (who teamed with the younger Pearson) emphasized mathematical rigor and methods for obtaining more results from many samples and a wider range of distributions. Modern hypothesis testing is an (extended) hybrid of the Fisher and Neyman/Pearson formulations, methods and terminology developed in the early 20th century.

Fisher popularized the "significance test". He required a null hypothesis (corresponding to a population frequency distribution) and a sample. His (now familiar) calculations determined whether or not to reject the null hypothesis. Significance testing did not use an alternative hypothesis, so there was no concept of a Type II error.
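As a minimal sketch of a significance test in this sense, the Python snippet below uses only a null hypothesis (a fair coin, heads probability 0.5) and a sample, and computes a two-sided exact binomial p-value. The function names, the sample of 9 heads in 10 tosses, and the 0.05 threshold are illustrative assumptions, not something taken from Fisher's own work.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials under success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_p_value(successes: int, n: int, p_null: float = 0.5) -> float:
    """Two-sided exact p-value under the null hypothesis.

    Sums the null probability of every outcome that is no more
    probable than the observed one.
    """
    observed = binomial_pmf(successes, n, p_null)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        pmf
        for k in range(n + 1)
        if (pmf := binomial_pmf(k, n, p_null)) <= observed * tolerance
    )


# Hypothetical sample: 9 heads in 10 tosses, tested against a fair coin.
p_value = binomial_p_value(9, 10)
reject_null = p_value < 0.05  # conventional threshold, assumed here
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
# prints: p-value = 0.0215, reject null: True
```

No alternative hypothesis appears anywhere in the calculation, which is why a Type II error cannot be defined in this setting.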

Neyman and Pearson considered a different problem (which they called "hypothesis testing"). They initially considered two simple hypotheses, each with its own frequency distribution. They calculated the probability of the sample under each hypothesis and typically selected the hypothesis with the higher probability (the one more likely to have generated the sample). Their method always selected a hypothesis. It also allowed the probabilities of both types of error (Type I and Type II) to be calculated.
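The procedure described above can be sketched in Python as follows. The two hypotheses are assumed, for illustration only, to be normal distributions with means 0 and 1 and a common standard deviation of 1; the sample values and all function names are hypothetical. This is a sketch of the "pick the more likely hypothesis" rule as stated in the text, not the full Neyman/Pearson theory, which lets the analyst fix the Type I error rate in advance.

```python
from math import prod
from statistics import NormalDist

# Two simple hypotheses, each a fully specified distribution (assumed values).
H0 = NormalDist(mu=0.0, sigma=1.0)
H1 = NormalDist(mu=1.0, sigma=1.0)


def likelihood(dist: NormalDist, sample: list[float]) -> float:
    """Joint density of an independent sample under the given distribution."""
    return prod(dist.pdf(x) for x in sample)


def choose_hypothesis(sample: list[float]) -> str:
    """Always selects one hypothesis: the one with the higher likelihood."""
    return "H1" if likelihood(H1, sample) > likelihood(H0, sample) else "H0"


def error_probabilities(n: int) -> tuple[float, float]:
    """Type I and Type II error probabilities of the rule above.

    With equal standard deviations, the likelihood comparison reduces to
    choosing H1 when the sample mean exceeds the midpoint of the two means.
    """
    cutoff = (H0.mean + H1.mean) / 2
    std_error = H0.stdev / n**0.5
    type_i = 1 - NormalDist(H0.mean, std_error).cdf(cutoff)  # pick H1, H0 true
    type_ii = NormalDist(H1.mean, std_error).cdf(cutoff)  # pick H0, H1 true
    return type_i, type_ii


sample = [0.9, 1.4, 0.2, 1.1]  # hypothetical observations
alpha, beta = error_probabilities(len(sample))
print(choose_hypothesis(sample))  # prints: H1
print(f"Type I = {alpha:.3f}, Type II = {beta:.3f}")
# prints: Type I = 0.159, Type II = 0.159
```

Because both hypotheses are fully specified, both error probabilities can be computed before any data are collected, which is not possible when only a null hypothesis is available.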

Fisher and Neyman/Pearson clashed bitterly. Neyman and Pearson considered their formulation to be an improved generalization of significance testing. Fisher thought that it had no practical application. (The defining paper was abstract, and mathematicians have generalized and refined the theory for three generations.) All parties moved on to other matters with the conflict unresolved.

The modern version of hypothesis testing is a hybrid of the two approaches. (Signal detection, for example, still uses the Neyman/Pearson formulation.) Great conceptual differences were ignored. Neyman and Pearson provided the stronger terminology, the more rigorous mathematics and the more consistent philosophy, but the subject taught today in introductory statistics has more similarities with Fisher's method than with theirs. This history explains the inconsistent terminology (for example, the null hypothesis is never accepted, but there is a region of acceptance).

While hypothesis testing was popularized early in the 20th century, evidence of its use can be found much earlier. In the 1770s Laplace analyzed the statistics of almost half a million births, which showed an excess of boys compared to girls. By calculating a p-value, he concluded that the excess was a real, but unexplained, effect.

