Statistical Hypothesis Testing - The Testing Process

The Testing Process

In the statistical literature, statistical hypothesis testing plays a fundamental role. The usual line of reasoning is as follows:

There is an initial research hypothesis of which the truth is unknown.
The first step is to state the relevant null and alternative hypotheses. This is important, as mis-stating the hypotheses will muddy the rest of the process. Specifically, the null hypothesis should be chosen in such a way that the test allows us to conclude either that the alternative hypothesis can be accepted or that it remains undecided, as it was before the test.
The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important as invalid assumptions will mean that the results of the test are invalid.
Decide which test is appropriate, and state the relevant test statistic T.
Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example the test statistic may follow a Student's t distribution or a normal distribution.
Select a significance level (α), the largest probability of rejecting a true null hypothesis that the test will tolerate. Common values are 5% and 1%.
The distribution of the test statistic under the null hypothesis partitions the possible values of T into those for which the null hypothesis is rejected, the so-called critical region, and those for which it is not. The probability of the critical region under the null hypothesis is α.
Compute from the observations the observed value t_obs of the test statistic T.
Decide to either fail to reject the null hypothesis or reject it in favor of the alternative. The decision rule is to reject the null hypothesis H₀ if the observed value t_obs is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.
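
The steps above can be sketched in code. The following is a minimal illustration, not a general-purpose implementation: it assumes a two-sided one-sample z-test (normally distributed observations with a known standard deviation), and the function name `z_test_critical_region`, the sample values, and the parameters `mu0` and `sigma` are all hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_critical_region(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test decided via the critical region.

    Assumes independent, normally distributed observations with a
    known standard deviation sigma (an assumption of this sketch).
    """
    n = len(sample)
    # Observed value t_obs of the test statistic
    # T = (sample mean - mu0) / (sigma / sqrt(n)).
    t_obs = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Under H0, T follows a standard normal distribution. The critical
    # region is |T| > z_crit, chosen so its probability under H0 is alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(t_obs) > z_crit
    return t_obs, z_crit, reject


# Made-up illustrative data; H0: the population mean equals 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.5]
t_obs, z_crit, reject = z_test_critical_region(sample, mu0=5.0, sigma=0.5)
print(f"t_obs = {t_obs:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Note that no probability is computed for the observed value itself; the decision only compares t_obs with a tabulated critical value.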

An alternative process is commonly used:

Compute from the observations the observed value t_obs of the test statistic T.
From the statistic, calculate the p-value: the probability, under the null hypothesis, of obtaining a value of the test statistic at least as extreme as the one observed.
Reject the null hypothesis or not. The decision rule is to reject the null hypothesis if and only if the p-value is less than the selected significance level α.
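
The p-value process can be sketched in the same way. As before, this assumes a two-sided one-sample z-test with a known standard deviation; the function name `z_test_p_value` and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Return (t_obs, p_value) for a two-sided one-sample z-test.

    Assumes normally distributed observations with known sigma
    (an assumption of this sketch, not a general requirement).
    """
    n = len(sample)
    t_obs = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability under H0 that |T| is at least
    # as large as the observed |t_obs|.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_obs)))
    return t_obs, p_value


# Made-up illustrative data; H0: the population mean equals 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.5]
alpha = 0.05
t_obs, p_value = z_test_p_value(sample, mu0=5.0, sigma=0.5)
reject = p_value < alpha
print(f"t_obs = {t_obs:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Here the p-value itself can be reported alongside the decision, which is what makes this process more informative.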

The two processes are equivalent. The former process was advantageous in the past when only tables of test statistics at common probability thresholds were available. It allowed a decision to be made without the calculation of a probability. It was adequate for classwork and for operational use, but it was deficient for reporting results.
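
The equivalence can be checked numerically. The sketch below assumes a one-sided (upper-tail) test whose statistic is standard normal under the null hypothesis; the helper names `reject_by_critical_region` and `reject_by_p_value` are hypothetical. It compares the two decision rules over a grid of possible observed values.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # distribution of T under H0 (assumed for this sketch)


def reject_by_critical_region(t_obs, alpha):
    # Critical region of an upper-tail test: T > z_(1 - alpha).
    return t_obs > STD_NORMAL.inv_cdf(1 - alpha)


def reject_by_p_value(t_obs, alpha):
    # Upper-tail p-value: probability under H0 that T is at least t_obs.
    p_value = 1 - STD_NORMAL.cdf(t_obs)
    return p_value < alpha


alpha = 0.05
grid = [i / 4 for i in range(-12, 13)]  # candidate t_obs values from -3.0 to 3.0
agreement = all(
    reject_by_critical_region(t, alpha) == reject_by_p_value(t, alpha)
    for t in grid
)
print("Both rules give the same decision on the grid:", agreement)
```

The two rules agree because the p-value falls below α exactly when the observed statistic lies beyond the critical value.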

The latter process relied on extensive tables or on computational support not always available. The explicit calculation of a probability is useful for reporting. The calculations are now trivially performed with appropriate software.

The difference between the two processes can be seen by applying them to the radioactive suitcase example:

"The Geiger-counter reading is 10. The limit is 9. Check the suitcase."
"The Geiger-counter reading is high; 97% of safe suitcases have lower readings. The limit is 95%. Check the suitcase."

The former report is adequate; the latter gives a more detailed explanation of the data and of the reason why the suitcase is being checked.

It is important to note the philosophical difference between accepting the null hypothesis and simply failing to reject it. The "fail to reject" terminology highlights the fact that the null hypothesis is assumed to be true from the start of the test; if there is a lack of evidence against it, it simply continues to be assumed true. The phrase "accept the null hypothesis" may suggest it has been proved simply because it has not been disproved, a logical fallacy known as the argument from ignorance. Unless a test with particularly high power is used, the idea of "accepting" the null hypothesis may be dangerous. Nonetheless the terminology is prevalent throughout statistics, where its meaning is well understood.

Alternatively, if the testing procedure forces us to reject the null hypothesis (H₀), we can accept the alternative hypothesis (H₁) and conclude that the research hypothesis is supported by the data. Because the procedure rests on probabilistic considerations, we accept that a different set of data could lead us to a different conclusion.

The processes described here are perfectly adequate for computation, but they seriously neglect design-of-experiments considerations.

It is particularly critical that appropriate sample sizes be estimated before conducting the experiment.
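
As an illustration of such an estimate, the sketch below uses the standard normal-approximation formula for a two-sided one-sample z-test, n ≈ ((z₁₋α/₂ + z_power) · σ / δ)², where δ is the smallest difference from the null value that the experiment should detect. The function name `required_sample_size` and the example values of `sigma`, `delta`, and `power` are assumptions made for this example; other tests and designs need their own calculations.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test.

    sigma: known standard deviation of the observations (assumed).
    delta: smallest difference from the null value worth detecting.
    Uses the usual approximation that ignores the far rejection tail.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical planning values: detect a shift of half a standard deviation.
n = required_sample_size(sigma=1.0, delta=0.5)
print("Approximate sample size needed:", n)
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the estimate should be made before data are collected.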
