Statistics
In statistics and regression analysis, an independent variable that can take on only two possible values is called a dummy variable. For example, it may take on the value 0 if an observation is of a male subject or 1 if the observation is of a female subject. The two possible categories associated with the two possible values are mutually exclusive, so that no observation falls into more than one category, and the categories are exhaustive, so that every observation falls into some category. Sometimes there are three or more possible categories, which are pairwise mutually exclusive and are collectively exhaustive — for example, under 18 years of age, 18 to 64 years of age, and age 65 or above. In this case a set of dummy variables is constructed, each dummy variable having two mutually exclusive and jointly exhaustive categories — in this example, one dummy variable (called D1) would equal 1 if age is less than 18, and would equal 0 otherwise; a second dummy variable (called D2) would equal 1 if age is in the range 18-64, and 0 otherwise. In this set-up, the dummy variable pairs (D1, D2) can have the values (1,0) (under 18), (0,1) (between 18 and 64), or (0,0) (65 or older) (but not (1,1), which would nonsensically imply that an observed subject is both under 18 and between 18 and 64). Then the dummy variables can be included as independent (explanatory) variables in a regression. Note that the number of dummy variables is always one less than the number of categories: with the two categories male and female there is a single dummy variable to distinguish them, while with the three age categories two dummy variables are needed to distinguish them.
Such qualitative data can also be used for dependent variables. For example, a researcher might want to predict whether someone goes to college or not, using family income, a gender dummy variable, and so forth as explanatory variables. Here the variable to be explained is a dummy variable that equals 0 if the observed subject does not go to college and equals 1 if the subject does go to college. In such a situation, ordinary least squares (the basic regression technique) is widely seen as inadequate; instead probit regression or logistic regression is used. Further, sometimes there are three or more categories for the dependent variable — for example, no college, community college, and four-year college. In this case, the multinomial probit or multinomial logit technique is used.
Read more about this topic: Mutually Exclusive Events
Famous quotes containing the word statistics:
“We already have the statistics for the future: the growth percentages of pollution, overpopulation, desertification. The future is already in place.”
—Günther Grass (b. 1927)
“We ask for no statistics of the killed,
For nothing political impinges on
This single casualty, or all those gone,
Missing or healing, sinking or dispersed,
Hundreds of thousands counted, millions lost.”
—Karl Shapiro (b. 1913)
“O for a man who is a man, and, as my neighbor says, has a bone in his back which you cannot pass your hand through! Our statistics are at fault: the population has been returned too large. How many men are there to a square thousand miles in this country? Hardly one.”
—Henry David Thoreau (18171862)