Logistic Regression - Introduction

Introduction

Both linear and logistic regression analyses compare the observed values of the criterion with the values predicted with and without the variable(s) in question in order to determine whether the model that includes the variable(s) predicts the outcome more accurately than the model without that variable (or set of variables). Given that both analyses are guided by the same goal, why is logistic regression needed for analyses with a dichotomous criterion? Why is linear regression inappropriate to use with a dichotomous criterion? There are several reasons why it is inappropriate to conduct linear regression on a dichotomous criterion. First, it violates the assumption of linearity. The linear regression line is the expected value of the criterion given the predictor(s) and is equal to the intercept (the value of the criterion when the predictor(s) equal zero) plus the product of the regression coefficient and some given value of the predictor; the observed value adds an error term to this expected value. This implies that the expected value of the criterion can take on any value as the predictor(s) range from negative infinity to positive infinity; however, this is not the case with a dichotomous criterion. The conditional mean of a dichotomous criterion must be greater than or equal to zero and less than or equal to one; thus, the relationship is not linear but sigmoid, or S-shaped. As the predictors approach negative infinity, the criterion asymptotes at zero; as the predictors approach positive infinity, the criterion asymptotes at one. Linear regression disregards these bounds, and it becomes possible for the predicted criterion to take on probabilities less than zero or greater than one, although such values are not theoretically permissible. Furthermore, there is no straightforward interpretation of such values.
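To make the boundary problem concrete, the following is a minimal sketch (not from the original text) using simulated data and assuming NumPy and scikit-learn are available: it fits both a linear and a logistic model to a dichotomous outcome and shows that only the linear fit produces predicted values outside the admissible zero-to-one range.

    # Illustrative sketch: a linear fit to a dichotomous criterion can yield
    # predictions below 0 or above 1; a logistic fit cannot.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    x = np.linspace(-4, 4, 200).reshape(-1, 1)       # a single continuous predictor
    p = 1 / (1 + np.exp(-1.5 * x.ravel()))           # true sigmoid probability of a "case"
    y = rng.binomial(1, p)                           # observed dichotomous criterion (0/1)

    linear = LinearRegression().fit(x, y)
    logistic = LogisticRegression().fit(x, y)

    linear_pred = linear.predict(x)                  # unbounded straight-line predictions
    logistic_pred = logistic.predict_proba(x)[:, 1]  # probabilities, bounded to (0, 1)

    print("linear prediction range:  ", linear_pred.min(), linear_pred.max())
    print("logistic prediction range:", logistic_pred.min(), logistic_pred.max())
    # The linear range typically extends below 0 and above 1; the logistic range never does.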

Second, conducting linear regression with a dichotomous criterion violates the assumption that the error term is homoscedastic. Homoscedasticity is the assumption that variance in the criterion is constant at all levels of the predictor(s). This assumption will always be violated when the criterion is distributed binomially. Consider the variance formula, e = PQ, wherein P is equal to the proportion of "1's" or "cases" and Q is equal to (1 − P), the proportion of "0's" or "noncases" in the distribution. Given that there are only two possible outcomes in a binomial distribution, one can determine the proportion of "noncases" from the proportion of "cases" and vice versa. Likewise, one can also determine the variance of the distribution from either the proportion of "cases" or "noncases". That is to say, the variance is not independent of the predictor; the error term is not homoscedastic but heteroscedastic, meaning that the variance is not equal at all levels of the predictor. The variance is greatest when the proportion of cases equals .5: e = PQ = .5(1 − .5) = .5(.5) = .25. As the proportion of cases approaches either extreme, however, the variance approaches zero. For example, when the proportion of cases equals .99, there is almost no variance: e = PQ = .99(1 − .99) = .99(.01) = .0099. Therefore, error or variance in the criterion is not independent of the predictor variable(s).
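A small illustrative sketch of this point (not from the original text): computing PQ across a few proportions shows the variance peaking at P = .5 and shrinking toward the extremes, so error variance cannot be constant across levels of the predictor.

    # Bernoulli variance P*(1 - P): largest at P = .5, near zero at the extremes.
    proportions = [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]
    for p in proportions:
        variance = p * (1 - p)
        print(f"P = {p:.2f}  ->  PQ = {variance:.4f}")
    # P = 0.50 gives the maximum variance of .25; P = 0.99 gives only .0099.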

Third, conducting linear regression with a dichotomous criterion violates the assumption that error is normally distributed, because the criterion has only two values. Given that a dichotomous criterion violates these assumptions of linear regression, conducting linear regression with a dichotomous criterion may lead to errors in inference and, at the very least, interpretation of the outcome will not be straightforward.

Given the shortcomings of the linear regression model for dealing with a dichotomous criterion, it is necessary to use some other analysis. Besides logistic regression, there is at least one additional alternative analysis for dealing with a dichotomous criterion: discriminant function analysis. Like logistic regression, discriminant function analysis is a technique in which a set of predictors is used to determine group membership. There are two problems with discriminant function analysis, however. First, like linear regression, discriminant function analysis may produce probabilities greater than one or less than zero, even though such probabilities are theoretically inadmissible. Second, discriminant function analysis assumes that the predictor variables are normally distributed. Logistic regression neither produces probabilities that lie below zero or above one, nor imposes restrictive normality assumptions on the predictors.

Logistic regression is a generalized linear model, specifically a type of binomial regression. Logistic regression serves to transform the limited range of a probability, restricted to the interval (0, 1), into the full range of the real numbers, from negative infinity to positive infinity, which makes the transformed value more suitable for fitting with a linear function. The effect of this transformation is to leave the middle of the probability range (near 50%) more or less linear while stretching out the extremes (near 0% or 100%). This is because in the middle of the probability range one expects a relatively linear function; it is toward the extremes that the regression line begins to curve as it approaches its asymptotes, hence the sigmoidal distribution (see Figure 1). In essence, when conducting logistic regression, one transforms the probability of a "case" outcome into the odds of a "case" outcome and takes the natural logarithm of the odds to create the logit. The odds improves on the probability as a criterion because the odds has no fixed upper limit; however, the odds is still limited in that it has a fixed lower limit of zero, and its values tend to be neither normally distributed nor linearly related to the predictors. Hence, it is necessary to take the natural logarithm of the odds to remedy these limitations.
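The following sketch (illustrative only, assuming NumPy) applies the probability-to-odds-to-logit chain to a few probabilities and shows how the transformation leaves the middle of the range nearly linear while stretching the extremes toward negative and positive infinity.

    # Minimal sketch of the probability -> odds -> logit chain (illustrative only).
    import numpy as np

    def to_logit(p):
        """Convert a probability to odds, then take the natural log to get the logit."""
        odds = p / (1 - p)
        return np.log(odds)

    for p in [0.01, 0.10, 0.50, 0.90, 0.99]:
        odds = p / (1 - p)
        print(f"p = {p:.2f}  odds = {odds:8.3f}  logit = {to_logit(p):7.3f}")
    # Near p = .5 the logit changes roughly linearly with p; near 0 and 1 it
    # runs off toward negative and positive infinity, stretching the extremes.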

The natural logarithm of some value Y (the criterion) is the power to which the base, e, must be raised to produce Y. Euler's number, e, is a mathematical constant equal to about 2.71828. An excellent example of this relationship is when Y = 2.71828, or e itself. When Y = 2.71828, ln(Y) = ln(2.71828) = 1, because Y equals e in this instance, so e must only be raised to the power of 1 to equal itself. Given that the logit is not generally interpreted directly, and that the inverse of the natural logarithm, the exponential function of the logit (denoted exp), is generally interpreted instead, it is also helpful to examine this function. To illustrate the relationship between the exponential function and the natural logarithm, consider the exponentiation of the result obtained above. There it was evident that the natural logarithm of 2.71828 was equal to 1. Here, if one exponentiates 1, the result is 2.71828; thus, the exponential function is the inverse of the natural logarithm. The logit can be thought of as a latent continuous variable that is fit to the predictors, analogous to the manner in which a continuous criterion is fit to the predictors in linear regression analysis. After the criterion (the logit) is fit to the predictors, the result is exponentiated, converting the unintuitive logit back into the easily interpretable odds. It is important to note that the probability, the odds, and the logit all provide the same information. A probability of .5 is equal to odds of 1 and a logit of 0; all three values indicate that "case" and "noncase" outcomes are equally likely.
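A brief numeric check of these relationships (illustrative only, assuming NumPy): exponentiation undoes the natural logarithm, and a probability of .5 corresponds to odds of 1 and a logit of 0.

    # Numeric check of the ln/exp relationship and the .5 / 1 / 0 equivalence.
    import numpy as np

    print(np.log(np.e))            # 1.0      -> e raised to the power 1 equals e
    print(np.exp(1))               # 2.718... -> exponentiating 1 recovers e; exp inverts ln

    p = 0.5
    odds = p / (1 - p)             # 1.0 -> "case" and "noncase" equally likely
    logit = np.log(odds)           # 0.0 -> the same information on the logit scale
    back_to_odds = np.exp(logit)   # 1.0 -> exponentiating the logit returns the odds
    print(odds, logit, back_to_odds)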

It is also important to note that, although the observed outcomes of the response variables are categorical — simple "yes" or "no" outcomes — logistic regression actually models a continuous variable (the probability of "yes"). This probability is a latent variable that is assumed to generate the observed yes/no outcomes. At its heart, this is conceptually similar to ordinary linear regression, which predicts the unobserved expected value of the outcome (e.g. the average income, height, etc.), which in turn generates the observed value of the outcome (which is likely to be somewhere near the average, but may differ by an "error" term). The difference is that for a simple normally distributed continuous variable, the average (expected) value and observed value are measured with the same units. Thus it is convenient to conceive of the observed value as simply the expected value plus some error term, and often to blur the difference between the two. For logistic regression, however, the expected value and observed value are different types of values (continuous vs. discrete), and visualizing the observed value as expected value plus error does not work. As a result, the distinction between expected and observed value must always be kept in mind.
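To make the latent-versus-observed distinction concrete, the sketch below (illustrative only, with made-up values and assuming NumPy) generates observed outcomes in the two ways described above: a continuous outcome as an expected value plus normal error, and a dichotomous outcome drawn from a latent probability.

    # Illustrative sketch: expected (latent) values versus observed values.
    import numpy as np

    rng = np.random.default_rng(1)

    # Ordinary linear regression case: the observed value is the expected value
    # plus a normally distributed error term, measured in the same units.
    expected_income = 50_000.0
    observed_income = expected_income + rng.normal(0, 5_000)

    # Logistic regression case: the latent probability generates a discrete
    # yes/no outcome; "expected value plus error" has no analogous meaning here.
    expected_probability = 0.7
    observed_outcome = rng.binomial(1, expected_probability)   # 0 or 1

    print(observed_income)    # a continuous value near 50,000
    print(observed_outcome)   # a discrete 0 or 1, not 0.7 plus some error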
