Discriminant Function Analysis (DFA) and Logistic Regression (LR) are so similar, yet so different. Which one should you use when, or will either do at any time? Let's see.
DISCRIMINANT FUNCTION ANALYSIS (DFA):
DFA is used to model the value (exclusive group membership) of either a dichotomous or a nominal dependent variable (outcome) based on its relationship with one or more continuous independent variables (predictors). A predictive model consisting of one or more discriminant functions (linear combinations of the predictor variables that provide the best discrimination between the groups) is generated from a sample with known membership on the dependent variable, and is then used to predict the membership of cases whose membership is unknown. So we train our model on current data to predict future data!
A classic example of DFA is for a bank’s loan department to develop a credit risk profile for current customers to predict the outcome (credit risk or not) for new loan applications. Other examples would be to predict whether new customers will buy or not, be brand loyal or not, if a sales strategy will have a low, medium, or high success rate, or into which segment new customers will fall. Furthermore, it indicates which of the predictors are the most differentiating (highest discriminant weights), in other words, which dimensions distinguish best among these consumer segments and why respondents fall into one group versus another group. In summary, it is a technique for classification, differentiation, and profiling.
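To make the credit risk example concrete, here is a minimal sketch of a two-group discriminant function with two predictors. It assumes equal prior probabilities and a pooled within-group covariance matrix; the applicant data and every function name are invented for illustration, and a real analysis would use a statistics package rather than hand-rolled code.

```python
def mean_vector(rows):
    """Mean of each of the two predictors within one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(group0, group1, m0, m1):
    """Pooled within-group covariance matrix (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group0, m0), (group1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]  # deviation from the group's own mean
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group0) + len(group1) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fit_discriminant(group0, group1):
    """Return (weights, cutoff) of the linear discriminant function."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s = pooled_covariance(group0, group1, m0, m1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant weights: inverse pooled covariance times the mean difference
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the cutoff is the score halfway between the group means
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff

def classify(weights, cutoff, case):
    """Assign a case to group 1 if its discriminant score exceeds the cutoff."""
    score = weights[0] * case[0] + weights[1] * case[1]
    return 1 if score > cutoff else 0

# Invented training sample: (income in thousands, debt-to-income ratio)
good_risk = [(60, 0.20), (75, 0.25), (80, 0.15), (65, 0.30)]  # group 0
bad_risk = [(30, 0.55), (40, 0.60), (35, 0.45), (45, 0.50)]   # group 1

weights, cutoff = fit_discriminant(good_risk, bad_risk)
new_applicant = (35, 0.55)
predicted_group = classify(weights, cutoff, new_applicant)
```

The raw weights depend on the units of each predictor, so standardize the predictors (or use standardized weights) before comparing which one discriminates best.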
Sounds good, but DFA has some serious data assumptions, and if our data fail to meet them we can (and should) turn to logistic regression, which is easier and has less strict assumptions (though in some cases LR may not be as robust as DFA).
LOGISTIC REGRESSION (LR):
While logistic regression is very similar to discriminant function analysis, the primary question addressed by LR is “How likely is the case to belong to each group (DV)?” In contrast, the primary question addressed by DFA is “Which group (DV) is the case most likely to belong to?” So, LR estimates the probability that each case belongs to each of two or more groups (on the dependent variable), and how that probability changes when a predictor changes. Rather than estimating the value of the outcome (as in ordinary least squares regression [OLS]), logistic regression estimates the probability of either a binary (e.g. success or failure, buy or not buy) or a multinomial outcome (e.g. into group 1 or 2 or 3). As the focus is on probability, the model expresses the log of the odds of a case being in one group rather than another as a linear combination of the predictor variables. An odds ratio is estimated for each of the predictor variables in the model.
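The link between log odds, probabilities, and odds ratios can be sketched in a few lines for the binary case. The intercept and coefficients below are hypothetical values chosen for illustration, not estimates from any real data, and the function names are my own.

```python
import math

def log_odds(intercept, coefficients, predictors):
    """Log odds of group membership: a linear combination of the predictors."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

def probability(intercept, coefficients, predictors):
    """Convert the log odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coefficients, predictors)))

def odds_ratios(coefficients):
    """Odds ratio per predictor: the factor by which the odds change
    when that predictor increases by one unit, the others held constant."""
    return [math.exp(b) for b in coefficients]

# Hypothetical fitted model with two predictors
intercept = -1.0
coefficients = [0.7, -0.4]

case = [2.0, 1.0]
p = probability(intercept, coefficients, case)
ratios = odds_ratios(coefficients)
```

An odds ratio above 1 means the predictor raises the odds of membership, and one below 1 means it lowers them.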
While either technique (DFA or LR) is applicable in many instances, it is important to understand the key differences between them:
- Both techniques require a categorical dependent variable, but LR is preferred when the dependent variable is dichotomous, while DFA is preferred when it is nominal with more than two groups.
- LR accepts continuous as well as categorical predictor variables, while DFA accepts only continuous predictors (or categorical predictors recoded as dummy variables). Avoid dichotomous (dummy) predictors in DFA unless the dependent variable groups are of equal size.
- If all variables are continuous, use linear regression, but we'll then predict values rather than group membership unless we discretize the DV. If all variables are categorical, use log-linear analysis (which is an extension of the chi-square test).
- In LR we are more interested in how well the independent variables predict the outcome than in the outcome itself, which is more important in DFA.
- LR is more appropriate when the researcher is interested in the underlying structure of the prediction (“what are the most important predictors?” or “what role do the different variables play in the prediction?”) rather than in the specific prediction of which group people belong to, which is the emphasis of DFA. So in LR the emphasis is on the predictors, while in DFA the emphasis is on the group prediction itself.
- LR is preferred over DFA when the stricter DFA assumptions are not met (LR requires fewer assumptions).
- DFA requires multivariate normality of the predictors, while LR makes no normality assumption about them and so is robust against deviations from normality.
- LR is applicable to a broader range of research questions than DFA.
- In most statistical software, LR generates dummy variables for categorical predictors automatically, while in DFA they need to be created by the researcher.
- Use LR if group membership is a truly categorical variable, rather than a continuous variable that has been split into a dichotomous one (discretization).
- LR generally requires a larger sample size than DFA.
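For the dummy-variable points in the list above, this is a minimal sketch of what coding a categorical predictor by hand looks like before it goes into a DFA. The `dummy_code` helper and the `region` data are hypothetical, made up for this illustration.

```python
def dummy_code(values, reference):
    """Turn one categorical predictor with k levels into k - 1 columns of 0/1,
    leaving out the reference level to avoid redundancy."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical categorical predictor with three levels
region = ["north", "south", "west", "south", "north"]
dummies = dummy_code(region, reference="north")
```

Each case in the reference level ("north" here) is the one with zeros in every dummy column.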
I generally prefer LR where possible as it is more closely related to ordinary regression and easier to interpret and explain to clients.