The first step is to review whether the data violate the assumptions of linear regression. Some violations lead to obvious problems with the results, while others cause subtler ones, such as bias. Conversely, if the data meet these assumptions, then the regression coefficients (the parameters of the regression equation) are most likely unbiased.
- Both the outcome and the predictor variables should be measured on at least an interval scale; binary predictor variables may instead be dummy coded.
- Unbounded data: if a rating scale (e.g. 10-point) is used, check that the variability of the outcome is not constrained (e.g. responses that vary only between 1 and 5, which is indicative of a poor scale choice). An alternative to bounded rating scales is an unbounded scale, which allows respondents to express their feelings without an imposed range.
- The residuals should be normally distributed and uncorrelated with the predictor or outcome variables.
- Independent observations
- Predictors must not have zero variance.
- Predictors should be uncorrelated with “external variables”
- A linear bivariate relationship should exist between the outcome and each predictor variable.
- Homoscedasticity: the variance of the residuals should be constant across levels of the predictors (no heteroscedasticity).
- Independence of error terms
- No multicollinearity (for multiple regression)
- Misspecification of the regression model: while some misspecifications can be detected by checking the data assumptions (such as whether a linear relationship exists), others are harder to detect. Make sure the model is neither over- nor under-specified. An over-specified model includes redundant predictors (often signalled by multicollinearity); an under-specified model omits important predictors (beware of “omitted-variable bias”).
- Check whether extraneous variables affect the outcome variable and should be controlled for as covariates or as “blocking factors”.
- Are there any interaction (moderation) effects among the predictors, or mediation pathways, that have not been accounted for?
- Check for more obvious problems, such as the handling of missing data and the input specifications in your statistics program.
The above list is by no means comprehensive, but it covers what I believe are among the most important reasons why the regression horse sometimes does not look all too well.