Data Assumption: Multicollinearity

Very brief description

Multicollinearity is a condition in which the independent variables are highly correlated (r = 0.8 or greater) such that their individual effects on the outcome variable cannot be separated. In other words, one of the predictor variables can be predicted almost perfectly from one or more of the other predictor variables.

Singularity occurs when the independent variables are (almost) perfectly correlated (r = 1), so that any one of them can be expressed as a combination of one or more of the others. In practice, you should not include independent variables with an inter-correlation of .70 or greater.
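The .70 screening rule above can be applied directly to a bivariate correlation matrix. A minimal sketch in Python with numpy, using made-up data in which x2 is deliberately constructed to be highly correlated with x1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # deliberately collinear with x1
x3 = rng.normal(size=n)                    # unrelated to the others
X = np.column_stack([x1, x2, x3])

# Bivariate correlation matrix of the predictors
r = np.corrcoef(X, rowvar=False)

# Flag predictor pairs whose inter-correlation meets the .70 rule of thumb
names = ["x1", "x2", "x3"]
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(r[i, j]) >= 0.70:
            print(f"{names[i]} and {names[j]}: r = {r[i, j]:.2f}")
```

Here only the x1/x2 pair is flagged; in real data you would inspect every flagged pair before deciding which predictor to drop or combine.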
 
Who cares
Collinearity is an issue whenever we have more than one scaled independent variable, as in ANCOVA, MANCOVA, Canonical Correlation, Multiple Regression, Multiple Discriminant Function Analysis, Logistic Regression, Structural Equation Modelling, Factor Analysis, and Cluster Analysis.

Why it is important
Under multicollinearity, assessments of the relative strength of the predictor variables (and their interaction effects) are unreliable. At the extreme, singularity leads to infinitely large standard errors, wide confidence intervals, indeterminate coefficients, and diminished predictive power, because redundant predictors make no contribution to explaining variance (inflating Type II error rates).
 
How to test
Several mechanisms exist to check for multicollinearity. A full explanation of each would be too lengthy, so I will simply list the methods:
  1. Bivariate correlation matrix
  2. Tolerance Statistic (TOL) 
  3. Variance Inflation Factor (VIF)
  4. Eigenvalues
  5. Condition Indices
  6. Variance Proportions
  7. Zero-Order vs Partial and Part Correlations
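Methods 2 and 3 are closely related: the tolerance for predictor j is 1 minus the R-squared from regressing j on the other predictors, and VIF is its reciprocal. A convenient shortcut (a standard result, not specific to any one package) is that each VIF equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)   # collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# VIF for each predictor is the corresponding diagonal element of the
# inverse of the predictors' correlation matrix; TOL is its reciprocal.
r = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(r))
tol = 1.0 / vif

for name, v, t in zip(["x1", "x2", "x3"], vif, tol):
    print(f"{name}: VIF = {v:.2f}, TOL = {t:.3f}")
```

A common rule of thumb flags VIF above 10 (equivalently, tolerance below .10) as a sign of serious multicollinearity; here x1 and x2 exceed that threshold while x3 does not.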

How to fix the problem
  1. Omit some of the redundant predictors but guard against the creation of specification errors (i.e. don’t omit important predictors of practical value in strategy formulation).
  2. Reduce the redundancy by combining highly correlated predictors into a newly created variable (factors). For those that can’t be combined, just eliminate them, but again, guard against the creation of specification errors.
  3. Conduct a principal components analysis (PCA) on the data and output the orthogonal component scores, for which there will be no multicollinearity. Use these scores in your analysis.
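Fix 3 can be sketched with numpy alone: centre the predictors, take component scores from the singular value decomposition, and confirm that the scores are uncorrelated. The data here are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)   # nearly redundant with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Centre the predictors, then obtain principal component scores via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T   # component scores, one column per component

# The scores are orthogonal, so their correlation matrix is the identity.
r_scores = np.corrcoef(scores, rowvar=False)
print(np.round(r_scores, 6))
```

Using these orthogonal scores as predictors removes the multicollinearity entirely, at the cost of components that can be harder to interpret than the original variables.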
_____________________________________________
/zza41