Data Assumption: Multicollinearity


BRIEF DESCRIPTION:

Multicollinearity is a condition in which the independent variables are highly correlated (r = .8 or greater) such that their separate effects on the outcome variable cannot be disentangled. In other words, one of the predictor variables can be nearly perfectly predicted from a combination of the other predictor variables.

Singularity occurs when the independent variables are perfectly correlated (r = 1), so that any one of the independent variables can be expressed as an exact combination of one or more of the others. In practice, you should not include independent variables with an inter-correlation of .70 or greater.
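To make the distinction concrete, here is a minimal sketch in Python (NumPy assumed; all variable names are hypothetical). One predictor is merely highly correlated with another (multicollinearity), while a third is an exact linear combination of the first two (singularity), leaving the design matrix rank-deficient:

import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)  # highly correlated with x1
x3 = 2.0 * x1 - 1.0 * x2                    # exact linear combination: singularity

print(np.corrcoef(x1, x2)[0, 1])            # roughly .99 -- multicollinearity

X = np.column_stack([np.ones(200), x1, x2, x3])
print(np.linalg.matrix_rank(X))             # 3, not 4 -- X'X is singular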
 
Who cares
Collinearity is an issue whenever we have more than one scaled independent variable, as in ANCOVA, MANCOVA, Canonical Correlation, Multiple Regression, Multiple Discriminant Function Analysis, Logistic Regression, Structural Equation Modelling, Factor Analysis, and Cluster Analysis.
Why it is important
Under multicollinearity, assessments of the relative strength of the predictor variables (and their interaction effects) are unreliable. At the extreme, singularity leads to infinitely large standard errors, wide confidence intervals, indeterminate coefficients, and diminished predictive power, because redundant predictors make no unique contribution to explaining variance (inflated Type II error rates).
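The simulation below (a hedged sketch; statsmodels assumed, all names hypothetical) illustrates the standard-error inflation: the same regression is fitted three times while the two predictors are made progressively more collinear, and the coefficient standard errors balloon as r approaches 1.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
for noise in (1.0, 0.1, 0.01):              # less noise -> more collinear
    x1 = rng.normal(size=n)
    x2 = x1 + noise * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(f"r = {np.corrcoef(x1, x2)[0, 1]:.3f}  "
          f"SE(b1) = {fit.bse[1]:.2f}  SE(b2) = {fit.bse[2]:.2f}")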
 
How to test
Several mechanisms exist to check for multicollinearity. An explanation of each would be too long, so I will only list the methods here (a diagnostic sketch follows the list):
  1. Bivariate correlation matrix
  2. Tolerance Statistic (TOL) 
  3. Variance Inflation Factor (VIF)
  4. Eigenvalues
  5. Condition Indices
  6. Variance Proportions
  7. Zero-Order vs Partial and Part Correlations
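As a sketch of how diagnostics 1 to 5 might be computed (pandas, NumPy, and statsmodels assumed; the data and names are hypothetical, and the cutoffs in the comments are common rules of thumb, not hard laws):

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
df = pd.DataFrame({"x1": x1,
                   "x2": x1 + 0.2 * rng.normal(size=100),
                   "x3": rng.normal(size=100)})

# 1. Bivariate correlation matrix
print(df.corr().round(2))

# 2-3. Tolerance and VIF (TOL = 1 / VIF); VIF > 10 (TOL < .10) is a common red flag
X = np.column_stack([np.ones(len(df)), df.to_numpy()])
for i, name in enumerate(df.columns, start=1):
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.2f}  TOL = {1 / vif:.3f}")

# 4-5. Eigenvalues and condition indices (simplified: unit-length columns,
# no intercept term); condition indices above 30 suggest trouble
Xs = df.to_numpy() / np.linalg.norm(df.to_numpy(), axis=0)
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
print(np.sqrt(eigvals.max() / eigvals))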

How to fix the problem
  1. Omit some of the redundant predictors but guard against the creation of specification errors (i.e. don’t omit important predictors of practical value in strategy formulation).
  2. Reduce the redundancy by combining highly correlated predictors into a newly created variable (factors). For those that can’t be combined, just eliminate them, but again, guard against the creation of specification errors.
  3. Conduct a principal components analysis (PCA) on the data and output the orthogonal component scores, for which there will be no multicollinearity. Use these scores in your analysis (see the sketch below).
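A minimal sketch of option 3 (scikit-learn and statsmodels assumed; the data and names are hypothetical): standardize the predictors, extract orthogonal principal component scores, and regress on those scores, whose pairwise correlations are zero by construction.

import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
x1 = rng.normal(size=150)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=150), rng.normal(size=150)])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=150)

# Orthogonal component scores: the correlation matrix is the identity
scores = PCA().fit_transform(StandardScaler().fit_transform(X))
print(np.corrcoef(scores, rowvar=False).round(6))

# Regress the outcome on the component scores instead of the raw predictors
fit = sm.OLS(y, sm.add_constant(scores)).fit()
print(fit.params.round(2))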