Data Assumption: Homoscedasticity (Bivariate Tests)


Very brief description:

Homoscedasticity is the bivariate counterpart of the univariate assumption of homogeneity of variance and of the multivariate assumption of homogeneity of variance-covariance matrices.

Refer to the post “Homogeneity of variance” for a discussion of equality of variances. In short, homoscedasticity means that the metric dependent variable(s) have equal levels of variability across the range of either continuous or categorical independent variables. More specifically, in bivariate analysis such as regression, homoscedasticity means that the variance of the errors (model residuals) is the same across all levels of the predictor variable.
Who cares
Pearson product-moment correlation and regression. In regression, homoscedasticity refers to constant variance of the error terms, so the residuals at each level of the predictors should have the same variance.

Why it is important
Refer to the post “Homogeneity of variance”.
How to Test
  1. In correlation, a scatterplot can clearly show if the variance throughout the plot is about the same.
  2. In regression, we need to focus on the error variance of our model. A scatterplot of the standardized predicted values of the dependent variable against the standardized residuals (or another type of residual, such as studentized, deleted, or studentized deleted residuals) indicates whether the errors look approximately normally distributed and whether the variance of the residuals is constant (and so whether the residuals are relatively uncorrelated with the linear combination of our predictors). The plot should show a random scatter. A clear pattern points to one or more problems: the residuals are not normally distributed (violation of the assumption of normality), the variances of the residuals are not constant (violation of the assumption of homoscedasticity), and/or the residuals are correlated with the predictors (which is a problem in regression!). A clear funnel shape means the data are not homoscedastic, so the assumption has been violated. An obvious trend line suggests that the assumption of linearity has likely been violated. This plot is also great for spotting extreme outliers!
  3. Some also suggest White’s test to check whether the residual variance in our regression model is constant.
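To make step 3 concrete, here is a minimal sketch of White's test for a regression with a single predictor, written with the Python standard library only. It regresses the squared residuals on x and x squared and uses LM = n * R^2, which is approximately chi-square with 2 degrees of freedom when the error variance is constant. The function names (ols_simple, white_test_simple, simulate) and the simulated data are hypothetical and only for illustration; in practice a library routine such as het_white in statsmodels would be the more usual choice.

```python
import math
import random


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def white_test_simple(x, y):
    """White's test for a one-predictor regression.

    Regresses the squared residuals on x and x^2 and returns
    (LM statistic, p-value), where LM = n * R^2 is compared with a
    chi-square distribution with 2 degrees of freedom.
    """
    n = len(x)
    _, _, residuals = ols_simple(x, y)
    e2 = [e * e for e in residuals]

    # Centre all terms so the auxiliary regression becomes a 2x2 system.
    # Centring x does not change R^2 because the intercept absorbs the shift.
    mean_x = sum(x) / n
    u = [xi - mean_x for xi in x]
    u_sq = [ui * ui for ui in u]
    mean_u_sq = sum(u_sq) / n
    v = [s - mean_u_sq for s in u_sq]
    mean_e2 = sum(e2) / n
    w = [s - mean_e2 for s in e2]

    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suw = sum(ui * wi for ui, wi in zip(u, w))
    svw = sum(vi * wi for vi, wi in zip(v, w))
    sww = sum(wi * wi for wi in w)

    # Solve the normal equations with Cramer's rule.
    det = suu * svv - suv * suv
    b1 = (suw * svv - svw * suv) / det
    b2 = (svw * suu - suw * suv) / det
    r_squared = max(0.0, (b1 * suw + b2 * svw) / sww)

    lm = n * r_squared
    p_value = math.exp(-lm / 2.0)  # chi-square survival function for df = 2
    return lm, p_value


def simulate(n, heteroscedastic, seed):
    """Hypothetical data: y = 2 + 3x + noise, with optional funnel-shaped noise."""
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    y = []
    for xi in x:
        sd = 0.5 * xi if heteroscedastic else 1.0
        y.append(2.0 + 3.0 * xi + rng.gauss(0.0, sd))
    return x, y


for label, hetero in (("constant variance", False), ("funnel-shaped", True)):
    xs, ys = simulate(500, hetero, seed=1)
    lm_stat, p = white_test_simple(xs, ys)
    print(f"{label}: LM = {lm_stat:.2f}, p = {p:.4g}")
```

A small p-value is evidence against constant residual variance. The closed-form p-value used here only holds for 2 degrees of freedom, so this sketch does not extend as written to models with several predictors.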
How to fix the problem
Remedies are similar to those in “Homogeneity of variance”, although transformations that fix non-normality of the variables may not necessarily remedy non-normality of the residuals. “Weight Estimation” (which uses Weighted Least Squares, WLS) with a WLS weighting variable could be a solution. This procedure weights data points by the reciprocal of their variances, so that observations with large variances have less impact than observations with small variances.
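The weighting idea can be sketched in a few lines of standard-library Python. This is an illustration for a single predictor, not the “Weight Estimation” procedure itself: the function name wls_simple is hypothetical, and the example assumes the error standard deviation is known to be proportional to x, so each weight is the reciprocal of the assumed variance.

```python
import random


def wls_simple(x, y, weights):
    """Fit y = a + b*x by weighted least squares; return (a, b)."""
    sw = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / sw
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data with funnel-shaped noise: sd grows in proportion to x.
rng = random.Random(1)
x = [rng.uniform(1.0, 10.0) for _ in range(200)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

# Assumption: the variance of each error is (0.5 * x)^2, so weight = 1 / variance.
weights = [1.0 / (0.5 * xi) ** 2 for xi in x]

ols_a, ols_b = wls_simple(x, y, [1.0] * len(x))  # equal weights reduce to OLS
wls_a, wls_b = wls_simple(x, y, weights)
print(f"OLS: intercept = {ols_a:.3f}, slope = {ols_b:.3f}")
print(f"WLS: intercept = {wls_a:.3f}, slope = {wls_b:.3f}")
```

With real data the variances are rarely known in advance, so the weights have to be estimated or derived from a variable believed to drive the spread.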