# Data Assumption: Linearity

**Very brief description:**

Linearity means that the mean values of the outcome (dependent) variable, taken at each increment of the predictor (independent) variables, lie along a straight line. In other words, we are modeling a straight-line relationship.

**Who cares**

The assumption of linearity is required by all multivariate techniques based on correlational measures of association, e.g. Regression, Logistic Regression, Factor Analysis, Structural Equation Modeling, Discriminant Analysis, General Linear Models, etc.

**Why it is important**

Most, if not all, of the tests of association that we commonly use in marketing research are based on the strict assumption of a *linear* relationship between two or more variables. Pearson’s *r* captures only linear relationships, so it understates the strength of any relationship that is non-linear.

Should relationships deviate significantly from linearity, the non-linear effects will not be represented in the correlation coefficients, and the actual strength of the relationships will be underestimated. A violation of linearity therefore does not necessarily invalidate the results, but it weakens them: the linear regression coefficient captures the linear portion of the relationship (if any) and ignores the curvilinear portion. Where relationships are non-linear, other methods are more applicable, such as Nonlinear Regression, Polynomial Models, Exponential Models, etc.
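To make the underestimation concrete, here is a minimal sketch using made-up data in which the outcome is a perfect U-shaped function of the predictor. The helper `pearson_r` and the data are illustrative assumptions, not part of any particular statistics package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y depends perfectly on x, but along a U-shaped curve.
x = [float(v) for v in range(-10, 11)]
y = [v ** 2 for v in x]

r_linear = pearson_r(x, y)                    # straight-line association only
r_curved = pearson_r([v ** 2 for v in x], y)  # association with the squared term

print(f"r(x, y)   = {r_linear:.3f}")
print(f"r(x^2, y) = {r_curved:.3f}")
```

The correlation between `x` and `y` comes out at essentially zero even though `y` is fully determined by `x`, while the correlation with the squared term is essentially one. Real data will rarely be this extreme, but the direction of the error is the same.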

**How to Test**

- Graphical methods: It’s easy to spot linearity (or a lack of it) when inspecting bivariate scatterplots of variables. Note that variables that are both normally distributed and linearly related to each other will produce scatterplots that are oval-shaped or elliptical. However, a better graphical check of linearity is to plot the residuals of your model. Plot the standardized residuals against the standardized predicted values of the dependent variable; the plot should show a random pattern when linearity is present. You can also plot studentized residuals against standardized predicted values, which should likewise show a random pattern.
- A general heuristic in Regression is that when the standard deviation of the residuals exceeds the standard deviation of the dependent variable, it indicates non-linearity.
- Other methods to detect non-linearity include curve fitting (curve estimation) with R-squared difference tests, the Eta coefficient of nonlinear association, ANOVA tests, and Ramsey’s RESET test (a regression specification error test).
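The residual check from the first bullet can be sketched numerically as well as graphically. The example below is a simplified stand-in for a residual plot, not a formal test: it fits a straight line by ordinary least squares and then averages the raw (unstandardized) residuals within the lower, middle and upper third of the predicted values. The function names and both data sets are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals_by_thirds(xs, ys):
    """Mean residual in the lower, middle and upper third of predicted values."""
    intercept, slope = fit_line(xs, ys)
    # Pair each predicted value with its residual, then order by prediction.
    pairs = sorted(
        (intercept + slope * x, y - (intercept + slope * x))
        for x, y in zip(xs, ys)
    )
    k = len(pairs) // 3
    groups = [pairs[:k], pairs[k:-k], pairs[-k:]]
    return [statistics.fmean([res for _, res in group]) for group in groups]

# Hypothetical data: one straight relationship with small alternating
# disturbances, and one curved (quadratic) relationship.
x = [float(v) for v in range(1, 31)]
y_straight = [2.0 + 3.0 * v + 0.5 * (-1) ** i for i, v in enumerate(x)]
y_curved = [2.0 + 0.5 * v ** 2 for v in x]

straight_means = residuals_by_thirds(x, y_straight)
curved_means = residuals_by_thirds(x, y_curved)

print("straight:", [round(m, 2) for m in straight_means])
print("curved:  ", [round(m, 2) for m in curved_means])
```

For the straight relationship the three means stay close to zero, which is the numeric counterpart of a random-looking residual plot. For the curved relationship the residuals are positive at both ends and negative in the middle, the systematic pattern that signals non-linearity.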

**How to fix the problem**

Nonlinearity is generally not a serious violation when the standard deviation of the residuals is less than the standard deviation of the dependent variable.

Serious violations can be remedied by transformations, but where the outcome is categorical the best option would be logistic regression, which only requires linearity of the logit. The linearity-of-the-logit assumption can be tested with the Box-Tidwell procedure: if any of the interaction terms is significant, the corresponding main effect has violated the assumption. Another possible, but debatable, remedy is to introduce dummy variables, which could increase linearity.
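As a rough illustration of the transformation remedy, the sketch below uses made-up data that grow exponentially and compares Pearson’s *r* before and after taking the natural log of the outcome. The helper `pearson_r` and the data are assumptions for illustration only; which transformation suits a real data set depends on the shape of its relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: exponential growth, y = 2 * exp(0.3 * x).
x = [float(v) for v in range(1, 21)]
y = [2.0 * math.exp(0.3 * v) for v in x]

r_raw = pearson_r(x, y)                         # curved relationship
r_log = pearson_r(x, [math.log(v) for v in y])  # log(y) is linear in x

print(f"r before transformation: {r_raw:.3f}")
print(f"r after log transform:   {r_log:.3f}")
```

Because the log of an exponential curve is a straight line, the correlation after the transformation is essentially perfect, whereas the untransformed correlation is noticeably lower.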

______________________________________________

/zza47
