Data Assumptions: It’s about the residuals, not the variables’ raw data

Normality, or having a normal distribution, is a very familiar term, but what does it really mean, and what does it refer to?
In linear models such as ANOVA and regression (or any regression-based statistical procedure), an important assumption is “normality”. A common question is whether it refers to the outcome (the dependent variable, “Y”) or to the predictor (the independent variable, “X”). The true answer is “none of the above”.
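Written out for a simple regression with one predictor, the standard textbook form of the model shows where the assumption sits. Normality is attached to the error term ε, which is the same as saying that Y is normal at each value of X, not that Y or X is normal overall. The residuals are our estimates of the unobserved errors, which is why the normality checks are run on them.

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
\quad\Longleftrightarrow\quad
Y_i \mid X_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma^2)

e_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)
```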
In linear models, where we look at the relationship between the dependent and independent variables, our focus is on Y given the value of X (written Y|X). What matters here is the variation in Y that X does not explain: the “error term” (ε), which we estimate from the data as the “residuals”. So, when we check the assumption of normality, we only need to consider the distribution of the residuals (either the standardised or the studentized residuals). An easy way to check this is a normal probability plot or a Q-Q plot of these residuals: if the residuals are normal, the points fall close to a straight line.
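To see this in practice, here is a minimal Python sketch, assuming NumPy and SciPy are installed; the data are simulated, not from any real study. The predictor is strongly skewed, so Y is skewed too, but the errors are normal by construction. Checking Y on its own suggests a problem; checking the studentized residuals does not. `stats.probplot` returns the coordinates of a normal Q-Q plot together with the correlation r of the points, so the check works without drawing anything. The helper `ols_residuals` is our own illustrative function, not a library call.

```python
import numpy as np
from scipy import stats

def ols_residuals(X, y):
    """OLS fit of y on X (X already has an intercept column).
    Returns raw residuals and internally studentized residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # diagonal of the hat matrix
    s = np.sqrt(resid @ resid / (n - p))                 # residual standard error
    return resid, resid / (s * np.sqrt(1 - leverage))

rng = np.random.default_rng(42)
n = 1000
x = rng.exponential(scale=2.0, size=n)          # strongly right-skewed predictor
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)     # errors are normal by construction
X = np.column_stack([np.ones(n), x])

resid, stud = ols_residuals(X, y)

# probplot returns Q-Q plot coordinates plus (slope, intercept, r) of the
# straight line through them; r close to 1 means the points lie on a line.
_, (_, _, r_y) = stats.probplot(y, dist="norm")
_, (_, _, r_resid) = stats.probplot(stud, dist="norm")

print(f"skewness:        y = {stats.skew(y):.2f}, residuals = {stats.skew(stud):.2f}")
print(f"Q-Q r:           y = {r_y:.3f}, residuals = {r_resid:.3f}")
print(f"Shapiro-Wilk p:  y = {stats.shapiro(y).pvalue:.3g}, "
      f"residuals = {stats.shapiro(stud).pvalue:.3g}")
```

To draw the actual Q-Q plot, matplotlib's `pyplot` module can be passed as `plot=` to `stats.probplot`.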
So, to meet the assumption of normality, only the residuals need to be normally distributed. We don’t need to care about the univariate normality of either the dependent or the independent variables. Note that while a lack of normality in the residuals is often caused by a non-normal dependent variable, the residuals can fail the assumption even when the dependent variable is normally distributed. In that case, the non-normality of the residuals is likely caused by a violation of the assumption of linearity, or by a few large univariate outliers. Check for both univariate outliers (e.g. z-scores) and multivariate outliers (e.g. Mahalanobis distance), and also look at influence measures (e.g. standardized DfBeta (SDfBeta) or the covariance ratio).
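The outlier and influence checks can also be computed directly. The sketch below assumes NumPy and SciPy, uses simulated data with one case deliberately planted as an outlier, and applies common rules of thumb: |z| > 3.29, Mahalanobis D² beyond the χ² critical value at p < .001, |studentized deleted residual| > 3, |DFBETAS| > 2/√n and |COVRATIO − 1| > 3k/n, where k is the number of coefficients including the intercept. These cut-offs are conventions rather than hard rules, and `influence_diagnostics` and `mahalanobis_sq` are hypothetical helper names of our own. The deletion statistics use the standard closed-form leave-one-out formulas, so the model is not refitted n times.

```python
import numpy as np
from scipy import stats

def influence_diagnostics(X, y):
    """OLS influence measures; X must include an intercept column.
    Returns leverage, studentized deleted residuals, DFBETAS (n x p), COVRATIO."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (xtx_inv @ X.T @ y)                  # raw residuals
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)      # leverage (hat values)
    s2 = e @ e / (n - p)
    # Residual variance with case i deleted (closed form, no refitting)
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t_del = e / np.sqrt(s2_del * (1 - h))
    # b - b_(i) = (X'X)^-1 x_i e_i / (1 - h_i), one row per deleted case
    delta_b = (X @ xtx_inv) * (e / (1 - h))[:, None]
    dfbetas = delta_b / np.sqrt(s2_del[:, None] * np.diag(xtx_inv)[None, :])
    covratio = (s2_del / s2) ** p / (1 - h)
    return h, t_del, dfbetas, covratio

def mahalanobis_sq(P):
    """Squared Mahalanobis distance of each row of P from the column means."""
    diff = P - P.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(P, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, s_inv, diff)

rng = np.random.default_rng(0)
n = 200
P = rng.normal(size=(n, 2))                              # two predictors
y = 1.0 + 2.0 * P[:, 0] - P[:, 1] + rng.normal(size=n)
P[0] = [4.5, -4.5]    # plant one case that is extreme on the predictors...
y[0] = -10.0          # ...and far from the regression surface
X = np.column_stack([np.ones(n), P])
k = X.shape[1]        # number of coefficients, including the intercept

z = stats.zscore(P, axis=0)
d2 = mahalanobis_sq(P)
h, t_del, dfbetas, covratio = influence_diagnostics(X, y)

flags = {
    "|z| > 3.29": np.flatnonzero((np.abs(z) > 3.29).any(axis=1)),
    "D^2 > chi2 crit (p<.001)": np.flatnonzero(d2 > stats.chi2.ppf(0.999, df=P.shape[1])),
    "|t_del| > 3": np.flatnonzero(np.abs(t_del) > 3),
    "|DFBETAS| > 2/sqrt(n)": np.flatnonzero((np.abs(dfbetas) > 2 / np.sqrt(n)).any(axis=1)),
    "|COVRATIO - 1| > 3k/n": np.flatnonzero(np.abs(covratio - 1) > 3 * k / n),
}
for rule, cases in flags.items():
    print(f"{rule:<26} flagged cases: {cases.tolist()}")
```

The planted case 0 should appear under every rule; the looser DFBETAS rule will usually flag a few ordinary cases as well, which is why these measures are read together rather than one at a time.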
Two other assumptions go hand-in-hand with normality of the residuals: “independence of errors” and “constant variance” (homoscedasticity). Constant variance can be checked graphically by plotting the residuals against the “predicted values”. This plot should show a good random scatter, with no distinctive patterns (e.g. lines, funnels or curves); a funnel, for example, means the spread of the residuals changes with the predicted value. You can also plot the “Regression Standardized Predicted Values” (X-axis) against the “Regression Studentized Deleted Residuals” (Y-axis), or plot the standardized residuals (Y-axis) against each predictor; neither plot should show a pattern. Independence of errors is harder to judge from these plots alone: when the data have a natural order (e.g. time or order of collection), plot the residuals in that order or use the Durbin–Watson statistic.
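The residuals-versus-predicted plot can be backed up with numbers. This sketch assumes NumPy and SciPy and uses simulated data; the helper names are our own. It computes Koenker's studentized version of the Breusch–Pagan test for non-constant variance (regress the squared residuals on the predictors and compare n·R² with a χ² distribution) and the Durbin–Watson statistic, which is only meaningful when the rows are in time or collection order; values near 2 suggest no first-order autocorrelation. The `funnel` data set has an error spread that grows with x, the classic funnel shape.

```python
import numpy as np
from scipy import stats

def ols_fit(X, y):
    """OLS fit; X includes an intercept column. Returns (fitted, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def breusch_pagan(X, resid):
    """Koenker's studentized Breusch-Pagan test: regress squared residuals on
    the predictors; n * R^2 is approximately chi-square with p - 1 df."""
    n, p = X.shape
    u = resid**2
    _, v = ols_fit(X, u)
    r2 = 1 - (v @ v) / np.sum((u - u.mean())**2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=p - 1)

def durbin_watson(resid):
    """Near 2 = no first-order autocorrelation (rows must be in time/collection order)."""
    return np.sum(np.diff(resid)**2) / np.sum(resid**2)

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
data = {
    "constant": 1 + 2 * x + rng.normal(0, 1, n),        # same spread everywhere
    "funnel":   1 + 2 * x + rng.normal(0, 0.5 * x, n),  # spread grows with x
}

results = {}
for label, y in data.items():
    fitted, resid = ols_fit(X, y)   # (fitted, resid) are the residual-plot axes
    lm, p = breusch_pagan(X, resid)
    dw = durbin_watson(resid)
    results[label] = (lm, p, dw)
    print(f"{label:<9} BP = {lm:7.2f}  p = {p:.3g}  DW = {dw:.2f}")
```

A small Breusch–Pagan p-value for the funnel data matches what the plot would show; the test complements the plot rather than replacing it.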
With statistical procedures for binary dependent variables (such as logistic regression and discriminant analysis), the dependent variable can’t be normally distributed. This is not a problem for logistic regression, which luckily makes no assumption of normally distributed errors or predictors (it models the outcome with a binomial distribution instead). However, for discriminant analysis we need to check multivariate normality of the predictors within each of the groups formed by the dependent variable.
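For discriminant analysis, the key point is that normality is checked group by group, not on the pooled data. One way to do this is Mardia's tests of multivariate skewness and kurtosis, sketched below with NumPy and SciPy on simulated data. `mardia` is our own helper, and Mardia's test is one option among several rather than the only accepted check.

```python
import numpy as np
from scipy import stats

def mardia(Z):
    """Mardia's multivariate skewness and kurtosis tests for the rows of Z.
    Returns (skewness p-value, kurtosis p-value)."""
    n, k = Z.shape
    D = Z - Z.mean(axis=0)
    S = D.T @ D / n                      # maximum-likelihood covariance
    G = D @ np.linalg.inv(S) @ D.T       # G[i, j] = d_i' S^-1 d_j
    b1 = np.sum(G**3) / n**2             # multivariate skewness
    b2 = np.sum(np.diag(G)**2) / n       # multivariate kurtosis
    skew_p = stats.chi2.sf(n * b1 / 6, df=k * (k + 1) * (k + 2) / 6)
    kurt_z = (b2 - k * (k + 2)) / np.sqrt(8 * k * (k + 2) / n)
    kurt_p = 2 * stats.norm.sf(abs(kurt_z))
    return skew_p, kurt_p

rng = np.random.default_rng(7)
# Two groups defined by a binary dependent variable; two predictors each.
X = np.vstack([rng.normal(size=(300, 2)),          # group 0: normal predictors
               rng.exponential(size=(300, 2))])    # group 1: skewed predictors
group = np.repeat([0, 1], 300)

results = {}
for g in np.unique(group):
    results[int(g)] = mardia(X[group == g])   # test within each group, not pooled
    skew_p, kurt_p = results[int(g)]
    print(f"group {g}: skewness p = {skew_p:.3g}, kurtosis p = {kurt_p:.3g}")
```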
Note that transformations (e.g. log, square root, or reciprocal) may be a good remedy for univariate non-normality, but they may do little to improve the normality of the residuals.
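A small simulation makes this concrete (NumPy and SciPy assumed; simulated data; `ols_resid` is our own helper). Here Y is skewed only because the predictor is skewed, and the relationship is linear with normal errors. Log-transforming Y removes much of the skew in Y itself, but it also bends a relationship that was straight to begin with, so in this case the residuals of the log model come out less normal than those of the untransformed model. This is one illustrative case, not a general rule about log transformations.

```python
import numpy as np
from scipy import stats

def ols_resid(x, y):
    """Residuals from a simple linear regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(3)
n = 1000
x = rng.exponential(scale=2.0, size=n)           # skewed predictor
y = 5.0 + 3.0 * x + rng.normal(0.0, 1.0, n)      # linear relationship, normal errors
assert np.all(y > 0), "log transform needs positive values"

r_raw = ols_resid(x, y)           # model on the original scale
r_log = ols_resid(x, np.log(y))   # model after "fixing" the skew in y

print(f"skewness of y: {stats.skew(y):.2f}   of log(y): {stats.skew(np.log(y)):.2f}")
print(f"residual skewness, raw model: {stats.skew(r_raw):.2f}")
print(f"residual skewness, log model: {stats.skew(r_log):.2f}")
```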