Outlier cases – univariate outliers
Discussing the causes, impact, identification and remedial action of outliers is a lengthy subject. I will keep it short by focussing, in this post, on only a few ways to identify univariate outliers. Also refer to the post entitled: Outlier cases – bivariate and multivariate outliers.
Be reminded that with bivariate and multivariate analysis, the focus should not be on univariate outliers. It is still advisable to check them, but don’t take immediate remedial action.

First and foremost, do the obvious by looking at a few visuals such as histograms, stem-and-leaf plots, box-and-whisker plots, normal probability plots, detrended Q-Q plots, etc. These graphs often show extreme outliers.

Specifically for categorical variables, inspection of the frequency distribution with a box-and-whisker plot for each variable will show outliers.
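As a rough sketch of inspecting a frequency distribution in code, the snippet below lists categories that hold only a small share of the cases. It reads "outlier" for a categorical variable as "a category with very few cases", which is an assumption on my part. The function name `rare_categories`, the 5% threshold `min_share` and the sample data are all made up for illustration.

```python
from collections import Counter


def rare_categories(values, min_share=0.05):
    """Return {category: share} for categories below min_share of all cases.

    min_share is an illustrative threshold, not a rule from this post.
    """
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items() if n / total < min_share}


# Hypothetical data: 100 responses, one category occurs only twice.
responses = ["agree"] * 60 + ["neutral"] * 38 + ["strongly disagree"] * 2
print(rare_categories(responses))  # {'strongly disagree': 0.02}
```

Whether a rare category is really a problem depends on the variable, so treat the output as a list of things to look at, not a list of things to delete.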
Specifically for continuous variables, create standardised z-scores of each variable (in bivariate regression, investigate the residuals, e.g. in standardised form, such as the “studentized residuals”). Note that using z-scores assumes a normal distribution. A general heuristic is that if more than 1% of all the cases have z-scores beyond ±2.58 (or just ±2.5), then we have an outlier problem, because a normal distribution places only about 1% of cases that far from the mean. If any are beyond ±3.29 (or just ±3), then we have serious outliers (and most likely candidates for remedial action).
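The heuristic above can be sketched in a few lines of Python using only the standard library. The function names, the dictionary keys and the sample data are hypothetical. The sketch uses the sample SD (n − 1 in the denominator), which is one possible choice; a package that uses the population SD will give slightly different z-scores.

```python
import statistics


def z_scores(values):
    """Standardise values using the mean and the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def outlier_summary(values, problem_z=2.58, serious_z=3.29, max_share=0.01):
    """Apply the two rules of thumb: share beyond problem_z, any beyond serious_z."""
    z = z_scores(values)
    share = sum(1 for s in z if abs(s) > problem_z) / len(z)
    serious = [v for v, s in zip(values, z) if abs(s) > serious_z]
    return {
        "share_beyond_problem_z": share,
        "outlier_problem": share > max_share,  # more than 1% of cases
        "serious_outliers": serious,           # raw scores beyond serious_z
    }


# Hypothetical data: 98 ordinary scores plus two extreme ones.
scores = [58, 60, 62] * 32 + [60, 60] + [120, 125]
result = outlier_summary(scores)
print(result["outlier_problem"], result["serious_outliers"])  # True [120, 125]
```

The cut-offs are parameters so that the rounded versions (2.5 and 3) can be tried as well.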
Here’s a practical example: if our rule is to remove all cases with z-scores beyond ±2.5, and the SD is 9 and the mean is 60, then 9 × 2.5 = 22.5. Add this to the mean: 60 + 22.5 = 82.5. So remove all cases with a score larger than 82.5. Do the same for the bottom end of the scale: 60 − 22.5 = 37.5, so remove all cases with a score smaller than 37.5. You may do this at different stringency levels, i.e. 1.96, 2.58, or 3.29 (or simply 1, 2, or 3 SD). I bet you knew this!
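The same arithmetic as a small Python sketch, for anyone who prefers to see it in code. The function name `z_cutoffs` and the list of scores are invented for illustration; the mean, SD and multiplier are the ones from the example above.

```python
def z_cutoffs(mean, sd, z=2.5):
    """Return the (lower, upper) raw-score cut-offs for a given z multiplier."""
    margin = sd * z
    return mean - margin, mean + margin


lower, upper = z_cutoffs(mean=60, sd=9, z=2.5)
print(lower, upper)  # 37.5 82.5

# Hypothetical scores: keep only the cases inside the cut-offs.
scores = [35, 48, 60, 71, 82, 90]
kept = [x for x in scores if lower <= x <= upper]
print(kept)  # [48, 60, 71, 82]
```

Calling `z_cutoffs` with `z=1.96`, `z=2.58` or `z=3.29` gives the cut-offs for the other stringency levels.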
Broadly, we have a few strategies to deal with univariate outliers including the following:

Remove the outlier cases (listwise or pairwise),

Transform the data (e.g. select the appropriate logarithmic, square root, reciprocal, reverse score, etc. transformation procedure),

Change the score – either an easy change or a more complex “change-score strategy”,

Just investigate to determine the scope of outliers and keep the findings in the back of your mind for later action or non-action. This is very applicable if you do bivariate or multivariate procedures.
Be careful with any of the above strategies, except the last one. My recommendation is to always check univariate outliers but don’t do anything yet if you are planning to do bivariate or multivariate analysis. While a data point may be a serious univariate outlier, it may not be an outlier in a bivariate or multivariate analysis – and the reverse is also true.
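For readers who want to see what the first three strategies look like in practice, here is a minimal Python sketch. The data and the cut-offs are hypothetical, and the "change the score" step is shown as pulling extreme values back to the cut-off, which is only one possible reading of an "easy change". Given the caution above, take it as an illustration of the mechanics and not as a recommendation to apply any of them straight away.

```python
import math

# Hypothetical data: seven ordinary scores and one extreme case (95).
scores = [12, 15, 14, 13, 16, 15, 14, 95]
lower, upper = 5, 25  # assumed cut-offs, e.g. obtained from a z-score rule

# Strategy 1: remove the outlier cases.
removed = [x for x in scores if lower <= x <= upper]

# Strategy 2: transform the data (natural log here; needs positive scores).
logged = [math.log(x) for x in scores]

# Strategy 3: change the score, here by pulling extremes back to the cut-off.
clipped = [min(max(x, lower), upper) for x in scores]

print(removed)  # [12, 15, 14, 13, 16, 15, 14]
print(clipped)  # [12, 15, 14, 13, 16, 15, 14, 25]
```

The log transformation keeps all eight cases but shrinks the gap between the extreme score and the rest, whereas removal and clipping act only on the extreme case.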
_________________________________________
Related Posts:
Outlier cases – bivariate and multivariate outliers