Assumptions

Data Assumption: Sphericity

November 11, 2017

Very brief description: The assumption of sphericity refers to the equality of the variances of the differences between treatment levels. In repeated measures ANOVA it concerns the homogeneity of the variances of the differences between levels, so it is quite similar to homogeneity of variance in the between-groups univariate ANOVA. The degree of departure from sphericity is denoted by ε (ε = 1 when sphericity holds), and the assumption is sometimes referred to as “circularity”.

Who cares: Sphericity applies to repeated measures ANOVA and MANOVA. While technically not an assumption of Factor Analysis, “Bartlett’s test of [READ MORE]
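To make "variances of the differences between treatment levels" concrete, here is a minimal sketch that computes the variance of the paired differences for every pair of levels. The scores and the helper name `difference_variances` are made up for illustration; this is a descriptive look at the quantity involved, not a formal test of sphericity.

```python
from itertools import combinations
from statistics import variance

# Hypothetical data: one row per subject, one column per treatment level.
scores = [
    [8, 10, 13],
    [6, 9, 10],
    [7, 8, 12],
    [9, 12, 13],
    [5, 7, 11],
]

def difference_variances(rows):
    """Sample variance of the paired differences for every pair of levels."""
    n_levels = len(rows[0])
    result = {}
    for a, b in combinations(range(n_levels), 2):
        diffs = [row[a] - row[b] for row in rows]
        result[(a, b)] = variance(diffs)
    return result

diff_vars = difference_variances(scores)
for pair, var in diff_vars.items():
    # Under sphericity these values should be roughly equal.
    print(pair, round(var, 3))
```

If the printed variances differ a lot from one pair of levels to the next, that hints that sphericity may not hold for the data at hand.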

Data Assumption: Linearity

October 9, 2017

Very brief description: Linearity means that the mean values of the outcome variable (dependent variable) for each increment of the predictors (independent variables) lie along a straight line (so we are modeling a straight-line relationship).

Who cares: The assumption of linearity is required by all multivariate techniques based on correlation measures of association, e.g. Regression, Logistic Regression, Factor Analysis, Structural Equation Modeling, Discriminant Analysis, General Linear Models, etc.

Why it is important: Most, if not all of the tests of association / relationships that we [READ MORE]

Significance Testing – Three Concerns

June 19, 2017

Some words of caution about significance testing by Kevin Gray: “I’ve long had three major concerns about significance testing. First, it assumes probability samples, which are rare in most fields. For example, even when probability sampling (e.g., RDD) is used in consumer surveys, because of (usually) low response rates, we don’t have true probability samples. Secondly, it assumes no measurement error. Measurement error can work in mysterious ways but generally weakens relationships between variables. Lastly, like automated modeling, it passes the buck to the machine and [READ MORE]

Outlier cases – bivariate and multivariate outliers

August 14, 2016

In follow-up to the post about univariate outliers, there are a few ways we can identify the extent of bivariate and multivariate outliers:

First, do the univariate outlier checks and, with those findings in mind (and with no immediate remedial action), follow some or all of these bivariate or multivariate outlier identifications, depending on the type of analysis you are planning.

BIVARIATE OUTLIERS: For one-way ANOVA, we can use the GLM (univariate) procedure to save standardised or studentized residuals. Then do a normal [READ MORE]

Data Assumption: Multicollinearity

May 13, 2016

Very brief description: Multicollinearity is a condition in which the independent variables are highly correlated (r = 0.8 or greater) such that the effects of the independent variables on the outcome variable cannot be separated. In other words, one of the predictor variables can be nearly perfectly predicted by one of the other predictor variables. Singularity is when the independent variables are (almost) perfectly correlated (r = 1), so any one of the independent variables could be regarded as a combination of one or more of the other independent variables. In practice, you should not [READ MORE]
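As a rough illustration of the r = 0.8 rule of thumb, the sketch below screens every pair of predictors and flags those whose absolute Pearson correlation reaches the threshold. The variable names, the data, and the helper `flag_collinear_pairs` are hypothetical. Note that a pairwise screen alone can miss cases where one predictor is a combination of several others.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical predictors: the first two rise together, the third does not.
predictors = {
    "age": [25, 32, 41, 47, 53, 60],
    "years_experience": [2, 8, 17, 22, 30, 36],
    "satisfaction": [4, 2, 5, 1, 3, 4],
}

def flag_collinear_pairs(data, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

flagged = flag_collinear_pairs(predictors)
print(flagged)
```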

Data Assumptions: Univariate Normality

January 14, 2015

Very brief description: As one of the most basic data assumptions, much has been written about univariate, bivariate and multivariate normality. An excellent reference is by Tom Burdenski (2000) entitled “Evaluating Univariate, Bivariate, and Multivariate Normality Using Graphical and Statistical Procedures”. A few noteworthy comments about normality:

1. Normality can have different meanings in different contexts, i.e. sampling distribution normality and model error distribution (e.g. in Regression and GLM). Be very careful which type of normality is applicable.

2. By definition, a [READ MORE]

Outlier cases – univariate outliers

July 26, 2014

Discussing the causes, impact, identification and remedial action of outliers is a lengthy subject. I will keep it short: this post focusses only on a few ways to identify univariate outliers. Also refer to the post entitled: Outlier cases – bivariate and multivariate outliers.

Be reminded that with bivariate and multivariate analysis, the focus should not be on univariate outliers. It is still advisable to check them, but don’t take immediate remedial action.

First and foremost, do the obvious by looking at a few visuals such as histograms, stem-and-leaf plots, [READ MORE]
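Alongside the visual checks, a numeric screen can help list candidate values. The sketch below uses the common 1.5 × IQR fence rule, which is an assumption on my part and not necessarily one of the methods the full post goes on to describe. The sample values and the function name `iqr_outliers` are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one value far from the rest.
sample = [10, 12, 11, 13, 12, 11, 14, 12, 13, 40]
outliers = iqr_outliers(sample)
print(outliers)
```

In keeping with the advice above, a flagged value is only a candidate for closer inspection, not something to delete on sight.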

Data Assumption: Homogeneity of variance-covariance matrices (Multivariate Tests)

October 15, 2013

Very brief description: “Homogeneity of variance-covariance matrices” is the multivariate version of the univariate assumption of Homogeneity of variance and the bivariate assumption of Homoscedasticity. Refer to the post “Homogeneity of variance” for a discussion of equality of variances. In short, homogeneity of variance-covariance matrices concerns the variance-covariance matrices of the multiple dependent measures (such as in MANOVA) for each group. For example, if you have five dependent variables, it tests for five variances and ten covariances for [READ MORE]
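A variance-covariance matrix for p dependent variables has p variances on its diagonal and p(p − 1)/2 distinct covariances off the diagonal. The sketch below counts those elements and builds the sample matrix for one hypothetical group; the function names and data are illustrative only, and comparing such matrices across groups formally would need a dedicated test.

```python
def count_unique_elements(p):
    """(number of variances, number of distinct covariances) for p variables."""
    return p, p * (p - 1) // 2

def covariance_matrix(columns):
    """Sample variance-covariance matrix for equal-length columns of scores."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]

# Hypothetical group with two dependent variables measured on four cases.
group_a = [[1, 2, 3, 4], [2, 4, 6, 8]]
matrix_a = covariance_matrix(group_a)

print(count_unique_elements(5))
for row in matrix_a:
    print([round(v, 3) for v in row])
```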

Data Assumption: Homoscedasticity (Bivariate Tests)

September 3, 2013

Very brief description: Homoscedasticity is the bivariate version of the univariate assumption of Homogeneity of variance, and the multivariate assumption of Homogeneity of variance-covariance matrices. Refer to the post “Homogeneity of variance” for a discussion of equality of variances. In short, homoscedasticity suggests that the metric dependent variable(s) have equal levels of variability across a range of either continuous or categorical independent variables. More specifically, in bivariate analysis such as regression, homoscedasticity means that the variance [READ MORE]

Data Assumption: Homogeneity of variance (Univariate Tests)

August 2, 2013

Very brief description: When comparing groups, their dispersion (variance) on the dependent variable should be relatively equal at each level of the independent (factor or grouping) variable, and their sample sizes should not vary greatly across the groups either. In other words, the dependent variable should exhibit equal levels of variance across the range of groups. Homogeneity of variance is the univariate counterpart of the bivariate assumption of homoscedasticity and the multivariate assumption of homogeneity of variance-covariance matrices.

Who cares: Both t-test and ANOVA are sensitive to [READ MORE]
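A quick descriptive way to see whether group variances are "relatively equal" is to compute each group's variance and the ratio of the largest to the smallest. The sketch below does that on made-up data with a hypothetical helper `variance_ratio`. It is a rough screen only, not a substitute for a formal test such as Levene's.

```python
from statistics import variance

# Hypothetical scores on the dependent variable for three groups.
groups = {
    "control": [12, 15, 14, 10, 13],
    "treatment_a": [22, 25, 19, 24, 20],
    "treatment_b": [30, 8, 45, 17, 25],
}

def variance_ratio(data):
    """Per-group sample variances and the largest-to-smallest ratio."""
    variances = {name: variance(values) for name, values in data.items()}
    ratio = max(variances.values()) / min(variances.values())
    return variances, ratio

variances, ratio = variance_ratio(groups)
for name, var in variances.items():
    print(name, round(var, 2))
print("largest / smallest:", round(ratio, 2))
```

A ratio close to 1 is consistent with homogeneity; a very large ratio, as in this contrived example, suggests the assumption deserves a closer look.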

Data Assumption: Homogeneity of regression slopes (test of parallelism)

July 19, 2013

Very brief description: The regression of the dependent variable on any covariate(s), such as in ANCOVA and MANCOVA, should have the same slope (b-coefficient) across all levels of the categorical grouping variable (factor). This also presumes that the covariate(s) are linearly related to the dependent variable. On the other hand, covariate(s) and factors should not be significantly correlated.

Who cares: ANCOVA, MANCOVA, ordinal regression, probit response models.

Why is it important: The fact is: when groups differ significantly on the covariate (thus an interaction) then placing the covariate into the [READ MORE]
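To illustrate what "the same slope across all levels" means, the sketch below fits a separate least-squares slope of the dependent variable on the covariate within each group. The data and the helper `ols_slope` are hypothetical. This is a descriptive comparison only; the formal check is usually a test of the factor-by-covariate interaction term.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical (covariate, dependent variable) pairs for three groups.
groups = {
    "group_1": ([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]),
    "group_2": ([1, 2, 3, 4, 5], [4, 6, 8, 10, 12]),
    "group_3": ([1, 2, 3, 4, 5], [10, 9, 8, 7, 6]),
}

slopes = {name: ols_slope(x, y) for name, (x, y) in groups.items()}
for name, slope in slopes.items():
    print(name, round(slope, 3))
```

Here the first two groups have parallel lines (same slope, different intercepts), while the third runs in the opposite direction, which is the kind of pattern that would violate the assumption.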

Data Assumptions: It’s about the residuals, and not the variables’ raw data

June 3, 2013

Normality, or the normal distribution, is a very familiar term, but what does it really mean and what does it refer to…

In linear models such as ANOVA and Regression (or any regression-based statistical procedures), an important assumption is “normality”. The question is whether it refers to the outcome (dependent variable “Y”), or the predictor (independent variable “X”). We should remember that the true answer is “none of the above”.

In linear models where we look at the relationship between dependent and independent variables, our [READ MORE]
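The point that the assumption concerns the residuals rather than the raw Y or X values can be seen with a small contrived example. In the sketch below (made-up data, hypothetical helper `fit_line`), Y falls into two widely separated clumps, so its raw distribution is clearly not bell-shaped, yet once the predictor is accounted for the residuals sit tightly around zero.

```python
from statistics import mean, pstdev

# Hypothetical data: X is a two-level predictor, Y clusters near 10 or 30.
x = [0] * 6 + [1] * 6
y = [9, 10, 11, 10, 9, 11, 29, 30, 31, 30, 29, 31]

def fit_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(x, y)
# Residual = observed Y minus the value predicted by the fitted line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print("spread of raw Y:", round(pstdev(y), 2))
print("spread of residuals:", round(pstdev(residuals), 2))
```

It is the distribution of `residuals`, not of `y` or `x`, that one would go on to inspect (for example with a histogram or a normal probability plot).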