ANOVA-family

Building statistical models: Linear (OLS) regression

April 17, 2018

Every day, researchers around the world collect sample data to build statistical models that are as representative of the real world as possible, so that these models can be used to predict changes and outcomes in the real world. Some models are very complex, while others are as basic as calculating a mean score by summing several observations and then dividing the sum by the number of observations. This mean score is a hypothetical value, so it is just a simple model to describe the data. The extent to which a statistical model (e.g. the mean score) represents the randomly collected [READ MORE]
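To make the idea of "the mean as a simple model" concrete, here is a minimal Python sketch. The observations are made-up illustrative values, and using the sum of squared errors as the measure of how well the mean represents the data is an assumption of this sketch, not something stated in the excerpt.

```python
# Hypothetical sample of five observations (illustrative values only).
observations = [2.0, 4.0, 4.0, 5.0, 10.0]


def fit_mean_model(data):
    """Treat the mean as a one-parameter model of the data.

    Returns the mean and the sum of squared errors (SSE) around it.
    A smaller SSE means the mean describes the observations better.
    """
    model_mean = sum(data) / len(data)  # sum the scores, divide by n
    sse = sum((x - model_mean) ** 2 for x in data)  # squared deviations
    return model_mean, sse


model_mean, sse = fit_mean_model(observations)
print(f"mean = {model_mean}, SSE = {sse}")
```

No single observation has to equal the mean for the model to be useful; the SSE simply summarises how far the observations sit from that hypothetical value.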

Variables and their many names

January 12, 2018

Many of the statistical procedures used by marketing researchers are based on “general linear models” (GLM). These can be categorised into univariate, multivariate, and repeated measures models. The underlying statistical formula is Y = Xb + e, where Y is generally referred to as the “dependent variable”, X as the “independent variable”, b as the “parameters” to be estimated, and e as the “error” or noise which is present in all models (also generally referred to as the statistical error, error terms, or residuals). Note that both [READ MORE]
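The formula Y = Xb + e can be illustrated with a small sketch for the simplest case: one independent variable plus an intercept, estimated by ordinary least squares. The data values and the function name fit_simple_ols are hypothetical, chosen only for illustration.

```python
# Hypothetical data: one independent variable X and one dependent variable Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_simple_ols(x, y):
    """Estimate b0 (intercept) and b1 (slope) in Y = b0 + b1*X + e.

    Uses the closed-form least-squares solution for a single predictor
    and returns the parameters together with the residuals e.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope parameter
    b0 = y_bar - b1 * x_bar     # intercept parameter
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # the "e" term
    return b0, b1, residuals


b0, b1, residuals = fit_simple_ols(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The residuals are what remains of Y after the fitted part Xb is removed; with an intercept in the model they sum to zero up to rounding.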

Repeated Measures ANOVA versus Linear Mixed Models

March 9, 2017

You want to measure the performance of the same individual over a period of time (repeated observations) on an interval-scale dependent variable, but which procedure should you use? We are looking for an equivalent of the paired samples t-test that allows for two or more levels of the categorical variable, e.g. pre, during, post. The Repeated Measures ANOVA (SPSS: ANALYZE / GENERAL LINEAR MODEL / REPEATED MEASURES) is simpler to use, but sadly it is often not as accurate and flexible as Linear Mixed Models (SPSS: ANALYZE / MIXED MODELS / LINEAR). Reminder that the Linear [READ MORE]

Outlier cases – bivariate and multivariate outliers

August 14, 2016

As a follow-up to the post about univariate outliers, there are a few ways we can identify the extent of bivariate and multivariate outliers: First, do the univariate outlier checks and, with those findings in mind (and with no immediate remedial action), follow some or all of these bivariate or multivariate outlier identifications, depending on the type of analysis you are planning. BIVARIATE OUTLIERS: For one-way ANOVA, we can use the GLM (univariate) procedure to save standardised or studentised residuals. Then do a normal [READ MORE]

Correlation and covariance matrices

August 30, 2015

Many statistical procedures such as the ANOVA family, covariates and multivariate tests rely on covariance and/or correlation matrices. Statistical assumption checks such as Levene’s test for homogeneity of variance, Box’s M test for homogeneity of variance-covariance matrices, and the check for sphericity specifically address the properties of the variance-covariance matrix (also referred to as the covariance matrix, or dispersion matrix). The covariance matrix as shown below indicates the variance of the scores on the diagonal, and the covariance on the [READ MORE]
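The layout of a covariance matrix, with variances on the diagonal, can be sketched in a few lines of Python. The two variables and their values are hypothetical, and the sketch assumes the usual sample estimator with n - 1 in the denominator.

```python
# Hypothetical scores on two variables for four cases (illustrative only).
scores = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 9.0],
}


def sample_covariance(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def covariance_matrix(variables):
    """Build the variance-covariance matrix as a nested list.

    Diagonal cells hold each variable's variance (its covariance with
    itself); the remaining cells hold the pairwise covariances.
    """
    names = list(variables)
    return [
        [sample_covariance(variables[r], variables[c]) for c in names]
        for r in names
    ]


cov = covariance_matrix(scores)
for row in cov:
    print([round(cell, 4) for cell in row])
```

The matrix is symmetric because the covariance of x with y equals the covariance of y with x.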

Analysis of Covariance (ANCOVA)

May 13, 2015

BRIEF DESCRIPTION: The Analysis of Covariance (ANCOVA) follows the same procedures as the ANOVA except for the addition of an exogenous variable (referred to as a covariate) as an independent variable. The ANCOVA procedure is quite straightforward: it uses regression to determine if the covariate can predict the dependent variable and then does a test of differences (ANOVA) of the residuals among the groups. If there remains a significant difference among the groups, it signifies a significant difference between the dependent variable and the predictors after the effect of the [READ MORE]

Why ANOVA and not multiple t-tests? Why MANOVA and not multiple ANOVAs, etc.

September 9, 2013

ANOVA reigns over the t-test and the MANOVA reigns over the ANOVA. Why?   If we want to compare several predictors with a single outcome variable, we can either do a series of t-tests, or a single factorial ANOVA.   Not only is a factorial ANOVA less work, but conducting several t-tests for each predictor separately will result in a higher probability of making Type I errors. In fact, with every single t-test there is a chance of a Type I error. Conducting several t-tests compounds this probability. In contrast, a single factorial ANOVA controls for this error so that the probability [READ MORE]
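How the chance of a Type I error compounds across several t-tests can be shown with a short calculation. This sketch assumes the tests are independent and that each uses the same significance level; both are simplifying assumptions, and the function name familywise_error is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests.

    Assumes independent tests, each run at significance level alpha:
    1 - (1 - alpha) ** n_tests.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Comparing three groups pairwise requires three t-tests.
for k in (1, 3, 6, 10):
    print(f"{k:2d} tests at alpha = 0.05 -> {familywise_error(0.05, k):.4f}")
```

With three tests at the 0.05 level, the chance of at least one false positive is already about 14%, which is why a single omnibus test is preferred.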

Data Assumptions: It’s about the residuals, and not the variables’ raw data

June 3, 2013

Normality, or normal distribution, is a very familiar term, but what does it really mean and what does it refer to? In linear models such as ANOVA and Regression (or any regression-based statistical procedures), an important assumption is “normality”. The question is whether it refers to the outcome (dependent variable “Y”), or the predictor (independent variable “X”). We should remember that the true answer is “neither”. In linear models where we look at the relationship between dependent and independent variables, our [READ MORE]

Measuring effect size and statistical power analysis

October 3, 2012

Effect size measures are crucial to establish practical significance, in addition to statistical significance. Please read the post “Tests of Significant are dangerous and can be very misleading” to better appreciate the importance of practical significance. Normally we only consider differences and associations from a statistical significance point of view and report at what level e.g. p<.001 we reject the null hypothesis (H0) and accept that there is a difference or association (note that we can never “accept the alternative hypothesis (H1)” – see the [READ MORE]

One-way (Independent) ANOVA

July 13, 2012

BRIEF DESCRIPTION: The One-way ANOVA is an extension of the two independent samples t-test, as it compares the observed mean on the dependent variable among more than two groups as defined by the independent variable. For example, is the mean customer satisfaction score (on the dependent variable) significantly different among three customer groups: adult men, adult women, and children (on the independent variable)? In addition to expressing group differences on the dependent variable, we can also express the findings in terms of relationship or association, e.g. “Age [READ MORE]
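The comparison of group means described above can be sketched as a hand-rolled F-statistic. The satisfaction scores below are invented for illustration, and the function name one_way_anova_f is hypothetical; in practice a statistics package would also report the p-value.

```python
# Hypothetical customer satisfaction scores for three groups (illustrative).
groups = {
    "adult men": [4.0, 5.0, 6.0],
    "adult women": [6.0, 7.0, 8.0],
    "children": [8.0, 9.0, 10.0],
}


def one_way_anova_f(samples):
    """Return the F-statistic and its degrees of freedom.

    F = (between-group mean square) / (within-group mean square).
    """
    all_scores = [score for sample in samples for score in sample]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(
        len(s) * ((sum(s) / len(s)) - grand_mean) ** 2 for s in samples
    )
    # Within-group sum of squares: scores versus their own group mean.
    ss_within = sum(
        sum((score - (sum(s) / len(s))) ** 2 for score in s) for s in samples
    )

    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread within the groups would lead us to expect by chance.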