Statistical Error

Significance Testing – Three Concerns

June 19, 2017

Some words of caution about significance testing by Kevin Gray: “I’ve long had three major concerns about significance testing. First, it assumes probability samples, which are rare in most fields. For example, even when probability sampling (e.g., RDD) is used in consumer surveys, because of (usually) low response rates, we don’t have true probability samples. Secondly, it assumes no measurement error. Measurement error can work in mysterious ways but generally weakens relationships between variables. Lastly, like automated modeling, it passes the buck to the machine and [READ MORE]
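On the measurement-error point in particular, a small simulation sketch (Python/NumPy, with hypothetical numbers, not from the post) shows the attenuation Gray describes: adding noise to one variable weakens its observed correlation with another.

```python
# Attenuation sketch: measurement error in x weakens the observed
# correlation between x and y. All numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(7)
x_true = rng.normal(size=5000)
y = 0.8 * x_true + rng.normal(scale=0.6, size=5000)   # true relationship

x_noisy = x_true + rng.normal(scale=1.0, size=5000)   # add measurement error

print(f"r without measurement error: {np.corrcoef(x_true, y)[0, 1]:.2f}")
print(f"r with measurement error:    {np.corrcoef(x_noisy, y)[0, 1]:.2f}")
```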

Type I and II errors – Hypothesis testing

March 10, 2015

In so many of the statistical procedures we execute, the statistical significance of findings is the basis for statements, conclusions, and important decisions. While the importance of statistical significance (compared with practical significance) should never be overestimated, it is important to understand how statistical significance relates to hypothesis testing. A hypothesis statement is designed either to be disproven or to fail to be disproven. (Note that a hypothesis can be disproven, or fail to be disproven, but can never be proven true.) Hypotheses relate to either [READ MORE]
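As a concrete illustration of that decision logic, here is a minimal sketch in Python, assuming SciPy and entirely made-up data. Note the wording in the output: we "reject" or "fail to reject" H0, never "prove" it.

```python
# Two-sample t-test decision sketch (SciPy assumed, data made up).
# We "reject" or "fail to reject" H0 -- we never prove it true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=100, scale=15, size=50)   # hypothetical group A
group_b = rng.normal(loc=105, scale=15, size=50)   # hypothetical group B

alpha = 0.05                                       # accepted Type I error rate
t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print(f"p = {p_value:.4f}: reject H0 (a Type I error is still possible)")
else:
    print(f"p = {p_value:.4f}: fail to reject H0 (a Type II error is still possible)")
```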

Why ANOVA and not multiple t-tests? Why MANOVA and not multiple ANOVAs, etc.

September 9, 2013

ANOVA reigns over the t-test, and the MANOVA reigns over the ANOVA. Why? If we want to compare several predictors with a single outcome variable, we can either run a series of t-tests or a single factorial ANOVA. Not only is a factorial ANOVA less work, but conducting a separate t-test for each predictor results in a higher probability of making a Type I error. Every single t-test carries a chance of a Type I error, and conducting several t-tests compounds this probability. In contrast, a single factorial ANOVA controls this error so that the probability [READ MORE]
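The compounding described above is easy to quantify: if we run k independent tests, each at α = .05, the familywise probability of at least one Type I error is 1 − (1 − α)^k. A short Python illustration (not from the post):

```python
# Familywise Type I error rate when running k independent tests,
# each at alpha = .05: P(at least one false positive) = 1 - (1 - alpha)**k
alpha = 0.05
for k in (1, 3, 5, 10):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests: P(at least one Type I error) = {familywise:.3f}")
```

With ten tests, the chance of at least one false positive is already about 40%, which is exactly the inflation a single ANOVA is designed to avoid.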

Data Assumptions: It's about the residuals, and not the variables' raw data

June 3, 2013

"Normality", or the normal distribution, is a very familiar term, but what does it really mean and what does it refer to? In linear models such as ANOVA and regression (or any regression-based statistical procedure), an important assumption is normality. The question is whether it refers to the outcome (the dependent variable "Y") or the predictor (the independent variable "X"). We should remember that the true answer is "none of the above". In linear models where we look at the relationship between dependent and independent variables, our [READ MORE]
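A minimal sketch of the point, assuming statsmodels and SciPy and using simulated data: fit the model first, then check normality on its residuals rather than on the raw X or Y.

```python
# Fit an OLS model, then test normality on the residuals --
# not on the raw X or Y. statsmodels and SciPy assumed; data simulated.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=200)   # hypothetical linear data

X = sm.add_constant(x)            # design matrix with an intercept term
model = sm.OLS(y, X).fit()

# The normality assumption applies to these residuals:
w, p = stats.shapiro(model.resid)
print(f"Shapiro-Wilk on residuals: W = {w:.3f}, p = {p:.3f}")
```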

Tests of statistical significance can be dangerous and misleading

February 27, 2013

Years ago we used to program our IBM PCs to run t-tests overnight to determine whether groups of respondents differed on a series of product attributes. We then highlighted all the attributes with significant differences at the p < .05, p < .01 and p < .001 levels and proudly reported to the client which attributes were differentiating and which were not. Yet after all these years, this practice (in many different forms) is still continued by some researchers (though now calculated in a split second), in total disregard of the validity of a [READ MORE]
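A small simulation (hypothetical data, not from the post) of why that practice misleads: when two groups are drawn from the same population, testing enough attributes will still flag a few as "significant" at p < .05 purely by chance.

```python
# Simulate the overnight-t-test practice: two groups from the SAME
# population, tested on many attributes. Some will look "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_attributes, n_per_group = 40, 100

false_positives = 0
for _ in range(n_attributes):
    a = rng.normal(size=n_per_group)   # no real group difference exists
    b = rng.normal(size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05

print(f"{false_positives} of {n_attributes} attributes 'significant' at p < .05")
```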

Building statistical models: Linear (OLS) regression

October 17, 2012

Every day, researchers around the world collect sample data to build statistical models that represent the real world as closely as possible, so these models can be used to predict changes and outcomes in the real world. Some models are very complex, while others are as basic as calculating a mean score by summing several observations and then dividing by the number of observations. This mean score is a hypothetical value, so it is just a simple model that describes the data. The extent to which a statistical model (e.g. the mean score) represents the randomly collected [READ MORE]
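A quick sketch of the "mean as a model" idea, in Python with made-up numbers: one simple way to quantify how well this model represents the data is the sum of squared errors between the observations and the mean.

```python
# The mean as the simplest statistical model, with its fit measured
# by the sum of squared errors (SSE). Sample values are made up.
import numpy as np

observations = np.array([4.0, 7.0, 5.0, 6.0, 8.0])   # hypothetical sample
mean_model = observations.sum() / len(observations)  # the simplest "model"

errors = observations - mean_model   # deviation of each observation from the model
sse = np.sum(errors ** 2)            # total squared error = lack of fit

print(f"mean = {mean_model:.2f}, SSE = {sse:.2f}")
```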

Means, sum of squares, squared differences, variance, standard deviation and standard error

August 18, 2012

I remember how confusing these terms were to me when I started learning statistics. Let me offer a brief, non-technical explanation of each. When we take a random sample of observations from a population of particular interest (e.g. all our customers), we would like to do some modelling (e.g. a mean or a regression) so that our sample can describe and/or predict the total population of interest. The most basic model we can use is to calculate the mean score of any given variable or construct, and then conclude that it represents the population of interest. However, before we can use the [READ MORE]
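For readers who think in code, here is one way to compute each term in the title, as a short Python sketch with a made-up sample (the post itself presents no code):

```python
# Each term from the title, computed step by step on a made-up sample.
import numpy as np

sample = np.array([4.0, 7.0, 5.0, 6.0, 8.0])
n = len(sample)

mean = sample.mean()                    # the most basic "model"
squared_diffs = (sample - mean) ** 2    # squared differences from the mean
ss = squared_diffs.sum()                # sum of squares
variance = ss / (n - 1)                 # sample variance (n - 1 degrees of freedom)
sd = np.sqrt(variance)                  # standard deviation
se = sd / np.sqrt(n)                    # standard error of the mean

print(f"mean={mean:.2f}  SS={ss:.2f}  var={variance:.2f}  sd={sd:.2f}  se={se:.2f}")
```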