# Statistics Principles

### Variables and their many names

January 12, 2018

Many of the statistical procedures used by marketing researchers are based on “general linear models” (GLM). These can be categorised into univariate, multivariate, and repeated measures models. The underlying statistical formula is Y = Xb + e, where Y is generally referred to as the “dependent variable”, X as the “independent variable”, b as the “parameters” to be estimated, and e as the “error” or noise that is present in all models (also generally referred to as the statistical error, error terms, or residuals). Note that both [READ MORE]

### Getting the hang of z-scores

January 4, 2017

If we have a sample of data drawn randomly from a population with a normal distribution, we can assume that our sample distribution is also approximately normal (provided the sample size is more than 30). If we have a mean of zero and a standard deviation (SD) of 1, then we can calculate the probability of getting a particular score based on the frequencies we have. To centre our data around a mean of zero, we subtract the overall mean from each individual score, then divide the result by the standard deviation. This is the process of standardising raw data into z-scores. This [READ MORE]
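As a minimal sketch of the standardisation step described above, the following uses only Python's standard library. The function name `z_scores` and the data are hypothetical, and the population SD is an assumption here; swap in `statistics.stdev` if you want the sample SD instead.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardise raw scores: subtract the mean, then divide by the SD."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: population SD; use stdev() for the sample SD
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - m) / sd for x in scores]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores (mean 5, SD 2)
z = z_scores(raw)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardisation the scores have a mean of zero and an SD of 1, so a z-score of 2.0 simply means "two standard deviations above the mean".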

### Research questions and hypotheses?

December 9, 2016

When doing proposals or client reports, we often refer to “research questions” and “research hypotheses” (sometimes used interchangeably). What is the difference? Research questions do NOT entail specific predictions (magnitude or direction of the outcome variable) and are therefore phrased in question format. They could include questions about descriptives, differences, or associations (relationships). These assist the researcher in choosing the most appropriate statistical techniques. Let's look at each: 1. Research questions that relate to describing [READ MORE]

### Test statistics and significance

November 27, 2016

Test statistics such as the F-test, the t-test, or the χ² test all look at the ratio of variance explained by our model (effect) to variance not explained by our model (error). Our model can be as basic as a mean score, which is calculated as the sum of the observed scores divided by the number of observations included. If this ratio is >1, then the variance explained (effect) is larger than the variance not explained (error). The higher this ratio, the better our model. Let's say it is 5 (rather than 1), so the explained variance (effect) is 5 times [READ MORE]

### Statistical Power Analysis

July 15, 2016

(Statistical) power refers to the ability of a statistical test to detect an effect of a certain size, if the effect really exists. In other words, power is the probability of correctly rejecting the null hypothesis when it should be rejected. So while statistical significance deals with Type I (α) errors (false positives), power analysis deals with Type II (β) errors (false negatives), which means power is 1 − β. Cohen (1988) recommends that research studies be designed to achieve alpha levels of at least .05, and if we use Cohen’s rule of .2 for β, then 1 − β = 0.8 (an 80% [READ MORE]
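To make the relationship between α, β, and power concrete, here is a rough sketch using only Python's standard library. It assumes a two-sided comparison of two equally sized groups and uses a normal approximation rather than the exact noncentral t distribution, so the result is approximate. The function name `approx_power` and the input values (a standardised effect of 0.5 and 64 respondents per group) are hypothetical.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation).

    d is the standardised mean difference; n_per_group is the size of each group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = d * (n_per_group / 2) ** 0.5        # expected test statistic if the effect is real
    # Probability of landing beyond either critical value when the effect exists
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = approx_power(d=0.5, n_per_group=64)  # hypothetical design
beta = 1 - power                             # probability of a Type II error
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

With these assumed inputs the approximation lands close to the conventional 0.8 target. When the effect size is zero, the function returns α itself, because every rejection is then a false positive.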

### Variables – three key types

February 10, 2016

Now here’s an easy one: What is a variable? It is simply something that varies – either its value or its characteristic. In fact, it must vary. If it does not vary then we can’t call it a VARiable, so we call it a “constant”, such as the regression constant (the y-intercept). In the equation of a straight line (linear relationship) Y = a + bX, where: Y = dependent variable; X = independent variable; a = constant (the Y-axis intercept, or the value of Y when X = 0); b = coefficient (slope of the line, in other words the amount that Y increases [or [READ MORE]

### Revisiting the basics of data and measurement scales (Part 2)

July 28, 2015

The statistical procedures we choose depend on the type of data we collected and the types of measurement scales we employed. We should take care to understand the constructs we measure and the type of scales we employ, as these determine which statistical procedures are appropriate for analysis. This post (a follow-up to Part 1) is partly based on what S.S. Stevens told us in 1946 (see “Further Reading” below) about data and scales. Please note that this classification is not an exact science, as there is a lingering debate about the classification [READ MORE]

### Revisiting the basics of data and measurement scales (Part 1)

July 4, 2015

Way-way back in 1946 Stanley S. Stevens (1906–1973) at Harvard University published his classic paper entitled “On the Theory of Scales of Measurement”, in which he describes four statistical operations applicable to measurements made with regard to objects, and he identified a particular scale associated with each. He calls them “determination of equality” (nominal scales), “determination of greater or less” (ordinal scales), “determination of equality of intervals or differences” (interval scales), and “determination [READ MORE]

### Practical significance and effect size measures

June 26, 2015

If statistical significance is found (e.g. p<.001), the next logical step should be to assess practical significance by calculating an effect size (e.g. the standardised mean difference between two groups). Effect sizes are a group of statistics that measure the magnitude of differences, treatment effects, and strength of associations. Unlike statistical significance tests, effect size indices are not inflated by large sample sizes. As effect size measures are standardised (units of measurement removed), they are easy to evaluate and easy to [READ MORE]
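The standardised mean difference mentioned above can be sketched in a few lines of standard-library Python. Cohen's d with a pooled standard deviation is used here as one common choice; that choice, the function name `cohens_d`, and the ratings data are assumptions made for illustration.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups (Cohen's d, pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


ratings_a = [6, 7, 8, 9, 10]  # hypothetical attribute ratings, group A
ratings_b = [4, 5, 6, 7, 8]   # hypothetical attribute ratings, group B
d = cohens_d(ratings_a, ratings_b)
print(f"d = {d:.2f}")
```

Because the difference is divided by the pooled SD, the result is expressed in standard deviation units rather than in the original rating scale, which is what makes it comparable across attributes and studies.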

### Type I and II errors – Hypothesis testing

March 10, 2015

In so many of the statistical procedures we execute, the statistical significance of findings is the basis for statements, conclusions, and important decisions. While the importance of statistical significance (compared with practical significance) should never be overestimated, it is important to understand how statistical significance relates to hypothesis testing. A hypothesis statement is designed either to be disproven or to fail to be disproven. (Note that a hypothesis can be disproven, or fail to be disproven, but cannot be proven to be true.) Hypotheses relate to either [READ MORE]

### Tests of statistical significance can be dangerous and misleading

February 27, 2013

Years ago we used to programme our IBM PCs to run t-tests overnight to determine whether groups of respondents differed on a series of product attributes. We then highlighted all the attributes with significant differences at the p<.05, p<.01 and p<.001 levels and proudly reported to the client which attributes were differentiating and which were not. However, after all these years this practice (in many different forms) is still continued by some researchers (though now calculated in a split second), and in total disregard of the validity of a [READ MORE]

### Measuring effect size and statistical power analysis

October 3, 2012

Effect size measures are crucial for establishing practical significance, in addition to statistical significance. Please read the post “Tests of Significance are dangerous and can be very misleading” to better appreciate the importance of practical significance. Normally we only consider differences and associations from a statistical significance point of view and report at what level (e.g. p<.001) we reject the null hypothesis (H0) and accept that there is a difference or association (note that we can never “accept the alternative hypothesis (H1)” – see the [READ MORE]