Type I and II errors – Hypothesis testing

In many of the statistical procedures we execute, the statistical significance of findings is the basis for statements, conclusions, and important decisions. While the importance of statistical significance (as compared with practical significance) should not be overstated, it is important to understand how statistical significance relates to hypothesis testing.

A hypothesis statement is designed either to be disproven or to fail to be disproven. (Note that a hypothesis can be disproven, or can fail to be disproven, but can never be proven true.)

Hypotheses relate to either differences (e.g. t-tests for differences in mean values) or relationships (e.g. tests of whether a correlation, or the slope of a regression line, differs from zero) – although these categories are not exclusive, as they are closely related. We select a specific test procedure depending on the number of variables, the characteristics of our data, and whether we are comparing two or more means, standard deviations, or variances.

We essentially have two hypotheses: 
  • H0: The null hypothesis states that there are no differences, no changes, and no relationships between the independent and dependent variables (alternatively expressed as “invalid”, “void”, or “amounts to nothing”).
  • Ha: The alternative hypothesis states that the independent variable has an effect on the dependent variable, resulting in differences, changes, or relationships.
The outcome of the test regarding the population parameter is either to “reject the null hypothesis” or to “fail to reject the null hypothesis”. Note that failing to reject the null hypothesis does NOT mean you can accept it: the null hypothesis can never be proven true, so it can never be accepted. If you fail to reject the null hypothesis by a small margin, do not report the result as “almost significant”; likewise, if you reject the null hypothesis with a very small p-value (e.g. p < .0001), do not report it as “highly significant”. In hypothesis testing, findings are either significant or not significant, and you either reject or fail to reject the null.
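As a minimal, stdlib-only sketch of this decision rule, the following runs a large-sample two-sided z-test on invented data (the group names, means, and seed are all hypothetical):

```python
import random
from statistics import NormalDist, mean, stdev

def two_sample_z_test(a, b, alpha=0.05):
    """Large-sample two-sided z-test for a difference in means.
    Returns the p-value and the decision about H0."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p, ("reject H0" if p < alpha else "fail to reject H0")

random.seed(1)
control = [random.gauss(100.0, 15.0) for _ in range(200)]    # hypothetical data
treatment = [random.gauss(110.0, 15.0) for _ in range(200)]  # hypothetical data
p, decision = two_sample_z_test(control, treatment)
print(f"p = {p:.4f}: {decision}")
```

With real data and small samples you would normally reach for a dedicated routine such as `scipy.stats.ttest_ind`, which uses the t-distribution rather than the normal approximation.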
Remember that large samples are prone to flagging small differences as statistically significant, so it is crucial to consider practical significance and effect size, particularly with large samples.
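To illustrate this point, here is a simulation (all numbers invented) in which a trivially small true difference comes out statistically significant purely because the samples are huge, while Cohen's d shows the effect is practically negligible:

```python
import random
from statistics import NormalDist

random.seed(2)
n = 100_000
# Two hypothetical groups whose true means differ by a trivially small amount.
a = [random.gauss(50.00, 1.0) for _ in range(n)]
b = [random.gauss(50.05, 1.0) for _ in range(n)]

ma, mb = sum(a) / n, sum(b) / n
va = sum((x - ma) ** 2 for x in a) / (n - 1)
vb = sum((x - mb) ** 2 for x in b) / (n - 1)

# Huge n makes even a tiny difference statistically significant...
z = (mb - ma) / (va / n + vb / n) ** 0.5
p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# ...but Cohen's d shows the practical size of the effect is negligible.
d = (mb - ma) / ((va + vb) / 2.0) ** 0.5
print(f"p = {p:.2e}, Cohen's d = {d:.3f}")
```

By Cohen's conventional benchmarks, a d below 0.2 is a small effect, no matter how small the p-value is.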
The p-value is the probability of observing differences at least as large as those in the sample, assuming the null hypothesis is true. If that probability is sufficiently small (e.g. p < 5%), we conclude that a real difference or relationship exists – that is, one unlikely to be a mere fluke of chance. Put another way, if the null hypothesis were true and you sampled repeatedly from the same population, only about 5% of samples would show an effect this large by chance alone. Note that it is always good practice to include confidence levels (or intervals) when reporting the results of hypothesis tests.
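This repeated-sampling interpretation can be checked by simulation: draw both samples from the same population, so the null hypothesis is true by construction, and count how often a test at α = 0.05 rejects anyway. The sketch below uses arbitrary parameters and a large-sample z-test:

```python
import random
from statistics import NormalDist

random.seed(3)
alpha, n, trials = 0.05, 50, 2000
cdf = NormalDist().cdf
rejections = 0
for _ in range(trials):
    # Both samples come from the SAME population, so H0 is true by construction.
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = (ma - mb) / (va / n + vb / n) ** 0.5
    if 2.0 * (1.0 - cdf(abs(z))) < alpha:
        rejections += 1  # a rejection of a true H0: a false positive

rate = rejections / trials
print(f"false-positive rate ≈ {rate:.3f}")  # hovers near alpha = 0.05
```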
This brings us to Type I and Type II errors:
  • Type I: We reject a null hypothesis that is actually true, and so claim differences or relationships that do not exist. This is also known as a false positive: wrongly reporting a condition that is not present.
  • Type II: We fail to reject a null hypothesis that is actually false, and so miss differences or relationships that do exist. This is known as a false negative: failing to report a condition that actually exists.
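The Type II error can be made concrete with a small simulation (all parameters arbitrary): here a real effect exists, but the samples are small, so the test frequently fails to detect it. The proportion of misses estimates β:

```python
import random
from statistics import NormalDist

random.seed(4)
alpha, n, trials = 0.05, 20, 2000
true_diff = 0.5              # H0 is false: a real effect of this size exists
cdf = NormalDist().cdf
misses = 0
for _ in range(trials):
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(true_diff, 1.0) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = (mb - ma) / (va / n + vb / n) ** 0.5
    if 2.0 * (1.0 - cdf(abs(z))) >= alpha:
        misses += 1          # a Type II error: a false negative

beta = misses / trials
print(f"beta ≈ {beta:.3f}, power ≈ {1 - beta:.3f}")
```

With only 20 observations per group, the test misses a genuine medium-sized effect most of the time; increasing n shrinks β.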
A common mistake among researchers is to set the significance level, denoted by α (alpha), AFTER the data are analysed, when in fact it should be pre-set (ideally before data collection) based on the consequences of Type I and Type II errors. The more serious the consequences of an error, the smaller the significance level should be. Both statistical power (1 − β) and effect size have an impact on these decisions.
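One way to see how α, power, effect size, and sample size interact is the standard normal-approximation power formula for a two-sided, two-sample test. The sketch below (function name and inputs are illustrative) shows that tightening α to guard against Type I errors lowers power, i.e. raises β, at a fixed sample size:

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for
    standardized effect size d (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)       # critical value for this alpha
    shift = d * (n_per_group / 2.0) ** 0.5       # expected test statistic
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

# A stricter alpha (better Type I protection) lowers power at fixed n:
for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: power ≈ {power_two_sample(0.5, 64, alpha):.3f}")
```

Under this approximation, 64 observations per group give roughly 80% power to detect a medium effect (d = 0.5) at α = 0.05 – the kind of trade-off that should be worked out before the data are collected.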