Measuring effect size and statistical power analysis

Effect size measures are crucial for establishing practical significance, in addition to statistical significance. Please read the post “Tests of Significance are dangerous and can be very misleading” to better appreciate the importance of practical significance.

Normally we consider differences and associations only from a statistical significance point of view and report the level (e.g. p<.001) at which we reject the null hypothesis (H0) and conclude that there is a difference or association (note that we can never “accept the alternative hypothesis (H1)”; see the post on the reporting of hypothesis testing). This only tells us whether the results are plausibly attributable to chance. Even if our t-value (for differences) or our r-value (for associations) is highly significant (e.g. p<.0001), that significance could simply be due to a large sample size.
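To make the sample-size point concrete, here is a minimal sketch (not taken from any particular study) that holds a tiny standardized difference fixed at 0.02 pooled standard deviations and only increases the group size. It assumes two independent groups of equal size and uses a normal approximation for the p-value; the function names are made up for this illustration.

```python
import math


def t_for_standardized_difference(d: float, n_per_group: int) -> float:
    """t statistic for two equal-sized independent groups whose means
    differ by d pooled standard deviations: t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal statistic
    (used here as a large-sample approximation to the t distribution)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Same trivial difference (d = 0.02), increasingly large samples.
for n in (50, 5_000, 500_000):
    t = t_for_standardized_difference(0.02, n)
    p = two_sided_p_from_z(t)
    print(f"n per group = {n:>7}: t = {t:5.2f}, p = {p:.3g}")
```

The difference between the groups never changes, yet the p-value falls from clearly non-significant to far below .0001 as n grows.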
 
Unless we consider the effect size, we won’t know whether our findings have any practical significance, that is, whether they are substantive enough to have real practical value (which is what most researchers and their clients really should care about). Effect size indicates the magnitude (for differences) or the strength (for associations) of a result, and this is what speaks to practical significance.
 
Here’s a very brief outline of how to determine effect size for differences and for associations. 

 
STEP 1: Do your usual DIFFERENCE test (e.g. t-test or ANOVA) or ASSOCIATION test (e.g. regression).

Let’s just look at effect size for t-tests, as the others are very similar. From the t-test results, take the t-value and the degrees of freedom (df) and move on to Step 2.
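If you don’t have a stats package to hand, the two numbers needed for Step 2 can be sketched with Python’s standard library. This is an illustrative independent-samples t-test with pooled variance (it assumes roughly equal variances in the two groups); the function name and the data are invented for the example.

```python
import math
import statistics


def pooled_t_test(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    t = mean_diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Made-up scores, for illustration only.
treatment = [23, 25, 28, 30, 32]
control = [20, 22, 24, 25, 27]

t, df = pooled_t_test(treatment, control)
print(f"t = {t:.3f}, df = {df}")
```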


STEP 2: Calculate the Effect Size to determine the magnitude of the difference (or the strength of the association)

There are several effect size measures. Which one should we select? Here’s a brief guideline:

FOR DIFFERENCE (e.g. t-test or ANOVA) use the d-family which includes the commonly used Cohen’s d, although there are many others including risk difference, risk ratio, odds ratio, Glass’s delta, Hedges’ g, and the probability of superiority. The d-family of effect size measures focuses on magnitude of differences.
FOR ASSOCIATION (e.g. regression) use the r-family which includes the correlation coefficient r, R2, Spearman’s rho, Kendall’s tau, the phi coefficient, Cramér’s V, Cohen’s f, and eta squared (η2). The r-family of effect size measures focuses on strength of association.
 
The ƒ2 (Cohen’s ƒ2) effect size measure for an F-test in ANOVA and multiple regression is defined as (where R2 is the squared multiple correlation):

$$f^2 = \frac{R^2}{1 - R^2}$$
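As a quick worked example of the ƒ2 definition (R2 divided by 1 minus R2), here is a small sketch. The function name is hypothetical and the R2 value is made up.

```python
def cohens_f_squared(r_squared: float) -> float:
    """Cohen's f-squared from the squared multiple correlation R^2."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in the range [0, 1)")
    return r_squared / (1 - r_squared)


# Illustrative value: a model explaining 20% of the variance.
print(cohens_f_squared(0.20))
```

So a model with R2 = 0.20 corresponds to ƒ2 = 0.20 / 0.80 = 0.25.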
As an example, if we want to look at DIFFERENCES (from our t-test) and have chosen Cohen’s d, go to a Cohen’s d online calculator and use the second of the two groups of calculations, called “Calculate d and r using t values and df (separate groups t-test)”. For independent samples, input the t-value and df to get the effect size (Cohen’s d).
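For readers who would rather see the arithmetic, the sketch below uses a commonly quoted conversion from t and df for independent groups, d = 2t / sqrt(df) and r = t / sqrt(t^2 + df). It assumes the two groups are of roughly equal size. Whether any given online calculator uses exactly these formulas is an assumption on my part, and the function name and input values are invented.

```python
import math


def d_and_r_from_t(t: float, df: int) -> tuple[float, float]:
    """Convert an independent-samples t value and its df to Cohen's d and r.

    Assumes roughly equal group sizes:
        d = 2t / sqrt(df)
        r = t / sqrt(t^2 + df)
    """
    if df <= 0:
        raise ValueError("df must be positive")
    d = 2 * t / math.sqrt(df)
    r = t / math.sqrt(t * t + df)
    return d, r


# Made-up t-test result: t = 2.5 with df = 48.
d, r = d_and_r_from_t(2.5, 48)
print(f"d = {d:.3f}, r = {r:.3f}")
```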
 
For paired samples (e.g. pre-post), use the calculation called “Calculate d and r using means and standard deviations” and input the means and standard deviations of the two sets of scores (e.g. pre and post, or treatment and control).
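The means-and-standard-deviations route can be sketched as follows. This version divides the mean difference by the root mean square of the two standard deviations, which is one common choice of standardizer; other choices exist, so treat this as an assumption rather than the definitive formula. The function name and the numbers are hypothetical.

```python
import math


def cohens_d_from_means(mean_1: float, sd_1: float,
                        mean_2: float, sd_2: float) -> float:
    """Cohen's d as the mean difference divided by a pooled SD,
    where pooled SD = sqrt((sd_1^2 + sd_2^2) / 2)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("standard deviations must not both be zero")
    return (mean_1 - mean_2) / pooled_sd


# Made-up example: post-test mean 105 (SD 15), pre-test mean 100 (SD 15).
print(round(cohens_d_from_means(105, 15, 100, 15), 3))
```

Here the means differ by 5 points against a pooled SD of 15, so d is one third of a standard deviation.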
 
Compare the resulting Cohen’s d value with a table of heuristic benchmarks (source: “Statistical Power Analysis for the Behavioral Sciences”, Cohen 1988), in which d = 0.2 is conventionally labelled small, 0.5 medium and 0.8 large, and decide whether you have a large enough effect size to be of practical significance.
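A lookup against Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large) might look like the sketch below. The “negligible” label for values under 0.2 is my own addition, not Cohen’s, and the cut-offs are rules of thumb that should be weighed against what counts as meaningful in your field.

```python
def describe_cohens_d(d: float) -> str:
    """Label a Cohen's d value using the conventional 0.2 / 0.5 / 0.8 benchmarks.

    The sign of d only shows direction, so the absolute value is used.
    'negligible' (below 0.2) is an illustrative label, not one of Cohen's.
    """
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


for value in (0.1, 0.33, 0.72, -1.1):
    print(value, describe_cohens_d(value))
```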
 
Note that a logical follow-up would be a statistical power analysis, which tells us the probability of correctly rejecting the null hypothesis when it is false (i.e. the probability of avoiding a Type II error).
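To give a feel for what a power analysis involves, here is a rough sketch for a two-sided, two-group comparison with equal group sizes. It uses a normal approximation rather than the exact noncentral t distribution, so the numbers will be slightly off for small samples compared with dedicated power software. The function names and the default alpha of .05 and power of .80 are assumptions for the illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardized difference d
    with n_per_group cases in each of two independent groups."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_n_per_group(d: float, power: float = 0.80,
                       alpha: float = 0.05) -> int:
    """Approximate cases needed per group to reach the target power."""
    if d == 0:
        raise ValueError("d must be non-zero")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / abs(d)) ** 2)


print(f"power for d = 0.5, n = 64 per group: {approx_power(0.5, 64):.2f}")
print(f"n per group for d = 0.5 at 80% power: {approx_n_per_group(0.5)}")
```

The useful pattern to notice is that smaller effect sizes need much larger samples: halving d roughly quadruples the required n per group.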

 
