Which test: Compare MORE THAN TWO DEPENDENT groups (Paired, Matched, Same respondent groups)

July 20, 2022

When the research objective is to compare more than two dependent groups, which means they are paired or matched, i.e. the same respondents measured repeatedly (as in a pre-/post-test design), we have a choice among different statistical procedures, depending on the following variable characteristics: Number of variables: [Unless otherwise indicated] one dependent variable and one independent categorical variable (more than two levels or groups). Examples: Are the means / frequencies (on the dependent variable) of the same respondents over more than two different time periods significantly [READ MORE]
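
For illustration, here is a minimal sketch of one procedure from this family, assuming ordinal or non-normal scores so the non-parametric Friedman test applies; the data and the choice of test are illustrative, not taken from the post itself.

```python
# Minimal sketch: the same respondents measured at three time points.
# Assumes non-normal / ordinal data, hence the Friedman test; the
# scores below are hypothetical.
from scipy.stats import friedmanchisquare

t1 = [7, 5, 6, 8, 4, 6, 7]   # scores at time 1
t2 = [6, 5, 7, 7, 5, 6, 6]   # same respondents at time 2
t3 = [4, 3, 5, 6, 3, 5, 4]   # same respondents at time 3

stat, p = friedmanchisquare(t1, t2, t3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```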

Which test: Compare MORE THAN TWO INDEPENDENT groups (Unpaired, Unmatched, Different respondent groups)

June 12, 2022

When the research objective is to compare more than two independent groups, which means they are unpaired, unmatched, and thus different respondent groups, we have a choice among different statistical procedures, depending on the following variable characteristics: Number of variables: [Unless otherwise indicated] one dependent variable and one independent categorical variable (more than two levels). Examples: Are the means / frequencies of more than two independent groups of respondents significantly different? When the dependent variable is BINOMIAL / BINARY / [READ MORE]
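
As a minimal sketch of one branch of this decision, assuming an interval-scaled, roughly normal dependent variable (the post goes on to cover the binary case separately), a one-way ANOVA compares the group means; the data are hypothetical.

```python
# Minimal sketch: three independent groups, interval-scaled outcome.
# Assumes approximately normal data, so a one-way ANOVA applies.
from scipy.stats import f_oneway

group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 33, 29, 31]
group_c = [22, 20, 24, 23, 21]

f_stat, p = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```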

Analysis of Covariance (ANCOVA)

May 3, 2022

BRIEF DESCRIPTION: The Analysis of Covariance (ANCOVA) follows the same procedures as the ANOVA except for the addition of an exogenous variable (referred to as a covariate) as an independent variable. The ANCOVA procedure is quite straightforward: it uses regression to determine whether the covariate can predict the dependent variable, and then tests for differences (ANOVA) in the residuals among the groups. If a significant difference remains among the groups, it signifies a significant difference among the groups on the dependent variable after the effect of the [READ MORE]
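
A minimal sketch of the procedure the excerpt describes, with hypothetical column names (score, group, pretest): the covariate enters the model alongside the grouping factor, and the group effect is tested after the covariate has been partialled out.

```python
# Minimal ANCOVA sketch: group differences in "score" adjusted for
# the covariate "pretest". Data and variable names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "score":   [14, 16, 15, 20, 22, 21, 18, 17, 19],
    "group":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "pretest": [10, 12, 11, 13, 15, 14, 12, 11, 13],  # covariate
})

model = smf.ols("score ~ C(group) + pretest", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # group effect, pretest partialled out
```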

Practical significance and effect size measures

April 20, 2022

If statistical significance is found (e.g. p<.001), the next logical step should be to calculate the practical significance, i.e. the effect size (e.g. the standardised mean difference between two groups). Effect sizes are a group of statistics that measure the magnitude of differences, treatment effects, and the strength of associations. Unlike statistical significance tests, effect size indices are not inflated by large sample sizes. As effect size measures are standardised (units of measurement removed), they are easy to evaluate and easy to [READ MORE]
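
A minimal sketch of the measure named in the excerpt, the standardised mean difference between two groups (Cohen's d): the raw mean difference divided by the pooled standard deviation. The data are hypothetical.

```python
# Cohen's d: standardised mean difference between two groups.
import numpy as np

def cohens_d(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    # pooled standard deviation (ddof=1 for sample variances)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

print(f"d = {cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7]):.2f}")
```

Because d is unit-free, the same conventional benchmarks (e.g. 0.2 small, 0.5 medium, 0.8 large) can be applied regardless of the original measurement scale.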

Correlation and causation

February 19, 2022

“…the difference between correlation and causation does not matter if you have enough data” is complete nonsense, but I still hear this assertion, even when the phrase “While correlation doesn’t mean causation…” is tossed in.

The mix-up between correlation and causation came to my attention vividly many years ago in the context of what I would call political epidemiology. The distinction is also something statisticians learn in the classroom (or should), and it is relevant with data of all volumes, velocities and varieties.

Big Data actually can be [READ MORE]

Outlier cases – univariate outliers

February 14, 2022

Discussing the causes, impact, identification and remedial action of outliers is a lengthy subject. In this post I will keep it short by focussing on just a few ways to identify univariate outliers. Also refer to the post entitled: Outlier cases – bivariate and multivariate outliers. Be reminded that with bivariate and multivariate analysis, the focus should not be on univariate outliers; it is advisable to check them, but don’t take immediate remedial action. First and foremost, do the obvious by looking at a few visuals such as histograms, stem-and-leaf plots, [READ MORE]
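
Alongside the visual checks, a minimal sketch of two common numeric screens for univariate outliers: standardised (z) scores and the boxplot/IQR rule. The thresholds (|z| > 2.5, 1.5 × IQR) are common conventions rather than rules from the post, and the data are hypothetical.

```python
# Two quick univariate outlier screens on a hypothetical variable.
import numpy as np

x = np.array([12, 14, 13, 15, 14, 13, 41, 12, 15, 14], float)

# z-scores: |z| > 2.5 is a common flag for small samples
z = (x - x.mean()) / x.std(ddof=1)
print("z-score flags:", x[np.abs(z) > 2.5])

# boxplot/IQR rule: beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
print("IQR flags:", x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)])
```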

Data Assumption: Multicollinearity

January 13, 2022

BRIEF DESCRIPTION: Multicollinearity is a condition in which the independent variables are highly correlated (r=0.8 or greater) such that their effects on the outcome variable cannot be separated. In other words, one of the predictor variables can be nearly perfectly predicted by one of the other predictor variables. Singularity is when the independent variables are (almost) perfectly correlated (r=1), so any one of the independent variables could be regarded as a combination of one or more of the other independent variables. In practice, you should not [READ MORE]
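
A minimal sketch of how the condition can be screened in practice, using the correlation matrix and variance inflation factors (VIF); the VIF approach and the simulated predictors are illustrative additions, not from the post itself.

```python
# Screening predictors for multicollinearity: correlations and VIF.
# x2 is constructed to be nearly collinear with x1; data are simulated.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)
X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2, "x3": x3})

print(X[["x1", "x2", "x3"]].corr().round(2))      # look for r >= 0.8
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, "VIF =", round(variance_inflation_factor(X.values, i), 2))
```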

Two research fallacies

December 13, 2021

The Research Methods Knowledge Base website reminds us researchers (and readers of research findings) of the “Two Research Fallacies”. “A fallacy is an error in reasoning, usually based on mistaken assumptions.” The two most serious research fallacies discussed in this article are the “ecological fallacy” and the “exception fallacy”. “The ecological fallacy occurs when you make conclusions about individuals based only on analyses of group data.” For example, if the average income of a group of people is $60,000, we [READ MORE]

Significance Testing – Three Concerns

November 8, 2021

Some words of caution about significance testing by Kevin Gray: “I’ve long had three major concerns about significance testing. First, it assumes probability samples, which are rare in most fields. For example, even when probability sampling (e.g., RDD) is used in consumer surveys, because of (usually) low response rates, we don’t have true probability samples. Secondly, it assumes no measurement error. Measurement error can work in mysterious ways but generally weakens relationships between variables. Lastly, like automated modeling, it passes the buck to the machine and [READ MORE]

Building statistical models: Linear (OLS) regression

October 17, 2021

Every day, researchers around the world collect sample data to build statistical models that are as representative of the real world as possible, so these models can be used to predict changes and outcomes in the real world. Some models are very complex, while others are as basic as calculating a mean score by summing several observations and then dividing the total by the number of observations. This mean score is a hypothetical value, so it is just a simple model that describes the data. The extent to which a statistical model (e.g. the mean score) represents the randomly collected sample [READ MORE]
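
A minimal sketch of the excerpt's point that the mean is itself a model: fitting an intercept-only regression recovers exactly the mean score. The data are hypothetical.

```python
# The mean as the simplest statistical model: an intercept-only
# OLS regression estimates the same value as sum / n.
import numpy as np
import statsmodels.api as sm

y = np.array([4, 7, 5, 6, 8, 6], float)
print("mean:", y.sum() / len(y))                 # sum of observations / n

model = sm.OLS(y, np.ones_like(y)).fit()         # intercept-only model
print("intercept-only OLS estimate:", model.params[0])
```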

Which Test: Logistic Regression or Discriminant Function Analysis

September 17, 2021

Discriminant Function Analysis (DFA) and Logistic Regression (LR) are so similar, yet so different. Which one when, or either at any time? Let’s see…. DISCRIMINANT FUNCTION ANALYSIS (DFA): Used to model the value (exclusive group membership) of either a dichotomous or a nominal dependent variable (outcome) based on its relationship with one or more continuous-scaled independent variables (predictors). A predictive model consisting of one or more discriminant functions (based on the linear combinations of the predictor variables that provide the best discrimination between the [READ MORE]
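
A minimal sketch contrasting the two procedures on the same data, using scikit-learn's linear discriminant analysis and logistic regression; both model group membership from continuous predictors, but LDA carries the stronger multivariate-normality assumption. The simulated dataset is illustrative.

```python
# DFA (as linear discriminant analysis) vs logistic regression
# fitted to the same binary-outcome data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
lr = LogisticRegression().fit(X, y)

print("DFA/LDA accuracy:", lda.score(X, y))
print("Logistic regression accuracy:", lr.score(X, y))
```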

So many regression procedures. Confused?

August 11, 2021

Regression is the workhorse of research analytics. It has been around for a long time and it probably will be around for a long time to come. Whether we always realise it or not, most of our analytical tools are in some way or another based on the concepts of correlation and regression. Let us look at a few regression-based procedures in the researchers’ toolbox: 1. Simple and multiple linear regression: Applicable if both the single dependent variable (outcome or response variable) and one or more independent variables (predictors) are measured on an interval scale. If we [READ MORE]
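
A minimal sketch of procedure 1 in that list, multiple linear regression with an interval-scaled outcome and interval-scaled predictors; the column names and data are hypothetical.

```python
# Multiple linear regression: one interval-scaled outcome regressed
# on two interval-scaled predictors (hypothetical survey data).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "satisfaction": [6.1, 7.3, 5.8, 8.0, 6.9, 7.5, 5.2, 6.6],
    "quality":      [6.0, 7.1, 5.5, 8.2, 7.0, 7.4, 5.0, 6.3],
    "price":        [4.1, 3.5, 4.8, 2.9, 3.8, 3.2, 5.0, 4.2],
})

model = smf.ols("satisfaction ~ quality + price", data=df).fit()
print(model.params.round(3))   # intercept and slope estimates
```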