Key Statistical Techniques

Chi-square (χ²) Test of Independence

May 22, 2017

BRIEF DESCRIPTION Whereas the One-sample Chi-square (χ²) goodness-of-fit test compares the sample distribution (observed frequencies) of a single variable with a known, pre-defined distribution (expected frequencies), such as the population, normal, or Poisson distribution, to test the significance of any deviation, the Chi-square (χ²) Test of Independence compares two categorical variables in a cross-tabulation to determine group differences or degree of association (or non-association, i.e., independence). Chi-square (χ²) is a [READ MORE]
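
For a quick illustration of the idea, here is a minimal sketch in Python using scipy's chi2_contingency on a made-up 2×3 cross-tabulation (the counts and labels are hypothetical, not from the post):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: gender (rows) by preferred brand (columns)
observed = np.array([[30, 45, 25],
                     [35, 30, 35]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# A small p-value suggests the two variables are associated (i.e., not independent).
```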

Which Test: Factor Analysis (FA, EFA, PCA, CFA)

April 4, 2017

Confused about when to use FA, EFA, PCA, or CFA? All of them are interdependence methods, in which no single variable or group of variables is defined as independent or dependent. The statistical procedure analyses all variables in the data set simultaneously, so the goal of these interdependence procedures is to uncover structure by grouping variables (as in factor analysis) rather than respondents (as in cluster analysis) or objects (as in perceptual mapping). So interdependence methods do not attempt to predict one or more variables from others as [READ MORE]
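
As a rough sketch of the exploratory side only (CFA is typically fitted in dedicated SEM software such as R's lavaan), PCA and exploratory factor analysis can be run in Python with scikit-learn; the data here are simulated placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # placeholder: 200 respondents, 6 survey items

pca = PCA(n_components=2).fit(X)
print("PCA explained variance ratios:", pca.explained_variance_ratio_)

fa = FactorAnalysis(n_components=2).fit(X)
print("EFA loadings:\n", fa.components_)
```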

One-Sample Chi-square (χ²) goodness-of-fit test

January 9, 2016

BRIEF DESCRIPTION The Chi-square (χ²) goodness-of-fit test is a univariate measure for categorically scaled data, such as dichotomous, nominal, or ordinal data. It tests whether the variable’s observed frequencies differ significantly from a set of expected frequencies. For example, is our observed sample’s age distribution of 20%, 40%, 40% significantly different from what we expect (e.g. the population age distribution) of 30%, 30%, 40%? Chi-square (χ²) is a non-parametric procedure. SIMILAR STATISTICAL PROCEDURES: Binomial goodness-of-fit (for binary data) [READ MORE]
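
The excerpt's age-distribution example can be reproduced with scipy's chisquare, assuming a hypothetical sample of n = 100 so the percentages become counts:

```python
from scipy.stats import chisquare

observed = [20, 40, 40]  # observed age distribution (counts, n = 100)
expected = [30, 30, 40]  # expected (e.g. population) distribution

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 ≈ 6.67, p ≈ 0.036
```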

Which Test: Logistic Regression or Discriminant Function Analysis

October 8, 2015

Discriminant Function Analysis (DFA) and Logistic Regression (LR) are so similar, yet so different. Which one when, or either at any time? Let’s see…. DISCRIMINANT FUNCTION ANALYSIS (DFA): Used to model the value (exclusive group membership) of either a dichotomous or a nominal dependent variable (outcome) based on its relationship with one or more continuous scaled independent variables (predictors). A predictive model consisting of one or more discriminant functions (based on the linear combinations of the predictor variables that provide the best discrimination between the groups) [READ MORE]
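
To see the two side by side, here is a minimal sketch with scikit-learn on simulated data (predictors, outcome, and coefficients are all invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))  # three continuous predictors
y = (X @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=150) > 0).astype(int)

lr = LogisticRegression().fit(X, y)           # models P(group) via the logit link
lda = LinearDiscriminantAnalysis().fit(X, y)  # best linear separating function
print("LR accuracy: ", lr.score(X, y))
print("LDA accuracy:", lda.score(X, y))
```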

So many regression procedures. Confused?

September 11, 2015

Regression is the work-horse of research analytics. It has been around for a long time and will probably be around for a long time to come. Whether we always realise it or not, most of our analytical tools are in some way based on the concepts of correlation and regression. Let’s look at a few regression-based procedures in the researcher’s toolbox: 1. Simple and multiple linear regression: Applicable if both the single dependent variable (outcome or response variable) and the one or many independent variables (predictors) are measured on an interval scale. If we [READ MORE]
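
As a starting point, procedure 1 might look like this in Python with statsmodels (simulated interval-scaled data; the coefficients are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))  # two interval-scaled predictors
y = 1.5 + 2.0 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()  # multiple linear regression
print(model.summary())
```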

Analysis of Covariance (ANCOVA)

May 13, 2015

BRIEF DESCRIPTION The Analysis of Covariance (ANCOVA) follows the same procedure as the ANOVA, except for the addition of an exogenous variable (referred to as a covariate) as an independent variable. The ANCOVA procedure is quite straightforward: it uses regression to determine whether the covariate can predict the dependent variable and then tests for differences (ANOVA) in the residuals among the groups. If a significant difference among the groups remains, it signifies a significant difference on the dependent variable after the effect of the [READ MORE]
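
In practice the two steps are usually fitted as one model; a minimal sketch with statsmodels' formula interface, on invented data with one covariate and a three-level grouping factor:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 40),
                   "covariate": rng.normal(size=120)})
df["outcome"] = (1.2 * df["covariate"]
                 + df["group"].map({"A": 0.0, "B": 0.8, "C": 1.5})
                 + rng.normal(size=120))

# ANCOVA: group differences on the outcome, adjusting for the covariate
model = smf.ols("outcome ~ covariate + C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```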

One-Sample Kolmogorov-Smirnov goodness-of-fit test

February 8, 2015

BRIEF DESCRIPTION The Kolmogorov-Smirnov (K-S) test is a goodness-of-fit measure for continuous scaled data. It tests whether the observations could reasonably have come from a specified distribution, such as the normal distribution (or the Poisson, uniform, or exponential distribution, etc.), so it is most frequently used to test the assumption of univariate normality. Its categorical-data counterpart is the Chi-Square (χ²) goodness-of-fit test. The K-S test is a non-parametric procedure. SIMILAR STATISTICAL PROCEDURES: Adjusted Kolmogorov-Smirnov Lilliefors test (null [READ MORE]
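
A minimal normality check with scipy on a simulated sample (note that when the mean and SD are estimated from the data, the Lilliefors variant mentioned above, available in statsmodels, is the more appropriate choice):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)
x = rng.normal(loc=0, scale=1, size=200)  # hypothetical continuous sample

# One-sample K-S test against the standard normal distribution
stat, p = kstest(x, "norm")
print(f"D = {stat:.3f}, p = {p:.3f}")  # large p: no evidence against normality
```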

Which Test: Chi-Square, Logistic Regression, or Log-linear analysis

November 19, 2013

In a previous post I discussed the differences between logistic regression and discriminant function analysis, but what about log-linear analysis? Which, and when, to choose among chi-square, logistic regression, and log-linear analysis? Let’s briefly review each of these statistical procedures: The chi-square test (χ²) is a descriptive statistic, just as correlation is descriptive of the association between two variables. Chi-square is not a modeling technique, so in the absence of a dependent (outcome) variable there is no prediction of either a value (such as in ordinary [READ MORE]
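
Since log-linear analysis is the least familiar of the three, here is one common way to fit one, as a Poisson GLM on cell counts in statsmodels (the 2×2 table below is invented):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical 2x2 contingency table, flattened to one row per cell
df = pd.DataFrame({"gender": ["male", "male", "female", "female"],
                   "bought": ["yes", "no", "yes", "no"],
                   "count":  [30, 20, 45, 25]})

# Log-linear model: cell counts modeled by the factors and their interaction
model = smf.glm("count ~ gender * bought", data=df,
                family=sm.families.Poisson()).fit()
print(model.summary())
```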

Why ANOVA and not multiple t-tests? Why MANOVA and not multiple ANOVAs, etc.

September 9, 2013

ANOVA reigns over the t-test and MANOVA reigns over the ANOVA. Why? If we want to compare several predictors against a single outcome variable, we can either run a series of t-tests or a single factorial ANOVA. Not only is a factorial ANOVA less work, but conducting a separate t-test for each predictor results in a higher probability of making a Type I error. In fact, every single t-test carries a chance of a Type I error, and conducting several t-tests compounds this probability. In contrast, a single factorial ANOVA controls this error so that the probability [READ MORE]
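
The compounding is easy to quantify: with k independent tests each at α = 0.05, the familywise error rate is 1 − (1 − α)^k. A quick check in Python:

```python
alpha = 0.05
for k in (1, 3, 5, 10):
    familywise = 1 - (1 - alpha) ** k  # P(at least one Type I error in k tests)
    print(f"{k:2d} tests -> {familywise:.3f}")
# 1 test -> 0.050, 3 -> 0.143, 5 -> 0.226, 10 -> 0.401
```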

When the regression work-horse looks sick

June 19, 2013

Regression, in particular simple bivariate and multiple regression (and, to a much lesser extent, multivariate regression, which is a “multivariate general linear model” procedure), is the work-horse of many researchers. For some, it is a horse exploited to the bone when other statistical (or even non-statistical) procedures would have done a better job! Many statistical procedures are also based on linear regression models, often without our realising it; the ANOVA, for example, can be expressed as a simple regression model. At the core of many statistical analytics is [READ MORE]
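
The ANOVA-as-regression point is easy to verify numerically; a sketch on simulated data, comparing scipy's one-way ANOVA with an OLS fit on dummy-coded groups:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import f_oneway

rng = np.random.default_rng(5)
df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 30),
                   "y": np.repeat([0.0, 0.5, 1.0], 30) + rng.normal(size=90)})

F, p = f_oneway(*(df.loc[df.group == g, "y"] for g in ("A", "B", "C")))
reg = smf.ols("y ~ C(group)", data=df).fit()  # same test via dummy-coded regression

print(f"ANOVA:      F = {F:.3f}, p = {p:.4f}")
print(f"Regression: F = {reg.fvalue:.3f}, p = {reg.f_pvalue:.4f}")  # identical
```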

One-way (Independent) ANOVA

July 13, 2012

BRIEF DESCRIPTION The One-way ANOVA is an extension of the Two-independent sample t-test, as it compares the observed means on the dependent variable among more than two groups as defined by the independent variable. For example, is the mean customer satisfaction score (the dependent variable) significantly different among three customer groups: adult men, adult women, and children (the independent variable)? In addition to expressing group differences on the dependent variable, we can also express the findings in terms of relationship or association, e.g. “Age [READ MORE]
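
The customer-satisfaction example might be run like this with scipy (the scores below are simulated stand-ins):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(6)
# Hypothetical satisfaction scores for the three groups in the example
men, women, children = (rng.normal(loc=m, scale=1.0, size=50)
                        for m in (7.0, 7.4, 6.5))

F, p = f_oneway(men, women, children)
print(f"F = {F:.2f}, p = {p:.4f}")
```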

Two-independent sample t-test

July 11, 2012

BRIEF DESCRIPTION The Two-independent sample t-test is for continuous scaled data; it compares the observed mean on the dependent variable between two groups as defined by the independent variable. For example, is the mean customer satisfaction score (the dependent variable) significantly different between men and women (the independent variable)? The t-test is a parametric procedure. SIMILAR STATISTICAL PROCEDURES Non-parametric counterparts of the Two-independent sample t-test include the (Wilcoxon) Mann-Whitney U-test, the Wald-Wolfowitz Runs [READ MORE]
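
A minimal version of the example with scipy, including the Mann-Whitney U counterpart named above (the scores are simulated):

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(7)
men = rng.normal(loc=7.0, scale=1.2, size=60)    # hypothetical satisfaction scores
women = rng.normal(loc=7.5, scale=1.2, size=60)

t, p = ttest_ind(men, women)       # parametric two-independent sample t-test
print(f"t = {t:.2f}, p = {p:.4f}")

u, p_u = mannwhitneyu(men, women)  # non-parametric counterpart
print(f"U = {u:.1f}, p = {p_u:.4f}")
```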