BRIEF DESCRIPTION: Normality is one of the most basic data assumptions, and much has been written about univariate, bivariate and multivariate normality. An excellent reference is Tom Burdenski (2000), “Evaluating Univariate, Bivariate, and Multivariate Normality Using Graphical and Statistical Procedures”. A few noteworthy comments about normality: 1. Normality can have different meanings in different contexts, e.g. normality of the sampling distribution versus normality of the model error distribution (as in regression and GLM). Be very careful to establish which type of normality is applicable. 2. By definition, a [READ MORE]

BRIEF DESCRIPTION: The Kolmogorov-Smirnov (K-S) test is a goodness-of-fit measure for continuous-scale data. It tests whether the observations could reasonably have come from a specified distribution, such as the normal distribution (or the Poisson, uniform, or exponential distribution, etc.), and it is most frequently used to test the assumption of univariate normality. The categorical data counterpart is the Chi-Square (χ²) goodness-of-fit test. The K-S test is a non-parametric procedure. SIMILAR STATISTICAL PROCEDURES: Adjusted Kolmogorov-Smirnov Lilliefors test (null [READ MORE]
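
As a rough illustration of the K-S idea, the sketch below measures the largest gap between the empirical distribution of a sample and a fully specified reference curve, using only Python's standard library. The sample values, the function names `ks_statistic` and `ks_pvalue_asymptotic`, and the choice of a standard normal reference are assumptions made for this example. The p-value uses the large-sample Kolmogorov approximation, so it is only indicative for small samples, and it does not apply when the mean and standard deviation are estimated from the same data (that is the situation the Lilliefors adjustment addresses).

```python
import math
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Approximate two-sided p-value from the limiting Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The alternating series converges poorly here; the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical standardised scores, for illustration only.
scores = [-1.2, -0.8, -0.3, 0.0, 0.1, 0.4, 0.6, 0.9, 1.3, 2.1]
d_stat = ks_statistic(scores, NormalDist(mu=0.0, sigma=1.0).cdf)
p_approx = ks_pvalue_asymptotic(d_stat, len(scores))
print(f"D = {d_stat:.3f}, approximate p = {p_approx:.3f}")
```

A small D (and a large p-value) means the sample gives no strong evidence against the reference distribution; it does not prove that the data are normal.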

When the research objective is to compare a single group distribution to a hypothetical / known distribution (goodness-of-fit tests), we have a choice among different statistical procedures, depending on the following variable characteristics. Number of variables: one dependent variable. Examples: Does our sample data distribution fit the binomial / normal / Poisson curve? Is our interval-measured sample distribution significantly different from a normal distribution (goodness-of-fit for normality)? Are the 10%/20%/20%/30%/20% age proportions in our sample significantly [READ MORE]
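
For categorical data such as the age-proportion example, one common choice is the Chi-Square goodness-of-fit test. The sketch below is a minimal standard-library version, assuming hypothetical observed counts for 200 respondents and treating 10%/20%/20%/30%/20% as the reference proportions. The function names are illustrative, and the p-value helper relies on a closed form that only holds when the degrees of freedom are even (here, 5 categories give 4 degrees of freedom).

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


# Hypothetical counts per age band (assumed data, not from any real study).
observed_counts = [18, 45, 38, 55, 44]
reference_props = [0.10, 0.20, 0.20, 0.30, 0.20]

chi2_stat, chi2_df = chi_square_gof(observed_counts, reference_props)
chi2_p = chi2_sf_even_df(chi2_stat, chi2_df)
print(f"chi-square = {chi2_stat:.3f}, df = {chi2_df}, p = {chi2_p:.3f}")
```

With these made-up counts the p-value is well above 0.05, so the sample proportions would not be judged significantly different from the reference proportions.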

BRIEF DESCRIPTION: The One-Sample t-test is for continuous-scale data and compares an observed sample mean with a predetermined value. For example, is our customer satisfaction sample mean significantly different from a pre-set figure such as an industry benchmark or an action standard? It also helps us to answer a question such as “Are we 95% confident that the mean score is between 7.5 and 8.5?”. The t-test is a parametric procedure. SIMILAR STATISTICAL PROCEDURES: One-sample z-test. Non-parametric counterparts of the one-sample t-test include the Wilcoxon [READ MORE]
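
The arithmetic behind the one-sample t-test is short enough to sketch with the standard library. The satisfaction scores and the benchmark of 7.5 below are hypothetical, and the function name `one_sample_t` is illustrative. Because the standard library has no t distribution, the two-sided 5% critical value for 9 degrees of freedom (2.262) is hard-coded from a t-table; a different sample size would need a different value.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, benchmark):
    """Return the t statistic, degrees of freedom and standard error."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # sample SD uses the n - 1 divisor
    t = (mean(sample) - benchmark) / se
    return t, n - 1, se


# Hypothetical satisfaction scores from 10 respondents.
scores = [8.0, 7.0, 9.0, 8.0, 8.0, 7.0, 9.0, 8.0, 8.0, 8.0]
benchmark = 7.5      # assumed industry benchmark
T_CRIT_DF9 = 2.262   # two-sided 5% critical value for df = 9 (from a t-table)

t_stat, df, se = one_sample_t(scores, benchmark)
ci_low = mean(scores) - T_CRIT_DF9 * se
ci_high = mean(scores) + T_CRIT_DF9 * se
print(f"t = {t_stat:.3f} on {df} df; 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Here |t| exceeds 2.262, so the sample mean would be judged significantly different from the benchmark at the 5% level, and the 95% confidence interval lies just inside 7.5 to 8.5.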

The Research Methods Knowledge Base website reminds us researchers (and readers of research findings) of the “Two Research Fallacies”. “A fallacy is an error in reasoning, usually based on mistaken assumptions”. The two most serious research fallacies discussed in this article are the “ecological fallacy” and the “exception fallacy”. “The ecological fallacy occurs when you make conclusions about individuals based only on analyses of group data”. For example, if the average income of a group of people is $60,000, we [READ MORE]

Discussing the causes, impact, identification and remedial action of outliers is a lengthy subject. I will keep it short by focussing, in this post, only on a few ways to identify univariate outliers. Also refer to the post entitled: Outlier cases – bivariate and multivariate outliers. Be reminded that with bivariate and multivariate analysis the focus should not be on univariate outliers; it is advisable to check them, but don’t take immediate remedial action. First and foremost, do the obvious by looking at a few visuals such as histograms, stem-and-leaf plots, [READ MORE]
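
A stem-and-leaf plot is easy to produce even without a statistics package. The sketch below is a minimal text version, assuming non-negative whole-number data with the last digit used as the leaf. The function name `stem_and_leaf` and the values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build text lines of a stem-and-leaf plot (tens as stem, units as leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical observations; 98 sits far away from the rest.
data = [12, 15, 21, 23, 24, 27, 31, 33, 35, 98]
plot_lines = stem_and_leaf(data)
print("\n".join(plot_lines))
```

The run of empty stems between 3 and 9 makes the isolated value stand out, which is the kind of visual cue that prompts a closer look at a case.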

BRIEF DESCRIPTION: The Two-independent sample t-test is for continuous-scale data and compares the observed mean on the dependent variable between two groups as defined by the independent variable. For example, is the mean customer satisfaction score (the dependent variable) significantly different between men and women (the independent variable)? The t-test is a parametric procedure. SIMILAR STATISTICAL PROCEDURES: Non-parametric counterparts of the Two-independent sample t-test include the (Wilcoxon) Mann-Whitney U-test, Wald-Wolfowitz Runs [READ MORE]
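
A minimal sketch of the two-group comparison follows, using the classic pooled-variance form of the t statistic, which assumes the two groups have similar variances. The satisfaction scores for men and women are hypothetical and the function name `pooled_t` is illustrative. The critical value for 8 degrees of freedom (2.306, two-sided 5%) is taken from a t-table because the standard library does not provide the t distribution.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical satisfaction scores.
men = [7.0, 8.0, 6.0, 9.0, 7.0]
women = [8.0, 9.0, 9.0, 10.0, 8.0]
T_CRIT_DF8 = 2.306  # two-sided 5% critical value for df = 8 (from a t-table)

t_stat, df = pooled_t(men, women)
significant = abs(t_stat) > T_CRIT_DF8
print(f"t = {t_stat:.3f} on {df} df; significant at 5%: {significant}")
```

With these made-up scores the difference in means falls just short of significance at the 5% level, which is not unusual with only five respondents per group.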

Every day, researchers around the world collect sample data to build statistical models that are as representative of the real world as possible, so that these models can be used to predict changes and outcomes in the real world. Some models are very complex, while others are as basic as calculating a mean score by summing several observations and then dividing the total by the number of observations. This mean score is a hypothetical value, so it is just a simple model to describe the data. The extent to which a statistical model (e.g. the mean score) represents the randomly collected [READ MORE]
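
To make the "mean as a model" idea concrete, the sketch below computes a mean exactly as described (sum, then divide) and then looks at how far each observation falls from it. Summarising those deviations as a sum of squared errors, a variance and a standard deviation is one common way to express how well the mean represents the data; it is offered here as an assumption about where the argument leads, and the observations are hypothetical.

```python
import math

# Hypothetical observations.
observations = [1.0, 2.0, 3.0, 3.0, 4.0]

# The model: sum the observations and divide by how many there are.
n = len(observations)
mean_score = sum(observations) / n

# How well does the model fit? Look at each observation's deviation from it.
deviations = [x - mean_score for x in observations]
sum_squared_errors = sum(d ** 2 for d in deviations)
sample_variance = sum_squared_errors / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean_score:.2f}, SSE = {sum_squared_errors:.2f}, "
      f"variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
```

Note that the mean of 2.6 is not a value anyone actually recorded, which is why it is fair to call it a hypothetical value.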

I was asked to review a report in which the regression analysis showed several independent variables to have no significant relationship with the dependent variable. While I have no access to the raw data, to me it was obvious that there must be at least one significant interaction effect among the independent variables, and hence I decided to start off 2014 by writing about interaction effects in regression! This can be a very long discussion, but in line with my approach here at IntroSpective Mode, we keep things brief and concise, and leave it up to the reader to go elsewhere [READ MORE]

We’re all very familiar with the “Likert-scale”, but do we know that a true Likert-scale consists not of a single item but of several items which, under the right conditions (i.e. after an assessment of their reliability, e.g. intercorrelations between all pairs of items, and validity, e.g. convergent, discriminant, construct etc.), can be summed into a single score? The Likert-scale is a unidimensional scaling method (so it measures a one-dimensional construct), is bipolar, and in its purest form consists of only 5 scale points, though often we refer to a [READ MORE]

Many of the statistical procedures used by marketing researchers are based on “general linear models” (GLM). These can be categorised into univariate, multivariate, and repeated measures models. The underlying statistical formula is Y = Xb + e, where Y is generally referred to as the “dependent variable”, X as the “independent variable”, b as the “parameters” to be estimated, and e as the “error” or noise which is present in all models (also generally referred to as the statistical error, error terms, or residuals). Note that both [READ MORE]
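
The simplest instance of Y = Xb + e has one independent variable plus an intercept. The sketch below estimates b by ordinary least squares using the textbook closed-form expressions and then recovers e as the residuals. The data and the function name `fit_simple_ols` are hypothetical, and a real GLM analysis would normally use a statistics package and matrix algebra instead.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: X is the independent variable, Y the dependent variable.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x_values, y_values)

# e = Y - Xb: the part of Y that the model leaves unexplained.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x_values, y_values)]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

When an intercept is included, the least-squares residuals always sum to zero (up to rounding), which is a quick sanity check on the fit.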

In brand image studies, like most research, it’s GIGO (Garbage In – Garbage Out). For example, very general adjectives such as Cheerful, Fun, and Unique will seldom differentiate brands meaningfully. Instead, the attributes should be relevant to consumers, specific to the category and reflect the actual positionings of the brands and, in most cases, include functional and other objective characteristics. How the image data are collected is also important. Pick-any association matrices are usually the least differentiating. Lastly, how the data are analyzed is also important. [READ MORE]