In brand image studies, like most research, it’s GIGO (Garbage In – Garbage Out). For example, very general adjectives such as Cheerful, Fun, and Unique will seldom differentiate brands meaningfully. Instead, the attributes should be relevant to consumers, specific to the category, reflect the actual positionings of the brands and, in most cases, include functional and other objective characteristics. How the image data are collected is also important: pick-any association matrices are usually the least differentiating. Lastly, how the data are analyzed matters too. [READ MORE]

“…the difference between correlation and causation does not matter if you have enough data” is complete nonsense, but I still hear this assertion, even when the phrase “While correlation doesn’t mean causation…” is tossed in. The mix-up between correlation and causation came to my attention vividly many years ago in the context of what I would call political epidemiology. The distinction is also something statisticians learn in the classroom (or should), and it is relevant to data of all volumes, velocities and varieties. Big Data actually can be Big [READ MORE]

Here is a great article by my friend Andy Field about sphericity. If you are looking for a great intro to SPSS, check out this book by Andy. When it came out in 2013 I worked through it, front to back, and I learned a lot; it also refreshed my memory of many things I had forgotten. He writes in an easy-to-understand way! [READ MORE]

Very brief description: The assumption of sphericity refers to the equality of the variances of the differences between treatment levels. In Repeated Measures ANOVA it is a measure of the homogeneity of the variances of the differences between levels, so it is quite similar to homogeneity of variance between groups in univariate ANOVA. It is denoted by ε and sometimes referred to as “circularity”. Who cares: Sphericity applies to repeated measures ANOVA and MANOVA. While technically not an assumption of Factor Analysis, “Bartlett’s test of [READ MORE]
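To make the idea concrete, here is a minimal sketch in Python (with entirely made-up repeated-measures data) that computes the variances of the differences between each pair of treatment levels; sphericity holds, roughly, when these variances are similar:

```python
import numpy as np
from itertools import combinations

# Hypothetical data: 5 subjects measured under 3 treatment levels
data = np.array([
    [30, 27, 20],
    [35, 30, 28],
    [25, 30, 20],
    [40, 27, 29],
    [27, 29, 28],
], dtype=float)

# Sphericity concerns the variances of the differences between
# every pair of treatment levels: they should be (roughly) equal.
for i, j in combinations(range(data.shape[1]), 2):
    diff = data[:, i] - data[:, j]
    print(f"Var(level {i + 1} - level {j + 1}) = {diff.var(ddof=1):.2f}")
```

In practice you would rely on Mauchly’s test (reported automatically by SPSS) rather than eyeballing these variances, but the quantity being tested is exactly the one computed above.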

Very brief description: Linearity means that the mean values of the outcome variable (dependent variable) for each increment of the predictors (independent variables) lie along a straight line (so we are modeling a straight-line relationship). Who cares: The assumption of linearity is required by all multivariate techniques based on correlational measures of association, e.g. Regression, Logistic Regression, Factor Analysis, Structural Equation Modeling, Discriminant Analysis, General Linear Models, etc. Why it is important: Most, if not all, of the tests of association / relationships that we [READ MORE]
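A quick way to probe linearity is to fit a straight line and check the residuals for curvature. The sketch below (simulated data, all numbers hypothetical) uses NumPy’s `polyfit`; with truly linear data the residuals show no systematic trend:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)  # simulated linear data

# Least-squares straight line: y ~ b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# After removing the linear trend, correlate the residuals with x^2:
# a large value would signal curvature, i.e. a violation of linearity.
curvature = np.corrcoef(residuals, x ** 2)[0, 1]
print(f"slope={b1:.2f}, intercept={b0:.2f}, curvature check={curvature:.3f}")
```

The usual visual check is a residuals-versus-fitted plot; the correlation with `x**2` here is just a crude numerical stand-in for that plot.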

With so many statistical procedures available, how do we decide which tests best address our research objectives? (Several posts deal with this topic.) First and foremost, the decision as to which statistical procedures to apply should be made BEFORE the design of the data collection instrument (e.g. the questionnaire), and not AFTER the data have been collected. Plan ahead so that your analyses are entirely focused on addressing your research objectives and NOT on accommodating your data. Too many researchers remain guilty of waiting to see the data so they can decide what to [READ MORE]

Some words of caution about significance testing by Kevin Gray: “I’ve long had three major concerns about significance testing. First, it assumes probability samples, which are rare in most fields. For example, even when probability sampling (e.g., RDD) is used in consumer surveys, because of (usually) low response rates, we don’t have true probability samples. Secondly, it assumes no measurement error. Measurement error can work in mysterious ways but generally weakens relationships between variables. Lastly, like automated modeling, it passes the buck to the machine and [READ MORE]

BRIEF DESCRIPTION: The One-sample Chi-square (χ²) goodness-of-fit test compares the sample distribution (observed frequencies) of a single variable with a known, pre-defined distribution (expected frequencies), such as the population distribution, normal distribution, or Poisson distribution, to test for the significance of deviation. The Chi-square (χ²) Test of Independence, in contrast, compares two categorical variables in a cross-tabulation to determine group differences or degree of association (or non-association, i.e. independence). Chi-square (χ²) is a [READ MORE]
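Both tests are available in SciPy; the sketch below uses made-up frequency counts to illustrate each in turn:

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# One-sample goodness-of-fit: observed counts vs a pre-defined
# expected distribution (here, a hypothetical uniform expectation)
observed = np.array([18, 22, 20, 40])
expected = np.array([25, 25, 25, 25])
stat, p = chisquare(observed, f_exp=expected)
print(f"goodness-of-fit: chi2={stat:.2f}, p={p:.4f}")

# Test of independence: two categorical variables cross-tabulated
# (made-up 2x2 table, e.g. segment x brand preference)
crosstab = np.array([[30, 10],
                     [20, 40]])
stat2, p2, dof, exp_freq = chi2_contingency(crosstab)
print(f"independence: chi2={stat2:.2f}, df={dof}, p={p2:.4f}")
```

Note that `chi2_contingency` applies Yates’ continuity correction by default for 2×2 tables, so its χ² will be slightly smaller than the uncorrected hand calculation.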

Confused about when to use FA, EFA, PCA, or CFA? Well, all of them are interdependence methods, in which no single variable or group of variables is defined as independent or dependent. The statistical procedure analyzes all variables in the data set simultaneously, so the goal of these interdependence procedures is to uncover structure by grouping variables (as in factor analysis) rather than respondents (as typically in cluster analysis) or objects (as typically in perceptual mapping). So interdependence methods do not attempt to predict one or more variables from others, as [READ MORE]
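To illustrate the “all variables at once, no DV/IV split” idea, this sketch simulates six variables driven by two latent factors and extracts principal components from their correlation matrix (the data are entirely simulated; a real analysis would use FA or PCA routines from a stats package):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
# Two latent "factors": variables 1-3 load on f1, variables 4-6 on f2
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack([f1 + rng.normal(0, 0.3, n) for _ in range(3)] +
                    [f2 + rng.normal(0, 0.3, n) for _ in range(3)])

# Interdependence: all six variables enter symmetrically via the
# correlation matrix; none is designated as the outcome.
R = np.corrcoef(X, rowvar=False)          # 6x6 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
print("eigenvalues:", np.round(eigvals, 2))  # two large ones -> 2 components
```

The two eigenvalues greater than 1 recover the two latent factors, which is the usual Kaiser criterion for deciding how many components to retain.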

You want to measure the performance of the same individuals over a period of time (repeated observations) on an interval-scale dependent variable, but which procedure should you use? We are looking for an equivalent of the paired-samples t-test that allows for two or more levels of the categorical variable, i.e. pre, during, post. The Repeated Measures ANOVA [SPSS: ANALYZE / GENERAL LINEAR MODEL / REPEATED MEASURES] is simpler to use, but sadly it’s often not as accurate and flexible as Linear Mixed Models (SPSS: ANALYZE / MIXED MODELS / LINEAR). Reminder that the Linear [READ MORE]
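For readers outside SPSS, the one-way repeated measures ANOVA can also be computed by hand; the sketch below uses hypothetical scores for six subjects at three time points (pre, during, post), partitioning the subject variance out of the error term, which is exactly what makes it the multi-level analogue of the paired t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical scores: 6 subjects x 3 time points (pre, during, post)
scores = np.array([
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
], dtype=float)
n, k = scores.shape

grand = scores.mean()
ss_cond = n * ((scores.mean(axis=0) - grand) ** 2).sum()  # between conditions
ss_subj = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between subjects
ss_total = ((scores - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj                   # residual

df_cond, df_error = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
p = stats.f.sf(F, df_cond, df_error)
print(f"F({df_cond}, {df_error}) = {F:.2f}, p = {p:.4f}")
```

This is the uncorrected F test; if sphericity is violated, the degrees of freedom should be adjusted (e.g. Greenhouse–Geisser), which is one reason mixed models are often the more flexible choice.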

Interesting article by Kevin Gray at Cannon Gray (http://cannongray.com). “Model” means different things to different people and different things at different times. As I briefly explain in A Model’s Many Faces, I often find it helpful to classify models as conceptual, operational or statistical. In this post we’ll have a closer look at the last of these, statistical models. First, it’s critical to understand that statistical models are simplified representations of reality and, to paraphrase the famous words of statistician George Box, they’re all wrong but some of them [READ MORE]

If we have a sample of data drawn randomly from a normally distributed population, we can assume that our sample also follows a normal distribution (and, by the central limit theorem, the sampling distribution of the mean is approximately normal once the sample size exceeds about 30, even when the population is not normal). If we have a mean of zero and a standard deviation (SD) of 1, then we can calculate the probability of obtaining a particular score from the standard normal distribution. To centre our data around a mean of zero, we subtract the overall mean from each individual score, then divide by the standard deviation. This is the process of standardisation of raw data into z-scores. This [READ MORE]
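The standardisation step can be sketched in a couple of lines of Python (with made-up raw scores); note that the resulting z-scores always have mean 0 and SD 1:

```python
import numpy as np

# Hypothetical raw scores
scores = np.array([12.0, 15.0, 9.0, 18.0, 11.0, 13.0, 16.0, 10.0])

# Standardise: subtract the mean from each score, divide by the SD
z = (scores - scores.mean()) / scores.std(ddof=0)

print("z-scores:", np.round(z, 2))
print("mean ~", round(z.mean(), 10), " SD ~", round(z.std(ddof=0), 10))
```

With z-scores in hand, tail probabilities for any particular score can then be read off the standard normal distribution (e.g. via `scipy.stats.norm.sf`).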