Many of the statistical procedures used by marketing researchers are based on “general linear models” (GLM). These can be categorised into univariate, multivariate, and repeated measures models.
The underlying statistical formula is Y = Xb + e, where Y is generally referred to as the “dependent variable”, X as the “independent variable”, b is the set of “parameters” to be estimated, and e is the “error” or noise present in all models (also referred to as the statistical error, error term, or residuals). Note that both the left side and the right side of the equation can contain more than one variable.
As an example, multiple linear regression (a generalization of simple linear regression) can be expressed by the following equation, which includes multiple predictors, each with its own parameter (regression coefficient):

Y = b0 + b1X1 + b2X2 + … + bkXk + e
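The multiple regression equation above can be illustrated with a small simulation. This is a minimal sketch using made-up data and coefficients: Y is generated as Xb + e with known parameters, and ordinary least squares is then used to recover estimates of b.

```python
import numpy as np

# Hypothetical data: 6 observations, 2 predictors.
# The design matrix X gets a leading column of ones for the intercept b0.
X = np.array([
    [1.0,  2.0, 1.0],
    [1.0,  3.0, 2.0],
    [1.0,  5.0, 2.5],
    [1.0,  7.0, 3.0],
    [1.0,  9.0, 4.5],
    [1.0, 11.0, 5.0],
])

# Generate Y = Xb + e from known (assumed) coefficients plus random noise.
true_b = np.array([1.0, 2.0, -0.5])
rng = np.random.default_rng(0)
e = rng.normal(scale=0.1, size=6)
Y = X @ true_b + e

# Ordinary least squares estimate of the parameters b.
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The residuals are the estimated error term e.
residuals = Y - X @ b_hat
print(b_hat.round(2))
```

With little noise, the estimated coefficients land close to the values used to generate the data, which is the sense in which the model "recovers" the parameters b.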
What is an independent variable (IV)? It is the variable that you think has an effect on the variable of real interest (the DV). You select different independent variables to see their effect, possibly manipulate and control them, and evaluate their effect on the DV. You also compare scores on the DV at different levels of the IV; if there are no differences across the levels, the IV and DV are unrelated.
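Comparing DV scores across IV levels can be sketched as follows. The scenario is hypothetical: the IV is ad exposure (two levels), the DV is a purchase-intention score, and the data are simulated with a built-in difference between the levels.

```python
import numpy as np

# Hypothetical example: IV = ad exposure (level 0 = not exposed,
# level 1 = exposed), DV = purchase intention score.
rng = np.random.default_rng(1)
dv_level0 = rng.normal(loc=5.0, scale=1.0, size=50)  # not exposed
dv_level1 = rng.normal(loc=6.0, scale=1.0, size=50)  # exposed

# If the mean DV were similar at every IV level, the IV and DV
# would be unrelated in this sample; here we built in a difference.
diff = dv_level1.mean() - dv_level0.mean()
print(f"Mean DV difference across IV levels: {diff:.2f}")
```

In practice a formal test (e.g. a t-test for two levels, ANOVA for more) would be used to judge whether such a difference is larger than chance would produce.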
However, both the DV and the IV can go by different (and often very confusing) names. Let's take a look at some of them:
- Dependent (Y) and Independent (X) variables are the terms most commonly used across statistical procedures, though strictly speaking the pairing wrongly implies causality, i.e. that X affects (or causes) the outcome of Y. Remember that causality cannot be established by statistical association alone: it is most credibly established through careful experimental design, while structural equation modelling can support causal claims only under strong assumptions.
- “Outcome” / “Response” (Y) and “Predictor” (X) variables are most common in analyses based on linear models, such as regression, and rightly imply not causality but association, relationships, and prediction.
- “Regressand” (Y) and “Regressor” (X) variables are typically used in mathematical regression models.
- “Measured” (Y) and “Controlled” (X) variables are related to experimentation.
- “Observed” / “Responding” (Y) and “Manipulated” (X) variables are more commonly used in experimentation.
- “Explained” (Y) and “Explanatory” (X) variables are often used in regression models.
- “Output” (Y) and “Input” (X) variables are less commonly used in experimentation.
- “Experimental” (Y) and “Exposure” (X) variables are more commonly used in experimentation.
“Grouping” variables (or factors / factor variables) are categorical independent variables, commonly referred to in t-tests and in the ANOVA family of tests. When categorical independent variables are used in regression, they are commonly referred to as “dummies”, “categorical predictors”, or “indicator variables”. Note that in general linear mixed models these are referred to as “fixed factors” and “random factors”. “Fixed” refers to categorical independent variables where all levels (groups) of the variable are included, such as gender (M/F) or direction (north, east, south, west), while “random” refers to factors where only a sample of the levels (groups) can be included, such as the key cities in England, or only a few selected customer groups.
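Dummy (indicator) coding of a fixed factor can be sketched as follows. The factor and its levels here are made up for illustration: a “region” variable with all four of its levels present is turned into 0/1 indicator columns for use in a regression design matrix.

```python
import numpy as np

# Hypothetical fixed factor: region, with all four levels represented.
regions = np.array(["north", "east", "south", "west", "north", "east"])
levels = ["north", "east", "south", "west"]

# One 0/1 indicator column per level, dropping the first level ("north")
# as the reference category to avoid perfect collinearity with the
# intercept column.
dummies = np.column_stack(
    [(regions == lvl).astype(float) for lvl in levels[1:]]
)
intercept = np.ones((len(regions), 1))

# Design matrix: intercept + indicator columns for east, south, west.
X = np.hstack([intercept, dummies])
print(X)
```

Each regression coefficient on an indicator column is then interpreted as the difference in the dependent variable between that group and the reference group.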
Variables – three key types