Variables and their many names

Many of the statistical procedures used by marketing researchers are based on “general linear models” (GLM). These can be categorised into univariate, multivariate, and repeated measures models. 

The underlying statistical formula is Y = Xb + e, where Y is generally referred to as the “dependent variable”, X as the “independent variable”, b represents the “parameters” to be estimated, and e is the “error” or noise that is present in all models (also referred to as the statistical error, error terms, or residuals). Note that both the left-hand side and the right-hand side of the equation can contain more than one variable. 

As an example, multiple linear regression (a generalisation of simple linear regression) can be expressed by the following equation, which includes multiple predictors together with their parameters (regression coefficients):

Y = b₀ + b₁X₁ + b₂X₂ + … + bₖXₖ + e

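As a concrete sketch of this model, the snippet below simulates made-up data with two predictors and known coefficients, then recovers the parameters with ordinary least squares (all variable names and values here are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two predictors (X1, X2) and a response Y with known coefficients.
n = 200
X = np.column_stack([np.ones(n),            # intercept column (b0)
                     rng.normal(size=n),    # X1
                     rng.normal(size=n)])   # X2
true_b = np.array([1.0, 2.0, -0.5])
y = X @ true_b + rng.normal(scale=0.3, size=n)  # e: random noise

# Estimate b by ordinary least squares (minimises the sum of squared errors).
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b_hat  # the estimated "error" term e
print(b_hat)
```

The estimated coefficients land close to the true values because the noise is small relative to the sample size.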
What is a dependent variable (DV): Also known as the “variable of interest”, it relates directly to our research objectives and is thus the focus of the analysis. For example, it measures what our marketing effort is trying to achieve, such as “customer satisfaction”, “attitude”, or “awareness”. It is the variable that responds to, or depends on, the independent variable (IV), e.g. satisfaction [DV] measures the inner feelings of people in different age groups or in different brand-ownership groups [IV]. 

What is an independent variable (IV): This is the variable that you think has an effect on the variable of real interest (the DV), so you select different independent variables, possibly manipulate and control them, and evaluate their effect on the DV. You also look at the scores on the DV at different levels of the IV; if there are no differences across the levels, the IV and DV are unrelated! 
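
The idea of comparing DV scores across IV levels can be sketched in a few lines. The satisfaction scores and group labels below are entirely made up:

```python
# Hypothetical data: satisfaction (DV) scored 1-10 for two age groups (IV levels).
scores = {
    "18-34": [7, 8, 6, 9, 7, 8],
    "55+":   [5, 4, 6, 5, 4, 5],
}

# Compare the DV at each level of the IV: similar means would suggest the
# IV and DV are unrelated; a clear gap suggests an association.
means = {group: sum(vals) / len(vals) for group, vals in scores.items()}
print(means)
```

In practice you would follow this up with a formal test (e.g. a t-test or ANOVA) rather than eyeballing the means.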

However, both the DV and the IV can go by different (and often very confusing) names. Let's take a look at some of them:
  1. “Dependent” (Y) and “Independent” (X) variables are the terms most commonly used across statistical procedures, though technically this wrongly implies causality, i.e. that X affects (or causes) the outcome of Y. Remember that causality can be suggested through structural equation modelling (though restrictions apply), but is more convincingly established through careful experimental designs.
  2. “Outcome” / “Response” (Y) and “Predictor” (X) variables are most applicable in analyses involving linear models, such as regression, and rightly imply not causality but association, relationships, and prediction.
  3. “Regressand” (Y) and “Regressor” (X) variables are typically used in mathematical regression models.
  4. “Measured” (Y) and “Controlled” (X) variables relate to experimentation.
  5. “Observed” / “Responding” (Y) and “Manipulated” (X) variables are more commonly used in experimentation.
  6. “Explained” (Y) and “Explanatory” (X) variables are often used in regression models.
  7. “Output” (Y) and “Input” (X) variables are less commonly used in experimentation.
  8. “Experimental” (Y) and “Exposure” (X) variables are more commonly used in experimentation.
Moreover, additional terms for the DV are: “test”, “criterion”, “target”, and “the variable of interest”. Additional terms for the IV are: “antecedents”, “presumed causes”, “influences”, and “covariates”.

“Grouping” variables (or factors / factor variables) are categorical independent variables commonly referred to in t-tests and in the ANOVA family of tests. When categorical independent variables are used in regression they are commonly referred to as “dummies”, “categorical predictors”, or “indicator variables”. Note that in general linear mixed models these are referred to as “fixed factors” and “random factors”. “Fixed” refers to categorical independent variables where all levels (groups) of the variable are included, such as gender (M/F) or direction (north, east, south, west), while a factor is “random” if it is only possible to include a sample of the levels (groups), such as the key cities in England, or only a few selected customer groups. 
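
Dummy (indicator) coding of a categorical predictor can be sketched as follows. The region variable and its reference level are hypothetical choices for illustration:

```python
# Hypothetical categorical predictor: region, with four levels.
regions = ["north", "east", "south", "west", "east", "north"]

# Dummy (indicator) coding: one 0/1 column per level except a reference
# level ("north" here), which is absorbed into the regression intercept.
levels = ["east", "south", "west"]  # "north" is the reference category
dummies = [[1 if r == lvl else 0 for lvl in levels] for r in regions]
print(dummies)
```

Each row now contains at most a single 1 marking the observation's level; a row of all zeros denotes the reference category.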
In factorial ANCOVA and factorial MANCOVA the continuous extraneous variables are referred to as “covariates”, while categorical extraneous variables are either “fixed factors” or “random factors” depending on their characteristics. An example of a mixed model would be a study of overall product evaluation (the dependent variable) by gender (a fixed factor, as we include all possibilities – M/F) by city (a random factor, as we only include a selection of cities in our country), controlling for the level of product awareness (an extraneous variable measured on a continuous scale, i.e. the covariate).
“Extraneous” variables may have an effect on Y but are of no particular interest (they are not the focus of the study), so we would like to eliminate or control for them. They are referred to as “covariates” (if measured on a continuous scale) or “blocking factors” (if measured on a categorical scale), such as in ANCOVA, MANCOVA and hierarchical multiple regression. They are also referred to as “controlled” or “control” variables. 
Another “variable” in our equation is indicated by the symbol ε. It captures the variability in Y that is not explained by X and is referred to as the “error” (or “error terms”), “residuals”, “side effects”, “tolerance”, “noise”, “unexplained variance”, or even the “unexplained share”. 
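
To make the “unexplained share” concrete, the sketch below computes residuals and the share of variance left unexplained (1 − R²) from hypothetical observed values and model predictions:

```python
# Hypothetical observed values (y) and model predictions (y_hat).
y     = [4.0, 6.0, 5.0, 8.0, 7.0]
y_hat = [4.5, 5.5, 5.0, 7.5, 7.5]

residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # the "error terms" e

# Share of variance in y left unexplained by the model, i.e. 1 - R^2.
mean_y = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)          # total sum of squares
unexplained_share = ss_res / ss_tot
print(residuals, unexplained_share)
```

Here 10% of the variability in y is left to the error term, and the other 90% is explained by the model.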
Then we also have “manifest” variables (directly observed and measured) and “latent” variables (not directly observed), and in models such as structural equation modelling (SEM) we differentiate between “exogenous” variables (which affect the model without being affected by it) and “endogenous” variables (whose values are determined by other variables in the model).
So many confusing terms, but once you get the hang of them, they’ll all make sense. Have a cup of tea!

Further Reading: