Variables – three key types

Now here’s an easy one: What is a variable?

It is simply something that varies – either its value or its characteristic. In fact, it must vary: if it does not vary, we can't call it a VARiable; instead we call it a "constant", such as the regression constant (the Y-intercept).

In the equation of a straight line (linear relationship) Y = a + bX, where:

   Y=dependent variable
   X=independent variable
   a=constant (the Y-axis intercept, or the value of Y when X=0)
   b=coefficient (the slope of the line, in other words the amount that Y increases [or decreases] for each 1-unit increase [or decrease] in X).

The above formula can also be written as Y = mX + b, where b=constant (the Y-axis intercept) and m=gradient (slope).
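The straight-line formula can be sketched in a few lines of Python; the values chosen here for a and b are purely illustrative:

```python
# A minimal sketch of the linear relationship Y = a + bX,
# with illustrative values for a (intercept) and b (slope).
def predict_y(x, a=2.0, b=0.5):
    """Return Y for a given X using Y = a + bX."""
    return a + b * x

# When X = 0, Y equals the constant a (the Y-axis intercept).
print(predict_y(0))                   # 2.0
# Each 1-unit increase in X changes Y by the coefficient b.
print(predict_y(1) - predict_y(0))    # 0.5
```

Note that predicting Y at X = 0 returns exactly the constant a, which is why a is also described as "the value of Y when X = 0".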
In all "dependence" statistical techniques (such as ANOVA, regression, etc.) we refer to "independent variables" (IV) and "dependent variables" (DV), as well as a third type, called "extraneous variables" (EV). Note that in "interdependence" techniques (such as factor, cluster, and correspondence analysis) we don't distinguish between these types of variables. Look at this post for the many names of variables!

Let's take a brief look at each of the IV, DV, and EV:

1. The dependent variable (DV) is also referred to as the outcome or criterion variable (or the grouping variable in discriminant analysis). It is the variable of greatest interest – the one we want to predict, explain, understand, or assess in relation to the IVs – and it depends on the IVs. In most analyses our task is to determine how, and to what extent, each IV relates to, or explains, the variation in the DV.
2. The independent variables (IV) are also referred to as factors, covariates, predictors, antecedents, or the presumed causes or influences under investigation. There are two basic types of IVs:
  • The "active" IV: These are variables in experiments that the researcher manipulates by applying specified treatments to respondents.
  • The "attribute" IV (also called a measured, manifest, or characteristic IV): These are variables that cannot be manipulated but are a focus of the study, e.g. demographics, ad spending, product attributes, etc. Studies with only attribute variables and no active variables are non-experimental, and therefore we cannot draw definite conclusions about cause-and-effect relationships between the IV and the DV. While we can detect whether variables are related, we cannot conclude that one causes or influences the other.
3. The "extraneous" variables (EV) are also referred to as "nuisance" variables or covariates. These variables are not of primary interest in our research, but they are of concern because they add noise to our assessment of the relationship between the IV and the DV, and can influence that relationship, making it difficult to isolate the real variables of interest (i.e. the IVs).
There are two types of EV: 
  • Participant variables, which relate to individual characteristics of respondents, such as their intelligence, product awareness / ownership, etc.
  • Situational variables, which relate to the environment in which respondents are researched and which may affect how each person responds. Examples are the speed of the internet connection in online research, the weather, and ambient noise. If any of these EVs cannot be controlled for during the research phase, we refer to them as confounding variables. It is crucial that we identify them, measure them, and include them in our analysis; otherwise we won't know whether the DV is influenced by the IV, by the confounding variable, or perhaps by an interaction between the two. When confounding variables are included in an analysis such as ANCOVA, MANCOVA, or regression (hierarchical multiple regression), we refer to them as "covariates".
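The danger of an uncontrolled confounder can be simulated in a few lines of pure Python. In this hypothetical setup, an extraneous variable Z drives both the IV (X) and the DV (Y), while X has no direct effect on Y at all – yet X and Y still appear related:

```python
import random
import statistics

random.seed(0)

def corr(u, v):
    """Pearson correlation, computed by hand from means and deviations."""
    u_bar, v_bar = statistics.mean(u), statistics.mean(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = (sum((a - u_bar) ** 2 for a in u)
           * sum((b - v_bar) ** 2 for b in v)) ** 0.5
    return num / den

# Hypothetical confounder Z influences both the IV (X) and the DV (Y);
# X has NO direct effect on Y in this simulation.
zs = [random.gauss(0, 1) for _ in range(5000)]
xs = [z + random.gauss(0, 1) for z in zs]
ys = [z + random.gauss(0, 1) for z in zs]

# X and Y still look correlated, purely because of the uncontrolled Z.
print(round(corr(xs, ys), 2))  # roughly 0.5
```

This is exactly why a confounder must be measured and included in the analysis: without Z in the model, the apparent X–Y relationship would be mistaken for a real effect of the IV.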

The above IV, DV, and EV relate to “dependence” statistical techniques such as t-tests, ANOVA, regression, discriminant analysis, structural equation modelling, etc. 
With "interdependence" statistical techniques such as factor analysis, latent class analysis, cluster analysis, multi-dimensional scaling, correspondence analysis, etc., we are interested in determining patterns of relationships among all variables, so we don't differentiate between DVs and IVs. 
Go back to our linear relationship formula (Y = a + bX). Note that something is missing… Marketing and social research is based on the "most likely values of the variables" – so-called "probabilistic models" (unlike the "deterministic models" of the physical sciences, which describe precise relationships). Because in marketing research we work with underlying relationships in terms of the "most likely" or approximate values of human behaviour, rather than exact values, there is some unexplained error (e), which should be reflected in the formula: Y = a + bX + e.
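The probabilistic model Y = a + bX + e can be illustrated with a small simulation. Here the "true" a and b are made-up values; we generate noisy data, then recover the constant and coefficient with ordinary least squares computed by hand. The estimates come out close to – but not exactly equal to – the true values, which is the essence of a probabilistic model:

```python
import random
import statistics

random.seed(42)

# Hypothetical probabilistic model: Y = a + bX + e, with true a=2, b=0.5
a_true, b_true = 2.0, 0.5
xs = [float(i) for i in range(100)]
ys = [a_true + b_true * x + random.gauss(0, 1) for x in xs]  # e ~ N(0, 1)

# Ordinary least squares by hand:
#   b = cov(X, Y) / var(X),   a = mean(Y) - b * mean(X)
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar

print(round(a_hat, 2), round(b_hat, 2))  # close to 2 and 0.5, but not exact
```

Because of the error term e, rerunning this with a different seed gives slightly different estimates each time – the model describes the most likely relationship, not an exact one.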



Further Reading: