Revisiting the basics of data and measurement scales (Part 2)

The statistical procedures we choose depend on the type of data we collect, which in turn depends on the measurement scales we employ.

We should be careful to understand the constructs we measure and the type of scales we employ as this will determine what statistical procedures are appropriate for analysis.
This post (a follow-up to Part 1) is partly based on what S.S. Stevens told us in 1946 (see “Further Reading” below) about data and scales. Please note that this classification is not an exact science, as the classification system is still debated. Nevertheless, here are the classification and definitions I find most plausible, based on Stevens and other authors:

Categorical data refers to nominal data (which includes dichotomous / binary data) as well as ordinal data, while continuous data includes interval and ratio data. Continuous data can be truly continuous, and thus measured on a continuum at any level of precision (such as temperature, time, etc.), or discrete, which means the data can only take certain values such as 1, 2, 3 on a 5-point scale, with the assumption of equal intervals between scale points. Deciding which statistical procedure to apply depends most importantly on whether our data is categorical (nominal/ordinal) or continuous (interval/ratio).

Qualitative data: This generally refers to “nominal” level data, or as Stevens described it, “determination of equality”. The scale merely indicates the presence or absence of a characteristic or property. The variable is discrete: black or white, yes or no, and not a little of both. Measures on a nominal scale (aka “categorical scale”) are classified into categories according to the characteristics by which they differ rather than by ‘how much’. It is a procedure of counting and classifying non-numerically into categories such as gender, user group, or brands of a company.

Quantitative data (ordinal, interval and ratio): This is recorded in numerical form, for example age, price, and items sold per day. Ordinal, interval and ratio data each have their own characteristics (as Stevens taught us) and are discussed in a different post.

Non-metric data: This generally includes nominal and ordinal data.

Metric data: This generally includes interval and ratio data; however, some ordinal data can be treated as metric.

Non-parametric data (e.g. non-normal / non-Gaussian distribution):
Non-parametric statistical procedures use measurements on nominal and ordinal scales. They are “distribution free” methods that do not depend on the population fitting any parametrized distribution. They are less powerful than their parametric counterparts because they use less information. For example, a parametric correlation (Pearson’s product moment) uses information about the mean and the deviations from the mean, while a non-parametric correlation (e.g. Spearman or Kendall) uses only the rank positions of pairs of scores.
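
To make the difference between the two kinds of correlation concrete, here is a minimal sketch in plain Python. The helper names (pearson, ranks, spearman) and the toy data are my own assumptions for illustration and are not taken from Stevens. The data is perfectly monotonic but not linear, so the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks of the scores."""
    return pearson(ranks(x), ranks(y))

# Illustrative data (an assumption): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because Spearman discards the actual distances between scores and keeps only their order, it cannot tell y = x**3 apart from y = x. That is exactly the “less information” referred to above.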

Parametric data (e.g. normal / Gaussian distribution):
Parametric statistical procedures use measurements on an interval (or near-interval ordinal) or ratio scale where the population is assumed to fit a parametrized distribution (most typically the normal distribution).  In a parametric test a sample statistic is obtained to estimate the population parameter.  If data is not parametric (or if the objective of the analysis is only to describe rather than to generalize the findings), then non-parametric statistical methods should be employed, although these are generally less powerful at detecting real differences or variability in data.
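
As a small illustration of “a sample statistic is obtained to estimate the population parameter”, the sketch below computes a sample mean and its standard error with Python's standard library. The function name estimate_mean and the sample values are made up for this example.

```python
import statistics
from math import sqrt

def estimate_mean(sample):
    """Return (sample mean, standard error of the mean).

    The sample mean is the statistic that estimates the population mean.
    statistics.stdev uses the n - 1 denominator (sample standard deviation).
    """
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return mean, sd / sqrt(len(sample))

# Hypothetical interval-scale measurements, invented for illustration.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.2, 4.6, 5.0]
mean, se = estimate_mean(sample)
print(f"estimated population mean: {mean:.2f} (standard error {se:.2f})")
```

Note that the mean is only meaningful here because the values are assumed to sit on an interval or ratio scale; averaging nominal codes would not estimate anything.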

This classification is particularly important as many statistical procedures assume parametric data (measured on interval or ratio scales, and fitting the normal curve). When these assumptions are violated (i.e. the data is measured on a nominal or ordinal scale, or is measured on an interval/ratio scale but does not fit the normal curve), non-parametric statistical procedures should be followed.
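
The rule of thumb in this paragraph can be written down as a tiny decision function. This is only a sketch under simplifying assumptions: the function name recommended_family is hypothetical, normality is passed in as a yes/no judgement rather than tested, and the case of near-interval ordinal data being treated as metric is deliberately left out.

```python
VALID_SCALES = {"nominal", "ordinal", "interval", "ratio"}

def recommended_family(scale: str, approx_normal: bool) -> str:
    """Suggest 'parametric' or 'non-parametric' procedures.

    scale          one of Stevens' four scales of measurement
    approx_normal  the analyst's judgement that the data fit the normal curve
    """
    key = scale.strip().lower()
    if key not in VALID_SCALES:
        raise ValueError(f"unknown measurement scale: {scale!r}")
    # Both assumptions must hold: interval/ratio scale AND normal curve.
    if key in {"interval", "ratio"} and approx_normal:
        return "parametric"
    return "non-parametric"

print(recommended_family("ratio", approx_normal=True))      # parametric
print(recommended_family("ratio", approx_normal=False))     # non-parametric
print(recommended_family("ordinal", approx_normal=True))    # non-parametric
```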

Discrete data: When the variable can only take a finite number of distinct values. For example, responses to a five-point rating scale can only take on the values 1, 2, 3, 4, and 5, and not 3.22, so there are gaps on the scale between the possible values, i.e. no values between 3 and 4.

Continuous data: When the variable can take on an infinite number of intermediate values between any two identified values, including fractional ones. Examples include daily calorie intake, weight on a bathroom scale, etc. While age is continuous, a few age categories are discrete, whereas many age categories can be treated as continuous because the construct (age) falls along a continuum. What is important here is the characteristics of the variable, i.e. the construct (e.g. age can be measured in milliseconds), and not the scale itself (e.g. a scale with yearly intervals). Another example is likeability, which is a continuous variable but is often measured on a Likert scale, which is discrete; yet we treat it as continuous data when we decide which statistical procedure to apply, because the construct (likeability) is continuous!
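
The age example can be sketched in a few lines: the same continuous construct is recorded once at full precision and once in a handful of discrete bands. The cut points and the function name age_band are assumptions chosen purely for illustration.

```python
from bisect import bisect_right

def age_band(age_years: float, cut_points=(18, 35, 50, 65)) -> int:
    """Map a continuous age to a discrete band index (0 = youngest band).

    An age equal to a cut point falls into the higher band.
    """
    return bisect_right(cut_points, age_years)

# Hypothetical ages measured on a continuum (fractions of a year allowed).
ages = [17.4, 18.0, 34.99, 42.5, 71.2]
bands = [age_band(a) for a in ages]
print(bands)  # [0, 1, 1, 2, 4]
```

The five bands are discrete, since no value exists between band 1 and band 2, even though the underlying construct (age) is continuous. With many narrow bands the recorded values approach the continuum again.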

When determining which statistical procedure to use, we first need to look at our variables (the constructs we want to measure) and then find the best measurement scale. Therefore, references to data assumptions for each statistical procedure differentiate between the different types of data as outlined above.


Further Reading

Stevens, S. S. “On the Theory of Scales of Measurement,” Science, 1946, 103, 677-680, accessed March 20, 2012