Means, sum of squares, squared differences, variance, standard deviation and standard error

I remember how confusing these terms were to me when I started learning statistics. Let me offer a brief non-technical explanation of each:

When we take a random sample of observations from a population of particular interest (e.g. all our customers), we would like to do some modelling (e.g. mean or regression) so that our sample can describe and/or predict the total population of interest. The most basic model we can use is to calculate the mean score of any given variable or construct, and then conclude that it represents the population of interest.
However, before we can use the sample mean to represent the population mean, we need to determine how well our model (here, the sample mean) represents (predicts) the true population mean, a process we call “assessing the fit of our model”. First we need to look at the difference between the observed data in our sample and the fitted model (the mean). The deviance from our model (x − x̄) tells us the direction of the error: a negative value (the observation is smaller than the mean) means our model is overestimating that observation, and a positive value (the observation is larger than the mean) means it is underestimating.

So to get a grand view of the accuracy of our model (the total amount of over- or under-estimating), we need the total deviance between all our observed data points and the mean. Some deviances will be negative (overestimating) and others positive (underestimating), so if we simply add them up they cancel each other out and we end up with zero. That would wrongly suggest a perfectly fitted model when in fact we don’t have one. The fix is to square each difference first, so that all of them are positive and nothing cancels out. Adding the squared differences together gives the total of squared differences, referred to as the sum of squares (SST).
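The cancelling-out described above is easy to see with a few numbers. Here is a minimal sketch in Python; the eight values in `sample` are made up purely for illustration and do not come from any real data.

```python
# Hypothetical sample of eight observations (illustrative values only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# The "model" is simply the sample mean.
mean = sum(sample) / len(sample)

# Deviance of each observation from the model (x - mean).
deviations = [x - mean for x in sample]

# Adding the raw deviances: negatives and positives cancel out.
sum_of_deviations = sum(deviations)

# Squaring first makes every term non-negative, giving the sum of squares.
sst = sum(d ** 2 for d in deviations)

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("sum of squares (SST):", sst)
```

With these particular values the raw deviances add up to zero, while the sum of squares does not.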
The problem with the SST is that it is an unbounded measure that grows with the amount of data: the more data points we have, the higher our SST figure will be. Very hard to interpret and to work with! The solution lies in calculating the average sample error of our model by dividing the SST by the number of observations in our sample. However, as we are generally more interested in the population error than the sample error, we need to divide the SST by the degrees of freedom (df = N − 1) rather than the sample size (N). The result is called the variance, which is the average squared error between our model (the mean) and the actual observations. The variance is still difficult to interpret, because it is expressed in squared units (a consequence of squaring the differences to calculate the SST). The remedy is to take the square root (√) of the variance, which gives us the standard deviation (SD).
The formula summarises this action:

$$ SD = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} $$

The formula above shows that:
  1. To calculate the fit of our model, we take the differences between the mean and the actual sample observations, square them, sum them, and then divide by the degrees of freedom (df) to get the variance.
  2. Because the variance is hard to interpret, we take the square root of the variance to get the standard deviation (SD). Now we can easily say that an SD of zero means we have a perfect fit between our model and the observed sample data. The higher the SD, the further away the observed data points are from our mean, and the less accurate the mean is at predicting an individual observation (thus the mean is not an accurate representation of the data). We know that in normally distributed populations, about 68% of observations lie within 1 SD of the mean, about 95% lie within 2 SD, and almost all observations (99.7%) lie within 3 SD.
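The two steps above can be sketched in a few lines of Python. The values in `sample` are again made up for illustration only. The standard library functions `statistics.variance` and `statistics.stdev` also divide by N − 1, so they make a handy cross-check on the hand calculation.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = sum(sample) / n

# Step 1: square the differences, sum them, divide by df = N - 1.
sst = sum((x - mean) ** 2 for x in sample)
variance = sst / (n - 1)

# Step 2: take the square root to get back to the original units.
sd = math.sqrt(variance)

print("variance:", round(variance, 3))
print("SD:", round(sd, 3))

# Cross-check against the standard library (both use N - 1).
print("statistics.variance:", round(statistics.variance(sample), 3))
print("statistics.stdev:", round(statistics.stdev(sample), 3))
```

Dividing by `n` instead of `n - 1` would give the average error within this sample only (what `statistics.pvariance` computes), which is the distinction between sample size and degrees of freedom described earlier.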
Another way to judge how well our model (e.g. the sample mean) represents the population is to take several independent samples from the same population and compare their means. It is unlikely that their means will be exactly the same. If we calculate the standard deviation (SD) of these sample means around their overall mean, we get the standard error of the mean (SE). It is also referred to as the “SD of the sampling distribution”.
What is the difference between SD and SE?
They are almost the same, but not exactly (same-same-but-different). 
  • SD is the dispersion around a single sample mean (it tells us how close the individual observations in our sample are to the sample mean), and is not directly affected by sample size.
  • SE is the dispersion of different sample means around their overall mean, in other words the SD of sample means (it tells us how close our sample mean is likely to be to the population mean). It gets smaller as the sample size gets larger, so if you sample the entire population of interest (a census), the standard error will be zero!
Now if you analyse a single sample in your favourite statistics program you will find the “SE of the mean” in the descriptives. How is this possible when we only took a single sample? Well, thanks to smart statisticians and their knowledge of the central limit theorem, we now know that in larger samples (about 30 or above) the sampling distribution of the mean is approximately normal, with a mean equal to the population mean and a standard error (SE) that can be estimated from the SD of the sample and the sample size: SE = SD / √N.
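To see that the two routes to the SE agree, here is a small simulation sketch in Python. Everything in it is an assumption chosen for illustration: a normally distributed population with mean 100 and SD 15, samples of 50 observations, and 2000 repeated samples. It compares the SD of many sample means with the SE estimated from one single sample as SD / √N.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters and sampling plan (assumptions).
POP_MEAN, POP_SD = 100, 15
N = 50            # observations per sample
N_SAMPLES = 2000  # number of repeated samples


def draw_sample(n):
    """Draw n observations from the assumed normal population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Route 1: take many samples and compute the SD of their means.
sample_means = [statistics.mean(draw_sample(N)) for _ in range(N_SAMPLES)]
sd_of_means = statistics.stdev(sample_means)

# Route 2: take ONE sample and estimate the SE as SD / sqrt(N).
single_sample = draw_sample(N)
se_from_single = statistics.stdev(single_sample) / math.sqrt(N)

# Value implied by the assumed population SD, for comparison.
theoretical_se = POP_SD / math.sqrt(N)

print("SD of sample means:  ", round(sd_of_means, 2))
print("SE from one sample:  ", round(se_from_single, 2))
print("population SD / √N:  ", round(theoretical_se, 2))
```

The three figures should land close to each other. The single-sample estimate wobbles the most, because it relies on the SD of just 50 observations.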
I hope this gives an easy-to-understand explanation of the basic measures of model fit.