Correlation: Friend or Foe?

Updated: Jul 16

Welcome to #BeyondTheMean! Check out this post to see what this blog is all about.

Correlation is a statistical measure that shows the strength and direction of the relationship between two variables. You will generally see it reported with a lowercase “r” on a scale of negative one to positive one. It is a very common statistic in educational research – but is it a reliable measure for decision makers to use during their continuous improvement planning processes? In this week’s post, I want to unpack the concept of correlation and explore its role in educational decision making.

Easy to Calculate – Easy to Interpret

One of the beautiful things about correlation is that it is extremely fast to calculate and super easy to interpret. In Google Sheets or Excel, you can quickly calculate the correlation between two columns of data by typing “=correl” into an empty cell and then typing your two column ranges separated by a comma (for example: =correl(A:A, B:B)). If you have a bunch of columns, you can drop them into a correlation matrix generator, like my free correlation generator found in The Repository, and instantly generate a matrix that shows the relationship between every pair of your variables.
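
For readers who work outside a spreadsheet, the same calculation can be sketched in a few lines of Python. This is only an illustration, not the generator from The Repository: the function names pearson_r and correlation_matrix, the column names, and the numbers are all made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of each pair's deviations from its own mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Correlate every column with every other column.

    `columns` maps a column name to its list of values.
    """
    return {
        row: {col: pearson_r(columns[row], columns[col]) for col in columns}
        for row in columns
    }

# Hypothetical data for five students (invented for illustration)
data = {
    "attendance_pct": [88, 92, 95, 97, 99],
    "reading_score": [61, 70, 74, 80, 85],
    "absences": [12, 8, 5, 3, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Each variable correlates perfectly with itself, so the diagonal of the matrix is always 1. The interesting values are the ones off the diagonal.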

Having run the quick calculation, interpreting the results is simple. Correlations range from negative one to positive one, with zero indicating that there is no linear relationship between the two variables. A correlation of positive one means the two variables move together in perfect lockstep: as one goes up, so does the other. A correlation of negative one is just as strong but runs in the opposite direction: as one goes up, the other goes down. The closer the value is to either end of the scale, the stronger the relationship. So, what does all this mean? Let’s look at a hypothetical.

Let’s say that the variable “proficiency in reading” has a correlation of +1.0 with the variable “minutes of recess”. This would indicate that students who had more minutes of recess also had higher reading outcomes. If the correlation between these two variables was -1.0, however, it would indicate that students who had more recess had lower reading outcomes. So – you can use this method to decide how many minutes of recess your kiddos get each day, right? Not so fast…

Correlation does not equal Causation

You have undoubtedly heard the phrase “correlation does not equal causation” at some point during your formal education. This is one of the most common proverbs in the world of statistical education. I’ve even seen it on t-shirts and ball caps!

You see, just because two things are statistically related doesn’t necessarily mean that they influence one another in real life. Let’s reflect back on my earlier hypothetical about recess and reading. Just because the two things may be positively correlated doesn’t necessarily mean that more minutes of recess actually caused students to have higher reading outcomes. It could simply be a fluke, or some third factor could be driving both.

One of my favorite websites to peruse when I am feeling bored and a little nerdy is Tyler Vigen’s Spurious Correlations. On his site, Tyler lists tons of ridiculously strong correlations between obviously unrelated variables. My favorite spurious correlation is the relationship between the number of people who drowned after falling out of a fishing boat and marriage rates in Kentucky (my home state). With a correlation of r=0.95 you may think that Kentucky husbands are chucking their wives off their fishing boats on a regular basis – but of course, that’s a ridiculous assumption. Therein lies the danger of over-reliance on correlation.

Using Correlation to Guide Decision Making

So educational decision makers should just ignore correlations, right?


Not quite. Calculating the correlation between two variables is still a valuable activity for education decision makers for a few reasons. First – it is easy and fast – which makes it a great first step in a data analysis procedure. If there is no correlation between two variables, it is unlikely that other analyses of a straight-line relationship between them will turn up anything significant. You can save a little time by using correlation to weed through your variables as a first step.

Next, correlation is a handy measure of effect size. It is far easier to calculate than more formal measures of effect size, like Cohen’s d, which involves taking the difference between the means of two groups and dividing it by their pooled standard deviation. If you have two classes that received instruction in different ways and the instruction in one class has a stronger correlation with the outcome than in the other, then you may have an early indication of which instructional strategy had the greater effect.
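
For comparison, here is a minimal Python sketch of Cohen’s d for two groups: the difference between the group means divided by the pooled standard deviation. The class scores are invented, and the function name cohens_d is just a label for this example.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance (n - 1)

def cohens_d(group_a, group_b):
    """Difference between two group means, in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Invented end-of-unit scores for two classes taught with different strategies
class_a = [72, 75, 78, 80, 85]
class_b = [68, 70, 74, 76, 77]

d = cohens_d(class_a, class_b)
print(f"Cohen's d = {d:.2f}")
```

A positive d means the first group scored higher on average. Compare that with a single =correl formula and you can see why correlation is the quicker first look.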

Finally, correlations can help you spot instructional inequities. In a purely equitable world, there should be no correlation between demographic group and outcome – because one shouldn’t impact the other. We don’t live in a purely equitable world, so calculating the correlations between demographic group membership and student outcomes is a useful activity for education leaders seeking pockets of inequity. If you cannot see the inequity, then you cannot fix it. Dropping your data into a correlation matrix generator as part of your regular continuous improvement planning and evaluation processes can help you quickly spot inequities and monitor changes as you seek to close the achievement gap.