#BeyondTheMean

Effect Size: Education’s Most Misunderstood Statistic

Updated: Jul 16



Welcome to #BeyondTheMean! Check out this post to see what this blog is all about.


The world of education theory is dominated today by a single statistical output – effect size. This statistic has become the standard way to describe the impact of educational interventions thanks in large part to John Hattie and his seminal work “Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement.” Today, effect size can be found prominently displayed on product webpages, in conference programs, and widely discussed in PLC meetings across the United States.


But, do educators really know what effect size is and are they using it correctly? In my experience speaking and training on evidence-based decision-making across the country, the answer is no. Effect size has become a barrier to effective decision-making in large part due to its ubiquity in marketing campaigns. The good news is that it really isn’t that hard to understand and apply effect size appropriately to decision-making. In this post, I will show you how.


What is effect size?

Effect size is a measure of the magnitude of a difference between two distributions of scores. In education settings, it is usually used to show the difference between pre-test and post-test scores for an intervention or supplemental program. Another common use is to demonstrate the difference in performance between two groups of students in an attempt to standardize the measurement of the achievement gap.


How do I calculate effect size?

Calculating effect size is relatively simple. There are two principal methods (along with many less common or circumstance-specific methods). The first is to use the correlation coefficient. Correlation is easily calculated in most spreadsheet software using the =PEARSON function. I have created a video to show you how to use this function; it can be found here in The Repository.
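If you prefer working outside a spreadsheet, the same calculation is easy to reproduce in code. Here is a minimal Python sketch of the Pearson correlation coefficient, equivalent to the =PEARSON spreadsheet function (the function name and sample data are just for illustration):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the
    two standard deviations (equivalent to =PEARSON in a spreadsheet)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Perfectly linear data produces a correlation of exactly 1.0
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Because the (n − 1) terms cancel in the ratio, you get the same answer whether you think of the variances as population or sample quantities.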


More common in education research publications is the Cohen’s d method. It should be noted that there are multiple variations of effect size, so you might encounter other forms. Cohen’s d is simply the difference between two means divided by the pooled standard deviation: you subtract the average performance of one sample from that of the other, then divide the difference by the pooled standard deviation.



There is no straightforward way to calculate Cohen’s d in spreadsheet software, so you must use a series of formulas as follows.


First, you must find the mean (=AVERAGE(A:A)), standard deviation (=STDEV.S(A:A), the sample standard deviation, which matches the (n−1) weighting in the pooled formula below), and count (=COUNT(A:A)) for each of your two distributions. You must then calculate the pooled standard deviation, which uses this formula: =SQRT(((n1-1)*SD1^2+(n2-1)*SD2^2)/(n1+n2-2)). This is a lot, so I have recorded a video walking you through each of the steps here. Alternatively, you can also upload your data to my pre-test/post-test analysis tool and it will instantly calculate the effect size for you.
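The spreadsheet steps above can also be sketched in a few lines of Python. The function name and the pre/post scores below are made-up illustrations; the pooled-standard-deviation formula is the same one given above:

```python
from statistics import mean, stdev  # stdev is the sample SD, like =STDEV.S

def cohens_d(group1, group2):
    """Cohen's d: the difference in means divided by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    sd1, sd2 = stdev(group1), stdev(group2)
    # Pooled SD: =SQRT(((n1-1)*SD1^2+(n2-1)*SD2^2)/(n1+n2-2))
    pooled_sd = (((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group2) - mean(group1)) / pooled_sd

# Hypothetical pre-test and post-test scores for the same five students
pre = [70, 75, 80, 85, 90]
post = [75, 80, 85, 90, 95]
print(round(cohens_d(pre, post), 2))  # 0.63
```

Here every student gained five points, and that five-point gain works out to roughly 0.63 pooled standard deviations.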


How do I interpret the results of the effect size test?

This is where things can get a bit tricky. Since effect size is the standardized difference between two averages, there can be many variables at play that may influence your interpretation. Study design decisions, sample size, and other external factors can all influence the results of any statistical output. You should be wary of anyone who tries to sell you on a straightforward method of interpretation that can be applied in all circumstances. Having said that, here are two places to start.


When correlations are used to measure effect sizes, their interpretation is somewhat more standardized. Correlation coefficients exist on a scale of negative 1 to positive 1, with zero indicating no relationship between two distributions. When interpreting effect size using correlation, the closer the coefficient is to negative or positive 1, the stronger the relationship.


When Cohen’s d is used to present the effect size, the standard interpretation is one devised by Jacob Cohen himself. By his estimation, an effect size of d = 0.2 is a small effect, d = 0.5 is a medium effect, and d = 0.8 is a large effect. This standard interpretation provides us with a starting point, but by Cohen’s own admission, this guidance should only be applied when “no better basis for estimating the effect size index is available” (Cohen, 1988).
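Cohen’s benchmarks are easy to encode as a lookup. The sketch below uses his 0.2/0.5/0.8 thresholds; the function name and the “negligible” label for values below 0.2 are my own additions, and, per the caveat above, these labels are a starting point rather than a universal rule:

```python
def cohen_benchmark(d):
    """Map an effect size to Cohen's (1988) conventional labels.
    Uses the absolute value, since direction doesn't affect magnitude."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # below Cohen's smallest benchmark

print(cohen_benchmark(0.63))  # medium
```

Note that a d of 0.63 lands between the “medium” and “large” benchmarks, which is exactly the kind of boundary case where the study context, not the label, should drive interpretation.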


In addition to these two common interpretations, there are those that have sought to interpret effect sizes as months of instruction, changes in percentile rank, fluctuations in the achievement gap, and measures of teacher effectiveness. The thing to remember about all these interpretation methods is that they are designed to work only under a narrow set of circumstances built into a study’s design. It would be incorrect, for example, to interpret an effect size of 0.5 to equal a year of instruction for all studies without consideration for the study design and analytic methods.


So, what is a lay-person to do? My advice is to rely on the researcher’s interpretation. Any peer-reviewed study will include a detailed analysis and interpretation of study results. If a researcher has designed their study to use effect size to measure a given outcome – such as the previous year of instruction example – they will say so clearly in their discussion. Rather than trying to interpret the results of this statistic on their own, school and system leaders would be well served by reading the whole research article and considering the interpretative guidance provided by the author.



How should I use effect size in decision-making?

With all the nuances required to understand and interpret effect size, how is a decision-maker supposed to use it? Matthew Kraft provides five rules for consideration in his 2020 paper entitled “Interpreting Effect Sizes of Education Interventions”. They are:


1. Results from correlational studies presented as effect sizes are not causal effects.

2. The magnitude of effect sizes depends on what, when, and how outcomes were measured.

3. Subjective decisions about research design and analysis influence effect sizes.

4. Costs matter for evaluating the policy relevance of effect sizes.

5. Scalability matters for evaluating the policy relevance of effect sizes.


When factoring effect size into your decision-making process, these five rules provide decision-makers with some good considerations. First, Kraft reminds us that correlational s