
The world of education theory is dominated today by a single statistical output – effect size. This statistic has become the standard way to describe the impact of educational interventions thanks in large part to John Hattie and his seminal work “Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement.” Today, effect size is prominently displayed on product webpages and in conference programs, and it is widely discussed in PLC meetings across the United States.

But do educators really know what effect size is, and are they using it correctly? In my experience speaking and training on evidence-based decision-making across the country, the answer is no. Effect size has become a barrier to effective decision-making in large part due to its ubiquity in marketing campaigns. The good news is that it really isn’t that hard to understand and apply effect size appropriately to decision-making. In this post, I will show you how.

What is effect size?

Effect size is a measure of the magnitude of a difference between two distributions of scores. In education settings, it is usually used to show the difference between pre-test and post-test scores for an intervention or supplemental program. Another common use is to demonstrate the difference in performance between two groups of students in an attempt to standardize the measurement of the achievement gap.

How do I calculate effect size?

Calculating effect size is relatively simple. There are two principal methods (along with many less common or circumstance-specific methods). The first is to use the correlation coefficient. Correlation is easily calculated in most spreadsheet software using the =PEARSON function. I have created a video to show you how to use this function; it can be found here in The Repository.
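For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same Pearson correlation calculation. The function name `pearson_r` and the scores are made up for illustration; this is one way to reproduce the calculation, not the method used in the video.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores, for illustration only
pre_test = [52, 60, 61, 70, 75]
post_test = [58, 63, 70, 74, 85]
r = pearson_r(pre_test, post_test)
print(round(r, 3))
```

The result always falls between -1 and 1, which matches the scale the spreadsheet function reports.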

More common in education research publications is the Cohen’s d method. It should be noted that there are multiple variations of effect size, so you might encounter other forms. Cohen’s d is simply stated as the difference between two means divided by the pooled standard deviation. This means that you are taking the average performance of one sample and subtracting it from the average performance of another sample, then dividing the difference by the pooled standard deviation.

There is no straightforward way to calculate Cohen’s d in spreadsheet software, so you must use a series of steps as follows.

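First, you must find the mean (=AVERAGE(A:A)), standard deviation (=STDEV.S(A:A)), and count (=COUNT(A:A)) for each of your two distributions. You must then calculate the pooled standard deviation, which uses this formula: =SQRT(((n1-1)*SD1^2+(n2-1)*SD2^2)/(n1+n2-2)). Because this formula weights each group by n-1, it expects the sample standard deviation, which is why STDEV.S is used here rather than STDEV.P. Finally, subtract one mean from the other and divide the difference by the pooled standard deviation to get Cohen’s d. This is a lot, so I have recorded a video walking you through each of the steps here. Alternatively, you can also upload your data to my pre-test/post-test analysis tool and it will instantly calculate the effect size for you.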
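The spreadsheet steps above can also be sketched in a few lines of Python. This is an illustrative sketch only: the function name `cohens_d` and the scores are hypothetical, it assumes sample standard deviations (n-1 denominator), and it treats the two sets of scores as independent groups, which may not suit every pre-test/post-test design.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Cohen's d: (mean of group2 - mean of group1) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # stdev() is the sample standard deviation (n - 1 denominator),
    # which is what the (n - 1) weights in the pooled formula assume.
    sd1, sd2 = stdev(group1), stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean(group2) - mean(group1)) / pooled_sd


# Hypothetical scores, for illustration only
pre_test = [2, 4, 6]
post_test = [4, 6, 8]
d = cohens_d(pre_test, post_test)
print(d)
```

A positive value means the second group scored higher on average; swapping the arguments flips the sign but not the magnitude.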

How do I interpret the results of the effect size test?

This is where things can get a bit tricky. Since effect size is the standardized difference between two averages, there can be many variables at play that may influence your interpretation. Study design decisions, sample size, and other external factors can all influence the results of any statistical output. You should be wary of anyone who tries to sell you on a straightforward method of interpretation that can be applied in all circumstances. Having said that, here are two places to start.

When correlations are used to measure effect sizes, their interpretation is a little more commonly defined. Correlation coefficients exist on a scale of negative 1 to positive 1, with zero indicating no relationship between two distributions. When interpreting effect size using correlation, the closer the coefficient is to positive 1 or negative 1, the stronger the relationship; the sign tells you only the direction of that relationship.

When Cohen’s d is used to present the effect size, the standard interpretation is one devised by Jacob Cohen himself. By his estimation, an effect size of d=0.2 is a small effect size, d=0.5 is a medium effect size, and d=0.8 is a large effect size. This standard interpretation provides us with a starting point, but by Cohen’s own admission, this interpretation guidance should only be applied when “no better basis for estimating the effect size index is available” (Cohen, 1988).
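As a rough illustration of Cohen’s benchmarks, the short Python sketch below attaches his labels to a value of d. The function name `cohen_label` is hypothetical, and treating 0.2, 0.5, and 0.8 as hard cutoffs is my simplifying assumption; Cohen offered them as loose reference points, so treat the output as a fallback label rather than a verdict.

```python
def cohen_label(d):
    """Return Cohen's conventional label for an effect size d (fallback only)."""
    size = abs(d)  # the sign shows direction, not magnitude
    # Assumption: the benchmarks are used as lower cutoffs for each label.
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below Cohen's small benchmark"


for value in (0.23, 0.5, -0.89):
    print(value, cohen_label(value))
```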

In addition to these two common interpretations, there are those who have sought to interpret effect sizes as months of instruction, changes in percentile rank, fluctuations in the achievement gap, and measures of teacher effectiveness. The thing to remember about all these interpretation methods is that they are designed to work only under a narrow set of circumstances built into a study’s design. It would be incorrect, for example, to interpret an effect size of 0.5 to equal a year of instruction for all studies without consideration for the study design and analytic methods.

So, what is a layperson to do? My advice is to rely on the researcher’s interpretation. Any peer-reviewed study will include a detailed analysis and interpretation of study results. If a researcher has designed their study to use effect size to measure a given outcome – such as the previous year of instruction example – they will say so clearly in their discussion. Rather than trying to interpret the results of this statistic on their own, school and system leaders would be well served by reading the whole research article and considering the interpretative guidance provided by the author.

How should I use effect size in decision-making?

With all the nuances required to understand and interpret effect size, how is a decision-maker supposed to use it? Matthew Kraft provides five rules for consideration in his 2020 paper entitled “Interpreting Effect Sizes of Education Interventions”. They are:

1. Results from correlational studies presented as effect sizes are not causal effects.

2. The magnitude of effect sizes depends on what, when, and how outcomes were measured.

3. Subjective decisions about research design and analysis influence effect sizes.

4. Costs matter for evaluating the policy relevance of effect sizes.

5. Scalability matters for evaluating the policy relevance of effect sizes.

When factoring effect size into your decision-making process, these five rules provide decision-makers with some good considerations. First, Kraft reminds us that correlational studies cannot be used to derive causation. Education leaders must be careful not to draw this conclusion as it will almost certainly steer them in the wrong direction. Next, Kraft advises us to consider the makeup of the study itself and how the data was collected and analyzed. This harks back to my previous advice to trust the author’s interpretation. Finally, Kraft encourages us to consider both cost and scalability. I think this is vitally important for school-level decision-makers, who often have limited resources. If faced with a decision between two interventions, one with an effect size of d=0.23 and another with an effect size of d=0.89, the less informed leader may automatically select the latter intervention, relying solely on effect size to inform their decision. A more informed leader would take the time to understand the cost and staffing implications of deploying both interventions and select the former intervention because it could be more realistically deployed with fidelity in their school.

As I work with educators across the United States on evidence-informed decision-making, this example is an all too common one. Many education leaders have set arbitrary interpretive benchmarks in their minds and refuse to consider interventions that exist outside of their established effect size wishes. This is not the way. Effective and informed leaders know to consider the entirety of a study – and ideally multiple studies – when making decisions about teaching and learning. They consider their population, their unique setting, their available financial and staff resources, and the prior skill and experience of their team when selecting interventions.

A blog like this only has so much instructive power, so I would encourage you to keep reading and learning about effect size and how to use it. In addition to the books and articles already linked in this post, here are some other good places to look as you seek to expand your knowledge of statistical interpretation.

I hope you have found this article informative and helpful on your journey toward becoming an evidence-informed educator. Remember, next time you see an effect size being reported in a study, consider the whole scope of the study before determining how to interpret its outcomes, and trust the author’s interpretation. Stay away from those who would provide a single interpretative rule and ask hard questions of those trying to use effect size to sell you a product or service. Good luck on your journey, friends, and let me know how I can help.