# Significance vs. Magnitude: How to Interpret Statistics for Decision Making

Matthew B. Courtney, Ed.D. | May 2024 | 5 Minute Read

When using research to examine continuous improvement efforts, many educators get stuck when it comes to interpreting the results of statistical tests. Don’t feel bad – most of us only had one statistics class in college, and it was probably focused solely on interpreting the results of standardized tests. While the lessons you learned in college are valuable, it can be challenging to apply those tests-and-measurements rules to other contexts. In this article, I want to explore the ideas of statistical significance and magnitude and give you some tips for using these measures to inform your continuous improvement decision making.

## Statistical Significance

Statistical significance is the first measure often reported by researchers deploying a technique called null-hypothesis testing, or NHT. Broadly speaking, in an NHT study, the researcher has two or more groups of participants who receive different experiences. At the end, the researcher uses statistical tests to determine whether the groups had statistically different outcomes. Common NHTs include the t-test and the ANOVA. When the groups are formed through random assignment, these tests can help researchers understand the impact of specific activities.
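To make this concrete, here is a minimal sketch of an NHT in Python using the SciPy library. The scores are entirely made up for illustration – imagine a reading intervention group and a comparison group:

```python
# Hypothetical example: an independent-samples t-test comparing
# post-test reading scores for two groups (requires scipy).
from scipy import stats

# Made-up scores for illustration only
intervention_group = [78, 85, 90, 72, 88, 76, 95, 81]
comparison_group = [70, 75, 80, 68, 74, 72, 79, 71]

# ttest_ind returns the t statistic and the two-sided p-value
t_stat, p_value = stats.ttest_ind(intervention_group, comparison_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With these particular numbers the p-value falls below 0.05, so the difference between the groups would be reported as statistically significant.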

Statistical significance is reported using the p-value. The p-value is the probability of seeing a difference at least as large as the one observed if, in reality, there were no difference between the groups – in plainer terms, it estimates how likely the outcome is to be the product of random chance alone. The p-value is an oddly controversial value, and possibly one that many scientists don’t truly understand. If you’re experiencing statistical imposter syndrome, take comfort in the fact that the nuances of the p-value continue to baffle those who use it every day.

When it comes to interpreting the p-value, there is one commonly accepted rule: the lower the p-value, the stronger the statistical significance. It is commonly accepted that the results of a statistical test are significant if the p-value is below 0.05 and strongly significant if it is below 0.01 or 0.001. You will almost always see the p-value reported alongside statistical outcomes in educational research papers. Memorize these three thresholds and you will be well on your way to understanding the results of statistical tests.
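If it helps to see those thresholds spelled out in code, here is a small, hypothetical helper that labels a p-value against the common cut points:

```python
# Hypothetical helper: labelling a p-value against the commonly
# accepted 0.05 / 0.01 / 0.001 thresholds.
def describe_p_value(p: float) -> str:
    if p < 0.001:
        return "strongly significant (p < 0.001)"
    if p < 0.01:
        return "strongly significant (p < 0.01)"
    if p < 0.05:
        return "significant (p < 0.05)"
    return "not significant"

print(describe_p_value(0.03))   # significant (p < 0.05)
print(describe_p_value(0.20))   # not significant
```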

## Statistical Magnitude

In addition to statistical significance, many researchers will also report on the magnitude of the difference between their groups. Magnitude is sometimes called “practical significance” because it quantifies the difference between two groups in real numbers. The most common measure of magnitude is effect size, usually using a formula called Cohen’s d.

Effect size is calculated by dividing the difference between the means by the pooled standard deviation. If that formula sounds complicated, never fear. You can upload your data into the tools found on this website to instantly calculate the effect size, as well as other statistical tests, for your distributions of scores.
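For readers who want to see the formula in action, here is a sketch of the Cohen’s d calculation in plain Python, using made-up scores for two hypothetical groups:

```python
# Hypothetical example: computing Cohen's d by hand.
# The scores below are made up for illustration only.
from statistics import mean, stdev
from math import sqrt

group_one = [78, 85, 90, 72, 88, 76, 95, 81]
group_two = [70, 75, 80, 68, 74, 72, 79, 71]

n1, n2 = len(group_one), len(group_two)
s1, s2 = stdev(group_one), stdev(group_two)

# Pooled standard deviation, weighted by each group's degrees of freedom
pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Cohen's d: the difference between the means divided by the pooled SD
d = (mean(group_one) - mean(group_two)) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```

For these particular numbers, d comes out to about 1.52 – a very large effect, as the next section explains.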

To put effect size into layman’s terms, the formula provides a standardized way to say that the average of group one is this far away from the average of group two. Effect size can also be used to examine the difference between pre-test and post-test scores. In that context, it provides a standardized way to say the group experienced this much change between test administrations.

Unlike the p-value, a higher effect size indicates a stronger effect. It is commonly accepted that an effect size is small when d=0.20, medium when d=0.50, and large when d=0.80. Of course, you will rarely have an effect size that is exactly d=0.80, so these numbers provide a rough guide to help you think about effect size results on a spectrum.
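Those rough benchmarks can be sketched as a small, hypothetical helper function (the labels and cut points below follow the conventional guide described above, treating values under 0.20 as negligible):

```python
# Hypothetical helper: mapping Cohen's d onto the rough benchmarks
# small ~ 0.20, medium ~ 0.50, large ~ 0.80.
def describe_effect_size(d: float) -> str:
    d = abs(d)  # the sign only tells you which group scored higher
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "medium"
    return "large"

print(describe_effect_size(0.35))  # small
print(describe_effect_size(0.90))  # large
```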

## What does all this really mean?

When using research to make decisions about new interventions, programs, or strategies for your school, properly interpreting the results of statistical testing is an important element. Generally speaking, education leaders will want to focus their attention on research in which the results were statistically significant. In fact, the definition of an evidence-based practice under the Every Student Succeeds Act (ESSA) insists that a practice is not evidence-based unless it is supported by statistically significant outcomes. If you are using research for ESSA compliance, you must check the p-value and reject studies with a p-value greater than 0.05.

In my opinion, this rule is a little limiting. There are many things that can impact the statistical significance of a study that have nothing to do with the impact of the intervention. For example, studies with small samples or samples that lack variability and diversity may result in skewed p-values. While these studies may not be ESSA compliant, they may still offer value to the education leader.

Whether you call them measures of magnitude, practical significance, or simply effect size, these values can help education leaders understand how much change they can expect to see in their schools if they deploy a new strategy. The higher the effect size of a study, the greater the impact you can expect to see in your students – assuming you implement with fidelity.

Having said all that, these measures of statistical and practical significance are only one factor in the decision making process. Education leaders must consider the cost of a new strategy, implementation factors such as time and training, monitoring procedures, how a strategy meshes with the other efforts already underway, and whether or not a new strategy aligns with the school’s culture and climate. An education leader who chooses an intervention based solely on the effect size of a study is making just as poor a decision as the leader who doesn’t consider the effect size at all.