The Distribution Analysis Tool
The Distribution Analysis Tool (DAT) is designed to allow teachers to easily and quickly analyze their student data. With just a few clicks, the DAT will automatically read a column of data from a spreadsheet, calculate seven descriptive statistics, and create two graphs. When used regularly, this tool will help save time and promote the use of data to improve instruction. This article will provide detailed technical directions for using the tool.
About the Tool
The DAT is a Shiny web application developed by Matthew Courtney in 2020. It uses the R statistical programming language to read data from a spreadsheet and create a summary of any column. The DAT is hosted on the shinyapps.io server.
Preparing your Data
Prepare your data carefully before uploading it to the DAT. The DAT will return accurate calculations and visualizations for whatever data you upload, but it cannot correct mistakes in your original data worksheet. If you are pulling your data from a standardized gradebook or testing system, it is likely ready to go with very little preparation.
When preparing your data, you should ensure that you follow the principles of tidy data. This means that each column contains a variable (like a test score) and each row contains an observation (like a student). You should also ensure that the columns you wish to examine contain numerical values. For example, a test score of eighty-nine percent should be recorded in the column as 89 or 0.89 and not 89%.
You should also ensure that every score in a column uses the same format. In the previous test score example, each score should be either a whole number, like 89, or a decimal, like 0.89, but never a mix of the two. The DAT cannot tell the difference between these formats, so a column that mixes them will produce incorrect results.
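The formatting checks above can be sketched in a few lines of code. The example below is written in Python for illustration only; it is not part of the DAT, and the function names to_number and find_mixed_scale are hypothetical. It strips a trailing percent sign and flags a column that appears to mix 0 to 1 decimals with 0 to 100 scores. The mixed-scale check is a rough heuristic and assumes that no genuine score on the 0 to 100 scale is 1 or lower.

```python
def to_number(raw):
    """Convert a raw cell such as '89%', ' 89 ', or '0.89' to a float."""
    text = str(raw).strip().rstrip("%").strip()
    return float(text)


def find_mixed_scale(values):
    """Return True when a column seems to mix 0-1 decimals with 0-100 scores.

    Heuristic only: it assumes no real 0-100 score is 1 or lower.
    """
    has_proportion = any(0 < v <= 1 for v in values)
    has_percent = any(v > 1 for v in values)
    return has_proportion and has_percent


# Hypothetical column with inconsistent formatting
raw_scores = ["89%", "0.92", "75", " 64 "]
scores = [to_number(cell) for cell in raw_scores]
mixed = find_mixed_scale(scores)

print(scores)
print("Mixed scales found:", mixed)
```

If the check reports a mixed column, decide on one format in the original worksheet and correct the odd values there before uploading.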
Finally, you should make sure that each column has a header that is easily recognizable. You will need this to be able to accurately select the correct column in the DAT.
The DAT will examine whichever column you tell it to, so you do not have to remove columns with text or other variables to use the DAT. Just ensure that the column you want to examine is properly formatted. Having said that, you should NEVER upload personally identifiable information for yourself or your students to the DAT or anywhere else on the internet. While the information uploaded into the DAT is not permanently stored, personally identifiable data is always vulnerable to cyber-attack.
When your data is clean and ready, save the file as a .CSV file. CSV stands for comma-separated values. This is a common file format for transferring large amounts of data quickly and efficiently. The DAT will read only .CSV files.
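Most spreadsheet programs can save a .CSV file directly from their save or export menu. For readers who prepare data with a script instead, here is a minimal Python sketch, not part of the DAT, that writes a .CSV file while leaving out an identifying column. The column names (student_name, unit_test, final_exam) and the function write_deidentified_csv are hypothetical.

```python
import csv
import io


def write_deidentified_csv(rows, drop_columns, out_file):
    """Write rows (a list of dicts) as CSV, leaving out the named columns."""
    keep = [name for name in rows[0] if name not in drop_columns]
    writer = csv.DictWriter(
        out_file,
        fieldnames=keep,
        extrasaction="ignore",  # silently skip the dropped columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical gradebook rows
rows = [
    {"student_name": "Student A", "unit_test": 89, "final_exam": 92},
    {"student_name": "Student B", "unit_test": 75, "final_exam": 81},
]

# An in-memory buffer stands in for a real file in this sketch
buffer = io.StringIO()
write_deidentified_csv(rows, {"student_name"}, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, replace the buffer with something like open("scores.csv", "w", newline="").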
Using the DAT
Using the DAT to analyze your data is simple. First, upload your .CSV file by selecting the “Browse” button in the grey box. This will open a window that will allow you to find the file. Select the file and click “Open”.
The DAT will automatically upload your spreadsheet. This process is usually quick, but the upload time depends on the size of your file. An upload progress bar will light up under the browse box while the file uploads.
When the DAT has completed its upload, it will automatically display the summary information of the first column of your spreadsheet in the white space. You can change which column is being reviewed by selecting it from the drop-down menu in the grey box. The DAT will automatically update the statistics and graphs for whichever variable you select.
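The idea of picking one column by its header can be sketched as follows. This is an illustrative Python example, not the DAT's own R code, and the function read_column and the sample headers are hypothetical. It also shows why a recognizable header matters: the header is the only handle for asking for a column.

```python
import csv
import io


def read_column(csv_file, header):
    """Return the numeric values found under one header of a CSV file."""
    reader = csv.DictReader(csv_file)
    if header not in (reader.fieldnames or []):
        raise KeyError(f"No column named {header!r}")
    values = []
    for row in reader:
        cell = (row[header] or "").strip()
        if cell:  # skip blank cells rather than guessing a value
            values.append(float(cell))
    return values


# Hypothetical file contents; the last student has no final exam score
sample = io.StringIO("unit_test,final_exam\n89,92\n75,81\n64,\n")

unit_test = read_column(sample, "unit_test")
sample.seek(0)  # rewind before reading a second column
final_exam = read_column(sample, "final_exam")

print(unit_test)
print(final_exam)
```

Skipping blank cells is a design choice made for this sketch; how the DAT itself treats blank cells is not stated in this article.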
All of the information presented by the DAT is static – meaning that you cannot change or customize it. You can, however, copy and paste the information into a document or slideshow presentation to easily share the results with your colleagues. You can also save a graph by right-clicking on it and selecting “Save Image As” from the menu.
Interpreting the Results
The DAT will return seven summary statistics and two visualizations to help you interpret the meaning of your data set. While the DAT will quickly and accurately summarize your student data, it will not tell you what that data means. It is up to you to apply local context and your own background information about your students to derive meaning from the data. The DAT will present the following outputs:
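Mean – The mean is the average of your distribution: the sum of all scores divided by the number of scores. It is a measure of central tendency that allows you to summarize a distribution with a single number.
Median – The median is the middle number in a distribution. When you compare it to the mean, the median can help you see if your data is skewed. A mean that is noticeably higher or lower than the median suggests that a few unusually high or low scores are pulling the average.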
Mode – The mode is the number that shows up most often within a distribution.
Standard Deviation – The standard deviation is a measure that tells you how spread out your data is. The smaller the standard deviation, the closer together your students scored.
Minimum – The minimum is the lowest number in a distribution.
Maximum – The maximum is the highest number in a distribution.
Range – The range is the difference between the highest number and the lowest number.
Histogram – A histogram is a visualization that helps you see how your student data is clustered. Histograms break your students down into groups, called bins. The height of the bar tells you how many students scored within a given bin.
Boxplot – A boxplot is a visualization that helps you see how your scores fell within the distribution. The bold line in the middle is the median. The top half of the box covers the quartile of scores just above the median, while the bottom half covers the quartile just below it. The whiskers extend to the highest and lowest scores that are not outliers. Any outlier scores are shown as small dots above or below the whiskers.
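To make these definitions concrete, the sketch below computes the seven statistics, histogram bin counts, and the numbers behind a boxplot for a small made-up set of scores. It is written in Python for illustration and is not the DAT's own code, so several choices are assumptions: it uses the sample standard deviation (dividing by n - 1, as R's sd() function does), a bin width of 10, and the common 1.5 × interquartile range rule for outliers (the default in R's boxplot). The DAT may make different choices, and the function names are hypothetical.

```python
import statistics


def summarize(scores):
    """Return the seven summary statistics described above."""
    lowest, highest = min(scores), max(scores)
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.multimode(scores),  # a list, since modes can tie
        "standard_deviation": statistics.stdev(scores),  # sample SD (n - 1)
        "minimum": lowest,
        "maximum": highest,
        "range": highest - lowest,
    }


def histogram_counts(scores, bin_width):
    """Count scores per bin; each bin covers start up to start + bin_width."""
    counts = {}
    for score in scores:
        start = (score // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))  # empty bins are simply absent


def box_summary(scores):
    """Return quartiles, whisker ends, and outliers (1.5 x IQR rule)."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    inside = [s for s in scores if q1 - fence <= s <= q3 + fence]
    outliers = [s for s in scores if s < q1 - fence or s > q3 + fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical test scores with one unusually low result
scores = [30, 70, 75, 80, 80, 85, 90, 95]
summary = summarize(scores)
bins = histogram_counts(scores, 10)
box = box_summary(scores)

for name, value in summary.items():
    print(name, value)
print("bins", bins)
print("box", box)
```

In this made-up example the mean (75.625) falls below the median (80) because the single score of 30 pulls the average down. Under the 1.5 × interquartile range rule, that same score is drawn as an outlier dot, and the lower whisker stops at 70.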