The Distribution Analysis Tool - Technical Directions

The Distribution Analysis Tool (DAT) is a simple spreadsheet analyzer: you upload a spreadsheet file, such as .xls, .xlsx, or .csv, and select a column to analyze. The tool calculates summary statistics (mean, median, mode, minimum, maximum, range, and standard deviation) and generates a boxplot and a histogram for the selected column. It is an educational tool and is not suitable for analyzing sensitive or confidential data. Used regularly, it can save time and promote the use of data to improve instruction. This article provides detailed technical directions for using the tool.

About the Tool

The Distribution Analysis Tool is a Shiny web application developed by Matthew Courtney in 2023. It uses the R statistical programming language to read data from a spreadsheet and create a summary of the column you select. The DAT is hosted on shinyapps.io.

Preparing Your Data

  1. Make sure your data is in a spreadsheet file format, such as .xls, .xlsx, or .csv.

  2. Ensure that your spreadsheet file does not contain personally identifiable information or any other sensitive or confidential data.

  3. Ensure that the column you want to analyze contains numeric data. If it contains text or dates, convert those entries to numbers before uploading; for example, replace a percentage stored as text with its numeric value.

  4. Remove any completely empty rows from your data to avoid errors during analysis. Empty cells within otherwise complete rows are missing values; see step 6.

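  5. Check for and remove duplicate records, such as a row that was entered twice, so that they do not distort your analysis. Repeated values within a column are not duplicates in this sense; for example, two students with the same score should both remain.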

  6. If your data contains missing values, decide how to handle them before analysis. One option is to remove the rows with missing values. Another is to fill in each missing value with a reasonable estimate, such as the mean or median of the column; note that filling in values this way tends to make the data look less variable than it is.

  7. If you have a large dataset, consider reducing its size to speed up the analysis. Keep only the columns and rows that are relevant to your analysis.

  8. Validate your data before analysis. Check for errors and inconsistencies, such as impossible values (for example, a score above the maximum possible) or values recorded in different units, and correct any issues that you find.

  9. Finally, when uploading your data into the tool, double-check that you have selected the correct file and column, and that the data is clean and prepared for analysis.
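
The preparation steps above can also be carried out in code rather than by hand. The following is a minimal Python sketch using only the standard library; it is not the tool's own R code. The helper name `prepare_column`, the sample data, and the choice of median fill-in are assumptions made for illustration.

```python
import csv
import io
import statistics


def prepare_column(rows, column, fill_missing=False):
    """Return the numeric values of `column` from a list of row dicts.

    Hypothetical helper for illustration: skips completely empty rows and
    exact duplicate rows, then either drops blank or non-numeric cells or
    fills them with the median of the valid values.
    """
    seen = set()
    cells = []
    for row in rows:
        key = tuple(str(v or "") for v in row.values())
        # Skip rows where every cell is blank, and rows already seen.
        if all(cell.strip() == "" for cell in key) or key in seen:
            continue
        seen.add(key)
        cells.append(str(row.get(column) or "").strip())

    values, missing = [], 0
    for cell in cells:
        try:
            values.append(float(cell))
        except ValueError:
            missing += 1  # blank or non-numeric cell

    if fill_missing and values:
        # Filled-in values are appended at the end; row order is not kept,
        # which does not matter for summary statistics.
        values.extend([statistics.median(values)] * missing)
    return values


# Made-up sample data: one duplicate row, one blank row, one missing score.
sample = io.StringIO(
    "student,score\n"
    "A,80\n"
    "B,90\n"
    "B,90\n"
    ",\n"
    "C,\n"
    "D,70\n"
)
rows = list(csv.DictReader(sample))
dropped = prepare_column(rows, "score")
filled = prepare_column(rows, "score", fill_missing=True)
print(dropped)
print(filled)
```

With this sample, dropping the missing score leaves three values, while filling it in adds a fourth value equal to the median of the other three. Which option is appropriate depends on your data.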

Using the DAT

  1. Click the "Choose a spreadsheet file" button and select the file you want to analyze. The tool accepts spreadsheet files, including .xls, .xlsx, and .csv files.

  2. Once the file is uploaded, a dropdown menu will appear. Select the column you want to analyze from the dropdown menu. Please ensure that the column you select contains numerical data.

  3. The tool will automatically generate the following summary statistics for the selected column:

    1. Mean: This is the average value of the selected column. It is calculated by adding up all the values in the column and dividing by the number of values.

    2. Median: This is the middle value in the selected column when the values are sorted in order. If the number of values in the column is even, the median is the average of the two middle values.

    3. Mode: This is the most frequently occurring value in the selected column. If there is more than one value with the highest frequency, the mode will display as "NA".

    4. Min: This is the smallest value in the selected column.

    5. Max: This is the largest value in the selected column.

    6. Range: This is the difference between the maximum and minimum values in the selected column.

    7. SD: This is the standard deviation of the selected column. It measures how far the values typically fall from the mean; a larger standard deviation means the data is more spread out.

  4. The tool will also generate a boxplot for the selected column. A boxplot is a graphical summary of the distribution. The box spans the middle 50% of the data (from the first quartile to the third quartile), and the horizontal line within the box marks the median. The whiskers typically extend to the smallest and largest values that are not outliers, and any points plotted beyond the whiskers are considered outliers.

  5. The tool will also generate a histogram for the selected column. A histogram is a graphical representation of the frequency distribution of the data. The x-axis is divided into intervals (bins) that cover the range of values in the column, and the y-axis shows how many values fall into each bin.

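  6. To interpret the summary statistics, use them to describe the distribution of the data. If the mean and median are close in value, the distribution is likely roughly symmetric, although this alone does not show that the data is normally distributed. If the mean is noticeably larger than the median, the data is likely skewed to the right; if it is noticeably smaller, the data is likely skewed to the left. A mode that is far from the mean and median can also point to skew. The range and standard deviation help you understand the spread or variability of the data.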

  7. To interpret the boxplot, use it to identify outliers and patterns in the data. Outliers may represent errors in the data or unusual observations. If the box is small and the whiskers are long, most values are clustered near the median while the rest are spread over a wide range. If the median line sits off-center in the box, or one whisker is much longer than the other, the data is likely skewed.

  8. To interpret the histogram, use it to identify the shape of the distribution. If the histogram is roughly symmetric and bell-shaped, the data may be approximately normally distributed. If the bars trail off in a long tail on one side, the data is skewed in the direction of that tail; for example, a long tail on the right means the data is skewed to the right. The height of each bar represents the number of values that fall within that bar's interval.

  9. Please note that this tool is for educational purposes only and should not be used for sensitive or confidential data, including personally identifiable information. If you are uncertain whether your data is appropriate for analysis with this tool, please consult with an expert in the field.
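
To make the definitions in steps 3 to 5 concrete, the calculations can be sketched in a few lines of code. This is a minimal Python sketch using only the standard library, not the tool's own R code, so its results may differ slightly from what the tool displays. The function names and sample scores are hypothetical. Three choices are assumptions: the sample standard deviation (n - 1 denominator), the "inclusive" quartile method, and the common 1.5 x IQR rule for flagging outliers.

```python
import statistics
from collections import Counter


def summarize(values):
    """Summary statistics as described in step 3 (illustrative only)."""
    modes = statistics.multimode(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Mirror the documented behavior: report NA when the mode is tied.
        "mode": modes[0] if len(modes) == 1 else "NA",
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(values),
    }


def boxplot_outliers(values):
    """Return points outside the 1.5 * IQR fences (a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [start, start + bin_width).

    Bins that contain no values are omitted from the result.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


scores = [55, 70, 70, 75, 80, 85, 90, 95, 100, 180]  # made-up data
stats = summarize(scores)
print(stats)
print(boxplot_outliers(scores))
print(histogram_counts(scores, 25))
```

For this made-up data the mean (90) is larger than the median (82.5), which suggests a skew to the right. The single value of 180 is flagged as an outlier and sits alone in the highest histogram bin.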
