Monday, November 5

Grouped Frequency Distributions

Problem: You've collected data, but there are too many unique values to make a frequency table practical.

Answer: Consider a grouped frequency distribution. A grouped frequency distribution splits your data into groups, called classes. A range of values defines each class, so grouped frequency tables are appropriate for both discrete and continuous data.


The most difficult aspect of using a grouped frequency distribution is often deciding upon the size of each class. Since your original data is "lost" (the table shows only how many values fall in each class, not the values themselves), it's important to partition (split) your data into classes that make finding patterns within the data easy. Typically, ten classes is considered a "good" number of groups.


Random.org is a pretty cool website where you can learn all about different facets of "randomness." But how random are their results? To test the ability of Random.org to generate a truly random list of 50 decimals, we'll analyze the results with a grouped frequency distribution. Using a standard frequency table is probably not a good idea, since the random decimals range from 0 to 1 and, at two decimal places, could take any of 101 distinct values (0.00 through 1.00), each potentially needing its own row.

Here are some "random" results:


0.41 0.95 0.35 0.17 0.93 0.32 0.86 0.92 0.20 0.98
0.51 0.37 0.23 0.80 0.07 0.51 0.14 0.04 0.03 0.09
0.11 0.50 0.85 0.30 0.16 0.97 0.06 0.17 0.38 0.69
0.58 0.59 0.82 0.95 0.22 0.76 0.16 0.23 0.74 0.40
0.23 0.91 0.48 0.98 0.93 0.84 0.27 0.32 0.64 0.52

To make a grouped frequency distribution, we'll need to first divide our data into classes. How many classes should we use? It depends on the experiment. For this experiment, our expected outcome is that decimals occur about equally often across the entire range from 0 to 1. So, splitting our data into two classes is probably not enough. Let's use ten classes, each with a class width of 0.1. Class width is the "range" of values covered by each group in your grouped frequency table.

Typically, class width is calculated with the following formula:

class width = ( largest data value - smallest data value ) / number of classes

For us, this would be:

class width = (0.98 - 0.03)/10 = 0.95/10 = 0.095
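As a quick check of the arithmetic above, here is a minimal Python sketch of the class width formula. The function name class_width is made up for this example; the data is the list of 50 decimals shown earlier.

```python
# The 50 decimals from the post, pasted as text and split into numbers.
raw = """
0.41 0.95 0.35 0.17 0.93 0.32 0.86 0.92 0.20 0.98
0.51 0.37 0.23 0.80 0.07 0.51 0.14 0.04 0.03 0.09
0.11 0.50 0.85 0.30 0.16 0.97 0.06 0.17 0.38 0.69
0.58 0.59 0.82 0.95 0.22 0.76 0.16 0.23 0.74 0.40
0.23 0.91 0.48 0.98 0.93 0.84 0.27 0.32 0.64 0.52
"""
data = [float(x) for x in raw.split()]


def class_width(values, num_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (max(values) - min(values)) / num_classes


width = class_width(data, 10)
# Round before printing to hide tiny floating-point noise.
print(round(width, 3))  # 0.095
```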



But we'll use 0.1 as the class width, mostly because it's much easier to work with: the ten classes then run from 0.0 to 0.1, 0.1 to 0.2, and so on up to 0.9 to 1.0.
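One way to tally the data into ten classes of width 0.1 is sketched below in Python. It rests on an assumption the post doesn't spell out: a value sitting exactly on a boundary (such as 0.20) is counted in the higher class, and 1.00 would go in the last class. The helper name class_index is made up for this example.

```python
from collections import Counter

raw = """
0.41 0.95 0.35 0.17 0.93 0.32 0.86 0.92 0.20 0.98
0.51 0.37 0.23 0.80 0.07 0.51 0.14 0.04 0.03 0.09
0.11 0.50 0.85 0.30 0.16 0.97 0.06 0.17 0.38 0.69
0.58 0.59 0.82 0.95 0.22 0.76 0.16 0.23 0.74 0.40
0.23 0.91 0.48 0.98 0.93 0.84 0.27 0.32 0.64 0.52
"""
NUM_CLASSES = 10

# Work in whole hundredths (0.41 -> 41) so class boundaries are exact integers
# and floating-point rounding can't push a value into the wrong class.
hundredths = [round(float(x) * 100) for x in raw.split()]


def class_index(h):
    # h // 10 is the tenths digit; min() keeps 1.00 in the last class.
    # Assumption: boundary values (0.20, 0.30, ...) go in the higher class.
    return min(h // 10, NUM_CLASSES - 1)


counts = Counter(class_index(h) for h in hundredths)
frequencies = [counts[i] for i in range(NUM_CLASSES)]

for i, f in enumerate(frequencies):
    print(f"{i / 10:.1f} to under {(i + 1) / 10:.1f}: {f}")

print(frequencies)  # [5, 6, 6, 6, 3, 6, 2, 2, 5, 9]
```

With a different boundary convention (say, classes of 0.01-0.10, 0.11-0.20, and so on), a few values would shift to a neighboring class, so the counts could differ slightly.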

So, are the results really "random"? The observed outcome from this experiment seems to say "NO." Why? Well, since there are 10 classes and 50 pieces of data, evenly distributed data would place about 5 numbers in each class. However, some intervals have only 2 numbers, while others have as many as 9.
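The comparison between observed and expected counts can be sketched in Python as follows. This is only an illustration: it assumes ten classes of width 0.1 starting at 0, with boundary values counted in the higher class, and the variable names are made up for this example.

```python
raw = """
0.41 0.95 0.35 0.17 0.93 0.32 0.86 0.92 0.20 0.98
0.51 0.37 0.23 0.80 0.07 0.51 0.14 0.04 0.03 0.09
0.11 0.50 0.85 0.30 0.16 0.97 0.06 0.17 0.38 0.69
0.58 0.59 0.82 0.95 0.22 0.76 0.16 0.23 0.74 0.40
0.23 0.91 0.48 0.98 0.93 0.84 0.27 0.32 0.64 0.52
"""
# Whole hundredths (0.41 -> 41) keep the class boundaries exact.
hundredths = [round(float(x) * 100) for x in raw.split()]

# Observed count per class (assumed classes: 0.0-0.1, 0.1-0.2, ..., 0.9-1.0).
frequencies = [0] * 10
for h in hundredths:
    frequencies[min(h // 10, 9)] += 1

# Expected count per class if the data were spread perfectly evenly.
expected = len(hundredths) / len(frequencies)  # 50 / 10

# How far each class sits above (+) or below (-) the expected count.
deviations = [f - expected for f in frequencies]

print(expected)                            # 5.0
print(min(frequencies), max(frequencies))  # 2 9
print(deviations)
```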

But is testing only 50 numbers a fair test? Why not?
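One way to explore this question is to run the same experiment with a different random source and a larger sample. The sketch below uses Python's built-in random module (not Random.org) to draw numbers between 0 and 1, sorts them into ten classes, and reports the smallest and largest class counts for 50 and for 5000 draws. The function name class_counts and the seed value are arbitrary choices for this example.

```python
import random


def class_counts(n, num_classes, rng):
    """Draw n uniform numbers in [0, 1) and count how many land in each class."""
    counts = [0] * num_classes
    for _ in range(n):
        x = rng.random()
        # int(x * num_classes) is the class number; min() guards the top edge.
        counts[min(int(x * num_classes), num_classes - 1)] += 1
    return counts


rng = random.Random(5)  # fixed seed so the run is repeatable
for n in (50, 5000):
    counts = class_counts(n, 10, rng)
    expected = n / 10
    # Gap between the fullest and emptiest class, relative to the expected count.
    spread = (max(counts) - min(counts)) / expected
    print(n, "draws: min", min(counts), "max", max(counts), "relative spread", round(spread, 2))
```

Try a few different seeds and compare how uneven the class counts look at each sample size.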