Research Design and Statistical Consulting
George M. Diekhoff, Ph.D.

Avoid Collecting Data Using Categories

I work a lot with data that my clients have collected on their own, that is, without the benefit of guidance during the research design phase of the project. That’s great, and I’m happy to assist those who already have their data, but it does mean that sometimes the data don’t have all the qualities we might like. 

Here’s an example. We often collect data on a variety of demographic variables, if only so that we can later describe the characteristics of our samples. Age is one of the demographic variables that we commonly collect, and there are good ways and bad ways of collecting this information. Too often, folks pursue one of these “bad” approaches. 

Take this for example: 

What is your age? (check one)
18 and younger _____
19-24 _____
25-30 _____
31-36 _____
37-42 _____
etc. 

Suppose you get the following frequency counts in each category: 

What is your age? (check one)
18 and younger f = 15
19-24 f = 11
25-30 f = 23
31-36 f = 19
37-42 f = 46
etc. 

Now it’s time to do some simple descriptive statistics. What’s your sample’s mean age? What’s the standard deviation? Do you see the problem? Because you’ve collected age data in categories, you don’t really know anyone’s exact age. And because you don’t know the ages of your participants, you can’t calculate the mean, the standard deviation, or any other statistic that requires each participant’s actual age.
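To make the problem concrete, here is a minimal Python sketch that takes the frequency counts above and asks what the sample mean could be. Since no one's exact age is known, the most we can say is that the mean lies between two limits: the lowest value (everyone at the bottom of their category) and the highest (everyone at the top). The sketch assumes an adults-only sample, so the open-ended "18 and younger" category is treated as exactly 18; the lower bound `YOUNGEST_POSSIBLE` and the function `mean_bounds` are hypothetical names introduced for illustration. The "etc." rows are left out because their counts aren't given.

```python
def mean_bounds(categories):
    """Return (n, lowest possible mean, highest possible mean).

    Each category is (lowest possible age, highest possible age, frequency).
    The lowest mean arises if everyone sits at the bottom of their category;
    the highest mean arises if everyone sits at the top.
    """
    n = sum(f for _, _, f in categories)
    low = sum(lo * f for lo, _, f in categories) / n
    high = sum(hi * f for _, hi, f in categories) / n
    return n, low, high


# Frequencies from the example above.
# ASSUMPTION: adults-only sample, so "18 and younger" is treated as exactly 18.
YOUNGEST_POSSIBLE = 18  # hypothetical lower bound for the open-ended category
age_categories = [
    (YOUNGEST_POSSIBLE, 18, 15),  # 18 and younger
    (19, 24, 11),
    (25, 30, 23),
    (31, 36, 19),
    (37, 42, 46),
]

n, low, high = mean_bounds(age_categories)
print(f"n = {n}; mean age lies somewhere between {low:.2f} and {high:.2f}")
# prints "n = 114; mean age lies somewhere between 29.34 and 33.68"
```

Even with the generous assumption about the youngest category, the mean could fall anywhere in a range of more than four years, and no single value for the standard deviation can be recovered either.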
