Research Design and Statistical Consulting
George M. Diekhoff, Ph.D.

Another Way of Spotting Low-Quality Survey Data: Identifying Flat-Liners and Random Responders

A variety of methods are available to help identify survey respondents who, for whatever reasons, have not provided thoughtful responses to the survey questions. This entry describes one such approach based on calculating a measure of the variability of each respondent’s ratings.
Suppose that a series of rating scales all measure the same construct and that they all measure it in the same direction (i.e., items that need to be reverse-scored have been reversed). A set of items with a Cronbach’s alpha of .80 or higher would typically fit this description. In that case, respondents who possess a certain amount of the attribute should show some variability in their ratings across those items, but not too much and not too little. Too little variability suggests that the respondent isn’t thinking carefully enough about subtle differences between the items. Too much variability suggests that the respondent is responding almost at random.
Look at the following ratings on a 1-7 scale for five items intended to measure procrastination. Low scores indicate low levels of procrastination and high scores indicate high levels. Each row is one respondent and each column is one item. Rows 1-3 come from respondents who were careful in rating their tendency to procrastinate; rows 4-6 come from respondents who gave honest responses but did not think very carefully about the subtle differences from question to question; and rows 7-9 come from respondents who responded randomly or in some pattern, perhaps just wanting to finish the survey quickly.
5 6 5 5 7
1 2 2 1 3
3 4 3 3 5
6 6 6 5 6
1 1 1 1 2
4 4 4 4 4
1 7 6 4 7
1 3 5 7 1
1 7 1 7 1
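If the variables (items) are named Item1 through Item5, either of the following two equivalent COMPUTE commands will calculate the standard deviation of each participant’s ratings. Only one of them is needed. The second form uses the TO keyword, which requires that Item1 through Item5 be adjacent in the data file. The EXECUTE command makes SPSS carry out the pending transformation.
COMPUTE StdDev = SD(Item1, Item2, Item3, Item4, Item5).
COMPUTE StdDev = SD(Item1 TO Item5).
EXECUTE.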
The following standard deviations would result for the nine respondents shown, in row order: .89, .84, .89, .45, .45, .00, 2.55, 2.61, 3.29. These are sample standard deviations, calculated with n - 1 in the denominator.
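As a cross-check outside SPSS, these figures can be reproduced with a minimal Python sketch. It assumes the nine rows are typed in by hand as a list named ratings (a name chosen here purely for illustration) and uses the standard library function statistics.stdev, which applies the sample (n - 1) formula.

```python
from statistics import stdev

# The nine example respondents, one row each, five items per row.
# "ratings" is an illustrative name, not something from the SPSS file.
ratings = [
    [5, 6, 5, 5, 7],
    [1, 2, 2, 1, 3],
    [3, 4, 3, 3, 5],
    [6, 6, 6, 5, 6],
    [1, 1, 1, 1, 2],
    [4, 4, 4, 4, 4],
    [1, 7, 6, 4, 7],
    [1, 3, 5, 7, 1],
    [1, 7, 1, 7, 1],
]

# Sample standard deviation (n - 1 denominator) of each respondent's ratings,
# rounded to two decimals.
row_sds = [round(stdev(row), 2) for row in ratings]

for row_number, sd in enumerate(row_sds, start=1):
    print(f"Row {row_number}: {sd:.2f}")
```

The printed values should match the list above, row for row.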
You can see that standard deviations that are either too low (rows 4, 5, 6) or too high (rows 7, 8, 9) are indicative of low-quality data. In any given data set, what constitutes “too low” or “too high” needs to be determined empirically by examining a frequency distribution of the standard deviation values and seeing how various standard deviations match up with different response patterns.
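One way to carry out this kind of empirical check is sketched below in Python, using the nine example rows. The cutoffs LOW_CUTOFF and HIGH_CUTOFF are hypothetical values picked only because they happen to separate these nine rows; they are not recommended thresholds, and the names ratings, label, and labels are likewise illustrative. In a real data set the cutoffs would be chosen after inspecting the frequency distribution and the response patterns behind it.

```python
from collections import Counter
from statistics import stdev

# The nine example respondents (illustrative name "ratings").
ratings = [
    [5, 6, 5, 5, 7],
    [1, 2, 2, 1, 3],
    [3, 4, 3, 3, 5],
    [6, 6, 6, 5, 6],
    [1, 1, 1, 1, 2],
    [4, 4, 4, 4, 4],
    [1, 7, 6, 4, 7],
    [1, 3, 5, 7, 1],
    [1, 7, 1, 7, 1],
]

# Hypothetical cutoffs, chosen only to fit this small example.
LOW_CUTOFF = 0.5
HIGH_CUTOFF = 2.0

# Sample standard deviation per respondent, rounded to two decimals.
sds = [round(stdev(row), 2) for row in ratings]

# Frequency distribution of the standard deviation values.
freq = Counter(sds)
print("SD    count")
for value in sorted(freq):
    print(f"{value:.2f}  {freq[value]}")


def label(sd):
    """Tag a respondent based on the illustrative cutoffs above."""
    if sd < LOW_CUTOFF:
        return "flat-liner"
    if sd > HIGH_CUTOFF:
        return "possible random responder"
    return "ok"


labels = [label(sd) for sd in sds]

# List each row next to its SD and tag so the patterns can be inspected.
for row_number, (row, sd, tag) in enumerate(zip(ratings, sds, labels), start=1):
    print(row_number, row, f"{sd:.2f}", tag)
```

Printing each row beside its standard deviation makes it easier to judge whether a given cutoff is catching the response patterns it is meant to catch.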