Have you ever needed a measure of variability for a nominal scale variable? I did recently. I needed a way of measuring the diversity of corporate boards of directors. (I’ll give examples later.) Statistics texts focus on measuring the variability of continuous variables (e.g., range, inter-quartile range, variance, standard deviation), but are remarkably silent on the matter of measuring variability in categorical variables. Kader and Perry (2007), however, provide one such measure, with the unlikely name “coefficient of unalikeability,” abbreviated with the symbol u2. (The full reference is Kader, G. D., & Perry, M. (2007). Variability for categorical variables. Journal of Statistics Education, 15(2), 1-16.) The logic behind the unalikeability statistic takes them 16 pages to explain, but we can cut to the computational formula and see how it works with some examples.
Here are some data to illustrate the unalikeability statistic. The variable is Religious Preference, with four categories: Christian, Jewish, Muslim, Hindu. Shown next are frequency distributions for two samples, one with no variability on the variable, and the second with maximum possible variability.
Sample 1: shows no variability or diversity in religious preference, as every case falls in the same category
N = 8
Sample 2: shows maximum possible variability or diversity, as cases are evenly distributed across the categories of the variable
N = 8 (two cases in each of the four categories)
Next, here’s the formula for the unalikeability statistic:
u2 = 1 – Sum of the squared proportions
In words: (1) square the proportions associated with each category of the variable; (2) add these squared proportions; (3) subtract that sum from 1.
For the first distribution: u2 = 1 – (1^2 + 0^2 + 0^2 + 0^2) = 0
For the second distribution: u2 = 1 – (.25^2 + .25^2 + .25^2 + .25^2) = .75
You can see that where there is no variability or diversity, u2 takes on a value of 0. With a more variable or diverse distribution, the value of u2 increases appropriately to reflect that greater variability.
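For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the three steps described above. The function name unalikeability and the choice to pass in a list of category counts are my own assumptions for illustration; they don't come from Kader and Perry's paper.

```python
def unalikeability(counts):
    """Return u2 = 1 - (sum of squared category proportions).

    `counts` is a list of category frequencies, e.g. [2, 2, 2, 2].
    (Hypothetical helper; the name and signature are illustrative.)
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must contain at least one observation")
    # Steps 1 and 2: square each category's proportion and add them up.
    sum_of_squares = sum((c / n) ** 2 for c in counts)
    # Step 3: subtract that sum from 1.
    return 1.0 - sum_of_squares


# Sample 1: all 8 cases in one category; Sample 2: 2 cases per category.
u2_sample_1 = unalikeability([8, 0, 0, 0])
u2_sample_2 = unalikeability([2, 2, 2, 2])
print(u2_sample_1, u2_sample_2)
```

The two calls reproduce the hand calculations for the two distributions: 0 for the sample with no diversity and .75 for the evenly spread sample.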
The only thing that’s annoying about the unalikeability statistic is that the maximum value of u2 depends on the number of categories. With k categories, the maximum occurs when cases are spread evenly across them, which gives u2 = 1 – 1/k. For a two-category variable (like sex, for instance), the highest possible value is u2 = .50. For a three-category variable, maximum u2 = .67. For a four-category variable, maximum u2 = .75. What this means is that one can’t use u2 to compare levels of variability in categorical variables that contain different numbers of categories. However, u2 does at least allow us to measure differences from one group to the next on the same categorical variable.
I suppose that this limitation of the unalikeability statistic really isn’t that limiting. After all, we can’t compare the variances or standard deviations of two different variables either, because those measures of variability are influenced not only by actual data variability but also by score magnitude.
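The maximum values quoted above all come from the evenly distributed case: with k categories each holding a proportion of 1/k, the sum of squared proportions is k × (1/k)^2 = 1/k, so the ceiling is 1 – 1/k. Here is a small sketch that tabulates that ceiling; the function name max_unalikeability is just an illustrative assumption.

```python
def max_unalikeability(k):
    """Largest possible u2 for a variable with k categories.

    Reached when every category holds the same proportion, 1/k.
    (Hypothetical helper; the name is illustrative.)
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - 1.0 / k


# Ceiling of u2 for variables with 2, 3, and 4 categories.
for k in (2, 3, 4):
    print(k, round(max_unalikeability(k), 2))
```

Knowing the ceiling for a given number of categories makes it easier to judge how close an observed u2 is to the most diverse distribution possible for that variable.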