Research Design and Statistical Consulting

George M. Diekhoff, Ph.D.

In the multiple editions of Tabachnick and Fidell’s Using Multivariate Statistics, the authors offer formulas for square root, log10, and reciprocal data transformations, the point of which is to normalize the data distribution. This is relatively straightforward when working with correlational statistics, but it becomes a little more confusing when one is trying to normalize the data for a factorial ANOVA. Remember that the assumption in factorial ANOVA is that the distributions should be normal within each cell of the factorial design. The formulas the authors provide include two constants: “k” = “the largest score + 1” and “c” = “a constant added to each score so that the smallest score is 1.” In factorial ANOVA, the question arises: is k the largest score + 1 in each cell? That is, does one use a different value of k in transforming the data in each cell? Or is k the largest score across all the cells of the design, plus 1? Dr. Tabachnick was kind enough to clarify this for me. The constant k is found by determining the largest value of the dependent variable in any of the cells of the design, then adding 1 to that value. Similarly, the value c is chosen so that the smallest score across all the cells becomes 1.
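As a quick sketch of that clarification, the constants are computed once from the scores pooled across every cell, then the same transform is applied in each cell. (The function names, cell data, and use of NumPy below are my own illustration, not from the book.)

```python
import numpy as np

def tf_constants(y):
    """Constants for Tabachnick & Fidell-style transforms, computed
    from ALL scores pooled across every cell of the factorial design
    (per Dr. Tabachnick's clarification), not cell by cell."""
    y = np.asarray(y, dtype=float)
    k = y.max() + 1     # k = largest score in any cell, plus 1
    c = 1 - y.min()     # c shifts scores so the smallest score is 1
    return k, c

def shifted_log10(y, c):
    """Log transform when the smallest score is not 1: log10(X + c)."""
    return np.log10(np.asarray(y, dtype=float) + c)

def reflected_log10(y, k):
    """Log transform for negatively skewed data: log10(k - X)."""
    return np.log10(k - np.asarray(y, dtype=float))

# Scores from two hypothetical cells of a 2 x 2 design
cell_a = [2.0, 5.0, 9.0]
cell_b = [0.0, 3.0, 7.0]
k, c = tf_constants(cell_a + cell_b)   # pooled: k = 10.0, c = 1.0

# The SAME k and c are then used to transform every cell
a_t = shifted_log10(cell_a, c)
b_t = shifted_log10(cell_b, c)
```

Note that if c were computed separately within each cell, the two cells would be shifted by different amounts and their transformed scores would no longer be on a common scale.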

One other comment about attempting to normalize data in the cells of a factorial ANOVA: good luck! The same transformation must be applied in all cells, and it will almost certainly make matters worse in some cells while improving the distributions in others. In other words, an idea that sounds good in theory isn’t actually very useful most of the time. One might be better off using nonparametric statistics when they’re available, or making sure that sample sizes in all cells are reasonably large (e.g., > 30) so that the Central Limit Theorem ensures that the sampling distribution of the means is approximately normal even if the samples are not. If there is no nonparametric alternative and your sample size is small, you can always use a more stringent level of significance (e.g., p < .001) in evaluating the significance tests and caution your readers that the normality assumption was violated.
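To see why a single transform rarely fixes every cell at once, here is a small hypothetical sketch (the cell data and skewness helper are my own, chosen only to illustrate the point): a square root transform reduces the positive skew in one cell while introducing negative skew in a cell that was roughly symmetric to begin with.

```python
import numpy as np

def skewness(y):
    """Fisher-Pearson sample skewness: the mean cubed z-score."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return float((z ** 3).mean())

# Two hypothetical cells of a factorial design
cell_a = np.array([1.0, 1.0, 2.0, 3.0, 10.0])  # positively skewed
cell_b = np.array([1.0, 4.0, 5.0, 6.0, 9.0])   # roughly symmetric

# The same sqrt transform must be applied to BOTH cells
before = [skewness(cell_a), skewness(cell_b)]
after = [skewness(np.sqrt(cell_a)), skewness(np.sqrt(cell_b))]
```

Here the transform shrinks the skew in cell A but pushes cell B from near-zero skew to noticeably negative skew, which is exactly the trade-off described above.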