Limitations and alternative tests

Copyright © Richard B. Darlington. All rights reserved.

The Wilcoxon test loses power because, even when there are no ties in any of the senses mentioned earlier, many different patterns of ranks still produce the same rank-sum. For instance, an S of 6 might be 1+2+3, 2+4, or 1+5, or it might represent a single rank of 6. Recall that p is the probability of finding a value of S smaller than or equal to the one observed. Because four different patterns in this example all yield S = 6, all four are counted in the p computed when any one of them is observed. That raises the values of p and thus lowers the power of the test. An alternative test, based on normal scores, avoids this disadvantage.
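The tie among rank-sums can be checked by brute force. The sketch below assumes 12 nonzero gain scores (an illustrative choice, not from the text), lists every set of distinct ranks whose sum is 6, and computes the exact lower-tail p for S = 6 by enumerating all 2^12 sign patterns. It also scores each pattern with normal scores; the particular score used here, the half-normal quantile Phi^-1(1/2 + r/(2(n+1))), is an assumption chosen as one common option, and the helper names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist

N = 12  # assumed number of nonzero gain scores, for illustration only


def subsets_with_sum(n, target):
    """Every set of distinct ranks from 1..n whose sum equals target."""
    return [c for k in range(1, n + 1)
            for c in combinations(range(1, n + 1), k)
            if sum(c) == target]


def exact_lower_p(n, s):
    """P(S <= s), where each rank 1..n is negative with probability 1/2.

    Counts all subsets of ranks (including the empty set) with sum <= s.
    """
    count = sum(1 for k in range(n + 1)
                for c in combinations(range(1, n + 1), k)
                if sum(c) <= s)
    return count / 2 ** n


def normal_score(r, n):
    # Assumed normal score for rank r: the half-normal quantile
    # Phi^-1(1/2 + r / (2(n + 1))). Other normal-score definitions exist.
    return NormalDist().inv_cdf(0.5 + r / (2 * (n + 1)))


patterns = subsets_with_sum(N, 6)  # (6,), (1, 5), (2, 4), (1, 2, 3)
for pat in patterns:
    rank_sum = sum(pat)
    score_sum = sum(normal_score(r, N) for r in pat)
    print(pat, rank_sum, round(score_sum, 4))

print(exact_lower_p(N, 6))  # 14 / 4096
```

All four patterns share a rank-sum of 6 and therefore the same p, while their normal-score sums differ slightly, so under normal scores the patterns no longer tie with one another.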

When either the Wilcoxon test or the alternative normal-scores test is applied to gain scores, even a significant result surprisingly does not justify the conclusion that the pretest and posttest score distributions differ. Consider, for instance, the following hypothetical data:

Pretest scores    1  2  3  4  5  6  7  8  9 10 11  12
Posttest scores   2  3  4  5  6  7  8  9 10 11 12   1
Gain scores       1  1  1  1  1  1  1  1  1  1  1 -11

Notice that the distribution of the pretest scores exactly equals that of the posttest scores: each consists of the integers 1 through 12. The gain scores have a mean of exactly 0 but are highly skewed. When the Wilcoxon test is applied to the gain scores, it yields p = .017 (one-tailed), even though the pretest and posttest distributions are identical. The straddle test avoids this problem: a significant result from it does allow the conclusion that the pretest and posttest distributions really do differ.
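The example can be reproduced with a short exact calculation. This is a minimal sketch, assuming the exact signed-rank test with tied absolute gains given their average rank (midranks); the text does not say how its p was computed, and the helper names are hypothetical. S is the sum of the ranks of the negative gains, and p is the share of all 2^12 sign patterns whose S is no larger than the observed one.

```python
from fractions import Fraction
from itertools import product

pretest = list(range(1, 13))
posttest = list(range(2, 13)) + [1]
gains = [post - pre for pre, post in zip(pretest, posttest)]


def midranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [Fraction(0)] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = Fraction(i + j + 2, 2)  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_lower_p(d):
    """Exact one-tailed p = P(S <= observed S) over all 2^n sign patterns.

    Assumes no zero differences; S sums the ranks of |d| for negative d.
    """
    ranks = midranks([abs(x) for x in d])
    s_observed = sum(r for r, x in zip(ranks, d) if x < 0)
    hits = sum(1 for signs in product((False, True), repeat=len(d))
               if sum(r for r, neg in zip(ranks, signs) if neg) <= s_observed)
    return Fraction(hits, 2 ** len(d))


print(sorted(pretest) == sorted(posttest))  # identical distributions
print(sum(gains))                            # gains sum (and average) to 0
p = signed_rank_lower_p(gains)
print(p, round(float(p), 3))
```

Here the eleven gains of 1 share midrank 6 and the gain of -11 gets rank 12, so S = 12 and 68 of the 4096 sign patterns give S of 12 or less, a p of about .0166. Ranking the tied gains 1 through 11 instead of using midranks gives 70/4096, about .0171; both round to the .017 reported above.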