Copyright © Richard B. Darlington. All rights reserved.

Measures for Ordered Categories

Kendall tau, Spearman rho, Goodman-Kruskal gamma, and Yule Q

A scale consisting of ordered categories can be thought of as a set of ranks with a great many ties. Therefore we begin this section with a brief review of rank correlation. The two best-known measures of rank correlation are the Spearman rho and the Kendall tau. When there are no ties, rho is simply the Pearson correlation applied to the ranks. Spearman showed that for untied ranks, the Pearson formula reduces to

rho = 1 - 6 SUM(d_i^2)/(N^3 - N),

where d_i is the difference between the ranks of case i on the two variables and N is the number of cases. An obvious way to extend this approach to tied ranks is to simply use the Pearson formula to correlate the ranks, regardless of the number of ties. However, this approach yields an index with no good interpretation. The Pearson correlation itself draws its meaning from the fact that it is proportional to the linear regression slope, and in a set of ordered categories the assumption of linearity is extremely implausible.
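As a quick check of the shortcut formula, the following Python sketch compares it with the Pearson correlation computed directly on the ranks. The two rank vectors are made-up illustrative data with no ties, and the function names are assumptions introduced only for this example.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_shortcut(rank_x, rank_y):
    """rho = 1 - 6 SUM(d_i^2)/(N^3 - N); valid only for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Hypothetical untied ranks of five cases on two variables.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

rho_shortcut = spearman_shortcut(rank_x, rank_y)
rho_pearson = pearson(rank_x, rank_y)
print(round(rho_shortcut, 3), round(rho_pearson, 3))  # 0.8 0.8
```

The agreement holds only because there are no ties; with tied ranks the two computations generally differ.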

The other well-known measure of rank correlation is the Kendall tau. Like rho, in its original form tau assumes an absence of tied ranks. Tau uses the concepts of concordance and discordance--concepts fundamental to all the recommended measures in this section. Two cases are said to be concordant on two ordinal variables X and Y if the case higher on X is also higher on Y, and are discordant if the case higher on X is lower on Y. The two cases are neither concordant nor discordant if they are tied on X or Y or both. Let the number of concordant and discordant pairs be denoted Con and Dis respectively. The formulas in this section are intended to be conceptual only; if you plan to compute any of these measures by hand, see the later section on computational formulas.

The Kendall tau, which assumes no ties, is

tau = (Con - Dis)/(total number of pairs),

where the total number of pairs among N cases is N(N - 1)/2.
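The definitions of concordance and discordance can be illustrated by counting pairs directly. The Python sketch below does this for a small set of made-up untied scores and then forms tau; the data and the function name are hypothetical.

```python
from itertools import combinations

def count_pairs(x, y):
    """Return (Con, Dis) for paired scores; pairs tied on x or y count as neither."""
    con = dis = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # the case higher on X is also higher on Y
            con += 1
        elif product < 0:    # the case higher on X is lower on Y
            dis += 1
    return con, dis

# Hypothetical untied scores for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

con, dis = count_pairs(x, y)
total_pairs = len(x) * (len(x) - 1) // 2
tau = (con - dis) / total_pairs
print(con, dis, tau)  # 8 2 0.6
```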

As mentioned above, ordinal scales have many ties, so for our purposes it is very important how one modifies this formula to handle ties. The Kendall tau-b and Stuart tau-c correct the original tau value for ties in different ways, but neither is as widely used as gamma, defined as

gamma = (Con - Dis)/(Con + Dis)

Thus gamma corrects for ties simply by ignoring all pairs tied on either X or Y. Gamma is +1 if there are no discordant pairs, -1 if there are no concordant pairs, and 0 if there are equal numbers of concordant and discordant pairs. When gamma is applied to a 2 x 2 table with cell frequencies labeled

A   B
C   D

gamma reduces to the Yule Q, whose formula is

Q = (AD - BC)/(AD + BC)

Q is the only measure in this document that is doubly margin-free; gamma is not doubly margin-free when the number of rows r or the number of columns c exceeds 2.
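A short Python sketch can show how Con and Dis are counted from a table of cell frequencies and that gamma coincides with Q in the 2 x 2 case. It assumes rows and columns are ordered from high to low, so the upper left-hand cell is high on both variables. The cell counts are hypothetical, and the helper names are not from the original text.

```python
def con_dis(table):
    """Count concordant and discordant pairs in a frequency table.

    Assumes rows and columns run from high to low, so the
    upper left-hand cell is high on both variables.
    """
    con = dis = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                con += count * sum(lower_row[j + 1:])  # other case lower on both
                dis += count * sum(lower_row[:j])      # lower on rows, higher on columns
    return con, dis

def gamma(table):
    con, dis = con_dis(table)
    return (con - dis) / (con + dis)

# Hypothetical 2 x 2 frequencies A, B, C, D.
A, B, C, D = 40, 10, 15, 35
g = gamma([[A, B], [C, D]])
q = (A * D - B * C) / (A * D + B * C)
print(round(g, 3), round(q, 3))  # 0.806 0.806
```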

Gamma is the best-known single measure for correlating two ordinal scales. However, it has some limitations that are overcome by two alternative statistics: Somers D and Symmetrically Adjusted Gamma (SAG).

Prediction, Causal Analysis, Somers D, and RBC

When two cases are tied on a criterion variable in a prediction problem, then it is reasonable not to count that pair when assessing the accuracy of a predictor variable, since no predictor could distinguish between them. But if two cases are tied on the predictor but not on the criterion, then that pair should be counted as a failure of the predictor variable to distinguish between cases. Thus a measure of predictive accuracy should be lowered by many of the ties on the predictor but not by ties on the criterion, while gamma treats both kinds of ties identically.

The importance of this point is illustrated by the table

30   10
20   20
10   30

Let these 6 frequencies be denoted by

A   B
C   D
E   F

in the order shown. Suppose we want to predict the column variable from the row variable. Let the upper left-hand corner of this table be the corner high on both variables. Then Con = AD + AF + CF = 30*20 + 30*30 + 20*30 = 2100 and Dis = BC + BE + DE = 10*20 + 10*10 + 20*10 = 500, so gamma = (2100 - 500)/(2100 + 500) = .615. Now suppose we collapse the first two rows into one, creating the modified table

50   30
10   30

Now Con = 50*30 = 1500 while Dis = 10*30 = 300 so gamma = (1500 - 300)/(1500 + 300) = .667. Thus gamma has increased from .615 to .667, even though the predictor scale with two levels is clearly discriminating less accurately than the one with three levels. Furthermore, in the original 3 x 2 table the difference between rows 1 and 2 is just as large as the difference between rows 2 and 3; it's not as if gamma had increased because we had combined two rows that were nearly identical to each other. The point is that many of the ties on a predictor variable should count against that variable when assessing its accuracy, and gamma does not do that.
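The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the cell counts are read off from the products in the Con and Dis calculations (A = 30, B = 10, C = 20, D = 20, E = 10, F = 30), and the helper names are assumptions made for the example.

```python
def con_dis(table):
    """Count concordant and discordant pairs; upper left cell is high on both variables."""
    con = dis = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                con += count * sum(lower_row[j + 1:])
                dis += count * sum(lower_row[:j])
    return con, dis

def gamma(table):
    con, dis = con_dis(table)
    return (con - dis) / (con + dis)

# 3 x 2 table inferred from the worked calculation in the text.
original = [[30, 10],
            [20, 20],
            [10, 30]]

# Collapse rows 1 and 2 into a single row.
collapsed = [[a + b for a, b in zip(original[0], original[1])], original[2]]

print(con_dis(original), round(gamma(original), 3))    # (2100, 500) 0.615
print(con_dis(collapsed), round(gamma(collapsed), 3))  # (1500, 300) 0.667
```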

A similar argument applies to causal analysis. When independent and dependent variables are continuous, the standard measure of effect size is the regression slope, which equals the estimated difference in Y for two cases that differ by one unit on X. More generally, a measure of effect size should measure the differences between cases on a dependent variable given that the cases differ on the independent variable. Thus pairs tied on the independent variable should not be counted in measuring effect size. But if two cases are tied on the dependent variable but not on the independent variable, that should lower the measured effect size. Thus when measuring effect size, as when measuring predictive accuracy, ties on the column variable should be treated differently from ties on the row variable.

The Somers D statistic is a modification of gamma designed to handle the problems just described. The conceptual formula for D (not recommended for actual computation) is

D = (Con - Dis)/(number of pairs in different columns)

To compute D, think of all possible pairs including each case paired with itself. If we count pair 1-2 as different from pair 2-1, then the total number of pairs is N^2. Similarly, the number of pairs with both members in column 1 is ct1^2, where ct1 is column total 1. By extension, the total number of pairs with both members in the same column is SUM(ct^2), where the summation is across columns. Therefore the number of pairs in different columns is N^2 - SUM(ct^2). This value corrects for the fact that we counted each case paired with itself as a pair, because such pairs are first counted in N^2 but then subtracted out in SUM(ct^2), so they are ultimately not counted.

But this measure does count each pair twice by counting 1-2 and 2-1 as separate pairs. Ordinarily Con and Dis are defined without this double counting. Therefore to put the numerator and denominator of D in similar units we should double the numerator. Thus a computing formula for D is

D = 2(Con - Dis)/(N^2 - SUM(ct^2))

When we apply D to the 3 x 2 table of this section, as with gamma we find Con = 2100 and Dis = 500. The column totals are both 60 and N = 120, so we have N^2 - SUM(ct^2) = 14400 - 2*3600 = 7200. Therefore D = 2(2100 - 500)/7200 = .444. For the collapsed 2 x 2 table we have Con = 1500 and Dis = 300. The denominator of D remains at 7200, so D = 2(1500 - 300)/7200 = .333. Unlike gamma, D correctly shows that a loss of predictive power resulted from collapsing rows 1 and 2 together.
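The computing formula for D translates directly into code. The sketch below assumes the column variable is the one whose ties are excluded from the denominator, as in the formula above. The cell counts are those implied by the worked Con and Dis calculations, and the function names are hypothetical.

```python
def con_dis(table):
    """Count concordant and discordant pairs; upper left cell is high on both variables."""
    con = dis = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                con += count * sum(lower_row[j + 1:])
                dis += count * sum(lower_row[:j])
    return con, dis

def somers_d(table):
    """D = 2(Con - Dis)/(N^2 - SUM(ct^2)), with ct the column totals."""
    con, dis = con_dis(table)
    n = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    denominator = n ** 2 - sum(ct ** 2 for ct in column_totals)
    return 2 * (con - dis) / denominator

original = [[30, 10], [20, 20], [10, 30]]   # 3 x 2 table inferred from the text
collapsed = [[50, 30], [10, 30]]            # rows 1 and 2 combined

d_original = somers_d(original)
d_collapsed = somers_d(collapsed)
print(round(d_original, 3), round(d_collapsed, 3))  # 0.444 0.333
```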

The Somers D similarly corrects the limitation of gamma in measuring the size of a causal relationship. But it is important to place variables in the proper roles. In prediction problems, the formulas of this section assume the predictor variable is the row variable and the criterion variable is the column variable. But for causal problems, the same formulas assume the row variable is the dependent variable and the column variable is the independent variable. The earlier section on lambda-max explains why these rules are reasonable.

For tables with 2 columns, Somers D is equivalent to the rank-biserial correlation, here denoted RBC. RBC can be used to study the difference between two groups on a ranked variable. Its formula is simple:

RBC = (2/N) (mean rank in group 2 - mean rank in group 1)

Ties are handled in the obvious way--assigning the mean of the tied ranks to any ties. RBC ranges from -1 to 1, reaching these extremes whenever all cases in one group exceed all those in the other. RBC may differ substantially from the more familiar point-biserial correlation, which equals +1 or -1 only if all scores within each group are equal.
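As a rough check on the claimed equivalence, the sketch below computes RBC for the 3 x 2 table used earlier, treating the rows as a ranked variable with many ties and the two columns as the two groups. Several choices here are assumptions rather than anything stated in the text: rank 1 is given to the highest row, column 1 is taken as group 1, and the cell counts are those implied by the earlier Con and Dis arithmetic. Reversing the ranking direction or the group labels flips the sign.

```python
def rbc_from_table(table):
    """RBC = (2/N)(mean rank in group 2 - mean rank in group 1).

    Rows are ordered categories (rank 1 assigned to the top row);
    the two columns are the two groups. Tied cases get the mean of
    the ranks they jointly occupy.
    """
    n = sum(sum(row) for row in table)
    ranks_used = 0
    rank_sum = [0.0, 0.0]
    for row in table:
        row_total = sum(row)
        mean_tied_rank = ranks_used + (row_total + 1) / 2
        for group in range(2):
            rank_sum[group] += row[group] * mean_tied_rank
        ranks_used += row_total
    group_sizes = [sum(row[group] for row in table) for group in range(2)]
    mean_rank = [rank_sum[g] / group_sizes[g] for g in range(2)]
    return (2 / n) * (mean_rank[1] - mean_rank[0])

table = [[30, 10], [20, 20], [10, 30]]  # inferred from the earlier worked example
rbc = rbc_from_table(table)
print(round(rbc, 3))  # 0.444
```

The result matches the value D = .444 reported above for the same table.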

In 2 x 2 tables, Somers D is equivalent to lambda-max in that lambda-max = |Somers D|. This document reports many equivalencies between measures, but this is the most remarkable because Somers D is derived in terms of pairs of cases while lambda-max is derived in terms of single cases. Thus if either measure is used in a 2 x 2 table, it can be explained and justified in either or both ways.

A Symmetric Measure: Symmetrically Adjusted Gamma (SAG)

The aforementioned limitations of gamma also apply when rows and columns are treated identically. To see this, consider the 3 x 3 table


As before, let the upper left corner be the corner high on both variables. Then a little calculation reveals Con = 5900, Dis = 2300, gamma = .439. If we combine rows 1 and 2, and also combine columns 1 and 2, the table reduces to

100   30
 30   30

and we find Con = 3000, Dis = 900, gamma = .538. Contrary to our usual intuitive meaning of association, combining rows and columns has substantially increased gamma. Once again, it cannot be claimed that the rows and columns combined were especially similar to each other; the difference between rows 1 and 2 is just as large as that between rows 2 and 3, and the same applies to columns.

For another limitation of gamma, consider the two 5 x 5 tables below. Again let the upper left corner be the point highest on both row and column variables. In each of the two tables A and B, there are 5 cells with 8 cases each and 20 empty cells. Gamma is 1 in both tables, because neither table has any discordant pairs. But most people would say intuitively that association is higher in table A than in B, because so many pairs in B are tied on either row or column.

Table A

Table B

Both these limitations of gamma are easily fixed; use a statistic whose numerator is like gamma's, but include in its denominator all pairs not actually falling in the same cell. By reasoning similar to that used with Somers D, the number of such pairs is N^2 - SUM(t^2), where t denotes "cell total" and the summation is across all rc cells. We'll denote the statistic with this denominator as SAG for "symmetrically adjusted gamma". As for Somers D we must multiply the numerator by 2 to correct for double counting in the denominator. Thus we define

SAG = 2(Con - Dis)/(N^2 - SUM(t^2))

where Con and Dis are computed without double counting of pairs.

When SAG is applied to the 3 x 3 table of this section, we have N = 190, SUM(t^2) = 4400. As before we have Con = 5900 and Dis = 2300, so SAG = 2(5900 - 2300)/(190^2 - 4400) = .227. In the collapsed 2 x 2 table we have Con = 3000, Dis = 900, N = 190, SUM(t^2) = 12700, SAG = .179. As expected, SAG shows that combining rows and columns lowers the association.

When SAG is applied to the two 5 x 5 tables just given, the denominator for either table is 40^2 - 5 x 8^2 = 1280. For table A it turns out that Con = 640, so SAG = 2 x 640 / 1280 = 1. But for table B, Con = 16^2 = 256, so SAG = 2 x 256 / 1280 = .4. Thus SAG corresponds to intuition by yielding a substantially lower value for table B than for A.
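The SAG formula can be sketched in Python as follows. Two tables are used, both under stated assumptions. Table A is taken to have its five occupied cells on the main diagonal, which is the arrangement consistent with Con = 640 and no discordant pairs. The collapsed 2 x 2 table is taken to be the one implied by the reported Con = 3000, Dis = 900, N = 190 and SUM(t^2) = 12700. Table B is omitted because its layout cannot be inferred from the text alone. The function names are hypothetical.

```python
def con_dis(table):
    """Count concordant and discordant pairs; upper left cell is high on both variables."""
    con = dis = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                con += count * sum(lower_row[j + 1:])
                dis += count * sum(lower_row[:j])
    return con, dis

def sag(table):
    """SAG = 2(Con - Dis)/(N^2 - SUM(t^2)), with t the cell totals."""
    con, dis = con_dis(table)
    n = sum(sum(row) for row in table)
    sum_t2 = sum(t ** 2 for row in table for t in row)
    return 2 * (con - dis) / (n ** 2 - sum_t2)

# Assumed layout of Table A: 8 cases in each diagonal cell of a 5 x 5 table.
table_a = [[8 if i == j else 0 for j in range(5)] for i in range(5)]

# Collapsed 2 x 2 table inferred from the reported Con, Dis, N and SUM(t^2).
collapsed = [[100, 30], [30, 30]]

sag_a = sag(table_a)
sag_collapsed = sag(collapsed)
print(con_dis(table_a), sag_a)                       # (640, 0) 1.0
print(con_dis(collapsed), round(sag_collapsed, 3))   # (3000, 900) 0.179
```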

One could of course define a statistic like SAG whose denominator consists of all pairs, not merely all pairs in different cells. However, such a statistic could never attain a value of 1 so long as any cell contained more than one case--a property that defies our intuitive meaning of "perfect association" for problems of this sort.

As mentioned earlier, gamma, Somers D, and SAG all have ratio-scale utility interpretations in which you win $1 for every concordant pair, lose $1 for every discordant pair, and win $0 for every other pair. Then all these measures equal the mean number of dollars won across a specified set of pairs. For gamma the set is all concordant or discordant pairs, for Somers D the set is all pairs in different columns, and for SAG the set is all pairs in different cells. Thus the three measures always have the same sign, and |gamma| >= |Somers D| >= |SAG|.
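The "mean dollars won" reading and the resulting ordering can be illustrated on the 3 x 2 table from the Somers D discussion. In this sketch the three measures share the numerator Con - Dis and differ only in the number of pairs averaged over. The cell counts are those implied by the earlier worked arithmetic, and all names are assumptions introduced for the example.

```python
def con_dis(table):
    """Count concordant and discordant pairs; upper left cell is high on both variables."""
    con = dis = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                con += count * sum(lower_row[j + 1:])
                dis += count * sum(lower_row[:j])
    return con, dis

table = [[30, 10], [20, 20], [10, 30]]  # inferred from the earlier worked example
con, dis = con_dis(table)
net_dollars = con - dis                 # +$1 per concordant pair, -$1 per discordant pair

n = sum(sum(row) for row in table)
column_totals = [sum(column) for column in zip(*table)]
cell_totals = [t for row in table for t in row]

# Number of pairs in each reference set, each pair counted once.
pairs_con_or_dis = con + dis
pairs_diff_columns = (n ** 2 - sum(ct ** 2 for ct in column_totals)) // 2
pairs_diff_cells = (n ** 2 - sum(t ** 2 for t in cell_totals)) // 2

gamma = net_dollars / pairs_con_or_dis
somers_d = net_dollars / pairs_diff_columns
sag = net_dollars / pairs_diff_cells
print(round(gamma, 3), round(somers_d, 3), round(sag, 3))  # 0.615 0.444 0.276
```

Each reference set contains the previous one, so the shared numerator is divided by an ever larger count, which is why the absolute values can only shrink from gamma to D to SAG.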