Copyright © Richard B. Darlington. All rights reserved.

Asymmetric Margin-Bound Measures Applicable to any Table

These measures are applicable to any table since they do not require ordered categories and do not require that the rows and columns have the same categories. The measures on this page seem useful primarily for measuring the accuracy with which the column variable can be predicted from the row variable in the sample at hand.

Lambda

Lambda was mentioned in the literature at least as early as 1941, but was given its current name by Goodman and Kruskal in 1954. Except for the last paragraph, this section merely summarizes the discussion of lambda in the special introductory section, so readers who studied that can skip to the final paragraph of this section.

To understand lambda, imagine rational judges trying to guess the column membership of cases drawn from a sample, when each judge knows the cell frequencies for the sample. Each case is drawn exactly once and then replaced, but the order in which cases are drawn is random. An "informed" judge is told the row membership of each case before guessing, while a "blind" judge is not. Both judges try to maximize the number of correct guesses. Then lambda is the difference between the numbers of correct guesses of the two judges, expressed as a proportion of the difference between the blind judge and a hypothetical perfect judge.

The rational blind judge will always guess the column with the largest total frequency, since that gives him or her the highest chance of being right. For similar reasons, the rational informed judge will always guess the column with the largest cell frequency in the row in which he or she knows each case to fall. For instance, consider the frequency table

          I     II    III     IV
A        36*    11     15     21
B        14     28     51*    32
C        18     45*    32     12
Total    68     84     98     65

The blind judge would note that column 3 has the largest total frequency, so that judge can maximize the number of correct guesses by guessing column 3 for all cases, thus making 98 correct guesses. The highest frequency in each row is starred; the informed judge would make those guesses, and would thus make 36+51+45 or 132 correct guesses. A hypothetical perfect judge would be correct for all 315 cases. Therefore lambda = (132 - 98)/(315 - 98) = .157.
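The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name goodman_kruskal_lambda and the list-of-rows layout are our own assumptions, not part of any standard library, and the code assumes that at least two columns contain cases (otherwise the denominator is zero).

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column from the row.

    `table` is a list of rows, each a list of cell frequencies
    (a hypothetical layout chosen for this sketch).
    """
    n = sum(sum(row) for row in table)
    # Blind judge: always guesses the column with the largest total.
    col_totals = [sum(col) for col in zip(*table)]
    blind = max(col_totals)
    # Informed judge: guesses the largest cell within each row.
    informed = sum(max(row) for row in table)
    # A perfect judge is correct on all n cases.
    return (informed - blind) / (n - blind)


table = [
    [36, 11, 15, 21],
    [14, 28, 51, 32],
    [18, 45, 32, 12],
]
print(round(goodman_kruskal_lambda(table), 3))  # 0.157
```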

Lambda has a frequency interpretation. In our terminology, lambda defines the target set of cells to include the cell with the highest frequency within each row. Once the target set is defined, lambda fits the basic formula (observed - null)/(max - null) required for a frequency interpretation. Lambda has difference proportionality, but lacks unique zero.

MR

MR (for "multiple ranks") equals lambda when there are only two columns, but is designed to overcome a limitation of lambda that can arise when there are 3 or more columns, as in the table

          I     II    III
A        50     49      1
B        50      2     48
Total   100     51     49

This table shows substantial association in one sense, but lambda is zero because the same column (column I) has the highest frequency in each row. Thus the judge who knows row membership will make the very same guesses as the judge who does not, and lambda will be zero.

If you think of judges as getting points for the accuracy of their guesses, then lambda assumes that each judge receives one point for every correct guess. In MR we imagine that the judge doesn't merely guess one column, but rather ranks the columns in the order of their probability, with the least likely column ranked 1. Then the judge receives a number of points equal to the rank the judge assigned to the column the case was actually in. Otherwise MR is defined like lambda; MR equals the difference in points received by informed and blind judges, expressed as a proportion of the difference between perfect and blind judges. Thus MR has a weighted frequency interpretation.

For the 2 x 3 table shown above, the blind judge works with just the column totals, basing guesses on the fact that columns 1, 2, 3 have successively declining totals. The blind judge thus assigns a rank of 3 to column 1, 2 to column 2, and 1 to column 3. The column totals are respectively 100, 51, 49, so the number of points this judge receives is 3*100 + 2*51 + 1*49 = 451. The informed judge receives 3*50 + 2*49 + 1*1 or 249 points for guesses in row 1 plus 3*50 + 2*48 + 1*2 or 248 points for guesses in row 2, or 497 points total. The hypothetical perfect judge receives 3 points for each of the 200 cases, scoring 600. Thus MR = (497 - 451)/(600 - 451) = .3087. Recall that lambda for this table is 0. However, MR always equals lambda when there are only two columns.
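The point totals above can be reproduced with a short Python sketch. The names rank_points and mr are hypothetical, chosen only for illustration. The sketch relies on one observation: because each judge ranks the columns by the very frequencies being scored, sorting those frequencies in ascending order and multiplying each by its position gives the judge's points, and tied frequencies can be ordered either way without changing the total.

```python
def rank_points(freqs):
    """Points earned when columns are ranked by these same frequencies.

    The least frequent column gets rank 1, the most frequent gets
    rank len(freqs). Ties may be broken arbitrarily, since tied
    columns hold equal frequencies and so yield the same total.
    """
    return sum(rank * f for rank, f in enumerate(sorted(freqs), start=1))


def mr(table):
    """MR for predicting the column from the row (illustrative sketch)."""
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]
    blind = rank_points(col_totals)                  # ranks from column totals
    informed = sum(rank_points(row) for row in table)  # ranks within each row
    perfect = n_cols * n                             # top rank on every case
    return (informed - blind) / (perfect - blind)


table = [
    [50, 49, 1],
    [50, 2, 48],
]
print(rank_points([100, 51, 49]))  # 451, the blind judge's points
print(round(mr(table), 4))         # 0.3087
```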

Both lambda and MR lack the unique-zero property. The table

   99      1
   51     49

lacks independence, but both lambda and MR are zero because the same column (column 1) has the highest frequency in each row, so a judge would make exactly the same guesses with or without knowing row membership. However, in tables with three or more columns MR generally comes closer to having unique zero than lambda, since MR is 0 only if all column frequencies have exactly the same ranks in all rows, while lambda is zero whenever the same column has the highest cell frequency in all rows.

Lambda is affected far more by cell frequencies well above their expected values than by those below, while MR is more symmetric in this respect. Consider for instance three 6 x 6 tables. Except for the cells in the upper-left-to-lower-right diagonal, suppose each table has a frequency of 100 in each cell. In that diagonal, all entries in table A are 0, all in table B are 100, and all in C are 200. Table B exhibits exact independence, and in that table both lambda and MR are 0. In table C, both measures are .143. But in table A, MR is higher still at .200, while lambda is only .04. Yet proportionally speaking, the A-B difference between 0 and 100 is far larger than the B-C difference between 100 and 200. This property of lambda seems odd and undesirable for many purposes.
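The three 6 x 6 tables can be checked numerically. The sketch below is self-contained and uses hypothetical helper names (lam, mr, diagonal_table). It assumes, as the definitions above imply, that tied columns may be ranked in any order, which does not affect a judge's point total.

```python
def lam(table):
    """Lambda: one point per correct guess."""
    n = sum(sum(row) for row in table)
    blind = max(sum(col) for col in zip(*table))
    informed = sum(max(row) for row in table)
    return (informed - blind) / (n - blind)


def mr(table):
    """MR: points equal the rank given to the correct column."""
    def points(freqs):
        # Least frequent column gets rank 1; ties do not change the total.
        return sum(r * f for r, f in enumerate(sorted(freqs), start=1))

    n = sum(sum(row) for row in table)
    blind = points([sum(col) for col in zip(*table)])
    informed = sum(points(row) for row in table)
    perfect = len(table[0]) * n
    return (informed - blind) / (perfect - blind)


def diagonal_table(diag, off=100, size=6):
    """Square table with `diag` on the main diagonal and `off` elsewhere."""
    return [[diag if i == j else off for j in range(size)]
            for i in range(size)]


for name, diag in [("A", 0), ("B", 100), ("C", 200)]:
    t = diagonal_table(diag)
    print(name, round(lam(t), 3), round(mr(t), 3))
# A 0.04 0.2
# B 0.0 0.0
# C 0.143 0.143
```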

MP

MP stands for "multiple probabilities." As with lambda and MR, imagine judges predicting column membership. For MP the judge does not merely rank the likelihoods of the various columns, but assigns an actual probability to each column. The judge then receives points equal to the probability assigned to the column the case was in. Then the measure of association is computed the same way as before, from the point totals of blind, informed, and perfect judges.

We illustrate with the last 2 x 2 table above. The blind judge notes that 150 of the 200 cases are in column 1 and thus assigns probabilities of .75 and .25 to the two columns. The points won by this strategy are 150*.75 + 50*.25 or 125. The informed judge assigns probabilities of .99 and .01 to the two columns for row 1, winning 99*.99 + 1*.01 or 98.02 points for that row. For row 2 this judge wins 51*.51 + 49*.49 or 50.02 points, making 148.04 points altogether. The hypothetical perfect judge wins one point for each of the 200 cases. Thus MP = (148.04 - 125)/(200 - 125) = .3072. Both lambda and MR are zero for this table.
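The same arithmetic can be written as a brief Python sketch. The function name mp is our own, and the sketch assumes each judge states the observed sample proportions as probabilities, as in the worked example. A judge who assigns probability f/r to a cell holding f of the r cases available earns f * f/r points from that cell.

```python
def mp(table):
    """MP for predicting the column from the row (illustrative sketch)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Blind judge: probability c/n for a column with total c,
    # earning c * (c/n) points from that column's cases.
    blind = sum(c * c / n for c in col_totals)
    # Informed judge: probability f/r within a row whose total is r.
    informed = 0.0
    for row in table:
        r = sum(row)
        if r > 0:  # an empty row contributes no cases and no points
            informed += sum(f * f / r for f in row)
    perfect = n  # one point per case
    return (informed - blind) / (perfect - blind)


table = [
    [99, 1],
    [51, 49],
]
print(round(mp(table), 4))  # 0.3072
```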

MP can be criticized on the ground that the imaginary judges we have described in this section are not following strategies rationally designed to maximize their expected winnings. For instance, if a judge thinks the probability of choice A is .75, but learns that he or she will win points proportional to the probability he or she assigns to the correct choice, then it can be shown that the judge's optimum strategy is actually to state a probability of 1 for choice A. That objection does not apply to the Uncertainty statistic described next, and we have found that MP and Uncertainty often have nearly the same numerical value.
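The claim about the optimum strategy can be verified for the two-choice case. The symbols here are our own: p is the judge's true belief that choice A is correct, and q is the probability the judge states for A (so 1 - q is stated for the other choice). The judge's expected points per case are then

```latex
\[
E(q) = p\,q + (1-p)(1-q) = (1-p) + (2p-1)\,q .
\]
```

This is linear in q with slope 2p - 1, which is positive whenever p exceeds .5, so expected points are greatest at q = 1. With p = .75, E(q) = .25 + .5q, which rises from .625 for the honest statement q = .75 to .75 for q = 1.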

MP has unique zero and square proportionality.

Uncertainty

Uncertainty is confusingly named, because like the other measures in this section, it increases as you improve your ability to predict column membership from row membership, thus increasing your certainty, not your uncertainty. Like MP, Uncertainty has unique zero but lacks difference proportionality. To understand Uncertainty, imagine a judge who guesses the exact probability of each column for each case, as with MP. But in this instance the judge is given some points to begin with and then loses points equal to ln(1/p), where p is the probability the judge assigned to the column that turned out to be correct. Thus the judge loses 0 points only by assigning a probability of 1 to the correct column, since ln(1/1) = 0. The Uncertainty statistic turns out the same regardless of the number of points the judge was given at first, so we can most easily imagine that number to be zero and think in terms of losses rather than gains for the judge.

In the previous 2 x 2 table

          1      2
A        99      1
B        51     49
Total   150     50

the blind judge notes that 75% of all cases are in column 1, so for each case he or she assigns probabilities of .75 and .25 to columns 1 and 2 respectively. Since ln(1/.75) = .2877 and ln(1/.25) = 1.386, the blind judge loses 150*.2877 + 50*1.386 or 112.467 points. In row 1 the informed judge loses 99*.01005 + 1*4.605 or 5.600 points, because ln(1/.99) = .01005 and ln(1/.01) = 4.605. In row 2 that judge loses 51*.673 + 49*.713 or 69.295 points, making 74.895 points lost overall. (The totals shown are computed from unrounded logarithms, so they differ slightly from what the rounded values displayed would give.) The hypothetical perfect judge loses 0 points, so Uncertainty = (112.467 - 74.895)/(112.467 - 0) = .3341.
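The losses for this table can be recomputed with a short Python sketch. The function name uncertainty is hypothetical, and the sketch assumes each judge states the observed sample proportions. Empty cells are skipped, since no case ever falls in them and they therefore cost the judge nothing.

```python
import math


def uncertainty(table):
    """Uncertainty for predicting the column from the row (sketch)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Blind judge: states p = c/n for a column with total c and
    # loses ln(1/p) = ln(n/c) on each of that column's c cases.
    blind_loss = sum(c * math.log(n / c) for c in col_totals if c > 0)
    # Informed judge: states p = f/r within a row whose total is r.
    informed_loss = 0.0
    for row in table:
        r = sum(row)
        informed_loss += sum(f * math.log(r / f) for f in row if f > 0)
    # The perfect judge loses nothing, so the denominator is blind_loss - 0.
    return (blind_loss - informed_loss) / blind_loss


table = [
    [99, 1],
    [51, 49],
]
print(round(uncertainty(table), 4))  # 0.3341
```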

It is quite common for Uncertainty to nearly equal MP, as it does in this example (MP was .3072). The Uncertainty statistic is discussed extensively by Theil (1972). It has special meaning in communication theory. It also avoids the aforementioned technical objection to MP, since it can be shown that judges faced with the payoff structure described here should rationally state their true subjective probabilities without fudging them. However, as a general measure of association with a simple clear meaning, it seems inferior to lambda, MR, and MP.
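The statement that the logarithmic payoff rewards honest probabilities can be checked for the two-choice case. The symbols are our own: p is the judge's true belief that a given column is correct, and q is the probability the judge states for it. The expected loss per case and its derivative are

```latex
\[
L(q) = p \ln\frac{1}{q} + (1-p)\ln\frac{1}{1-q}, \qquad
\frac{dL}{dq} = -\frac{p}{q} + \frac{1-p}{1-q} = 0
\;\Longrightarrow\; q = p .
\]
```

The second derivative, p/q^2 + (1-p)/(1-q)^2, is positive, so q = p is a minimum of the expected loss: the judge does best by stating his or her true belief.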
