Copyright © Richard B. Darlington. All rights reserved.

Asymmetric Margin-Free Measures Applicable to any Table

Lambda-max

Lambda is margin-bound. That is, lambda changes as the relative sizes of criterion groups change. Since this document ignores sampling error, you can assess any such changes in lambda simply by recomputing it after multiplying all entries in any column by a constant. It can be shown that when making these adjustments, lambda is always maximized by making the column totals equal. This is fairly obvious intuitively, because lambda is defined by comparing an informed judge who knows each case's row membership to a blind judge who always chooses the column with the largest column total. Thus making the column totals equal hampers the blind judge as much as possible by denying him or her any advantage he or she would have over choosing a column randomly.

Therefore you can easily compute a statistic we'll call lambda-max, which is the maximum value lambda can attain across various relative sizes of the criterion groups. Since it doesn't matter what size we make the groups, so long as we make them equal, it's simplest to in effect make each group size 1, by dividing each entry in the table by its column total and working with the within-column proportions. Since each group becomes size 1, it follows that c, the number of columns, becomes the total sample size.

Thus lambda-max is computed as follows:

  1. Express each cell frequency as a proportion of its own column total.
  2. Select the highest of these proportions within each row.
  3. Let S denote the sum of these maximum proportions.
  4. Lambda-max = (S-1)/(c-1), where c is the number of columns.
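The four steps translate directly into code. The following is a minimal Python sketch, assuming the table is given as a list of rows of counts, with rows as predictor categories and columns as criterion groups, and assuming every column total is positive. The function name `lambda_max` is our own illustrative choice, not a standard library routine.

```python
def lambda_max(table):
    """Lambda-max for a table given as a list of rows of counts.

    Rows are predictor categories; columns are criterion groups.
    Assumes every column total is positive.
    """
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    # Step 1: express each cell as a proportion of its own column total.
    props = [[row[j] / col_totals[j] for j in range(n_cols)] for row in table]
    # Steps 2-3: take the largest proportion in each row and sum them (S).
    s = sum(max(row) for row in props)
    # Step 4: lambda-max = (S - 1) / (c - 1).
    return (s - 1) / (n_cols - 1)


example = [
    [36, 11, 15, 21],  # row A
    [14, 28, 51, 32],  # row B
    [18, 45, 32, 12],  # row C
]
print(round(lambda_max(example), 3))  # 0.195
```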
For instance, consider the table

            I    II   III    IV  Row max
A          36    11    15    21       36
B          14    28    51    32       51
C          18    45    32    12       45
Total      68    84    98    65      132

The within-column proportions are shown below.

            I    II   III    IV  Row max
A        .529  .131  .153  .323     .529
B        .206  .333  .520  .492     .520
C        .265  .536  .327  .185     .536
Total    1.00  1.00  1.00  1.00    1.586

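Then lambda-max = (1.586 - 1)/(4 - 1) = .195, while lambda = (132 - 98)/(315 - 98) = .157, where 132 is the sum of the row maxima, 98 is the largest column total, and 315 is the total sample size.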
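As a check on this arithmetic and on the equal-column-totals argument, the sketch below computes ordinary lambda (predicting the column from the row) for the example table, then recomputes it after rescaling each column to a total of 1. The helper names `gk_lambda` and `equalize_columns` are hypothetical. After rescaling, the blind judge scores 1, the informed judge scores S, and a perfect judge scores c, so lambda on the rescaled table is exactly (S-1)/(c-1).

```python
def gk_lambda(table):
    """Lambda for predicting the column (criterion) from the row (predictor)."""
    col_totals = [sum(col) for col in zip(*table)]
    informed = sum(max(row) for row in table)  # best guess within each row
    blind = max(col_totals)                    # always guess the largest column
    return (informed - blind) / (sum(col_totals) - blind)


def equalize_columns(table):
    """Divide each entry by its column total, so every column sums to 1."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[x / t for x, t in zip(row, col_totals)] for row in table]


example = [
    [36, 11, 15, 21],
    [14, 28, 51, 32],
    [18, 45, 32, 12],
]
print(round(gk_lambda(example), 3))                    # 0.157
print(round(gk_lambda(equalize_columns(example)), 3))  # 0.195
```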

We can arrive at the formula for lambda-max by a different line of reasoning. Instead of imagining criterion groups of different sizes than those observed, as we just did, imagine that a judge wins different numbers of points for correct guesses in different columns. Then for any point scheme we can define a measure of association as follows, where PWB stands for "points won by":

Association = (PWB informed judge - PWB blind judge)/(PWB perfect judge - PWB blind judge)

We might then ask what point-assignment scheme yields the highest possible measure of association. The answer is that this measure is maximized by using fractional points, assigning 1/ct_i points for each correct guess in column i, where ct_i is the column total for column i. Again it is reasonably clear intuitively why this scoring scheme maximizes the measure of association: it denies the blind judge any advantage over choosing a column randomly, because it gives him or her a total of 1 point (across all N guesses) regardless of which column he or she chooses. We won't go through the logic in detail, but it turns out that the maximum measure of association yielded by this approach exactly equals lambda-max.
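The points-based definition can be checked numerically. In this sketch, the hypothetical function `points_association` awards `points[j]` for each fully correct guess in column j and evaluates the equation above. Equal points of 1 reproduce ordinary lambda for the example table, and points of 1/ct_i reproduce lambda-max. The sketch only confirms that these two scoring schemes give the stated formulas; it does not by itself demonstrate the maximization claim.

```python
def points_association(table, points):
    """(PWB informed - PWB blind) / (PWB perfect - PWB blind).

    points[j] is the credit for each fully correct guess in column j.
    """
    col_totals = [sum(col) for col in zip(*table)]
    # Informed judge: in each row, guess the column that earns the most points.
    informed = sum(max(n * p for n, p in zip(row, points)) for row in table)
    # Blind judge: one column for all N guesses, chosen to earn the most points.
    blind = max(t * p for t, p in zip(col_totals, points))
    # Perfect judge: every guess correct.
    perfect = sum(t * p for t, p in zip(col_totals, points))
    return (informed - blind) / (perfect - blind)


example = [
    [36, 11, 15, 21],
    [14, 28, 51, 32],
    [18, 45, 32, 12],
]
totals = [sum(col) for col in zip(*example)]
print(round(points_association(example, [1, 1, 1, 1]), 3))            # 0.157
print(round(points_association(example, [1 / t for t in totals]), 3))  # 0.195
```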

We can subsume these two conceptions of lambda-max under one broader conception. Consider measures capable of being represented by the last equation, with points awarded only for fully correct guesses. Lambda-max is the maximum value such a measure could have for any combination of point schemes and criterion-group sizes.

Some Uses of Lambda-max for Prediction

There are at least three circumstances where lambda-max would be preferable to any of the margin-bound measures as a measure of predictive power.

First, you may want a measure of predictive accuracy that rewards correct "long shots" more heavily than less impressive correct guesses. Earlier we imagined a party guest guessing the professions of the other guests; you would probably be more impressed by the guesser's accuracy on a rare profession like actuary than on a common profession like lawyer. Lambda-max has that intuitive property, since under the point-scoring interpretation above a correct guess in a small column earns more points than a correct guess in a large one.

Second, sometimes you don't know the relative sizes of criterion groups, but you do know the proportion of each criterion group falling in each predictor category, and you want to use that information to make some sort of statement about the value of a predictor. For instance, if a sign in a lie-detection system were observed in 80% of liars and in only 20% of truth-tellers, the 60-percentage-point difference between these rates is a useful measure of predictive accuracy that is unaffected by the fact that some populations may contain a higher proportion of liars than others. Lambda-max has the same property. In fact, we will see shortly that in 2 x 2 tables lambda-max reduces to a simple difference between within-column proportions, so it is the same statistic as in this example.
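The lie-detection example can be made concrete. The sketch below assumes, as in the example, that the sign appears in 80% of liars and 20% of truth-tellers, and builds tables for two hypothetical populations: one with equal numbers of liars and truth-tellers, and one in which only 10% are liars. The helpers `lambda_max`, `gk_lambda`, and `lie_table` are illustrative names of our own.

```python
def lambda_max(table):
    """Lambda-max: rows are predictor categories, columns criterion groups."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(x / t for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


def gk_lambda(table):
    """Ordinary lambda, predicting the column from the row."""
    col_totals = [sum(col) for col in zip(*table)]
    blind = max(col_totals)
    informed = sum(max(row) for row in table)
    return (informed - blind) / (sum(col_totals) - blind)


def lie_table(n_liars, n_truthful):
    """Rows: sign present, sign absent. Columns: liars, truth-tellers.
    Assumes the sign appears in 80% of liars and 20% of truth-tellers."""
    return [
        [0.8 * n_liars, 0.2 * n_truthful],
        [0.2 * n_liars, 0.8 * n_truthful],
    ]


for liars, truthful in [(500, 500), (100, 900)]:
    t = lie_table(liars, truthful)
    print(liars, truthful, round(lambda_max(t), 3), round(gk_lambda(t), 3))
# 500 500 0.6 0.6
# 100 900 0.6 0.0
```

Lambda-max stays at .6 in both populations, while ordinary lambda drops to 0 when liars are rare, because the informed judge then guesses "truth-teller" in both rows, just as the blind judge does.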

Third, sometimes it is fairly easy to modify a predictor (the row variable) to make it more inclined toward some rows and less toward others, to try to increase its predictive power. For instance, suppose a Yes-No question is used to predict success or failure with Yes predicting success. If 90% of people answer Yes but only 50% succeed, you might try to increase the item's predictive power by rewriting the question to make the No option more attractive. Clearly you can't tell from present data exactly how well the modified predictor would work. But as an approximation it's reasonable to guess that adjustments of this sort (modifying the row totals to yield a maximum association for the existing criterion groups), if done well, would have effects similar to those we imagined in defining lambda-max (modifying column totals to yield a maximum association for the existing predictor). Thus lambda-max might be thought of as an approximate measure of the maximum accuracy that could be attained by adjusting the predictor to increase some row totals and decrease others. Of course you never know for sure how well the adjusted predictor will work until you try it.

The last paragraph raises the question: why not reverse the roles of rows and columns in computing lambda-max, or equivalently why not compute lambda-max with its present formula after transposing the matrix to make each criterion group one row and each predictor category one column? That is of course possible, but the substantive meaning of this statistic would be less clear than the meaning of lambda-max. Simply multiplying each row of the (untransposed) matrix by a constant does not really give a good indication of the effects of rewriting a questionnaire item, while multiplying each column by a constant does tell precisely the effect of changing the size of a criterion group. Thus we suggest using a measure of association that does have at least one precise interpretation (given at the end of the previous section) and approximates another (the effect of rewriting an item) rather than a measure with no precise substantive interpretation at all.

Lambda-max as a Measure of Causation

Suppose the independent variable in a causal problem is categorical rather than interval or ordinal, as when you have several treatment groups. In most causal analyses you would like a measure of causation to be independent of the sizes of the treatment groups, since those sizes are often arbitrary. Any function of within-group success rates, for instance, has this independence. Lambda-max is independent of column margins, so it has this desirable property if the column variable is the independent variable.

We mentioned earlier that unique zero seems to be an essential property of measures of causation, since when sampling error is ignored any deviation from exact independence indicates causation. Lambda-max has unique zero.
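Unique zero follows from the formula: for any single column j the proportions p_ij sum to 1, so S = sum over rows of max_j p_ij is at least 1, with equality only when every column has the same within-column proportions, that is, under exact independence. The sketch below contrasts this with ordinary lambda on two small hypothetical tables; `lambda_max` and `gk_lambda` are our own illustrative helpers.

```python
def lambda_max(table):
    """Lambda-max: rows are predictor categories, columns criterion groups."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(x / t for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


def gk_lambda(table):
    """Ordinary lambda, predicting the column from the row."""
    col_totals = [sum(col) for col in zip(*table)]
    blind = max(col_totals)
    informed = sum(max(row) for row in table)
    return (informed - blind) / (sum(col_totals) - blind)


# Every column splits 25% / 75% across the rows: exact independence.
independent = [[10, 20, 30],
               [30, 60, 90]]
# Columns split differently (75/25 vs 40/60), but column 2 is modal in both rows.
dependent = [[30, 40],
             [10, 60]]

print(lambda_max(independent), gk_lambda(independent))        # 0.0 0.0
print(round(lambda_max(dependent), 2), gk_lambda(dependent))  # 0.35 0.0
```

In the second table lambda is 0 even though the two criterion groups clearly differ, which is what it means for lambda to lack unique zero.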

Many people think of the independent and dependent variables in a causal analysis as comparable to the "predictor" and "criterion" variables respectively of a prediction problem. Therefore we should emphasize that we suggest computing lambda-max with the independent variable being the column variable in a causal analysis, even though we also suggest letting the predictor variable be the row variable in a prediction problem. This is because lambda-max is independent of column totals but not row totals. In a causal analysis we typically want the measure of association to be independent of the size of treatment groups, which are often arbitrary. But in a prediction problem it makes more sense to have a measure of predictive accuracy be independent of the sizes of criterion groups, as illustrated in the recent example on lie detection. One way to make sense of this is to note that in prediction problems the causation often runs in the opposite direction from the prediction. For instance, in the lie-detection example we may reasonably assume that a lie (the criterion variable) causes the subject to emit the symptom we observe.

Some Other Properties of Lambda-max

Unlike the four previous measures, lambda-max has both unique zero and a weighted frequency interpretation. Lambda and MR have frequency and weighted frequency interpretations respectively but lack unique zero, while MP and Uncertainty have unique zero but lack frequency or weighted frequency interpretations. Lambda-max was named lambda* by Goodman and Kruskal, who gave it little attention and seem not to have recognized either its unique-zero property (see G & K 1979, p. 12) or its property as the maximum of lambda.

Lambda-max is singly margin-free; it is independent of the column margins. All previous measures change with both the column and row marginal totals.
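To illustrate what "singly margin-free" means, the sketch below multiplies whole columns of the earlier example table by arbitrary constants, which leaves lambda-max unchanged, and then triples row A, which changes it. The scale factors are arbitrary choices for illustration, and `lambda_max` is a hypothetical helper implementing the four computing steps.

```python
def lambda_max(table):
    """Lambda-max: rows are predictor categories, columns criterion groups."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(x / t for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


example = [
    [36, 11, 15, 21],  # row A
    [14, 28, 51, 32],  # row B
    [18, 45, 32, 12],  # row C
]
# Multiply whole columns by constants: only the column margins change.
col_factors = [5, 1, 0.5, 3]
col_scaled = [[x * f for x, f in zip(row, col_factors)] for row in example]
# Triple row A: this changes the row margins (and the column totals with them).
row_scaled = [[3 * x for x in example[0]]] + example[1:]

base = lambda_max(example)
print(round(base, 4), round(lambda_max(col_scaled), 4), round(lambda_max(row_scaled), 4))
# 0.1952 0.1952 0.1981
```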

Unlike the previous measures, lambda-max cannot be defined if any columns are empty because within-column proportions cannot be computed for those columns. We thus assume empty columns are dropped from the table before lambda-max is computed.

Since lambda-max = (S-1)/(c-1) and S cannot exceed the number of rows r, it follows that lambda-max <= (r-1)/(c-1). Thus the upper limit of lambda-max falls below 1 if r < c. That makes sense; if r < c then you can never use row information to discriminate perfectly among columns, so long as each column contains at least one case.
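The empty-column rule and the upper bound can both be seen in a small hypothetical table with r = 2 rows and c = 3 nonempty columns, where every column falls entirely in one row. That is the best discrimination two rows allow, so S = 2 = r and lambda-max reaches (r-1)/(c-1). The sketch assumes that at least two nonempty columns remain after dropping; `lambda_max` is our own illustrative helper.

```python
def lambda_max(table):
    """Lambda-max after dropping empty columns (assumes at least 2 remain)."""
    cols = [col for col in zip(*table) if sum(col) > 0]    # drop empty columns
    props = [[x / sum(col) for x in col] for col in cols]  # within-column proportions
    s = sum(max(col_props[i] for col_props in props) for i in range(len(table)))
    return (s - 1) / (len(cols) - 1)


table = [
    [10, 0, 5, 0],  # the last column is empty and is dropped
    [0, 7, 0, 0],
]
r, c = 2, 3
print(lambda_max(table), (r - 1) / (c - 1))  # 0.5 0.5
```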

Interpreting Lambda-max in Two-Column Tables

For 2 x 2 tables lambda-max reduces to simply the absolute difference between within-column proportions. Thus in the table

A          36    58
B          64    42
Total     100   100

lambda-max = .58 - .36 = .22.
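The reduction is a one-line derivation. Write p and q for the proportions of columns 1 and 2 that fall in the first row. Then S = max(p, q) + max(1 - p, 1 - q) = 1 + |p - q|, and c - 1 = 1, so lambda-max = |p - q|. The sketch below checks this on the table above, using a hypothetical `lambda_max` helper.

```python
def lambda_max(table):
    """Lambda-max: rows are predictor categories, columns criterion groups."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(x / t for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


table = [[36, 58],   # row A
         [64, 42]]   # row B
p1 = table[0][0] / (table[0][0] + table[1][0])  # share of column 1 in row A
p2 = table[0][1] / (table[0][1] + table[1][1])  # share of column 2 in row A
print(round(lambda_max(table), 2), round(abs(p1 - p2), 2))  # 0.22 0.22
```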

For tables with 2 columns but more than 2 rows, lambda-max reduces to the Gini D statistic (Goodman & Kruskal, 1979, pp. 55-57), which also has a simple interpretation. Divide the rows into two clusters: those with a higher proportion in column 1 than the sample as a whole, and those with a lower proportion. Collapse the rows within each cluster into a single row, so the table reduces to 2 x 2 form. Then lambda-max or Gini D computed from the original table equals the difference between the within-column proportions in this reduced table.

For instance, consider the table

A          14    26
B          38    20
C          27    35
D          18    11
Total      97    92

Collapsing rows A and C into a single row named AC, and collapsing rows B and D into a second row BD, gives the table

AC         41    61
BD         56    31
Total      97    92

Thus lambda-max = Gini D = 61/92 - 41/97 = .240. This same value of .240 would be found by following the computing directions for lambda-max given earlier.
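The collapsing procedure can be sketched as follows, assuming an r x 2 table in which every row has a positive total. The helper `collapse_rows` is hypothetical; it puts a row whose column-1 share exactly equals the sample's share into the "high" cluster, an arbitrary choice that the text indicates does not affect the result.

```python
def lambda_max(table):
    """Lambda-max: rows are predictor categories, columns criterion groups."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(x / t for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


def collapse_rows(table):
    """Merge the rows of an r x 2 table into a "low" and a "high" cluster,
    by comparing each row's share in column 1 with the whole sample's share."""
    n1 = sum(a for a, _ in table)
    n = sum(a + b for a, b in table)
    low, high = [0, 0], [0, 0]
    for a, b in table:
        target = high if a / (a + b) >= n1 / n else low
        target[0] += a
        target[1] += b
    return [low, high]


table = [[14, 26],  # A
         [38, 20],  # B
         [27, 35],  # C
         [18, 11]]  # D
reduced = collapse_rows(table)
t1, t2 = (sum(col) for col in zip(*reduced))
gini_d = abs(reduced[0][0] / t1 - reduced[0][1] / t2)
print(reduced)                                         # [[41, 61], [56, 31]]
print(round(gini_d, 3), round(lambda_max(table), 3))  # 0.24 0.24
```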

If the two entries in a row have exactly the same ratio as the column totals, you will get the same value of lambda-max regardless of whether you include those cases in the first row or the second row of the reduced table, or split them between the rows.

RE (Relative Effect)

RE (relative effect) is defined only for 2 x 2 causal problems in which one treatment is the "control" treatment and the other is the "experimental" treatment. RE answers the question, "How much does the proportion computed within the experimental group differ from that in the control group, as a proportion of its maximum possible difference in the same direction?" For instance, if the success rates in the control and treatment groups were 20% and 40% respectively, then RE = .25, because the increase of 20 percentage points was one-quarter of the maximum possible increase of 80 points, given the success rate of the control group. However, if the control and treatment success rates were 20% and 10% respectively, then RE = .5, because the reduction of 10 points was half of the maximum possible reduction of 20 points. Of course if an effect is in the opposite direction from predictions, as this latter case might represent, the size of the effect is usually of little interest.
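A minimal sketch of RE, assuming the two success rates are given as proportions between 0 and 1. The function name `relative_effect` is our own. Following the examples above, it returns the size of the change relative to its maximum in the same direction and does not encode the direction in the sign.

```python
def relative_effect(control, treatment):
    """RE from the control and treatment success rates (proportions 0..1)."""
    if treatment > control:
        return (treatment - control) / (1 - control)  # room left to rise
    if treatment < control:
        return (control - treatment) / control        # room left to fall
    return 0.0                                        # no difference


print(relative_effect(0.20, 0.40))  # 0.25
print(relative_effect(0.20, 0.10))  # 0.5
```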
