Copyright © Richard B. Darlington. All rights reserved.
Therefore you can easily compute a statistic we'll call lambda-max, which is the maximum value lambda can attain across various relative sizes of the criterion groups. Since it doesn't matter what size we make the groups, so long as we make them equal, it's simplest in effect to make each group size 1, by dividing each entry in the table by its column total and working with the within-column proportions. Since each group then has size 1, c, the number of columns, becomes the total sample size.
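Thus lambda-max is computed as follows. Consider the table below, in which rows A to C are predictor categories, columns I to IV are criterion groups, and the last column gives the largest entry in each row: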
| | I | II | III | IV | Row max |
|---|---|---|---|---|---|
| A | 36 | 11 | 15 | 21 | 36 |
| B | 14 | 28 | 51 | 32 | 51 |
| C | 18 | 45 | 32 | 12 | 45 |
| Total | 68 | 84 | 98 | 65 | 132 |
The within-column proportions are shown below.
| | I | II | III | IV | Row max |
|---|---|---|---|---|---|
| A | .529 | .131 | .153 | .323 | .529 |
| B | .206 | .333 | .520 | .492 | .520 |
| C | .265 | .536 | .327 | .185 | .536 |
| Total | 1.00 | 1.00 | 1.00 | 1.00 | 1.586 |
Then lambda-max = (1.586 - 1)/(4 - 1) = .195, whereas ordinary lambda, computed from the raw frequencies with N = 315 and a largest column total of 98, is (132 - 98)/(315 - 98) = .157.
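The computation above can be sketched in Python. This is a minimal illustration, not the author's own procedure; the function names `within_column_proportions`, `lambda_max`, and `goodman_kruskal_lambda` are our own, rows are assumed to be predictor categories and columns criterion groups, and empty columns are dropped before computing proportions, since within-column proportions are undefined for them.

```python
def within_column_proportions(table):
    """Divide each entry by its column total (rows = predictor categories,
    columns = criterion groups). Empty columns are dropped first."""
    cols = [col for col in zip(*table) if sum(col) > 0]
    prop_cols = [[x / sum(col) for x in col] for col in cols]
    # Transpose back so each inner list is a row again.
    return [list(row) for row in zip(*prop_cols)]


def lambda_max(table):
    """(S - 1)/(c - 1), where S is the sum of the row maxima of the
    within-column proportions and c is the number of non-empty columns."""
    props = within_column_proportions(table)
    s = sum(max(row) for row in props)
    c = len(props[0])
    return (s - 1) / (c - 1)


def goodman_kruskal_lambda(table):
    """Ordinary lambda for predicting the column from the row, using raw counts."""
    n = sum(sum(row) for row in table)
    max_col_total = max(sum(col) for col in zip(*table))
    row_max_sum = sum(max(row) for row in table)
    return (row_max_sum - max_col_total) / (n - max_col_total)


table = [
    [36, 11, 15, 21],  # row A
    [14, 28, 51, 32],  # row B
    [18, 45, 32, 12],  # row C
]
print(round(lambda_max(table), 3))              # 0.195
print(round(goodman_kruskal_lambda(table), 3))  # 0.157
```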
We can arrive at the formula for lambda-max by a different line of reasoning. Instead of imagining criterion groups of sizes different from those observed, as we just did, imagine that a judge wins different numbers of points for correct guesses in different columns. Then for any point scheme we can define a measure of association as follows, where PWB stands for "points won by":
Association = (PWB informed judge - PWB blind judge)/(PWB perfect judge - PWB blind judge)
We might then ask what point-assignment scheme yields the highest possible measure of association. The answer is that this measure is maximized by using fractional points, assigning 1/ct_i points for each correct guess in column i, where ct_i is the column total for column i. Again it is fairly clear intuitively why this scoring scheme maximizes the measure of association; it denies the blind judge any advantage over choosing a column randomly, since it gives him or her a total of 1 point (across all N guesses) regardless of which column he or she chooses. We won't go through the logic in detail, but it turns out that the maximum measure of association yielded by this approach exactly equals lambda-max.
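The point-scheme formula can be checked numerically on the example table. The helper `association_with_points` is our own hypothetical name; it implements the PWB formula for any number of points per correct guess in each column, using exact fractions so that the identities hold without rounding. With 1 point everywhere it reproduces ordinary lambda, and with 1/ct_i points it reproduces lambda-max. It checks only these two schemes and does not search over others.

```python
from fractions import Fraction


def association_with_points(table, points):
    """(PWB informed - PWB blind)/(PWB perfect - PWB blind), where points[j]
    is the number of points won for each correct guess in column j."""
    col_totals = [sum(col) for col in zip(*table)]
    # Blind judge: always guesses the one column that wins the most points.
    blind = max(p * t for p, t in zip(points, col_totals))
    # Perfect judge: guesses every case correctly.
    perfect = sum(p * t for p, t in zip(points, col_totals))
    # Informed judge: within each row, guesses the column that wins the most points.
    informed = sum(max(p * x for p, x in zip(points, row)) for row in table)
    return (informed - blind) / (perfect - blind)


table = [[36, 11, 15, 21], [14, 28, 51, 32], [18, 45, 32, 12]]
col_totals = [sum(col) for col in zip(*table)]

one_point_each = [Fraction(1)] * len(col_totals)
inverse_totals = [Fraction(1, t) for t in col_totals]

print(association_with_points(table, one_point_each))  # 34/217, ordinary lambda
print(round(float(association_with_points(table, inverse_totals)), 3))  # 0.195, lambda-max
```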
We can subsume these two conceptions of lambda-max under one broader conception. Consider measures capable of being represented by the last equation, with points awarded only for fully correct guesses. Lambda-max is the maximum value such a measure could have for any combination of point schemes and criterion-group sizes.
First, you may want a measure of predictive accuracy that rewards correct "long shots" more heavily than less impressive correct guesses. Earlier we imagined a party guest guessing the professions of the other guests; you would probably be more impressed by the guesser's accuracy if he or she were correct on a rare profession like actuary than on a common profession like lawyer. Lambda-max matches that intuitive property, since under the fractional-point scheme described above a correct guess in a small criterion group earns more points than one in a large group.
Second, sometimes you don't know the relative sizes of criterion groups, but you do know the proportion of each criterion group falling in each predictor category, and you want to use that information to make some sort of statement about the value of a predictor. For instance, if a sign in a lie-detection system were observed in 80% of liars and in only 20% of truth-tellers, the difference of 60 percentage points between these rates is a useful measure of predictive accuracy that is unaffected by the fact that some populations may contain a higher proportion of liars than others. Lambda-max has the same property. In fact, we will see shortly that in 2 x 2 tables lambda-max reduces to a simple difference between within-column proportions, so it is the same statistic as in this example.
Third, sometimes it is fairly easy to modify a predictor (the row variable) to make it more inclined toward some rows and less toward others, to try to increase its predictive power. For instance, suppose a Yes-No question is used to predict success or failure with Yes predicting success. If 90% of people answer Yes but only 50% succeed, you might try to increase the item's predictive power by rewriting the question to make the No option more attractive. Clearly you can't tell from present data exactly how well the modified predictor would work. But as an approximation it's reasonable to guess that adjustments of this sort (modifying the row totals to yield a maximum association for the existing criterion groups), if done well, would have effects similar to those we imagined in defining lambda-max (modifying column totals to yield a maximum association for the existing predictor). Thus lambda-max might be thought of as an approximate measure of the maximum accuracy that could be attained by adjusting the predictor to increase some row totals and decrease others. Of course you never know for sure how well the adjusted predictor will work until you try it.
The last paragraph raises the question: why not reverse the roles of rows and columns in computing lambda-max, or equivalently why not compute lambda-max with its present formula after transposing the matrix to make each criterion group one row and each predictor category one column? That is of course possible, but the substantive meaning of this statistic would be less clear than the meaning of lambda-max. Simply multiplying each row of the (untransposed) matrix by a constant does not really give a good indication of the effects of rewriting a questionnaire item, while multiplying each column by a constant does tell precisely the effect of changing the size of a criterion group. Thus we suggest using a measure of association that does have at least one precise interpretation (given at the end of the previous section) and approximates another (the effect of rewriting an item) rather than a measure with no precise substantive interpretation at all.
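We mentioned earlier that unique zero seems to be an essential property of measures of causation, since when sampling error is ignored any deviation from exact independence indicates causation. Lambda-max has unique zero: the row maxima of the within-column proportions sum to exactly 1 only when each row has the same proportion in every column, that is, under exact independence.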
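A small sketch of the unique-zero property, using hypothetical tables: when every column has the same row proportions, lambda-max is exactly 0, and adding a single case to one cell makes it positive. The `lambda_max` helper is our own and uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction


def lambda_max(table):
    """(S - 1)/(c - 1) with exact fractions; assumes no empty columns."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(Fraction(x, t) for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


# Hypothetical table with exact independence: every column splits 1:3 across the rows.
independent = [[10, 20, 30],
               [30, 60, 90]]
# The same table with one extra case added to the first cell.
perturbed = [[11, 20, 30],
             [30, 60, 90]]

print(lambda_max(independent))    # 0
print(lambda_max(perturbed) > 0)  # True
```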
Many people think of the independent and dependent variables in a causal analysis as comparable to the "predictor" and "criterion" variables respectively of a prediction problem. Therefore we should emphasize that we suggest computing lambda-max with the independent variable being the column variable in a causal analysis, even though we also suggest letting the predictor variable be the row variable in a prediction problem. This is because lambda-max is independent of column totals but not row totals. In a causal analysis we typically want the measure of association to be independent of the size of treatment groups, which are often arbitrary. But in a prediction problem it makes more sense to have a measure of predictive accuracy be independent of the sizes of criterion groups, as illustrated in the recent example on lie detection. One way to make sense of this is to note that in prediction problems the causation often runs in the opposite direction from the prediction. For instance, in the lie-detection example we may reasonably assume that a lie (the criterion variable) causes the subject to emit the symptom we observe.
Lambda-max is singly margin-free; it is independent of the column margins. All previous measures change with both the column and row marginal totals.
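The following sketch (our own `lambda_max` helper, with exact fractions) illustrates the singly margin-free property on the example table: multiplying one column by a constant leaves lambda-max unchanged, while multiplying one row by a constant changes it.

```python
from fractions import Fraction


def lambda_max(table):
    """(S - 1)/(c - 1) with exact fractions; assumes no empty columns."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(Fraction(x, t) for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


table = [[36, 11, 15, 21], [14, 28, 51, 32], [18, 45, 32, 12]]

# Triple criterion group I (the first column): its within-column proportions are unchanged.
col_scaled = [[3 * x if j == 0 else x for j, x in enumerate(row)] for row in table]
# Triple predictor category A (the first row): every column's proportions shift.
row_scaled = [[3 * x for x in row] if i == 0 else row for i, row in enumerate(table)]

print(lambda_max(col_scaled) == lambda_max(table))  # True
print(lambda_max(row_scaled) == lambda_max(table))  # False
```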
Unlike the previous measures, lambda-max cannot be defined if any columns are empty because within-column proportions cannot be computed for those columns. We thus assume empty columns are dropped from the table before lambda-max is computed.
Since lambda-max = (S - 1)/(c - 1), where S is the sum of the row maxima of the within-column proportions (1.586 in the earlier example), and since S cannot exceed the number of rows r (each row maximum is at most 1), it follows that lambda-max <= (r - 1)/(c - 1). Thus the upper limit of lambda-max falls below 1 if r < c. That makes sense; if r < c then you can never use row information to discriminate perfectly among columns, so long as each column contains at least one case.
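A hypothetical 2 x 3 table shows the upper limit being reached. Each column lies entirely in one row, yet because there are only two rows the second and third columns cannot be told apart, and lambda-max equals (r - 1)/(c - 1) = 1/2. The `lambda_max` helper is our own sketch using exact fractions.

```python
from fractions import Fraction


def lambda_max(table):
    """(S - 1)/(c - 1) with exact fractions; assumes no empty columns."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(Fraction(x, t) for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


# Hypothetical 2 x 3 table: each column lies entirely in one row.
table = [[10, 0, 0],
         [0, 5, 7]]
r, c = len(table), len(table[0])

print(lambda_max(table))       # 1/2
print(Fraction(r - 1, c - 1))  # 1/2, the upper limit (r - 1)/(c - 1)
```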
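In 2 x 2 tables, lambda-max is simply the difference between the two within-column proportions in either row, taking the larger minus the smaller. For instance, if the within-column proportions in one row are .58 and .36, then lambda-max = .58 - .36 = .22.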
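The 2 x 2 reduction can be checked with hypothetical counts (not a table from the text) chosen so that the first-row within-column proportions are .58 and .36, with deliberately unequal column totals of 50 and 100. The general lambda-max formula and the simple difference agree exactly. Both helper names are our own.

```python
from fractions import Fraction


def lambda_max(table):
    """(S - 1)/(c - 1) with exact fractions; assumes no empty columns."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(Fraction(x, t) for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


def two_by_two_difference(table):
    """Larger minus smaller within-column proportion in the first row of a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    return abs(Fraction(n11, n11 + n21) - Fraction(n12, n12 + n22))


# Hypothetical counts: column totals 50 and 100, first-row proportions .58 and .36.
table = [[29, 36],
         [21, 64]]

print(lambda_max(table), two_by_two_difference(table))  # 11/50 11/50
```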
For tables with 2 columns but more than 2 rows, lambda-max reduces to the Gini D statistic (Goodman & Kruskal, 1979, 55-57), which also has a simple interpretation. Divide the rows into two clusters: those with a higher proportion in column 1 than the sample as a whole, and those with a lower proportion. Collapse the rows within each cluster into a single row, so the table reduces to 2 x 2 form. Then lambda-max or Gini D computed from the original table equals the difference between the within-column proportions in this reduced table.
For instance, consider the table
| | Column 1 | Column 2 |
|---|---|---|
| A | 14 | 26 |
| B | 38 | 20 |
| C | 27 | 35 |
| D | 18 | 11 |
| Total | 97 | 92 |
Collapsing rows A and C into a single row named AC, and collapsing rows B and D into a second row BD, gives the table
| | Column 1 | Column 2 |
|---|---|---|
| AC | 41 | 61 |
| BD | 56 | 31 |
| Total | 97 | 92 |
Thus lambda-max = Gini D = 61/92 - 41/97 = .240. This same value of .240 would be found by following the computing directions for lambda-max given earlier.
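The collapsing procedure can be sketched as follows. The helper `gini_d_by_collapsing` is our own hypothetical name; it assumes a table with exactly two columns, places rows whose column-1 proportion exceeds the sample's in one cluster and all other rows (including ties) in the other, and compares the result with the general lambda-max computation for the table above.

```python
from fractions import Fraction


def lambda_max(table):
    """(S - 1)/(c - 1) with exact fractions; assumes no empty columns."""
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(max(Fraction(x, t) for x, t in zip(row, col_totals)) for row in table)
    return (s - 1) / (len(col_totals) - 1)


def gini_d_by_collapsing(table):
    """Two-column tables only: collapse the rows into two clusters and return
    the difference between within-column proportions in the reduced table."""
    n1 = sum(row[0] for row in table)
    n2 = sum(row[1] for row in table)
    overall = Fraction(n1, n1 + n2)  # sample proportion in column 1
    high = [0, 0]  # rows with a higher column-1 proportion than the whole sample
    low = [0, 0]   # all remaining rows (ties may go in either cluster)
    for x1, x2 in table:
        if x1 + x2 == 0:
            continue  # an empty row contributes nothing
        cluster = high if Fraction(x1, x1 + x2) > overall else low
        cluster[0] += x1
        cluster[1] += x2
    return Fraction(high[0], n1) - Fraction(high[1], n2)


table = [[14, 26], [38, 20], [27, 35], [18, 11]]  # rows A, B, C, D
d = gini_d_by_collapsing(table)
print(d == lambda_max(table))  # True
print(f"{float(d):.3f}")       # 0.240
```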
If the two entries in a row have exactly the same ratio as the column totals, you will get the same value of lambda-max regardless of whether you include those cases in the first row or the second row of the reduced table, or split them between the rows.