Copyright © Richard B. Darlington. All rights reserved.
Introduction to Measures of Association, with Emphasis on Lambda
Consider an election with 4 candidates. Suppose you want to know to what
extent a voter's candidate preference can be predicted from the voter's
occupational category: professional, skilled, or unskilled. Lambda measures
this predictability. Let the occupational categories be A, B, C, and let the candidates
be I, II, III, IV. Suppose a sample of 315 people breaks down as shown in
the following table:
Preferences for 4 Candidates
| Occ. | I | II | III | IV |
| A | 36 | 11 | 15 | 21 |
| B | 14 | 28 | 51 | 32 |
| C | 18 | 45 | 32 | 12 |
| Total | 68 | 84 | 98 | 65 |
To compute lambda, first determine how many times you could guess
candidate preference correctly without knowing the voter's occupational category. In
the sample of 315 voters the most popular candidate is III. You would have
98 correct guesses by guessing candidate III for every voter. That's the best
you can do without knowing voters' occupational categories.
Then determine the number of correct guesses you could achieve by using
occupational category. The table shows that among voters in category A, the
most popular candidate is I. Therefore if you knew a voter was in category A,
your best guess would be I, and you would make 36 correct guesses.
Similarly, your optimum strategy is to select the column with the largest
number in each row. In the table below, the largest frequency in each row is
starred. The starred values add up to 132, so you could make 132 correct
guesses altogether if you knew each voter's occupational category. That's a
gain of (132 - 98) or 34 over the best you can do without knowing occupation.
Favored Candidates in Each Row
| Occ. | I | II | III | IV |
| A | 36* | 11 | 15 | 21 |
| B | 14 | 28 | 51* | 32 |
| C | 18 | 45* | 32 | 12 |
| Total | 68 | 84 | 98 | 65 |
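The two counts described above can be checked with a short sketch. It uses the frequencies from the table; the variable names (`table`, `baseline_correct`, `informed_correct`) are illustrative choices, not part of the original discussion.

```python
# Rows are occupational categories A, B, C; columns are candidates I-IV.
table = {
    "A": [36, 11, 15, 21],
    "B": [14, 28, 51, 32],
    "C": [18, 45, 32, 12],
}
candidates = ["I", "II", "III", "IV"]

# Column totals: how many voters in the whole sample prefer each candidate.
column_totals = [sum(col) for col in zip(*table.values())]

# Best strategy without occupation: always guess the most popular candidate.
baseline_correct = max(column_totals)

# Best strategy with occupation: guess the most popular candidate in each row.
best_by_row = {occ: candidates[row.index(max(row))] for occ, row in table.items()}
informed_correct = sum(max(row) for row in table.values())

print(column_totals)                        # [68, 84, 98, 65]
print(baseline_correct)                     # 98
print(best_by_row)                          # {'A': 'I', 'B': 'III', 'C': 'II'}
print(informed_correct)                     # 132
print(informed_correct - baseline_correct)  # 34
```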
But a hypothetical perfect judge could make 315 correct guesses (the total
sample size), a gain of (315 - 98) or 217 over the same baseline. Since 34/217
= .157, we can say that knowing occupational category produces 15.7% of
the increase in correct guesses that could be achieved by guessing all cases
correctly. That value is lambda; lambda equals the increase in number of correct
guesses of column membership produced by knowing row membership,
expressed as a proportion of the largest possible increase. In more mechanical terms,
lambda = (sum of row maxima - largest column total) / (N - largest column total)
where N is the total sample size.
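As a minimal sketch, the formula can be written as a small function. The name `lambda_from_table` is hypothetical (`lambda` itself is a reserved word in Python), and the check for a zero denominator is an added assumption about how to handle a table in which every case falls in a single column, a situation the text does not discuss.

```python
def lambda_from_table(rows):
    """Lambda for predicting column membership from row membership.

    `rows` is a list of rows, each a list of cell frequencies.
    """
    n = sum(sum(row) for row in rows)
    largest_column_total = max(sum(col) for col in zip(*rows))
    sum_of_row_maxima = sum(max(row) for row in rows)
    if n == largest_column_total:
        # Assumption: treat this as undefined, since the denominator is zero.
        raise ValueError("lambda is undefined when all cases fall in one column")
    return (sum_of_row_maxima - largest_column_total) / (n - largest_column_total)


# Frequencies from the voting example: rows A, B, C; columns I, II, III, IV.
votes = [
    [36, 11, 15, 21],
    [14, 28, 51, 32],
    [18, 45, 32, 12],
]

print(round(lambda_from_table(votes), 3))  # 0.157
```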
Lambda is a good measure of association for a very specific problem, but there are
many instances where it would be inappropriate:
- If you want to give more weight to correct "long shot" guesses than to other
guesses.
- If you want to allow the guesser to distribute a guess across columns, saying
for instance, "The chance is .4 that this voter will choose candidate I, .2 for
candidate II", and so on.
- If you want a measure of the degree to which the row variable
affects the column variable. To see the inappropriateness of
lambda for this purpose, see the later discussion of the "unique zero"
property.
- If you want a measure of association that treats the row and column variables
symmetrically, so a table would yield the same measure of association if the
table were "flipped", with each row becoming a column and vice versa.
- If the row categories are necessarily the same as the column categories, as
when a table relates husband's religion to wife's religion. Then the number of
"correct predictions" couldn't be the highest entry in each row, as it is with
lambda. Rather it should be the number in the diagonal cell, as when husband
and wife have the same religion.
- If the categories of both rows and columns are naturally ordered from low to
high. In the voting example one might argue that occupation is so ordered,
but candidates are certainly not.
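The point about symmetry can be illustrated with the voting data. The sketch below, which again uses the hypothetical helper name `lambda_from_table`, computes lambda for the original table and for the same table "flipped", so that candidates become rows and occupations become columns. The two values differ, which shows that lambda does not treat rows and columns symmetrically.

```python
def lambda_from_table(rows):
    """Lambda for predicting column membership from row membership."""
    n = sum(sum(row) for row in rows)
    largest_column_total = max(sum(col) for col in zip(*rows))
    sum_of_row_maxima = sum(max(row) for row in rows)
    return (sum_of_row_maxima - largest_column_total) / (n - largest_column_total)


# Original table: rows are occupations A, B, C; columns are candidates I-IV.
votes = [
    [36, 11, 15, 21],
    [14, 28, 51, 32],
    [18, 45, 32, 12],
]

# Flipped table: each original column becomes a row.
flipped = [list(col) for col in zip(*votes)]

print(round(lambda_from_table(votes), 3))    # 0.157  (predict candidate from occupation)
print(round(lambda_from_table(flipped), 3))  # 0.205  (predict occupation from candidate)
```

In the flipped table the sum of row maxima is 36 + 45 + 51 + 32 = 164 and the largest column total is 125, so lambda is (164 - 125)/(315 - 125) = 39/190, or about .205.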
Thus a great many conditions can suggest the need for alternatives to lambda.
In the outline in the main section of this paper,
these various criteria are applied in reverse order. For instance, measures of
association for ordered categories appear first in the outline, and lambda is one
of the last measures shown.