Copyright © Richard B. Darlington. All rights reserved.

Introduction to Measures of Association, with Emphasis on Lambda

Consider an election with 4 candidates. Suppose you want to know to what extent a voter's candidate preference can be predicted from the voter's occupational category: professional, skilled, or unskilled. Lambda measures exactly this kind of predictability. Let the occupational categories be A, B, C, and let the candidates be I, II, III, IV. Suppose a sample of 315 people breaks down as shown in the following table:

Preferences for 4 Candidates

Occ.     I      II     III    IV
A        36     11     15     21
B        14     28     51     32
C        18     45     32     12
Total    68     84     98     65

To compute lambda, first determine how many times you could guess candidate preference correctly without knowing the voter's occupational category. In the sample of 315 voters the most popular candidate is III. You would have 98 correct guesses by guessing candidate III for every voter. That's the best you can do without knowing voters' occupational categories.
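As a minimal sketch (in Python, which the original does not use), the baseline can be computed by summing each column and taking the largest total. The nested list `table` simply transcribes the frequencies above, and the function name `baseline_correct` is hypothetical.

```python
# Frequencies transcribed from the table above:
# rows are occupational categories A, B, C; columns are candidates I, II, III, IV.
table = [
    [36, 11, 15, 21],  # A
    [14, 28, 51, 32],  # B
    [18, 45, 32, 12],  # C
]


def baseline_correct(rows):
    """Correct guesses when every voter is assigned the most popular column."""
    column_totals = [sum(column) for column in zip(*rows)]
    return max(column_totals)


print(baseline_correct(table))  # 98
```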

Then determine the number of correct guesses you could achieve by using occupational category. The table shows that among voters in category A, the most popular candidate is I. Therefore if you knew a voter was in category A, your best guess would be I, and you would make 36 correct guesses. In general, the optimal strategy is to guess, within each row, the candidate (column) with the largest frequency. In the table below, the largest frequency in each row is starred. The starred values add up to 132, so you could make 132 correct guesses altogether if you knew each voter's occupational category. That's a gain of (132 - 98) or 34 over the best you can do without knowing occupation.
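The same idea can be sketched in a few lines, under the same assumptions as before: `table` transcribes the frequencies from the text, and `correct_with_rows` is a hypothetical name. The number of correct guesses when row membership is known is simply the sum of the row maxima.

```python
# Frequencies transcribed from the text:
# rows are occupational categories A, B, C; columns are candidates I, II, III, IV.
table = [
    [36, 11, 15, 21],  # A
    [14, 28, 51, 32],  # B
    [18, 45, 32, 12],  # C
]


def correct_with_rows(rows):
    """Correct guesses when each row is assigned its own most frequent column."""
    return sum(max(row) for row in rows)


with_rows = correct_with_rows(table)
# Baseline: guess the single most popular column for everyone.
without_rows = max(sum(column) for column in zip(*table))
print(with_rows, with_rows - without_rows)  # 132 34
```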

Favored Candidates in Each Row

Occ.     I      II     III    IV
A        36*    11     15     21
B        14     28     51*    32
C        18     45*    32     12
Total    68     84     98     65

But a hypothetical perfect judge could make 315 correct guesses (the total sample size), a gain of (315 - 98) or 217 over the 98 correct guesses achievable without knowing occupation. Since 34/217 = .157, we can say that knowing occupational category produces 15.7% of the increase in correct guesses that could be achieved by guessing all cases correctly. That value is lambda; lambda equals the increase in number of correct guesses of column membership produced by knowing row membership, expressed as a proportion of the largest possible increase. In more mechanical terms,

lambda = (sum of row maxima - largest column total)/(N - largest column total)
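The formula translates directly into a short function. This is a sketch, not a reference implementation: the name `lambda_columns_from_rows` is hypothetical (`lambda` itself is a reserved word in Python), and the sketch raises an error when every case falls in a single column, since the denominator is then zero and lambda is undefined.

```python
# Frequencies transcribed from the text:
# rows are occupational categories A, B, C; columns are candidates I, II, III, IV.
table = [
    [36, 11, 15, 21],  # A
    [14, 28, 51, 32],  # B
    [18, 45, 32, 12],  # C
]


def lambda_columns_from_rows(rows):
    """Lambda for predicting column membership from row membership."""
    n = sum(sum(row) for row in rows)
    largest_column_total = max(sum(column) for column in zip(*rows))
    sum_of_row_maxima = sum(max(row) for row in rows)
    if n == largest_column_total:
        # Every case is in one column: no gain is possible, so lambda is undefined.
        raise ValueError("lambda is undefined when every case is in one column")
    return (sum_of_row_maxima - largest_column_total) / (n - largest_column_total)


lam = lambda_columns_from_rows(table)
print(f"{lam:.3f}")  # 0.157
```

This computes the version of lambda described here, which predicts column membership from row membership. Swapping the roles of rows and columns (for example, passing `list(zip(*table))`) answers a different question and generally gives a different value.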

Lambda is a good measure of association for a very specific problem, but there are many instances where it would be inappropriate:

Thus a great many conditions can suggest the need for alternatives to lambda. In the outline in the main section of this paper, these various criteria are applied in reverse order. For instance, measures of association for ordered categories appear first in the outline, and lambda is one of the last measures shown.
