This analysis of The Bell Curve by Richard J. Herrnstein and Charles Murray (Free Press, 1994) is based on a talk given by Darlington at Cornell on April 24, 1995.
One can think of The Bell Curve (TBC) as an attempt to enroll Richard Herrnstein's knowledge of psychology and psychometrics in the service of the conservative social agenda that Charles Murray has long favored. This agenda includes cutting back strongly on welfare, eliminating affirmative action and related programs, redirecting education funds from the disadvantaged to the gifted, and placing more emphasis on skills and abilities in immigration policy. Three psychological claims are used to buttress this agenda:
Consider for instance the December 5, 1994 issue of National Review, which features commentary on TBC. The review by historian Eugene D. Genovese contains phrases like "incoherent treatment of race", "socially dangerous irresponsibility", and "chilling naivete, if not disingenuousness". Sociologist Brigitte Berger writes, "...a narrow and deeply flawed book...The worst thing for conservatives would be to become identified with the Murray-Herrnstein position." Economist Glenn C. Loury writes, "...in every instance there are political arguments for these policy prescriptions that are more compelling and more likely to succeed in the public arena than the generalizations about human capacities that Herrnstein and Murray claim to have established...Herrnstein and Murray are in a moral and political cul-de-sac. I see no reason for serious conservatives to join them there."
Similar views were expressed by economist and lawyer James J. Heckman of the University of Chicago, writing in the conservative Reason magazine of March 1995, who criticized TBC roundly. Heckman's review is the best single review I've seen. All these conservatives were underwhelmed by TBC's reasoning and evidence, even though they had long supported the book's policy conclusions.
With the understanding, then, that my comments do not concern TBC's policy conclusions and are unrelated to a general liberal-conservative split, I'll offer the following personal analysis of TBC. The rest of this piece is organized around the three psychological claims stated above.
Herrnstein and Murray studied the relation of each of these 1990 "success" variables to two variables measured in 1979: a measure of IQ (actually a composite measure from the Armed Forces Qualification Test battery or AFQT), and a measure of socioeconomic status (SES)--also a composite measure. They correlated the IQ measure with the "success" variable while statistically holding constant the SES measure, and simultaneously correlated the SES measure with the "success" variable while statistically holding constant the IQ measure. TBC reports that by these analyses, most of these measures of success were far more highly related to IQ than to SES.
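One standard way to formalize "correlating one variable with success while statistically holding another constant" is the first-order partial correlation. The minimal Python sketch below is an illustration of that general idea, not a reconstruction of TBC's exact models, which may have used regression rather than partial correlation. The function name and inputs are hypothetical. The function is called twice: once for IQ with SES held constant, and once for SES with IQ held constant.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, holding z constant.

    Standard formula:
        (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    All inputs are ordinary (zero-order) Pearson correlations.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        # z perfectly correlated with x or y: nothing left to correlate
        raise ValueError("partial correlation undefined when |r| = 1 with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical usage: r_iq_y, r_ses_y, r_iq_ses are correlations among
# IQ, SES, and a success variable (values would come from the data).
# iq_given_ses = partial_corr(r_iq_y, r_iq_ses, r_ses_y)
# ses_given_iq = partial_corr(r_ses_y, r_iq_ses, r_iq_y)
```

Each result depends heavily on how well the controlled variable is measured. This is why the quality of the SES and "IQ" measures, discussed below, matters so much.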
Although TBC claims that these results are entirely consistent with other work in the literature, they in fact directly contradict work by Cornell's Stephen Ceci. In his 1990 book On Intelligence...More or Less, Ceci reports an analysis by himself and Charles Henderson in which adult income was predicted from measures of IQ and SES taken in the teenage years. In other words, the study was remarkably like the studies reported in TBC, although it used a completely different sample (also of several thousand subjects). Yet Ceci and Henderson got results exactly opposite to those reported by TBC. Statistically holding SES constant removed essentially all relationship between teenage IQ and later income, while holding IQ constant had little effect on the measured relation between teenage SES and later income. Notably, TBC never cites this study published in 1990, though it cites other work published as recently as 1994 and repeatedly assures the reader it is providing a fair and comprehensive review of the scientific literature.
How then can two such similar studies yield such opposing results? To answer that, we have to look at the details. Although TBC's discussion of this work goes on for 8 chapters, it's important to remember that all these analyses use the same statistical methodology, the same sample of subjects, and the same measures of IQ and SES. Thus any defect in their approach affects dozens of analyses spanning all these chapters. The aforementioned review by Heckman was extremely critical of this series of analyses, and most of my criticisms of it are taken from that source.
First, unlike the College Board tests, the AFQT makes no attempt to distinguish between "aptitude" and "achievement" items; that distinction would be irrelevant to identifying people who can serve as radar technicians or computer repairers in the armed forces. Thus it is doubtful that the AFQT composite TBC calls an "IQ" measure really is one. Rather, it seems to measure general knowledge that reflects socioeconomic background as much as native intelligence.
Second, there are numerous defects in TBC's measure of socioeconomic status--defects that would tend to lower its observed relation to later success. To begin with, the measure is based entirely on self-report from teenagers, who often don't even know, for instance, how many years of schooling their parents completed. So many of these youths didn't know their parents' income that Herrnstein and Murray simply left income out of their SES composite--even though sociologists usually think of income as a major part of what they mean by SES. In addition, the questions the youths were asked concerned the status of their families "right now"--not the average status across the 18 or so years of their upbringing. Thus if a father had for years held a middle-management position but had been laid off and was temporarily working as a house painter, house painter was the occupation that entered the TBC measure of SES. Finally, Herrnstein and Murray arbitrarily rounded down the highest SES scores and rounded up the lowest ones, introducing still more error into the measurement of this construct.
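The general statistical point is that error in a control variable does more than weaken that variable. When it is correlated with another predictor, some of its explanatory weight shifts onto that predictor. The sketch below is a hypothetical illustration of this errors-in-variables effect, not a reanalysis of TBC's data. It assumes a simple linear model in which IQ and SES contribute equally to success. SES is observed with error, summarized by a single reliability value that stands in for all three defects above. The function name, model, and parameter values are assumptions chosen for illustration. The function computes the population regression coefficients exactly from the implied covariances.

```python
import math


def observed_coefficients(a: float, b: float, rho: float, reliability: float) -> tuple:
    """Population regression of success on IQ and an error-laden SES measure.

    Hypothetical model (true IQ and SES standardized):
        success = a * IQ + b * SES + e,   corr(IQ, SES) = rho
        SES_obs = SES + u, with u independent of everything else
        reliability = var(SES) / var(SES_obs)
    Returns (coefficient on IQ, per-SD coefficient on observed SES).
    """
    var_w = 1.0 / reliability          # variance of observed SES
    cov_y_x = a + b * rho              # cov(success, IQ)
    cov_y_w = a * rho + b              # cov(success, SES_obs); u contributes nothing
    det = var_w - rho ** 2             # determinant of [[1, rho], [rho, var_w]]
    beta_iq = (cov_y_x * var_w - rho * cov_y_w) / det
    beta_w = (cov_y_w - rho * cov_y_x) / det
    return beta_iq, beta_w * math.sqrt(var_w)   # rescale SES slope to per-SD units
```

Take equal true effects (a = b = 0.3) and rho = 0.5. With a perfectly reliable SES measure, both coefficients come out at 0.3. With a reliability of 0.5, the IQ coefficient rises to about 0.39 and the SES coefficient falls to about 0.18, even though the true contributions are identical.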
After reviewing all these sources of error, Heckman concluded, "The authors have no good way to separate genetic from social influences on social behavior. Their environmental data are too crude and the AFQT score they use is obtained too late in life to make a genetic-environmental distinction meaningful." Recall that this is a conservative writer fully sympathetic to TBC's policy conclusions. And if the criticisms seem like the kind that could be leveled against virtually any social research, recall that the Ceci-Henderson study did reach exactly the opposite conclusions. I regard the conclusions of these 8 chapters as unproven.
In 1980 the American Journal of Psychology asked me to review Jensen's Bias in Mental Testing, which made what seemed at the time to be a big splash in the media, though TBC has now set a new standard. When I read the book, I was unconvinced by very large sections of it. However, Jensen's description of one study left me very impressed. This was an unpublished 1951 doctoral dissertation by Frank McGurk. McGurk matched 213 black high-school students very closely to 213 white students. Each black student was matched with a white student in the same curriculum in the same school and with a nearly equal score on an SES scale. Thus McGurk's matching for "environment" was virtually unparalleled in the research literature on black-white differences. McGurk then administered a broad set of 74 IQ-test items to all 426 students (that's 2 x 213), and examined the black-white difference on each item individually.
According to Jensen, McGurk's results fit well with Jensen's own theory that
Since Jensen was citing an unpublished doctoral dissertation, the temptation was strong to accept his summary of the findings. But when I showed these pages to Boyce, she used inter-library loan to obtain a copy of the dissertation itself. She found that the vocabulary item just mentioned was one item in a 10-item vocabulary scale, while the milk-cream item was one item on a scale of 9 arithmetic word problems. The average black-white difference on the vocabulary scale was 4.4%, while the average difference on the arithmetic word problems was nearly identical at 5.3%. In fact, the milk-cream item was the one item from its scale showing the largest black-white difference, while the ABYSMAL item was the one item from its scale showing the smallest difference favoring whites--contrary to Jensen's assertion that "Many similar examples can be found in McGurk's report."
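Boyce's check amounts to placing each cited item within its own scale. That means computing the scale's average gap and asking whether the cited item is typical or sits at an extreme. The hypothetical helper below sketches that comparison for one scale, given per-item pass rates for two matched groups. McGurk's item-level figures are not reproduced here, so any inputs would have to come from the dissertation itself. The function name and input format are assumptions.

```python
from statistics import mean


def scale_gap_summary(pass_a: list, pass_b: list) -> dict:
    """Compare per-item group differences with the scale-wide average.

    pass_a, pass_b: percent of each matched group passing each item of ONE
    scale, listed in the same item order. Returns the average gap (in
    percentage points) and the indices of the largest- and smallest-gap items.
    """
    if len(pass_a) != len(pass_b) or not pass_a:
        raise ValueError("need equal-length, non-empty lists of item pass rates")
    gaps = [a - b for a, b in zip(pass_a, pass_b)]   # per-item gap
    return {
        "mean_gap": mean(gaps),
        "largest_item": max(range(len(gaps)), key=gaps.__getitem__),
        "smallest_item": min(range(len(gaps)), key=gaps.__getitem__),
    }
```

An item flagged as `largest_item` or `smallest_item` is by construction the least representative of its scale. That is why comparing scale averages is the fairer test.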
The milk-cream item did not appear to require more complex reasoning than other items in that scale. For instance, in a broad sense it seems virtually identical to the following item:
Let me mention another point about black-white differences in IQ scores that Boyce and I discovered at that time. With my permission, Ceci included this point on pages 148-9 of his 1990 book, so the point was available to Murray and Herrnstein if they had chosen to look at that book. The question is: what are the average IQ scores of Caucasians outside the developed world--defined for the decades under review as North America, Northwest Europe, Australia, and New Zealand? A comprehensive literature review by Richard Lynn (Ceci pp. 148-9) found just 12 published references to mean IQs of samples from this population--studying subjects from Portugal, Spain, Italy, southeastern Europe, Iraq, Iran, and India. The mean, median, and midrange of the 12 within-sample means are all 85. Further, when you discard the 6 studies that either failed to mention the sample size or reported sample sizes of 25 or less, the 6 remaining means range only from 83 to 87. Thus the mean IQ scores of Caucasians outside the industrialized world seem to be essentially equal to those of American blacks. Again, culture and schooling seem central.
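The summary described above uses three statistics and one exclusion rule. The sketch below expresses them in Python. The function names and the (mean, sample size) input format are hypothetical, and Lynn's individual study values are not reproduced here. The exclusion mirrors the rule stated in the text: drop any study that did not report its sample size or reported 25 or fewer subjects.

```python
from statistics import mean, median
from typing import List, Optional, Tuple


def summarize_means(sample_means: List[float]) -> dict:
    """Mean, median, and midrange of a list of within-sample mean scores."""
    if not sample_means:
        raise ValueError("no sample means given")
    return {
        "mean": mean(sample_means),
        "median": median(sample_means),
        "midrange": (min(sample_means) + max(sample_means)) / 2,
    }


def drop_small_or_unreported(studies: List[Tuple[float, Optional[int]]],
                             min_n: int = 26) -> List[float]:
    """Keep the means of studies that reported a sample size of at least min_n.

    studies: (mean score, sample size, or None if the study gave no size).
    The default min_n = 26 drops unreported sizes and sizes of 25 or less.
    """
    return [m for m, n in studies if n is not None and n >= min_n]
```

Agreement among the mean, median, and midrange suggests the central value is not being driven by a few outlying studies. The sample-size filter checks that it is not driven by very small samples either.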