Few mental tests have been more popular among psychometricians than Comprehension. Comprehension items are part of the Binet scale, the Army Alpha, and of course the Wechsler series. One thing I like about the test is that it is fairly g loaded (0.68 in one quality study) without being as loaded on culture or education as the Information and Vocabulary subtests. Like those subtests, it measures the ability to acquire verbal knowledge, but unlike them it can be easily translated into different languages for cross-cultural comparisons without losing much relevance.

The test is good at spotting high IQ people who lack common sense. I once knew a British woman who was utterly brilliant in her verbal and mathematical skills, yet somehow couldn’t hold a job. Despite her being a hyper-educated adult, I had her take the Wechsler children’s intelligence scale. She found it super easy (she was the only adult I ever knew to get the hardest vocabulary question on that test), except for the spatial subtests and Comprehension (where she revealed a tendency towards boorish behavior).

The test seems especially good at picking out people with bad judgement. I have noticed that regardless of overall IQ or g, those who make low scores tend to be the people who are wrong about everything. Whether it’s falling for absurd conspiracy theories, thinking the Earth is flat or simply denying HBD, people with bad judgement tend to make low scores, even when they are otherwise brilliant. For this reason I call it a test of wisdom.

And yet for all the rich clinical data the test provides, it is also among the most criticized. One famous Comprehension item asked “What should you do if you’re sent to the store to buy some bread and the store owner tells you there is no bread left?”

This is an absolutely beautiful item, but critics scoffed that the correct answer (go to another store) was unfair to kids in the ghetto and rural areas because there is often not another store within miles. Of course my own sister failed by answering “I would buy something else” and we grew up in the suburbs, so she had no excuse. But my sharp-as-a-tack mother and no-nonsense father both passed with ease.

The great David Wechsler was very protective of his Comprehension items, especially the ones he grew emotionally attached to. When told by his team that he had to drop the question “Why are women and children saved first during a shipwreck?” from the Wechsler Intelligence Scale for Children (WISC), he allegedly screamed “Chivalry may be dying. Chivalry may be dead. BUT IT WILL NOT DIE ON THE WISC!!!!!!!!!!!!!!!!!!!”

After he died in 1981, psychologists had a field day purging such classics from his scale and replacing them with new items. In fairness, some of the items Wechsler had grown emotionally attached to may not have met modern statistical criteria for a valid, reliable question. Comprehension was eventually relegated to an alternate subtest.

The Comprehension subtest on the PAIS differs from the type on the Wechsler and early Binet in that it’s multiple choice (except for the bonus questions, which I have yet to norm). In this way it resembles the WWI Comprehension subtest. The advantage is that it can be scored by computer and is much less prone to human error than the highly subjective scoring on the Wechsler. The disadvantage is that selecting an answer doesn’t require as much creativity and executive functioning as thinking of one yourself, but it still requires the insight and judgement to know why some answers are better than others.

At least nine readers who took the PAIS Comprehension subtest also reported their SATs/verbal SATs/ACTs. The IQ equivalents of their self-reported college boards had a mean of 121 and an SD of 15 (U.S. norms). Their mean score on the 12-item Comprehension test was 7.6 (SD = 1.17).

Assuming a linear relationship between both tests and similar g loadings, we can make some crude IQ equivalences:

Comprehension score (out of 12):

12 = IQ 177 (U.S. norms)

11 = IQ 164

10 = IQ 152

9 = IQ 139

8 = IQ 126

7 = IQ 113

6 = IQ 101

5 = IQ 88

4 = IQ 75

3 = IQ 62

2 = IQ 50

1 = IQ 37

0 = IQ 24
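For anyone who wants to check the arithmetic, here is a minimal sketch of how a table like this can be generated. It assumes the equivalences come from matching z-scores on the two tests (simple linear equating), and it uses the rounded summary statistics quoted above. The function and constant names are made up for illustration.

```python
# Summary statistics as quoted in the post (rounded values).
IQ_MEAN, IQ_SD = 121.0, 15.0    # IQ equivalents of self-reported college boards
RAW_MEAN, RAW_SD = 7.6, 1.17    # raw scores on the 12-item Comprehension test


def comprehension_to_iq(raw_score: float) -> float:
    """Map a raw score to an IQ equivalent by matching z-scores.

    Assumption: a straight-line relationship between the two tests.
    """
    z = (raw_score - RAW_MEAN) / RAW_SD
    return IQ_MEAN + IQ_SD * z


# Rebuild the whole conversion table, highest raw score first.
conversion_table = {raw: round(comprehension_to_iq(raw)) for raw in range(12, -1, -1)}

for raw, iq in conversion_table.items():
    print(f"{raw:2d} = IQ {iq}")
```

Because the inputs here are rounded to one or two decimals, a few rows can come out one point away from the table above, which was presumably computed from unrounded figures.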

These IQ equivalents should be taken with a huge grain of salt, especially at the extremes. For one thing, we don’t know if the relationship with IQ is linear since, unlike my crossword test, the questions were arbitrarily chosen and do not form a natural scale. Second, the standard deviation of my test respondents is suspiciously low. Although this is great for expanding the test’s ceiling and floor, it is a red flag: a low SD of total scores suggests the items themselves intercorrelate poorly, which in turn suggests low reliability. At some point I will have to calculate a split-half reliability coefficient to test this hypothesis.
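A split-half coefficient could be computed along these lines. This is only a sketch under stated assumptions: it uses an odd/even item split, items scored 1 (right) or 0 (wrong), and the standard Spearman-Brown correction to step the half-test correlation up to full length. The function names and the input layout (one list of item scores per respondent) are hypothetical, not something taken from the PAIS data.

```python
from statistics import mean


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a half-test has zero variance; correlation is undefined")
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction.

    responses: one list of 0/1 item scores per respondent, items in test order.
    """
    odd_half = [sum(r[0::2]) for r in responses]    # items 1, 3, 5, ...
    even_half = [sum(r[1::2]) for r in responses]   # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown: estimated reliability of the full-length test.
    return 2 * r_half / (1 + r_half)
```

Calling split_half_reliability on the item-level answers would return a single coefficient. With only about nine respondents, that estimate would be very noisy, so it is better treated as a rough check than as a firm reliability figure.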