In the mid-20th century, the College Board became curious about what the distribution of SAT scores would be if ALL American 17-year-olds took the SAT, not just the college-bound elite. One reason for this curiosity was that the average score of people who actually took the SAT started falling in the 1960s, especially the verbal score, so people wanted to know whether this was simply because less elite kids were applying to college, or whether teens in general were being dumbed down (see chart from Herrnstein & Murray, 1994; page 425):

This is discussed in the book The Bell Curve:

What they found was that the decline was just an artifact of the SAT population becoming more inclusive. When you look at nationally representative samples, not only was there no decline, but there was actually a very small Flynn effect.

I was especially interested in seeing these national norms because the SAT has long been considered a good proxy for IQ. But unlike IQ tests, which are normed to have a mean of 100 and a standard deviation of 15 in the general U.S. population, the verbal and math subscales of the pre-1995 SAT were each normed to have a mean of 500 and an SD of 100 with respect to the 1940s SAT-taking population (not the general U.S. population). The chart above tells us how the average U.S. 17-year-old (IQ 100) would have scored on the SAT from the 1950s to the 1980s, but to fill in the rest of the IQ distribution, we need to know the standard deviations.

Thanks to Charles Murray, I was able to find the SDs for the 1980s and I had already found the SDs for the 1970s.

What about the 1960s? I recently discovered this data from the 1960 norming study:

Unfortunately the data is stratified by sex (if they tried that today they’d need categories for non-binary, gender fluid, two spirit). Well, to determine the mean of the entire cohort, we take the weighted average (the cohort was 52% female, so the female mean gets a weight of 0.52 and the male mean 0.48): verbal mean = 0.52(376) + 0.48(372) = 374 and math mean = 0.52(385) + 0.48(438) = 410.
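The weighted average is easy to check in a few lines of Python. This is only a sketch: the function name weighted_mean is my own, and the inputs are the 52% female share and the sex-specific means quoted above.

```python
def weighted_mean(female_mean, male_mean, female_share=0.52):
    """Combine two group means, weighting each by its share of the cohort."""
    return female_share * female_mean + (1 - female_share) * male_mean

# Sex-specific 1960 means as quoted in the text (female first, then male).
verbal_mean = weighted_mean(376, 372)
math_mean = weighted_mean(385, 438)

print(round(verbal_mean), round(math_mean))  # prints: 374 410
```

Before rounding the results are roughly 374.08 and 410.44, so rounding to whole points loses very little.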

These figures perfectly match the 1960 figures from The Bell Curve, which I showed at the top of this article, so I must have done something right!

Now combining the SDs of men and women is much more difficult (even ChatGPT can’t do it!). You can’t just average them because the combined SD is a function not just of the two SDs, but also of how far apart the two means are. Of course, if we assume the male, female, and sex-combined distributions are all perfectly Gaussian, it’s fairly easy to estimate, but estimating is very different from actually calculating (and when men and women are too far apart, it’s probably impossible for all three distributions to be Gaussian).

To determine the sex-combined SD we must first determine the Sum of Squares:

Sum of Squares = female n*(female sd^2 + female mean^2) + male n*(male sd^2 + male mean^2)

And then:

Sex-combined standard deviation = SQRT((Sum of Squares / sex-combined N) - sex-combined mean^2)
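Here is a minimal Python sketch of the two steps above. The function name combined_sd and the toy scores are my own inventions (they are NOT the 1960 norming data); the point is only to confirm that the formula, fed nothing but each group's n, mean, and SD, reproduces the SD you would get from the pooled raw scores. It assumes the group SDs are population SDs (divide by n); if the published SDs were computed with n - 1, the result would differ very slightly.

```python
import math
import statistics

def combined_sd(n_f, mean_f, sd_f, n_m, mean_m, sd_m):
    """Population SD of two pooled groups, from their sizes, means, and SDs."""
    n = n_f + n_m
    combined_mean = (n_f * mean_f + n_m * mean_m) / n
    sum_of_squares = n_f * (sd_f**2 + mean_f**2) + n_m * (sd_m**2 + mean_m**2)
    return math.sqrt(sum_of_squares / n - combined_mean**2)

# Made-up toy scores, used only to check the formula against raw data.
females = [300, 350, 400, 450]
males = [380, 420, 500, 540, 560]

from_summaries = combined_sd(
    len(females), statistics.fmean(females), statistics.pstdev(females),
    len(males), statistics.fmean(males), statistics.pstdev(males),
)
from_raw_scores = statistics.pstdev(females + males)

print(math.isclose(from_summaries, from_raw_scores))
```

As an extreme case, combined_sd(50, 100, 0, 50, 200, 0) gives 50: both groups have an SD of zero, yet the pooled SD is 50 purely because the means are 100 points apart, which is why simply averaging the two SDs does not work.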

And so for 1960, the sex-combined verbal mean and SD were 374 and 116, and the sex-combined math mean and SD were 410 and 114.

To determine the sex-combined composite mean, we just add the verbal and math means (374 + 410 = 784). Assuming the 0.67 correlation between verbal and math and the formula below (Herrnstein & Murray, 1994, page 779), the composite SD was 210.
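The composite arithmetic can be reproduced with the short Python sketch below. I am assuming the formula in question is the standard one for the SD of a sum of two correlated variables, SQRT(sd_verbal^2 + sd_math^2 + 2 * r * sd_verbal * sd_math); the function name composite_sd is my own.

```python
import math

def composite_sd(sd_verbal, sd_math, r):
    """SD of verbal + math, given each SD and their correlation r."""
    return math.sqrt(sd_verbal**2 + sd_math**2 + 2 * r * sd_verbal * sd_math)

# 1960 sex-combined figures from the text, with the assumed 0.67 correlation.
composite_mean = 374 + 410
sd = composite_sd(116, 114, 0.67)

print(composite_mean, round(sd))  # prints: 784 210
```

Note that the composite SD (about 210) is smaller than the simple sum of the two SDs (116 + 114 = 230); the two would only be equal if verbal and math were perfectly correlated.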