Hi, PP! It’s me again. I wonder what your estimate of Isaac Newton’s IQ would be. I know quite a bit about his life. He’s considered the greatest physicist and scientist ever, one of the greatest mathematicians ever, and one of the most influential people, too. So, what do you think?
Newton’s IQ is fascinating because not only is he considered the greatest physicist of all time, but according to Michael Hart’s book The 100, he was the second most influential person of all time (though one led to the other; these aren’t independent achievements).
So what was his IQ?
According to this source, 60.5 billion people have lived from 1 AD to 2011. Let’s say 16% were white. Assuming Newton was the best physicist to ever live, he would have at the very least been at the one in 9.68 billion level among whites, which is +6.33 standard deviations (SD) on a normalized curve.
However, great achievement requires more than just ability; it also helps to have 10,000 hours of practice, among other things. Ability seems to explain 66% to 70% of the variance in various measures of cognitive performance, suggesting ability correlates about 0.82 with performance (the square root of 0.68 is roughly 0.82).
So if Newton were +6.33 SD in physics performance, we’d expect him to be 0.82(+6.33) = +5.19 SD in physics ability.
How much does physics ability correlate with IQ? The math section of the WIAT correlates 0.84 with WAIS-IV full-scale IQ, so if Newton were +5.19 SD in physics ability, I’d expect him to be 0.84(+5.19 SD) = +4.4 SD in IQ. In other words, I’d expect him to have scored IQ 166 (white norms) on a random test normed in his day.
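For readers who want to check the arithmetic, the whole chain can be reproduced with the standard normal distribution using Python’s stdlib. Note that the exact z-score for one in 9.68 billion comes out closer to +6.36 than the +6.33 quoted above, which barely changes the final estimate:

```python
from statistics import NormalDist

whites = 60.5e9 * 0.16                        # ~9.68 billion whites from 1 AD to 2011
p = 1 / whites                                # rarity of the best physicist among them
z_performance = NormalDist().inv_cdf(1 - p)   # ~ +6.36 SD (the article uses +6.33)

z_ability = 0.82 * z_performance              # ability-performance correlation of 0.82
z_iq = 0.84 * z_ability                       # WIAT math x WAIS-IV full-scale IQ correlation
iq = 100 + 15 * z_iq
print(round(iq))                              # ~ 166 (white norms)
```

The small discrepancy in the z-score is just rounding; the final figure still lands at IQ 166.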
To appreciate how high that is, young white American men have an average height of 5’10.1″ (SD = 2.94″) so an IQ of 166 is the height equivalent of being 6’11”. Both are +4.4 SD.
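The height equivalence is a one-liner (figures from the paragraph above):

```python
# Young white American men: mean height 70.1 inches (5'10.1"), SD 2.94 inches.
# A +4.4 SD height is the analogue of a +4.4 SD IQ of 166.
height = 70.1 + 4.4 * 2.94
feet, inches = divmod(height, 12)
print(int(feet), round(inches, 1))  # ~ 6 feet 11 inches
```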
So just as we might expect the greatest basketball player of all time to be 6’11”, we’d expect the greatest physicist of all time to be IQ 166.
To those who think even IQ 166 is not high enough for a mind as great as Newton’s, I point to examples of other great minds who scored much lower on IQ tests, such as Ted Kaczynski in adulthood or Garry Kasparov. For those who say the tests weren’t valid measures of their intelligence, I say IQ is an imperfect science. An IQ score is simply one’s performance on highly g-loaded psychometric tasks, not a direct measure of neurological functioning, so occasionally it will give highly flawed results. IQ 166 is simply my best guess of how Newton would have scored on a randomly selected high-ceiling IQ test considered valid in his time and place, not necessarily a prediction of his actual intelligence.
It’s common knowledge in psychometrics that U.S. whites average about one standard deviation (15 IQ points) higher than U.S. blacks and have done so since the first mass tests were administered in WWI.
But could the gap extend much further in space and time? Tens of thousands of years further.
At first it sounds absurd: there were no IQ tests 15,000 years ago, and there weren’t any white people. The earliest Europeans had dark skin, and they were largely replaced by Middle Easterners spreading agriculture.
Nonetheless, there were people living in Europe 15,000 years ago and to the degree they resemble today’s Europeans (phenotypically and genetically) they’re a proxy for archaic whites.
Similarly, the oldest lineage in Africa are Bushmen, and to the degree they resemble modern Africans, they’re a proxy for archaic blacks.
The archaic whites left the following rock art over 15,000 years ago.
The archaic blacks left the following rock art, perhaps much more recently.
When I asked readers to rate the two archaic white paintings using the quality scale of the Goodenough-Harris Draw-A-Man test, the median votes were 3 and 11, giving the archaic whites a mean score of 7.
For archaic blacks, the median votes were 3 and 8, giving archaic blacks a mean score of 5.5.
That’s a difference of 1.5 points. Since the standard deviation for incipient adults (age 15) on the Goodenough-Harris quality scale is 1.7, archaic whites of over 15,000 years ago already scored nearly one standard deviation (about 13 IQ points) higher than archaic blacks living later.
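The arithmetic behind that comparison, spelled out:

```python
# Mean drawing scores: archaic whites 7, archaic blacks 5.5.
# Quality-scale SD at age 15 is 1.7; the IQ metric uses 15 points per SD.
gap_points = 7 - 5.5
gap_sd = gap_points / 1.7          # ~ 0.88 SD
gap_iq = gap_sd * 15               # ~ 13 IQ points ("nearly one SD")
print(round(gap_sd, 2), round(gap_iq, 1))
```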
Of course with such a tiny sample size, this conclusion is EXTREMELY tentative and requires far more research.
IQ stands for intelligence quotient because originally IQ was calculated as the ratio of mental age to chronological age, so if you were a six-year-old who cognitively functioned like the average six-year-old, you had an IQ of 100, because you were functioning at 100% of your chronological age. By contrast if you were a six-year-old who was functioning like a four-year-old, your IQ was 66, because your development was only 66% as fast as it should be, and you were sent to what were then called EMR classes.
This was a beautifully elegant concept, but there were a few problems. The first is that all of us are 0.75 years older than we think we are, since we grew in the womb for nine months. The age ratio method would have made more sense if they had added 0.75 to both the chronological and mental ages, and I suspect the distribution would have been more normal.
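To see what that gestation correction would do, here is a small sketch of the ratio IQ formula, with and without 0.75 added to both ages (the difference between 67 below and the 66 quoted above is just rounding):

```python
# Ratio IQ: mental age / chronological age x 100.
# The suggested gestation correction adds 0.75 years to both ages.
def ratio_iq(mental_age, chrono_age, gestation_correction=0.0):
    c = gestation_correction
    return (mental_age + c) / (chrono_age + c) * 100

print(round(ratio_iq(4, 6)))        # ~ 67: the six-year-old functioning like a four-year-old
print(round(ratio_iq(4, 6, 0.75)))  # ~ 70: same child with the 0.75-year correction
```

Note how the correction compresses extreme ratios toward 100, which is one reason the corrected distribution might look more normal.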
The other problem is that cognitive growth is not a linear function of age throughout the entire maturation process.
“Some guy” writes:
Does it really matter if it’s not linear though? If someone scores as the average 10 year-old then it indicates they have the drawing IQ of a 10-year old, which seems more useful than a subjective number.
What’s more useful information about a man’s height? That he’s as tall as the average 10-year-old, or that he’s 1.3 feet shorter than the average man. Both are useful, but the advantage of creating a scale that is independent of age is that it has a much higher ceiling. On the old Stanford-Binet, scores stopped increasing after age 15, so how do you assign a mental age to someone who is smarter than the average 15-year-old?
The old Stanford-Binet got around this problem by arbitrarily extending the mental age scale beyond 15, so Marilyn vos Savant was able to claim an IQ of 228, because she scored a mental age of 22.8 at age 10, even though there was no such thing as a mental age of 22.8 on a test where mental growth peaks at 15.
This makes about as much sense as telling a 19-year-old seven-footer he has a height age of 92, and therefore a Height Quotient of 484. After all, the average male height only increases by 0.2 inches from 19 to 20, so if height didn’t plateau, at that rate it would take the average man until his 90s to reach seven feet.
“Some guy” continues:
Presumably they still used this system to see if people scored averagely for their age, but had to first to figure out what the average for each age was anyway.
A related question: Is the mental age concept still applicable to modern IQ tests even though they’re not based on it? Let’s say 10-year old scores 130 on the WAIS. 2 SD above the mean on a 16 SD mental age test would be 132. Can that child be assumed to have the same IQ as the average 13.2 year old?
Put it this way. If a ten-year-old scored like an average 13-year-old on every subtest of the WISC-R, he’d get a full-scale IQ of 126, which is similar to the 130 you’d expect from the age ratio formula. On the other hand if a 10-year-old scored like a 15-year-old on the WISC-R, he’d get a full-scale IQ of 134, which is much less than the 150 you’d expect from age ratios.
And yet if a six-year-old scores like a nine-year-old on the WISC-R he gets a full-scale IQ of 143. So the same ratio IQ equates to different deviation IQs depending on what age it’s obtained (or what test it’s obtained on) which makes it a problematic index.
It probably agrees most with deviation IQ when both the chronological and mental age are no lower than 4 and no higher than 12, since that’s probably the most linear developmental period.
It is most ironic that the Draw a Man test was invented by a woman and that girls outscore boys, but in the 1920s, women were devalued. The great Florence Goodenough realized that as children got older, their drawings became more sophisticated and thus could be used as a proxy for mental age. Goodenough’s test was not a good measure of IQ, but at times it was good enough (get it?).
When the test was revised in 1962 by Dale Harris, not only did he add a “Draw a Woman” subtest, but he added a quality scale so that rather than spending half an hour going through a long checklist of dozens of different criteria, psychologists could just compare a drawing they were scoring to a progression of drawings ranked from level 1 (crude stick figure) to level 12 (a detailed sketch) and judge which level it most resembled. This may sound subjective, but different judges gave very similar scores (though today machine learning could probably improve objectivity).
What I love about the quality scale is that when they were making it, they instructed the judges to divide all the drawings they reviewed into 12 separate piles such that the difference in quality between each pile was equal. This makes the raw scores a true interval scale, unlike most tests, which are only ordinal scales.
Please study the progression of drawings from 1 to 12, and notice how, as you move up the scale, you get a gradual and consistent improvement in accuracy, detail, and proportion (with no sudden jumps in quality). Based on this linear progression, try to imagine a drawing that would merit a level 13 or 14 if the scale extended that high:
Now please compare the below drawings which I’ll be discussing in future articles to the quality scale and vote on where they should rank. Please vote before wondering who drew them or reading the comments since that could bias your judgement. Please be as objective as possible. Consider the level of maturity of each drawing (using the above quality scale as a guide), not whether you like or dislike it.
Although all drawings should be of men, in some cases artists took certain liberties (e.g. the head of a bird). In such cases use your best judgement to decide what score the drawing merits.
I could have scored these myself but it seems more objective and scientific to rely on the wisdom of crowds:
Back in 2016, commenter Recuring cited the following quote:
We of the Study of Mathematically Precocious Youth (SMPY) at Johns Hopkins have discovered, chiefly by testing able 12-year-olds, that when the examinee’s SAT-M score vastly exceeds his or her SAT-V score the youth is almost certain to score high on a difficult test of nonverbal reasoning ability such as the Advanced Form of the Raven Progressive Matrices, often higher than a high-M high-V examinee does. To test this out, on 6 May 1985 I administered to Terry the RPM-Advanced, an untimed test. He completed its 36 8-option items in about 45 minutes. Whereas the average British university student scores 21, Terry scored 32. He did not miss any of the last, most difficult, 4 items. Also, when told which 4 items he had not answered correctly, he was quickly able to find the correct response to each. Few of SMPY’s ablest protégés, members of its “700-800 on SAT-M Before Age 13″ group, could do as well.
I found the norms for this test (hat-tip to commenter Rahul for telling me they’re online) so I was finally able to complete part 3 of this series (three years after part 2).
For UK 10-year-olds, the 5th percentile (IQ 75) is a raw score of 1, while the 95th percentile (IQ 125) is a raw score of 15. If we assume raw scores are roughly normally distributed, we can crudely estimate that a 14 point gap in raw score equates to a 50 point IQ gap. Thus Terry’s score of 32, which is 24 points above the median raw score of 8, would be about 86 points above the median IQ of 100, or IQ 186 (UK norms).
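A quick sketch of that crude linear mapping, assuming (as above) that raw scores are roughly normal between the 5th and 95th percentiles:

```python
# 14 raw points (raw 1 at IQ 75 to raw 15 at IQ 125) span 50 IQ points,
# so each raw point is worth about 3.57 IQ points on UK 10-year-old norms.
iq_per_raw = 50 / 14
terry_iq = 100 + (32 - 8) * iq_per_raw   # 32 raw vs the median raw score of 8
print(round(terry_iq))                   # ~ 186 (UK norms)
```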
Some might argue that we should deduct a few points for the Flynn effect, since the UK norms were six years old; however, my sense is that the Flynn effect has been wildly exaggerated. For example, on the WAIS-III Matrix Reasoning subtest, average raw scores are identical for all ages from 18 to 34, and on the Advanced Progressive Matrices U.S. white norms (it was normed in lily-white Iowa), there’s no change in raw scores from age 20 to 30:
On page 206 of Bias in Mental Testing, Arthur Jensen writes:
Not sure why Jensen considers all these correlations positive, unless zero is a positive number (I consider it neutral).
And I’m not sure why some commenters think weight lifting requires coordination when the correlation between strength (hand grip, chinning) and coordination (pursuit rotor tracking, mirror star tracing) is zero.
Then again, maybe these are not the best measures of strength or coordination (mirror star tracing sounds more like a cognitive test than a physical one). Still, when I lift weights, I don’t feel like I’m using coordination. To me, coordination is best measured by very fast-paced tasks that require moving multiple body parts with exquisite timing.
Physical coordination probably correlates more with IQ than does any other physical ability. Daniel Seligman writes:
Contrary to certain stereotypes about athletes and intellectuals, physical coordination is positively correlated with IQ. Technical studies by the U.S. Department of Labor report a 0.35 correlation between coordination and cognitive ability.
0.35 is very similar to the correlation between IQ and brain size; so there are at least two physical traits (brain size and coordination) that correlate moderately with IQ.
Some might argue that physical coordination is a part of intelligence since it’s largely a brain function. I define intelligence as the ability to use whatever physical traits one has as a tool to exploit whatever environment one’s in. I see coordination as one of those physical traits used as a tool by intelligence rather than part of intelligence itself, but it’s a meta-tool in that it controls the body which in turn controls the external environment.
The problem with including physical coordination in our definition of intelligence is that intelligence is only important because it’s what separates man from beast, and physical coordination fails to do that. Even if it were possible to put a man’s brain in a cheetah’s body, he would not be able to exploit the environment, because his brain did not evolve to control a cheetah’s body. Only if the man’s brain could direct what the cheetah did with its own motor control would the cheetah display the goal-directed adaptive behavior we know as intelligence.
It’s like the Master Blaster character in Mad Max: Beyond Thunderdome. If Master’s brain were literally put in Blaster’s body, he might not have the coordination to win so many fights, but by telling Blaster how to use his coordination, he has given him his mind.
Feelings control intelligence
Intelligence is often defined as the mental ability to problem-solve, but something is only a problem if it bothers us (i.e. causes us to feel pain or discomfort). Hence, feelings define the problems we use our intelligence to solve.
Intelligence controls physical coordination
Once our intelligence decides what behavior will solve a problem most efficiently, our physical coordination must direct our muscle movements accordingly. One could argue coordination itself is a mental ability and thus part of intelligence; however, by definition, abilities are only mental if they don’t cluster with sensory or motor functions, and physical coordination clusters with the latter. Even though coordination resides in the brain, it’s not fully part of the mind; it’s more neurological than mental per se.
In honor of Super Bowl Sunday, I thought I’d ask whether physical abilities have a general factor (g) the same way cognitive abilities do. For those who don’t know, Charles Spearman famously discovered that all cognitive abilities are positively correlated, suggesting they are all influenced by some general ability, and thus the most efficient way to measure someone’s overall cognition is to test the most g-loaded abilities (since these best predict all other abilities).
The existence of a physical g factor would be useful (though not necessary) for assigning people an AQ (athletic quotient), the same way IQ tests assign folks an IQ (intelligence quotient).
Cognitive and physical abilities are both forms of voluntary goal directed behaviors that can be objectively graded on a scale of proficiency, but because cognitive abilities are more prestigious (at least after high school), we increasingly see athletes trying to claim their abilities are mental instead of physical. And so we have commenters claiming that weight lifting requires not just physical strength, but coordination, and terms like kinesthetic intelligence have emerged in the literature.
We all have our biases, which is why I love factor analysis, which allows us to objectively decide what category or sub-category different abilities fall into. Factor analysis is a technique where you examine the intercorrelations among a large number of traits and notice that some intercorrelate better than others, allowing you to infer sources of variance common to some traits but not others, and thus to group traits into clusters.
When you control for these common sources of variance, you find that some traits still intercorrelate better than others, allowing you to infer higher-order sources of common variance. This allows for a hierarchy of categories and sub-categories that is wholly objective, requiring no judgement or decision on anyone’s part, and yet still agrees with common sense.
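A toy illustration of the idea. The correlation matrix and test labels below are entirely made up, purely to show the technique: when every test in a battery is positively correlated with every other, the first principal component acts like a general factor, and every test loads positively on it.

```python
import numpy as np

# Hypothetical correlations among four tests (made-up values, not real data)
R = np.array([
    [1.00, 0.60, 0.45, 0.40],   # vocabulary
    [0.60, 1.00, 0.50, 0.35],   # arithmetic
    [0.45, 0.50, 1.00, 0.55],   # block design
    [0.40, 0.35, 0.55, 1.00],   # digit span
])

# Eigendecomposition of the correlation matrix; eigh returns eigenvalues
# in ascending order, so the last column is the first principal component.
eigvals, eigvecs = np.linalg.eigh(R)
g = eigvecs[:, -1]
g = g * np.sign(g.sum())            # fix the arbitrary sign so loadings are positive
loadings = g * np.sqrt(eigvals[-1]) # scale to correlation-with-factor units
print(loadings)                     # all positive: every test loads on the general factor
```

Real factor analysis goes further (rotation, multiple factors, controlling for the first factor to find group factors), but the all-positive first component is the heart of Spearman’s observation.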
As the late Arthur Jensen noted, when you factor analyze hundreds of clothing measurements from every body-part dimension imaginable, you find almost all of the variance can be explained by just three factors: general body size, body length, and body width, analogous to how every location on Earth can be specified by just three data points (latitude, longitude, and altitude).
When we give people an extremely diverse series of tasks, we find that most tasks that are commonly thought of as physical (running, jumping, lifting) are more positively correlated with each other than with tasks commonly thought of as mental (repeating numbers, defining words, solving jig-saw puzzles). To be sure, there are certain hybrid abilities that load equally on both domains.
When the 11 subtests on the WAIS-R were factor analyzed, it turned out that even though David Wechsler thought he was measuring 11 different parts of intelligence, most of the variance in scores could be explained by just four factors: general intelligence, verbal knowledge, spatial ability, and short-term memory.
Similarly, when ten different physical tests were factor-analyzed, it was found that most of the variance could be explained by just five factors: general athleticism, strength, running ability, coordination, and balance. I find it interesting that measures of physical strength (hand-grip, chinning) have negative loadings on both coordination and balance, suggesting an evolutionary trade-off between muscle and control as we marched up the evolutionary tree.
It’s interesting to note that just as vocabulary was the single best measure of cognitive g in the cognitive battery, 100-yard dash was the single best measure of athletic g in the athletic battery. So just as people who score at the one in a billion level (relative to Western norms) on g loaded cognitive tests are said to have IQs of 190+, people who would score at the one in a billion level on the 100-yard dash can claim an AQ of 190+. That doesn’t mean their athletic g is truly at the one in a billion level (even the 100-yard dash is an imperfect measure of athletic g) but it’s about as close to a perfect measure of athletic g as the best IQ tests are to perfect measures of cognitive g.
Of course the best measure of athleticism would be to give people the full battery of physical tests and calculate the composite score, but if you wanted a short-form, you would just take the 100-yard dash, just like when psychologists wanted a short-form for IQ, they would just test Vocabulary, though more recent research suggests that g loading of Vocabulary has declined or was never as high as once thought.
Which brings us to another key point: The factor structure of any correlation matrix is sensitive to what tests you include, so skeptics might want to see a similar factor structure replicated in several diverse random selections of tests before accepting the factors and their loadings on various tests.
Charles Murray was recently interviewed by Bill Kristol (who made my annual list of the 100 most influential living people of all time). Murray himself might one day make the list for the enormous impact his 1984 book Losing Ground had on public policy.
I could listen to Murray talk for hours. The man exudes gravitas.
Kristol worships Murray, once calling him America’s leading living social scientist. Despite being a lightning rod on college campuses, Murray is wined and dined by the elite.
[note from PP, feb 4, 2019: an earlier version of this article contained a spelling mistake that has since been corrected]
According to Wikipedia, 100% of the twenty fastest 100 meter runners of all time are black, and 65% of the world’s best ping-pong players are East Asian. This fits J.P. Rushton’s theory that blacks and East Asians are at opposite extremes of an evolutionary trade-off, with Caucasoids in the middle (though closer to East Asians), consistent with the order in which each race branched off the human evolutionary tree (blacks first, Caucasoids second, Mongoloids last).
In the Minnesota Transracial Adoption study, white babies, black babies, and mixed babies (biological father black; biological mother white) were adopted into white upper middle-class homes when they were 19 months, 32 months, and nine months respectively. The purpose of the study was to determine how much of the 15 point black-white IQ gap in the United States is genetic.
In 1975, the children and adoptive parents were IQ tested on at least abbreviated versions of the Stanford-Binet/WISC/WAIS (depending on age), and then retested in 1986 on the WISC-R/WAIS-R, depending on age. Here are the results:
Because the norms on all the tests were out-dated at the time of testing (especially in 1975), John Loehlin attempted to correct all scores for the Flynn effect.
But many people ignore the IQs themselves, and instead just focus on the IQ differences. They see that at age 17, adopted whites scored 7.1 points higher than adopted mixeds in the unadjusted data, and 16.2 points higher than the adopted blacks, and conclude that the 15 point black-white IQ gap in the United States is roughly 100% genetic.
One problem with this is that black babies were adopted later than the non-black babies. Another problem is they were born to black mothers, while the non-black babies were all born to white mothers, so the prenatal and perinatal environments may have been quite unequal.
Thus I have always been more intrigued by the 7.1 point IQ gap between the adopted whites and adopted mixeds. Since the adopted mixeds presumably had only half as much black ancestry as the typical U.S. black, it’s interesting that they showed roughly half the infamous 15 point black-white IQ gap, despite being gestated in white wombs and raised in white homes. Does this point to the importance of genetics?
Physicist Drew Thomas argues that the comparison between the adopted whites and adopted blacks is misleading because in the tables posted above, at both ages we only see data for the adopted kids who remained in the study for the follow-up testing in 1986. He argues that several low IQ adopted white kids dropped out of the study, and had they remained, the IQ gap between the adopted whites and adopted mixeds would have perhaps been only 3.5 points at age 17.
However this argument is starting to feel a little post-hoc. When you do a study, your data is what it is. You can’t adjust it for what it would have been had people you wished remained in the study. Almost any study can be debunked if we imagine how it would have turned out in a parallel universe where different people took part.
That’s not to deny that adjusting for attrition can be important in some cases, but in this study, Thomas argues attrition only increased the IQs of adopted whites and not the adopted non-whites. An effect that only affected one demographic sounds to me like random error, not a systematic bias that needs to be adjusted for. And if the error was random, one could just as easily argue the IQs of adopted whites were too low before the attrition rather than too high after the attrition.
Indeed if the adopted white sample is so easily skewed by a few kids dropping out of the study, then maybe that sample is too small to begin with, and instead we should compare the much larger sample of adopted mixeds not to the adopted whites, but to the general U.S. white population.
At an average age of 17, the adopted mixeds took the WISC-R or WAIS-R (depending on age) and averaged 98.5 (93.5 after adjustments for the Flynn effect, since the WISC-R and WAIS-R norms were 14 and 8 years old respectively at the time of testing).
However some top-secret research I’ve been slowly doing over the past decade suggests the Flynn effect has been wildly exaggerated, so while I don’t think their average IQ was as high as 98.5, I also doubt it was as low as the Flynn corrections say. Let’s split the difference and say 96 (U.S. norms).
By contrast, the whites in the WISC-R and WAIS-R standardization samples averaged 102.2 (standard deviation (SD) = 14.08) and 101.4 (SD = 14.65) respectively. Let’s split the difference and say 101.8 (SD = 14.4).
Thus converting to the more traditional scale where the U.S. white mean and SD are set at 100 and 15 respectively, the adopted mixed mean of 96 becomes ((96 – 101.8)/14.4)(15) + 100 = 94.
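The conversion can be sketched in a few lines of Python (figures from the paragraphs above):

```python
# Convert a score from the combined WISC-R/WAIS-R white norms
# (mean 101.8, SD 14.4) to the conventional scale where U.S. whites
# are set at mean 100, SD 15.
def to_white_norms(score, white_mean=101.8, white_sd=14.4):
    return ((score - white_mean) / white_sd) * 15 + 100

print(round(to_white_norms(96)))  # ~ 94
```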
In other words, despite being gestated in white wombs and raised in upper-middle-class white homes, having one U.S. black biological parent appears to have reduced IQ by 6 points, suggesting that having two U.S. black biological parents would reduce IQ by 12 points, and thus that 80% of the 15 point black-white IQ gap in the U.S. is genetic. 80% squared is 0.64, which is similar to the 0.69 heritability of WAIS full-scale IQ found in Thomas Bouchard’s study of identical twins reared apart, consistent with Jensen’s default hypothesis that IQ gaps between U.S. races are caused by the same nature-nurture mix that occurs within them.
To paraphrase President Obama, there is no black America or white America; from a nature-nurture perspective, there’s just America.
While this analysis seems to have controlled for the prenatal and family environment, it does not control for peer groups. Maybe as mixed kids raised in white homes, they were unmotivated on IQ tests because of the racist stereotype that being smart = acting white. On the other hand, they did better on scholastic tests than they did on formal IQ tests, suggesting motivation was not a problem.
If the genetic part of the U.S. black-white IQ gap is indeed 12 points, and black Americans are only about 74% black on average, it implies that 100% West African ancestry would reduce IQ by 16 points below the U.S. white mean (at least if we assume U.S. black ancestry is representative of West African ancestry).
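The extrapolation above, spelled out:

```python
# One black biological parent ~ 6 point deficit, so two parents ~ 12 points.
# Scaling by average African ancestry among U.S. blacks (~74%) projects
# the gap implied for 100% West African ancestry.
one_parent_gap = 6
two_parent_gap = one_parent_gap * 2        # 12 points
full_ancestry_gap = two_parent_gap / 0.74  # ~ 16 points
print(round(full_ancestry_gap))
```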
And at least if we assume the Phenotype = Genotype + Environment model
Some readers invoke a reaction norm model where genotype A is higher IQ than genotype B in environment A, but lower than genotype B in environment B. Assuming such norm crossing occurs with IQ, my sense is that it would be limited to individual cases and cancel out in group level comparisons like the black-white IQ gap.
Some might argue that it’s inappropriate to compare adopted mixeds to the general U.S. white population because adopted mixeds might not be genetically representative of their parent populations. In The g Factor, Jensen states that the parents of the mixeds averaged 12.5 years of schooling (page 473) while just the mothers averaged 12.4 (page 478). From here we can deduce that the fathers averaged 12.6.
In 1975 America, white women and non-white men age 25+ had a median of 12.3 and 11.3 years of schooling respectively (see table 4 of this document). Comparable figures in 1986 were 12.6 and 12.5. So using education as a proxy, there’s no reason to think the mixed kids were selected to have lower IQs than the mean of their parent races. If anything, their biological fathers averaged more education than age 25+ non-white men throughout the full duration of the study and their biological mothers averaged about the same education as age 25+ white women.
Of course it would help to know the exact ages of the parents, rather than just lumping them in with everyone over 25. I can’t find the ages of the biological parents of the mixeds specifically, but the bio moms and dads of all the kids who took part in at least part of the study (see table 3 of this paper) averaged 21.6 and 26.3 at the time the kids were born, and thus were about 29 and 33 in 1975 and about 39 and 43 in 1986, likely near the median age of the 25+ cohort by the end of the study.
Although this study shows the black-white IQ gap is highly genetic, several similar studies beg to differ. Tizard (1974) compared black, white and mixed-race kids raised in English residential nurseries and found that the only significant IQ difference favored the non-white kids. A problem with this study is that the children were extremely young (below age 5) and ethnic differences in maturation rates favor black kids. A bigger problem with this study is that the parents of the black kids appeared to be immigrants (African or West Indian) and immigrants are often hyper-selected for IQ (see Indian Americans).
A second study by Eyferth (1961) found that the illegitimate biological children of white German women had a mean IQ of 97.2 if the biological father was a white soldier and 96.5 if the biological father was a black soldier (a trivial difference). Both the white and mixed kids were raised by their biological white mothers. One problem with this study is that the biological fathers of both races would have been screened to have similar IQs because, at the time, only the highest scoring 97% of whites and highest scoring 70% of blacks passed the Army General Classification Test and were allowed to be U.S. soldiers. In addition, 20% to 25% of the “black fathers” were not African-American or even black African, but rather French North African (non-white Caucasoids, or “dark whites” as they are sometimes called). Also, there was no follow-up to measure the adult IQ of the children.
A third study, Moore (1986), included a section examining sub-samples of children adopted by white parents: nine adopted kids with two black biological parents averaged 2 IQ points higher than 14 adopted kids with only one black biological parent. But the sample size was quite small, nothing is reported here about the biological parents, and again there was no follow-up when the kids were older.