Estimating a reader’s IQ Part 2: Historiometrics

In my last article I used the demographic approach to estimate a reader’s IQ. This is used when you lack psychometric data, but you know other details statistically related to test scores. This method is most often used when dementia is suspected but the patient has no psychometric history, thus the only way to check if cognition has declined is to estimate the expected level of past IQ. Such estimates are made from demographic variables like education, occupation and race, but I like to add physical variables like head size, height, body mass index etc, hence I renamed it bio-demographics.

In part 2 we take a historiometric approach. Like the biodemographic approach it’s also used when we don’t have test scores per se, but we do have a cognitive history amenable to quantitative analysis. For example if I know you learned to read at age 5, while the average kid can’t learn until 6, I might estimate your IQ is 120, since you were cognitively functioning at 120% your age level.
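The reading example above follows the old ratio-IQ logic (mental age divided by chronological age). A minimal sketch of that arithmetic, where the function name `ratio_iq` is just an illustrative label and the ages are the ones from the example:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age as a percentage of chronological age."""
    return 100.0 * mental_age / chronological_age


# The average kid learns to read at 6; this child managed it at 5,
# so at age 5 they were functioning like a typical 6-year-old.
estimate = ratio_iq(mental_age=6, chronological_age=5)
print(round(estimate))
```

This only works as a rough guide for children, since mental age stops rising steadily in adulthood.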

Another example might be when Homo erectus was said to have the mind of a modern European 7-year-old because the tools they made could not be learned by younger children. This might also be called cognitive archeology, a term James Flynn used.

The reader wrote:

I was precocious only with respect to verbal ability. My kindergarten teacher evidently told my mother that I had the most extensive vocabulary of any child she had taught in a 20 year period. Her typical class probably consisted of 20 students, so I take this to mean that I had the most advanced vocabulary out of ~400. Half of the students at this school were white, and the other half black.

Of course being at the one in 400+ level at this particular school is not necessarily equivalent to being at the one in 400+ level for Americans as a whole, especially since the racial distribution of this school is not typical.

Since half the school is white, we can guess he was at the one in 200+ level among the white students. However, because of systemic racism, upper class whites send their kids to white schools, leaving the children of the lower class to attend largely black schools.

On page 63 of Charles Murray’s Coming Apart he notes that whites with only a high school diploma average IQ 99 (U.S. norms) and those with even less average IQ 87. Splitting the difference, IQ 93 was perhaps the mean of white parents of students at this school.

But given the 0.5 IQ correlation between parents and their children, we’d expect the children of these white parents to progress 50% to the national white mean (IQ 103), thus the white students likely averaged IQ 98.

Assuming a standard deviation of about 15 we’d expect the one in 200+ level (+2.5 SD) to be IQ 136+.

So our historiometric estimate of the reader’s IQ (136+) is somewhat higher than the bio-demographic one (127).
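The chain of steps behind the 136+ figure can be sketched in a few lines. This is only a rough reconstruction: the variable names are mine, the inputs are the figures quoted above, and the rarity is converted to a z-score assuming a normal distribution (which gives about +2.58 SD rather than the rounded +2.5).

```python
from statistics import NormalDist

# Inputs quoted in the text above
parent_mean = (99 + 87) / 2      # splitting the difference: 93
national_white_mean = 103
parent_child_r = 0.5
sd = 15
rarity = 1 / 200                 # top vocabulary among ~200 white students

# Children regress halfway from the parental mean toward the national mean
child_mean = national_white_mean + parent_child_r * (parent_mean - national_white_mean)

# Convert "one in 200" to a z-score under a normal curve
z = NormalDist().inv_cdf(1 - rarity)

estimate = child_mean + z * sd
print(child_mean, round(z, 2), round(estimate))
```

Using the exact z-score instead of +2.5 SD moves the estimate by only about a point, well inside the margin of error of this kind of guess.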

Estimating a reader’s IQ Part 1: Bio-demographics

Back in February, a reader asked me to estimate his IQ. He wrote:

Dear PP,

I’m a long-time reader of your blog. I particularly enjoy your estimates of the I.Q.s of historical and contemporary personalities.

Would you perhaps be interested in estimating my I.Q.?…

…I am underweight, but my head is decidedly small. I am currently unemployed and have never had a job that could be called ‘good’ by any reasonable standard…

…Best wishes,

A small-skulled, unemployed…

When I asked him to elaborate, he had a lot more to say but in part 1 we focus only on the bio-demographic data. He writes:

My height is 5’9″, and my weight is 121.6 lbs.

I have no formal education to speak of beyond high school. I believe that I graduated around the middle of my class.

I am 30 years old, white …

…I’ve long suspected that I have Asperger’s (and scored 1.9 SD above the mean on Paul Cooijmans’ ‘Aspergoid’ scale), but I have not sought a formal diagnosis…

… My current favorite writers are Schopenhauer and Cioran (I don’t know whether this is relevant or helpful)…

…Politically, my sympathies are broadly anti-democratic. Wilhelmine Germany is close to my ideal society…

…Strictly speaking, I am an atheist, but I find much of value in Theravada Buddhism. I believe that I will survive the death of this body…

…My head circumference is 21.75 inches…

I begin with the fact that he’s a long-time reader of my blog. Anonymous polls suggest my readers average genius level IQ of about 129 (U.S. norms). Not because of the quality of my writing, but because the topic of IQ interests a lot of super high IQ people.

The next question that needs to be answered is whether this reader is brighter or duller than my average reader. The fact that he is unemployed doesn’t tell me much because he’s still young, may have psychological issues, and we’re in the middle of a pandemic. But with only a high school education, he’s certainly much less educated than my average reader and education is perhaps the single strongest demographic proxy for IQ (though not as strong as it was in the 1950s).

On the other hand his favorite writers are Schopenhauer and Cioran, implying more intellectual interests than most of my readers.

But with a head circumference of only 21.75″, his brain is way smaller than my average readers’.

But part of the reason it’s small is because he is so light and low weight/height ratio is a sign of genius.

Head size adjusted for body size

Using 2012 data from the U.S. army, the reader’s head circumference is about 1.36 standard deviations below average for a man (bottom 10%).

But the reader’s weight is 2.12 standard deviations below average for a man, and since the distribution of weight is positively skewed, his normalized deviation from the mean is 2.66 SD below the mean (bottom 0.4%).

Given that head size correlates about 0.41 with weight in U.S. men, his weight predicts his cranium would be 2.66(0.41) = 1.09 SD below average.

Thus his cranium is only 0.27 SD smaller than his body mass predicts, so adjusted for body size, his head circumference is roughly -0.27 SD.

Statistically expected IQ

However given the average reader’s cranium is predicted to be +0.44 SD, even after adjusting for body mass, he’s about 0.67 SD smaller headed than the average reader.

And given that head circumference correlates about 0.23 with IQ, I’d expect him to be about 0.23(-0.67 SD)= 0.15 SD less intelligent than the average reader (who is a genius).

In other words, about 2 IQ points below the genius mean of 129, or roughly 127.

Of course, this is a very crude estimate with an enormous margin of error, and as the series continues, we’ll see how close it is.
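For readers who want to check the arithmetic, here is a rough sketch of the two steps (body-size adjustment, then expected IQ). The variable names are mine; every number is taken as stated in the text, including the 0.67 SD gap from the average reader, and the 15-point SD is the usual IQ convention.

```python
# Step 1: head size adjusted for body size (all values in SD units)
head_z = -1.36            # head circumference vs. U.S. men
weight_z = -2.66          # normalized weight vs. U.S. men
r_head_weight = 0.41      # head size / weight correlation

predicted_head_z = r_head_weight * weight_z   # what his weight predicts
adjusted_head_z = head_z - predicted_head_z   # residual after body size

# Step 2: expected IQ relative to the average reader
gap_from_reader_sd = -0.67   # as stated in the text
r_head_iq = 0.23             # head circumference / IQ correlation
reader_mean_iq = 129
iq_sd = 15

iq_gap_sd = r_head_iq * gap_from_reader_sd
expected_iq = reader_mean_iq + iq_gap_sd * iq_sd

print(round(predicted_head_z, 2), round(adjusted_head_z, 2), round(expected_iq))
```

With a correlation as low as 0.23, even a large head size gap moves the expected IQ by only a couple of points, which is why the estimate stays so close to the reader mean.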

Genetically superior: Was the virus made in a lab?

For over a year, the prevailing narrative among the educated class has been that the coronavirus jumped from bats to humans in a Chinese wet market. However an alternative theory is that the virus was created in a Chinese lab. At first this theory was only believed by right-wing morons but increasingly it has been embraced by intellectuals like Bill Maher and Saagar Enjeti.

One reason to reject the bat theory is it seems unlikely that a virus that evolved in bats would immediately be so well adapted to humans. And why would a virus that evolved in bats spread primarily indoors when bats are always outside? These two facts cause people to suspect the virus is not natural but was designed to spread in humans. And the fact that it attacks organs and causes blood clots makes some suspect it was created as a potential biological weapon that somehow escaped from the lab. Unlike the racist scum pushing anti-Asian conspiracy theories, I don’t believe it was intentionally released.

If the virus does indeed turn out to be a biological weapon, then East Asians have created a powerful new weapon that has conquered the World. Although the virus has unintentionally caused enormous pain and suffering to billions of people, the fact that they could engineer something that effective is evidence of superior East Asian intelligence.

But because humans evolved to be tribal, despicable acts of violence have been unleashed against Asian-Americans like this brutal stomping of an older Oriental woman in New York.

When I learned that the assailant was black, my heart sank. How can one minority be so racist against another? You would think that they would know better.

On the other hand, blacks were the first and most tropical humans, while Mongoloids were the last and most cold-adapted humans. If ethnic genetic interests are real, being at opposite ends of the racial spectrum may cause hostility.

However when another anti-Asian racist (this time a white one) decided to beat up an elderly Oriental woman, she beat the living shit out of him. Even though he had every advantage, being twice her size, speed and strength and youth, she used her large brain size to adapt; to take whatever situation she was in, and turn it around to her advantage.

That’s really what intelligence is.

Genetically superior.

New research inspires fresh look at evolutionary progress

For years I have argued that (1) the more branching on the evolutionary tree from which you are descended, the more evolved you are (on average), and (2) the more evolved you are, the more superior you are (on average). To many educated people, this sounds ignorant because regardless of when your lineage branched off the evolutionary tree, all extant life has been evolving for the same amount of time. However years ago, I noticed that taxa descended from less branching just seem more primitive, so there must be a logical reason why branching matters. For example, plants seem more primitive than animals and plants have done less branching than animals.

Recently commenter Race Realist informed me of more modern taxonomy that might not fit my theory as neatly. Luckily, even the new taxonomy supports my theory. For example, after only one split on the evolutionary tree, bacteria/a-proteobacteria branch off. After two splits, discoba branch off. After three splits sa[r]p and amorphea split off.

Source: An Alternative Root for the Eukaryote Tree of Life
Ding He, Omar Fiz-Palacios, Cheng-Jie Fu, Johanna Fehling, Chun-Chieh Tsai, Sandra L. Baldauf

So simple branching predicts the clade that contains humans (amorphea) is on top of the evolutionary hierarchy and bacteria are at the bottom. Now among amorphea, once again the clade that contains humans (Holozoa) is tied for the most splits while Amoebozoa has only one.

Sadly, among Holozoa, the clade that contains humans does not come out on top, but I suspect that’s because they didn’t have enough room in the chart to create a comprehensive tree at that level of specificity.

The interview that rocked the World

Oprah’s recent interview with Meghan Markle and Prince Harry drew record ratings and made Worldwide front page news for weeks. Cunning and smart, the big brained billionaire knows how to spot an opportunity, and then brilliantly exploit it.

The very British Carole Malone writes:

OPRAH WINFREY is one of the smartest women on the planet. It’s why she’s a billionaire. It’s why when Harry and Meghan invited her to their wedding she went like a shot.

Not because she was overcome with excitement at being a royal wedding guest (although she must have been curious).

And not because of Meghan, who was a two bit actress in a Netflix soap opera at the time. But because, even back then, Oprah, always alert to the main chance, had her beady eye on the Big Interview. She wasn’t Meghan and Harry’s friend back in 2018, having met them just once before the wedding, and I suspect she isn’t now. But everything she’s done to help them, e.g. get their first home in LA, introduce them to all the right people, will have been done with a view to the big prize. And now she’s got it.

Source: What Meghan and Harry are doing is despicable, Oprah interview betrays Queen; Express; March 4, 2021

Love that line “went like a shot” especially when spoken by a British woman. I just picture Oprah’s private jet flying like a speeding bullet to the UK wedding while using her nearly 2000 cc cranium to calculate how to lure the couple into the interview of the century: The way to get it, is to pretend I don’t want it.

Pretty in pink with a custom made hat to fit her super-size cranium

During the interview, the couple made the explosive claim that an unnamed member of the royal family was concerned that they might produce a dark skinned baby. This caused British gossip Piers Morgan to go absolutely ballistic. Even though Meghan and Harry clarified that the alleged royal racist was NOT the Queen or Prince Philip, Morgan took this as a personal attack on Her Majesty.

However Meghan’s co-ethnic struck back, driving Morgan off the set of his own show. It’s unclear whether he quit or was fired, but after thousands of viewers and Meghan herself complained about his behavior, he never returned.

Meanwhile on the other side of the pond, talk show host Sharon Osbourne defended her friend Morgan against accusations of racism. This led to allegations that Osbourne herself was racist, and after being allegedly ambushed on her own show, she too was removed from TV indefinitely.

Queen Elizabeth released a statement saying she will address Meghan and Harry’s accusations in private. The palace has also opened up an investigation into whether Markle bullied palace staff. Many are asking whether the monarchy can survive this.

Meanwhile the real Queen sits somewhere in her $100 million Santa Barbara mansion, watching all the chaos she unleashed.

When phenotype does & does not trump genotype in taxonomy

In a previous article I declared the Kalash to be white, even though they diverged from whites 12,000 years ago and are more genetically distant from whites than whites are from non-white Caucasoids. Commenter “Some Guy” wrote:

I assume you wouldn’t group together two different species as one just because they had a similar phenotype Pumpkin, isn’t it a bit strange to do the equivalent with races/subspecies?

Some scientists (not all and perhaps not most) do group different species together into the same taxon based on phenotype. A good example is reptiles. Note in the evolutionary tree below, crocodiles and snakes are both reptiles, but birds are not, even though crocodiles are much more closely related to birds than to snakes. Thus the grouping is based on phenotype, not lineage.


However no scientist would ever classify bats as a type of bird even if their phenotypes were 100% identical (which they’re not).

Do any of my readers grasp the subtle difference between the two trees that makes it okay to lump distantly related but phenotypically similar species together in some cases but not in others? I’ve mentioned it before but people often ignore me. 🙂

UPDATE 2021-03-18

So now that commenter Lerenzo (and perhaps Austin Slater) grasped the difference, I can now make it explicit.

It’s NEVER okay to lump genetically distant species together (no matter how similar the phenotype) if they form a polyphyletic group but some scientists feel it’s okay if they form a paraphyletic group. Of course everyone agrees it’s okay if they form a monophyletic group.

The reason polyphyletic groups are not okay no matter how phenotypically similar they might be, is probably that their similarity has independent origins. By contrast phenotypic similarity in both monophyletic and paraphyletic groups is inherited from a common ancestor.
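To make the three categories concrete, here is a toy sketch in Python. The tree topology and node names are illustrative assumptions based on the reptile and bat examples above, and the paraphyletic/polyphyletic rule is a simple heuristic of my own choosing (a group counts as paraphyletic when the leaves it leaves out form a single clade). In practice the distinction rests on whether the shared trait was inherited from the common ancestor, which a bare tree cannot tell you.

```python
# Toy tree as child -> parent links (illustrative assumption, not a full taxonomy)
PARENT = {
    "snakes": "sauropsids",
    "archosaurs": "sauropsids",
    "crocodiles": "archosaurs",
    "birds": "archosaurs",
    "sauropsids": "amniotes",
    "mammals": "amniotes",
    "bats": "mammals",
    "humans": "mammals",
}
LEAVES = {"snakes", "crocodiles", "birds", "bats", "humans"}


def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(group):
    """Most recent common ancestor of a set of leaves."""
    members = list(group)
    shared = set(ancestors(members[0]))
    for leaf in members[1:]:
        shared &= set(ancestors(leaf))
    # Walking upward from any member, the first shared node is the most recent
    return next(node for node in ancestors(members[0]) if node in shared)


def clade(node):
    """All leaves descended from a node."""
    return {leaf for leaf in LEAVES if node in ancestors(leaf)}


def classify(group):
    """Heuristic label for a group of leaves (see assumptions in the lead-in)."""
    group = set(group)
    full = clade(mrca(group))
    if group == full:
        return "monophyletic"
    excluded = full - group
    if clade(mrca(excluded)) == excluded:
        return "paraphyletic"
    return "polyphyletic"


print(classify({"crocodiles", "birds"}))   # a complete clade
print(classify({"snakes", "crocodiles"}))  # "reptiles" with birds left out
print(classify({"bats", "birds"}))         # flyers with independent origins
```

On this toy tree the reptile grouping comes out paraphyletic and the bat-plus-bird grouping comes out polyphyletic, matching the distinction drawn above.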

Unfortunately many scientists today even reject paraphyletic groups and treat all monophyletic groups as taxa regardless of phenotypic diversity. This has led to absurdities like humans being called apes, birds being called dinosaurs, and Andaman Islanders being denied their blackness.

Whites are at least 12,000 years old

Don’t clash with the Kalash

The Kalash are a fascinating population because they look just like whites, yet are indigenous to South Asia. Genetically, they are more distant from Whites than Whites are from the blackest skinned South Asian which is why I’ve long argued that genetics is a poor way to define race.

Human populations show subtle allele-frequency differences that lead to geographical structure, and available methods thus allow individuals to be clustered according to genetic information into groups that correspond to geographical regions. In an early worldwide survey of this kind, division into five clusters unsurprisingly identified (1) Africans, (2) a widespread group including Europeans, Middle Easterners, and South Asians, (3) East Asians, (4) Oceanians, and (5) Native Americans. However, division into six groups led to a more surprising finding: the sixth group consisted of a single population, the Kalash

The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection
Qasim Ayub, Massimo Mezzavilla, Luca Pagani, Marc Haber, Aisha Mohyuddin, Shagufta Khaliq, Syed Qasim Mehdi, and Chris Tyler-Smith

So isn’t that interesting that when you divide the human genepool into five clusters, you get traditional races: 1) Negroids, 2) Caucasoids, 3) Mongoloids, 4) Australoids, and 5) Americoids. But when you divide us into 6 clusters, the Kalash emerge as their own distinct macro-race. This shows that the split between Kalash and conventional Caucasoids is about as deep and as ancient as the paleolithic splits between major races like Native Americans and East Asians and predates civilization, agriculture and even the Holocene.

Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319–2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11,800 (95% confidence interval = 10,600−12,600) years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.

The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection
Qasim Ayub, Massimo Mezzavilla, Luca Pagani, Marc Haber, Aisha Mohyuddin, Shagufta Khaliq, Syed Qasim Mehdi, and Chris Tyler-Smith

If the Kalash diverged from the ancestors of whites 12,000 years ago, yet look just like whites, then either a white looking phenotype evolved twice independently, or much more likely, the white race is at least 12,000 years old.

Whites should be very proud to be 12,000 years old! Young enough to imply evolutionary progress (if you believe in such) but old enough to have been selected by nature (before agriculture and civilization).

It would be nice if some of the white nationalist types who are so concerned about preserving the white race, would put some of that energy into preserving the Kalash who are far more at risk of extinction and represent the last representatives we have of original whiteness.

With a mean IQ of 100, whites are one of the highest IQ groups on the planet (behind only Ashkenazi Jews and East Asians). But were their genetic IQs 100 from inception, or did they only become 100 after agriculture and civilization. We know for example that Native Americans and Arctic people score lower than their East Asian cousins, suggesting that the neolithic transition might have boosted IQ.

Thus I would predict that the Kalash (even if raised from birth in middle class Western society) would score at least 0.5 SD below conventional Whites.

It’s also interesting that the Kalash are some of the earliest migrants into South Asia. Is it possible that South Asians evolved from whites?

Human Benchmark tests Part 4: Answering reader questions

A reader provided a screenshot of his performance on the Human Benchmark tests.

The reader states: [Human Benchmark], that website where you test your reaction speed, has a wide selection of other psychometric tests, I’d guess a composite score of all the tests would probably have a decently high g-loading. I just want some background info on these tests, if there is any.

As discussed in previous articles in this series, some of the tests (sequence memory, number memory) have their roots in conventional psychometric tests. Tests of reaction time date back to the 19th century work of Francis Galton who believed that basic neurological speed predicted intelligence. Unfortunately Galton’s research was derailed by a lack of reliability (he only used a one trial measure of reaction time), range restriction (his samples tended to be elite), and improper measures of intelligence with which to relate reaction time (he compared it with school grades since IQ tests had not yet been invented). As a result, he detected virtually no relationship between reaction time and intellect.

Nearly a century later Arthur Jensen would revisit Galton’s work, correcting for these problems. He found that when you aggregated many different kinds of reaction time (simple, complex, etc) measured both by speed and consistency (faster and less variable RTs imply higher intelligence) over many different trials, and compared with measures of IQ (not grades) and corrected for range-restriction, the results correlated a potent 0.7 with intelligence.

Unfortunately, the human benchmark test only uses simple reaction time (which is much less g loaded than complex RT), only one type of simple reaction time (an aggregate of several types is more g loaded) and only measures speed (variability is much more g loaded) and does not provide a composite score weighted to maximize g loading. As a result, on the whole the human benchmark tests seem inferior to the game THINKFAST which a bunch of us played circa 2000. So accurate was THINKFAST that the Prometheus society considered using it as an entrance requirement, with one internal study finding that one’s physiological limit on THINKFAST correlated a potent 0.7 with SAT scores in one small sample of academically homogenous people. Having people practice until hitting their physiological limit was a great way to neutralize practice effects because everyone must practice until their progress plateaus.

Sadly, this innovative research petered out when people worried that Thinkfast might give different results depending on the computer. People fantasized about Thinkfast being on a standardized handheld device so scores could be comparable, but in those days, few people imagined we’d one day all have iPhones and iPads.

The reader continues:

I’ve also attached a screenshot of all my average scores, though I’ll note that some scores are inflated since I’ve done all the tests many times and I often don’t bother finishing the test if I do bad. The strange thing about these scores is that by more conventional measures both my verbal IQ and working memory are pretty average, yet I’m able to score above the 99.9 percentile on 2 of these tests. I think this points to the fact that memory is an ability that is much broader than most IQ models would suggest. Like the verbal memory test in particular, I seem to be using a very different part of my brain compared to more typical tests like digit span. I’d also wager that most of the variation in working memory can be explained by chunking/processing abilities rather than raw storage capacity.
Also, what does the strength of the practice effect really say about a test? None of these tests really have a pattern or trick to them, yet for some of them my score has improved a lot from the first time I did them.

This is an extremely important question. In complex cognitive tasks like chess or conventional IQ tests, practice improves performance because we learn strategies, but on elementary cognitive tasks like Human Benchmark and Thinkfast, fewer strategies are possible so one wonders if there’s an increase in raw brain power.

The analogy I make is height vs muscle. If I repeatedly had my height measured, I might score a bit higher with practice. Not because I was genuinely getting taller, but because I was learning subtle tricks like how to stand straighter. By contrast if I had my strength measured every day, I’d show more increase, but this increase would not simply be because I acquired tricks to do better (how I position the barbells in my hands) but because of a genuine increase in strength.

So is intelligence more analogous to height or physical strength (the latter being far more malleable)? Is the practice induced increase in Human Benchmark tests an acquired strategy (even a subconscious one) or a real improvement, and how do we even operationalize the difference?

If practicing elementary cognitive tasks really did improve intelligence we’d expect brain-training games to improve IQ, but apparently they do not. Jordan Peterson explains that the problem is that cognitive practice in one domain does not translate to other ones.

On the other hand, why should anyone expect brain training to transcend domains? When a weight lifter does bicep curls, he doesn’t expect it to make his legs any stronger, so why should someone practicing visual memory expect to see an increase in verbal memory, let alone overall IQ?

But how can we know if we’ve even improved a specific part of intelligence rather than just become more test savvy? We know that weight lifting has improved our strength, and not just our technique, because we can see our muscles getting bigger, so perhaps cognitive training games might make certain brain parts bigger.

The groundbreaking London Taxi Cab study, published in 2000, used MRI technology to compare the brains of experienced taxi cab drivers and bus drivers who drive the city streets of London every day. In contrast to bus drivers, whose driving routes are well-established and unchanging, London taxi drivers undergo extensive training to learn how to navigate to thousands of places within the city. This makes them an ideal group to use to study the effects of spatial experience on brain structure.

The study focused on the hippocampus, which plays a role in facilitating spatial memory in the form of navigation. The MRI revealed that the posterior hippocampi of the taxi drivers were much larger than that of the bus drivers (who served as the control subjects). Even more exciting was that the size of the hippocampus directly correlated with the length of time that someone was a taxi driver–the longer someone drove a taxi, the larger their hippocampus.

The London Taxi Cab Study provides a compelling example of the brain’s neuroplasticity, or ability to reorganize and transform itself as it is exposed to learning and new experiences. Having to constantly learn new routes in the city forced the taxi cab drivers’ brains to create new neural pathways “in response to the need to store an increasingly detailed spatial representation.” These pathways permanently changed the structure and size of the brain, an amazing example of the living brain at work.


Assuming the brains of the taxi drivers actually changed (as opposed to the sample changing because less spatially gifted drivers left the job) it might be possible to increase specific parts of intelligence, but since there are so many different parts, it’s perhaps impossible to ever increase overall intelligence (or overall brain size) by more than a trivial degree. We can improve our overall muscle mass because our muscles are outside our skeleton; by contrast our brains are inside our cranium so their growth is constrained. It could be that improving the size of one part of the brain requires a corresponding decrease in other parts, to keep the overall brain from getting too big for its skull.

My research assistant 150 IQ Ganzir also weighed in on the reader’s questions, writing:

The first aspect of this score profile I noticed is the absence of any huge dips, the 10 on Number Memory notwithstanding, since a tiny change in raw score on that test can dramatically alter your percentile ranking. Given that all of this subject’s scores on the more IQ-like tests are well above average compared even to other HumanBenchmark users, who themselves are undoubtedly self-selected for superior proficiency on these types of tasks, we wouldn’t expect their reaction time to be particularly fast, but it is. Our subject appears to be a jack-of-all-trades, if you will, at these tasks. Simple reaction time has only a weak correlation of about -0.2 to -0.4 with IQ, according to Arthur Jensen on page 229 of The g factor. Note that the correlation is negative because a faster reaction speed implies a lower reaction time.

The commenter mentions: “I’ve also attached a screenshot of all my average scores, though I’ll note that some scores are inflated since I’ve done all the tests many times and I often don’t bother finishing the test if I do bad.” If true, this would indeed cause a statistical upward bias, but I have no idea how to even begin calculating the size of that. However, if the tests are reliable in the statistical sense, meaning they give similar scores with each administration, then the average score increase couldn’t be too large. But, then again, if the commenter was reaching nearly the same score every time, why would they restart on a bad run? High intra-test score variability might indicate executive functioning problems.

The commenter notes that their verbal IQ and working memory are “pretty average” on other tests, but their score on verbal memory here is so high relative to other HumanBenchmark users that the system just gives it 100th percentile without discriminating further. (I know that it can’t literally be 100th percentile, as I and several other people I know have achieved higher scores.) A possible contributing factor is that HumanBenchmark users may tend to have less than long attention spans, inhibiting performance on this test, on which reaching one’s potential may take quite a while, especially for higher scorers.

Our correspondent also writes: “Like the verbal memory test in particular, I seem to be using a very different part of my brain compared to more typical tests like digit span. I’d also wager that most of the variation in working memory can be explained by chunking/processing abilities rather than raw storage capacity.” Of course, I don’t think it’s possible to determine by introspection which part(s) of the brain you’re using on a given task, but I think I understand the subjective experience described here. As for chunking/processing abilities versus raw storage capacity, I’m not sure what’s implied here. The human brain could be described as a massively parallel computer, and it naturally processes things in chunks. If “chunking” refers to purposely learnt mnemonics, such as the mnemonic major system, then Goodhart’s Law applies here because learnt skills lose their g-loading.

The commenter thus wonders about the continued meaning of their scores: “Also, what does the strength of the practice effect really say about a test? None of these tests really have a pattern or trick to them, yet for some of them my score has improved a lot from the first time I did them.” Unfortunately, without studies of these tests specifically, we can’t know the extent to which Goodhart’s Law applies. Even analyses of seemingly similar tests from mainstream psychometrics wouldn’t be sufficient, since the HumanBenchmark versions are subtly but crucially different. All I can say is that only someone of uncommonly high cognitive capacity could produce this score profile regardless of how much time they spent practicing, and that, with no indication of how rare your scores are compared to the general population, greater precision is currently almost meaningless.

Scores on the “Chimp Test,” or at least the version on HumanBenchmark, are also almost meaningless because unlimited time is allowed to review the digits’ locations before answering, making it less a test of visual working memory and more a test of how long the testee is willing to stare at boxes. Also, most people will probably on average score higher on the HumanBenchmark “Number Memory” test than on the clinical version of the Digit Span test, since the former presents the digits simultaneously and allows a few seconds to mentally review them, whereas, in the latter, each digit is read only once with no opportunity for review.

Finally, the subject’s strong performances on Typing and Aim Trainer make me suspect a background in competitive computer gaming.

Human Benchmark tests Part 3: Number memory

Obviously I can’t devote an article to every Human Benchmark test so I’m limiting myself to the best ones. One of the best is number memory.

Digit Span is measured by the largest number of digits a person can repeat without error on two consecutive trials after the digits have been presented at the rate of one digit per second, either aurally or visually. Recalling the digits in the order of presentation is termed forward digit span (FDS); recalling the digits in the reverse order of presentation is termed backward digit span (BDS). Digit Span is part of the Stanford Binet and of the Wechsler scales. Digit Span increases with age, from early childhood to maturity. In adults the average FDS is about 7; average BDS is about 5. I have found that Berkeley students, whose average IQ is about 120, have an average FDS of between 8 and 9 digits.

The g Factor by Arthur Jensen, page 262

It should be noted that the Human Benchmark version of digit span does NOT include the Backwards version and shows all the digits at once for several seconds, not each one at a rate of one per second, and it only has one trial per level so there’s no room for error. For this reason I suggest taking your best score on your first two attempts.
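To make the difference between the two scoring rules concrete, here is a minimal sketch in Python. The function names and input formats are hypothetical, and `clinical_span` is only a literal reading of the definition quoted from Jensen above (two consecutive error-free trials at a given length), not the official administration rule of any published test.

```python
def clinical_span(trials):
    """Literal reading of the quoted definition of digit span.

    trials: dict mapping sequence length -> list of booleans, one per trial
            in the order given (True = repeated without error).
    Returns the largest length with two consecutive error-free trials,
    or 0 if no length qualifies.
    """
    best = 0
    for length, results in trials.items():
        # Look for any adjacent pair of successes at this length.
        if any(a and b for a, b in zip(results, results[1:])):
            best = max(best, length)
    return best


def human_benchmark_score(attempts, n_first=2):
    """Best result among the first n_first attempts (the rule suggested above).

    attempts: list of the highest number of digits correctly remembered
              on each attempt, in the order the attempts were made.
    """
    first = attempts[:n_first]
    if not first:
        raise ValueError("at least one attempt is required")
    return max(first)


if __name__ == "__main__":
    print(clinical_span({5: [True, True], 6: [True, False], 7: [False, True]}))
    print(human_benchmark_score([9, 11, 14]))
```

Only the first two attempts count in the second function, so later practice-inflated scores (the 14 in the example) are ignored.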

So important is this test that it is one of the 10 subtests handpicked by David Wechsler himself for his original Wechsler scale, published in the 1930s.

Perhaps no test has been so widely used in scales of intelligence as that of Memory Span for Digits. It forms part of the original Binet Scale and all the revisions of it. It has been used for a long time by psychiatrists as a test of retentiveness and by psychologists in all sorts of psychological studies. Its popularity is based on the fact that it is easy to administer, easy to score, and specific as to the type of ability it measures. Nevertheless, as a test of general intelligence it is among the poorest. Memory span, whether for digits forward or backward, generally correlates poorly with other tests of intelligence. The ability involved contains little of g, and, as Spearman has shown, is more or less independent of this general factor.

The Measurement and Appraisal of ADULT INTELLIGENCE 5th edition, David Wechsler, 1958, page 70 to 71

On page 221 of The g Factor, Jensen notes that FDS and BDS have g loadings of about 0.30 and 0.60 respectively.

Wechsler goes on to explain that despite being a poor measure of intelligence overall, he included it in part because in his eyes, it’s a great measure of low intelligence: “Except in cases of special defects or organic disease, adults who cannot retain 5 digits forward and 3 backward will be found, in 9 cases out of 10, to be feeble-minded or mentally disturbed.”

The other reason he included it is that he viewed it as an excellent measure of dementia.

I’m not convinced the test is better at low levels than at high levels. For example, Charles Krauthammer towered with a spectacular BDS of 12, and his genius is validated by the enormous influence he had over U.S. foreign policy.

In the poll below, your level corresponds to the highest number of digits you correctly remembered on at least one of your first two attempts:

Human Benchmark tests Part 2: Converting Sequence memory to IQ

19% of my readers self-reported Human Benchmark sequence memory highest scores of level 21+ (after 10 attempts).

3% of my readers self-reported highest scores of 6 or less.

Evidence continues to accumulate showing that on a scale where Americans average IQ 100 (SD = 15), my global readership towers with an average IQ of 129 (SD = 19). Thus assuming a normal curve, the top 19% and bottom 3% should have IQs of 147+ and sub-98 respectively.
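For readers who want to check this kind of percentile-to-IQ step themselves, here is a minimal sketch using Python’s standard library. It assumes a strictly normal readership distribution with the stated mean of 129 and SD of 19; the variable names are my own. Under that assumption the cutoffs come out at roughly 146 and 93, so the figures quoted above should be read as approximate.

```python
from statistics import NormalDist

# Assumed model: reader IQs are normally distributed, mean 129, SD 19.
readers = NormalDist(mu=129, sigma=19)

# IQ above which the top 19% of readers fall (the 81st percentile).
top_19_cutoff = readers.inv_cdf(1 - 0.19)

# IQ below which the bottom 3% of readers fall (the 3rd percentile).
bottom_3_cutoff = readers.inv_cdf(0.03)

print(round(top_19_cutoff, 1), round(bottom_3_cutoff, 1))
```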

Assuming the sequence memory test is sufficiently g-loaded, this implies level 21 = IQ 147 and level 6 = IQ 97.

Thus I would predict that a random sample of American youngish adults would average 6.84 (SD = 4.5).

Put simply:

IQ = 77 + (highest level obtained in first 10 tries)(3.33)
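The formula is just the straight line through the two anchor points given above (level 6 = IQ 97, level 21 = IQ 147). The sketch below, with hypothetical function names, recovers the slope and intercept from those anchors and inverts the line to get the implied American mean and SD. With the exact anchors the implied mean is about 6.9 rather than 6.84, a difference small enough to be a rounding artifact, though the intermediate steps are not shown here.

```python
def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Anchor points taken from the text, as (level, IQ) pairs.
slope, intercept = fit_line((6, 97), (21, 147))


def level_to_iq(level):
    """Estimated IQ for the highest level reached in the first 10 tries."""
    return intercept + slope * level


def iq_to_level(iq):
    """Inverse mapping: the level expected at a given IQ."""
    return (iq - intercept) / slope


# Implied level distribution for a population with mean IQ 100 and SD 15.
us_mean_level = iq_to_level(100)
us_sd_level = 15 / slope

print(round(slope, 2), round(intercept, 1))
print(round(us_mean_level, 2), round(us_sd_level, 2))
```

Because the mapping is linear, an SD of 15 IQ points simply divides by the slope to give the SD in levels.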

However, one oddity about the self-reported data is that all of the people scoring 21+ scored 24+. Nobody reported a score of 21 to 23. This suggests inaccuracy in the self-reported data, but it may also suggest that above level 21, the test starts measuring certain cognitive strategies and stops measuring g.