A reader provided a screenshot of his performance on humanbenchmark.com.

The reader states:


humanbenchmark.com, that website where you test your reaction speed, has a wide selection of other psychometric tests. I’d guess a composite score of all the tests would probably have a decently high g-loading. I just want some background info on these tests, if there is any.

As discussed in previous articles in this series, some of the tests (sequence memory, number memory) have their roots in conventional psychometric tests. Tests of reaction time date back to the 19th-century work of Francis Galton, who believed that basic neurological speed predicted intelligence. Unfortunately, Galton’s research was derailed by a lack of reliability (he used only a single-trial measure of reaction time), range restriction (his samples tended to be elite), and improper measures of intelligence with which to correlate reaction time (he compared it with school grades, since IQ tests had not yet been invented). As a result, he detected virtually no relationship between reaction time and intellect.

Nearly a century later, Arthur Jensen revisited Galton’s work, correcting for these problems. He found that when you aggregate many different kinds of reaction time (simple, complex, etc.), measure both speed and consistency (faster and less variable RTs imply higher intelligence) over many trials, compare the results with measures of IQ (not grades), and correct for range restriction, the composite correlates a potent 0.7 with intelligence.
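To make that aggregation concrete, here is a minimal simulation sketch in Python (my own illustration with invented numbers, not Jensen’s actual data or procedure) of why a composite of mean RT and RT variability across multiple paradigms tracks IQ better than a single simple-RT average does, along with Thorndike’s Case 2 formula, the standard correction for direct range restriction:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_trials = 500, 20

# Latent "mental speed" factor; IQ shares variance with it (numbers invented).
speed = rng.normal(0, 1, n_people)
iq = 100 + 15 * (0.6 * speed + 0.8 * rng.normal(0, 1, n_people))

def simulate_rts(base_ms, slope):
    """One RT paradigm: higher latent speed -> faster AND steadier responding,
    plus paradigm-specific nuisance variance so no single measure is pure."""
    means = base_ms - slope * speed + rng.normal(0, slope, n_people)
    sds = np.clip(40 - 8 * speed + rng.normal(0, 8, n_people), 5, None)
    return rng.normal(means[:, None], sds[:, None], (n_people, n_trials))

simple_rt = simulate_rts(250, 20)   # simple reaction time
choice_rt = simulate_rts(450, 35)   # choice ("complex") reaction time

z = lambda x: (x - x.mean()) / x.std()

# Jensen-style aggregation: z-score each person's mean RT and RT standard
# deviation in each paradigm, then average them (lower = faster and steadier).
composite = np.mean([z(simple_rt.mean(1)), z(choice_rt.mean(1)),
                     z(simple_rt.std(1)),  z(choice_rt.std(1))], axis=0)

def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike's Case 2 formula for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return r * u / np.sqrt(1 + r**2 * (u**2 - 1))

r_single = np.corrcoef(simple_rt.mean(1), iq)[0, 1]
r_comp = np.corrcoef(composite, iq)[0, 1]
print("simple-RT mean alone vs IQ:", round(r_single, 2))
print("aggregated composite vs IQ:", round(r_comp, 2))
print("composite corrected for range restriction (assumed SD ratio 1.5):",
      round(correct_range_restriction(r_comp, 1.0, 1.5), 2))
```

The exact numbers are arbitrary; the point is only that aggregating speed and consistency across paradigms, and then correcting for range restriction, pushes a weak single-measure correlation toward the strong values Jensen reported.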

Unfortunately, the Human Benchmark battery uses only simple reaction time (which is much less g-loaded than complex RT), only one type of simple reaction time (an aggregate of several types is more g-loaded), measures only speed (variability is much more g-loaded), and provides no composite score weighted to maximize g-loading. As a result, on the whole the Human Benchmark tests seem inferior to the game THINKFAST, which a bunch of us played circa 2000. So accurate was THINKFAST that the Prometheus Society considered using it as an entrance requirement, with one internal study finding that one’s physiological limit on THINKFAST correlated a potent 0.7 with SAT scores in a small sample of academically homogeneous people. Having people practice until they hit their physiological limit was a great way to neutralize practice effects, because everyone must practice until their progress plateaus.
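For what it’s worth, building a g-weighted composite isn’t hard in principle. One standard rough approach is to weight subtests by their loadings on the first principal component of the battery’s intercorrelations. Here is a minimal sketch (the function and the score matrix it expects are hypothetical; this is not anything HumanBenchmark or THINKFAST actually computes):

```python
import numpy as np

def g_weighted_composite(scores):
    """scores: people-by-subtest matrix, with speed subtests already reversed
    so that higher = better. Returns composite scores and subtest weights,
    using the first principal component as a rough proxy for a maximally
    g-loaded weighting."""
    zscores = (scores - scores.mean(0)) / scores.std(0)  # standardize subtests
    corr = np.corrcoef(zscores, rowvar=False)            # subtest intercorrelations
    _, eigvecs = np.linalg.eigh(corr)                    # eigenvalues ascending
    weights = eigvecs[:, -1]                             # first PC = largest eigenvalue
    if weights.sum() < 0:                                # eigenvector sign is arbitrary
        weights = -weights
    return zscores @ weights, weights
```

On a positively intercorrelated battery, the first-PC weights come out all positive, so the subtests that share the most variance with the rest (the most g-loaded ones) get the largest weights.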

Sadly, this innovative research petered out when people worried that THINKFAST might give different results depending on the computer. People fantasized about THINKFAST running on a standardized handheld device so scores could be comparable, but in those days, few people imagined we’d one day all have iPhones and iPads.

The reader continues:


I’ve also attached a screenshot of all my average scores, though I’ll note that some scores are inflated since I’ve done all the tests many times and I often don’t bother finishing the test if I do bad. The strange thing about these scores is that by more conventional measures both my verbal IQ and working memory are pretty average, yet I’m able to score above the 99.9 percentile on 2 of these tests. I think this points to the fact that memory is an ability that is much broader than most IQ models would suggest. Like the verbal memory test in particular, I seem to be using a very different part of my brain compared to more typical tests like digit span. I’d also wager that most of the variation in working memory can be explained by chunking/processing abilities rather than raw storage capacity.
Also, what does the strength of the practice effect really say about a test? None of these tests really have a pattern or trick to them, yet for some of them my score has improved a lot from the first time I did them.

This is an extremely important question. On complex cognitive tasks like chess or conventional IQ tests, practice improves performance because we learn strategies, but on elementary cognitive tasks like Human Benchmark and THINKFAST, fewer strategies are possible, so one wonders whether practice increases raw brain power.

The analogy I make is height vs. muscle. If I repeatedly had my height measured, I might score a bit higher with practice, not because I was genuinely getting taller, but because I was learning subtle tricks like standing up straighter. By contrast, if I had my strength measured every day, I’d show a larger increase, and not simply because I had acquired tricks to do better (how I position the barbell in my hands) but because of a genuine increase in strength.

So is intelligence more analogous to height or to physical strength (the latter being far more malleable)? Is the practice-induced increase on Human Benchmark tests an acquired strategy (even a subconscious one) or a real improvement, and how do we even operationalize the difference?

If practicing elementary cognitive tasks really did improve intelligence, we’d expect brain-training games to improve IQ, but apparently they do not. Jordan Peterson explains that the problem is that cognitive practice in one domain does not transfer to other domains.

On the other hand, why should anyone expect brain training to transcend domains? When a weightlifter does biceps curls, he doesn’t expect them to make his legs any stronger, so why should someone practicing visual memory expect to see an increase in verbal memory, let alone overall IQ?

But how can we know if we’ve even improved a specific part of intelligence rather than just become more test-savvy? We know that weightlifting has improved our strength, and not just our technique, because we can see our muscles getting bigger, so perhaps cognitive training games might make certain brain parts bigger.

The groundbreaking London Taxi Cab study, published in 2000, used MRI technology to compare the brains of experienced taxi cab drivers and bus drivers who drive the city streets of London every day. In contrast to bus drivers, whose driving routes are well-established and unchanging, London taxi drivers undergo extensive training to learn how to navigate to thousands of places within the city. This makes them an ideal group to use to study the effects of spatial experience on brain structure.

The study focused on the hippocampus, which plays a role in facilitating spatial memory in the form of navigation. The MRI revealed that the posterior hippocampi of the taxi drivers were much larger than those of the bus drivers (who served as the control subjects). Even more exciting was that the size of the hippocampus directly correlated with the length of time someone had been a taxi driver: the longer someone drove a taxi, the larger their hippocampus.

The London Taxi Cab Study provides a compelling example of the brain’s neuroplasticity, or ability to reorganize and transform itself as it is exposed to learning and new experiences. Having to constantly learn new routes in the city forced the taxi cab drivers’ brains to create new neural pathways “in response to the need to store an increasingly detailed spatial representation.” These pathways permanently changed the structure and size of the brain, an amazing example of the living brain at work.

Source

Assuming the brains of the taxi drivers actually changed (as opposed to the sample changing because less spatially gifted drivers left the job), it might be possible to increase specific parts of intelligence, but since there are so many different parts, it’s perhaps impossible to ever increase overall intelligence (or overall brain size) by more than a trivial degree. We can improve our overall muscle mass because our muscles are outside our skeletons; by contrast, our brains are inside our craniums, so their growth is constrained. It could be that increasing the size of one part of the brain requires a corresponding decrease in other parts, to keep the overall brain from getting too big for its skull.

My research assistant 150 IQ Ganzir also weighed in on the reader’s questions, writing:

The first aspect of this score profile I noticed is the absence of any huge dips, the 10 on Number Memory notwithstanding, since a tiny change in raw score on that test can dramatically alter your percentile ranking. Given that all of this subject’s scores on the more IQ-like tests are well above average compared even to other HumanBenchmark users, who themselves are undoubtedly self-selected for superior proficiency on these types of tasks, we wouldn’t expect their reaction time to be particularly fast, but it is. Our subject appears to be a jack-of-all-trades, if you will, at these tasks. Simple reaction time has only a weak correlation of about -0.2 to -0.4 with IQ, according to Arthur Jensen on page 229 of The g Factor. Note that the correlation is negative because a faster reaction speed implies a lower reaction time.

The commenter mentions: “I’ve also attached a screenshot of all my average scores, though I’ll note that some scores are inflated since I’ve done all the tests many times and I often don’t bother finishing the test if I do bad.” If true, this would indeed cause a statistical upward bias, but I have no idea how to even begin calculating the size of that. However, if the tests are reliable in the statistical sense, meaning they give similar scores with each administration, then the average score increase couldn’t be too large. But, then again, if the commenter was reaching nearly the same score every time, why would they restart on a bad run? High intra-test score variability might indicate executive functioning problems.
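One way to get a feel for the size of that bias is a toy simulation (my own sketch with invented numbers, not a calculation from the post): model each run as a stable true score plus noise, assume the commenter abandons any run tracking below their stable level, and see how much the surviving runs’ average is inflated at different reliabilities:

```python
import numpy as np

rng = np.random.default_rng(1)

def inflation(reliability, n_people=10_000, n_runs=20):
    """Upward bias (in SD units) from keeping only runs at or above one's
    stable level, for a test where `reliability` of the variance is stable."""
    true = rng.normal(0, np.sqrt(reliability), n_people)
    noise = rng.normal(0, np.sqrt(1 - reliability), (n_people, n_runs))
    runs = true[:, None] + noise                          # observed run scores
    kept = np.where(runs >= true[:, None], runs, np.nan)  # abandon "bad" runs
    return np.nanmean(kept) - runs.mean()

for rel in (0.5, 0.8, 0.95):
    print(f"reliability {rel}: kept-run average inflated by {inflation(rel):.2f} SD")
```

The more reliable the test, the less noise there is for selective restarting to exploit, which is exactly the point made above.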

The commenter notes that their verbal IQ and working memory are “pretty average” on other tests, but their score on verbal memory here is so high relative to other HumanBenchmark users that the system simply assigns it the 100th percentile without discriminating further. (I know that it can’t literally be the 100th percentile, as I and several other people I know have achieved higher scores.) A possible contributing factor is that HumanBenchmark users may tend to have short attention spans, inhibiting performance on this test, on which reaching one’s potential may take quite a while, especially for higher scorers.

Our correspondent also writes: “Like the verbal memory test in particular, I seem to be using a very different part of my brain compared to more typical tests like digit span. I’d also wager that most of the variation in working memory can be explained by chunking/processing abilities rather than raw storage capacity.” Of course, I don’t think it’s possible to determine by introspection which part(s) of the brain you’re using on a given task, but I think I understand the subjective experience described here. As for chunking/processing abilities versus raw storage capacity, I’m not sure what’s implied here. The human brain could be described as a massively parallel computer, and it naturally processes things in chunks. If “chunking” refers to purposely learnt mnemonics, such as the mnemonic major system, then Goodhart’s Law applies here because learnt skills lose their g-loading.

The commenter thus wonders about the continued meaning of their scores: “Also, what does the strength of the practice effect really say about a test? None of these tests really have a pattern or trick to them, yet for some of them my score has improved a lot from the first time I did them.” Unfortunately, without studies of these tests specifically, we can’t know the extent to which Goodhart’s Law applies. Even analyses of seemingly similar tests from mainstream psychometrics would be insufficient, since the HumanBenchmark versions are subtly but crucially different. All I can say is that only someone of uncommonly high cognitive capacity could produce this score profile regardless of how much time they spent practicing, and that, with no indication of how rare these scores are in the general population, greater precision is currently almost meaningless.

Scores on the “Chimp Test,” or at least the version on HumanBenchmark, are also almost meaningless because unlimited time is allowed to review the digits’ locations before answering, making it less a test of visual working memory and more a test of how long the testee is willing to stare at boxes. Also, people will probably score higher, on average, on the HumanBenchmark “Number Memory” test than on the clinical version of the Digit Span test, since the former presents the digits simultaneously and allows a few seconds to review them mentally, whereas in the latter, each digit is read only once with no opportunity for review.

Finally, the subject’s strong performances on Typing and Aim Trainer make me suspect a background in competitive computer gaming.
