
Pumpkin Person

~ The psychology of horror


Tag Archives: WAIS

Why the high reliability of IQ tests is misleading

27 Monday Jul 2020

Posted by pumpkinperson in Uncategorized

≈ 214 Comments

Tags

height, reliability, WAIS, WAIS-IV

Anyone who’s taken multiple intelligence tests knows how dramatically the scores can vary. For example, we have commenters on this blog who claim gaps as large as 2 standard deviations (2 SD) between their SAT scores and their Wechsler scores. Imagine if two different stadiometers gave a 2 SD difference in height (that’s over 5 inches!). Why is a level of imprecision that would never be tolerated in the hard sciences handwaved away in psychometrics, and what can we do to fix it?

At first glance, IQ tests seem incredibly reliable, as evidenced by the 0.98 reliability (a standard error (SE) of about 2 points) reported for the WAIS-IV full-scale IQ. But how was this number arrived at? For most subtests, reliability was measured by splitting the subtest in half (odd vs even items), correlating the two halves, and then correcting that correlation for the full length of the subtest. Once the reliabilities of all the individual subtests are known, they are combined into a composite reliability for the entire scale.
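The correction for a half-length correlation is conventionally the Spearman-Brown formula, and the standard error of measurement follows from a reliability coefficient as SD × √(1 − reliability). A minimal sketch; the function names are my own and purely illustrative:

```python
import math

def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Project a reliability coefficient onto a test `length_factor` times as long.

    With the default factor of 2, this turns the correlation between two
    half-tests into an estimate of full-length reliability.
    """
    k = length_factor
    return k * r_half / (1 + (k - 1) * r_half)

def standard_error_of_measurement(reliability: float, sd: float = 15.0) -> float:
    """SEM = SD * sqrt(1 - reliability); defaults to the IQ scale's SD of 15."""
    return sd * math.sqrt(1 - reliability)

# A half-test correlation of 0.9 projects to a full-length reliability of about 0.947.
print(round(spearman_brown(0.9), 3))
# A reliability of 0.98 on the IQ metric implies an SE of about 2.1 points.
print(round(standard_error_of_measurement(0.98), 2))
```

This is why a 0.98 reliability translates into the roughly 2-point SE quoted for the WAIS-IV.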

But if subtest-level reliability is calculated by splitting each subtest’s items into odd- and even-numbered items, why not calculate full-scale IQ reliability by splitting the subtests into odd- and even-numbered subtests? The WAIS-IV might be an extremely reliable measure of how smart you are on the abilities it measures, but are those abilities a representative sample of all cognitive abilities?

Unlike the WAIS-IV, the original WAIS was arguably a pretty representative sample of human cognition. Although there was some selection bias for subtests that correlated well with other subtests, for the most part Wechsler just wanted a very diverse group of subtests that were easy to administer, fun to take, and provided clinical insights into how people think.

A psychotic mental defective obtained the following scores on the original WAIS (keep in mind that subtest scores have a mean of 10 and an SD of 3, unlike the verbal, performance and full-scale IQs, which have a mean of 100 and an SD of 15):

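| Subtest / scale | Score |
|---|---|
| Information | 4 |
| Comprehension | 5 |
| Arithmetic | 3 |
| Similarities | 3 |
| Digit Span | 9 |
| Vocabulary | 4 |
| Digit Symbol | 0 |
| Picture Completion | 8 |
| Block Design | 4 |
| Picture Arrangement | 4 |
| Object Assembly | 7 |
| Verbal IQ | 71 |
| Performance IQ | 66 |
| Full Scale IQ | 67 |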

So, using my favorite standard deviation calculator, we find this person has a mean subtest score of 4.64 with an SD of 2.54. Because there are 11 subtests, we divide this SD by the square root of 11, which gives a standard error (SE) of 0.77. This means that, assuming the 11 WAIS subtests are equivalent to a random sample of all cognitive abilities, this person’s true average level of functioning has about a two-thirds chance of falling between a scaled score of 3.87 and a scaled score of 5.41 (±1 SE). For his age (17) on the original WAIS, this equates to a true IQ range of 61 to 72, implying an SE of 5.5! That is more than twice the SE claimed for the WAIS-IV, which rests on a misleading definition of reliability.
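The arithmetic above can be reproduced with Python’s standard `statistics` module. A minimal sketch; note that the 2.54 figure matches the sample SD (n − 1 denominator), so that is what is used here:

```python
import math
import statistics

# Scaled scores from the original-WAIS profile above (mean 10, SD 3 per subtest).
subtest_scores = {
    "Information": 4, "Comprehension": 5, "Arithmetic": 3, "Similarities": 3,
    "Digit Span": 9, "Vocabulary": 4, "Digit Symbol": 0,
    "Picture Completion": 8, "Block Design": 4, "Picture Arrangement": 4,
    "Object Assembly": 7,
}

scores = list(subtest_scores.values())
mean = statistics.mean(scores)        # average scaled score across subtests
sd = statistics.stdev(scores)         # sample SD (n - 1 denominator)
se = sd / math.sqrt(len(scores))      # standard error of the mean

print(f"mean={mean:.2f} sd={sd:.2f} se={se:.2f}")  # mean=4.64 sd=2.54 se=0.77
print(f"+/-1 SE: {mean - se:.2f} to {mean + se:.2f}")
```

With unrounded values the upper bound comes out at 5.40 rather than 5.41; the difference is only rounding. Converting the scaled-score range to the 61 to 72 IQ range requires the original WAIS norm tables, which are not reproduced here.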

We can arguably say with 95% certainty that if the WAIS included every cognitive ability possessed by the human brain, his full-scale IQ would be anywhere from 56 to 78 (±1.96 SE). But that’s a bit like saying his height is anywhere from 5’2″ to 5’6″. Unless this person has an abnormal amount of subtest scatter, it may take an IQ test with over 60 subtests for IQ to approach the reliability of a height measurement.
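Both claims in that paragraph follow from the SE. A sketch, assuming (as the post does) that subtests behave like a random sample of abilities, so the SE shrinks with the square root of the number of subtests. The post does not give a numeric target for “as reliable as height”, so the 2.12-point target below (the WAIS-IV’s claimed SE) is my own illustrative assumption, and both function names are hypothetical:

```python
import math

def confidence_interval(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-theory interval: estimate +/- z * SE (z = 1.96 for roughly 95%)."""
    return estimate - z * se, estimate + z * se

def subtests_needed(current_se: float, current_n: int, target_se: float) -> int:
    """Subtests needed to shrink the SE to target_se, since SE scales as 1/sqrt(n)."""
    return math.ceil(current_n * (current_se / target_se) ** 2)

low, high = confidence_interval(67, 5.5)
print(f"95% interval: {low:.0f} to {high:.0f}")  # 95% interval: 56 to 78

# Assumed target: an SE of about 2.12 IQ points.
print(subtests_needed(5.5, 11, 2.12))
```

Under that assumed target the answer is in the mid-70s, consistent with the post’s “over 60”; a looser target would need fewer subtests.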


A cross-cultural study of university students

02 Wednesday Oct 2019

Posted by pumpkinperson in Uncategorized

≈ 20 Comments

Tags

Block Design, cross-cultural IQ, Digit Span, IQ, race, South Africa, university, WAIS

A 2015 paper by Kate Cockcroft et al. compares the scores of 349 middle-class British university undergrads with those of 107 lower-class black South African undergrads on the WAIS-III (UK edition).

The results were as follows:

The UK students averaged a full-scale IQ of 106.95 (UK norms), while the South Africans averaged IQ 93.27. However, because this study was published 18 years after the UK WAIS-III, we should adjust for the Flynn effect.

The single best source on recent Wechsler Flynn effects is Weiss et al. (2015), which found that full-scale IQ has been increasing by 0.31 points per year, at least in U.S. children. If we assume the rate is the same for UK adults, the 18-year gap is worth about 5.6 points (18 × 0.31), so the UK students have an adjusted IQ of 101 and the South African students an adjusted IQ of 88.
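The adjustment is simple linear deflation. A sketch, assuming (as the post does) a constant 0.31 points per year carried over from U.S. children to UK adults; `flynn_adjust` is a hypothetical helper name:

```python
def flynn_adjust(iq: float, years_since_norming: float, gain_per_year: float = 0.31) -> float:
    """Deflate an IQ for norm obsolescence, assuming a constant linear Flynn effect."""
    return iq - years_since_norming * gain_per_year

for group, iq in [("UK students", 106.95), ("South African students", 93.27)]:
    print(group, round(flynn_adjust(iq, 18)))
# UK students 101
# South African students 88
```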

What’s more interesting to me is how they did on the culture-reduced tests, since that’s the fairer comparison.

 
| Group | Digit Span scaled score | Flynn-adjusted Digit Span scaled score | Flynn-adjusted Digit Span IQ equivalent | Block Design scaled score | Flynn-adjusted Block Design scaled score | Flynn-adjusted Block Design IQ equivalent | Composite IQ based on adjusted scores on both tests |
|---|---|---|---|---|---|---|---|
| UK undergrads | 9.5 | 9.32 | 97 | 10.66 | 9.76 | 99 | 98 |
| Black South African undergrads | 9.35 | 9.17 | 96 | 8.67 | 7.77 | 89 | 91 |
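The IQ-equivalent columns are consistent with a linear rescaling of scaled scores (mean 10, SD 3) onto the IQ metric (mean 100, SD 15). The post doesn’t state the conversion, so this is an assumption, but it reproduces all four IQ-equivalent figures. The composite column is not recomputed, because the post does not say how the two tests were combined. `scaled_to_iq` is a hypothetical helper:

```python
def scaled_to_iq(scaled: float) -> float:
    """Map a Wechsler subtest scaled score (mean 10, SD 3) onto the IQ metric (mean 100, SD 15)."""
    return 100 + 15 * (scaled - 10) / 3

# Flynn-adjusted scaled scores from the table above.
adjusted = {
    ("UK undergrads", "Digit Span"): 9.32,
    ("UK undergrads", "Block Design"): 9.76,
    ("Black South African undergrads", "Digit Span"): 9.17,
    ("Black South African undergrads", "Block Design"): 7.77,
}

for (group, test), score in adjusted.items():
    print(f"{group}, {test}: IQ {round(scaled_to_iq(score))}")
```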

So on a composite of the most culture-reduced spatial and non-spatial tests (Block Design and Digit Span), Black South African undergrads average IQ 91. This is 11 points higher than the average Black South African seems to score on the same culture-reduced tests.

As of 2013, only 16% of South Africa’s black young adults were attending higher education (compared to about 55% of whites, 47% of Indians and 14% of Coloureds). Thus, simply attending university puts one in the top 16% of this demographic, and the median black university student sits at the midpoint of that group: the top 8%, or about 1.41 SD above the mean. If there were a perfect correlation between IQ and education, the median black South African university student would therefore have an IQ 21 points (1.41 × 15) higher than the average black South African. In reality his IQ is only 11 points higher, suggesting a correlation of about 0.52 (11/21), at least on the most culture-reduced tests.
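The reasoning can be sketched with the standard library’s `NormalDist`. This assumes IQ and educational attainment (treated as an underlying continuous trait) are bivariate normal, so a group sitting z SD above the mean in education is expected to sit r × z SD above the mean in IQ:

```python
from statistics import NormalDist

share_in_university = 0.16
# The median student is at the midpoint of the top 16%, i.e. the top 8% (92nd percentile).
median_student_percentile = 1 - share_in_university / 2

z = NormalDist().inv_cdf(median_student_percentile)  # SDs above the mean
max_gap = 15 * z          # expected IQ gap in points if r were 1.0
observed_gap = 11         # gap reported above on the culture-reduced tests
implied_r = observed_gap / max_gap

print(f"z={z:.2f}, max gap={max_gap:.1f}, implied r={implied_r:.2f}")
# z=1.41, max gap=21.1, implied r=0.52
```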

This is similar to the 0.57 correlation between IQ and education observed in the United States.


North American IQ: circa 1937 to circa 2014

17 Saturday Aug 2019

Posted by pumpkinperson in Uncategorized

≈ 21 Comments

Tags

Alan Kaufman, brain size, Flynn effect, IQ, neuroplasticity, WAIS, Wechsler Bellevue

The Flynn effect, popularized by James Flynn, refers to the claim that IQ tests get easier with time. Although by definition the average IQ of American or British (white) people is always 100, the older the IQ test, the easier it is to score 100. Thus, to keep the average at 100, tests like the Wechsler must be renormed every 10 years or so; otherwise the average IQ would increase by about 3 points per decade.

Although scholars continue to debate whether the Flynn effect reflects a genuine increase in intelligence (perhaps caused by prenatal nutrition or mental stimulation) or just greater test sophistication caused by modernity, there’s been remarkably little skepticism about the existence of the Flynn effect itself.

Malcolm Gladwell writes:

If an American born in the nineteen-thirties has an I.Q. of 100, the Flynn effect says that his children will have I.Q.s of 108, and his grandchildren I.Q.s of close to 120—more than a standard deviation higher. If we work in the opposite direction, the typical teen-ager of today, with an I.Q. of 100, would have had grandparents with average I.Q.s of 82—seemingly below the threshold necessary to graduate from high school. And, if we go back even farther, the Flynn effect puts the average I.Q.s of the schoolchildren of 1900 at around 70, which is to suggest, bizarrely, that a century ago the United States was populated largely by people who today would be considered mentally retarded.

While few people believe our grandparents were genuinely mentally retarded, it’s taken for granted that they would have scored in the mentally retarded range by today’s standards.

But is this true? I began having doubts over a decade ago when I examined the items on the first Wechsler intelligence scale ever made: the ancient WBI (Wechsler-Bellevue Intelligence Scale). Meticulously normed on New Yorkers in the 1930s, this test remains far and away the most comprehensive look we have at early 20th-century white North American intelligence, and while some of the subtests looked easy by today’s standards, others, especially vocabulary, looked harder.

The Kaufman effect

What also struck me was how little instruction, probing or coaching people got when taking the ancient WBI, compared to its modern descendant, the WAIS-IV. This matters a lot because of how the Flynn effect is calculated on the Wechsler: a new sample of people is given both the newest Wechsler and its immediate predecessor, in random order to cancel out practice effects, and researchers see which version they score higher on. If they average 3 points lower on the WAIS-IV normed in 2006 than on the WAIS-III normed in 1995, it’s assumed IQ increased by 3 points in 11 years.

The problem with this method (as Alan Kaufman may have discovered before me) is that the subset of the sample that took the newer version first has a huge advantage on the older version compared to the older test’s norming sample (over and above the practice effect, which is controlled for), because the older test’s norming sample was never given that coaching and probing.

Statistical artifact

A Promethean once said maybe the Flynn effect is just a statistical artifact of some kind. He never told me what he meant, but it got me thinking:

One problem with how the Flynn effect is calculated on the Wechsler is the assumption that gains over time can be added. For example, it’s assumed that you can add the supposed 7.8-point IQ gain between the 1953.5 and 1978 WAIS normings to the 4.2-point gain from 1978 to 1995 and the 3.7-point gain from 1995 to 2006, for a grand total of 15.7 IQ points from 1953.5 to 2006.

This would make sense if we were talking about an absolute scale like height, but it is problematic for a sliding scale like IQ. For example, suppose that in 1953.5 the average raw score was 20 questions correct, with an SD of 2. By 1953.5 standards, 20 = IQ 100 and every 2 raw points = 15 IQ points above or below 100. Now suppose that in 1978 people averaged 22 with an SD of 1. That’s a gain of 15 IQ points by 1953.5 standards. Now suppose that in 1995 people averaged 23 with an SD of 2. That’s a gain of 15 IQ points by 1978 standards. Adding the two gains together implies a 30-point gain from 1953.5 to 1995, but by both 1953.5 and 1995 standards the difference is only about 23 points (3 raw points divided by an SD of 2 is 1.5 SD, or 22.5 IQ points).
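The hypothetical example can be checked directly. A minimal sketch using only the made-up raw-score norms from the paragraph above; `gain_in_iq` is an illustrative helper, not anyone’s published method:

```python
def gain_in_iq(old_mean: float, old_sd: float, new_mean: float) -> float:
    """IQ-point gain of a new cohort, scored against an older cohort's norms."""
    return 15 * (new_mean - old_mean) / old_sd

# Hypothetical raw-score norms (mean, SD) for each norming year, as in the example.
norms = {1953.5: (20, 2), 1978: (22, 1), 1995: (23, 2)}

# Chaining: 1953.5 -> 1978 gain plus 1978 -> 1995 gain, each on the older norms.
chained = (gain_in_iq(*norms[1953.5], norms[1978][0])
           + gain_in_iq(*norms[1978], norms[1995][0]))

# Direct comparison of 1995 against 1953.5 norms.
direct_old = gain_in_iq(*norms[1953.5], norms[1995][0])
# Direct comparison expressed in 1995 SD units.
direct_new = 15 * (norms[1995][0] - norms[1953.5][0]) / norms[1995][1]

print(chained, direct_old, direct_new)  # 30.0 22.5 22.5
```

The chained total overstates the direct gain because the 1978 link is measured in units of an unusually small SD.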

Changing content

Another problem with studying the Flynn effect is that the content of tests like the Wechsler is constantly changing. This is especially problematic when studying long-term trends in general knowledge and vocabulary. If words that were obscure in the 1950s became popular in the 1970s, then people in the 1970s will score high on the 1950s vocabulary test. Meanwhile, the 1970s vocabulary test may contain words that don’t become popular until the 1990s. Thus, adding the vocabulary gains from the 1950s to the 1970s to the gains from the 1970s to the 1990s might give the false impression that people in the 1990s will do especially well on a 1950s vocabulary test, when in reality many words from the 1950s may have peaked in the 1970s and become even more obscure in the 1990s than they were in the 1950s.

An ambitious study

Given the Kaufman effect, the statistical artifact, and changing content, I realized the only way to truly understand the Flynn effect is to take the oldest quality IQ test I could find and replicate its original norming on a modern sample.

In 2008 I made it my mission to replicate Wechsler’s 1935-1938 norming of the very first Wechsler scale. Ideally I should have flown to New York, where Wechsler had normed his original scale, but if Wechsler could use white New Yorkers as representative of all of white America (WWI IQ tests showed white New Yorkers matched the national white average), I could use white Ontarians as representative of all of white North America (indeed, white Americans and white Canadians have virtually the same IQs). The target age group was 20-34 because this was the reference age group Wechsler had used to norm his subtests.

It took over a decade, but I was gradually able to arrange for 15 randomly selected white young adults to take the one-hour test. They were non-staff recruited from about half a dozen fast-food locations in lower- to upper-middle-class urban and suburban Ontario. The final sample was not perfectly representative of white North America (they were a bit less educated and included far fewer women), and testing conditions were not optimal (environments were sometimes noisy, at least one person had a few beers before testing, and another was literally falling asleep during the test). Also, 15 people is far too small a sample to draw statistically significant conclusions about 11 different subtests. One man with a conspicuously low score was removed from the sample because he had suffered a stroke.

Nonetheless, the table below shows how whites tested from 2008 to 2019 compared to Wechsler’s 1935-1938 sample. The last column shows the expected scores of the 21st-century sample, obtained by extrapolating the gains James Flynn calculated for 1953.5 to 2006 (see page 240 of his book Are We Getting Smarter?) to the span of the current study: circa 1937 to circa 2013.5.

Note: the 11 subtests were scaled to have a mean of 10 and an SD of 3 in the original young adult norming sample, while the verbal, performance and full-scale IQs were scaled to have a mean of 100 and an SD of 15. Note also that Vocabulary is an alternate test, not used to calculate either verbal or full-scale IQ on the WBI. One third of my sample did not take Digit Symbol, so for these subjects, performance and full-scale IQs were calculated by prorating.
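Prorating scales a partial sum of scaled scores up to the number of subtests the scale normally uses, which amounts to assuming the missing subtest would have scored at the average of those given. A sketch under that assumption; the function name and the example scores are hypothetical, and turning the prorated sum into an IQ still requires the WBI norm tables, which are not reproduced here:

```python
def prorate(subtest_scores: list[float], full_count: int) -> float:
    """Scale a partial sum of scaled scores up to `full_count` subtests."""
    if not subtest_scores:
        raise ValueError("need at least one subtest score")
    return sum(subtest_scores) * full_count / len(subtest_scores)

# Hypothetical examinee who took four of the five WBI performance subtests
# (Picture Completion, Picture Arrangement, Block Design, Object Assembly)
# but not Digit Symbol.
performance_given = [11, 10, 13, 12]
print(prorate(performance_given, 5))  # 57.5
```

The same function covers the full scale: with nine of the ten full-scale subtests given, the sum is multiplied by 10/9.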

 
| Test | Nationally representative sample of young white adults (NY, 1935 to 1938) | Randomish sample of young white adults (2008 to 2019, ON, Canada) | Expected WBI scores in 2008-2019 based on Flynn’s calculated rate of increase |
|---|---|---|---|
| Information (general knowledge) | 10 (SD 3) | 8.07 (SD 2.6) | 12.3 |
| Similarities (verbal abstract reasoning) | 10 (SD 3) | 12.93 (SD 2.94) | 15.54 |
| Arithmetic (mental math) | 10 (SD 3) | 7.2 (SD 3.78); this subtest contained a unit-conversion item that seemed biased against Canadians | 11.02 |
| Vocabulary | 10 (SD 3) | 8.73 (SD 2.6) | 14.95 |
| Comprehension (common sense & social judgement) | 10 (SD 3) | 9.33 (SD 3.2) | 13.93 |
| Digit Span (attention & rote memory) | 10 (SD 3) | 9.47 (SD 2.23) | 11.46 |
| Picture Completion (visual alertness) | 10 (SD 3) | 10.47 (SD 3.16) | 14.52 |
| Picture Arrangement (social interpretation) | 10 (SD 3) | 9.8 (SD 2.54) | 13.35 |
| Block Design (spatial organization) | 10 (SD 3) | 12.53 (SD 3.07) | 12.91 |
| Object Assembly (spatial integration) | 10 (SD 3) | 11.47 (SD 1.77) | 14.06 |
| Digit Symbol (rapid eye-hand coordination) | 10 (SD 3) | 10.8 (SD 2.82); only 10 of the 15 subjects took this subtest | 14.66 |
| Verbal IQ | 100 (SD 15) | 99.8 (SD 14.46) | |
| Performance IQ | 100 (SD 15) | 106.47 (SD 12.11) | |
| Full-scale IQ | 100 (SD 15) | 103.4 (SD 13.63) | 122 |
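Given the small sample, one rough way to read the table is to express each subtest’s gap from the Flynn-based expectation in standard errors of the sample mean (SD / √n). This is a sketch, not a formal significance test: it treats the 1930s norms and the expected values as fixed, uses the modern sample’s own SDs, and ignores both the non-random recruitment and the multiple comparisons across 11 subtests. `shortfall_in_se` is a hypothetical helper:

```python
import math

# (observed mean, observed SD, n, Flynn-expected mean) from the table above.
results = {
    "Information": (8.07, 2.6, 15, 12.3),
    "Similarities": (12.93, 2.94, 15, 15.54),
    "Arithmetic": (7.2, 3.78, 15, 11.02),
    "Vocabulary": (8.73, 2.6, 15, 14.95),
    "Comprehension": (9.33, 3.2, 15, 13.93),
    "Digit Span": (9.47, 2.23, 15, 11.46),
    "Picture Completion": (10.47, 3.16, 15, 14.52),
    "Picture Arrangement": (9.8, 2.54, 15, 13.35),
    "Block Design": (12.53, 3.07, 15, 12.91),
    "Object Assembly": (11.47, 1.77, 15, 14.06),
    "Digit Symbol": (10.8, 2.82, 10, 14.66),  # only 10 subjects took it
}

def shortfall_in_se(mean: float, sd: float, n: int, expected: float) -> float:
    """How many standard errors the observed mean falls below the expected mean."""
    return (expected - mean) / (sd / math.sqrt(n))

for subtest, row in results.items():
    print(f"{subtest:20s} {shortfall_in_se(*row):5.1f} SE below expectation")
```

On this crude reading, Block Design sits within one SE of the Flynn-based expectation, while Information and Vocabulary fall several SEs short.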

Conclusion

The Flynn effect is dramatically smaller than we’ve been led to believe, at least on tests of specific information that may become obscure over generations. By contrast, certain verbal skills (categorizing) and spatial analysis have indeed increased by amounts comparable to those in Flynn’s research. It’s unclear whether these are nutritional gains caused by increasing brain size, neuroplastic gains caused by cultural stimulation, or mere teaching to the test caused by schooling, computers and brain games.


contact pumpkinperson at easiestquestion@hotmail.ca



Blog at WordPress.com.
