The following article is an updated revision of an article I published in August 2019 about how 21st century Northern Americans score on an IQ test normed before the Second World War. The reason for the update is that in December 2019, the sample size of my study increased by 13% (from n = 15 to n = 17). I had originally hoped to collect more data before publishing an update, but with the uncertainty surrounding the coronavirus crisis, it’s unclear when that will be possible.
The Flynn effect, popularized by James Flynn, refers to the fact that IQ tests supposedly get easier with time. Although by definition the average IQ of American or British (white) people is always 100, the older the IQ test, the easier it is to score 100. Thus to keep the average at 100, tests like the Wechsler must be renormed every 10 years or so, otherwise the average IQ would increase by about 3 points per decade.
Although scholars continue to debate whether the Flynn effect reflects a genuine increase in intelligence (perhaps caused by prenatal nutrition or mental stimulation) or just greater test sophistication caused by modernity, there’s been remarkably little skepticism about the existence of the Flynn effect itself.
Malcolm Gladwell writes:
If an American born in the nineteen-thirties has an I.Q. of 100, the Flynn effect says that his children will have I.Q.s of 108, and his grandchildren I.Q.s of close to 120—more than a standard deviation higher. If we work in the opposite direction, the typical teen-ager of today, with an I.Q. of 100, would have had grandparents with average I.Q.s of 82—seemingly below the threshold necessary to graduate from high school. And, if we go back even farther, the Flynn effect puts the average I.Q.s of the schoolchildren of 1900 at around 70, which is to suggest, bizarrely, that a century ago the United States was populated largely by people who today would be considered mentally retarded.
While few people believe our grandparents were genuinely mentally retarded, it’s taken for granted that they would have scored in the mentally retarded range by today’s standards.
But is this true? I began having doubts over a decade ago when I examined the items on the first Wechsler intelligence scale ever made: the ancient WBI (Wechsler Bellevue intelligence scale). Meticulously normed on New Yorkers in the 1930s, this test remains far and away the most comprehensive look we have at early 20th century white Northern American intelligence, and while some of the subtests looked easy by today’s standards, others, especially vocabulary, looked harder.
The Kaufman effect
What also struck me was how little instruction, probing or coaching people got when taking the ancient WBI, compared to its modern descendant the WAIS-IV. This matters a lot because the way the Flynn effect is calculated on the Wechsler is by giving a new sample of people both the newest Wechsler and its immediate predecessor, in random order to cancel out practice effects, and then seeing which version they score higher on. If they average 3 points lower on the WAIS-IV normed in 2006 than on the WAIS-III normed in 1995, it’s assumed IQ increased by 3 points in 11 years.
The problem with this method (as Alan Kaufman may have discovered before me) is that the subset of the sample that took the newer version first has a huge advantage on the older version compared to the norming sample of the older test (over and above the practice effect which is controlled for), because the norming sample of the older test was never given coaching and probing.
Statistical artifact
A Promethean once said maybe the Flynn effect is just a statistical artifact of some kind. He never told me what he meant, but it got me thinking:
One problem with how the Flynn Effect is calculated on the Wechsler is that it’s assumed that gains over time can be added. For example it’s assumed that you can add the supposed 7.8 IQ gain from WAIS normings 1953.5 -1978 to the 4.2 IQ gain from normings 1978 – 1995 to the 3.7 IQ gain from normings 1995-2006, for a grand total of 15.7 IQ points from normings 1953.5 – 2006.
This would make sense if we were talking about an absolute scale like height, but is problematic when talking about a sliding scale like IQ. For example, suppose the raw number of questions correctly answered in 1953.5 was 20 with an SD of 2. By 1953.5 standards, 20 = IQ 100 and every 2 points = 15 IQ points above or below 100. Now suppose in 1978, people averaged 22 with an SD of 1. That’s a gain of 15 IQ points by 1953.5 standards. Now suppose in 1995 people average 23 with an SD of 2. That’s a gain of 15 IQ points by 1978 standards. Adding the two gains together implies a 30 point gain from 1953.5 to 1995, but by both 1953.5 and 1995 standards, the difference is only about 23 points (3 raw points at 2 raw points per SD is 1.5 SD, or 22.5 IQ points).
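The arithmetic in this example can be checked with a short sketch. The raw-score means and SDs are the made-up numbers from the paragraph above, not real norming data, and the helper name `iq_gain` is purely illustrative.

```python
# Hypothetical raw-score norms from the example: (mean, SD) per norming year.
norms = {1953.5: (20, 2), 1978: (22, 1), 1995: (23, 2)}

def iq_gain(later, earlier, reference):
    """IQ-point gain from `earlier` to `later`, scored against the SD of the `reference` year."""
    mean_later = norms[later][0]
    mean_earlier = norms[earlier][0]
    reference_sd = norms[reference][1]
    return 15 * (mean_later - mean_earlier) / reference_sd

# Chained method: each step is scored by the standards of its own starting year.
step1 = iq_gain(1978, 1953.5, reference=1953.5)
step2 = iq_gain(1995, 1978, reference=1978)
chained = step1 + step2

# Direct method: compare the endpoints on a single set of standards.
direct_old = iq_gain(1995, 1953.5, reference=1953.5)
direct_new = iq_gain(1995, 1953.5, reference=1995)

print(chained, direct_old, direct_new)
```

The chained total comes out at 30 points while both direct comparisons give 22.5, because the middle norming's narrower SD inflates the second step.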
Changing content
Another problem with studying the Flynn effect is that the content of tests like the Wechsler is constantly changing. This is especially problematic when studying long-term trends in general knowledge and vocabulary. If words that are obscure in the 1950s become popular in the 1970s, then people in the 1970s will score high on the 1950s vocabulary test. Meanwhile the 1970s vocabulary test may contain words that don’t become popular until the 1990s. Thus adding the vocabulary gains from the 1950s to the 1970s to the gains from the 1970s to the 1990s might give the false impression that people in the 1990s will do especially well on a 1950s vocabulary test, when in reality, many words from the 1950s may have peaked in the 1970s and are even more obscure in the 1990s than they were in the 1950s.
An ambitious study
Given the Kaufman effect, the statistical artifact, and changing content, I realized the only way to truly understand the Flynn effect is to take the oldest quality IQ test I could find and replicate its original norming on a modern sample.
In 2008 I made it my mission to replicate Wechsler’s 1935-1938 norming of the very first Wechsler scale. Ideally I should have flown to New York where Wechsler had normed his original scale, but if Wechsler could use white New Yorkers as representative of all of white America (WWI IQ tests showed white New Yorkers matched the national white average), I could use white Ontarians as representative of all of white Northern America (indeed white Americans and white Canadians have virtually the same IQs). The target age group was 20-34 because this was the reference age group Wechsler had used to norm his subtests.
It took over a decade but I was gradually able to arrange for 17 randomly selected white young adults to take the one hour test. They were non-staff recruited from about half a dozen fast food/ coffeehouse locations in lower to upper middle class urban and suburban Ontario. The final sample ranged in education from 9.5 years (early high school dropout) to 18 years (Masters Degree in Engineering from one of Canada’s top universities). The mean self-reported education level was 12.9 years (SD = 2.12) suggesting that despite the lack of female participants, the sample was fairly representative (the average Canadian over 25 has about 13 years of schooling); in cases where those below the age of 25 were in the process of finishing a degree, they were credited as having it.
Testing conditions were not optimum (environments were sometimes noisy, at least one person had a few beers before testing; another was literally falling asleep during the test) and 17 people is way too small a sample to draw statistically significant conclusions about 11 different subtests. One man with a conspicuously low score was removed from the sample after he stated that he had years ago suffered a stroke.
Nonetheless, the below table shows how whites tested in 2008 to 2019 compared to Wechsler’s 1935-1938 sample, with the last column showing the expected scores of the 21st century sample, extrapolating gains James Flynn calculated from 1953.5 to 2006 (see page 240 of his book Are We Getting SMARTER?) to the current study: circa 1937 to circa 2013.5.
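As a rough illustration of this kind of extrapolation, the sketch below takes the three full-scale WAIS gains quoted earlier in this article, assumes a constant linear rate, and stretches it over the span of the current study. This is only an assumption about the method: the subtest figures in the table rest on Flynn's per-subtest gains, which are not reproduced here, and the variable names are illustrative.

```python
# Full-scale IQ gains per norming interval, as quoted earlier in the article.
interval_gains = [7.8, 4.2, 3.7]

flynn_start, flynn_end = 1953.5, 2006.0   # span covered by Flynn's figures
study_start, study_end = 1937.0, 2013.5   # circa midpoints of the two samples compared here

total_gain = sum(interval_gains)
rate_per_year = total_gain / (flynn_end - flynn_start)

# Assumption: the same linear rate holds over the longer 1937 to 2013.5 span.
expected_gain = rate_per_year * (study_end - study_start)
expected_iq = 100 + expected_gain

print(round(rate_per_year * 10, 2), round(expected_gain, 1), round(expected_iq, 1))
```

Under these assumptions the rate is about 3 points per decade and the expected full-scale IQ is roughly 123, in the neighborhood of the 122 listed in the table.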
Note: the 11 subtests were scaled to have a mean of 10 and an SD of 3 in the original young adult norming sample, while the verbal, performance and full-scale IQs were scaled to have a mean of 100 and an SD of 15. Note also that vocabulary is an alternate test, not used to calculate either verbal or full-scale IQ on the WBI. About one third of my sample (5 of 17) did not take Digit Symbol, so for these subjects, Performance and full-scale IQs were calculated via prorating.
Test: | Nationally representative sample of young white adults (NY, 1935 to 1938) | Randomish sample of young white adults (2008 to 2019, ON, Canada) | Expected WBI scores in 2008-2019 based on Flynn’s calculated rate of increase |
Information (general knowledge test) | 10 (SD 3) | 8.41 (SD 2.55) | 12.3 |
Similarities (verbal abstract reasoning) | 10 (SD 3) | 13.35 (SD 2.91) | 15.54 |
Arithmetic (mental math) | 10 (SD 3) | 9.18 (SD 4.34)(this subtest contained a unit conversion item that seemed biased against Canadians so for those who advanced far enough to fail this item, scores were prorated; had they not been the mean would have been 7.53 (SD 3.54)) | 11.02 |
Vocabulary | 10 (SD 3) | 9 (SD 2.5) | 14.95 |
Comprehension (Common sense & social judgement) | 10 (SD 3) | 9.47 (SD 2.93) | 13.93 |
Digit Span (attention & rote memory) | 10 (SD 3) | 9.71 (SD 2.63) | 11.46 |
Picture Completion (visual alertness) | 10 (SD 3) | 10.71 (SD 3.1) | 14.52 |
Picture Arrangement (social interpretation) | 10 (SD 3) | 10.24 (SD 2.73) | 13.35 |
Block Design (spatial organization) | 10 (SD 3) | 13.12 (SD 3.31) | 12.91 |
Object Assembly (spatial integration) | 10 (SD 3) | 11.82 (SD 1.89) | 14.06 |
Digit Symbol (Rapid eye-hand coordination) | 10 (SD 3) | 11.12 (SD 2.82)(note: only 12 of the 17 subjects took this subtest) | 14.66 |
Verbal IQ | 100 (SD 15) | 103.8 (SD 14.73) | |
Performance IQ | 100 (SD 15) | 109.3 (SD 12.11) | |
Full-scale IQ | 100 (SD 15) | 106.9 (SD 13.63) | 122 |
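The prorating mentioned in the note above the table can be sketched as follows. This shows the conventional approach, in which the sum of the administered scaled scores is scaled up to the full subtest count; whether exactly this formula was used for this sample is an assumption, the example scores are hypothetical, and converting the prorated sum to an IQ still requires Wechsler's published tables, which are not reproduced here.

```python
def prorate_sum(scaled_scores, required_count):
    """Scale the sum of the administered subtests up to the full subtest count."""
    administered = [s for s in scaled_scores if s is not None]
    if not administered:
        raise ValueError("no subtests administered")
    return sum(administered) * required_count / len(administered)

# Hypothetical examinee: four Performance subtests taken, Digit Symbol missing (None).
performance_scores = [11, 10, 13, 12, None]
prorated = prorate_sum(performance_scores, required_count=5)

print(prorated)  # 57.5
```

Prorating treats the missing subtest as if the examinee had scored at their own average on it, so it neither rewards nor penalizes the omission.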
Conclusion
The Flynn effect is dramatically smaller than we’ve been led to believe, at least on tests of specific information that may become obscure over generations. By contrast Similarities (abstract reasoning) and Block Design (spatial analysis) have indeed increased by amounts comparable with Flynn’s research. These two abilities may conspire to explain why some of the largest Flynn effects have been claimed on the Raven’s Progressive Matrices, an abstract reasoning test using a spatial medium.
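To make the comparison concrete, the sketch below converts each subtest's observed and expected scaled-score means from the table into IQ-equivalent gains (a scaled SD of 3 corresponds to 15 IQ points). The numbers are copied from the table; the conversion is simple rescaling and says nothing about statistical significance at n = 17.

```python
# (observed 2008-2019 mean, expected mean from Flynn's rate), copied from the table.
# The 1935-1938 norming mean is 10 for every subtest.
subtests = {
    "Information": (8.41, 12.3),
    "Similarities": (13.35, 15.54),
    "Arithmetic": (9.18, 11.02),
    "Vocabulary": (9.0, 14.95),
    "Comprehension": (9.47, 13.93),
    "Digit Span": (9.71, 11.46),
    "Picture Completion": (10.71, 14.52),
    "Picture Arrangement": (10.24, 13.35),
    "Block Design": (13.12, 12.91),
    "Object Assembly": (11.82, 14.06),
    "Digit Symbol": (11.12, 14.66),
}
NORM_MEAN, SCALED_SD, IQ_SD = 10, 3, 15

def to_iq_points(scaled_gain):
    """Rescale a scaled-score difference (SD 3) to IQ points (SD 15)."""
    return scaled_gain / SCALED_SD * IQ_SD

rows = []
for name, (observed, expected) in subtests.items():
    observed_gain = to_iq_points(observed - NORM_MEAN)
    expected_gain = to_iq_points(expected - NORM_MEAN)
    rows.append((name, observed_gain, expected_gain))

# Largest observed gains first.
for name, observed_gain, expected_gain in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{name:20s} observed {observed_gain:+6.2f}  expected {expected_gain:+6.2f}")
```

On this scale Similarities and Block Design show the largest observed gains (about 17 and 16 IQ points), while Information comes out about 8 points below the 1930s norm.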
It’s unclear if these are nutritional gains caused by increasing brain size, neuroplastic gains caused by cultural stimulation, or mere teaching to the test caused by schooling, computers and brain games.
Lynn (1990) argued the Flynn effect was caused by nutrition, citing a twin study proving nutrition gains are more pronounced on Performance IQ (consistent with the Flynn effect). Research on identical twins (where one twin gets better prenatal nutrition than the other) has shown that by age 13, the well nourished twin exceeds his less nourished counterpart by about 0.5 SD on both head circumference and Performance IQ, but not at all on verbal IQ. Thus it’s interesting that 21st century young Northern American men today exceed their WWII counterparts by about 0.5 SD on both head circumference (22.61″ vs 22.3″) and Performance IQ (109 vs about 100).
One possibility is that Performance IQ gains are entirely caused by improvements in the biological environment (prenatal health and nutrition), while verbal IQ gains are entirely caused by cultural advances (i.e. education); though somewhat negated by knowledge obsolescence.
Surprised digit span failed to drop as much as it would be expected but dunno if that is only for backwards digit span. I’ve heard backwards digit span might have the highest g-loading, talking to fellow intellectuals and what not haha.
Anyways, this video on YouTube will help explain what’s happening: https://m.youtube.com/watch?v=o5pUu2fwtXo
It’s a question of how much the genetic drop has been and how much of an increase we are seeing in the environmentally dependent subtests. If we can assess that, we can find out how IQ is changing over time.
the proper way to grok putin is…
1. he’s KGB. he’s a chekist. he’s pure evil, but his evil is NOT “original”. he’s evil in the tradition of yezhov and beria and dzerzhinsky and stalin and etc.
2. he’s a russian nationalist.
I dont care about Putin. To me Putin is a pawn in a game I cant really decipher. Thats like my centralized concept in life, to decipher the decipherable and disregard the indecipherable.
If you look at it that way, you get the most success outta life.
Fascinating you did this study btw, Pumpkin. I admire your determination to find truth amidst the unclear waters of IQ testing. However, I’m sure there is a lot of personal gain involved.
When the Philosopher mentioned that Trump has ADD I instantly got it!!!
I knew what symptoms my family has.
The neurologist I saw or the test I took was Dr. Kaufman (Noah K. Kaufman)(tested 2017)
Animekitty – weak schizo hallucination, weak bipolar (low energy) slight mania.
Aunt – severe bipolar (mania, crying)
Mom – Developmentally delayed (possible mild downs?)
Brother – Borderline, high sensory perception (misdiagnosed ADD)
Sister – ADHD – or high mania bipolar, high energy.
The past and present in the Western world can be summed up as this: Aristotle was a charlatan writing in mumble jumble fancy speak. Today, we have fake news but it is less sophisticated, and it’s easier to understand.
Im glad theres someone else out there that agrees aristotle wasnt impressive.
Puppy finally writes a good post.
Youre right the conclusions of the flynn effect are stupid, i.e. our grandparents were mentally retarded. I don’t think you can add norming variations between editions of the test.
The best way to tell is to do what you did and give the same type of sample the old test and see if scores really are that much higher.
That’s what they already do, dumbass.
One thing i’ve noticed in my own country is that school exams were harder back in the past than today.
They’ve had to bring down the standards to cater for non white minorities or to ‘show’ the education system is making productivity gains.
I don’t know if you can say the same for IQ tests. I’m not an expert. But i’ve heard the sat was watered down as well and that has a high equivalence to an IQ test.
In contrast to many like Jimmy Dore and AOC, I don’t think the bailouts were as brazen as what people like Robert were asking for, which was communism. Giving Boeing bridge loans is the halfway point between what I wanted – let them go bust and what robert wanted which was to marry boeing and have children with the people on the board.
you just made that up you lying liberace impersonator.
BA should be punished as you say.
the point is letting a handful of companies go bankrupt as punishment for their sins is fine. letting gobs of businesses go bankrupt at the same time is a fucking man made disaster.
I think the value of education is a lot more trivial than most economists agree upon. I think basic education is where you make the most gains and afterwards there are severe diminishing returns to learning things like calculus or the structure of a eukaryotic cell. People won’t remember the second type of information by the time they are working unless they are specialists.
Most education beyond age 15 or so is probably forgotten.
So this might be another way of saying I don’t really believe the practice effect on IQ tests is that large either.
obviously. economists are autistic and professors will always think of human capital as being equivalent to credentials.
education per se is NOT a thing. you’d have to be a retarded faggot to think that.
that is education qua medieval french literature is useless.
education qua apprenticeships in the trades or engineering is NOT.
because retarded most politicians think natural science education is always useful and more science is always better. but almost all of both is completely useless and wasteful.
if education = efficiently delivered useful, practical then it would be good.
but formal education is 90% a waste of money and time the way it is done in the US, at least. the way it is, NOT the way it has to be.
If the airlines start paying out bonuses and buybacks after the bailouts instead of keeping workers on the payroll you will see why robert was such a dummy as he doesn’t understand how people think.
i told you you were gay. https://www.bloomberg.com/news/articles/2020-03-25/royal-bank-of-canada-sued-by-reit-over-cmbs-margin-calls
this is what dylan ratigan calls “extraction”.
yes BA has had a payout ratio of 185% over the last 5 years. but mnuchin will save it anyway. the airlines want “free” money rather than loans in the form of a govt equity stake.
the point is if BA’s creditors said, “it’s all ours now” they would make more than they were owed because BA is still worth an enormous amount as a going concern, and this is never the way bankruptcy is supposed to work. creditors should always be taking a haircut, hence “credit risk”. it would be stealing and it would be bad for everyone except the banks and bond holders. it would be bad for all the employees and the firms dependent on BA and their employees.
the same thing happened with AIG. the losses in its CDS portfolio were in the end trivial iirc. yet its shareholders were wiped out. lehman i don’t know, but look how expensive it was. https://www.bloomberg.com/opinion/articles/2019-01-19/lehman-brothers-bankruptcy-keeps-getting-more-expensive
The coronavirus highlights how most jobs and lifestyles are only used for prestige. Doctors and other essential professions demonstrated that our society is centralized around not interfering with their livelihoods but now we see them dying and other professions in health care dying and we realize what their actual purpose is, especially in a ruthless world where sacrifices by others have to be made. This is where we see what our economy depends on and gives us feedback into how many things are done only to virtue signal.
We will learn from this and help restore order to our grand-scale Mouse Utopia.
Are you moderating RR?
No
it’s funny how italy is so civilized that there are ONLY three excuses to leave your domicile.
1. food…going to a grocery store.
2. drugs…going to a pharmacy.
3. dogs…taking your dog for a walk.
dogs are ESSENTIAL personnel/workers.
https://en.wikipedia.org/wiki/List_of_biosafety_level_4_organisms
are all of these UN-contagious?
is there a CFR vs contagiousness negative correlation?
what is the “just so story” which explains this negative correlation?
lassa virus
bolivian hemorrhagic virus
congo-crimean virus
omsk virus
small pox
H5N1
SARS
MERS
ebola
marburg
rabies
…
HIV is still UNIQUE in terms of…
1. longest average incubation period by far…
2. 99% kill rate…only 1% are long term non progressors iirc.
the point is…
spending time and money on medieval french lit is evil when there are homeless people and people living in trailer parks and people with no health insurance…
my theory is…
the humanities and history and social sciences ARE/CAN BE important…
but their current means of funding is evil.
but at the same time…
state funded humanities is/are shit…
so scratch that…
it could be that psychologists scored as high on IQ tests as economists…
and it could be that economists weren’t autistic…
but then they’d both have to be HISTORIANS.
fauci agreed.
If one assumes that the number of asymptomatic or minimally symptomatic cases is several times as high as the number of reported cases, the case fatality rate may be considerably less than 1%. This suggests that the overall clinical consequences of Covid-19 may ultimately be more akin to those of a severe seasonal influenza (which has a case fatality rate of approximately 0.1%) or a pandemic influenza (similar to those in 1957 and 1968) rather than a disease similar to SARS or MERS, which have had case fatality rates of 9 to 10% and 36%, respectively.
BUT this was submitted a month ago i expect and if it were true then…
the shortage of ICU beds and the shortage of ventilators would also have to be fake news…
OR…
doctors have autism and a lot of these patients are just hysterical but convincing.
Pumpkin Person,
Thanks for the ambitious study. And thanks for the thoughtful conclusion. However, there seems to be more confounding of conclusions from the studies than is obvious. Yet, I am seeking the truth as you are.
The question is whether the gains in IQ in the studies are really intelligence gains!
Only recently have deceptive misnomers been removed from the Wechsler tests in WAIS-IV. Verbal IQ and Performance IQ contained components that pointed to fairly similar mental substrates. For example, the Similarities subtest involved abstract reasoning so also did Matrix Reasoning. Working Memory Index in Verbal IQ and Processing Speed Index in Performance IQ point to neurophysiological factor quality. So, Verbal IQ – Performance IQ dichotomy may be, perhaps, nonsensical or misleading.
On the former Performance IQ, Clinicians were able to observe how a participant reacted to the “longer interval of sustained effort, concentration, and attention” that the performance tasks required (https://en.m.wikipedia.org/wiki/Wechsler_Adult_Intelligence_Scale). Hence, there likely was a bias in favor of the brains with better neurophysiology that could easily withstand stress and strain. However, those brains might not have necessarily had better intuitions – eductive, deductive, and inductive capabilities – which are a hallmark of higher intelligence.
On WAIS-IV, there are currently four index scores representing major components of intelligence:
Verbal Comprehension Index (VCI)
Perceptual Reasoning Index (PRI)
Working Memory Index (WMI)
Processing Speed Index (PSI)
Two broad scores, which can be used to summarize general intellectual ability, can also be derived:
Full Scale IQ (FSIQ), based on the total combined performance of the VCI, PRI, WMI, and PSI
General Ability Index (GAI), based only on the six subtests that the VCI and PRI comprise
The GAI is clinically useful because it can be used as a measure of cognitive abilities that are less vulnerable to impairments of processing speed and working memory. And prenatal nutritional deficit, perhaps, may be implicated in some cases.
Some studies have found the gains of the Flynn effect to be particularly concentrated at the lower end of the distribution. Teasdale and Owen (1989), for example, found the effect primarily reduced the number of low-end scores, resulting in an increased number of moderately high scores, with no increase in very high scores. (https://en.m.wikipedia.org/wiki/Flynn_effect) So, mere teaching to the test caused by schooling, computers and brain games may actually be responsible for the gains observed in your study. And what is the working mechanism? These help the average to see low-hanging fruits of logic on the tests and they pluck them! But there is a ceiling on the more novel questions, hence, there is no inflation at the higher ends of the distribution as has been observed. This may also be why score gain was selective and not across the board – perhaps the other subtests are hard to teach, you have them or you do not.
Please what do you think?
Only recently have deceptive misnomers been removed from the Wechsler tests in WAIS-IV. Verbal IQ and Performance IQ contained components that pointed to fairly similar mental substrates. For example, the Similarities subtest involved abstract reasoning so also did Matrix Reasoning. Working Memory Index in Verbal IQ and Processing Speed Index in Performance IQ point to neurophysiological factor quality. So, Verbal IQ – Performance IQ dichotomy may be, perhaps, nonsensical or misleading.
The most comprehensive factor analysis of the latest Wechsler scales identifies at least 5 factors: verbal comprehension, working memory, spatial, fluid abstraction, and processing speed. The original Wechsler (used in my study) had somewhat different subtests (no Matrix reasoning, Figure Weights or Cancellation) so only 3 factors emerged when it was factor analyzed: Verbal comprehension, Working Memory, and Spatial ability. Verbal IQ mostly measured the first two while Performance IQ mostly measured the third.
On the former Performance IQ, Clinicians were able to observe how a participant reacted to the “longer interval of sustained effort, concentration, and attention” that the performance tasks required (https://en.m.wikipedia.org/wiki/Wechsler_Adult_Intelligence_Scale). Hence, there likely was a bias in favor of the brains with better neurophysiology that could easily withstand stress and strain. However, those brains might not have necessarily had better intuitions – eductive, deductive, and inductive capabilities – which are a hallmark of higher intelligence.
The Performance IQ items all had short time limits (no more than 3 minutes) so not much need for sustained effort. The more Flynn affected subtests (Blocks and Object Assembly) measure the ability to see spatial relations.
Some studies have found the gains of the Flynn effect to be particularly concentrated at the lower end of the distribution. Teasdale and Owen (1989), for example, found the effect primarily reduced the number of low-end scores, resulting in an increased number of moderately high scores, with no increase in very high scores.
Flynn himself did not find that and neither did I. On the most Flynn affected subtests I’m seeing big gains on the right side of the curve.
So, mere teaching to the test caused by schooling, computers and brain games may actually be responsible for the gains observed in your study.
Unlikely. Unless a brain game or school class is almost identical in content to the wechsler subtest, it’s unlikely to show a transfer effect. That’s precisely what makes low IQ such a major problem. It can’t be corrected through psychological intervention except in cases where you teach to the test, which defeats the purpose of the test which is to measure performance on untaught tasks.
And what is the working mechanism? These help the average to see low-hanging fruits of logic on the tests and they pluck them! But there is a ceiling on the more novel questions, hence, there is no inflation at the higher ends of the distribution as has been observed. This may also be why score gain was selective and not across the board – perhaps the other subtests are hard to teach, you have them or you do not.
Please what do you think?
I think it’s more logical to think the Performance IQ reflects a genuine rise in spatial IQ and was NOT affected by culture because (a) it was designed to be culture reduced, (b) we see parallel gains in head circumference, (c) studies of prenatal nutrition show this affects Performance IQ and head circumference to the same degree. By contrast, I think the verbal gains just reflect education and are not a genuine rise in verbal intelligence.