One of the biggest mysteries in psychology is the Flynn Effect: the fact that over the 20th century, people performed better and better on IQ tests.  Of course, the average IQ in Western countries is by definition always about 100. However, because people keep scoring higher every decade, the tests routinely have to be made more difficult and the norms regularly updated to keep the mean IQ from rising far above 100.

Now if this only happened on culturally loaded tests like General Knowledge and Vocabulary, we could simply conclude that the tests are just culturally biased against past generations who had less access to schooling and media.  But some of the biggest gains have been found on tests like the Raven which were explicitly designed to be culture fair.

A sample item from the Raven Matrices. One must complete the pattern above using one of the missing pieces below. For decades this test was considered the gold standard of culture reduced testing.

How big is the Raven adult Flynn effect? About 30 IQ points per century in the Anglo-sphere.

Although some studies show incredibly large Raven gains, such as 7 points per decade, these have only been documented over relatively short intervals (30 years) and tend to come from countries with massive changes in nutrition (Holland).  Looking at the Anglosphere, which has been the most important part of the world for the last few centuries, the gains appear to have been about 3 points per decade over the 20th century.

In one study (see figure 2) the top 10% of British people born in 1877 (by definition those with IQs above 120 for their era) performed the same on the Raven as the bottom 5% of British people born in 1967 (by definition those with IQs below 75 for their era).  In other words, performance on the Raven had increased by the equivalent of 45 points in less than a century!  Of course it wasn’t a level playing field, because those born in 1877 took the test when they were a somewhat elderly 65 while those born in 1967 took the test when they were young, sharp 25-year-olds. However, Flynn cites longitudinal studies showing that Raven-type reasoning declines by no more than 10 points by age 65.  That still leaves us with 35 points to explain.

Another source of inaccuracy was that although the test was not timed for either group, those born in 1877 took the test supervised while those born in 1967 got to take the test home.  This could potentially make a large difference; not necessarily because the unsupervised group would cheat, but because they would probably take more breaks since they were in the comfort of their homes.  They would probably return to challenging items after they had time to relax and see those items from a fresh perspective, while those who took the test supervised in some strange room were probably more likely to rush through the tasks so they could go home. I would estimate that being allowed to take a test home improves test performance by about 5 IQ points on average, though this is just a guess.

But that still leaves a huge difference of 30 IQ points between 25-year-olds born in 1877 and 25-year-olds born in 1967.  That gap is the Flynn effect.
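The arithmetic behind that 30-point figure can be sketched as follows. This is only an illustration, assuming IQ is normally distributed with a mean of 100 and an SD of 15; the variable names are mine, and the 10-point age correction and 5-point take-home correction are the rough estimates given above, not measured values.

```python
from statistics import NormalDist

# Assumption: IQ is normally distributed with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

# Percentile cutoffs implied by "top 10%" and "bottom 5%".
top_10_cutoff = iq.inv_cdf(0.90)     # roughly 119, close to the 120 quoted above
bottom_5_cutoff = iq.inv_cdf(0.05)   # roughly 75

# Rounded cutoffs used in the text.
raw_gap = 120 - 75                   # apparent gain between the two cohorts

# Corrections taken from the text (both are estimates, not data).
age_correction = 10                  # decline in Raven-type reasoning by age 65
take_home_correction = 5             # guessed benefit of the unsupervised take-home format

adjusted_gap = raw_gap - age_correction - take_home_correction

print(f"top 10% cutoff: {top_10_cutoff:.1f}")
print(f"bottom 5% cutoff: {bottom_5_cutoff:.1f}")
print(f"raw gap: {raw_gap}, adjusted gap: {adjusted_gap}")
```

Changing either correction shifts the adjusted gap point for point, so the final figure is only as good as those two estimates.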

Of course 25-year-olds born in 1967 are 49 years old today.  Have more recent cohorts of 25-year-olds continued to improve on Matrix reasoning problems?  To answer this question I turn to the WAIS-IV, normed in 2006.  On the Matrix reasoning norms from the WAIS-IV, there’s virtually no difference in performance between those born in 1967 and those born in 1981, despite the fact that the former were older when tested.  This suggests the Raven Flynn effect might have slowed for post-1967 birth cohorts.

Altogether, it looks like Raven scores in the Anglosphere have increased by 30 IQ points over more than a century (birth cohorts 1877 to 1981).

How much of these gains can be explained by nutrition? About 10 IQ points.

The preponderance of evidence suggests that adult brain size has recently increased by about 1.3 SD per century in Europe and North America.  A study of identical twins, where one is born malnourished and the other born healthy, suggests that the malnourished twin’s verbal IQ is unscathed, but her Performance IQ and brain size are both equally stunted.  Thus, if Victorians were 1.3 SD stunted in brain size, we might expect them to be 1.3 SD stunted in Performance IQ, but not at all stunted in verbal IQ.  I hypothesise this is probably because the brain evolved to prioritize verbal IQ when nutrients are limited, since humans can survive if they have the verbal IQ to access the group’s cultural knowledge; they don’t need the spatial IQ to reinvent the wheel.

Traditionally Performance IQ tests measured visual and spatial talent, and the ability to manipulate objects (jig-saw puzzles, blocks).  By contrast the Raven, despite using the visual medium, emphasizes analogical thinking, not visual-motor abilities and spatial synthesis.  Indeed Richard Lynn notes that although malnutrition stunts Raven ability more than it stunts verbal and memory abilities, Raven ability is preserved compared with more traditional Performance IQ tests like Block Design.

So if the Victorians were 1.3 SD stunted in brain size, and if nutritional damage to brain size perfectly matches nutritional damage to Performance IQ, while verbal IQ is completely preserved, with Raven IQ in between both extremes, we might guess that malnutrition stunted Victorian Raven scores by 0.65 SD (half of 1.3 SD and the equivalent of 10 IQ points).
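As a quick check on that guess, here is the conversion from SD units to IQ points. It is a sketch under the assumptions stated above: an IQ scale with an SD of 15, and Raven ability sitting exactly halfway between fully preserved verbal IQ and fully stunted Performance IQ. The halfway weighting is my simplification, not an estimate from data.

```python
IQ_SD = 15  # assumption: conventional IQ scale, 15 points per standard deviation

def sd_to_iq_points(sd_units: float) -> float:
    """Convert an effect expressed in SD units into IQ points."""
    return sd_units * IQ_SD

brain_size_deficit_sd = 1.3   # Victorian brain size deficit cited in the text
performance_deficit_sd = brain_size_deficit_sd  # assumed to match brain size exactly
verbal_deficit_sd = 0.0       # assumed to be completely preserved

# Assumption: Raven sits midway between the verbal and Performance extremes.
raven_deficit_sd = (verbal_deficit_sd + performance_deficit_sd) / 2

raven_deficit_iq = sd_to_iq_points(raven_deficit_sd)
print(f"Raven deficit: {raven_deficit_sd:.2f} SD = {raven_deficit_iq:.2f} IQ points")
```

The result is a little under 10 points, which the text rounds to 10.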

How do we explain the remaining 20 IQ points?

To explain the remaining 20 IQ points of the Flynn effect, in the past I have invoked schooling and socio-economic changes.  For example, it’s well known that attending high school, which most Victorians did not do, adds 10 points to your IQ score.  And even Jensen admitted that being raised in a higher socio-economic environment (which Victorians also lacked) adds 10 points to IQ (the well-known adoption effect).  Of course these cultural effects were thought to vanish by adulthood as genes become more important, but as the Dickens-Flynn model explained, that’s only true within generations, when genes and environment correlate, because genetically advantaged people create cultural advantages, causing cultural effects to become a mere extension of genetic effects.

Within generations, the boats that are naturally tallest sit on the highest waves, so their extrinsic advantage (wave height) simply multiplies their intrinsic advantage (natural height), causing the latter to seem omnipotent.  However, between generations, cultural advances are like a rising tide that lifts all boats, as even genetic dullards today enjoy far more schooling and socio-economic advantages than many geniuses of past centuries.

But why do cultural advantages improve IQ scores, which are supposed to measure innate ability?

Merely saying that schooling and socio-economic advances improve IQ scores doesn’t get us anywhere.  The question is why.  In his book, Does your family make you smarter?, scholar James Flynn seems to hint at two explanations:  (1) The brain is like a muscle, and modern culture allows us to exercise it.  (2) Modern culture causes us to apply logic to the hypothetical.  Although Flynn (if I understood him correctly) seems to lump these two explanations together, I think they are better understood as separate hypotheses.

How much of the gains can be explained by exercising our brain like it were a muscle? Zero points.


In an article in The Irish Times, Flynn writes:

The human brain is like a muscle. Our physical muscles develop in terms of the demands made, compare a weightlifter’s muscles with those of a swimmer. By 1940, most Americans were driving cars and this made new demands on their mapping-skills. These would be reflected in a larger hippocampus, the part of the brain that is the seat of map reading (for example, London taxi drivers have very enlarged hippocampuses). Today we are getting automatic guidance systems and these skills will decline. This has nothing to do with better or worse genes but reflects whatever cognitive skills our society asks us to do.

This is an attractive argument, but there are three major problems with it.

1) The effects of cognitive training seem to have very little transfer.  So practice navigating a car might, if you’re lucky, make you better at navigating by foot, but it’s unlikely to do much if anything for your other spatial abilities, like solving a jig-saw puzzle on the WAIS IQ test. See Why knowledge & education can NOT make you smarter.

2) If the brain really were like a muscle, and the Flynn effect were largely caused by people getting more mental exercise, then we’d expect between-generation brain size gains to increase from infancy to adulthood.  By analogy, if people today were lifting more weights than Victorians did, we’d expect the inter-generation muscle size gains to increase from infancy (when people don’t lift weights) to adulthood (when years of weight-lifting accumulate).

Instead it’s just the opposite.  Scholar Richard Lynn reports huge gains in head circumference in British one-year-olds and British seven-year-olds: 1.5 cm and 2 cm in 50 years, respectively.  Given that the SD for head circumference among whites at both ages is 1.5 cm (see table 1 of this paper), that implies a head circumference increase of 2 SD and 2.67 SD per century respectively.  However, by adulthood, there’s no evidence that head size or brain size has increased by more than 1.3 SD per century in the Western world, and even that might be an overestimate.
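The per-century figures come from simple scaling, sketched below. The function name is mine, and the calculation assumes the 50-year gains can be extrapolated linearly to a full century, which is a simplification rather than something Lynn's data establish.

```python
def gain_in_sd_per_century(gain_cm: float, interval_years: float, sd_cm: float) -> float:
    """Scale a gain observed over `interval_years` to a century, in SD units.

    Assumption: the gain accumulates linearly over time.
    """
    gain_per_century_cm = gain_cm * (100 / interval_years)
    return gain_per_century_cm / sd_cm

SD_HEAD_CIRCUMFERENCE_CM = 1.5  # SD quoted in the text for both ages

one_year_olds = gain_in_sd_per_century(1.5, 50, SD_HEAD_CIRCUMFERENCE_CM)
seven_year_olds = gain_in_sd_per_century(2.0, 50, SD_HEAD_CIRCUMFERENCE_CM)
adult_upper_bound = 1.3         # adult brain size gain per century cited in the text

print(f"one-year-olds: {one_year_olds:.2f} SD per century")
print(f"seven-year-olds: {seven_year_olds:.2f} SD per century")
print(f"adults (upper bound from text): {adult_upper_bound:.2f} SD per century")
```

On these numbers the childhood gains exceed the adult gain, which is the reverse of what the muscle analogy would predict.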

3) Lastly, if a large chunk of the Flynn effect really was analogous to newer generations building more muscle, then the gains made from developing cognitive skills would have real-world consequences, just as building real muscle has real consequences in terms of strength performance.  If staying in school longer was really causing us to exercise our brain’s “abstract reasoning muscles” as measured by tests like the Raven, then shouldn’t we expect even more scientific breakthroughs in abstract fields like math, science, and philosophy?  Indeed James Flynn was perhaps the first to state that if the Flynn effect reflected mostly real gains in intelligence, we’d expect “a cultural renaissance too great to be overlooked”.  So why now does Flynn compare these mysterious IQ gains to the very real strength gains weight lifters experience?  Perhaps scholar Steven Pinker convinced him we are experiencing such a renaissance?

However, according to scholar Charles Murray, human accomplishment actually declined from 1850 to 1950, and declined even more post-1950.  And in the book The Genius Famine, scholars Edward Dutton and Bruce Charlton also argue that “Genius” level achievements are declining. But to the extent that they have not declined (as Pinker argues), this can more than be explained by 1) a 10 IQ point increase in real intelligence caused by nutrition, 2) an increase in mass education, 3) greater population meaning more talent to draw from, and 4) building on the accomplishments of our ancestors.  There is simply not enough modern accomplishment left over to explain once you factor in these four other factors.

How much can be explained by hypothetical thinking? Perhaps 10 points.

In his TED talk, Flynn cites scholar Luria:

Luria looked at people just before they entered the scientific age, and he found that these people were resistant to classifying the concrete world. They wanted to break it up into little bits that they could use. He found that they were resistant to deducing the hypothetical, to speculating about what might be, and he found finally that they didn’t deal well with abstractions or using logic on those abstractions.

Now let me give you a sample of some of his interviews. He talked to the head man of a person in rural Russia. They’d only had, as people had in 1900, about four years of schooling. And he asked that particular person, what do crows and fish have in common? And the fellow said, “Absolutely nothing. You know, I can eat a fish. I can’t eat a crow. A crow can peck at a fish. A fish can’t do anything to a crow.” And Luria said, “But aren’t they both animals?” And he said, “Of course not. One’s a fish. The other is a bird.” And he was interested, effectively, in what he could do with those concrete objects.

And then Luria went to another person, and he said to them, “There are no camels in Germany. Hamburg is a city in Germany. Are there camels in Hamburg?” And the fellow said, “Well, if it’s large enough, there ought to be camels there.” And Luria said, “But what do my words imply?” And he said, “Well, maybe it’s a small village, and there’s no room for camels.” In other words, he was unwilling to treat this as anything but a concrete problem, and he was used to camels being in villages, and he was quite unable to use the hypothetical, to ask himself what if there were no camels in Germany.

It seems to me that pre-modern people simply didn’t understand the basic rules of taking tests.  You must assume that whatever information the tester gives you is true, and you must be willing to take it seriously.

They lived in a world where life depended on solving actual problems, so they couldn’t relate to tests that required them to solve imaginary problems, just to prove they had problem solving ability.  But those of us who have been socialized by decades of schooling and educated parents are quite used to imaginary problems and are quite willing to take them seriously.

But I would call this mere test sophistication.  I would not say that training people to solve hypothetical problems has increased real intelligence, because real intelligence, by definition, is the ability to solve real problems.  Problems that are not real are technically not even problems.

Of course to measure one’s ability to solve all types of problems, test makers must create hypothetical problems, but if a test-taker can’t interpret hypothetical situations as actual problems, then he is not necessarily lacking in intelligence, but rather is untestable via hypothetical questions. Such a person could only be tested if we made those hypothetical problems real, like we do when we test animals.  We don’t ask a monkey how he would use the bamboo sticks to get the banana, we deny him the banana until he figures out how to get it.  We make the hypothetical real, since it’s the only way he’ll take the test.

Writing in The Irish Times, Flynn states:

Scholars mired in the dogma that “real” intelligence cannot increase over time dismiss them as mere skill gains acquired by better education. This is self-defeating. The genetic limitations of our brains were supposed to tell us who was capable of profiting from education. Nothing was more evident to the elite of 1900 than that the masses could never be trained to assume the demanding cognitive roles the elite monopolised at that time. Well, the entire modern world has proved them wrong.

Intellectual progress has brought moral progress. Among school-demanded skills is applying logic to generalised statements and taking the hypothetical seriously. People of the Victorian era saw moral maxims as concrete things, no more subject to logic than any other concrete thing. Unlike us – people of the late 20th and early 21st century educated within an analytic scientific tradition – they would not see hypotheticals as universal criteria to be generalised.

Flynn seems unwilling to make a distinction between a mental skill and intelligence.  I think a narrow mental ability is just a skill or a talent.  A broader one is intelligence, or at least a major part of intelligence.  Clearly the ability to cope with the hypothetical transfers to several different kinds of cognitive tests, so it may at first glance appear to have broad transfer, and thus great adaptive value.

But then we must remember that tests, by definition, are hypothetical problems, so of course an ability to adapt to hypotheticals will enhance hypothetical problem solving, but that tells us nothing about its value to real problem solving.

Perhaps it has made us more intelligent in the cocktail party sense of having deep philosophical or moral views, but in terms of solving actual novel problems, I doubt it’s done much.  Why the skepticism?  Because we already know nutrition raised real intelligence by 10 points since the Victorian era, and our real world accomplishments are not impressive enough to add any more points to our real intelligence, given all the other advantages of modernity (a larger population of talent to draw from and the accomplishments of the past to build on).

But I do agree with Flynn, that hypothetical problem solving is a major cause of the Flynn effect, perhaps equivalent to nutrition (10 points).  But I would consider it a learned skill, or trick of the test taking trade, rather than a raw ability that was improved through mental exercise.

Another quote from James Flynn’s TED talk:

My father was born in 1885, and he was mildly racially biased. As an Irishman, he hated the English so much he didn’t have much emotion for anyone else. (Laughter) But he did have a sense that black people were inferior. And when we said to our parents and grandparents, “How would you feel if tomorrow morning you woke up black?” they said that is the dumbest thing you’ve ever said. Who have you ever known who woke up in the morning — (Laughter) — that turned black?

In other words, they were fixed in the concrete mores and attitudes they had inherited. They would not take the hypothetical seriously, and without the hypothetical, it’s very difficult to get moral argument off the ground. You have to say, imagine you were in Iran, and imagine that your relatives all suffered from collateral damage even though they had done no wrong. How would you feel about that? And if someone of the older generation says, well, our government takes care of us, and it’s up to their government to take care of them, they’re just not willing to take the hypothetical seriously. Or take an Islamic father whose daughter has been raped, and he feels he’s honor-bound to kill her. Well, he’s treating his mores as if they were sticks and stones and rocks that he had inherited, and they’re unmovable in any way by logic. They’re just inherited mores. Today we would say something like, well, imagine you were knocked unconscious and sodomized. Would you deserve to be killed? And he would say, well that’s not in the Koran. That’s not one of the principles I’ve got. Well you, today, universalize your principles. You state them as abstractions and you use logic on them. If you have a principle such as, people shouldn’t suffer unless they’re guilty of something, then to exclude black people you’ve got to make exceptions, don’t you? You have to say, well, blackness of skin, you couldn’t suffer just for that. It must be that blacks are somehow tainted. And then we can bring empirical evidence to bear, can’t we, and say, well how can you consider all blacks tainted when St. Augustine was black and Thomas Sowell is black. And you can get moral argument off the ground, then, because you’re not treating moral principles as concrete entities. You’re treating them as universals, to be rendered consistent by logic.

Flynn correctly cites the racism of past generations as evidence of poor reasoning, and yet, as I noted above, he also claimed past generations struggled with generalizing and categorizing (“What do crows and fish have in common?”) and hypothetical syllogisms (“There are no camels in Germany. Hamburg is a city in Germany. Are there camels in Hamburg?”).  But what is racism if not the tendency to generalize, categorize and use syllogisms?

To be a racist, you must be good at recognizing who is black, which requires an ability to see common facial, colour and hair traits.  It also requires the ability to think syllogistically: “Our new neighbor is black.  I don’t like blacks. Therefore, I don’t like our new neighbor.”

So clearly, Victorians had the ability to think in these ways, but they could not, or would not, apply that thinking to the hypothetical problems posed on tests or in abstract discussions.

This makes perfect sense.  Intelligence evolved to enable us to adapt, to take whatever situation we’re in, and turn it around to our advantage.  Thus we are genetically predisposed to use our intelligence to solve practical problems; problems that are actually problems, not the make-believe problems of the Raven.

It is a testament to the decadence of modernity that we have few real problems to solve, so we’re motivated to solve imaginary problems, unlike our ancestors who “would not take the hypothetical seriously” in Flynn’s words.

Note that even Flynn himself says “would not”, not “could not”.  This raises the question: is hypothetical thinking even a skill, as opposed to merely a motivation? I’ll tentatively assume the former, and consider motivation effects separately.

How much of the gains can be explained by motivation? About 10 points.

On tests like the Raven Progressive Matrices, where focus, persistence and concentration are required, it always seemed like common sense to me that motivation was a major factor.

Particularly in samples where education and socio-economic status are low (as was the case with Victorians), tests like the SPM (Standard Progressive Matrices) and CPM (Colored Progressive Matrices) can be very annoying indeed.  Scholar J.P. Rushton et al. reported on giving these tests to the Roma:

Most Roma found the tasks very difficult; some complained of getting a headache. They typically asked to stop the test before 30 min. After completing and analyzing 231 sets of scores on the SPM, it was decided to switch to the CPM. The remaining 92 subjects were administered the CPM. Although test-takers seemed to enjoy this version more, they continued to report the task was difficult and gave them a headache.

Some tests require too much focus, causing folks to get a headache or frustrated

Motivation is a very likely explanation for the Flynn effect because Victorians were used to the outdoors, chopping wood and riding horses in the fresh air.  Sitting in an office at a desk concentrating on Raven puzzles for an hour must have been most painful indeed.  By contrast, modern people have typically spent 13 years in school and work in white collar jobs. We’re used to sitting still and concentrating and are intrinsically motivated to prove we’re smart on standardized tests.  We are also more likely to take tests seriously and find them interesting.

The effects of motivation on IQ scores are acute. One study was a “meta-analysis of random-assignment experiments testing the effects of material incentives on intelligence-test performance on a collective 2,008 participants. Incentives increased IQ scores by an average of 0.64 SD, with larger effects for individuals with lower baseline IQ scores.”  0.64 SD equates to about 10 IQ points.  Further, large incentives produced IQ gains of 1.63 SD (about 24 IQ points!).
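For readers who want to check the conversion, here is a minimal sketch. It assumes the usual IQ scale with an SD of 15; the dictionary labels are mine, and the effect sizes are the ones quoted from the meta-analysis above.

```python
IQ_SD = 15  # assumption: 15 IQ points per standard deviation

# Effect sizes quoted above, in SD units.
incentive_effects_sd = {
    "average incentive": 0.64,
    "large incentive": 1.63,
}

# Convert each effect size into IQ points.
incentive_effects_iq = {
    label: effect * IQ_SD for label, effect in incentive_effects_sd.items()
}

for label, points in incentive_effects_iq.items():
    print(f"{label}: {points:.1f} IQ points")
```

The averages come out at just under 10 and just over 24 points, matching the rounded figures in the text.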

Of course, it’s not as though any extrinsic reward has made modern people more motivated on IQ tests than Victorians were, but growing up with more schooling and socio-economic advantage likely produced a culture where people are more intrinsically motivated to do well on mental tests.  This could easily explain 10 points of the Flynn effect, particularly on tests like the Raven that require focused effort.



In the Anglosphere, Raven IQ has increased by the equivalent of 30 IQ points since the 19th century.  I believe there are three major causes of this increase: (1) prenatal and perinatal nutrition (including disease reduction), which has also substantially increased brain size, (2) the ability and/or willingness to take hypothetical problems seriously, and (3) the motivation to sit still, focus, persist, and concentrate on boring tests.  Each of these factors likely explains about a third of the Raven Flynn effect, though in my view, only the first third (nutrition) should be considered an increase in real intelligence (the mental ability to solve any problem).

While James Flynn correctly asserts that the brain is like a muscle and can get bigger in response to cognitive exercise, most cognitive exercise has extremely narrow effects, and the fact that 20th century brain size gains were largest in early childhood suggests they are immutable early-life nutritional gains, not the result of decades of mental stimulation.

The real lesson of the Flynn effect is that the Raven Progressive Matrices is NOT a culture reduced test.  If culture reduced testing is to continue in the future, we’ll need tests that don’t require hypothetical abstractions, and are also fun and engaging enough to not require persistent motivation.  I recommend tests like Digit Span (which shows virtually zero Flynn effect) and Block Design (whose Flynn effect in adults can be 100% explained by the effects of prenatal nutrition on Performance IQ).  A properly weighted composite of both tests could have a g loading of 0.8+.  Identifying a culture reduced measure of verbal ability remains an interesting challenge.