I recently got an email from a college-age man who was concerned that his VERBAL KNOWLEDGE INDEX (as I call it) was in the genius range (on both the children’s and adult versions of the Wechsler) despite the rest of his cognitive profile being mediocre or low. When he asked his psychiatrist to interpret the results, there was no helpful reply, so a friend of his suggested he contact me.

I’m not a psychiatrist so my opinions are for entertainment purposes only.

The first thing I did was correct all his subtest scores for norm inflation because the WISC-IV norms were 11 years old when he was tested and the WAIS-IV norms were a decade old. The sources I used were pg 240 of James Flynn’s Are We Getting SMARTER? and this table found here:

Such corrections are approximate because one can’t always assume that the rate of norm inflation can be extrapolated beyond the dates for which we have data, and some subtests are so new that their rate of inflation had to be estimated from similar tests. In some cases there was norm deflation (see Coding in the table above, aka Digit Symbol).

The next thing I did was replace the four index scores used on the WISC-IV and WAIS-IV with the five index scores used on the WISC-V. Again, this gives only approximate results because the WISC-V index scores were built exclusively on WISC-V data and you’re not allowed to just substitute different versions, and in some cases I had to substitute subtests or adjust for not having the right number of subtests.

Nonetheless, the five-factor model is so superior to the four-factor model that, for entertainment purposes only, I did it anyway.

The other liberty I took was calculating his overall IQ by weighting all five indexes equally (the WISC-V gives equal weight to all core subtests, but more core subtests fall under some indexes than others).

The first thing we notice is remarkable stability as we move from the children’s scale (2013) to the adult scale (2016). In 2013 his overall IQ was slightly below the U.S. mean of 100; in 2016 he scored slightly above, and even this modest increase might be partly explained by practice effect.

In 2013 his profile was verbal > abstract > working memory > processing > spatial. In 2016, it was verbal > abstract > processing > spatial > working memory. In other words, there is a near-perfect 0.95 correlation between his cognitive profile in 2013 and 2016, despite the fact that different versions of the Wechsler (with different questions) were used on each date.
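As a quick sanity check on the two rank orders listed above, here is a minimal Python sketch that computes the Spearman rank correlation from the orderings alone. The index scores themselves are not reproduced in the text, so this does not reproduce the 0.95 figure, which is presumably a correlation computed on the actual scores. The dictionary names and the helper `spearman_rho` are illustrative.

```python
# Rank order of the five indexes on each occasion, as stated in the text
# (1 = strongest, 5 = weakest).
ranks_2013 = {"verbal": 1, "abstract": 2, "working_memory": 3,
              "processing": 4, "spatial": 5}
ranks_2016 = {"verbal": 1, "abstract": 2, "processing": 3,
              "spatial": 4, "working_memory": 5}


def spearman_rho(a, b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(a)
    d_squared = sum((a[key] - b[key]) ** 2 for key in a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(ranks_2013, ranks_2016)
print(round(rho, 2))  # 0.7
```

On rank order alone the agreement comes out at 0.7. A correlation computed on the scores themselves can be higher when the indexes that swap places are close together in value.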

Vertical reliability vs horizontal reliability

Reliability (not to be confused with stability) is typically measured by dividing all the items on a test in half in some random way (e.g. odd vs even numbered items). If the total score on all the odd-numbered items correlates well with the total score on all the even-numbered items, this suggests the score is reliable, because it internally self-replicates. The reliability of the Wechsler scales is so high at the full-scale level that they are said to have a standard error of only 2 points, meaning that in about two-thirds of all cases, one’s score is within 2 points of one’s “true” score, and in 95% of cases, one’s score is within 4 points of one’s true score.
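The arithmetic behind those bands can be sketched in a few lines of Python. This assumes the usual Wechsler standard deviation of 15 and the classical test theory relation SEM = SD × sqrt(1 − reliability); the observed score of 100 and the function names are purely illustrative.

```python
SD = 15.0   # standard deviation of Wechsler full-scale IQ
SEM = 2.0   # standard error of measurement quoted in the text


def band(score, sem, z):
    """Symmetric band of z standard errors around an observed score."""
    half_width = z * sem
    return (round(score - half_width, 2), round(score + half_width, 2))


def implied_reliability(sem, sd):
    """Invert SEM = SD * sqrt(1 - r) to get the reliability r."""
    return 1 - (sem / sd) ** 2


observed = 100  # hypothetical observed score, for illustration only
print(band(observed, SEM, 1.0))    # (98.0, 102.0)   about two-thirds of cases
print(band(observed, SEM, 1.96))   # (96.08, 103.92) about 95% of cases
print(round(implied_reliability(SEM, SD), 3))  # 0.982
```

So a 2-point standard error corresponds to a reliability of roughly 0.98, and the "within 4 points" figure is 1.96 standard errors rounded up.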

But what is true score? True overall score is the overall score one would get on the Wechsler if we could make every subtest infinitely long, yet factor out fatigue, practice effects, and ageing.

However, I propose an alternative definition of true overall score: the overall score one would get on the Wechsler if we could increase the number of subtests to infinity, yet factor out fatigue, practice effects, and ageing. But since many subtests redundantly measure the same functions, what we really want to do is increase the number of index scores to, if not infinity, then the maximum number that exists within the human mind. Let’s call this horizontal true score, to distinguish it from the typical definition of true score, which we can call vertical true score.

To measure horizontal true score, imagine we were doing a poll of the average IQ in a given school. If we tested five students, and they had an average IQ of 80 with an SD of 10, then the standard error of our poll would be 10 divided by the square root of our sample size (5), or about 4.5.
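The school-poll example works out as follows in Python; the function name `standard_error` is just a label for this sketch.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# Five students with an SD of 10, as in the example above.
se = standard_error(sd=10, n=5)
print(round(se, 2))  # 4.47
```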

Now, instead of trying to find the average IQ of different people in a school, we’re trying to find the average index score of different talents within the same mind. Once we find the standard error of the average index score, we can convert it to the standard error of overall IQ (because index scores are imperfectly correlated, one’s composite score on multiple indexes tends to be more extreme than one’s average index score). Multiplying the standard error by 1.96 and then adding it to and subtracting it from the overall IQ gives the 95% confidence band.
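Here is one way the procedure could be sketched in Python. Several things are assumptions rather than details from the post: the five index scores are made up (they are not the subject’s scores), the average correlation between indexes (`mean_r = 0.5`) is a placeholder, the scatter is taken as the sample standard deviation, and the conversion to the composite scale uses the standard result that a sum of k equally weighted scores with average intercorrelation r has a standard deviation of SD × sqrt(k + k(k − 1)r).

```python
import math
import statistics


def horizontal_band(index_scores, mean_r, z=1.96, mean=100.0):
    """Composite IQ and a z-standard-error band based on index scatter."""
    k = len(index_scores)
    avg = statistics.mean(index_scores)
    scatter = statistics.stdev(index_scores)      # sample SD across indexes
    se_avg = scatter / math.sqrt(k)               # SE of the average index

    # An average index score is less extreme than the composite built from
    # the same indexes; this factor stretches it onto the composite scale.
    stretch = math.sqrt(k / (1 + (k - 1) * mean_r))

    composite = mean + (avg - mean) * stretch
    se_composite = se_avg * stretch
    return composite, composite - z * se_composite, composite + z * se_composite


# Hypothetical profile with one very high index and four ordinary ones.
scores = [130, 105, 95, 90, 85]
composite, low, high = horizontal_band(scores, mean_r=0.5)
print(round(composite, 1), round(low, 1), round(high, 1))
```

With only five indexes a t-based multiplier would give a wider band than 1.96, so this sketch, like the procedure it follows, should be read as a rough guide.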

Based on the amount of scatter, I calculate that all we can say with 95% certainty is that the subject’s true horizontal overall score is anywhere from IQ 73 to IQ 122 (in 2013) and anywhere from IQ 78 to IQ 133 (in 2016).

So the subject is either very smart or very not-smart, but we can’t be more precise than that without running several more tests.