Obviously I can’t devote an article to every Human Benchmark test so I’m limiting myself to the best ones. One of the best is number memory.
Digit Span is measured by the largest number of digits a person can repeat without error on two consecutive trials after the digits have been presented at the rate of one digit per second, either aurally or visually. Recalling the digits in the order of presentation is termed forward digit span (FDS); recalling the digits in the reverse order of presentation is termed backward digit span (BDS). Digit Span is part of the Stanford Binet and of the Wechsler scales. Digit Span increases with age, from early childhood to maturity. In adults the average FDS is about 7; average BDS is about 5. I have found that Berkeley students, whose average IQ is about 120, have an average FDS of between 8 and 9 digits.
The g Factor by Arthur Jensen, page 262
It should be noted that the Human Benchmark version of digit span does NOT include the Backwards version and shows all the digits at once for several seconds, not each one at a rate of one per second, and it only has one trial per level so there’s no room for error. For this reason I suggest taking your best score on your first two attempts.
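The difference between the two scoring rules can be sketched in a few lines. This is only an illustration: the function names and input formats are hypothetical, and the reading of Jensen's rule (two trials at each length, both of which must be correct) is an assumption rather than an official scoring procedure.

```python
def jensen_span(results):
    """Digit span under an assumed reading of Jensen's rule.

    results maps a digit length to a pair of booleans, one per trial.
    The span is the longest length at which both trials were correct.
    """
    passed = [length for length, trials in results.items() if all(trials)]
    return max(passed, default=0)


def best_of_first_two(attempts):
    """Suggested Human Benchmark score: best of the first two attempts.

    attempts lists the number of digits correctly recalled on each
    attempt, in the order the attempts were made.
    """
    return max(attempts[:2], default=0)


# Hypothetical test taker: passes both trials at 5 and 6 digits, one at 7.
print(jensen_span({5: (True, True), 6: (True, True), 7: (True, False)}))
# Hypothetical Human Benchmark attempts: only the first two count.
print(best_of_first_two([9, 11, 14]))
```

Taking the best of two attempts is a looser criterion than requiring two consecutive correct trials, so the two numbers are not directly comparable.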
So important is this test that it is one of the 10 subtests handpicked by David Wechsler himself for his original Wechsler scale, published in the 1930s.
Perhaps no test has been so widely used in scales of intelligence as that of Memory Span for Digits. It forms part of the original Binet Scale and all the revisions of it. It has been used for a long time by psychiatrists as a test of retentiveness and by psychologists in all sorts of psychological studies. Its popularity is based on the fact that it is easy to administer, easy to score, and specific as to the type of ability it measures. Nevertheless, as a test of general intelligence it is among the poorest. Memory span, whether for digits forward or backward, generally correlates poorly with other tests of intelligence. The ability involved contains little of g, and, as Spearman has shown, is more or less independent of this general factor.
The Measurement and Appraisal of ADULT INTELLIGENCE 5th edition, David Wechsler, 1958, page 70 to 71
On page 221 of The g Factor, Jensen notes that FDS and BDS have g loadings of about 0.30 and 0.60 respectively.
Wechsler goes on to explain that despite being a poor measure of intelligence overall, he included it in part because in his eyes, it’s a great measure of low intelligence: “Except in cases of special defects or organic disease, adults who cannot retain 5 digits forward and 3 backward will be found, in 9 cases out of 10, to be feeble-minded or mentally disturbed.”
The other reason he included it is he viewed it as an excellent measure of dementia.
I’m not convinced the test is better at low levels than at high levels. For example, Charles Krauthammer towered with a spectacular BDS of 12, and his genius is validated by the enormous influence he had over U.S. foreign policy.
In the below poll your level corresponds to the highest number of digits you correctly remembered on at least one of your first two attempts:
I wonder how much of the low correlation of digit span with g is caused by its inherently low precision of measurement. If most people score between, say, 5 to 9, then you only have 5 possible results, so not much room to discriminate.
Between 5 and 9 inclusive, that is.
Good question. One of the reasons Wechsler combined FDS & BDS into a single scale was “the limited range of each series when taken separately. On digits forward, a score range of only 4 points (repeating 5,6,7, or 8 digits) includes about 90 per cent of the adult population, and about the same percentage is included by the ability to repeat 4 to 6 digits backward. Such a range is obviously too small for a point scale.”
The analogy I’d make would be measuring height by rounding everyone to the nearest foot. Virtually everyone would be either 5 or 6 feet tall and the correlation between height and weight would drop to almost zero.
That height analogy expresses my thoughts almost exactly, although I think there are enough men on either side of the 6’0″ line that the correlation would still be a fair bit above zero with a sufficient sample size.
If we rounded everyone to the nearest foot, then everyone from 4.5 to 5.49 feet would be 5 feet and everyone from 5.5 to 6.49 feet would be 6 feet. In other words, 95% of America’s adult population would be either 5 or 6 feet.
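The rounding thought experiment is easy to simulate. The sketch below is illustrative only: the mean height (5.55 ft), the standard deviation (0.33 ft), the true height-weight correlation (0.5) and the use of standardized weight units are all assumptions chosen for the example, not figures from the post.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=20000, mean_ft=5.55, sd_ft=0.33, r=0.5, seed=0):
    """Return (correlation with exact heights, correlation with rounded heights)."""
    rng = random.Random(seed)
    heights, weights = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        heights.append(mean_ft + sd_ft * z)
        # Weight in standardized units, built to correlate r with height.
        weights.append(r * z + sqrt(1 - r * r) * rng.gauss(0, 1))
    rounded = [round(h) for h in heights]  # nearest whole foot
    return pearson(heights, weights), pearson(rounded, weights)


r_exact, r_rounded = simulate()
print(f"exact heights:   r = {r_exact:.2f}")
print(f"rounded heights: r = {r_rounded:.2f}")
```

Changing `mean_ft` moves the 5.5 ft cut point relative to the middle of the distribution, which is what decides how much of the correlation survives the rounding.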
o ye
There’s a neat tool to answer this very question: http://emilkirkegaard.dk/understanding_statistics/?app=discretization
If the BDS correlation is .6 and there are 5 bins, the correlation with 15 bins would be 0.65.
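For readers who want to see the kind of calculation involved, here is a rough analytic sketch. It is not the linked tool's method, which isn't described here. It assumes a bivariate normal latent trait, equal-width bins between -2.5 and +2.5 SD with open-ended outer bins, and integer bin scores, so its output need not match the 0.65 quoted above.

```python
from math import inf, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, the assumed latent distribution


def _cdf(x):
    if x == -inf:
        return 0.0
    if x == inf:
        return 1.0
    return STD.cdf(x)


def _pdf(x):
    return 0.0 if abs(x) == inf else STD.pdf(x)


def attenuation(bins, lo=-2.5, hi=2.5):
    """Correlation between a standard normal X and its binned version.

    Assumption: equal-width bins between lo and hi (in SD units), with
    the two outer bins open-ended, scored 0, 1, ..., bins - 1.
    """
    width = (hi - lo) / bins
    cuts = [-inf] + [lo + i * width for i in range(1, bins)] + [inf]
    pairs = list(zip(cuts, cuts[1:]))
    probs = [_cdf(b) - _cdf(a) for a, b in pairs]
    # For a standard normal X, E[X * 1{a < X < b}] = pdf(a) - pdf(b).
    partial_means = [_pdf(a) - _pdf(b) for a, b in pairs]
    scores = range(bins)
    mean_b = sum(p * s for p, s in zip(probs, scores))
    var_b = sum(p * s * s for p, s in zip(probs, scores)) - mean_b ** 2
    cov = sum(s * m for s, m in zip(scores, partial_means))
    return cov / sqrt(var_b)


def rescale(r_observed, bins_from, bins_to):
    """Estimate the correlation expected with bins_to bins instead of bins_from."""
    latent = r_observed / attenuation(bins_from)
    return latent * attenuation(bins_to)


print(round(rescale(0.60, 5, 15), 3))
```

With two bins split at the mean, `attenuation` reduces to the familiar median-split factor of sqrt(2/pi), about 0.80. The loss shrinks quickly as bins are added.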
Another tip is the “tail effects” tool, which is very handy if you’re interested in what fraction of a population is above or below a certain IQ or some other trait: http://emilkirkegaard.dk/understanding_statistics/?app=tail_effects
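The basic tail calculation can be reproduced in a couple of lines. This sketch assumes the trait is normally distributed and uses the conventional IQ scale (mean 100, SD 15) as defaults. The function names are made up for the example and are not taken from the linked tool.

```python
from statistics import NormalDist


def fraction_below(threshold, mean=100.0, sd=15.0):
    """Share of a normally distributed population scoring below threshold."""
    return NormalDist(mean, sd).cdf(threshold)


def fraction_above(threshold, mean=100.0, sd=15.0):
    """Share of a normally distributed population scoring above threshold."""
    return 1.0 - fraction_below(threshold, mean, sd)


# Example: share of the population above IQ 130 under these assumptions.
print(f"{fraction_above(130):.4f}")
```

Passing a different `mean` or `sd` shows how sensitive the tails are to small shifts in the distribution.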
I’m absolutely fascinated, and more surprised than I probably should be, that someone already came up with a way to adjust for this. I expected the rise to be much higher than 0.05!
What justifies the claim that psychological “measurement” is possible in lieu of a response to the Berka/Nash/Garrison measurement critique?
If you have some strange definition of psychological measurement that prohibits the taking of psychological measurements, it is very clearly not a good definition.
More to the point, you can’t argue against a concept because it’s invalid within whatever obscure formal system you happen to agree with, i.e. using some strange idiosyncratic definition of “measurement” and then complaining that other people use the word “measurement” incorrectly is ridiculous. Same thing for “natural selection” or whatever else.
Measurements need a specified measured object, object of measurement and measuring unit. Thus, they need to be physical. Psychometricians, in my opinion, assume the truth of physicalism. Physicalism needs to be true for “mental measurement” to be possible at all. Furthermore, psychometricians need to address the measurement critiques of the aforementioned authors.
“More to the point”, psychometric tests are first constructed and then correlational studies are used in order to “find out” what is “measured” with them. They’re going about this the wrong way.
“A psychological test score is no more than an indication of how well someone has performed at a number of questions that have been chosen for largely practical reasons. Nothing is genuinely being measured.” (Howe, 1997: 6).
https://notpoliticallycorrect.me/2019/12/09/definitions-of-intelligence-and-its-measurement/
https://notpoliticallycorrect.me/2020/04/16/superiority-psychometrics-and-measurement/
Never mind the fact that the mental isn’t physical nor reducible to it; therefore “mental measurement” is a logical impossibility.
https://notpoliticallycorrect.me/2020/08/16/conceptual-arguments-against-heredetarianism/
In my opinion, physicalism needs to be true for the tasks that psychometricians set out to do to be possible. Unfortunately for psychometricians, replies to Berka/Nash are not around, with the exception of Brand et al., which is a nonsense reply.
I’m currently writing a more detailed critique of “mental measurement” using “psychometric” tests, drawing on Nash, Berka, Michell, and Trendler.
So I ask again—what justifies the claim that psychological “measurement” is possible?
What’s a statement of what the scale—whichever you would choose—measures? What justifies your use of the scale? How were the items devised then tested? What’s the description of the sample used for testing? What group would this so-called measure be appropriate for? What are the SDs, means, ranges and different subscales? What are the reliability and validity stats? What are examples of some of the items and how do they relate to the hypothetical construct that the scale supposedly measured? Are scales, questionnaires, self-reports, observational reports, likert scales measures? If so, what specified measured object, object of measurement and measurement unit are they referring to and which specific test do you have in mind? If so, what’s the argument that they are? What’s the argument that we can construct scales first and then use correlational studies after?
Rr’s game: be extremely autistic about language, invent dumb problem because autism, dig up some papers nobody will or should read to “resolve” the “problem.”
This became obvious when he posted about racial categories.
…
“Crap on the deck and scream blackjack.”
>Measurements need a specified measured object, object of measurement and measuring unit. Thus, they need to be physical.
I guess you can’t measure wages in USD because dollars aren’t a physical dimension.
But with IQ, this is just a problem if you want to do something like extrapolate the effect of an intervention from one score/rank to another. A “boost” of 5 pts might not translate from initial score 95 to score 115.
“Oh, well, ‘knowledge of calculus’ isn’t an object. It can’t be measured!”
Right next to the overabundance of people, information glut is the world’s biggest problem. It might be WORSE, because it makes dealing with the former so much harder.
Some people should never be taught to read or write.
What’s wrong with my views on racial categories? The measurement problem needs to be addressed or psychometricians can’t—logically—claim that some”thing” is being measured by their tests.
Language matters.
You know, you can just say “I don’t know the answer”—it’s OK to say you don’t know something.
Language is an approximate reference to the underlying referent. Commonly used words, like “measurement”, aren’t authoritatively defined by whichever particular paper you happen to agree with.
Are measures of temperament, as we might be inclined to use with livestock, somehow invalid? Presumably they fail to meet your criteria. Despite this it seems entirely possible to select for temperament, indicating that whatever strictures you seek to impose on the concept of measurement are of no particular practical concern.
I’ve never found it necessary to formally define measurement, but were I going to, I might say “a structured description of some characteristic*”. My definition is as authoritative as whatever you might wish to use and moreover has the benefit of being useful. If I felt so inclined, I could define measurement to be exactly whatever it needs to be to make my use of the word measurement correct; again, it would be no less authoritative than the definition you propose.
*The term characteristic is here used in a purely practical sense and not as a reflection of some deeper platonic reality.
To give a practical example, suppose we want to measure the tendency to do well on sudokus: we could give some group of people a bunch of sudokus and take their average times. Hopefully this predicts future performance, and if it does, we have a measurement, imperfect as it might be, of tendency to do well on sudokus. Presumably this measurement of sudoku prowess fails to satisfy your rigorous definition of measurement; my definition however is satisfied.
You’ve invented, or rather some academic has invented, some fantastical definition of the word measurement and now you argue that others misuse it. If you wish to have your own personal definition of measurement then so be it, but your definition of measurement isn’t somehow authoritative and isn’t particularly useful. The answer therefore to the question of how psychometric measurements satisfy your criteria for what constitutes a measurement is that your criteria aren’t real and don’t need to be satisfied.
Are you under the impression that there are “objective definitions”? The fact of the matter is: If one claims that it’s possible to measure psychological traits, then they must tango with the Berka/Nash measurement objection. If they cannot, then they cannot say that psychological traits are being measured. Furthermore, the questions I posed above also need to be addressed. The criteria are most definitely real: if you claim that X is measured by Y then you need to address the criticisms and questions. End of story.
I just resolved the Berka/Nash measurement objection: it isn’t real, and this is clear since I provided counterexamples. Maybe you want to argue that actually these aren’t measurements but instead something else that just so happens to look and act exactly like a measurement but fails to meet some arbitrary criteria that you think a measurement should necessarily meet, but it’s pretty clear at that point that you have some personal definition of measurement that corresponds only vaguely with common usage.
To summarise:
Your claim is that something necessarily needs to be a physical object to be measured. I disagree. As a demonstration, I point to the fact that you can measure, for instance, a behavioural tendency. If you insist that a measurement necessitates a physical object, then obviously a measurement which is not of a physical object is not a measurement. This is an entirely arbitrary requirement and I reject it.
If for whatever reason measurement was defined to concern itself with physical objects exclusively, it would only mean that we would need to invent vocabulary to describe that which was analogous to measurements but concerned itself with those things which were not physical objects.
What’s the specified measured object, the object of measurement and the measurement unit for this “behavioural tendency”? Why are you not addressing the IQ objection—which is the heart of the matter?
(Post this one.)
Can one first construct scales and then deduce what they measure after? Are scales measures? All of the questions I have posed above relate directly to this discussion we are having. There is a necessary conceptual distinction between the object of measurement and the measured object. When it comes to extra-physical measurement, psychometricians don’t conceptualize the object of measurement that satisfies the minimal requirements for measurement. When it comes to measurement units, we have to know what we’re measuring and even if we can measure it at all. Can the concept of measurement be possible without a measurement unit? What’s the measurement and what measurement doesn’t need units?
The objection most certainly is real, explicated by Berka in Measurement: Its Concepts, Theories and Problems and further refined by Nash in Intelligence and Realism.
For the two examples I gave you (livestock and sudoku), how aren’t they measurements? In particular, how do they fail to satisfy the definition of a measurement I provided?
Moreover, what are the consequences of psychological measurement not satisfying your requirements for a measure?
Some details on temperament scoring in cattle (relevant to one of the examples provided):
One method of scoring cattle is by observing their behaviour within a squeeze chute.
An example of a scoring system is the following 4-point system:
1. Calm, no movement
2. Restless shifting
3. Squirming, continuous shaking of the squeeze chute
4. Rearing, twisting, continuous violent struggle
(from https://www.grandin.com/behaviour/principles/assessment.temperament.html)
Differences in temperament as measured with this method are not restricted to this particular test, i.e. different scores seem to be a real measure of differences in how the animals behave. Moreover, cattle sourced from herds derived from heavily selected animals tend to be less fearful.
As for why I’m ignoring IQ specifically: If your argument is a general argument against the possibility of psychological measurement, it should be possible to extend it to any arbitrary example.
Measurement is just gauging quantity or magnitude. Obviously cognitive faculties exist and to varying degrees – and they must reducibly have basis in physical reality. The exact physical causes of intelligence are abstruse, so until we deconstruct it fully, we have to indirectly and imperfectly measure these processes with certain types of tests.
Perhaps one day, our understanding of the human mind and quantum mechanics will be so sophisticated that we can perfectly project an individual’s life choices from the moment they’re born. That is unless the human mind was bestowed with a metaphysical ability to incept.
Psychophysical reduction is a logical impossibility. See (iii).
https://notpoliticallycorrect.me/2020/08/16/conceptual-arguments-against-heredetarianism/
Of course it’s a general argument about *psychological traits*. If they are, indeed, measures, what’s the specified measured object, the object of measurement and the measurement unit for what you’re discussing? That humans can breed cattle with different temperaments does not justify the inference that there are “measures” of things without a specified measured object, object of measurement and measurement unit.
Furthermore, my other comment is wholly relevant to this discussion on measurement and how we can construct scales while attempting to deduce what they measure after, along with my slew of questions.
“Moreover, what are the consequences of psychological measurement not satisfying your requirements for a measure?”
It’s just a way to reproduce and justify (attempting to naturalize, in the case of IQ) class structure and social hierarchy.
The point isn’t that humans can breed cows for temperament; the point is that it is possible to systematically characterise (i.e., measure) the temperament of cows. You claim that this isn’t a measurement because it fails to satisfy your criteria; I claim it is a measurement because it satisfies mine.
A specified measured object doesn’t exist because it isn’t necessary. You claim that it is, but I claim that it isn’t. I claim that you can measure a tendency, among other things. Do you deny that it’s possible to measure the variation (mean squared distance is only an example of such a measurement) or skew of a set of numbers?
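To make the example concrete, here is a minimal sketch of the two summary statistics mentioned. The function names are illustrative, and the skewness shown is the simple population moment coefficient, which is only one of several definitions in use.

```python
def mean_squared_distance(values):
    """Population variance: mean squared distance from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = sum(values) / len(values)
    third = sum((v - m) ** 3 for v in values) / len(values)
    return third / mean_squared_distance(values) ** 1.5


symmetric = [1, 2, 3, 4, 5]
lopsided = [1, 1, 1, 10]
print(mean_squared_distance(symmetric), skewness(symmetric))
print(mean_squared_distance(lopsided), skewness(lopsided))
```

Neither quantity refers to a physical object. Both are descriptions of a set of numbers.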
As for the implications: I don’t mean social implications of IQ not being a measurement. How does a measurement that fails to meet your criteria differ, in terms of actual usage and characteristics, from a measurement that meets them?
Humans observe and pick out cattle that fit the criteria they are attempting to select for in the offspring of the cattle. Because, say, scales can produce numbers that supposedly correspond to psychological traits does this mean the scale is a measure? There are a whole slew of questions for you to answer above.
If it fails to meet the criteria, it is not a measurement.
What do you think of measurement by fiat?
I am not attempting to prove conformity to your criteria because I disagree that your criteria are necessary. A measure is a structured description of a characteristic. This is satisfied. If a scale provides a structured description of a trait, then it measures that trait. The scale used to evaluate the cattle provides a measurement of the cattle’s behaviour.
Your claim isn’t that psychological measurements aren’t possible, it’s that psychological measurements aren’t possible in the particular context of the definition of measurement you’re using. Your definition isn’t widely used, nor is it useful.
If I define a measurement as “a structured description of some characteristic”, then clearly a psychological measurement is a measurement. As it so happens, this is approximately what the word measurement is actually used to denote.
Would you disagree that a psychological measurement, like the one provided for livestock, is “a structured description of some characteristic”? The characteristic in this particular case being the tendency of the livestock to act in a way which we would describe as fearful.
You can always disagree by stating that you define measurement to exclude those things. You might need to coin some new words to cover the lacuna created, but it doesn’t matter: we would still be talking about exactly the same thing.
Your entire objection is essentially a semantic obfuscation. You could call psychological measurements psychological scalements (for example) if you so desired but we would still be talking about exactly the same thing.
Words are just approximate references for referents. You can use whatever word you desire: nothing changes. Except that you’ve introduced unnecessary complexity and needlessly confused the situation.
If all that is meant by a fiat measurement is a measurement based on the presumption of a relation, then naturally a given measurement may be good or it may not, depending on the extent to which the relation actually exists. At a more fundamental level the concept isn’t very useful. The characteristics we’re interested in aren’t some essential things; they’re just useful ways to describe reality. Are we actually measuring the fearfulness of the cattle? Yes, because we’re measuring the tendency of the cattle to behave in a way we would describe, for our purpose, as fearful.
I was interested by the post you linked about Charles Krauthammer and his influence on the Iraq War. People who lived through 9/11 and the Iraq War may find it disturbing that I’m old enough to vote despite having no memory of either.
I got 11 my first attempt, but I smoked bud right before.
As far as BDS being a better measure of intelligence, I personally stand out more there than in FDS. I think my ceiling for each is 11. Don’t see how it’s much harder for someone to recite backwards if they truly have it memorized. I’m good at spelling backwards also. I’m diagnosed ADD, so I’m relatively weak at super rapid memorization, manipulation, and planning like in n-back. I worked at a fast food place as a teenager, and I wasn’t great at reading lists of menu items to bag and then running around to locate said items and pack them optimally.
I reached level 14, but technically that means I remember 13 digits correctly, not 14.
I’m skeptical of the guy who put level 70+, but with the proper mnemonic techniques, it could be done.
Then you should have voted 13. The 70+ is a troll vote. I could maybe see that after a few dozen attempts but not in the first two.
Well that ambiguity is humanbenchmark’s fault, not mine.
I did it 😒 It should be a ridiculous 9, not 10. But didn’t something like 40% of the participants do the same?
Just read something on Wikipedia that made me think of this blog.
“When [Kant’s] body was transferred to a new burial spot, his skull was measured during the exhumation and found to be larger than the average German male’s with a “high and broad” forehead. His forehead has been an object of interest ever since it became well-known through his portraits: ‘In Döbler’s portrait and in Kiefer’s faithful if expressionistic reproduction of it — as well as in many of the other late eighteenth- and early nineteenth-century portraits of Kant — the forehead is remarkably large and decidedly retreating. ‘”
I scored 14 digits (so level 15). I noticed that I naturally keep half the digits in my visuospatial sketchpad and the other half in my phonological loop. So I’ll keep repeating the second half of the span in auditory memory while filling out the first half from my visual memory, and then I’ll fill out the second half from auditory memory.
What do you think about this, PP? Is this true digit span? Furthermore, do you think this would indicate a higher or lower IQ than someone who uses pure verbal memory, or pure visual memory?
Level 14, quite an outlier in contrast to my scores on many other tests of executive function, but the test is also specifically verbally loaded according to factor analysis. I wonder what kind of a shared genetic cause my performance on this task and my close relative’s autism spectrum disorder have.
What Wechsler said in regards to cognitive decline is the exact reason I try to keep a semi-regular note of my digit span. Rather concerning for me is that I distinctly remember my FDS being ~2 digits longer when I was 17-18 years old, versus my current average of ~8 digits (which I scored on my two attempts today for your survey, PP) at the ripe old age of 24… I’ve put this down to being APOE-4 homozygous and having a rather sub-optimal lifestyle for preservation of cognition within the last 3 years of my life.
I also recently sat the WAIS-IV and found the FDS given in that CONSIDERABLY harder than doing it on humanbenchmark (scoring a maximum of 5 digits). A long, monotonous reading of a sequence of digits did very little for my concentration. Thankfully I did quite a bit better on both the BDS and digit sequence portions, so I can rest easy knowing at least some of my G is preserved!
Interesting post as usual PP 🙂
Post subscores
My subscores weren’t included in my report (highly irritating…), but I will add them when the practice I sat the WAIS at gets back to me. I’m highly interested myself because of how skewed my cognitive profile is: VCI 136, PRI 100, WM 100 and PS 108. I had huge problems on block design which makes me think I may have a substantial defect in spatial ability.
On the auditory test, I got 5 digits. On this one, I got 7 because I could see the numbers all at once.
My working memory is 95. I talked to my therapist today and told her I have the inability to work things out in my head step by step. I told her that it is hard for me to find mental work at my level that I find meaningful. I think I have ADD. All my thinking is associative / random. It is somewhat conditional but less so. I would like to solve meaningful problems but finding them is hard to do.
recall julian kaye’s paramour was the wife of a senator.
gene tunney was the son of irish potato famine nobodies who became heavyweight champion of the world and married a rich girl from greenwich…
their son became a senator from california! https://en.wikipedia.org/wiki/John_V._Tunney
that’s the power-o’-the-penis!
What was Thinkfast? How can I access it?
Good question
OK, it means I put 11 instead of my number, 10. I’ve destroyed the precious integrity and celestial harmony of your data, PP. 🤦♂️ Let reverse be my lot!! Btw, what might all that mean in terms of value/population rarity, etc.? Did I make the same shit on the sequences? It’s been a while, let me see.