Way back in April 2016, I wrote:
Many non-scientists have a great interest in heritability, but lack the science education and/or cognitive ability to understand modern techniques like Genome-wide Complex Trait Analysis (GCTA), so this post is a quick attempt to explain it. Full disclosure: I have virtually no formal science training beyond high school, so this is just an oversimplified explanation.
GCTA gives a measure of the squared correlation between additive genotype and phenotype. The reason it’s so confusing is that you can’t directly correlate a phenotype with a genotype if you haven’t found the genes that code for that phenotype, and thus you can’t determine if someone is genetically high on a given trait.
So for example, you can’t determine if someone’s genetic IQ matches their actual IQ if you don’t know whether they have the genes for IQ. Since a correlation is, roughly speaking, a measure of how closely the rank orders of two variables (i.e. genetic IQ and actual IQ) agree, it can’t be directly calculated if one of said variables (i.e. genetic IQ) can’t be ranked. It would be like trying to calculate the correlation between height and weight when all the weights were reported in a language you didn’t speak.
To sidestep this problem, GCTA was invented by a scientist of East Asian heritage. In GCTA, instead of ranking everyone in your sample from highest to lowest on each trait, you simply randomly assign people to pairs, and for each pair, calculate the genetic distance and the phenotype distance. So for example, if people who differ by 100 single nucleotide polymorphisms (SNPs) differ, on average, by one standard deviation in IQ, and if people who differ by one standard deviation in IQ differ, on average, by 39 SNPs, then perhaps it can be inferred that (in this sample) the correlation between genetic IQ and actual IQ is whatever number, when squared and multiplied by 100, equals 39.
That number is about 0.62.
This is because, in a bivariate normal distribution, the slope of the standardized regression line equals the correlation between the two variables. So if a genetic difference of 100 SNPs regresses to a one standard deviation difference in IQ, then one standard deviation must be only 62% as extreme as 100 SNPs, and if a one standard deviation difference in IQ regresses to a 39 SNP difference, then 39 SNPs must be only 62% as extreme as one standard deviation.
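The arithmetic behind that 0.62 can be checked in a few lines. This is only a sketch of the round-trip reasoning in this post, not of what the actual GCTA software does; the function name is made up for illustration, and it assumes each regression step shrinks the difference by a factor equal to the correlation, so going from genotype to phenotype and back shrinks it by the correlation squared.

```python
import math

def correlation_from_round_trip(start_snps, returned_snps):
    """Hypothetical helper: infer r from a genotype -> phenotype -> genotype round trip.

    Assumption: each regression step multiplies the (standardized)
    difference by r, so the round trip multiplies it by r squared.
    """
    r_squared = returned_snps / start_snps
    return math.sqrt(r_squared)

# Numbers from the example above: 100 SNPs regress back to 39 SNPs.
r = correlation_from_round_trip(100, 39)

print(round(r, 2))          # 0.62
print(round(r * r, 2))      # 0.39
print(round(0.62 ** 2, 2))  # 0.38 (squaring the already-rounded correlation)
```

The last line shows why squaring the rounded 0.62 gives 0.38 rather than the 0.39 we started from: it is only rounding error.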
Once we have the correlation of, say, 0.62 between additive genotype and phenotype, we square it to get the proportion of variation explained, which in this example would be about 0.38, or 0.39 before rounding (the real number is probably much higher, and even higher still for broad-sense heritability).
Of course, what very few people realize is that heritability is technically NOT the percentage of the phenotypic variation explained by genes; it’s the percentage explained by genes when environment is held constant or allowed to vary randomly.
Recently, commenter Trumpocalypse (aka Mugabe) also commented on GCTA:
if it could be shown that the joint distribution, f, of some measure of genetic distance between individuals, d, and phenotypic difference, p, has a linear conditional mean (that is, f(d|p) is a straight line), then it would be possible to avoid all the problems of twin studies and the shared womb of twins…
the conditional mean could simply be extrapolated to d = 0. and from the mean difference, p, at d = 0, the heritability could be calculated easily.
furthermore the perfect-ness of the measure of genetic distance would be irrelevant. the joint probability density function would “include” this error. so with a large enough sample the extrapolation would have very little error.
this is essentially what GCTA is.
but whether the joint distribution is bivariate normal or some other distribution with linear conditional mean…i don’t know.
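To make the commenter's extrapolation idea concrete, here is a toy simulation. Everything in it is an assumption for illustration: the data are simulated, genetic distance d is taken to be one minus a hypothetical relatedness score drawn uniformly between 0 and 1, and the phenotypic difference is measured as a squared difference, because under a simple additive model that makes the conditional mean a straight line in d. This is closer in spirit to Haseman-Elston regression than to what the GCTA software actually runs, so treat it as a sketch of the logic only.

```python
import math
import random

def simulate_pairs(n_pairs, h2, rng):
    """Simulate (d, squared phenotype difference) for random pairs.

    Assumptions: phenotype = genetic part + environment, total variance 1,
    genetic variance h2, and d = 1 - relatedness with relatedness ~ U[0, 1).
    """
    ds, sq_diffs = [], []
    env_sd = math.sqrt(1 - h2)
    for _ in range(n_pairs):
        a = rng.random()  # hypothetical relatedness of this pair
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        g1 = math.sqrt(h2) * z1
        g2 = math.sqrt(h2) * (a * z1 + math.sqrt(1 - a * a) * z2)
        y1 = g1 + rng.gauss(0, env_sd)
        y2 = g2 + rng.gauss(0, env_sd)
        ds.append(1 - a)
        sq_diffs.append((y1 - y2) ** 2)
    return ds, sq_diffs

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)  # fixed seed so the run is repeatable
true_h2 = 0.6           # made-up heritability used to generate the data
ds, sq_diffs = simulate_pairs(200_000, true_h2, rng)
slope, intercept = fit_line(ds, sq_diffs)

# Intercept (d = 0): difference left over between genetically identical pairs,
# i.e. environment only. Value at d = 1: difference between unrelated pairs.
h2_est = slope / (slope + intercept)
print(round(h2_est, 2))  # should land close to true_h2
```

The intercept is the extrapolation to d = 0 that the commenter describes: whatever phenotypic difference remains when genetic distance is zero cannot be genetic, so the share of the difference that disappears on the way down to d = 0 is the heritability estimate.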
Here is some genetic distance data on nine major human populations:
According to Richard Lynn’s controversial meta-analysis of these nine genetic clusters, Africans average IQ 67, Non-European Caucasoids average 84, European Caucasoids average 99, Northeast Asians average 105, Arctic Northeast Asians average 91, Amerindians average 86, Southeast Asians average 87, Pacific Islanders average 85, and New Guineans and Australians average 62.
Although this is group data, not individual data, it would be interesting to compare group genetic distance to group IQ difference. If, for example, I knew both the average IQ difference and the Fst distance between two random individuals from anywhere in the world, I think I could probably use my crude understanding of GCTA to estimate the individual-level correlation between IQ phenotype and genotype for the entire human species from Cavalli-Sforza’s and Lynn’s group-level data. Squaring this correlation might be a good proxy for IQ’s worldwide heritability.
On the other hand, the fact that Cavalli-Sforza intentionally tried to use non-selected genes to calculate genetic distance (since these mutate at a regular rate, creating a reliable molecular clock for dating when populations split) might make the exercise pointless.