The genetic relationship between five human populations, expressed as a family tree. From left to right: San, Yoruba, French, Papuan, East Asian.

I would like to thank everyone for the incredible reaction to my first dendrogram, with special thanks to the great James Thompson, who tweeted it out to his thousands and thousands of Twitter followers.

Although the vast majority of people enjoyed the dendrogram, a few people mocked it. Luckily, I was tipped off by a quick-thinking blogger (not to namedrop, but it was HBD Chick).

One of the critics asked why I had not calculated the cophenetic correlation, because if I had, I would have known that human “races” don’t fit a tree-like structure.

Alan R. Templeton writes:

The cophenetic correlation measures how well the observed genetic distances fit the predicted genetic distances from an evolutionary tree model and provides a heuristic goodness of fit to treeness…  The cophenetic correlations for various data sets that have been used to portray human population trees vary from 0.45 to 0.79 (Templeton, 1998a). A tree-like structure of genetic differentiation requires a cophenetic correlation greater than 0.9, and any value less than 0.8 is regarded as a poor fit (Rohlf, 1993).

Source: Biological Races in Humans
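For readers unfamiliar with the statistic: the cophenetic correlation is the Pearson correlation between the observed pairwise distances $x_{ij}$ and the cophenetic distances $t_{ij}$, where $t_{ij}$ is the height in the tree at which populations $i$ and $j$ first join. With five populations the sums run over $5 \cdot 4 / 2 = 10$ pairs.

$$
c = \frac{\sum_{i<j} (x_{ij} - \bar{x})(t_{ij} - \bar{t})}
         {\sqrt{\sum_{i<j} (x_{ij} - \bar{x})^2 \; \sum_{i<j} (t_{ij} - \bar{t})^2}}
$$

A value of $c = 1$ means the tree reproduces the observed distances perfectly up to a linear rescaling.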

So what’s the cophenetic correlation of my dendrogram?

0.99333828
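To make the statistic concrete, here is a minimal pure-Python sketch of how such a value can be computed. It assumes the tree is built with UPGMA (average linkage), which is a common choice but is not stated above, and it uses a hypothetical five-leaf distance matrix labelled A to E rather than the real population data. The helper names `upgma_cophenetic`, `pearson` and `cophenetic_correlation` are illustrative, not from any library.

```python
import itertools
import math


def upgma_cophenetic(names, dist):
    """Cluster with UPGMA (average linkage) and return cophenetic distances.

    dist maps frozenset({a, b}) -> observed distance for every pair of names.
    The result uses the same keys: the height at which each pair first joins.
    """
    clusters = [(name,) for name in names]  # each cluster is a tuple of leaves
    coph = {}

    def avg(c1, c2):
        # Average linkage: mean observed distance over all cross-cluster leaf pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]),
        )
        c1, c2 = clusters[i], clusters[j]
        height = avg(c1, c2)
        # Every leaf pair split across c1 and c2 first meets at this height.
        for a in c1:
            for b in c2:
                coph[frozenset((a, b))] = height
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(c1 + c2)
    return coph


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(names, dist):
    """Correlate observed distances with the UPGMA tree's cophenetic distances."""
    coph = upgma_cophenetic(names, dist)
    pairs = [frozenset(p) for p in itertools.combinations(names, 2)]
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Hypothetical toy distances between five leaves (NOT real population data).
raw = {
    ("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 6, ("A", "E"): 6, ("B", "C"): 3,
    ("B", "D"): 6, ("B", "E"): 6, ("C", "D"): 6, ("C", "E"): 6, ("D", "E"): 3,
}
toy = {frozenset(k): v for k, v in raw.items()}
r = cophenetic_correlation(list("ABCDE"), toy)
print(round(r, 3))  # 0.955
```

Here the tree cannot tell A–C (5) apart from B–C (3), since both become 4 in the tree, so the fit is good but not perfect. A perfectly tree-like (ultrametric) matrix would give exactly 1.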

Why did my dendrogram achieve such a high cophenetic correlation when others failed to do so? There are several possible reasons.

  1. Incompetence: perhaps I didn’t follow the correct procedures?
  2. Luck: with only 5 modern human populations, a high correlation may have occurred by chance.
  3. Pure samples: perhaps the genomes in the dataset were less hybridized than those in previous datasets, which contained mixed groups like Southeast Asians.
  4. Genome thoroughness: perhaps my data, being newer, sampled more of the genome than previous research did and thus gave more accurate results.
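One rough way to probe reason 2 is a simulation: draw many random distance matrices for five leaves, build a UPGMA tree for each, and count how often the cophenetic correlation clears a threshold. This is only a sketch under loud assumptions. UPGMA is assumed as the clustering method, and the null model (independent uniform random distances, which ignore the triangle inequality and any real population history) is a hypothetical, crude choice; `cophenetic_r` and `share_above` are illustrative names.

```python
import itertools
import math
import random


def cophenetic_r(n, d):
    """Cophenetic correlation of a UPGMA (average-linkage) tree on n leaves.

    d maps every pair (i, j) with i < j to a distance between leaves i and j.
    """
    clusters = [(i,) for i in range(n)]
    coph = {}

    def avg(c1, c2):
        # Average linkage: mean distance over all leaf pairs across two clusters.
        return sum(d[min(a, b), max(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        height = avg(clusters[i], clusters[j])
        # Leaf pairs split across the merged clusters first meet at this height.
        for a in clusters[i]:
            for b in clusters[j]:
                coph[min(a, b), max(a, b)] = height
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    # Pearson correlation between observed and cophenetic distances.
    keys = sorted(d)
    x = [d[k] for k in keys]
    t = [coph[k] for k in keys]
    mx, mt = sum(x) / len(x), sum(t) / len(t)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sxx = sum((a - mx) ** 2 for a in x)
    stt = sum((b - mt) ** 2 for b in t)
    return sxt / math.sqrt(sxx * stt)


def share_above(threshold=0.9, n=5, trials=2000, seed=1):
    """Fraction of random n-leaf distance matrices whose UPGMA fit exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical crude null: independent uniform(0, 1) distance for every pair.
        d = {p: rng.random() for p in itertools.combinations(range(n), 2)}
        if cophenetic_r(n, d) > threshold:
            hits += 1
    return hits / trials


print(share_above(0.9), share_above(0.99))
```

If a sizeable share of purely random five-leaf matrices clears 0.9, that would favour reason 2; a fairer test would replace the uniform draws with distances simulated from a population-genetic model.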