
Principal Component Analysis (PCA) is a statistical technique by which a large number of correlated variables get reduced to a smaller number of more manageable components.

Cavalli-Sforza lumped humans into nine major populations. The following shows his phylogenetic tree of these nine populations, followed by a matrix showing the genetic distances between them:

Because I wanted to see if these nine populations could be objectively reduced to a smaller number, I made all the distances negative and then entered the genetic distance matrix into a Minitab spreadsheet.

I made the distances negative because PC analysis is usually done on correlation matrices, where the higher the value, the more similar. In a genetic distance matrix it’s the opposite, hence the negative signs I added.
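For readers who want to try the same preparation step outside Minitab, here is a minimal sketch in Python with NumPy. It assumes the software treats each column of the matrix as a variable and runs PCA on the correlation matrix of those columns. The 4x4 matrix is a made-up placeholder, not Cavalli-Sforza's data, and the function name `pca_from_distance` is hypothetical.

```python
import numpy as np

# Hypothetical 4x4 genetic distance matrix (placeholder values, NOT
# Cavalli-Sforza's data): symmetric, with a zero diagonal.
distances = np.array([
    [0.0, 1.2, 1.5, 2.0],
    [1.2, 0.0, 0.4, 1.1],
    [1.5, 0.4, 0.0, 0.9],
    [2.0, 1.1, 0.9, 0.0],
])

def pca_from_distance(dist):
    """Negate the distances, then run a correlation-matrix PCA on the columns."""
    similarity = -dist                                 # higher now means more similar
    corr = np.corrcoef(similarity, rowvar=False)       # columns are the variables
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]              # re-sort, largest first
    eigenvalues = eigenvalues[order]
    loadings = eigenvectors[:, order]                  # column k = loadings on component k
    explained = eigenvalues / eigenvalues.sum()        # share of variance per component
    return eigenvalues, loadings, explained

eigenvalues, loadings, explained = pca_from_distance(distances)
print(np.round(eigenvalues, 3))
print(np.round(explained, 3))
```

Swapping the placeholder for the real 9x9 matrix should give one eigenvalue and one column of loadings per population, though the signs of the loadings may differ from Minitab's output because eigenvectors are only defined up to sign.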

The principal component analysis gave the following result.

C1=Africans, C2= Non-European Caucasoid, C3=European Caucasoid, C4= Northeast Asian, C5=Arctic Northeast Asian, C6= Amerindians, C7= Southeast Asian, C8= Pacific Islander, C9= New Guinean & Australian

To determine how many principal components to retain, statisticians commonly use what’s called the eigenvalue > 1 rule (also known as the Kaiser criterion), which in this case means retaining only three components.

The first component explains 54% of genetic variation and since Northeast Asians have the highest loading on this component (0.432), it can be thought of as a measure of Northeast Asianness. Africans are the only group to load negatively on Northeast Asianness (-0.376).

The second component explains 26% of the variation and since Europeans have the highest loading on this (0.526), it can be considered a measure of whiteness.

The third component explains 12% of the variation and since Native Americans have the highest loading on this (0.527), it can perhaps be considered a measure of “New Worldliness”.
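The eigenvalue > 1 rule can be checked against the three percentages above with a little arithmetic. This sketch assumes the PCA was run on a correlation matrix, in which case the eigenvalues sum to the number of variables (nine here), so each eigenvalue is roughly the explained share times nine. The percentages are rounded, so the results are approximate, and the helper name `kaiser_retain` is hypothetical.

```python
# Shares of variance quoted above for the first three components (rounded).
explained = [0.54, 0.26, 0.12]
n_variables = 9  # one column per population

# In a correlation-matrix PCA the eigenvalues sum to the number of variables,
# so eigenvalue = explained share * number of variables.
eigenvalues = [round(p * n_variables, 2) for p in explained]

def kaiser_retain(values, threshold=1.0):
    """Count how many eigenvalues exceed the threshold."""
    return sum(1 for v in values if v > threshold)

# Variance left over for the remaining six components combined.
leftover = round((1 - sum(explained)) * n_variables, 2)

print(eigenvalues)                 # [4.86, 2.34, 1.08]
print(kaiser_retain(eigenvalues))  # 3
print(leftover)                    # 0.72
```

Because the six remaining components share only about 0.72 between them, none of them can have an eigenvalue above 1, which is consistent with keeping three components.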

Now when I plot each of the nine populations in three-dimensional space (x axis = Northeast Asianness, y axis = whiteness, z axis = New Worldliness), with their loadings multiplied by 10 to make differences visible, I find that all nine populations fit into three major clusters.
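The plotting step can be roughly reproduced in Python. The sketch below uses matplotlib rather than Graphing Calculator 3D, and the three populations and their loadings are placeholders invented for illustration, not the values from the analysis. The function names `scale_loadings` and `plot_3d` are hypothetical.

```python
# Hypothetical loadings on the first three components (placeholder values,
# not the figures from the analysis above).
loadings = {
    "population_a": (0.40, 0.10, 0.05),
    "population_b": (-0.35, 0.20, 0.10),
    "population_c": (0.15, 0.50, -0.20),
}

def scale_loadings(table, factor=10):
    """Multiply every loading by `factor` so the points spread out on the plot."""
    return {name: tuple(factor * v for v in xyz) for name, xyz in table.items()}

def plot_3d(points):
    """Draw a labelled 3D scatter plot; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so scaling works without it

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for name, (x, y, z) in points.items():
        ax.scatter(x, y, z)
        ax.text(x, y, z, name)
    ax.set_xlabel("Component 1")
    ax.set_ylabel("Component 2")
    ax.set_zlabel("Component 3")
    plt.show()

points = scale_loadings(loadings)
# plot_3d(points)  # uncomment to display the figure
```

Multiplying by 10 only stretches the axes equally, so it changes how readable the plot is but not which points sit closest together.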

made using Graphing Calculator 3D

These three clusters are extremely similar to the three major races of physical anthropology: Mongoloids on the back wall, Negroids on the side wall, and Caucasoids on the floor.

No disrespect to Caucasoids (I’m 100% pure Caucasoid myself). The graph can be reoriented so any group is on the floor.

One anomaly is that New Guineans & Australian aboriginals cluster with Mongoloids, even though they are morphologically closest to Negroids. Of course, such anomalies are not uncommon in taxonomy. Birds, for example, genetically cluster with reptiles, even though they’re not reptiles. Humans cluster with apes, even though we’re not apes.

Such anomalies occur because most of our DNA is junk, so it groups us based on how recently we share common ancestors, not by how much of that common ancestor we shared.