The largest family tree ever created by mankind
The past two decades have seen extraordinary advances in human genomics research, generating genomic data for hundreds of thousands of individuals, including thousands of prehistoric humans.
This raises the exciting possibility of tracing the origins of humanity’s genetic diversity to create a complete map of how individuals around the world are related to each other.
Until now, the main challenges to this vision have been working out how to combine genome sequences from many different databases and developing algorithms that can handle data of this size.
Dr Yan Wong, an evolutionary geneticist at the Big Data Institute and one of the lead authors, explains: ‘We have built a giant family tree, a genealogy for all of humanity, which models as accurately as we can the history that has produced all the genetic variation we find in humans today. This genealogy allows us to see how each person’s genetic sequence is related to every other, along all the points of the genome.’
Since each individual genomic region is inherited from only one parent, either the mother or the father, the ancestry of each point on the genome can be thought of as a tree.
The set of trees, known as a “tree sequence” or “ancestral recombination graph”, links genetic regions back through time to the ancestors in whom the genetic variation first appeared.
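To make the idea of "one tree per point on the genome" concrete, here is a minimal Python sketch. It is illustrative only: the `LocalTree` class, the `mrca` helper and the toy coordinates are assumptions made for this example, not the study's software or data. It shows that the same two samples can have different most recent common ancestors in different genomic regions.

```python
from dataclasses import dataclass


@dataclass
class LocalTree:
    """One tree covering the genomic interval [left, right)."""
    left: int
    right: int
    parent: dict  # maps each node to its parent; the root has no entry


def tree_at(trees, position):
    """Return the local tree whose interval contains the given position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"no tree covers position {position}")


def ancestors(tree, node):
    """List a node followed by all of its ancestors up to the root."""
    path = [node]
    while node in tree.parent:
        node = tree.parent[node]
        path.append(node)
    return path


def mrca(trees, a, b, position):
    """Most recent common ancestor of samples a and b at one genome position."""
    tree = tree_at(trees, position)
    seen = set(ancestors(tree, a))
    for node in ancestors(tree, b):
        if node in seen:
            return node
    return None


# Hypothetical toy data: samples 0, 1, 2 and ancestors 3, 4, 5.
trees = [
    LocalTree(0, 50, {0: 3, 1: 3, 3: 4, 2: 4}),
    LocalTree(50, 100, {1: 5, 2: 5, 5: 4, 0: 4}),
]

print(mrca(trees, 0, 1, 10))  # ancestor shared in the first region
print(mrca(trees, 0, 1, 75))  # a different, older ancestor in the second region
```

In this toy example, samples 0 and 1 share the recent ancestor 3 in the first half of the genome, but only the older ancestor 4 in the second half.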
The study integrated modern and ancient human genome data from eight different databases and included a total of 3,609 individual genome sequences from 215 populations.
The ancient genomes include samples found around the world, ranging in age from 1,000 to more than 100,000 years. Algorithms predicted where common ancestors must have existed in the evolutionary trees to explain the observed patterns of genetic variation. The resulting network contains nearly 27 million ancestors.
After adding location data for these sample genomes, the authors used the network to estimate where the predicted common ancestors lived. The results successfully recapture key events in human evolutionary history, including the migration out of Africa.
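One simple way to picture estimating an ancestor's location from sampled genomes is to place each ancestor at the average position of its children, working upward from the samples. The Python sketch below does exactly that and nothing more. It is an assumption made for illustration, not necessarily the authors' estimator, and the node names and coordinates are invented. Averaging raw latitude and longitude also ignores the curvature of the Earth, so treat it as a toy.

```python
def estimate_locations(children, sample_locations):
    """Assign each ancestor the mean (lat, lon) of its children.

    children: dict mapping an ancestor to a list of its child nodes.
    sample_locations: dict mapping sampled nodes to known (lat, lon) pairs.
    Assumption: plain coordinate averaging, ignoring spherical geometry.
    """
    locations = dict(sample_locations)

    def locate(node):
        # Samples already have a location; ancestors are computed on demand.
        if node not in locations:
            points = [locate(child) for child in children[node]]
            lat = sum(p[0] for p in points) / len(points)
            lon = sum(p[1] for p in points) / len(points)
            locations[node] = (lat, lon)
        return locations[node]

    for node in children:
        locate(node)
    return locations


# Hypothetical samples A, B, C and inferred ancestors X, Y.
sample_locations = {"A": (0.0, 10.0), "B": (10.0, 30.0), "C": (50.0, 10.0)}
children = {"X": ["A", "B"], "Y": ["X", "C"]}

locations = estimate_locations(children, sample_locations)
print(locations["X"])  # midpoint of A and B
print(locations["Y"])  # midpoint of X and C
```

Because each estimate depends only on the nodes below it, the whole network can be processed in a single pass from the samples back toward the oldest ancestors.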
While the genealogy map is already an incredibly rich resource, the team plans to make it even more comprehensive by continuing to incorporate genetic data as it becomes available.
Since tree sequences store data in a highly efficient way, the dataset could easily accommodate millions of additional genomes.
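Part of the reason this kind of storage is compact is that neighbouring trees along the genome are usually almost identical, so a parent-child link that persists across many trees only needs to be recorded once, together with the genomic interval over which it holds. The Python sketch below illustrates the saving on a toy example. The `Edge` record and the helper functions are hypothetical names chosen for this illustration, and the data are invented.

```python
from collections import namedtuple

# One record per parent-child link, valid over the interval [left, right).
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])

# Hypothetical toy data: four samples (0-3) and three ancestors (4-6).
# Only sample 1 changes parent at position 50; every other link is shared.
edges = [
    Edge(0, 100, 4, 0),
    Edge(0, 50, 4, 1),
    Edge(50, 100, 5, 1),
    Edge(0, 100, 5, 2),
    Edge(0, 100, 5, 3),
    Edge(0, 100, 6, 4),
    Edge(0, 100, 6, 5),
]


def local_parent_map(edges, position):
    """Rebuild the child -> parent map of the tree at one genome position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def breakpoints(edges):
    """Sorted positions at which the local tree can change."""
    return sorted({e.left for e in edges} | {e.right for e in edges})


def separate_tree_cost(edges):
    """Number of links needed if every local tree were stored on its own."""
    points = breakpoints(edges)
    return sum(len(local_parent_map(edges, left)) for left in points[:-1])


print(len(edges))                 # links stored with shared intervals
print(separate_tree_cost(edges))  # links stored tree by tree
```

Here the shared representation needs 7 records where two separately stored trees would need 12, and the gap widens as more genomes and more genomic regions are added.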
This research is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become more accurate, and we will eventually be able to produce a single, unified map that explains the origin of all the human genetic variation we see today.
While humans are the focus of this study, the method is valid for most living organisms, from orangutans to bacteria. It could be particularly beneficial in medical genetics, in separating true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.
Source: Medindia