Gathering genetic data has become easier and more cost-effective in recent years. As a result, thousands of genomes have been generated from both modern humans and ancient ancestors. One of the major challenges in putting that information into a cohesive story is the variation in how genome information is stored.
Yan Wong from the Big Data Institute at the University of Oxford, and colleagues, have crafted a system for integrating disparate genetic data sets into a cohesive collection to generate a story of humanity’s migration across the Earth. This new tool was announced and described in the journal Science.
While thousands of ancient genomes and hundreds of thousands of modern ones have been collected in recent decades, the team used only about 3,600 to create an overarching family tree of humanity. Those sequences were collected from a number of publicly available data sets.
“There are three public DNA data sets that have good coverage and sample people from all over the world,” Wong told SYFY WIRE. “The oldest one is called the Thousand Genomes Project; there’s the Simons Genome Diversity Project; and there’s the Human Genome Diversity Project. Our method was developed to sort of smooth all of those together.”
In addition to the 3,601 modern genomes included in the project, the team also included DNA from eight ancient humans. Those samples included three Neanderthals from different points in time, one Denisovan, and four from a Siberian group called the Afanasievo, who lived approximately 4,600 years ago. Gathering additional ancient DNA is a challenge because genetic material degrades as it ages. However, the team’s algorithm was able to work around the gaps in the data sets using a technique called imputation.
“With the ancient DNA, it’s fragmentary enough that you might need to be able to deal with missing data. Our method copes with missing data very well. If you’ve got the ancestry, you can work out where it came from and make a good guess of what the genetic diversity looks like, even if you don’t have the genetic information,” Wong said.
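To make the idea of coping with missing data concrete, here is a minimal Python sketch of imputation. It is a toy illustration and not the team's method: the function name `impute_from_relative`, the five-site genotypes, and the rule of copying from the single best-matching relative are all assumptions made up for this example.

```python
from typing import Optional

# A genotype is a list of alleles (0 or 1) at each site; None marks a missing call.
# All names and values here are hypothetical and exist only for illustration.
Genotype = list[Optional[int]]


def impute_from_relative(fragment: Genotype, relatives: list[Genotype]) -> Genotype:
    """Fill missing sites by copying from the relative that agrees
    with the fragment at the most observed sites."""

    def matches(relative: Genotype) -> int:
        # Count agreement only at sites where the fragment has data.
        return sum(1 for a, b in zip(fragment, relative) if a is not None and a == b)

    best = max(relatives, key=matches)
    # Keep observed alleles; borrow the relative's allele where data is missing.
    return [b if a is None else a for a, b in zip(fragment, best)]


# A fragmentary "ancient" sample with two missing sites.
ancient = [0, None, 1, None, 1]
# Two complete reference sequences standing in for known relatives.
panel = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
]

filled = impute_from_relative(ancient, panel)
print(filled)  # [0, 0, 1, 1, 1]
```

The sample agrees with the first reference sequence at every site where it has data, so the gaps are filled from that sequence. Real imputation methods weigh many relatives and account for uncertainty; this sketch only shows why knowing the ancestry makes a good guess possible.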
That’s essentially how the system works. It takes in all of the information it has and makes comparisons. From there it can draw connections about how different populations might have moved around the planet over time. Importantly, the team didn’t feed any preexisting notions into the system to guide its predictions. That means scientists can let the data speak for itself, but it also means that the system sometimes makes unexpected predictions that might not represent reality, especially where gaps in the data exist.
The tool predicted, for instance, that a population of humans somehow ended up in the middle of the Atlantic Ocean, during a time when they shouldn’t have had any way to cross large bodies of water.
“We haven’t told it that it’s more difficult to move on water than it is on land,” Wong said. “We also don’t have any DNA from northern North America. The nearest we have are from Mexico. So, the algorithm is seeing there’s a load of people in Siberia and then they moved to Mexico. How did they get there?”
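A rough way to see why this happens is to place an ancestor at the geographic midpoint of two descendant groups, with no knowledge of coastlines. The Python sketch below makes that assumption purely for illustration; it is not the team's code, and the function names and the approximate coordinates for Siberia and Mexico are hypothetical.

```python
import math


def to_xyz(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Convert latitude/longitude in degrees to a unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def midpoint(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Great-circle midpoint of two (lat, lon) points, ignoring land and sea."""
    ax, ay, az = to_xyz(*a)
    bx, by, bz = to_xyz(*b)
    x, y, z = ax + bx, ay + by, az + bz
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)


def great_circle_km(a: tuple[float, float], b: tuple[float, float],
                    radius_km: float = 6371.0) -> float:
    """Haversine distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Approximate, hypothetical sample locations (latitude, longitude).
siberia = (60.0, 100.0)
mexico = (19.0, -99.0)

ancestor = midpoint(siberia, mexico)
print(ancestor)
print(great_circle_km(siberia, ancestor), great_circle_km(ancestor, mexico))
```

Nothing in the calculation checks whether the resulting point is on land or in open water, which is the kind of constraint Wong says the algorithm was never given.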
As more genetic information becomes available, the tool’s ability to accurately predict the movements of our species will improve. The above problem, for example, could be corrected by adding genetic data from Indigenous North American populations, filling in that gap.
Scientists are planning to make the tool freely available so that other labs, or even individuals, can input their own information. If you already have your own genetic data from a commercial company like Ancestry or 23andMe, you could conceivably enter your information and see how your ancestors moved throughout the world over time. But the applications of this tool aren’t limited to humans.
“It applies to any set of DNA sequences you care to come up with. That ranges from pathogens and parasites — you might want to look at mosquitos and malaria — or viruses and fungal infections in crop plants. Or you might want to look at organisms of conservation interest,” Wong said.
Before long, we might be able to see not just where our great-great grandparents came from, but where our ancestors lived many thousands of years ago, tracing all the way back to the origin of our species. We might find that we have even more in common than we realize.