The first human genome was mapped in 2001 as part of the Human Genome Project, but researchers knew that it was neither complete nor completely accurate. Now, scientists have produced the most completely sequenced human genome to date, filling in the gaps and correcting the errors in the previous version.
The sequence is the most complete reference genome for any mammal to date. Findings from six new papers describing the genome, published in Science, will lead to a deeper understanding of human evolution and potentially reveal new targets for tackling a wide range of diseases.
A more accurate human genome
“The Human Genome Project relied on DNA obtained through blood draws; that was the technology at the time,” said Adam Phillippy, head of genome informatics at the National Human Genome Research Institute (NHGRI) and senior author of one of the new papers. “The techniques at the time introduced errors and gaps that have persisted all these years. It’s great to fill those gaps and correct those mistakes.”
Michael Schatz, professor of computer science and biology at Johns Hopkins University, is another senior author of the same paper.
This work is the result of the Telomere to Telomere consortium, which is supported by NHGRI and involves genetics and computational biology experts from dozens of institutes around the world. The team focused on filling in the 8% of the human genome that remained a genetic black hole after the first draft sequence. Since then, geneticists have been trying to fill in those missing pieces little by little. The latest group of researchers has added roughly a whole chromosome’s worth of new sequence, representing another 200 million base pairs (the letters that make up the genome) and 1,956 new genes.
“Since the Human Genome Project [in 2001], we have claimed victory a few times over the past two decades,” said Evan Eichler, professor of genome sciences at the University of Washington and another senior author of one of the papers. Eichler, who was also involved in mapping the original sequence, says the emphasis on what was sequenced was different this time around. “Although the original goal of the Human Genome Project was to sequence and orient every base pair, that could not be achieved because the technology was not yet advanced enough. So we completed the parts that could be done.”
The promise of new discoveries
The newly sequenced regions include previously inaccessible parts such as the centromeres, the tightly wound central portions of chromosomes that keep the long double strands of DNA organized as they are unwound, little by little, to be copied and divided between two cells when a single cell splits. These regions are important for normal human development and also play a role in brain development and neurodegenerative diseases. “It is one of the great mysteries of biology that all eukaryotes (all plants, animals, people, trees, flowers, and higher organisms) have centromeres. It’s really a fundamental part of how DNA copies and how chromosomes are organized and how cells divide. But that’s a big paradox, because even though this function has been around for billions of years, it has been almost impossible to study because we didn’t have a centromere sequence to look at,” said Schatz. “Now we finally do.”
Scientists were also able to sequence long stretches of DNA containing repetitive sequences, which genetic experts initially thought were akin to copying errors and dismissed as so-called “junk DNA.” However, these repetitive sequences may play a role in some human diseases. “Just because a sequence is repetitive doesn’t mean it’s junk,” says Eichler. He points out that important genes are embedded in these repetitive regions: genes that contribute to the machinery for making proteins, genes that regulate how cells divide and split their DNA evenly between their two daughter cells, and human-specific genes that distinguish humans from our closest evolutionary relatives, the other primates. For example, in one of the papers, the researchers found that primates have different copy numbers of these repeat regions than humans, and that the repeats occur in different parts of the genome.
“These are some of the most important functions needed to live and to make us human,” says Eichler. “Obviously, if you remove these genes, you won’t live. It’s not trivial for me.”
Working out what these repeats mean, if anything, and what to make of the sequences of previously unsequenced regions such as the centromeres, is a separate task, said Deanna Church, vice president of Inscripta, a genome engineering company. Church wrote a commentary accompanying the scientific articles. Having the full sequence of the human genome is different from deciphering it; she estimates that scientists have deciphered only about half of what the human genome does.
There is still room for improvement. The new sequence essentially comes from half a human, that is, half the genetic content normally found in a person’s DNA. Each person has two sets of chromosomes, one from the mother and one from the father. Each set contains slightly different versions of genes, essentially giving us two sets of genes. Assembling those two genomes is no trivial task, and the challenge stymied the original Human Genome Project and left parts of it missing. Sequencing technology at the time couldn’t easily separate the mother’s and father’s DNA copies, so when scientists tried to match up certain sections, thinking they were working with the mother’s chromosome, for example, they could run into regions that did not match because they were actually working with the father’s chromosome. “It’s similar to having two puzzles in the same box,” says Phillippy. “You have to sort out what the difference is and reconstruct both.”
For this new sequence, the scientists took advantage of a fertilization error in which the resulting embryo contained only the father’s chromosomes. The resulting growth was removed in the early 2000s and has since been maintained in the laboratory as a cell line that survives despite its abnormal chromosomal content. That made it easier for the teams to assemble the genome because they were essentially working with a single genetic puzzle.
Ultimately, however, researchers will need a more complete human genome with complete sequences of both the maternal and paternal chromosomes. That should come soon. Phillippy and others are working with trios of DNA samples, from volunteers and their parents, so that scientists can separate the mother’s sequence from the father’s and essentially assemble the two genomes separately. The research teams expect to complete this so-called diploid human genome sequence by the end of the year.
Winston Timp, associate professor of biomedical engineering at Johns Hopkins and co-author of one of the papers, said: “The new genome assembly is beneficial because it provides a more accurate map to understand what the data we have from earlier means.” That includes finding new variants that can distinguish healthy people from those affected by disease, as well as variants that could put people at higher risk for certain diseases.
“We discovered millions of previously unknown genetic variants in samples from thousands of individuals whose genomes had already been sequenced,” said Rajiv McCoy, assistant professor of biology at Johns Hopkins and another co-author. “We will have to wait for future work to learn more about their link to disease, but a big focus of the work now will be on trying to discover new genetic variants that could not be identified before.”
Even with a more complete version of the human genome, scientists probably won’t call for replacing the old version, despite its flaws and gaps. That’s because decades of research in human genetics have made the older version far more thoroughly annotated than the new one, similar to the difference between a copy of your favorite book filled with your handwritten notes and margin marks, and a fresh copy from the bookstore. “A genome is only as good as its annotation,” says Eichler. “All clinical and research labs have built decades’ worth of data based on old, flawed genomes. To do all that work again for any individual lab would be appalling.” He predicts that more labs will gradually transition to working with the new genome by first comparing smaller datasets in test runs, to see how much richer and more comprehensive the information they generate from the newer genome is. Like the original human genome, the new genome is also posted on a public database for any scientist to use. “For now, both genomes will be left intact so there will be no substitutions,” he said.
In the coming years, researchers will also begin creating more complete genomes, using both maternal and paternal DNA, to help scientists identify the best targets for new therapies and improve understanding of human development and evolution. The more genomes they have, the more likely important patterns will emerge that could lead to new understandings of human diseases and new treatments for them. Ultimately, the goal is for each person to be able to have their complete genome sequenced as part of their medical record, which will allow physicians to compare those sequences with reference sequences and identify which variants may contribute to specific diseases.
“This is introducing the world to an extra chromosome which we have never seen before,” said Karen Miga, assistant professor of biomolecular engineering at the University of California, Santa Cruz and senior author of one of the papers. “We have new landscapes, new sequences, opportunities and the promise of new discoveries.”
The excitement can be felt in the genomics and medical communities. “Hallelujah, we’ve finally completed a human genome, but the best is yet to come,” Eichler told a news conference. “No one should see this as the end, but rather the beginning of a transformation not only in genomics research but also in clinical medicine.”