1
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024:10.1038/s41586-024-07473-2. [PMID: 38811727 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes-the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility1. The X chromosome is vital for reproduction and cognition2. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements-owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Collapse
Affiliation(s)
| | - Brandon D Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health and Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Yong-Hwee E Loh
- University of California Santa Barbara, Santa Barbara, CA, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Kelly M McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Joana L Rocha
- University of California Berkeley, Berkeley, CA, USA
| | - Fedor Ryabov
- Masters Program in National Research, University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Mario Ventura
- Università degli Studi di Bari Aldo Moro, Bari, Italy
| | | | - Alice C Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan E Eichler
- University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
2
|
Huang J, Zhang Y, Li Y, Xing M, Lei C, Wang S, Nie Y, Wang Y, Zhao M, Han Z, Sun X, Zhou H, Wang Y, Zheng X, Xiao X, Fan W, Liu Z, Guo W, Zhang L, Cheng Y, Qian Q, He H, Yang Q, Qiao W. Haplotype-resolved gapless genome and chromosome segment substitution lines facilitate gene identification in wild rice. Nat Commun 2024; 15:4573. [PMID: 38811581 DOI: 10.1038/s41467-024-48845-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Accepted: 05/15/2024] [Indexed: 05/31/2024] Open
Abstract
The abundant genetic variation harbored by wild rice (Oryza rufipogon) has provided a reservoir of useful genes for rice breeding. However, the genome of wild rice has not yet been comprehensively assessed. Here, we report the haplotype-resolved gapless genome assembly and annotation of wild rice Y476. In addition, we develop two sets of chromosome segment substitution lines (CSSLs) using Y476 as the donor parent and cultivated rice as the recurrent parents. By analyzing the gapless reference genome and CSSL population, we identify 254 QTLs associated with agronomic traits, biotic and abiotic stresses. We clone a receptor-like kinase gene associated with rice blast resistance and confirm its wild rice allele improves rice blast resistance. Collectively, our study provides a haplotype-resolved gapless reference genome and demonstrates a highly efficient platform for gene identification from wild rice.
Collapse
Affiliation(s)
- Jingfen Huang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yilin Zhang
- School of Advanced Agriculture Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing, China
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong, China
| | - Yapeng Li
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
- Hainan Academy of Agricultural Sciences, Haikou, Hainan, China
| | - Meng Xing
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Cailin Lei
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Shizhuang Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Yamin Nie
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Yanyan Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Mingchao Zhao
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
- Hainan Academy of Agricultural Sciences, Haikou, Hainan, China
| | - Zhenyun Han
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xianjun Sun
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Han Zhou
- School of Advanced Agriculture Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing, China
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong, China
| | - Yan Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong, China
| | - Xiaoming Zheng
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Xiaorong Xiao
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
- Hainan Academy of Agricultural Sciences, Haikou, Hainan, China
| | - Weiya Fan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Ziran Liu
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Wenlong Guo
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lifang Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yunlian Cheng
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Qian Qian
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China
| | - Hang He
- School of Advanced Agriculture Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing, China.
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong, China.
| | - Qingwen Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China.
| | - Weihua Qiao
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, China.
| |
Collapse
|
3
|
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024:10.1038/s41576-024-00718-w. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Collapse
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK.
| |
Collapse
|
4
|
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques ( Macaca fascicularis ) and rhesus macaques ( M. mulatta ) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1 ). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO ). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2 ). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1 , SEMA3F , and HOXD13 ). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
Collapse
|
5
|
Koren S, Bao Z, Guarracino A, Ou S, Goodwin S, Jenike KM, Lucas J, McNulty B, Park J, Rautiainen M, Rhie A, Roelofs D, Schneiders H, Vrijenhoek I, Nijbroek K, Ware D, Schatz MC, Garrison E, Huang S, McCombie WR, Miga KH, Wittenberg AH, Phillippy AM. Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585294. [PMID: 38529488 PMCID: PMC10962732 DOI: 10.1101/2024.03.15.585294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/27/2024]
Abstract
The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely-studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the ultra-long reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and has the potential to provide a single-instrument solution for the reconstruction of complete genomes.
Collapse
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Zhigui Bao
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, BadenWürttemberg, Germany
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Human Technopole, Milan, Italy
| | - Shujun Ou
- Ohio State University, Columbus, OH, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Katharine M. Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Julian Lucas
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Brandy McNulty
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jimin Park
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Dick Roelofs
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
| | | | - Ilse Vrijenhoek
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
| | - Koen Nijbroek
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan, China
| | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Adam M. Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
Collapse
|
6
|
Wang H, Chang TS, Dombroski BA, Cheng PL, Patil V, Valiente-Banuet L, Farrell K, Mclean C, Molina-Porcel L, Rajput A, De Deyn PP, Bastard NL, Gearing M, Kaat LD, Swieten JCV, Dopper E, Ghetti BF, Newell KL, Troakes C, de Yébenes JG, Rábano-Gutierrez A, Meller T, Oertel WH, Respondek G, Stamelou M, Arzberger T, Roeber S, Müller U, Hopfner F, Pastor P, Brice A, Durr A, Ber IL, Beach TG, Serrano GE, Hazrati LN, Litvan I, Rademakers R, Ross OA, Galasko D, Boxer AL, Miller BL, Seeley WW, Deerlin VMV, Lee EB, White CL, Morris H, de Silva R, Crary JF, Goate AM, Friedman JS, Leung YY, Coppola G, Naj AC, Wang LS, Dickson DW, Höglinger GU, Schellenberg GD, Geschwind DH, Lee WP. Whole-Genome Sequencing Analysis Reveals New Susceptibility Loci and Structural Variants Associated with Progressive Supranuclear Palsy. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.12.28.23300612. [PMID: 38234807 PMCID: PMC10793533 DOI: 10.1101/2023.12.28.23300612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/19/2024]
Abstract
Background Progressive supranuclear palsy (PSP) is a rare neurodegenerative disease characterized by the accumulation of aggregated tau proteins in astrocytes, neurons, and oligodendrocytes. Previous genome-wide association studies for PSP were based on genotype array, therefore, were inadequate for the analysis of rare variants as well as larger mutations, such as small insertions/deletions (indels) and structural variants (SVs). Method In this study, we performed whole genome sequencing (WGS) and conducted association analysis for single nucleotide variants (SNVs), indels, and SVs, in a cohort of 1,718 cases and 2,944 controls of European ancestry. Of the 1,718 PSP individuals, 1,441 were autopsy-confirmed and 277 were clinically diagnosed. Results Our analysis of common SNVs and indels confirmed known genetic loci at MAPT, MOBP, STX6, SLCO1A2, DUSP10, and SP1, and further uncovered novel signals in APOE, FCHO1/MAP1S, KIF13A, TRIM24, TNXB, and ELOVL1. Notably, in contrast to Alzheimer's disease (AD), we observed the APOE ε2 allele to be the risk allele in PSP. Analysis of rare SNVs and indels identified significant association in ZNF592 and further gene network analysis identified a module of neuronal genes dysregulated in PSP. Moreover, seven common SVs associated with PSP were observed in the H1/H2 haplotype region (17q21.31) and other loci, including IGH, PCMT1, CYP2A13, and SMCP. In the H1/H2 haplotype region, there is a burden of rare deletions and duplications (P = 6.73×10-3) in PSP. Conclusions Through WGS, we significantly enhanced our understanding of the genetic basis of PSP, providing new targets for exploring disease mechanisms and therapeutic interventions.
Collapse
Affiliation(s)
- Hui Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Timothy S Chang
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Po-Liang Cheng
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Vishakha Patil
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Leopoldo Valiente-Banuet
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kurt Farrell
- Department of Pathology, Department of Artificial Intelligence & Human Health, Nash Family, Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Friedman Brain, Institute, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Catriona Mclean
- Victorian Brain Bank, The Florey Institute of Neuroscience and Mental Health, Parkville, Victoria, Australia
| | - Laura Molina-Porcel
- Alzheimer's disease and other cognitive disorders unit. Neurology Service, Hospital Clínic, Fundació Recerca Clínic Barcelona (FRCB). Institut d'Investigacions Biomediques August Pi I Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Neurological Tissue Bank of the Biobanc-Hospital Clínic-IDIBAPS, Barcelona, Spain
| | - Alex Rajput
- Movement Disorders Program, Division of Neurology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Peter Paul De Deyn
- Laboratory of Neurochemistry and Behavior, Experimental Neurobiology Unit, University of Antwerp, Wilrijk (Antwerp), Belgium
- Department of Neurology, University Medical Center Groningen, NL-9713 AV Groningen, Netherlands
| | | | - Marla Gearing
- Department of Pathology and Laboratory Medicine and Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
| | | | | | - Elise Dopper
- Netherlands Brain Bank and Erasmus University, Netherlands
| | - Bernardino F Ghetti
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Kathy L Newell
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Claire Troakes
- London Neurodegenerative Diseases Brain Bank, King's College London, London, UK
| | | | - Alberto Rábano-Gutierrez
- Fundación CIEN (Centro de Investigación de Enfermedades Neurológicas) - Centro Alzheimer Fundación Reina Sofía, Madrid, Spain
| | - Tina Meller
- Department of Neurology, Philipps-Universität, Marburg, Germany
| | | | - Gesine Respondek
- German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
| | - Maria Stamelou
- Parkinson's disease and Movement Disorders Department, HYGEIA Hospital, Athens, Greece
- European University of Cyprus, Nicosia, Cyprus
| | - Thomas Arzberger
- Department of Psychiatry and Psychotherapy, University Hospital Munich, Ludwig-Maximilians-University Munich, Germany
- Center for Neuropathology and Prion Research, Ludwig-Maximilians-University Munich, Germany
| | | | | | - Franziska Hopfner
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Pau Pastor
- Unit of Neurodegenerative diseases, Department of Neurology, University Hospital Germans Trias i Pujol, Badalona, Barcelona, Spain
- Neurosciences, The Germans Trias i Pujol Research Institute (IGTP) Badalona, Badalona, Spain
| | - Alexis Brice
- Sorbonne Université, Paris Brain Institute - Institut du Cerveau - ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
| | - Alexandra Durr
- Sorbonne Université, Paris Brain Institute - Institut du Cerveau - ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
| | - Isabelle Le Ber
- Sorbonne Université, Paris Brain Institute - Institut du Cerveau - ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
| | | | | | | | - Irene Litvan
- Department of Neuroscience, University of California, San Diego, CA, USA
| | - Rosa Rademakers
- VIB Center for Molecular Neurology, University of Antwerp, Belgium
- Department of Neuroscience, Mayo Clinic Jacksonville, FL, USA
| | - Owen A Ross
- Department of Neuroscience, Mayo Clinic Jacksonville, FL, USA
| | - Douglas Galasko
- Department of Neuroscience, University of California, San Diego, CA, USA
| | - Adam L Boxer
- Memory and Aging Center, University of California, San Francisco, CA, USA
| | - Bruce L Miller
- Memory and Aging Center, University of California, San Francisco, CA, USA
| | - Willian W Seeley
- Memory and Aging Center, University of California, San Francisco, CA, USA
| | - Vivanna M Van Deerlin
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Edward B Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Center for Neurodegenerative Disease Research, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
| | - Charles L White
- University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Huw Morris
- Departmento of Clinical and Movement Neuroscience, University College of London, London, UK
| | - Rohan de Silva
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, London, UK
| | - John F Crary
- Department of Pathology, Department of Artificial Intelligence & Human Health, Nash Family, Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Friedman Brain, Institute, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alison M Goate
- Department of Genetics and Genomic Sciences, New York, NY, USA; Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jeffrey S Friedman
- Friedman Bioventure, Inc., Del Mar, CA, USA; Department of Genetics and Genomic Sciences, New York, NY, USA
| | - Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Giovanni Coppola
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
| | - Adam C Naj
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | | | - Günter U Höglinger
- Department of Neurology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) München; German Center for Neurodegenerative Diseases (DZNE), Munich, Germany; and Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Daniel H Geschwind
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
Collapse
|
7
|
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.01.11.24301165. [PMID: 38260490 PMCID: PMC10802764 DOI: 10.1101/2024.01.11.24301165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely-collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that only have detectable expression in a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.
Collapse
Affiliation(s)
- Rachel A. Ungar
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Pagé C. Goddard
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Tanner D. Jensen
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | | | - Kevin S. Smith
- Department of Pathology, School of Medicine, Stanford University
| | | | | | - Devon E. Bonner
- Department of Pediatrics, School of Medicine, Stanford University
- Stanford Center for Undiagnosed Diseases, Stanford University
| | | | - Matthew T. Wheeler
- Department of Cardiovascular Medicine, School of Medicine, Stanford University
| | - Stephen B. Montgomery
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
| |
Collapse
|
8
|
Bett VK, Macon A, Vicoso B, Elkrewi M. Chromosome-Level Assembly of Artemia franciscana Sheds Light on Sex Chromosome Differentiation. Genome Biol Evol 2024; 16:evae006. [PMID: 38245839 PMCID: PMC10827361 DOI: 10.1093/gbe/evae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/27/2023] [Accepted: 12/21/2023] [Indexed: 01/22/2024] Open
Abstract
Since the commercialization of brine shrimp (genus Artemia) in the 1950s, this lineage, and in particular the model species Artemia franciscana, has been the subject of extensive research. However, our understanding of the genetic mechanisms underlying various aspects of their reproductive biology, including sex determination, is still lacking. This is partly due to the scarcity of genomic resources for Artemia species and crustaceans in general. Here, we present a chromosome-level genome assembly of A. franciscana (Kellogg 1906), from the Great Salt Lake, United States. The genome is 1 GB, and the majority of the genome (81%) is scaffolded into 21 linkage groups using a previously published high-density linkage map. We performed coverage and FST analyses using male and female genomic and transcriptomic reads to quantify the extent of differentiation between the Z and W chromosomes. Additionally, we quantified the expression levels in male and female heads and gonads and found further evidence for dosage compensation in this species.
Collapse
Affiliation(s)
| | - Ariana Macon
- Institute of Science and Technology Austria (ISTA), Klosterneuburg 3400, Austria
| | - Beatriz Vicoso
- Institute of Science and Technology Austria (ISTA), Klosterneuburg 3400, Austria
| | - Marwan Elkrewi
- Institute of Science and Technology Austria (ISTA), Klosterneuburg 3400, Austria
| |
Collapse
|
9
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Collapse
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA;
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
10
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes-the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Collapse
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
Collapse
|
11
|
Ferrer A, Stephens ZD, Kocher JPA. Experimental and Computational Approaches to Measure Telomere Length: Recent Advances and Future Directions. Curr Hematol Malig Rep 2023; 18:284-291. [PMID: 37947937 PMCID: PMC10709248 DOI: 10.1007/s11899-023-00717-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/30/2023] [Indexed: 11/12/2023]
Abstract
PURPOSE OF REVIEW The length of telomeres, protective structures at the chromosome ends, is a well-established biomarker for pathological conditions including multisystemic syndromes called telomere biology disorders. Approaches to measure telomere length (TL) differ on whether they estimate average, distribution, or chromosome-specific TL, and each presents their own advantages and limitations. RECENT FINDINGS The development of long-read sequencing and publication of the telomere-to-telomere human genome reference has allowed for scalable and high-resolution TL estimation in pre-existing sequencing datasets but is still impractical as a dedicated TL test. As sequencing costs continue to fall and strategies for selectively enriching telomere regions prior to sequencing improve, these approaches may become a promising alternative to classic methods. Measurement methods rely on probe hybridization, qPCR or more recently, computational methods using sequencing data. Refinements of existing techniques and new approaches have been recently developed but a test that is accurate, simple, and scalable is still lacking.
Collapse
Affiliation(s)
- Alejandro Ferrer
- Division of Hematology, Mayo Clinic, Rochester, 200 First Street SW, Rochester, MN, USA.
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA.
| | | | | |
Collapse
|
12
|
He Y, Chu Y, Guo S, Hu J, Li R, Zheng Y, Ma X, Du Z, Zhao L, Yu W, Xue J, Bian W, Yang F, Chen X, Zhang P, Wu R, Ma Y, Shao C, Chen J, Wang J, Li J, Wu J, Hu X, Long Q, Jiang M, Ye H, Song S, Li G, Wei Y, Xu Y, Ma Y, Chen Y, Wang K, Bao J, Xi W, Wang F, Ni W, Zhang M, Yu Y, Li S, Kang Y, Gao Z. T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1085-1100. [PMID: 37595788 PMCID: PMC11082261 DOI: 10.1016/j.gpb.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Revised: 08/01/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality, and the recently released telomere-to-telomere (T2T) version - T2T-CHM13 - reaches its highest level of continuity and accuracy after 20 years of effort by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. Here, to provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we assembled the genome of a male Han Chinese individual, T2T-YAO, which includes T2T assemblies of all the 22 + X + M and 22 + Y chromosomes in both haploids. The quality of T2T-YAO is much better than those of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches the top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼ 330-Mb exclusive sequences, ∼ 3100 unique genes, and tens of thousands of nucleotide and structural variations as compared with CHM13, highlighting the necessity of a population-stratified reference genome. The construction of T2T-YAO, an accurate and authentic representative of the Chinese population, would enable precise delineation of genomic variations and advance our understandings in the hereditability of diseases and phenotypes, especially within the context of the unique variations of the Chinese population.
Collapse
Affiliation(s)
- Yukun He
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China
| | - Yanan Chu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Shuming Guo
- Linfen Clinical Medicine Research Center, Linfen 041000, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China
| | - Jiang Hu
- GrandOmics Biosciences Co., Ltd, Wuhan 430076, China
| | - Ran Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yali Zheng
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Xinqian Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Zhenglin Du
- Institute of PSI Genomics, Wenzhou 325024, China
| | - Lili Zhao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wenyi Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Jianbo Xue
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wenjie Bian
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Feifei Yang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Xi Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Pingan Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Rihan Wu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yifan Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Changjun Shao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jing Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jiwei Li
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Jing Wu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Xiaoyi Hu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Qiuyue Long
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Mingzheng Jiang
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Hongli Ye
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Shixu Song
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Guangyao Li
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yue Wei
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yu Xu
- Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China
| | - Yanliang Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yanwen Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Keqiang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Jing Bao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wen Xi
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Fang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wentao Ni
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Moqin Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yan Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Shengnan Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yu Kang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100490, China.
| | - Zhancheng Gao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China.
| |
Collapse
|
13
|
Delorean EE, Youngblood RC, Simpson SA, Schoonmaker AN, Scheffler BE, Rutter WB, Hulse-Kemp AM. Representing true plant genomes: haplotype-resolved hybrid pepper genome with trio-binning. FRONTIERS IN PLANT SCIENCE 2023; 14:1184112. [PMID: 38034563 PMCID: PMC10687446 DOI: 10.3389/fpls.2023.1184112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/11/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
As sequencing costs decrease and availability of high fidelity long-read sequencing increases, generating experiment specific de novo genome assemblies becomes feasible. In many crop species, obtaining the genome of a hybrid or heterozygous individual is necessary for systems that do not tolerate inbreeding or for investigating important biological questions, such as hybrid vigor. However, most genome assembly methods that have been used in plants result in a merged single sequence representation that is not a true biologically accurate representation of either haplotype within a diploid individual. The resulting genome assembly is often fragmented and exhibits a mosaic of the two haplotypes, referred to as haplotype-switching. Important haplotype level information, such as causal mutations and structural variation is therefore lost causing difficulties in interpreting downstream analyses. To overcome this challenge, we have applied a method developed for animal genome assembly called trio-binning to an intra-specific hybrid of chili pepper (Capsicum annuum L. cv. HDA149 x Capsicum annuum L. cv. HDA330). We tested all currently available softwares for performing trio-binning, combined with multiple scaffolding technologies including Bionano to determine the optimal method of producing the best haplotype-resolved assembly. Ultimately, we produced highly contiguous biologically true haplotype-resolved genome assemblies for each parent, with scaffold N50s of 266.0 Mb and 281.3 Mb, with 99.6% and 99.8% positioned into chromosomes respectively. The assemblies captured 3.10 Gb and 3.12 Gb of the estimated 3.5 Gb chili pepper genome size. These assemblies represent the complete genome structure of the intraspecific hybrid, as well as the two parental genomes, and show measurable improvements over the currently available reference genomes. Our manuscript provides a valuable guide on how to apply trio-binning to other plant genomes.
Collapse
Affiliation(s)
- Emily E. Delorean
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
| | - Ramey C. Youngblood
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, United States
| | - Sheron A. Simpson
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Stoneville, MS, United States
| | - Ashley N. Schoonmaker
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
| | - Brian E. Scheffler
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Stoneville, MS, United States
| | - William B. Rutter
- US Vegetable Laboratory, United States Department of Agriculture - Agriculture Research Service (USDA-ARS), Charleston, SC, United States
| | - Amanda M. Hulse-Kemp
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
| |
Collapse
|
14
|
Zeng T, He Z, He J, Lv W, Huang S, Li J, Zhu L, Wan S, Zhou W, Yang Z, Zhang Y, Luo C, He J, Wang C, Wang L. The telomere-to-telomere gap-free reference genome of wild blueberry ( Vaccinium duclouxii) provides its high soluble sugar and anthocyanin accumulation. HORTICULTURE RESEARCH 2023; 10:uhad209. [PMID: 38023474 PMCID: PMC10681038 DOI: 10.1093/hr/uhad209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Revised: 10/19/2023] [Indexed: 12/01/2023]
Abstract
Vaccinium duclouxii, endemic to southwestern China, is a berry-producing shrub or small tree belonging to the Ericaceae family, with high nutritive, medicinal, and ornamental value, abundant germplasm resources, and good edible properties. In addition, V. duclouxii exhibits strong tolerance to adverse environmental conditions, making it a promising candidate for research and offering wide-ranging possibilities for utilization. However, the lack of V. duclouxii genome sequence has hampered its development and utilization. Here, a high-quality telomere-to-telomere genome sequence of V. duclouxii was de novo assembled and annotated. All of 12 chromosomes were assembled into gap-free single contigs, providing the highest integrity and quality assembly reported so far for blueberry. The V. duclouxii genome is 573.67 Mb, which encodes 41 953 protein-coding genes. Combining transcriptomics and metabolomics analyses, we have uncovered the molecular mechanisms involved in sugar and acid accumulation and anthocyanin biosynthesis in V. duclouxii. This provides essential molecular information for further research on the quality of V. duclouxii. Moreover, the high-quality telomere-to-telomere assembly of the V. duclouxii genome will provide insights into the genomic evolution of Vaccinium and support advancements in blueberry genetics and molecular breeding.
Collapse
Affiliation(s)
- Tuo Zeng
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Zhijiao He
- Institute of Alpine Economic Plant, Yunnan Academy of Agricultural Sciences, Lijiang 674199, Yunnan, China
| | - Jiefang He
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Wei Lv
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Shixiang Huang
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Jiawen Li
- School of Advanced Agricultural Sciences, Peking University, 100871 Beijing, China
| | - Liyong Zhu
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
| | - Shuang Wan
- Wuhan Benagen Technology Co., Ltd, Wuhan 430070, China
| | - Wanfei Zhou
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
| | - Zhengsong Yang
- Institute of Alpine Economic Plant, Yunnan Academy of Agricultural Sciences, Lijiang 674199, Yunnan, China
| | - Yatao Zhang
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Chong Luo
- School of Life Sciences, Guizhou Normal University, Guiyang 550000, China
| | - Jiawei He
- Institute of Alpine Economic Plant, Yunnan Academy of Agricultural Sciences, Lijiang 674199, Yunnan, China
| | - Caiyun Wang
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
| | - Liangsheng Wang
- Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
Collapse
|
15
|
Schelkunov MI. Mabs, a suite of tools for gene-informed genome assembly. BMC Bioinformatics 2023; 24:377. [PMID: 37794322 PMCID: PMC10548655 DOI: 10.1186/s12859-023-05499-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Despite constantly improving genome sequencing methods, error-free eukaryotic genome assembly has not yet been achieved. Among other kinds of problems of eukaryotic genome assembly are so-called "haplotypic duplications", which may manifest themselves as cases of alleles being mistakenly assembled as paralogues. Haplotypic duplications are dangerous because they create illusions of gene family expansions and, thus, may lead scientists to incorrect conclusions about genome evolution and functioning. RESULTS Here, I present Mabs, a suite of tools that serve as parameter optimizers of the popular genome assemblers Hifiasm and Flye. By optimizing the parameters of Hifiasm and Flye, Mabs tries to create genome assemblies with the genes assembled as accurately as possible. Tests on 6 eukaryotic genomes showed that in 6 out of 6 cases, Mabs created assemblies with more accurately assembled genes than those generated by Hifiasm and Flye when they were run with default parameters. When assemblies of Mabs, Hifiasm and Flye were postprocessed by a popular tool for haplotypic duplication removal, Purge_dups, genes were better assembled by Mabs in 5 out of 6 cases. CONCLUSIONS Mabs is useful for making high-quality genome assemblies. It is available at https://github.com/shelkmike/Mabs.
Collapse
|
16
|
Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol 2023; 41:1474-1482. [PMID: 36797493 PMCID: PMC10427740 DOI: 10.1038/s41587-023-01662-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 65.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2022] [Accepted: 01/03/2023] [Indexed: 02/18/2023]
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
Collapse
Affiliation(s)
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
17
|
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miss-called with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advances of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Collapse
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
| | - Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | | | | | - Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
| | - Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
| |
Collapse
|
18
|
Kolesnikov A, Cook D, Nattestad M, McNulty B, Gorzynski J, Goenka S, Ashley EA, Jain M, Miga KH, Paten B, Chang PC, Carroll A, Shafin K. Local read haplotagging enables accurate long-read small variant calling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.07.556731. [PMID: 37745389 PMCID: PMC10515762 DOI: 10.1101/2023.09.07.556731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford nanopore technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation makes DeepVariant a universal variant calling solution for long-read sequencing platforms.
Collapse
Affiliation(s)
| | - Daniel Cook
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA
| | | | - Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
| | | | | | | | - Miten Jain
- Northeastern university, Boston, MA, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
| | | | | | | |
Collapse
|
19
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Collapse
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
20
|
Abstract
The p-arms of the five human acrocentric chromosomes bear nucleolar organizer regions (NORs) comprising ribosomal gene (rDNA) repeats that are organized in a homogeneous tandem array and transcribed in a telomere-to-centromere direction. Precursor ribosomal RNA transcripts are processed and assembled into ribosomal subunits, the nucleolus being the physical manifestation of this process. I review current understanding of nucleolar chromosome biology and describe current exploration into a role for the NOR chromosomal context. Full DNA sequences for acrocentric p-arms are now emerging, aided by the current revolution in long-read sequencing and genome assembly. Acrocentric p-arms vary from 10.1 to 16.7 Mb, accounting for ∼2.2% of the genome. Bordering rDNA arrays, distal junctions, and proximal junctions are shared among the p-arms, with distal junctions showing evidence of functionality. The remaining p-arm sequences comprise multiple satellite DNA classes and segmental duplications that facilitate recombination between heterologous chromosomes, which is likely also involved in Robertsonian translocations.
Collapse
Affiliation(s)
- Brian McStay
- Centre for Chromosome Biology, College of Science and Engineering, University of Galway, Galway, Ireland;
| |
Collapse
|
21
|
Yang X, Wang X, Zou Y, Zhang S, Xia M, Fu L, Vollger MR, Chen NC, Taylor DJ, Harvey WT, Logsdon GA, Meng D, Shi J, McCoy RC, Schatz MC, Li W, Eichler EE, Lu Q, Mao Y. Characterization of large-scale genomic differences in the first complete human genome. Genome Biol 2023; 24:157. [PMID: 37403156 PMCID: PMC10320979 DOI: 10.1186/s13059-023-02995-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Accepted: 06/23/2023] [Indexed: 07/06/2023] Open
Abstract
BACKGROUND The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies are not characterized in detail yet. RESULTS Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~ 21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region-the KLRC gene cluster-show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~ 20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.
Collapse
Affiliation(s)
- Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xuankai Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yawen Zou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Manying Xia
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Lianting Fu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Dan Meng
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Junfeng Shi
- Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Weidong Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
- Shanghai Key Laboratory of Stomatology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
| |
Collapse
|
22
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogenous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" and subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near-term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.
Collapse
Affiliation(s)
- Vincent A Laufer
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas W Glover
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas E Wilson
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| |
Collapse
|
23
|
Pazhenkova EA, Lukhtanov VA. Chromosomal conservatism vs chromosomal megaevolution: enigma of karyotypic evolution in Lepidoptera. Chromosome Res 2023; 31:16. [PMID: 37300756 DOI: 10.1007/s10577-023-09725-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Revised: 05/21/2023] [Accepted: 05/23/2023] [Indexed: 06/12/2023]
Abstract
In the evolution of many organisms, periods of slow genome reorganization (= chromosomal conservatism) are interrupted by bursts of numerous chromosomal changes (= chromosomal megaevolution). Using comparative analysis of chromosome-level genome assemblies, we investigated these processes in blue butterflies (Lycaenidae). We demonstrate that the phase of chromosome number conservatism is characterized by the stability of most autosomes and dynamic evolution of the sex chromosome Z, resulting in multiple variants of NeoZ chromosomes due to autosome-sex chromosome fusions. In contrast during the phase of rapid chromosomal evolution, the explosive increase in chromosome number occurs mainly through simple chromosomal fissions. We show that chromosomal megaevolution is a highly non-random canalized process, and in two phylogenetically independent Lysandra lineages, the drastic parallel increase in number of fragmented chromosomes was achieved, at least partially, through reuse of the same ancestral chromosomal breakpoints. In species showing chromosome number doubling, we found no blocks of duplicated sequences or duplicated chromosomes, thus refuting the hypothesis of polyploidy. In the studied taxa, long blocks of interstitial telomere sequences (ITSs) consist of (TTAGG)n arrays interspersed with telomere-specific retrotransposons. ITSs are sporadically present in rapidly evolving Lysandra karyotypes, but not in the species with ancestral chromosome number. Therefore, we hypothesize that the transposition of telomeric sequences may be triggers of the rapid chromosome number increase. Finally, we discuss the hypothetical genomic and population mechanisms of chromosomal megaevolution and argue that the disproportionally high evolutionary role of the Z sex chromosome can be additionally reinforced by sex chromosome-autosome fusions and Z-chromosome inversions.
Collapse
Affiliation(s)
- Elena A Pazhenkova
- Department of Biology, Biotechnical Faculty, University of Ljubljana, Večna Pot 111, 1000, Ljubljana, Slovenia.
| | - Vladimir A Lukhtanov
- Department of Karyosystematics, Zoological Institute of Russian Academy of Sciences, Universitetskaya Nab. 1, 199034, St. Petersburg, Russia.
| |
Collapse
|
24
|
Wlodzimierz P, Rabanal FA, Burns R, Naish M, Primetis E, Scott A, Mandáková T, Gorringe N, Tock AJ, Holland D, Fritschi K, Habring A, Lanz C, Patel C, Schlegel T, Collenberg M, Mielke M, Nordborg M, Roux F, Shirsekar G, Alonso-Blanco C, Lysak MA, Novikova PY, Bousios A, Weigel D, Henderson IR. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 2023:10.1038/s41586-023-06062-z. [PMID: 37198485 DOI: 10.1038/s41586-023-06062-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Accepted: 04/06/2023] [Indexed: 05/19/2023]
Abstract
Centromeres are critical for cell division, loading CENH3 or CENPA histone variant nucleosomes, directing kinetochore formation and allowing chromosome segregation1,2. Despite their conserved function, centromere size and structure are diverse across species. To understand this centromere paradox3,4, it is necessary to know how centromeric diversity is generated and whether it reflects ancient trans-species variation or, instead, rapid post-speciation divergence. To address these questions, we assembled 346 centromeres from 66 Arabidopsis thaliana and 2 Arabidopsis lyrata accessions, which exhibited a remarkable degree of intra- and inter-species diversity. A. thaliana centromere repeat arrays are embedded in linkage blocks, despite ongoing internal satellite turnover, consistent with roles for unidirectional gene conversion or unequal crossover between sister chromatids in sequence diversification. Additionally, centrophilic ATHILA transposons have recently invaded the satellite arrays. To counter ATHILA invasion, chromosome-specific bursts of satellite homogenization generate higher-order repeats and purge transposons, in line with cycles of repeat evolution. Centromeric sequence changes are even more extreme in comparison between A. thaliana and A. lyrata. Together, our findings identify rapid cycles of transposon invasion and purging through satellite homogenization, which drive centromere evolution and ultimately contribute to speciation.
Collapse
Affiliation(s)
- Piotr Wlodzimierz
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Fernando A Rabanal
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Robin Burns
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Matthew Naish
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Elias Primetis
- School of Life Sciences, University of Sussex, Brighton, UK
| | - Alison Scott
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Terezie Mandáková
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Nicola Gorringe
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Andrew J Tock
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Daniel Holland
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Katrin Fritschi
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Anette Habring
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Christa Lanz
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Christie Patel
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Theresa Schlegel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Maximilian Collenberg
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Miriam Mielke
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Magnus Nordborg
- Gregor Mendel Institute, Vienna, Austrian Academy of Sciences, Vienna BioCenter, Vienna, Austria
| | - Fabrice Roux
- LIPME, INRAE, CNRS, Université de Toulouse, Castanet-Tolosan, France
| | - Gautam Shirsekar
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Carlos Alonso-Blanco
- Departamento de Genética Molecular de Plantas, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
| | - Martin A Lysak
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Polina Y Novikova
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | | | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
| | - Ian R Henderson
- Department of Plant Sciences, University of Cambridge, Cambridge, UK.
| |
Collapse
|
25
|
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Collapse
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
| | | | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
| |
Collapse
|
26
|
Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat Methods 2023; 20:550-558. [PMID: 36550274 DOI: 10.1038/s41592-022-01674-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)-a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.
Collapse
Affiliation(s)
- Luca Denti
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA, USA.
- UC Davis MIND Institute, Sacramento, CA, USA.
- Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA.
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
| |
Collapse
|
27
|
Chang J, Stahlke AR, Chudalayandi S, Rosen BD, Childers AK, Severin AJ. polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies. Genome Biol Evol 2023; 15:7040681. [PMID: 36792366 PMCID: PMC9985148 DOI: 10.1093/gbe/evad020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 02/02/2023] [Accepted: 02/08/2023] [Indexed: 02/17/2023] Open
Abstract
Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third generation long read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genome Project (VGP) Assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run on a conventional high performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points throughout several key processes, including identifying duplicate haplotypes in purge_dups, allowing a break for scaffolding if data are available, and throughout multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available for the greater assembly community as a tool to complete assemblies from existing, error-prone long-read data.
Collapse
Affiliation(s)
- Jennifer Chang
- USDA, Agricultural Research Service, Jamie Whitten Delta States Research Center, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi.,Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee.,Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames
| | - Amanda R Stahlke
- USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Bee Research Laboratory, Beltsville Maryland
| | | | - Benjamin D Rosen
- USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Animal Genomics and Improvement Laboratory, Beltsville, Maryland
| | - Anna K Childers
- USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Bee Research Laboratory, Beltsville Maryland
| | - Andrew J Severin
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames
| |
Collapse
|
28
|
Abstract
A perfect bacterial genome assembly is one where the assembled sequence is an exact match for the organism's genome-each replicon sequence is complete and contains no errors. While this has been difficult to achieve in the past, improvements in long-read sequencing, assemblers, and polishers have brought perfect assemblies within reach. Here, we describe our recommended approach for assembling a bacterial genome to perfection using a combination of Oxford Nanopore Technologies long reads and Illumina short reads: Trycycler long-read assembly, Medaka long-read polishing, Polypolish short-read polishing, followed by other short-read polishing tools and manual curation. We also discuss potential pitfalls one might encounter when assembling challenging genomes, and we provide an online tutorial with sample data (github.com/rrwick/perfect-bacterial-genome-tutorial).
Collapse
|
29
|
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat Biotechnol 2023; 41:232-238. [PMID: 36050551 DOI: 10.1038/s41587-022-01435-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Accepted: 07/15/2022] [Indexed: 11/08/2022]
Abstract
Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.
Collapse
|
30
|
Secomandi S, Gallo GR, Sozzoni M, Iannucci A, Galati E, Abueg L, Balacco J, Caprioli M, Chow W, Ciofi C, Collins J, Fedrigo O, Ferretti L, Fungtammasan A, Haase B, Howe K, Kwak W, Lombardo G, Masterson P, Messina G, Møller AP, Mountcastle J, Mousseau TA, Ferrer Obiol J, Olivieri A, Rhie A, Rubolini D, Saclier M, Stanyon R, Stucki D, Thibaud-Nissen F, Torrance J, Torroni A, Weber K, Ambrosini R, Bonisoli-Alquati A, Jarvis ED, Gianfranceschi L, Formenti G. A chromosome-level reference genome and pangenome for barn swallow population genomics. Cell Rep 2023; 42:111992. [PMID: 36662619 PMCID: PMC10044405 DOI: 10.1016/j.celrep.2023.111992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Revised: 07/20/2022] [Accepted: 01/04/2023] [Indexed: 01/20/2023] Open
Abstract
Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome.
Collapse
Affiliation(s)
- Simona Secomandi
- Department of Biosciences, University of Milan, Milan, Italy; Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Guido R Gallo
- Department of Biosciences, University of Milan, Milan, Italy
| | | | - Alessio Iannucci
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | - Elena Galati
- Department of Biosciences, University of Milan, Milan, Italy
| | - Linelle Abueg
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Jennifer Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Manuela Caprioli
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | | | - Claudio Ciofi
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | | | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Luca Ferretti
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | | | - Bettina Haase
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | | | - Woori Kwak
- Department of Medical and Biological Sciences, The Catholic University of Korea, Bucheon 14662, Korea
| | - Gianluca Lombardo
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Anders P Møller
- Ecologie Systématique Evolution, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Orsay Cedex, France
| | | | - Timothy A Mousseau
- Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
| | - Joan Ferrer Obiol
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | - Anna Olivieri
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Diego Rubolini
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | | | - Roscoe Stanyon
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Antonio Torroni
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | | | - Roberto Ambrosini
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | - Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University - Pomona, Pomona, CA, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA; The Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA.
| |
Collapse
|
31
|
Silva JM, Qi W, Pinho AJ, Pratas D. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. Gigascience 2022; 12:giad101. [PMID: 38091509 PMCID: PMC10716826 DOI: 10.1093/gigascience/giad101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics. Moreover, their identification provides positional and quantity information where standard assembly methodologies face significant difficulties because of substantial higher depth coverage (mountains), ambiguous read mapping, or where sequencing or reconstruction defects may occur. However, the capability to distinguish low-complexity regions (LCRs) in genomic and proteomic sequences is a challenge that depends on the model's ability to find them automatically. Low-complexity patterns can be implicit through specific or combined sources, such as algorithmic or probabilistic, and recurring to different spatial distances-namely, local, medium, or distant associations. FINDINGS This article addresses the challenge of automatically modeling and distinguishing LCRs, providing a new method and tool (AlcoR) for efficient and accurate segmentation and visualization of these regions in genomic and proteomic sequences. The method enables the use of models with different memories, providing the ability to distinguish local from distant low-complexity patterns. The method is reference and alignment free, providing additional methodologies for testing, including a highly flexible simulation method for generating biological sequences (DNA or protein) with different complexity levels, sequence masking, and a visualization tool for automatic computation of the LCR maps into an ideogram style. We provide illustrative demonstrations using synthetic, nearly synthetic, and natural sequences showing the high efficiency and accuracy of AlcoR. As large-scale results, we use AlcoR to unprecedentedly provide a whole-chromosome low-complexity map of a recent complete human genome and the haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar. CONCLUSIONS The AlcoR method provides the ability of fast sequence characterization through data complexity analysis, ideally for scenarios entangling the presence of new or unknown sequences. AlcoR is implemented in C language using multithreading to increase the computational speed, is flexible for multiple applications, and does not contain external dependencies. The tool accepts any sequence in FASTA format. The source code is freely provided at https://github.com/cobilab/alcor.
Collapse
Affiliation(s)
- Jorge M Silva
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
| | - Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse, 190, 8057, Zurich, Switzerland
- SIB, Swiss Institute of Bioinformatics, 1202, Geneva, Switzerland
| | - Armando J Pinho
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
| | - Diogo Pratas
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Department of Virology, University of Helsinki, Haartmaninkatu, 3, 00014 Helsinki, Finland
| |
Collapse
|
32
|
Wang H, Xu D, Wang S, Wang A, Lei L, Jiang F, Yang B, Yuan L, Chen R, Zhang Y, Fan W. Chromosome-scale Amaranthus tricolor genome provides insights into the evolution of the genus Amaranthus and the mechanism of betalain biosynthesis. DNA Res 2022; 30:6880148. [PMID: 36473054 PMCID: PMC9847342 DOI: 10.1093/dnares/dsac050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 11/25/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open
Abstract
Amaranthus tricolor is a vegetable and ornamental amaranth, with high lysine, dietary fibre and squalene content. The red cultivar of A. tricolor possesses a high concentration of betalains, which has been used as natural food colorants. Here, we constructed the genome of A. tricolor, the first reference genome for the subgenus Albersia, combining PacBio HiFi, Nanopore ultra-long and Hi-C data. The contig N50 size was 906 kb, and 99.58% of contig sequence was anchored to the 17 chromosomes, totalling 520 Mb. We annotated 27,813 protein-coding genes with an average 1.3 kb coding sequence and 5.3 exons. We inferred that A. tricolor underwent a whole-genome duplication (WGD) and that the WGD shared by amaranths occurred in the last common ancestor of subfamily Amaranthoideae. Moreover, we comprehensively identified candidate genes in betalain biosynthesis pathway. Among them, DODAα1 and CYP76ADα1, located in one topologically associated domain (TAD) of an active (A) compartment on chromosome 16, were more highly expressed in red leaves than in green leaves, and DODAα1 might be the rate-limiting enzyme gene in betalains biosynthesis. This study presents new genome resources and enriches our understanding of amaranth evolution, betalains production, facilitating molecular breeding improvements and the understanding of C4 plants evolution.
Collapse
Affiliation(s)
| | | | - Sen Wang
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Anqi Wang
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Lihong Lei
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Fan Jiang
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Boyuan Yang
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Lihua Yuan
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Rong Chen
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Yan Zhang
- Guangdong Laboratory for Lingnan Modern Agriculture (Shenzhen Branch), Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Wei Fan
- To whom correspondence should be addressed. Tel. +86 18165787021.
| |
Collapse
|
33
|
Cechova M, Miga KH. Satellite DNAs and human sex chromosome variation. Semin Cell Dev Biol 2022; 128:15-25. [PMID: 35644878 DOI: 10.1016/j.semcdb.2022.04.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 04/26/2022] [Accepted: 04/27/2022] [Indexed: 11/17/2022]
Abstract
Satellite DNAs are present on every chromosome in the cell and are typically enriched in repetitive, heterochromatic parts of the human genome. Sex chromosomes represent a unique genomic and epigenetic context. In this review, we first report what is known about satellite DNA biology on human X and Y chromosomes, including repeat content and organization, as well as satellite variation in typical euploid individuals. Then, we review sex chromosome aneuploidies that are among the most common types of aneuploidies in the general population, and are better tolerated than autosomal aneuploidies. This is demonstrated also by the fact that aging is associated with the loss of the X, and especially the Y chromosome. In addition, supernumerary sex chromosomes enable us to study general processes in a cell, such as analyzing heterochromatin dosage (i.e. additional Barr bodies and long heterochromatin arrays on Yq) and their downstream consequences. Finally, genomic and epigenetic organization and regulation of satellite DNA could influence chromosome stability and lead to aneuploidy. In this review, we argue that the complete annotation of satellite DNA on sex chromosomes in human, and especially in centromeric regions, will aid in explaining the prevalence and the consequences of sex chromosome aneuploidies.
Collapse
Affiliation(s)
- Monika Cechova
- Faculty of Informatics, Masaryk University, Czech Republic
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA; UC Santa Cruz Genomics Institute, University of California Santa Cruz, CA 95064, USA
| |
Collapse
|
34
|
Deng Y, Liu S, Zhang Y, Tan J, Li X, Chu X, Xu B, Tian Y, Sun Y, Li B, Xu Y, Deng XW, He H, Zhang X. A telomere-to-telomere gap-free reference genome of watermelon and its mutation library provide important resources for gene discovery and breeding. MOLECULAR PLANT 2022; 15:1268-1284. [PMID: 35746868 DOI: 10.1016/j.molp.2022.06.010] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2022] [Revised: 05/30/2022] [Accepted: 06/19/2022] [Indexed: 06/15/2023]
Abstract
Watermelon, Citrullus lanatus, is the world's third largest fruit crop. Reference genomes with gaps and a narrow genetic base hinder functional genomics and genetic improvement of watermelon. Here, we report the assembly of a telomere-to-telomere gap-free genome of the elite watermelon inbred line G42 by incorporating high-coverage and accurate long-read sequencing data with multiple assembly strategies. All 11 chromosomes have been assembled into single-contig pseudomolecules without gaps, representing the highest completeness and assembly quality to date. The G42 reference genome is 369 321 829 bp in length and contains 24 205 predicted protein-coding genes, with all 22 telomeres and 11 centromeres characterized. Furthermore, we established a pollen-EMS mutagenesis protocol and obtained over 200 000 M1 seeds from G42 . In a sampling pool, 48 monogenic phenotypic mutations, selected from 223 M1 and 78 M2 mutants with morphological changes, were confirmed. The average mutation density was 1 SNP/1.69 Mb and 1 indel/4.55 Mb per M1 plant and 1 SNP/1.08 Mb and 1 indel/6.25 Mb per M2 plant. Taking advantage of the gap-free G42 genome, 8039 mutations from 32 plants sampled from M1 and M2 families were identified with 100% accuracy, whereas only 25% of the randomly selected mutations identified using the 97103v2 reference genome could be confirmed. Using this library and the gap-free genome, two genes responsible for elongated fruit shape and male sterility (ClMS1) were identified, both caused by a single base change from G to A. The validated gap-free genome and its EMS mutation library provide invaluable resources for functional genomics and genetic improvement of watermelon.
Collapse
Affiliation(s)
- Yun Deng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Shoucheng Liu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Yilin Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China; School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing 100871, China
| | - Jingsheng Tan
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Xiaopeng Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Xiao Chu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Binghua Xu
- Jiangsu Xuhuai Area Huaiyin Institute of Agricultural Science, Huaian, Jiangsu 223300, China
| | - Yao Tian
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Yudong Sun
- Jiangsu Xuhuai Area Huaiyin Institute of Agricultural Science, Huaian, Jiangsu 223300, China
| | - Bosheng Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Yunbi Xu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China
| | - Xing Wang Deng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China; School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing 100871, China
| | - Hang He
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China; School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking University, Beijing 100871, China.
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences at Weifang, Weifang, Shandong 261000, China.
| |
Collapse
|
35
|
Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat Methods 2022; 19:696-704. [PMID: 35361932 PMCID: PMC9745813 DOI: 10.1038/s41592-022-01445-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Accepted: 03/07/2022] [Indexed: 12/15/2022]
Abstract
Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
Collapse
|
36
|
Polishing high-quality genome assemblies. Nat Methods 2022; 19:649-650. [PMID: 35610477 DOI: 10.1038/s41592-022-01515-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
37
|
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 924] [Impact Index Per Article: 462.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Collapse
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
| | - Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | | | | | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | | | | | | | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
| | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | | | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
| | | | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
| | - Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | | | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | | | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | | | - Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | - Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
| | - James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
| | - Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
| | | | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
| | | | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
| | | | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | | | - Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
| | - Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| |
Collapse
|
38
|
Fedarko MW, Kolmogorov M, Pevzner PA. Analyzing rare mutations in metagenomes assembled using long and accurate reads. Genome Res 2022; 32:2119-2133. [PMID: 36418060 PMCID: PMC9808630 DOI: 10.1101/gr.276917.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Accepted: 11/16/2022] [Indexed: 11/25/2022]
Abstract
The advent of long and accurate "HiFi" reads has greatly improved our ability to generate complete metagenome-assembled genomes (MAGs), enabling "complete metagenomics" studies that were nearly impossible to conduct with short reads. In particular, HiFi reads simplify the identification and phasing of mutations in MAGs: It is increasingly feasible to distinguish between positions that are prone to mutations and positions that rarely ever mutate, and to identify co-occurring groups of mutations. However, the problems of identifying rare mutations in MAGs, estimating the false-discovery rate (FDR) of these identifications, and phasing identified mutations remain open in the context of HiFi data. We present strainFlye, a pipeline for the FDR-controlled identification and analysis of rare mutations in MAGs assembled using HiFi reads. We show that deep HiFi sequencing has the potential to reveal and phase tens of thousands of rare mutations in a single MAG, identify hotspots and coldspots of these mutations, and detail MAGs' growth dynamics.
Collapse
Affiliation(s)
- Marcus W. Fedarko
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA;,Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA;,Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA;,UC Santa Cruz Genomics Institute, Santa Cruz, California 95064, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA;,Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA
| |
Collapse
|
39
|
Bzikadze AV, Mikheenko A, Pevzner PA. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res 2022; 32:2107-2118. [PMID: 36379716 PMCID: PMC9808623 DOI: 10.1101/gr.276871.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.
Collapse
Affiliation(s)
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, California 92093, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, 199034, Russia
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
| |
Collapse
|