1
Huang S, Shi W, Li S, Fan Q, Yang C, Cao J, Wu L. Advanced sequencing-based high-throughput and long-read single-cell transcriptome analysis. Lab Chip 2024; 24:2601-2621. [PMID: 38669201 DOI: 10.1039/d4lc00105b] [Indexed: 04/28/2024]
Abstract
Cells are the fundamental building blocks of living systems, exhibiting significant heterogeneity. The transcriptome connects the cellular genotype and phenotype, and profiling single-cell transcriptomes is critical for uncovering distinct cell types, states, and the interplay between cells in development, health, and disease. Nevertheless, single-cell transcriptome analysis faces daunting challenges due to the low abundance and diverse nature of RNAs in individual cells, as well as their heterogeneous expression. The advent and continuous advancements of next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies have solved these problems and facilitated the high-throughput, sensitive, full-length, and rapid profiling of single-cell RNAs. In this review, we provide a broad introduction to current methodologies for single-cell transcriptome sequencing. First, state-of-the-art advancements in high-throughput and full-length single-cell RNA sequencing (scRNA-seq) platforms using NGS are reviewed. Next, TGS-based long-read scRNA-seq methods are summarized. Finally, a brief conclusion and perspectives for comprehensive single-cell transcriptome analysis are discussed.
Affiliation(s)
- Shanqing Huang
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Weixiong Shi
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Shiyu Li
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Qian Fan
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Chaoyong Yang
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Jiao Cao
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Lingling Wu
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
2
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Claudia R Catacchio
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Oxford Nanopore Technologies, Oxford, United Kingdom
- Julian K Lucas
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- Ivan A Alexandrov
  - Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
  - Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Kumar KR, Cowley MJ, Davis RL. Next-Generation Sequencing and Emerging Technologies. Semin Thromb Hemost 2024. [PMID: 38692283 DOI: 10.1055/s-0044-1786397] [Indexed: 05/03/2024]
Abstract
Genetic sequencing technologies are evolving at a rapid pace with major implications for research and clinical practice. In this review, the authors provide an updated overview of next-generation sequencing (NGS) and emerging methodologies. NGS has tremendously improved sequencing output while being more time and cost-efficient in comparison to Sanger sequencing. The authors describe short-read sequencing approaches, such as sequencing by synthesis, ion semiconductor sequencing, and nanoball sequencing. Third-generation long-read sequencing now promises to overcome many of the limitations of short-read sequencing, such as the ability to reliably resolve repeat sequences and large genomic rearrangements. By combining complementary methods with massively parallel DNA sequencing, a greater insight into the biological context of disease mechanisms is now possible. Emerging methodologies, such as advances in nanopore technology, in situ nucleic acid sequencing, and microscopy-based sequencing, will continue the rapid evolution of this area. These new technologies hold many potential applications for hematological disorders, with the promise of precision and personalized medical care in the future.
Affiliation(s)
- Kishore R Kumar
  - Translational Genomics Group, Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - Department of Neurogenetics, Kolling Institute, University of Sydney and Royal North Shore Hospital, St Leonards, New South Wales, Australia
  - Molecular Medicine Laboratory, Concord Hospital, Sydney, Australia
- Mark J Cowley
  - Translational Genomics Group, Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - Computational Biology Group, Children's Cancer Institute, University of New South Wales, Randwick, New South Wales, Australia
- Ryan L Davis
  - Translational Genomics Group, Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - Department of Neurogenetics, Kolling Institute, University of Sydney and Royal North Shore Hospital, St Leonards, New South Wales, Australia
4
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024:10.1038/s41576-024-00718-w. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Affiliation(s)
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Durbin
  - Department of Genetics, Cambridge University, Cambridge, UK
5
Porubsky D, Eichler EE. A 25-year odyssey of genomic technology advances and structural variant discovery. Cell 2024; 187:1024-1037. [PMID: 38290514 DOI: 10.1016/j.cell.2024.01.002] [Received: 10/07/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Affiliation(s)
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
6
Firdaus Z, Li X. Unraveling the Genetic Landscape of Neurological Disorders: Insights into Pathogenesis, Techniques for Variant Identification, and Therapeutic Approaches. Int J Mol Sci 2024; 25:2320. [PMID: 38396996 PMCID: PMC10889342 DOI: 10.3390/ijms25042320] [Received: 01/18/2024] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024]
Abstract
Genetic abnormalities play a crucial role in the development of neurodegenerative disorders (NDDs). Genetic exploration has indeed contributed to unraveling the molecular complexities responsible for the etiology and progression of various NDDs. The intricate nature of rare and common variants in NDDs contributes to a limited understanding of the genetic risk factors associated with them. Advancements in next-generation sequencing have made whole-genome sequencing and whole-exome sequencing possible, allowing the identification of rare variants with substantial effects, and improving the understanding of both Mendelian and complex neurological conditions. The resurgence of gene therapy holds the promise of targeting the etiology of diseases and ensuring a sustained correction. This approach is particularly enticing for neurodegenerative diseases, where traditional pharmacological methods have fallen short. In the context of our exploration of the genetic epidemiology of the three most prevalent NDDs (amyotrophic lateral sclerosis, Alzheimer's disease, and Parkinson's disease), our primary goal is to underscore the progress made in the development of next-generation sequencing. This progress aims to enhance our understanding of the disease mechanisms and explore gene-based therapies for NDDs. Throughout this review, we focus on genetic variations, methodologies for their identification, the associated pathophysiology, and the promising potential of gene therapy. Ultimately, our objective is to provide a comprehensive and forward-looking perspective on the emerging research arena of NDDs.
Affiliation(s)
- Zeba Firdaus
  - Department of Internal Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- Xiaogang Li
  - Department of Internal Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
7
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Affiliation(s)
- Peter A Audano
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Christine R Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
8
Smeds L, Huson LSA, Ellegren H. Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration. Evol Appl 2024; 17:e13652. [PMID: 38333557 PMCID: PMC10848878 DOI: 10.1111/eva.13652] [Received: 11/09/2023] [Revised: 01/08/2024] [Accepted: 01/16/2024] [Indexed: 02/10/2024]
Abstract
When populations decrease in size and may become isolated, genomic erosion by loss of diversity from genetic drift and accumulation of deleterious mutations is likely an inevitable consequence. In such cases, immigration (genetic rescue) is necessary to restore levels of genetic diversity and counteract inbreeding depression. Recent work in conservation genomics has studied these processes focusing on the genetic diversity of single nucleotide polymorphisms. In contrast, our knowledge about structural genomic variation (insertions, deletions, duplications and inversions) in endangered species is limited. We analysed whole-genome, short-read sequences from 212 wolves from the inbred Scandinavian population and from neighbouring populations in Finland and Russia, and detected >35,000 structural variants (SVs) after stringent quality and genotype frequency filtering; >26,000 high-confidence variants remained after manual curation. The majority of variants were shorter than 1 kb, with a distinct peak in the length distribution of deletions at 190 bp, corresponding to insertion events of SINE/tRNA-Lys elements. The site frequency spectrum of SVs in protein-coding regions was significantly shifted towards rare alleles compared to putatively neutral variants, consistent with purifying selection. The realized genetic load of SVs in protein-coding regions increased with inbreeding levels in the Scandinavian population, but immigration provided a genetic rescue effect by lowering the load and reintroducing ancestral alleles at loci fixed for derived SVs. Our study shows that structural variation comprises a common type of in part deleterious mutations in endangered species and that establishing gene flow is necessary to mitigate the negative consequences of loss of diversity.
Affiliation(s)
- Linnéa Smeds
  - Department of Ecology and Genetics, Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Lars S. A. Huson
  - Department of Ecology and Genetics, Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Hans Ellegren
  - Department of Ecology and Genetics, Evolutionary Biology, Uppsala University, Uppsala, Sweden
9
Mackinnon AC, Chandrashekar DS, Suster DI. Molecular pathology as basis for timely cancer diagnosis and therapy. Virchows Arch 2024; 484:155-168. [PMID: 38012424 DOI: 10.1007/s00428-023-03707-2] [Received: 08/21/2023] [Revised: 10/16/2023] [Accepted: 11/08/2023] [Indexed: 11/29/2023]
Abstract
Precision and personalized therapeutics have witnessed significant advancements in technology, revolutionizing the capabilities of laboratories to generate vast amounts of genetic data. Coupled with computational resources for analysis and interpretation, and integrated with various other types of data, including genomic data, electronic medical health (EMH) data, and clinical knowledge, these advancements support optimized health decisions. Among these technologies, next-generation sequencing (NGS) stands out as a transformative tool in the field of cancer treatment, playing a crucial role in precision oncology. NGS-based workflows are employed across a range of applications, including gene panels, exome sequencing, and whole-genome sequencing, supporting comprehensive analysis of the entire cancer genome, including mutations, copy number variations, gene expression profiles, and epigenetic modifications. By utilizing the power of NGS, these workflows contribute to enhancing our understanding of disease mechanisms, diagnosis confirmation, identifying therapeutic targets, and guiding personalized treatment decisions. This manuscript explores the diverse applications of NGS in cancer treatment, highlighting its significance in guiding diagnosis and treatment decisions, identifying therapeutic targets, monitoring disease progression, and improving patient outcomes.
Affiliation(s)
- A Craig Mackinnon
  - Department of Pathology, University of Alabama at Birmingham, 619 19th Street South, Birmingham, AL, 35249, USA
- David I Suster
  - Department of Pathology, Rutgers University New Jersey Medical School, 150 Bergen Street, Newark, NJ, 07103, USA
10
Yang L, Metzger GA, Padilla Del Valle R, Delgadillo Rubalcaba D, McLaughlin RN. Evolutionary insights from profiling LINE-1 activity at allelic resolution in a single human genome. EMBO J 2024; 43:112-131. [PMID: 38177314 PMCID: PMC10883270 DOI: 10.1038/s44318-023-00007-y] [Received: 09/08/2023] [Revised: 10/18/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024]
Abstract
Transposable elements have created the majority of the sequence in many genomes. In mammals, LINE-1 retrotransposons have been expanding for more than 100 million years as distinct, consecutive lineages; however, the drivers of this recurrent lineage emergence and disappearance are unknown. Most human genome assemblies provide a record of this ancient evolution, but fail to resolve ongoing LINE-1 retrotranspositions. Utilizing the human CHM1 long-read-based haploid assembly, we identified and cloned all full-length, intact LINE-1s, and found 29 LINE-1s with measurable in vitro retrotransposition activity. Among individuals, these LINE-1s varied in their presence, their allelic sequences, and their activity. We found that recently retrotransposed LINE-1s tend to be active in vitro and polymorphic in the population relative to more ancient LINE-1s. However, some rare allelic forms of old LINE-1s retain activity, suggesting older lineages can persist longer than expected. Finally, in LINE-1s with in vitro activity and in vivo fitness, we identified mutations that may have increased replication in ancient genomes and may prove promising candidates for mechanistic investigations of the drivers of LINE-1 evolution and which LINE-1 sequences contribute to human disease.
Affiliation(s)
- Lei Yang
  - Pacific Northwest Research Institute, Seattle, WA, USA
- Ricky Padilla Del Valle
  - Pacific Northwest Research Institute, Seattle, WA, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
- Richard N McLaughlin
  - Pacific Northwest Research Institute, Seattle, WA, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
11
Volpe E, Corda L, Tommaso ED, Pelliccia F, Ottalevi R, Licastro D, Guarracino A, Capulli M, Formenti G, Tassone E, Giunta S. The complete diploid reference genome of RPE-1 identifies human phased epigenetic landscapes. bioRxiv 2023:2023.11.01.565049. [PMID: 38168337 PMCID: PMC10760208 DOI: 10.1101/2023.11.01.565049] [Indexed: 01/05/2024]
Abstract
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question about the adequacy of relying on human reference genomes to accurately analyze sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly for the human retinal epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to use as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly presents completely phased haplotypes and chromosome-level scaffolds that span centromeres with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation specific to this cell line including t(Xq;10q), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted chromosome X telomere t(Xq;10q). Polymorphisms between haplotypes of the same genome reveal genetic and epigenetic variation for all chromosomes, especially at centromeres. The RPE-1 assembly as a matched reference genome improves mapping quality of multi-omics reads originating from RPE-1 cells, with a drastic reduction in alignment mismatches compared to using the most complete human reference to date (CHM13). Leveraging the accuracy achieved using a matched reference, we were able to identify the kinetochore sites at base pair resolution and show unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multi-omics analyses and serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
Affiliation(s)
- Emilia Volpe
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Luca Corda
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Elena Di Tommaso
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Franca Pelliccia
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Riccardo Ottalevi
  - Department of Bioinformatic, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065 USA and S.s.17, 67100, L’Aquila, Italy
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Mattia Capulli
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Giulio Formenti
  - The Rockefeller University, 1230 York Avenue, 10065 New York, USA
- Evelyne Tassone
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Simona Giunta
  - Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
12
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
|
13
|
Magi A, Mattei G, Mingrino A, Caprioli C, Ronchini C, Frigè G, Semeraro R, Baragli M, Bolognini D, Colombo E, Mazzarella L, Pelicci PG. GASOLINE: detecting germline and somatic structural variants from long-reads data. Sci Rep 2023; 13:20817. [PMID: 38012350 PMCID: PMC10682169 DOI: 10.1038/s41598-023-48285-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences in the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by the size or type of SVs. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran codes; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4-5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-reads datasets. Notably, when applied on a pair of metastatic melanoma and matched-normal samples, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art WGS long-reads computational methods.
Affiliation(s)
- Alberto Magi
- Department of Information Engineering, University of Florence, 50100, Florence, Italy.
- Institute for Biomedical Technologies, National Research Council, Segrate, Milan, Italy.
| | - Gianluca Mattei
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
| | - Alessandra Mingrino
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Chiara Caprioli
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Chiara Ronchini
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
| | - Gianmaria Frigè
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Marta Baragli
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
| | - Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Emanuela Colombo
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Luca Mazzarella
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
| | - Pier Giuseppe Pelicci
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy.
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy.
| |
|
14
|
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
| |
|
15
|
Tesi B, Boileau C, Boycott KM, Canaud G, Caulfield M, Choukair D, Hill S, Spielmann M, Wedell A, Wirta V, Nordgren A, Lindstrand A. Precision medicine in rare diseases: What is next? J Intern Med 2023; 294:397-412. [PMID: 37211972 DOI: 10.1111/joim.13655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2023]
Abstract
Molecular diagnostics is a cornerstone of modern precision medicine, broadly understood as tailoring an individual's treatment, follow-up, and care based on molecular data. In rare diseases (RDs), molecular diagnoses reveal valuable information about the cause of symptoms, disease progression, familial risk, and in certain cases, unlock access to targeted therapies. Due to decreasing DNA sequencing costs, genome sequencing (GS) is emerging as the primary method for precision diagnostics in RDs. Several ongoing European initiatives for precision medicine have chosen GS as their method of choice. Recent research supports the role for GS as first-line genetic investigation in individuals with suspected RD, due to its improved diagnostic yield compared to other methods. Moreover, GS can detect a broad range of genetic aberrations including those in noncoding regions, producing comprehensive data that can be periodically reanalyzed for years to come when further evidence emerges. Indeed, targeted drug development and repurposing of medicines can be accelerated as more individuals with RDs receive a molecular diagnosis. Multidisciplinary teams in which clinical specialists collaborate with geneticists, genomics education of professionals and the public, and dialogue with patient advocacy groups are essential elements for the integration of precision medicine into clinical practice worldwide. It is also paramount that large research projects share genetic data and leverage novel technologies to fully diagnose individuals with RDs. In conclusion, GS increases diagnostic yields and is a crucial step toward precision medicine for RDs. Its clinical implementation will enable better patient management, unlock targeted therapies, and guide the development of innovative treatments.
Affiliation(s)
- Bianca Tesi
- Department of Molecular Medicine and Surgery and Centre of Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Center for Hematology and Regenerative Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
| | - Catherine Boileau
- Département de Génétique, APHP, Hôpital Bichat-Claude Bernard, Université Paris Cité, Paris, France
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
| | - Guillaume Canaud
- INSERM U1151, Unité de médecine translationnelle et thérapies ciblées, Hôpital Necker-Enfants Malades, Université Paris Cité, AP-HP, Paris, France
| | - Mark Caulfield
- William Harvey Research Institute, Queen Mary University of London, London, UK
| | - Daniela Choukair
- Division of Pediatric Endocrinology and Diabetes, Center for Pediatrics and Adolescent Medicine, University Hospital Heidelberg, Heidelberg, Germany and Center for Rare Diseases, University Hospital Heidelberg, Heidelberg, Germany
| | - Sue Hill
- Chief Scientific Officer, NHS England, London, UK
| | - Malte Spielmann
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
| | - Anna Wedell
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
| | - Valtteri Wirta
- Science for Life Laboratory, Department of Microbiology, Tumour and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Ann Nordgren
- Department of Molecular Medicine and Surgery and Centre of Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Sweden
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery and Centre of Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| |
|
16
|
Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform 2023; 24:bbad297. [PMID: 37587831 PMCID: PMC10516374 DOI: 10.1093/bib/bbad297] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/10/2023] [Revised: 07/05/2023] [Accepted: 07/23/2023] [Indexed: 08/18/2023] Open
Abstract
Structural variants (SVs) are genomic rearrangements that can take many different forms such as copy number alterations, inversions and translocations. During cell development and aging, somatic SVs accumulate in the genome with potentially neutral, deleterious or pathological effects. Generation of somatic SVs is a key mutational process in cancer development and progression. Despite their importance, the detection of somatic SVs is challenging, making them less studied than somatic single-nucleotide variants. In this review, we summarize recent advances in whole-genome sequencing (WGS)-based approaches for detecting somatic SVs at the tissue and single-cell levels and discuss their advantages and limitations. First, we describe the state-of-the-art computational algorithms for somatic SV calling using bulk WGS data and compare the performance of somatic SV detectors in the presence or absence of a matched-normal control. We then discuss the unique features of cutting-edge single-cell-based techniques for analyzing somatic SVs. The advantages and disadvantages of bulk and single-cell approaches are highlighted, along with a discussion of their sensitivity to copy-neutral SVs, usefulness for functional inferences and experimental and computational costs. Finally, computational approaches for linking somatic SVs to their functional readouts, such as those obtained from single-cell transcriptome and epigenome analyses, are illustrated, with a discussion of the promise of these approaches in health and diseases.
Affiliation(s)
- Dohun Yi
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Hyobin Jeong
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| |
|
17
|
Peng C, Chen H, Ren J, Zhou F, Li Y, Keqie Y, Ding T, Ruan J, Wang H, Chen X, Liu S. A long-read sequencing and SNP haplotype-based novel preimplantation genetic testing method for female ADPKD patient with de novo PKD1 mutation. BMC Genomics 2023; 24:521. [PMID: 37667185 PMCID: PMC10478289 DOI: 10.1186/s12864-023-09593-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 08/16/2023] [Indexed: 09/06/2023] Open
Abstract
The autosomal dominant form of polycystic kidney disease (ADPKD) is the most common hereditary disease that causes late-onset renal cyst development and end-stage renal disease. Preimplantation genetic testing for monogenic disease (PGT-M) has emerged as an effective strategy to prevent pathogenic mutation transmission, relying on SNP linkage analysis between pedigree members. Yet, it remains challenging to establish reliable PGT-M methods for ADPKD cases or other monogenic diseases with de novo mutations or without a family history. Here we report the application of long-read sequencing for direct haplotyping in a female patient with a de novo PKD1 c.11526G>C mutation, which successfully established the high-risk haplotype. Together with targeted short-read sequencing of SNPs for the couple and embryos, the carrier status of the embryos was identified. A healthy baby was born without the PKD1 pathogenic mutation. Our PGT-M strategy, based on long-read sequencing for direct haplotyping combined with targeted SNP haplotyping, can be widely applied to other monogenic disease carriers with de novo mutations.
Affiliation(s)
- Cuiting Peng
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Han Chen
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Jun Ren
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Fan Zhou
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Yutong Li
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Yuezhi Keqie
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | | | | | - He Wang
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China
| | - Xinlian Chen
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China.
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China.
| | - Shanling Liu
- Center of prenatal diagnosis, Department of Medical Genetics, West China Second University Hospital, Sichuan University, No17, Section 3, South Renmin Road, Chengdu, China.
- Laboratory of birth defects and related diseases of women and children, Sichuan university, Ministry of Education, Sichuan, China.
| |
|
18
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy, and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies, focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
| | - Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
| | - Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| |
|
19
|
Hook PW, Timp W. Beyond assembly: the increasing flexibility of single-molecule sequencing technology. Nat Rev Genet 2023; 24:627-641. [PMID: 37161088 PMCID: PMC10169143 DOI: 10.1038/s41576-023-00600-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The maturation of high-throughput short-read sequencing technology over the past two decades has shaped the way genomes are studied. Recently, single-molecule, long-read sequencing has emerged as an essential tool in deciphering genome structure and function, including filling gaps in the human reference genome, measuring the epigenome and characterizing splicing variants in the transcriptome. With recent technological developments, these single-molecule technologies have moved beyond genome assembly and are being used in a variety of ways, including to selectively sequence specific loci with long reads, measure chromatin state and protein-DNA binding in order to investigate the dynamics of gene regulation, and rapidly determine copy number variation. These increasingly flexible uses of single-molecule technologies highlight a young and fast-moving part of the field that is leading to a more accessible era of nucleic acid sequencing.
Affiliation(s)
- Paul W Hook
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
| |
|
20
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
21
|
Romagnoli S, Bartalucci N, Vannucchi AM. Resolving complex structural variants via nanopore sequencing. Front Genet 2023; 14:1213917. [PMID: 37674481 PMCID: PMC10479017 DOI: 10.3389/fgene.2023.1213917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 07/26/2023] [Indexed: 09/08/2023] Open
Abstract
The recent development of high-throughput sequencing platforms provided impressive insights into the field of human genetics and contributed to considering structural variants (SVs) as the hallmark of genome instability, leading to the establishment of several pathologic conditions, including neoplasia and neurodegenerative and cognitive disorders. While SV detection is addressed by next-generation sequencing (NGS) technologies, the introduction of more recent long-read sequencing technologies has already proven invaluable in overcoming the inaccuracy and limitations that NGS technologies show, owing to their short read length (100-500 bp), when applied to resolve wide and structurally complex SVs. Among the long-read sequencing technologies, Oxford Nanopore Technologies developed a sequencing platform based on a protein nanopore that allows the sequencing of "native" long DNA molecules of virtually unlimited length (typical range 1-100 kb). In this review, we focus on the bioinformatics methods that improve the identification and genotyping of known and novel SVs to investigate human pathological conditions, discussing the possibility of introducing nanopore sequencing technology into routine diagnostics.
Affiliation(s)
| | | | - Alessandro Maria Vannucchi
- CRIMM, Center of Research and Innovation of Myeloproliferative Neoplasms, DENOTHE Excellence Center, Careggi University Hospital and Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| |
|
22
|
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. ArXiv 2023:arXiv:2308.07877v1. [PMID: 37645045 PMCID: PMC10462168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best but technological advances in long-read sequencing now enable near complete chromosome-level assembly, also known as telomere-to-telomere assembly, for many organisms. Here we review recent progress on assembly algorithms and protocols. We focus on how to derive near telomere-to-telomere assemblies and discuss potential future developments.
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK
| |
|
23
|
Zhou B, He Y, Chen Y, Su B. Comparative Genomic Analysis Identifies Great-Ape-Specific Structural Variants and Their Evolutionary Relevance. Mol Biol Evol 2023; 40:msad184. [PMID: 37565562 PMCID: PMC10461412 DOI: 10.1093/molbev/msad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 08/01/2023] [Accepted: 08/10/2023] [Indexed: 08/12/2023] Open
Abstract
During the origin of great apes about 14 million years ago, a series of phenotypic innovations emerged, such as the increased body size, the enlarged brain volume, the improved cognitive skill, and the diversified diet. Yet, the genomic basis of these evolutionary changes remains unclear. Utilizing the high-quality genome assemblies of great apes (including human), gibbon, and macaque, we conducted comparative genome analyses and identified 15,885 great ape-specific structural variants (GSSVs), including eight coding GSSVs resulting in the creation of novel proteins (e.g., ACAN and CMYA5). Functional annotations of the GSSV-related genes revealed the enrichment of genes involved in development and morphogenesis, especially neurogenesis and neural network formation, suggesting the potential role of GSSVs in shaping the great ape-shared traits. Further dissection of the brain-related GSSVs shows great ape-specific changes of enhancer activities and gene expression in the brain, involving a group of GSSV-regulated genes (such as NOL3) that potentially contribute to the altered brain development and function in great apes. The presented data highlight the evolutionary role of structural variants in the phenotypic innovations during the origin of the great ape lineage.
Affiliation(s)
- Bin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Yaoxi He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yongjie Chen
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| |
|
24
|
O'Donnell S, Yue JX, Saada OA, Agier N, Caradec C, Cokelaer T, De Chiara M, Delmas S, Dutreux F, Fournier T, Friedrich A, Kornobis E, Li J, Miao Z, Tattini L, Schacherer J, Liti G, Fischer G. Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae. Nat Genet 2023; 55:1390-1399. [PMID: 37524789 PMCID: PMC10412453 DOI: 10.1038/s41588-023-01459-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/01/2022] [Accepted: 06/26/2023] [Indexed: 08/02/2023]
Abstract
Pangenomes provide access to an accurate representation of the genetic diversity of species, both in terms of sequence polymorphisms and structural variants (SVs). Here we generated the Saccharomyces cerevisiae Reference Assembly Panel (ScRAP) comprising reference-quality genomes for 142 strains representing the species' phylogenetic and ecological diversity. The ScRAP includes phased haplotype assemblies for several heterozygous diploid and polyploid isolates. We identified circa (ca.) 4,800 nonredundant SVs that provide a broad view of the genomic diversity, including the dynamics of telomere length and transposable elements. We uncovered frequent cases of complex aneuploidies where large chromosomes underwent large deletions and translocations. We found that SVs can impact gene expression near the breakpoints and substantially contribute to gene repertoire evolution. We also discovered that horizontally acquired regions insert at chromosome ends and can generate new telomeres. Overall, the ScRAP demonstrates the benefit of a pangenome in understanding genome evolution at population scale.
Affiliation(s)
- Samuel O'Donnell
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Nicolas Agier
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Claudia Caradec
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Thomas Cokelaer
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | | | - Stéphane Delmas
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Fabien Dutreux
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Téo Fournier
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Etienne Kornobis
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | - Jing Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Zepu Miao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
| | | | | | - Gianni Liti
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France.
| | - Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France.
| |
|
25
|
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. But, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
| | - Aparna Nathan
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Joseph E Powell
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia.
| |
|
26
|
Ryan NM, Corvin A. Investigating the dark-side of the genome: a barrier to human disease variant discovery? Biol Res 2023; 56:42. [PMID: 37468985 DOI: 10.1186/s40659-023-00455-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 07/11/2023] [Indexed: 07/21/2023] Open
Abstract
The human genome contains regions that cannot be adequately assembled or aligned using next-generation short-read sequencing technologies. More than 2500 genes are known to contain such 'dark' regions. In this study, we investigate the negative consequences of dark regions on gene discovery across a range of disease and study types, showing that dark regions are likely preventing researchers from identifying genetic variants relevant to human disease.
Affiliation(s)
- Niamh M Ryan
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland.
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
| |
|
27
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. Mutat Res Rev Mutat Res 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogenous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" and subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near-term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.
Affiliation(s)
- Vincent A Laufer
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas W Glover
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas E Wilson
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| |
|
28
|
Audano PA, Beck CR. Small allelic variants are a source of ancestral bias in structural variant breakpoint placement. bioRxiv 2023:2023.06.25.546295. [PMID: 37425850 PMCID: PMC10327140 DOI: 10.1101/2023.06.25.546295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
| |
|
29
|
Turner AJ, Derezinski AD, Gaedigk A, Berres ME, Gregornik DB, Brown K, Broeckel U, Scharer G. Characterization of complex structural variation in the CYP2D6-CYP2D7-CYP2D8 gene loci using single-molecule long-read sequencing. Front Pharmacol 2023; 14:1195778. [PMID: 37426826 PMCID: PMC10324673 DOI: 10.3389/fphar.2023.1195778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 05/30/2023] [Indexed: 07/11/2023] Open
Abstract
Complex regions in the human genome, such as repeat motifs, pseudogenes, structural variants (SVs), and copy number variations (CNVs), present ongoing challenges to accurate genetic analysis, particularly for short-read Next-Generation-Sequencing (NGS) technologies. One such region is the highly polymorphic CYP2D loci, containing CYP2D6, a clinically relevant pharmacogene contributing to the metabolism of >20% of common drugs, and two highly similar pseudogenes, CYP2D7 and CYP2D8. Multiple complex SVs, including CYP2D6/CYP2D7-derived hybrid genes, are known to occur in different configurations and frequencies across populations and are difficult to detect and characterize accurately. This can lead to incorrect enzyme activity assignment and impact drug dosing recommendations, often disproportionately affecting underrepresented populations. To improve CYP2D6 genotyping accuracy, we developed a PCR-free CRISPR-Cas9 based enrichment method for targeted long-read sequencing that fully characterizes the entire CYP2D6-CYP2D7-CYP2D8 loci. Clinically relevant sample types, including blood, saliva, and liver tissue, were sequenced, generating high coverage sets of continuous single molecule reads spanning the entire targeted region of up to 52 kb, regardless of the SV present (n = 9). This allowed for fully phased dissection of the entire loci structure, including breakpoints, to accurately resolve complex CYP2D6 diplotypes with a single assay. Additionally, we identified three novel CYP2D6 suballeles, and fully characterized 17 CYP2D7 and 18 CYP2D8 unique haplotypes. This method for CYP2D6 genotyping has the potential to significantly improve accurate clinical phenotyping to inform drug therapy and can be adapted to overcome testing limitations of other clinically challenging genomic regions.
Affiliation(s)
| | | | - Andrea Gaedigk
- Children’s Mercy Research Institute, Kansas City, MO, United States
| | - Mark E. Berres
- Biotechnology Center, University of Wisconsin Madison, Madison, WI, United States
| | | | - Keith Brown
- Jumpcode Genomics, San Diego, CA, United States
| | | | | |
|
30
|
Abstract
Advances in clinical genetic testing, including the introduction of exome sequencing, have uncovered the molecular etiology for many rare and previously unsolved genetic disorders, yet more than half of individuals with a suspected genetic disorder remain unsolved after complete clinical evaluation. A precise genetic diagnosis may guide clinical treatment plans, allow families to make informed care decisions, and permit individuals to participate in N-of-1 trials; thus, there is high interest in developing new tools and techniques to increase the solve rate. Long-read sequencing (LRS) is a promising technology for both increasing the solve rate and decreasing the amount of time required to make a precise genetic diagnosis. Here, we summarize current LRS technologies, give examples of how they have been used to evaluate complex genetic variation and identify missing variants, and discuss future clinical applications of LRS. As costs continue to decrease, LRS will find additional utility in the clinical space, fundamentally changing how pathological variants are discovered and eventually acting as a single data source that can be interrogated multiple times for clinical service.
Affiliation(s)
| | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA.
| |
|
31
|
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Mao Y, Rautiainen M, Koren S, Nurk S, Porubsky D, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. bioRxiv 2023:2023.05.30.542849. [PMID: 37398417 PMCID: PMC10312506 DOI: 10.1101/2023.05.30.542849] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 07/04/2023]
Abstract
We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison N. Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K. Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A. Alexandrov
- Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
- Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
|
32
|
Ahmed OY, Rossi M, Gagie T, Boucher C, Langmead B. SPUMONI 2: improved classification using a pangenome index of minimizer digests. Genome Biol 2023; 24:122. [PMID: 37202771 PMCID: PMC10197461 DOI: 10.1186/s13059-023-02958-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/03/2023] [Indexed: 05/20/2023] Open
Abstract
Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
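The abstract above attributes the smaller index to minimizers. As a rough illustration of the general idea only (not SPUMONI 2's actual implementation), the sketch below picks the smallest k-mer in each window of w consecutive k-mers and concatenates the picks into a "digest". The function names, the lexicographic ordering, and the parameter values are assumptions made for this example; real tools commonly order k-mers by a hash instead.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs, one per distinct window minimum.

    Assumption for illustration: k-mers are compared lexicographically
    and ties are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    # Slide a window covering w consecutive k-mers across the sequence.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # leftmost smallest
        pos = start + offset
        # Adjacent windows often share a minimizer; record it only once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


def digest(seq: str, k: int, w: int) -> str:
    """Concatenate the selected minimizers into a shorter 'digest' string."""
    return "".join(kmer for _, kmer in minimizers(seq, k, w))


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    print(digest("ACGTACGT", k=3, w=2))
```

With small windows, as in this toy call, the digest can be longer than the input; the space saving only appears when w is large relative to k, because far fewer k-mers are then selected.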
Affiliation(s)
- Omar Y. Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| | - Massimiliano Rossi
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Travis Gagie
- Faculty of Computer Science, Dalhousie University, Halifax, NS Canada
| | - Christina Boucher
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| |
|
33
|
Ferraj A, Audano PA, Balachandran P, Czechanski A, Flores JI, Radecki AA, Mosur V, Gordon DS, Walawalkar IA, Eichler EE, Reinholdt LG, Beck CR. Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements. Cell Genom 2023; 3:100291. [PMID: 37228752 PMCID: PMC10203049 DOI: 10.1016/j.xgen.2023.100291] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 09/14/2022] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023]
Abstract
Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.
Affiliation(s)
- Ardian Ferraj
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | | | - Jacob I. Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Alexander A. Radecki
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Varun Mosur
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - David S. Gordon
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Isha A. Walawalkar
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Evan E. Eichler
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | | | - Christine R. Beck
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
|
34
|
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. bioRxiv 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|
35
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall. bioRxiv 2023:2023.05.04.539448. [PMID: 37205567 PMCID: PMC10187267 DOI: 10.1101/2023.05.04.539448] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/21/2023]
Abstract
Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Affiliation(s)
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Christine R. Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032 USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
36
|
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
| | | | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
|
37
|
Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat Methods 2023; 20:550-558. [PMID: 36550274 DOI: 10.1038/s41592-022-01674-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)-a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.
Affiliation(s)
- Luca Denti
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA, USA.
- UC Davis MIND Institute, Sacramento, CA, USA.
- Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA.
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
|
38
|
Wang Y, Cai X, Hu S, Qin S, Wang Z, Cao Y, Hou C, Yang J, Zhou W. Comparative genomic analysis provides insight into the phylogeny and potential mechanisms of adaptive evolution of Sphingobacterium sp. CZ-2. Gene 2023; 855:147118. [PMID: 36521669 DOI: 10.1016/j.gene.2022.147118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
Sphingobacterium is a class of Gram-negative, non-fermentative bacilli that have received widespread attention due to their broad ecological distribution and oil degradation ability, but are rarely involved in infections. In this manuscript, a novel Sphingobacterium strain isolated from wildfire-infected tobacco leaves was named Sphingobacterium sp. CZ-2. NGS and TGS sequencing results showed a whole genome of 3.92 Mb with 40.68 mol% GC content and containing 3,462 protein-coding genes, 9 rRNA-coding genes and 50 tRNA-coding genes. Phylogenetic analysis, ANI and dDDH calculations all supported that Sphingobacterium sp. CZ-2 represented a novel species of the genus Sphingobacterium. Analysis of the specific genes of Sphingobacterium sp. CZ-2 by comparative genomics revealed that metal transport proteins encoded by the troD and cusA genes could maintain the balance of heavy metal ion concentrations in the internal environment of bacteria and avoid heavy metal toxicity while meeting the needs of growth and reproduction, and transport proteins encoded by the malG gene could keep nutrients required for the survival of bacteria. Synteny and genome evolutionary analyses of Sphingobacterium strains implicated that the gene family contraction as a major process in genome evolution, with insertional sequences leading to mutations, deletions and reversals of genes that help bacteria to withstand complex environmental changes. Complete genome sequencing and systematic comparative genomic analysis will contribute new insights into the adaptive evolution of this novel species and the genus Sphingobacterium.
Affiliation(s)
- Yongqiang Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Xunhui Cai
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Shengnan Hu
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Sidong Qin
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Ziqi Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Yixiang Cao
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Chaoliang Hou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Jiangshan Yang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Wei Zhou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China.
|
39
|
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation compared to single nucleotide variants (SNV) in different plant species. Additionally, SV were associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, less approaches were described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detected large SV were weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest the usage of the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany; Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany.
|
40
|
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty SP, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. Am J Biol Anthropol 2023. [PMID: 36794631 DOI: 10.1002/ajpa.24713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs)-including duplications, deletions, and inversions of DNA-can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by (1) how they have shaped great ape genomes resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C Soto
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
| | - José M Uribe-Salazar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
| | - Colin J Shew
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
| | - Aarthi Sekar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
| | - Sean P McGinty
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
| | - Megan Y Dennis
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
|
41
|
Sun YH, Cui H, Song C, Shen JT, Zhuo X, Wang RH, Yu X, Ndamba R, Mu Q, Gu H, Wang D, Murthy GG, Li P, Liang F, Liu L, Tao Q, Wang Y, Orlowski S, Xu Q, Zhou H, Jagne J, Gokcumen O, Anthony N, Zhao X, Li XZ. Amniotes co-opt intrinsic genetic instability to protect germ-line genome integrity. Nat Commun 2023; 14:812. [PMID: 36781861 PMCID: PMC9925758 DOI: 10.1038/s41467-023-36354-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 01/27/2023] [Indexed: 02/15/2023] Open
Abstract
Unlike PIWI-interacting RNA (piRNA) in other species that mostly target transposable elements (TEs), >80% of piRNAs in adult mammalian testes lack obvious targets. However, mammalian piRNA sequences and piRNA-producing loci evolve more rapidly than the rest of the genome for unknown reasons. Here, through comparative studies of chickens, ducks, mice, and humans, as well as long-read nanopore sequencing on diverse chicken breeds, we find that piRNA loci across amniotes experience: (1) a high local mutation rate of structural variations (SVs, mutations ≥ 50 bp in size); (2) positive selection to suppress young and actively mobilizing TEs commencing at the pachytene stage of meiosis during germ cell development; and (3) negative selection to purge deleterious SV hotspots. Our results indicate that genetic instability at pachytene piRNA loci, while producing certain pathogenic SVs, also protects genome integrity against TE mobilization by driving the formation of rapid-evolving piRNA sequences.
Affiliation(s)
- Yu H Sun
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Hongxiao Cui
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Chi Song
- College of Public Health, Division of Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
| | - Jiafei Teng Shen
- International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, 322000, China
| | - Xiaoyu Zhuo
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Ruoqiao Huiyi Wang
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Xiaohui Yu
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Rudo Ndamba
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Qian Mu
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Hanwen Gu
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Duolin Wang
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Gayathri Guru Murthy
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Pidong Li
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Fan Liang
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Lei Liu
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Qing Tao
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Ying Wang
- Department of Animal Science, University of California, Davis, CA, 95616, USA
| | - Sara Orlowski
- Department of Poultry Science, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Qi Xu
- Department of Animal Science, McGill University, Quebec, H9X 3V9, Canada
| | - Huaijun Zhou
- Department of Animal Science, University of California, Davis, CA, 95616, USA
| | - Jarra Jagne
- Animal Health Diagnostic Center, Cornell University College of Veterinary Medicine, Ithaca, NY, 14850, USA
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, State University of New York, Buffalo, NY, 14260, USA
| | - Nick Anthony
- Department of Poultry Science, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Xin Zhao
- Department of Animal Science, McGill University, Quebec, H9X 3V9, Canada.
| | - Xin Zhiguo Li
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA.
|
42
|
Raca G, Astbury C, Behlmann A, De Castro MJ, Hickey SE, Karaca E, Lowther C, Riggs ER, Seifert BA, Thorland EC, Deignan JL; ACMG Laboratory Quality Assurance Committee. Points to consider in the detection of germline structural variants using next-generation sequencing: A statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100316. [PMID: 36507974 DOI: 10.1016/j.gim.2022.09.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/26/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 12/14/2022] Open
|
43
|
Xu L, Wang X, Lu X, Liang F, Liu Z, Zhang H, Li X, Tian S, Wang L, Wang Z. Long-read sequencing identifies novel structural variations in colorectal cancer. PLoS Genet 2023; 19:e1010514. [PMID: 36812239 DOI: 10.1371/journal.pgen.1010514] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/12/2021] [Revised: 03/14/2023] [Accepted: 11/08/2022] [Indexed: 02/24/2023] Open
Abstract
Structural variations (SVs) are a key type of cancer genomic alterations, contributing to oncogenesis and progression of many cancers, including colorectal cancer (CRC). However, SVs in CRC remain difficult to be reliably detected due to limited SV-detection capacity of the commonly used short-read sequencing. This study investigated the somatic SVs in 21 pairs of CRC samples by Nanopore whole-genome long-read sequencing. 5200 novel somatic SVs from 21 CRC patients (494 SVs / patient) were identified. A 4.9-Mbp long inversion that silences APC expression (confirmed by RNA-seq) and an 11.2-kbp inversion that structurally alters CFTR were identified. Two novel gene fusions that might functionally impact the oncogene RNF38 and the tumor-suppressor SMAD3 were detected. RNF38 fusion possesses metastasis-promoting ability confirmed by in vitro migration and invasion assay, and in vivo metastasis experiments. This work highlighted the various applications of long-read sequencing in cancer genome analysis, and shed new light on how somatic SVs structurally alter critical genes in CRC. The investigation on somatic SVs via nanopore sequencing revealed the potential of this genomic approach in facilitating precise diagnosis and personalized treatment of CRC.
|
44
|
Jansen S, Vissers LELM, de Vries BBA. The Genetics of Intellectual Disability. Brain Sci 2023; 13. [PMID: 36831774 DOI: 10.3390/brainsci13020231] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/04/2022] [Revised: 12/23/2022] [Accepted: 01/16/2023] [Indexed: 02/03/2023] Open
Abstract
Intellectual disability (ID) has a prevalence of ~2-3% in the general population, having a large societal impact. The underlying cause of ID is largely of genetic origin; however, identifying this genetic cause has in the past often led to long diagnostic Odysseys. Over the past decades, improvements in genetic diagnostic technologies and strategies have led to these causes being more and more detectable: from cytogenetic analysis in 1959, we moved in the first decade of the 21st century from genomic microarrays with a diagnostic yield of ~20% to next-generation sequencing platforms with a yield of up to 60%. In this review, we discuss these various developments, as well as their associated challenges and implications for the field of ID, which highlight the revolutionizing shift in clinical practice from a phenotype-first into genotype-first approach.
|
45
|
Zhang S, Qu-Bie JZ, Feng MK, Qu-Bie AX, Huang Y, Zhang ZF, Yan XJ, Liu Y. Illuminating the biosynthesis pathway genes involved in bioactive specific monoterpene glycosides in Paeonia veitchii Lynch by a combination of sequencing platforms. BMC Genomics 2023; 24:45. [PMID: 36698081 PMCID: PMC9878870 DOI: 10.1186/s12864-023-09138-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 01/16/2023] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Paeonia veitchii Lynch, a well-known herb from the Qinghai-Tibet Plateau south of the Himalayas, can synthesize specific monoterpene glycosides (PMGs) with multiple pharmacological activities, and its rhizome has become an indispensable ingredient in many clinical drugs. However, little is known about the molecular background of P. veitchii, especially the genes involved in the biosynthetic pathway of PMGs. RESULTS A corrective full-length transcriptome with 30,827 unigenes was generated by combining next-generation sequencing (NGS) and single-molecule real-time sequencing (SMRT) of six tissues (leaf, stem, petal, ovary, phloem and xylem). The enzymes terpene synthase (TPS), cytochrome P450 (CYP), UDP-glycosyltransferase (UGT), and BAHD acyltransferase, which participate in the biosynthesis of PMGs, were systematically characterized, and their functions related to PMG biosynthesis were analysed. With further insight into TPSs, CYPs, UGTs and BAHDs involved in PMG biosynthesis, the weighted gene coexpression network analysis (WGCNA) method was used to identify the relationships between these genes and PMGs. Finally, 8 TPSs, 22 CYPs, 7 UGTs, and 2 BAHD genes were obtained, and these putative genes were very likely to be involved in the biosynthesis of PMGs. In addition, the expression patterns of the putative genes and the accumulation of PMGs in tissues suggested that all tissues are capable of biosynthesizing PMGs and that aerial plant parts could also be used to extract PMGs. CONCLUSION We generated a large-scale transcriptome database across the major tissues in P. veitchii, providing valuable support for further research investigating P. veitchii and understanding the genetic information of plants from the Qinghai-Tibet Plateau. TPSs, CYPs, UGTs and BAHDs further contribute to a better understanding of the biology and complexity of PMGs in P. veitchii. Our study will help reveal the mechanisms underlying the biosynthesis pathway of these specific monoterpene glycosides and aid in the comprehensive utilization of this multifunctional plant.
Affiliation(s)
- Shaoshan Zhang
  - Tibetan Plateau Ethnic Medicinal Resources Protection and Utilization Key Laboratory of National Ethnic Affairs Commission of the People’s Republic of China, Chengdu, 610225 China; Sichuan Provincial Qiang-Yi Medicinal Resources Protection and Utilization Technology and Engineering Laboratory, Chengdu, 610225 China
- Jun-zhang Qu-Bie
  - College of Pharmacy, Southwest Minzu University, Chengdu, 610041 China
- Ming-kang Feng
  - College of Pharmacy, Southwest Minzu University, Chengdu, 610041 China
- A-xiang Qu-Bie
  - College of Pharmacy, Southwest Minzu University, Chengdu, 610041 China
- Yanfei Huang
  - Tibetan Plateau Ethnic Medicinal Resources Protection and Utilization Key Laboratory of National Ethnic Affairs Commission of the People’s Republic of China, Chengdu, 610225 China; Sichuan Provincial Qiang-Yi Medicinal Resources Protection and Utilization Technology and Engineering Laboratory, Chengdu, 610225 China
- Zhi-feng Zhang
  - Tibetan Plateau Ethnic Medicinal Resources Protection and Utilization Key Laboratory of National Ethnic Affairs Commission of the People’s Republic of China, Chengdu, 610225 China; Sichuan Provincial Qiang-Yi Medicinal Resources Protection and Utilization Technology and Engineering Laboratory, Chengdu, 610225 China
- Xin-jia Yan
  - Tibetan Plateau Ethnic Medicinal Resources Protection and Utilization Key Laboratory of National Ethnic Affairs Commission of the People’s Republic of China, Chengdu, 610225 China; Sichuan Provincial Qiang-Yi Medicinal Resources Protection and Utilization Technology and Engineering Laboratory, Chengdu, 610225 China
- Yuan Liu
  - Sichuan Provincial Qiang-Yi Medicinal Resources Protection and Utilization Technology and Engineering Laboratory, Chengdu, 610225 China
|
46
|
Chen Y, Wang AY, Barkley CA, Zhang Y, Zhao X, Gao M, Edmonds MD, Chong Z. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat Commun 2023; 14:283. [PMID: 36650186 DOI: 10.1038/s41467-023-35996-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/14/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023] Open
Abstract
Long-read sequencing has demonstrated great potential for characterizing all types of structural variations (SVs). However, existing algorithms have insufficient sensitivity and precision. To address these limitations, we present DeBreak, a computational method for comprehensive and accurate SV discovery. Based on alignment results, DeBreak employs a density-based approach for clustering SV candidates together with a local de novo assembly approach for reconstructing long insertions. A partial order alignment algorithm ensures precise SV breakpoints with single base-pair resolution, and a k-means clustering method can report multi-allele SV events. DeBreak outperforms existing tools on both simulated and real long-read sequencing data from both PacBio and Nanopore platforms. An important application of DeBreak is analyzing cancer genomes for potentially tumor-driving SVs. DeBreak can also be used for supplementing whole-genome assembly-based SV discovery.
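The clustering idea mentioned in this abstract can be illustrated with a toy example. The sketch below is not DeBreak's implementation: it is a simplified, gap-based stand-in for density-based clustering, and the names `Candidate`, `cluster_candidates`, `max_gap`, and `min_support` are hypothetical. It assumes each read alignment contributes one candidate breakpoint and keeps only groups supported by several nearby candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    chrom: str   # chromosome name
    pos: int     # candidate breakpoint position reported by one read
    svtype: str  # e.g. "DEL" or "INS"


def cluster_candidates(cands: List[Candidate],
                       max_gap: int = 100,
                       min_support: int = 3) -> List[List[Candidate]]:
    """Group candidates with the same chromosome and SV type whose
    neighbouring positions lie within max_gap of each other, and keep
    only groups with at least min_support members (hypothetical thresholds)."""
    clusters: List[List[Candidate]] = []
    current: List[Candidate] = []
    for c in sorted(cands, key=lambda c: (c.chrom, c.svtype, c.pos)):
        starts_new_group = bool(current) and (
            c.chrom != current[-1].chrom
            or c.svtype != current[-1].svtype
            or c.pos - current[-1].pos > max_gap
        )
        if starts_new_group:
            if len(current) >= min_support:
                clusters.append(current)
            current = []
        current.append(c)
    if len(current) >= min_support:
        clusters.append(current)
    return clusters


# Made-up candidates: three reads agree on a deletion near position 1000,
# while one distant deletion and one insertion lack support.
cands = [
    Candidate("chr1", 1000, "DEL"),
    Candidate("chr1", 1010, "DEL"),
    Candidate("chr1", 1025, "DEL"),
    Candidate("chr1", 5000, "DEL"),
    Candidate("chr1", 1005, "INS"),
]
clusters = cluster_candidates(cands)
positions = [[c.pos for c in cl] for cl in clusters]
```

Requiring a minimum number of supporting candidates is what filters out isolated, likely spurious signals; a real caller would also refine the breakpoint within each cluster, which this sketch does not attempt.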
|
47
|
Udine E, Jain A, van Blitterswijk M. Advances in sequencing technologies for amyotrophic lateral sclerosis research. Mol Neurodegener 2023; 18:4. [PMID: 36635726 PMCID: PMC9838075 DOI: 10.1186/s13024-022-00593-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 12/23/2022] [Indexed: 01/14/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is caused by upper and lower motor neuron loss and has a fairly rapid disease progression, leading to fatality in an average of 2-5 years after symptom onset. Numerous genes have been implicated in this disease; however, many cases remain unexplained. Several technologies are being used to identify regions of interest and investigate candidate genes. Initial approaches to detect ALS genes include, among others, linkage analysis, Sanger sequencing, and genome-wide association studies. More recently, next-generation sequencing methods, such as whole-exome and whole-genome sequencing, have been introduced. While those methods have been particularly useful in discovering new ALS-linked genes, methodological advances are becoming increasingly important, especially given the complex genetics of ALS. Novel sequencing technologies, like long-read sequencing, are beginning to be used to uncover the contribution of repeat expansions and other types of structural variation, which may help explain missing heritability in ALS. In this review, we discuss how popular and/or upcoming methods are being used to discover ALS genes, highlighting emerging long-read sequencing platforms and their role in aiding our understanding of this challenging disease.
Affiliation(s)
- Evan Udine
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224 USA; Mayo Clinic Graduate School of Biomedical Sciences, 4500 San Pablo Road S, Jacksonville, FL 32224 USA
- Angita Jain
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224 USA; Mayo Clinic Graduate School of Biomedical Sciences, 4500 San Pablo Road S, Jacksonville, FL 32224 USA; Center for Clinical and Translational Sciences, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224 USA
- Marka van Blitterswijk
  - Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL, 32224, USA.
|
48
|
Wang Y, Yu J, Jiang M, Lei W, Zhang X, Tang H. Sequencing and Assembly of Polyploid Genomes. Methods Mol Biol 2023; 2545:429-458. [PMID: 36720827 DOI: 10.1007/978-1-0716-2561-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Polyploidy has been observed throughout major eukaryotic clades and has played a vital role in the evolution of angiosperms. Recent polyploidizations often result in highly complex genome structures, posing challenges to genome assembly and phasing. Recent advances in sequencing technologies and genome assembly algorithms have enabled high-quality, near-complete chromosome-level assemblies of polyploid genomes. Advances in sequencing technologies include highly accurate single-molecule sequencing with HiFi reads, chromosome conformation capture with the Hi-C technique, and linked-read sequencing. Additionally, new computational approaches, such as HiCanu, hifiasm, ALLHiC, and PolyGembler, have significantly improved the precision and reliability of polyploid genome assembly and phasing. Herein, we review recently published polyploid genomes and compare the various sequencing, assembly, and phasing approaches that are utilized in these genome studies. Finally, we anticipate that accurate, telomere-to-telomere chromosome-level assembly of polyploid genomes could become a routine procedure in the near future.
Affiliation(s)
- Yibin Wang
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Jiaxin Yu
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Mengwei Jiang
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Wenlong Lei
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Haibao Tang
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
|
49
|
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023] Open
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
|
50
|
Lesack K, Mariene GM, Andersen EC, Wasmuth JD. Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans. PLoS One 2022; 17:e0278424. [PMID: 36584177 PMCID: PMC9803319 DOI: 10.1371/journal.pone.0278424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 01/01/2023] Open
Abstract
The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best-performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
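As a rough illustration of how agreement between two structural variant call sets might be quantified, the sketch below counts short-read calls that are supported by a long-read call of the same type. The matching rule (50% reciprocal overlap) is an assumption chosen because it is a common convention, not the criterion reported in this study, and the function names and example calls are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end, SV type); end > start is assumed for every call
Call = Tuple[str, int, int, str]


def reciprocal_overlap(a_start: int, a_end: int,
                       b_start: int, b_end: int) -> float:
    """Return the smaller of the two fractions of each interval
    that is covered by the intersection of both intervals."""
    inter = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(inter / (a_end - a_start), inter / (b_end - b_start))


def supported_calls(short_calls: List[Call],
                    long_calls: List[Call],
                    min_ro: float = 0.5) -> List[Call]:
    """Return the short-read calls matched by at least one long-read call
    on the same chromosome, with the same SV type, and with reciprocal
    overlap of at least min_ro (an assumed threshold)."""
    supported = []
    for chrom, start, end, svtype in short_calls:
        for l_chrom, l_start, l_end, l_type in long_calls:
            if (chrom == l_chrom and svtype == l_type
                    and reciprocal_overlap(start, end, l_start, l_end) >= min_ro):
                supported.append((chrom, start, end, svtype))
                break
    return supported


# Made-up calls on C. elegans-style chromosome names.
short_calls = [("I", 100, 200, "DEL"), ("I", 1000, 1100, "DEL"), ("II", 50, 150, "DUP")]
long_calls = [("I", 110, 210, "DEL"), ("II", 50, 150, "DEL")]

supported = supported_calls(short_calls, long_calls)
agreement = len(supported) / len(short_calls)  # fraction of short-read calls supported
```

Because the result depends on the overlap threshold and on whether SV types must match, any such agreement figure is only meaningful alongside the matching rule used to compute it.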
Affiliation(s)
- Kyle Lesack
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Grace M. Mariene
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Erik C. Andersen
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL, United States of America
- James D. Wasmuth
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
|