101
Wang Y, Yu J, Jiang M, Lei W, Zhang X, Tang H. Sequencing and Assembly of Polyploid Genomes. Methods Mol Biol 2023; 2545:429-458. [PMID: 36720827 DOI: 10.1007/978-1-0716-2561-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Polyploidy has been observed throughout major eukaryotic clades and has played a vital role in the evolution of angiosperms. Recent polyploidizations often result in highly complex genome structures, posing challenges to genome assembly and phasing. Recent advances in sequencing technologies and genome assembly algorithms have enabled high-quality, near-complete chromosome-level assemblies of polyploid genomes. Advances in novel sequencing technologies include highly accurate single-molecule sequencing with HiFi reads, chromosome conformation capture with Hi-C technique, and linked reads sequencing. Additionally, new computational approaches have also significantly improved the precision and reliability of polyploid genome assembly and phasing, such as HiCanu, hifiasm, ALLHiC, and PolyGembler. Herein, we review recently published polyploid genomes and compare the various sequencing, assembly, and phasing approaches that are utilized in these genome studies. Finally, we anticipate that accurate and telomere-to-telomere chromosome-level assembly of polyploid genomes could ultimately become a routine procedure in the near future.
Affiliation(s)
- Yibin Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Jiaxin Yu
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Mengwei Jiang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Wenlong Lei
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Haibao Tang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
102
Ding X, Han J, Van Winkle LS, Zhang QY. Detection of Transgene Location in the CYP2A13/2B6/2F1-transgenic Mouse Model using Optical Genome Mapping Technology. Drug Metab Dispos 2023; 51:46-53. [PMID: 36273825 PMCID: PMC9832375 DOI: 10.1124/dmd.122.001090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 10/01/2022] [Accepted: 10/04/2022] [Indexed: 01/14/2023] Open
Abstract
Most transgenic mouse models are generated through random integration of the transgene. The location of the transgene provides valuable information for assessing potential effects of the transgenesis on the host and for designing genotyping protocols that can amplify across the integration site, but it is challenging to identify. Here, we report the successful use of optical genome mapping technology to identify the transgene insertion site in a CYP2A13/2B6/2F1-transgenic mouse model, which produces three human cytochrome P450 (P450) enzymes (CYP2A13, CYP2B6, and CYP2F1) that are encoded by neighboring genes on human chromosome 19. These enzymes metabolize many drugs, respiratory toxicants, and chemical carcinogens. Initial efforts to identify candidate insertion sites by whole genome sequencing were unsuccessful, apparently because the transgene is located in a region of the mouse genome that contains highly repetitive sequences. Subsequent use of the optical genome mapping approach, which compares genome-wide marker distribution between the transgenic mouse genome and a reference mouse (GRCm38) or human (GRCh38) genome, localized the insertion site to mouse chromosome 14, between two marker positions at 4451324 and 4485032 base pairs. A transgene-mouse genome junction sequence was further identified through long-polymerase chain reaction amplification and DNA sequencing at GRCm38 Chr.14:4484726. The transgene insertion (∼2.4 megabase pairs) contained 5-7 copies of the human transgenes, which replaced a 26.9-33.4 kilobase pair mouse genomic region, including exons 1-4 of Gm3182, a predicted and highly redundant gene. Finally, the sequencing results enabled the design of a new genotyping protocol that can distinguish between hemizygous and homozygous CYP2A13/2B6/2F1-transgenic mice.
SIGNIFICANCE STATEMENT: This study characterizes the genomic structure of, and provides a new genotyping method for, a transgenic mouse model that expresses three human P450 enzymes, CYP2A13, CYP2B6, and CYP2F1, that are important in xenobiotic metabolism and toxicity. The demonstrated success in applying the optical genome mapping technology for identification of transgene insertion sites should encourage others to do the same for other transgenic models generated through random integration, including most of the currently available human P450 transgenic mouse models.
Affiliation(s)
- Xinxin Ding
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, Arizona (X.D., J.H., Q.-Y.Z.) and Center for Health and the Environment and Department of Anatomy Physiology and Cell Biology, School of Veterinary Medicine, UC Davis, Davis, California (L.S.V.W.)
- John Han
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, Arizona (X.D., J.H., Q.-Y.Z.) and Center for Health and the Environment and Department of Anatomy Physiology and Cell Biology, School of Veterinary Medicine, UC Davis, Davis, California (L.S.V.W.)
- Laura S Van Winkle
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, Arizona (X.D., J.H., Q.-Y.Z.) and Center for Health and the Environment and Department of Anatomy Physiology and Cell Biology, School of Veterinary Medicine, UC Davis, Davis, California (L.S.V.W.)
- Qing-Yu Zhang
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, Arizona (X.D., J.H., Q.-Y.Z.) and Center for Health and the Environment and Department of Anatomy Physiology and Cell Biology, School of Veterinary Medicine, UC Davis, Davis, California (L.S.V.W.)
103
Colin E, Duffourd Y, Chevarin M, Tisserant E, Verdez S, Paccaud J, Bruel AL, Tran Mau-Them F, Denommé-Pichon AS, Thevenon J, Safraou H, Besnard T, Goldenberg A, Cogné B, Isidor B, Delanne J, Sorlin A, Moutton S, Fradin M, Dubourg C, Gorce M, Bonneau D, El Chehadeh S, Debray FG, Doco-Fenzy M, Uguen K, Chatron N, Aral B, Marle N, Kuentz P, Boland A, Olaso R, Deleuze JF, Sanlaville D, Callier P, Philippe C, Thauvin-Robinet C, Faivre L, Vitobello A. Stepwise use of genomics and transcriptomics technologies increases diagnostic yield in Mendelian disorders. Front Cell Dev Biol 2023; 11:1021920. [PMID: 36926521 PMCID: PMC10011630 DOI: 10.3389/fcell.2023.1021920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/17/2022] [Accepted: 01/30/2023] [Indexed: 03/08/2023] Open
Abstract
Purpose: Multi-omics offer worthwhile and increasingly accessible technologies to diagnostic laboratories seeking potential second-tier strategies to help patients with unresolved rare diseases, especially patients clinically diagnosed with a rare OMIM (Online Mendelian Inheritance in Man) disease. However, no consensus exists regarding the optimal diagnostic care pathway to adopt after negative results with standard approaches. Methods: In 15 unsolved individuals clinically diagnosed with recognizable OMIM diseases but with negative or inconclusive first-line genetic results, we explored the utility of a multi-step approach using several novel omics technologies to establish a molecular diagnosis. Inclusion criteria included a clinical autosomal recessive disease diagnosis and single heterozygous pathogenic variant in the gene of interest identified by first-line analysis (60%, 9/15) or a clinical diagnosis of an X-linked recessive or autosomal dominant disease with no causative variant identified (40%, 6/15). We performed a multi-step analysis involving short-read genome sequencing (srGS) and complementary approaches such as mRNA sequencing (mRNA-seq), long-read genome sequencing (lrGS), or optical genome mapping (oGM) selected according to the outcome of the GS analysis. Results: SrGS alone or in combination with additional genomic and/or transcriptomic technologies allowed us to resolve 87% of individuals by identifying single nucleotide variants/indels missed by first-line targeted tests, identifying variants affecting transcription, or structural variants sometimes requiring lrGS or oGM for their characterization. Conclusion: Hypothesis-driven implementation of combined omics technologies is particularly effective in identifying molecular etiologies.
In this study, we detail our experience of the implementation of genomics and transcriptomics technologies in a pilot cohort of previously investigated patients with a typical clinical diagnosis without molecular etiology.
Affiliation(s)
- Estelle Colin
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Service de Génétique Médicale, CHU d'Angers, Angers, France
- Yannis Duffourd
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France
- Martin Chevarin
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Emilie Tisserant
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France
- Simon Verdez
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France
- Julien Paccaud
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France
- Ange-Line Bruel
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Frédéric Tran Mau-Them
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Anne-Sophie Denommé-Pichon
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Julien Thevenon
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France
- Hana Safraou
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Thomas Besnard
- Service de Génétique Médicale, Nantes Université, CHU Nantes, Nantes, France.,CNRS, INSERM, L'institut du thorax, Nantes Université, CHU Nantes, Nantes, France
- Alice Goldenberg
- Department of Genetics and Reference Center for Developmental Disorders, Normandy Center for Genomic and Personalized Medicine, Rouen University Hospital, Rouen, France.,Normandie Univ, UNIROUEN, Inserm U1245, Rouen, France
- Benjamin Cogné
- Service de Génétique Médicale, Nantes Université, CHU Nantes, Nantes, France.,CNRS, INSERM, L'institut du thorax, Nantes Université, CHU Nantes, Nantes, France
- Bertrand Isidor
- Service de Génétique Médicale, CHU de Nantes, Nantes, France
- Julian Delanne
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Centre de Génétique et Centre de référence "Anomalies du Développement et Syndromes Malformatifs", Hôpital d'Enfants, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Arthur Sorlin
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Centre de Génétique et Centre de référence "Anomalies du Développement et Syndromes Malformatifs", Hôpital d'Enfants, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Sébastien Moutton
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Centre de Génétique et Centre de référence "Anomalies du Développement et Syndromes Malformatifs", Hôpital d'Enfants, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Mélanie Fradin
- CHU Rennes, Service de Génétique Clinique, Centre de Référence Maladies Rares, CLAD-Ouest, Rennes, France
- Christèle Dubourg
- Service de Génétique Moléculaire et Génomique, CHU Rennes, Rennes, France.,Univ Rennes, CNRS, Institut de Genetique et Developpement de Rennes, UMR 6290, Rennes, France
- Magali Gorce
- Service de Génétique Médicale, CHU d'Angers, Angers, France
- Salima El Chehadeh
- Service de Génétique Médicale, Hôpital de Hautepierre, CHU Strasbourg, Strasbourg, France
- Martine Doco-Fenzy
- Medical School IFR53, EA3801, Université de Reims Champagne-Ardenne, Reims, France.,Service de Génétique, CHU Reims, Reims, France
- Kevin Uguen
- Department of Genetics and Reference Center for Developmental Disorders, Lyon University Hospital, Groupement Hospitalier Est, Hospices Civils de Lyon, Lyon, France.,CHU Brest, Inserm, Univ Brest, EFS, UMR 1078, GGB, Brest, France
- Nicolas Chatron
- Department of Genetics and Reference Center for Developmental Disorders, Lyon University Hospital, Groupement Hospitalier Est, Hospices Civils de Lyon, Lyon, France
- Bernard Aral
- Laboratoire de Génétique Chromosomique et Moléculaire, Pôle Biologie, CHU de Dijon, Dijon, France
- Nathalie Marle
- Laboratoire de Génétique Chromosomique et Moléculaire, Pôle Biologie, CHU de Dijon, Dijon, France
- Paul Kuentz
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Oncobiologie Génétique Bioinformatique, PCBio, Centre Hospitalier Universitaire de Besançon, Besançon, France
- Anne Boland
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
- Robert Olaso
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France.,LabEx GENMED (Medical Genomics), Dijon, France
- Jean-François Deleuze
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France.,LabEx GENMED (Medical Genomics), Dijon, France
- Damien Sanlaville
- Department of Genetics and Reference Center for Developmental Disorders, Lyon University Hospital, Groupement Hospitalier Est, Hospices Civils de Lyon, Lyon, France
- Patrick Callier
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Laboratoire de Génétique Chromosomique et Moléculaire, Pôle Biologie, CHU de Dijon, Dijon, France
- Christophe Philippe
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Christel Thauvin-Robinet
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France.,Centre de Référence Maladies Rares "Déficiences Intellectuelles de Causes Rares", Centre de Génétique, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
- Laurence Faivre
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Centre de Génétique et Centre de référence "Anomalies du Développement et Syndromes Malformatifs", Hôpital d'Enfants, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Antonio Vitobello
- UFR Des Sciences de Santé, INSERM-Université de Bourgogne UMR1231 GAD "Génétique des Anomalies du Développement", FHUTRANSLAD, Dijon, France.,Unité Fonctionnelle Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
104
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Read2Tree: scalable and accurate phylogenetic trees from raw reads. bioRxiv 2022:2022.04.18.488678. [PMID: 36561179 PMCID: PMC9774205 DOI: 10.1101/2022.04.18.488678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
The inference of phylogenetic trees is foundational to biology. However, state-of-the-art phylogenomics requires running complex pipelines, at significant computational and labour costs, with additional constraints in sequencing coverage, assembly and annotation quality. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes. In a benchmark encompassing a broad variety of datasets, our assembly-free approach was 10-100x faster than conventional approaches, and in most cases more accurate, the exception being when sequencing coverage was high and reference species very distant. To illustrate the broad applicability of the tool, we reconstructed a yeast tree of life of 435 species spanning 590 million years of evolution. Applied to Coronaviridae samples, Read2Tree accurately classified highly diverse animal samples and near-identical SARS-CoV-2 sequences on a single tree, thereby exhibiting remarkable breadth and depth. The speed, accuracy, and versatility of Read2Tree enable comparative genomics at scale.
Affiliation(s)
- David Dylus
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- present address: F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, 4070, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Adrian Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, ETH, 8092 Zurich, Switzerland
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, 77005, USA
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E, UK
105
Li Q, Yan B, Lam TW, Luo R. Assembly-free discovery of human novel sequences using long reads. DNA Res 2022; 29:dsac039. [PMID: 36308393 PMCID: PMC9700288 DOI: 10.1093/dnares/dsac039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2022] [Revised: 10/19/2022] [Accepted: 10/27/2022] [Indexed: 09/10/2024] Open
Abstract
DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, the differing read lengths produced by different sequencing technologies can significantly affect the novel sequences that are recovered. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technologies long reads. Of the novel sequences detected with AF-NS, more than 95% were missed by long-read assemblers and 85% were not present in Illumina short reads. We identified the common novel sequences among all the samples and revealed their association with the binding motifs of transcription factors. Regarding the placements of the novel sequences, we found about 70% enriched in repeat regions and generated 430 for one specific subpopulation that might be related to their evolution. Our study demonstrates the advantage of the assembly-free approach in capturing more novel sequences than assembler-based methods. Combining the long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.
Affiliation(s)
- Qiuhui Li
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Bin Yan
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
106
Ma W, Rovatsos M. Sex chromosome evolution: The remarkable diversity in the evolutionary rates and mechanisms. J Evol Biol 2022; 35:1581-1588. [DOI: 10.1111/jeb.14119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 10/21/2022] [Accepted: 10/24/2022] [Indexed: 12/03/2022]
Affiliation(s)
- Wen‐Juan Ma
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
107
Ono Y, Hamada M, Asai K. PBSIM3: a simulator for all types of PacBio and ONT long reads. NAR Genom Bioinform 2022; 4:lqac092. [PMID: 36465498 PMCID: PMC9713900 DOI: 10.1093/nargab/lqac092] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/10/2022] [Revised: 11/02/2022] [Accepted: 11/12/2022] [Indexed: 12/03/2022] Open
Abstract
Long-read sequencers, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencers, have improved their read length and accuracy, thereby opening up unprecedented research. Many tools and algorithms have been developed to analyze long reads, and rapid progress in PacBio and ONT has further accelerated their development. Together with the development of high-throughput sequencing technologies and their analysis tools, many read simulators have been developed and effectively utilized. PBSIM is one of the popular long-read simulators. In this study, we developed PBSIM3 with three new functions: error models for long reads, multi-pass sequencing for high-fidelity read simulation and transcriptome sequencing simulation. Therefore, PBSIM3 is now able to meet a wide range of long-read simulation requirements.
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 63-520, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Institute for Medical-Oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo, 113-8602, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-ku, 135-0064 Tokyo, Japan
108
Rayamajhi N, Cheng CHC, Catchen JM. Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki. G3 (Bethesda) 2022; 12:jkac192. [PMID: 35904764 PMCID: PMC9635638 DOI: 10.1093/g3journal/jkac192] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/21/2022] [Accepted: 07/18/2022] [Indexed: 11/16/2022]
Abstract
For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have been through at least 3 phases: (1) short-read only, (2) short- and long-read hybrid, and (3) long-read only assemblies. Each of the phases has its own error model. We hypothesized that hidden short-read scaffolding errors and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of Trematomus borchgrevinki from data generated during each of the 3 phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer-based strategy improved short-read assemblies as measured by Benchmarking Universal Single-Copy Ortholog while mate-pair libraries introduced hidden scaffolding errors and perturbed Benchmarking Universal Single-Copy Ortholog scores. Furthermore, we found that although hybrid assemblies can generate higher contiguity they tend to suffer from lower quality. In addition, we found long-read-only assemblies can be optimized for contiguity by subsampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phase I and phase II were of lower quality.
Affiliation(s)
- Niraj Rayamajhi
- Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, Champaign, IL 61801, USA
- Chi-Hing Christina Cheng
- Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, Champaign, IL 61801, USA
- Julian M Catchen
- Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, Champaign, IL 61801, USA
109
DNA read count calibration for single-molecule, long-read sequencing. Sci Rep 2022; 12:17257. [PMID: 36319642 PMCID: PMC9626564 DOI: 10.1038/s41598-022-21606-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 09/29/2022] [Indexed: 11/17/2022] Open
Abstract
There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave differently during sequencing. We have used two library preparations each for Pacific Biosciences (PacBio) and Oxford Nanopore Technologies NGS to determine their suitability for quantitative assessment of varying sized DNAs. Equimolar length standards were generated from E. coli genomic DNA. Both PacBio library preparations provided a consistent length dependence though with a complex pattern. This method is sufficiently sensitive that differences in genomic copy number between DNA from E. coli grown in exponential and stationary phase conditions could be detected. The transposase-based Oxford Nanopore library preparation provided a predictable length dependence, but the random sequence starts caused the loss of original length information. The ligation-based approach retained length information but read frequency was more variable. Modeling of E. coli versus lambda read frequency via cubic spline smoothing showed that the shorter genome could be used as a suitable internal spike-in for DNAs in the 200 bp to 10 kb range, allowing meaningful QC to be carried out with AAV preparations.
110
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
|
111
|
Sanford Kobayashi E, Batalov S, Wenger AM, Lambert C, Dhillon H, Hall RJ, Baybayan P, Ding Y, Rego S, Wigby K, Friedman J, Hobbs C, Bainbridge MN. Approaches to long-read sequencing in a clinical setting to improve diagnostic rate. Sci Rep 2022; 12:16945. [PMID: 36210382 PMCID: PMC9548499 DOI: 10.1038/s41598-022-20113-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/08/2022] [Accepted: 09/08/2022] [Indexed: 12/29/2022] Open
Abstract
Over the past decade, advances in genetic testing, particularly the advent of next-generation sequencing, have led to a paradigm shift in the diagnosis of molecular diseases and disorders. Despite our present collective ability to interrogate more than 90% of the human genome, portions of the genome have eluded us, resulting in stagnation of diagnostic yield with existing methodologies. Here we show how application of a new technology, long-read sequencing, has the potential to improve molecular diagnostic rates. Whole genome sequencing by long reads was able to cover 98% of next-generation sequencing dead zones, which are areas of the genome that are not interpretable by conventional industry-standard short-read sequencing. Through the ability of long-read sequencing to unambiguously call variants in these regions, we discovered an immunodeficiency due to a variant in IKBKG in a subject who had previously received a negative genome sequencing result. Additionally, we demonstrate the ability of long-read sequencing to detect small variants on par with short-read sequencing, its superior performance in identifying structural variants, and thirdly, its capacity to determine genomic methylation defects in native DNA. Though the latter technical abilities have been demonstrated, we demonstrate the clinical application of this technology to successfully identify multiple types of variants using a single test.
Affiliation(s)
- Erica Sanford Kobayashi
- Rady Institute for Genomic Medicine, San Diego, CA, USA; Department of Pediatrics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Serge Batalov
- Rady Institute for Genomic Medicine, San Diego, CA, USA
| | - Aaron M. Wenger
- Pacific Biosciences, Menlo Park, CA, USA
| | - Christine Lambert
- Pacific Biosciences, Menlo Park, CA, USA
| | - Harsharan Dhillon
- Pacific Biosciences, Menlo Park, CA, USA
| | - Richard J. Hall
- Pacific Biosciences, Menlo Park, CA, USA
| | - Primo Baybayan
- Pacific Biosciences, Menlo Park, CA, USA
| | - Yan Ding
- Rady Institute for Genomic Medicine, San Diego, CA, USA
| | - Seema Rego
- Rady Institute for Genomic Medicine, San Diego, CA, USA
| | - Kristen Wigby
- Rady Institute for Genomic Medicine, San Diego, CA, USA; Department of Pediatrics, University of California San Diego and Rady Children’s Hospital, San Diego, CA, USA
| | - Jennifer Friedman
- Rady Institute for Genomic Medicine, San Diego, CA, USA; Department of Pediatrics, University of California San Diego and Rady Children’s Hospital, San Diego, CA, USA; Department of Neuroscience, University of California San Diego and Rady Children’s Hospital, San Diego, CA, USA
|
112
|
Carangelo G, Magi A, Semeraro R. From multitude to singularity: An up-to-date overview of scRNA-seq data generation and analysis. Front Genet 2022; 13:994069. [PMID: 36263428 PMCID: PMC9575985 DOI: 10.3389/fgene.2022.994069] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/14/2022] [Accepted: 09/15/2022] [Indexed: 11/23/2022] Open
Abstract
Single cell RNA sequencing (scRNA-seq) is today a common and powerful technology in biomedical research settings, allowing researchers to profile the whole transcriptome of a very large number of individual cells and reveal the heterogeneity of complex clinical samples. Traditionally, cells have been classified by their morphology or by expression of certain proteins in functionally distinct settings. The advent of next generation sequencing (NGS) technologies paved the way for the detection and quantitative analysis of cellular content. In this context, transcriptome quantification techniques emerged, starting from bulk RNA sequencing, which is unable to dissect the heterogeneity of a sample, moving to the first single cell techniques capable of analyzing a small number of cells (1-100), and arriving at the current single cell techniques able to profile hundreds of thousands of cells. As experimental protocols have improved rapidly, computational workflows for processing the data have also been refined, opening the way to novel methods whose computational time scales more favorably with dataset size and making scRNA-seq much better suited for biomedical research. In this perspective, we highlight the key technological and computational developments that have enabled the analysis of this growing body of data, making scRNA-seq a handy tool in clinical applications.
Affiliation(s)
- Giulia Carangelo
- Department of Experimental and Clinical Biomedical Sciences “Mario Serio”, University of Florence, Florence, Italy
| | - Alberto Magi
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
|
113
|
Kim J, Lee C, Ko BJ, Yoo DA, Won S, Phillippy AM, Fedrigo O, Zhang G, Howe K, Wood J, Durbin R, Formenti G, Brown S, Cantin L, Mello CV, Cho S, Rhie A, Kim H, Jarvis ED. False gene and chromosome losses in genome assemblies caused by GC content variation and repeats. Genome Biol 2022; 23:204. [PMID: 36167554 PMCID: PMC9516821 DOI: 10.1186/s13059-022-02765-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many short-read genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project has been producing new reference genome assemblies with an emphasis on being as complete and error-free as possible, which requires utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. A more thorough evaluation of the recent references relative to prior assemblies can provide a detailed overview of the types and magnitude of improvements. RESULTS Here we evaluate new vertebrate genome references relative to the previous assemblies for the same species and, in two cases, the same individuals, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We find that up to 11% of genomic sequence is entirely missing in the previous assemblies. In the Vertebrate Genomes Project zebra finch assembly, we identify eight new GC- and repeat-rich micro-chromosomes with high gene density. The impact of missing sequences is biased towards GC-rich 5'-proximal promoters and 5' exon regions of protein-coding genes and long non-coding RNAs. Between 26 and 60% of genes include structural or sequence errors that could lead to misunderstanding of their function when using the previous genome assemblies. CONCLUSIONS Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the Vertebrate Genomes Project reference genomes.
Affiliation(s)
- Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Dong Ahn Yoo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Sohyoung Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA
| | - Guojie Zhang
- BGI-Shenzhen, Shenzhen, 518083, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Universitetsparken 15, 2100, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
| | | | | | - Richard Durbin
- Wellcome Sanger Institute, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
| | - Samara Brown
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
| | - Lindsey Cantin
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, 97239, USA
| | - Seoae Cho
- eGnome, Inc, Seoul, Republic of Korea
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.
- eGnome, Inc, Seoul, Republic of Korea.
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA.
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
|
114
|
Akhoundova D, Rubin MA. Clinical application of advanced multi-omics tumor profiling: Shaping precision oncology of the future. Cancer Cell 2022; 40:920-938. [PMID: 36055231 DOI: 10.1016/j.ccell.2022.08.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 02/21/2022] [Revised: 05/22/2022] [Accepted: 08/11/2022] [Indexed: 12/17/2022]
Abstract
Next-generation DNA sequencing technology has dramatically advanced clinical oncology through the identification of therapeutic targets and molecular biomarkers, leading to the personalization of cancer treatment with significantly improved outcomes for many common and rare tumor entities. More recent developments in advanced tumor profiling now enable dissection of tumor molecular architecture and the functional phenotype at cellular and subcellular resolution. Clinical translation of high-resolution tumor profiling and integration of multi-omics data into precision treatment, however, pose significant challenges at the level of prospective validation and clinical implementation. In this review, we summarize the latest advances in multi-omics tumor profiling, focusing on spatial genomics and chromatin organization, spatial transcriptomics and proteomics, liquid biopsy, and ex vivo modeling of drug response. We analyze the current stages of translational validation of these technologies and discuss future perspectives for their integration into precision treatment.
Affiliation(s)
- Dilara Akhoundova
- Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland; Department of Medical Oncology, Inselspital, University Hospital of Bern, 3010 Bern, Switzerland
| | - Mark A Rubin
- Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland; Bern Center for Precision Medicine, Inselspital, University Hospital of Bern, 3008 Bern, Switzerland.
|
115
|
Glinos DA, Garborcauskas G, Hoffman P, Ehsan N, Jiang L, Gokden A, Dai X, Aguet F, Brown KL, Garimella K, Bowers T, Costello M, Ardlie K, Jian R, Tucker NR, Ellinor PT, Harrington ED, Tang H, Snyder M, Juul S, Mohammadi P, MacArthur DG, Lappalainen T, Cummings BB. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 2022; 608:353-359. [PMID: 35922509 PMCID: PMC10337767 DOI: 10.1038/s41586-022-05035-y] [Citation(s) in RCA: 160] [Impact Index Per Article: 53.3] [Received: 02/25/2021] [Accepted: 06/28/2022] [Indexed: 12/12/2022]
Abstract
Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
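To make the idea of allele-specific expression concrete, the sketch below applies a generic exact two-sided binomial test to the number of long reads carrying the reference versus the alternative allele at one heterozygous site. This is a textbook test chosen for illustration, not the LORALS implementation; the function name, the read counts, and the null proportion of 0.5 are assumptions.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null proportion `p_ref`. Intended for
    modest read depths; very large counts may overflow float conversion.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so that floating-point ties are included.
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: balanced versus strongly imbalanced sites.
    print(binomial_two_sided_p(10, 10))
    print(binomial_two_sided_p(20, 0))
```

In practice, reference-mapping bias can shift the expected proportion away from 0.5, which is why `p_ref` is left as a parameter here.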
Affiliation(s)
- Dafni A Glinos
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
| | - Garrett Garborcauskas
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | | | - Nava Ehsan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Lihua Jiang
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | | | | | - Kathleen L Brown
- New York Genome Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | | | - Tera Bowers
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | | | - Ruiqi Jian
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Nathan R Tucker
- Masonic Medical Research Institute, Utica, NY, USA
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Patrick T Ellinor
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | | | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Michael Snyder
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Sissel Juul
- Oxford Nanopore Technology, New York, NY, USA
| | - Pejman Mohammadi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Scripps Research Translational Institute, La Jolla, CA, USA
| | - Daniel G MacArthur
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden.
| | - Beryl B Cummings
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
116
|
Zepeda‐Mendoza CJ, Bontrager JE, Fisher CF, McDonald A, George‐Abraham JK, Hasadsri L. Molecular characterization and reclassification of a 1.18 Mbp DMD duplication following positive carrier screening for Duchenne/Becker muscular dystrophy. Clin Case Rep 2022; 10:e6008. [PMID: 35846917 PMCID: PMC9272227 DOI: 10.1002/ccr3.6008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Revised: 06/21/2021] [Accepted: 01/28/2022] [Indexed: 11/15/2022] Open
Abstract
A 2-month-old male patient harboring a duplication of DMD exons 1-7 classified as pathogenic by an outside institution presented with mildly elevated creatine phosphokinase (CK); molecular breakpoint analysis by our laboratory reclassified the duplication as likely benign. To date, proband continues to develop normally with decreased CK, further supporting our reclassification.
Affiliation(s)
- Cinthya J. Zepeda‐Mendoza
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
| | - Jordan E. Bontrager
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
| | | | - Amber McDonald
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
| | | | - Linda Hasadsri
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
|
117
|
Hamdan A, Ewing A. Unravelling the tumour genome: The evolutionary and clinical impacts of structural variants in tumourigenesis. J Pathol 2022; 257:479-493. [PMID: 35355264 PMCID: PMC9321913 DOI: 10.1002/path.5901] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/21/2022] [Revised: 03/16/2022] [Accepted: 03/28/2022] [Indexed: 11/15/2022]
Abstract
Structural variants (SVs) represent a major source of aberration in tumour genomes. Given the diversity in the size and type of SVs present in tumours, the accurate detection and interpretation of SVs in tumours is challenging. New classes of complex structural events in tumours are discovered frequently, and the definitions of the genomic consequences of complex events are constantly being refined. Detailed analyses of short-read whole-genome sequencing (WGS) data from large tumour cohorts facilitate the interrogation of SVs at orders of magnitude greater scale and depth. However, the inherent technical limitations of short-read WGS prevent us from accurately detecting and investigating the impact of all the SVs present in tumours. The expanded use of long-read WGS will be critical for improving the accuracy of SV detection, and in fully resolving complex SV events, both of which are crucial for determining the impact of SVs on tumour progression and clinical outcome. Despite the present limitations, we demonstrate that SVs play an important role in tumourigenesis. In particular, SVs contribute significantly to late-stage tumour development and to intratumoural heterogeneity. The evolutionary trajectories of SVs represent a window into the clonal dynamics in tumours, a comprehensive understanding of which will be vital for influencing patient outcomes in the future. Recent findings have highlighted many clinical applications of SVs in cancer, from early detection to biomarkers for treatment response and prognosis. As the methods to detect and interpret SVs improve, elucidating the full breadth of the complex SV landscape and determining how these events modulate tumour evolution will improve our understanding of cancer biology and our ability to capitalise on the utility of SVs in the clinical management of cancer patients. © 2022 The Authors. 
The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Affiliation(s)
- Alhafidz Hamdan
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Ailith Ewing
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
|
118
|
Sano Y, Koyanagi Y, Wong JH, Murakami Y, Fujiwara K, Endo M, Aoi T, Hashimoto K, Nakazawa T, Wada Y, Ueno S, Gao D, Murakami A, Hotta Y, Ikeda Y, Nishiguchi KM, Momozawa Y, Sonoda KH, Akiyama M, Fujimoto A. Likely pathogenic structural variants in genetically unsolved patients with retinitis pigmentosa revealed by long-read sequencing. J Med Genet 2022; 59:1133-1138. [PMID: 35710107 PMCID: PMC9613870 DOI: 10.1136/jmedgenet-2022-108428] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/05/2022] [Accepted: 05/14/2022] [Indexed: 11/09/2022]
Abstract
Despite the successful identification of causative genes and genetic variants of retinitis pigmentosa (RP), many patients have not been molecularly diagnosed. Our recent study using targeted short-read sequencing showed that the proportion of carriers of pathogenic variants in EYS, the cause of autosomal recessive RP, was unexpectedly high in Japanese patients with unsolved RP. This result suggested that causative genetic variants, which are difficult to detect by short-read sequencing, exist in such patients. Using long-read sequencing technology (Oxford Nanopore), we analysed the whole genomes of 15 patients with RP with one heterozygous pathogenic variant in EYS detected in our previous study along with structural variants (SVs) in EYS and another 88 RP-associated genes. Two large exon-overlapping deletions involving six exons were identified in EYS in two patients with unsolved RP. An analysis of an independent patient set (n=1189) suggested that these two deletions are not founder mutations. Our results suggest that searching for SVs by long-read sequencing in genetically unsolved cases benefits the molecular diagnosis of RP.
Affiliation(s)
- Yusuke Sano
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan; Department of Human Genetics, The University of Tokyo, Graduate School of Medicine, Bunkyo-ku, Tokyo, Japan
| | - Yoshito Koyanagi
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan; Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Jing Hao Wong
- Department of Human Genetics, The University of Tokyo, Graduate School of Medicine, Bunkyo-ku, Tokyo, Japan
| | - Yusuke Murakami
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Kohta Fujiwara
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Mikiko Endo
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Tomomi Aoi
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Kazuki Hashimoto
- Department of Ophthalmology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
| | - Toru Nakazawa
- Department of Ophthalmology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan; Department of Advanced Ophthalmic Medicine, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
| | | | - Shinji Ueno
- Department of Ophthalmology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Dan Gao
- Department of Ophthalmology, Juntendo University Graduate School of Medicine, Bunkyo-ku, Tokyo, Japan
| | - Akira Murakami
- Department of Ophthalmology, Juntendo University Graduate School of Medicine, Bunkyo-ku, Tokyo, Japan
| | - Yoshihiro Hotta
- Department of Ophthalmology, Hamamatsu University School of Medicine, Hamamatsu, Shizuoka, Japan
| | - Yasuhiro Ikeda
- Department of Ophthalmology, Faculty of Medicine, University of Miyazaki, Miyazaki, Miyazaki, Japan
| | - Koji M Nishiguchi
- Department of Ophthalmology, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Koh-Hei Sonoda
- Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Masato Akiyama
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, The University of Tokyo, Graduate School of Medicine, Bunkyo-ku, Tokyo, Japan
|
119
|
Lee BY, Kim J, Lee J. Intraspecific de novo gene birth revealed by presence-absence variant genes in Caenorhabditis elegans. NAR Genom Bioinform 2022; 4:lqac031. [PMID: 35464238 PMCID: PMC9022459 DOI: 10.1093/nargab/lqac031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/29/2021] [Revised: 03/30/2022] [Accepted: 04/13/2022] [Indexed: 12/24/2022] Open
Abstract
Genes embed their evolutionary history in the form of various alleles. Presence-absence variants (PAVs) are extreme cases of such alleles, where a gene present in one haplotype does not exist in another. Because PAVs may result from either birth or death of a gene, PAV genes and their alternative alleles, if available, can represent a basis for rapid intraspecific gene evolution. Using long-read sequencing technologies, this study traced the possible evolution of PAV genes in the PD1074 and CB4856 C. elegans strains as well as their alternative alleles in 14 other wild strains. We updated the CB4856 genome by filling 18 gaps and identified 46 genes and 7,460 isoforms from both strains not annotated previously. We verified 328 PAV genes, out of which 46 were C. elegans-specific. Among these possible newly born genes, 12 had alternative alleles in other wild strains; in particular, the alternative alleles of three genes showed signatures of active transposons. Alternative alleles of three other genes showed another type of signature reflected in accumulation of small insertions or deletions. Research on gene evolution using both species-specific PAV genes and their alternative alleles may provide new insights into the process of gene evolution.
Affiliation(s)
- Bo Yun Lee
- Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
| | - Jun Kim
- Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
- Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea
| | - Junho Lee
- Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
- Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea
|
120
|
Quan C, Lu H, Lu Y, Zhou G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput Struct Biotechnol J 2022; 20:2639-2647. [PMID: 35685364 PMCID: PMC9163579 DOI: 10.1016/j.csbj.2022.05.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022] Open
Abstract
Population-scale studies of structural variation (SV) are growing rapidly worldwide with the development of long-read sequencing technology, yielding a considerable number of novel SVs and complete gap-closed genome assemblies. Herein, we highlight recent studies using a hybrid sequencing strategy and present the challenges toward large-scale genotyping for SVs due to the reference bias. Genotyping SVs at a population scale remains challenging, which severely impacts genotype-based population genetic studies or genome-wide association studies of complex diseases. We summarize academic efforts to improve genotype quality through linear or graph representations of reference and alternative alleles. Graph-based genotypers capable of integrating diverse genetic information are effectively applied to large and diverse cohorts, contributing to unbiased downstream analysis. Meanwhile, there is still an urgent need in this field for efficient tools to construct complex graphs and perform sequence-to-graph alignments.
Affiliation(s)
- Cheng Quan
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Hao Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Yiming Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
| | - Gangqiao Zhou
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166, PR China
- Medical College of Guizhou University, Guiyang, Guizhou Province 550025, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
| |
|
121
|
Fruzangohar M, Timmins WA, Kravchuk O, Taylor J. HaploMaker: An improved algorithm for rapid haplotype assembly of genomic sequences. Gigascience 2022; 11:giac038. [PMID: 35579550 PMCID: PMC9112781 DOI: 10.1093/gigascience/giac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Revised: 01/17/2022] [Accepted: 03/24/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND In diploid organisms, whole-genome haplotype assembly relies on the accurate identification and assignment of heterozygous single-nucleotide polymorphism alleles to the correct homologous chromosomes. This appropriate phasing of these alleles ensures that combinations of single-nucleotide polymorphisms on any chromosome, called haplotypes, can then be used in downstream genetic analysis approaches including determining their potential association with important phenotypic traits. A number of statistical algorithms and complementary computational software tools have been developed for whole-genome haplotype construction from genomic sequence data. However, many algorithms lack the ability to phase long haplotype blocks and simultaneously achieve a competitive accuracy. RESULTS In this research we present HaploMaker, a novel reference-based haplotype assembly algorithm capable of accurately and efficiently phasing long haplotypes using paired-end short reads and longer Pacific Biosciences reads from diploid genomic sequences. To achieve this we frame the problem as a directed acyclic graph with edges weighted on read evidence and use efficient path traversal and minimization techniques to optimally phase haplotypes. We compared the HaploMaker algorithm with 3 other common reference-based haplotype assembly tools using public haplotype data of human individuals from the Platinum Genome project. With short-read sequences, the HaploMaker algorithm maintained a competitively low switch error rate across all haplotype lengths and was superior in phasing longer genomic regions. For longer Pacific Biosciences reads, the phasing accuracy of HaploMaker remained competitive for all block lengths and generated substantially longer block lengths than the competing algorithms. CONCLUSIONS HaploMaker provides an improved haplotype assembly algorithm for diploid genomic sequences by accurately phasing longer haplotypes. The computationally efficient and portable nature of the Java implementation of the algorithm will ensure that it has maximal impact in reference-sequence-based haplotype assembly applications.
Affiliation(s)
- Mario Fruzangohar
- The Biometry Hub, School of Agriculture, Food and Wine & Waite Research Institute, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
| | - William A Timmins
- The Biometry Hub, School of Agriculture, Food and Wine & Waite Research Institute, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
| | - Olena Kravchuk
- The Biometry Hub, School of Agriculture, Food and Wine & Waite Research Institute, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
| | - Julian Taylor
- The Biometry Hub, School of Agriculture, Food and Wine & Waite Research Institute, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
| |
|
122
|
Avdeyev P, Zhou J. Computational Approaches for Understanding Sequence Variation Effects on the 3D Genome Architecture. Annu Rev Biomed Data Sci 2022; 5:183-204. [PMID: 35537461 DOI: 10.1146/annurev-biodatasci-102521-012018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Decoding how genomic sequence and its variations affect 3D genome architecture is indispensable for understanding the genetic architecture of various traits and diseases. The 3D genome organization can be significantly altered by genome variations and in turn impact the function of the genomic sequence. Techniques for measuring the 3D genome architecture across spatial scales have opened up new possibilities for understanding how the 3D genome depends upon the genomic sequence and how it can be altered by sequence variations. Computational methods have become instrumental in analyzing and modeling the sequence effects on 3D genome architecture, and recent developments in deep learning sequence models have opened up new opportunities for studying the interplay between sequence variations and the 3D genome. In this review, we focus on computational approaches for both the detection and modeling of sequence variation effects on the 3D genome, and we discuss the opportunities presented by these approaches. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Pavel Avdeyev
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| |
|
123
|
de la Morena-Barrio B, Orlando C, Sanchis-Juan A, García JL, Padilla J, de la Morena-Barrio ME, Puruunen M, Stouffs K, Cifuentes R, Borràs N, Bravo-Pérez C, Benito R, Cuenca-Guardiola J, Vicente V, Vidal F, Hernández-Rivas JM, Ouwehand W, Jochmans K, Corral J. Molecular Dissection of Structural Variations Involved in Antithrombin Deficiency. J Mol Diagn 2022; 24:462-475. [PMID: 35218943 DOI: 10.1016/j.jmoldx.2022.01.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/29/2021] [Revised: 11/15/2021] [Accepted: 01/11/2022] [Indexed: 12/30/2022] Open
Abstract
Inherited antithrombin deficiency, the most severe form of thrombophilia, is predominantly caused by variants in SERPINC1. Few causal structural variants have been described, usually detected by multiplex ligation-dependent probe amplification or cytogenetic arrays, which only define the gain or loss and the approximate size and location. This study has done a complete dissection of the structural variants affecting SERPINC1 of 39 unrelated patients with antithrombin deficiency using multiplex ligation-dependent probe amplification, comparative genome hybridization array, long-range PCR, and whole genome nanopore sequencing. Structural variants, in all cases only affecting one allele, were deleterious and caused a severe type I deficiency. Most defects were deletions affecting exons of SERPINC1 (82.1%), but the whole cohort was heterogeneous, as tandem duplications, deletion of introns, or retrotransposon insertions were also detected. Their size was also variable, ranging from 193 bp to 8 Mb, and in 54% of the cases involved neighboring genes. All but two structural variants had repetitive elements and/or microhomologies in their breakpoints, suggesting a common mechanism of formation. This study also suggested regions recurrently involved in structural variants causing antithrombin deficiency and found three structural variants with a founder effect: the insertion of a retrotransposon, duplication of exon 6, and a 20-gene deletion. Finally, nanopore sequencing was determined to be the most appropriate method to identify and characterize all structural variants at nucleotide level, independently of their size or type.
Affiliation(s)
- Belén de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - Christelle Orlando
- Department of Haematology, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
| | - Alba Sanchis-Juan
- Department of Haematology, University of Cambridge, National Health Service (NHS) Blood and Transplant Centre, Cambridge, United Kingdom; National Institute for Health Research BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
| | - Juan L García
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
| | - José Padilla
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - María E de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - Marija Puruunen
- National Heart, Lung, and Blood Institute Framingham Heart Study, Framingham, Massachusetts
| | - Katrien Stouffs
- Center for Medical Genetics, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
| | - Rosa Cifuentes
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - Nina Borràs
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Medicina Transfusional, Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Instituto Carlos III, Barcelona, Spain
| | - Carlos Bravo-Pérez
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - Rocio Benito
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
| | - Javier Cuenca-Guardiola
- Departamento de Informática y Sistemas, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Arrixaca, Murcia, Spain
| | - Vicente Vicente
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
| | - Francisco Vidal
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Medicina Transfusional, Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Instituto Carlos III, Barcelona, Spain
| | - Jesús M Hernández-Rivas
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
| | - Willem Ouwehand
- Department of Haematology, University of Cambridge, National Health Service (NHS) Blood and Transplant Centre, Cambridge, United Kingdom; National Institute for Health Research BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
| | - Kristin Jochmans
- Department of Haematology, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
| | - Javier Corral
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain.
| |
|
124
|
Shale C, Cameron DL, Baber J, Wong M, Cowley MJ, Papenfuss AT, Cuppen E, Priestley P. Unscrambling cancer genomes via integrated analysis of structural variation and copy number. CELL GENOMICS 2022; 2:100112. [PMID: 36776527 PMCID: PMC9903802 DOI: 10.1016/j.xgen.2022.100112] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/13/2021] [Revised: 09/29/2021] [Accepted: 02/25/2022] [Indexed: 11/17/2022]
Abstract
Complex somatic genomic rearrangements and copy number alterations are hallmarks of nearly all cancers. We have developed an algorithm, LINX, to aid interpretation of structural variant and copy number data derived from short-read, whole-genome sequencing. LINX classifies raw structural variant calls into distinct events and predicts their effect on the local structure of the derivative chromosome and the functional impact on affected genes. Visualizations facilitate further investigation of complex rearrangements. LINX allows insights into a diverse range of structural variation events and can reliably detect pathogenic rearrangements, including gene fusions, immunoglobulin enhancer rearrangements, intragenic deletions, and duplications. Uniquely, LINX also predicts chained fusions that we demonstrate account for 13% of clinically relevant oncogenic fusions. LINX also reports a class of inactivation events that we term homozygous disruptions that may be a driver mutation in up to 9% of tumors and may frequently affect PTEN, TP53, and RB1.
Affiliation(s)
- Charles Shale
- Hartwig Medical Foundation Australia, Sydney, NSW, Australia
- Hartwig Medical Foundation, Science Park 408, Amsterdam, the Netherlands
| | - Daniel L. Cameron
- Hartwig Medical Foundation Australia, Sydney, NSW, Australia
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
| | - Jonathan Baber
- Hartwig Medical Foundation Australia, Sydney, NSW, Australia
- Hartwig Medical Foundation, Science Park 408, Amsterdam, the Netherlands
| | - Marie Wong
- Children’s Cancer Institute, Lowy Cancer Centre, UNSW Sydney, Kensington, NSW, Australia
- School of Women’s and Children’s Health, UNSW Sydney, Kensington, NSW, Australia
| | - Mark J. Cowley
- Children’s Cancer Institute, Lowy Cancer Centre, UNSW Sydney, Kensington, NSW, Australia
- School of Women’s and Children’s Health, UNSW Sydney, Kensington, NSW, Australia
| | - Anthony T. Papenfuss
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
| | - Edwin Cuppen
- Hartwig Medical Foundation, Science Park 408, Amsterdam, the Netherlands
- Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, the Netherlands
| | - Peter Priestley
- Hartwig Medical Foundation Australia, Sydney, NSW, Australia
- Hartwig Medical Foundation, Science Park 408, Amsterdam, the Netherlands
| |
|
125
|
Hu T, Li J, Long M, Wu J, Zhang Z, Xie F, Zhao J, Yang H, Song Q, Lian S, Shi J, Guo X, Yuan D, Lang D, Yu G, Liang B, Zhou X, Ishibashi T, Fan X, Yu W, Wang D, Wang Y, Peng IF, Wang S. Detection of Structural Variations and Fusion Genes in Breast Cancer Samples Using Third-Generation Sequencing. Front Cell Dev Biol 2022; 10:854640. [PMID: 35493102 PMCID: PMC9043247 DOI: 10.3389/fcell.2022.854640] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/14/2022] [Accepted: 03/23/2022] [Indexed: 11/16/2022] Open
Abstract
Background: Structural variations (SVs) are common genetic alterations in the human genome that could cause different phenotypes and diseases, including cancer. However, the detection of structural variations using second-generation sequencing has been limited by its short read length, which has restricted our understanding of structural variations. Methods: In this study, we developed a 28-gene panel for long-read sequencing and applied it on the Oxford Nanopore Technologies and Pacific Biosciences platforms. We analyzed structural variations in the 28 breast cancer-related genes through long-read genomic and transcriptomic sequencing of tumor, para-tumor, and blood samples from 19 breast cancer patients. Results: Our results showed that some somatic SVs were recurring among the selected genes, though the majority of them occurred in non-exonic regions. We found evidence supporting the existence of hotspot regions for SVs, which extended our previous understanding that they exist only for single nucleotide variations. Conclusion: In conclusion, we employed long-read genomic and transcriptomic sequencing to identify SVs from breast cancer patients and proved that this approach holds great potential in clinical application.
Affiliation(s)
- Taobo Hu
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- GrandOmics Inc., Beijing, China
| | - Mengping Long
- Department of Pathology, Peking University Cancer Hospital, Beijing, China
| | - Jinbo Wu
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| | - Zhen Zhang
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, China
| | - Fei Xie
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| | - Jin Zhao
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| | - Houpu Yang
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| | - Qianqian Song
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Sheng Lian
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
| | - Jiandong Shi
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
| | | | | | | | | | - Baosheng Liang
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Xiaohua Zhou
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Toyotaka Ishibashi
- Division of Life Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
| | - Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, China
| | - Weichuan Yu
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
| | | | - Yang Wang
- GrandOmics Inc., Beijing, China
- *Correspondence: Yang Wang; I-Feng Peng; Shu Wang
| | - I-Feng Peng
- GrandOmics Inc., Beijing, China
| | - Shu Wang
- Department of Breast Surgery, Peking University People’s Hospital, Beijing, China
| |
|
126
|
Nicholas TJ, Al‐Sweel N, Farrell A, Mao R, Bayrak‐Toydemir P, Miller CE, Bentley D, Palmquist R, Moore B, Hernandez EJ, Cormier MJ, Fredrickson E, Noble K, Rynearson S, Holt C, Karren M, Bonkowsky JL, Tristani‐Firouzi M, Yandell M, Marth G, Quinlan AR, Brunelli L, Toydemir R, Shayota BJ, Carey JC, Boyden SE, Malone Jenkins S. Comprehensive variant calling from whole-genome sequencing identifies a complex inversion that disrupts ZFPM2 in familial congenital diaphragmatic hernia. Mol Genet Genomic Med 2022; 10:e1888. [PMID: 35119225 PMCID: PMC9000945 DOI: 10.1002/mgg3.1888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Genetic disorders contribute to significant morbidity and mortality in critically ill newborns. Despite advances in genome sequencing technologies, a majority of neonatal cases remain unsolved. Complex structural variants (SVs) often elude conventional genome sequencing variant calling pipelines and will explain a portion of these unsolved cases. METHODS As part of the Utah NeoSeq project, we used a research-based, rapid whole-genome sequencing (WGS) protocol to investigate the genomic etiology for a newborn with a left-sided congenital diaphragmatic hernia (CDH) and cardiac malformations, whose mother also had a history of CDH and atrial septal defect. RESULTS Using both a novel alignment-free variant caller and traditional alignment-based variant callers, we identified a maternally inherited complex SV on chromosome 8, consisting of an inversion flanked by deletions. This complex inversion, further confirmed using orthogonal molecular techniques, disrupts the ZFPM2 gene, which is associated with both CDH and various congenital heart defects. CONCLUSIONS Our results demonstrate that complex structural events, which often are unidentifiable or not reported by clinically validated testing procedures, can be discovered and accurately characterized with conventional, short-read sequencing and underscore the utility of WGS as a first-line diagnostic tool.
Affiliation(s)
- Thomas J. Nicholas
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Najla Al‐Sweel
- ARUP Laboratories, Salt Lake City, USA
- Department of Pathology, University of Utah, Salt Lake City, USA
| | - Andrew Farrell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Rong Mao
- ARUP Laboratories, Salt Lake City, USA
- Department of Pathology, University of Utah, Salt Lake City, USA
| | - Pinar Bayrak‐Toydemir
- ARUP Laboratories, Salt Lake City, USA
- Department of Pathology, University of Utah, Salt Lake City, USA
| | | | - Dawn Bentley
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| | - Rachel Palmquist
- Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Primary Children's Center for Personalized Medicine, Salt Lake City, USA
| | - Barry Moore
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Edgar J. Hernandez
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Michael J. Cormier
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | | | | | - Shawn Rynearson
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Carson Holt
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Mary Anne Karren
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Joshua L. Bonkowsky
- Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Primary Children's Center for Personalized Medicine, Salt Lake City, USA
| | - Martin Tristani‐Firouzi
- Division of Pediatric Cardiology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| | - Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Gabor Marth
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Aaron R. Quinlan
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
| | - Luca Brunelli
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| | - Reha M. Toydemir
- ARUP Laboratories, Salt Lake City, USA
- Department of Pathology, University of Utah, Salt Lake City, USA
| | - Brian J. Shayota
- Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| | - John C. Carey
- Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| | - Steven E. Boyden
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
| | - Sabrina Malone Jenkins
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
| |
|
127
|
Jobson E, Roberts R. Genomic structural variation in tomato and its role in plant immunity. MOLECULAR HORTICULTURE 2022; 2:7. [PMID: 37789472 PMCID: PMC10515242 DOI: 10.1186/s43897-022-00029-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2021] [Accepted: 02/22/2022] [Indexed: 10/05/2023]
Abstract
It is well known that large genomic variations can greatly impact the phenotype of an organism. Structural Variants (SVs) encompass any genomic variation larger than 30 base pairs, and include changes caused by deletions, inversions, duplications, transversions, and other genome modifications. Due to their size and complex nature, until recently, it has been difficult to truly capture these variations. Recent advances in sequencing technology and computational analyses now permit more extensive studies of SVs in plant genomes. In tomato, advances in sequencing technology have allowed researchers to sequence hundreds of genomes from tomatoes, and tomato relatives. These studies have identified SVs related to fruit size and flavor, as well as plant disease response, resistance/susceptibility, and the ability of plants to detect pathogens (immunity). In this review, we discuss the implications for genomic structural variation in plants with a focus on its role in tomato immunity. We also discuss how advances in sequencing technology have led to new discoveries of SVs in more complex genomes, the current evidence for the role of SVs in biotic and abiotic stress responses, and the outlook for genetic modification of SVs to advance plant breeding objectives.
Affiliation(s)
- Emma Jobson
- Montana State University Extension, Montana State University, Bozeman, MT, 59717, United States
| | - Robyn Roberts
- Agricultural Biology Department, College of Agricultural Sciences, Colorado State University, Fort Collins, CO, USA.
| |
|
128
|
Assessment of linkage disequilibrium patterns between structural variants and single nucleotide polymorphisms in three commercial chicken populations. BMC Genomics 2022; 23:193. [PMID: 35264116 PMCID: PMC8908679 DOI: 10.1186/s12864-022-08418-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/31/2021] [Accepted: 02/24/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Structural variants (SV) are causative for some prominent phenotypic traits of livestock, such as different comb types in chickens or color patterns in pigs. Their effects on production traits are also increasingly studied. Nevertheless, accurately calling SV remains challenging. It is therefore of interest whether close-by single nucleotide polymorphisms (SNPs) are in strong linkage disequilibrium (LD) with SVs and can serve as markers. Literature comes to different conclusions on whether SVs are in LD to SNPs on the same level as SNPs to other SNPs. The present study aimed to generate a precise SV callset from whole-genome short-read sequencing (WGS) data for three commercial chicken populations and to evaluate LD patterns between the called SVs and surrounding SNPs. It is thereby the first study that assessed LD between SVs and SNPs in chickens. RESULTS The final callset consisted of 12,294,329 bivariate SNPs, 4,301 deletions (DEL), 224 duplications (DUP), 218 inversions (INV) and 117 translocation breakpoints (BND). While average LD between DELs and SNPs was at the same level as between SNPs and SNPs, LD between other SVs and SNPs was strongly reduced (DUP: 40%, INV: 27%, BND: 19% of between-SNP LD). A main factor for the reduced LD was the presence of local minor allele frequency differences, which accounted for 50% of the difference between SNP - SNP and DUP - SNP LD. This was potentially accompanied by lower genotyping accuracies for DUP, INV and BND compared with SNPs and DELs. An evaluation of the presence of tag SNPs (SNP in highest LD to the variant of interest) further revealed DELs to be slightly less tagged by WGS SNPs than WGS SNPs by other SNPs. This difference, however, was no longer present when reducing the pool of potential tag SNPs to SNPs located on four different chicken genotyping arrays. CONCLUSIONS The results implied that genomic variance due to DELs in the chicken populations studied can be captured by different SNP marker sets as well as variance from WGS SNPs, whereas separate SV calling might be advisable for DUP, INV, and BND effects.
|
129
|
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022] Open
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and requires a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Ruth Roberts
- ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK
- University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Timothy R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
- Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
| | - Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
| |
|
130
|
Long-read sequencing on the SMRT platform enables efficient haplotype linkage analysis in preimplantation genetic testing for β-thalassemia. J Assist Reprod Genet 2022; 39:739-746. [PMID: 35141813 PMCID: PMC8995213 DOI: 10.1007/s10815-022-02415-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 01/26/2022] [Indexed: 10/19/2022] Open
Abstract
PURPOSE This study aimed to evaluate the value of long-read sequencing for preimplantation haplotype linkage analysis. METHODS The genetic material of the three β-thalassemia mutation carrier couples was sequenced using single-molecule real-time sequencing in the 7.7-kb region of the HBB gene and a 7.4-kb region that partially overlapped with it to detect the presence of 17 common HBB gene mutations in the Chinese population and the haplotypes formed by the continuous array of single-nucleotide polymorphisms linked to these mutations. By using the same method to analyze multiple displacement amplification products of embryos from three families and comparing the results with those of the parents, it could be revealed whether the embryos carry disease-causing mutations without the need for a proband. RESULTS The HBB gene mutations of the three couples were accurately detected, and the haplotype linked to the pathogenic site was successfully obtained without the need for a proband. A total of 68.75% (22/32) of embryos from the three families successfully underwent haplotype linkage analysis, and the results were consistent with the results of NGS-based mutation site detection. CONCLUSION This study supports long-read sequencing as a potential tool for preimplantation haplotype linkage analysis.
|
131
|
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 168] [Impact Index Per Article: 56.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023] Open
Abstract
Rare diseases affect 30 million people in the USA and more than 300-400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25-35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
Affiliation(s)
- Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA.
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA.
| | - Joshua W Knowles
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Medicine, Diabetes Research Center, Cardiovascular Institute and Prevention Research Center, Stanford, CA, USA
| | - Euan A Ashley
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA.
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA.
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA.
| |
|
132
|
Lemay MA, Sibbesen JA, Torkamaneh D, Hamel J, Levesque RC, Belzile F. Combined use of Oxford Nanopore and Illumina sequencing yields insights into soybean structural variation biology. BMC Biol 2022; 20:53. [PMID: 35197050 PMCID: PMC8867729 DOI: 10.1186/s12915-022-01255-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Accepted: 02/16/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Structural variants (SVs), including deletions, insertions, duplications, and inversions, are relatively long genomic variations implicated in a diverse range of processes from human disease to ecology and evolution. Given their complex signatures, tendency to occur in repeated regions, and large size, discovering SVs based on short reads is challenging compared to single-nucleotide variants. The increasing availability of long-read technologies has greatly facilitated SV discovery; however, these technologies remain too costly to apply routinely to population-level studies. Here, we combined short-read and long-read sequencing technologies to provide a comprehensive population-scale assessment of structural variation in a panel of Canadian soybean cultivars. RESULTS We used Oxford Nanopore long-read sequencing data (~12× mean coverage) for 17 samples to both benchmark SV calls made from Illumina short-read data and predict SVs that were subsequently genotyped in a population of 102 samples using Illumina data. Benchmarking results show that variants discovered using Oxford Nanopore can be accurately genotyped from the Illumina data. We first use the genotyped deletions and insertions for population genetics analyses and show that results are comparable to those based on single-nucleotide variants. We observe that the population frequency and distribution within the genome of deletions and insertions are constrained by the location of genes. Gene Ontology and PFAM domain enrichment analyses also confirm previous reports that genes harboring high-frequency deletions and insertions are enriched for functions in defense response. Finally, we discover polymorphic transposable elements from the deletions and insertions and report evidence of the recent activity of a Stowaway MITE. CONCLUSIONS We show that structural variants discovered using Oxford Nanopore data can be genotyped with high accuracy from Illumina data. 
Our results demonstrate that long-read and short-read sequencing technologies can be efficiently combined to enhance SV analysis in large populations, providing a reusable framework for their study in a wider range of samples and non-model species.
Affiliation(s)
- Marc-André Lemay
- Département de phytologie, Université Laval, Quebec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
| | | | - Davoud Torkamaneh
- Département de phytologie, Université Laval, Quebec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
| | - Jérémie Hamel
- Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
- Département de microbiologie-infectiologie et d’immunologie, Université Laval, Quebec, Canada
| | - Roger C. Levesque
- Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
- Département de microbiologie-infectiologie et d’immunologie, Université Laval, Quebec, Canada
| | - François Belzile
- Département de phytologie, Université Laval, Quebec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
| |
|
133
|
Menon VK, Okhuysen PC, Chappell CL, Mahmoud M, Meng Q, Doddapaneni H, Vee V, Han Y, Salvi S, Bhamidipati S, Kottapalli K, Weissenberger G, Shen H, Ross MC, Hoffman KL, Cregeen SJ, Muzny DM, Metcalf GA, Gibbs RA, Petrosino JF, Sedlazeck FJ. Fully resolved assembly of Cryptosporidium parvum. Gigascience 2022; 11:giac010. [PMID: 35166336 PMCID: PMC8848321 DOI: 10.1093/gigascience/giac010] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Revised: 12/07/2021] [Accepted: 01/20/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Cryptosporidium parvum is an apicomplexan parasite commonly found across many host species with a global infection prevalence in human populations of 7.6%. Understanding its diversity and genomic makeup can help in fighting established infections and prohibiting further transmission. The basis of every genomic study is a high-quality reference genome that has continuity and completeness, thus enabling comprehensive comparative studies. FINDINGS Here, we provide a highly accurate and complete reference genome of Cryptosporidium parvum. The assembly is based on Oxford Nanopore reads and was improved using Illumina reads for error correction. We also outline how to evaluate and choose from different assembly methods based on 2 main approaches that can be applied to other Cryptosporidium species. The assembly encompasses 8 chromosomes and includes 13 telomeres that were resolved. Overall, the assembly shows a high completion rate with 98.4% single-copy BUSCO genes. CONCLUSIONS This high-quality reference genome of a zoonotic IIaA17G2R1 C. parvum subtype isolate provides the basis for subsequent comparative genomic studies across the Cryptosporidium clade. This will enable improved understanding of diversity, functional, and association studies.
Affiliation(s)
- Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Pablo C Okhuysen
- Department of Infectious Diseases, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Cynthia L Chappell
- Center for Infectious Diseases, The University of Texas School of Public Health, Houston, TX 77030, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Qingchang Meng
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Harsha Doddapaneni
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Vanesa Vee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yi Han
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sejal Salvi
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sravya Bhamidipati
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Kavya Kottapalli
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - George Weissenberger
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Hua Shen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Matthew C Ross
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
| | - Kristi L Hoffman
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
| | - Sara Javornik Cregeen
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Joseph F Petrosino
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
134
|
Methods to Improve Molecular Diagnosis in Genomic Cold Cases in Pediatric Neurology. Genes (Basel) 2022; 13:genes13020333. [PMID: 35205378 PMCID: PMC8871714 DOI: 10.3390/genes13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023] Open
Abstract
During the last decade, genetic testing has emerged as an important etiological diagnostic tool for Mendelian diseases, including pediatric neurological conditions. A genetic diagnosis has a considerable impact on disease management and treatment; however, many cases remain undiagnosed after applying standard diagnostic sequencing techniques. This review discusses various methods to improve the molecular diagnostic rates in these genomic cold cases. We discuss extended analysis methods to consider, non-Mendelian inheritance models, mosaicism, dual/multiple diagnoses, periodic re-analysis, artificial intelligence tools, and deep phenotyping, in addition to integrating various omics methods to improve variant prioritization. Last, novel genomic technologies, including long-read sequencing, artificial long-read sequencing, and optical genome mapping are discussed. In conclusion, a more comprehensive molecular analysis and a timely re-analysis of unsolved cases are imperative to improve diagnostic rates. In addition, our current understanding of the human genome is still limited due to restrictions in technologies. Novel technologies are now available that improve upon some of these limitations and can capture all human genomic variation more accurately. Last, we recommend a more routine implementation of high molecular weight DNA extraction methods that is coherent with the ability to use and/or optimally benefit from these novel genomic methods.
|
135
|
Murdock DR, Rosenfeld JA, Lee B. What Has the Undiagnosed Diseases Network Taught Us About the Clinical Applications of Genomic Testing? Annu Rev Med 2022; 73:575-585. [PMID: 35084988 PMCID: PMC10874501 DOI: 10.1146/annurev-med-042120-014904] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Genetic testing has undergone a revolution in the last decade, particularly with the advent of next-generation sequencing and its associated reductions in costs and increases in efficiencies. The Undiagnosed Diseases Network (UDN) has been a leader in the application of such genomic testing for rare disease diagnosis. This review discusses the current state of genomic testing performed within the UDN, with a focus on the strengths and limitations of whole-exome and whole-genome sequencing in clinical diagnostics and the importance of ongoing data reanalysis. The role of emerging technologies such as RNA and long-read sequencing to further improve diagnostic rates in the UDN is also described. This review concludes with a discussion of the challenges faced in insurance coverage of comprehensive genomic testing as well as the opportunities for a larger role of testing in clinical medicine.
Affiliation(s)
- David R Murdock
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
| | - Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
| | - Brendan Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
- Texas Children's Hospital, Houston, Texas 77030, USA
| |
|
136
|
Affiliation(s)
- Parwinder Kaur
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA, 6009, Australia.
| | - Baohong Zhang
- Department of Biology, East Carolina University, Greenville, NC, 27858, USA.
| |
|
137
|
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage, residual polymorphisms and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
| | - Florian Schwarz
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
| | | | - Robert Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
| |
|
138
|
Jiang T, Liu S, Cao S, Wang Y. Structural Variant Detection from Long-Read Sequencing Data with cuteSV. Methods Mol Biol 2022; 2493:137-151. [PMID: 35751813 DOI: 10.1007/978-1-0716-2293-3_9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Structural variation (SV) represents genomic rearrangements and is strongly associated with human health and disease. Long-read sequencing technologies now provide the opportunity for more comprehensive identification of SVs at ever-higher resolution. However, given high sequencing error rates and the complexity of SVs, many technical issues remain to be settled. Hence, we propose cuteSV, a sensitive, fast, and scalable alignment-based SV detection approach for comprehensive discovery of diverse SVs. The benchmarking results indicate that cuteSV is suitable for large-scale genome projects because of its excellent SV yields and ultra-fast speed. Here, we explain the overall framework and provide a detailed outline for users to apply cuteSV correctly and comprehensively. More details are available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Shiqi Liu
- Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Shuqi Cao
- Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yadong Wang
- Harbin Institute of Technology, Harbin, Heilongjiang, China.
| |
|
139
|
Lemay MA, Malle S. A Practical Guide to Using Structural Variants for Genome-Wide Association Studies. Methods Mol Biol 2022; 2481:161-172. [PMID: 35641764 DOI: 10.1007/978-1-0716-2237-7_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Structural variants (SVs) are known to have large functional impacts on phenotypes of agricultural interest, but they have yet to be routinely used for GWAS. Apart from the difficulty in obtaining high-quality SV genotype data for large populations, one of the main hurdles to using SVs for GWAS lies in formatting of genotype data for use with popular GWAS programs. This protocol describes how typical SV genotype data can be formatted for input to three GWAS programs commonly used by the plant genetics community: TASSEL, GAPIT, and mrMLM.
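The formatting step this abstract refers to can be sketched as a conversion from VCF-style genotype strings to numeric alternate-allele dosages. This is only an illustration under stated assumptions: biallelic diploid calls, missing values written as `NA`, and hypothetical function names `gt_to_dosage` and `encode_row`. The exact input layout each of TASSEL, GAPIT, and mrMLM expects is not given here and should be taken from the protocol or the program manuals.

```python
def gt_to_dosage(gt):
    """Convert a diploid VCF-style genotype (e.g. "0/1", "1|1") to an
    alternate-allele count, or None when the call is missing.

    Assumption: the variant is biallelic, so any non-"0" allele counts as ALT.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")

def encode_row(genotypes, missing="NA"):
    """Encode one variant's genotypes across samples as text fields."""
    out = []
    for gt in genotypes:
        dosage = gt_to_dosage(gt)
        out.append(missing if dosage is None else str(dosage))
    return out
```

A row such as `encode_row(["0/0", "0/1", "1|1", "./."])` could then be written, together with a variant identifier, into whatever delimited layout the chosen GWAS program requires.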
Affiliation(s)
- Marc-André Lemay
- Département de phytologie and Institut de biologie intégrative et des systèmes, Université Laval, Quebec City, QC, Canada.
| | - Sidiki Malle
- Institut Polytechnique Rural de Formation et de Recherche Appliquée De Katibougou, Koulikoro, Mali
| |
|
140
|
Khayat MM, Sahraeian SME, Zarate S, Carroll A, Hong H, Pan B, Shi L, Gibbs RA, Mohiyuddin M, Zheng Y, Sedlazeck FJ. Hidden biases in germline structural variant detection. Genome Biol 2021; 22:347. [PMID: 34930391 PMCID: PMC8686633 DOI: 10.1186/s13059-021-02558-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Accepted: 11/24/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. RESULTS In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. CONCLUSIONS This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.
Affiliation(s)
- Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | - Huixiao Hong
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
| | - Bohu Pan
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Institute of Thoracic Oncology, Fudan University, Shanghai, China
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
- Institute of Thoracic Oncology, Fudan University, Shanghai, China.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| |
|
141
|
Baslan T, Kovaka S, Sedlazeck FJ, Zhang Y, Wappel R, Tian S, Lowe SW, Goodwin S, Schatz MC. High resolution copy number inference in cancer using short-molecule nanopore sequencing. Nucleic Acids Res 2021; 49:e124. [PMID: 34551429 PMCID: PMC8643650 DOI: 10.1093/nar/gkab812] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Revised: 07/19/2021] [Accepted: 09/09/2021] [Indexed: 01/23/2023] Open
Abstract
Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data shows that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost effective and expeditious manner, including for multiplex samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, which includes but is not limited to, CNAs.
Affiliation(s)
- Timour Baslan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Yanming Zhang
- Cytogenetics Laboratory, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Sha Tian
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| |
|
142
|
Chen Z, He X. Application of third-generation sequencing in cancer research. MEDICAL REVIEW (BERLIN, GERMANY) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific BioSciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over the most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
|
143
|
Hall CL, Kesharwani RK, Phillips NR, Planz JV, Sedlazeck FJ, Zascavage RR. Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device. Forensic Sci Int Genet 2021; 56:102629. [PMID: 34837788 DOI: 10.1016/j.fsigen.2021.102629] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/01/2021] [Revised: 09/28/2021] [Accepted: 11/01/2021] [Indexed: 01/23/2023]
Abstract
The high variability characteristic of short tandem repeat (STR) markers is harnessed for human identification in forensic genetic analyses. Despite the power and reliability of current typing techniques, sequence-level information both within and around STRs is masked in the length-based profiles generated. Forensic STR typing using next generation sequencing (NGS) has therefore gained attention as an alternative to traditional capillary electrophoresis (CE) approaches. In this proof-of-principle study, we evaluate the forensic applicability of the newest and smallest NGS platform available, the Oxford Nanopore Technologies (ONT) MinION device. Although nanopore sequencing on the handheld MinION offers numerous advantages, including low startup cost and on-site sample processing, the relatively high error rate and lack of forensic-specific analysis software have prevented accurate profiling across STR panels in previous studies. Here we present STRspy, a streamlined method capable of producing length- and sequence-based STR allele designations from noisy, error-prone third generation sequencing reads. To assess the capabilities of STRspy, seven reference samples (female: n = 2; male: n = 5) were amplified at 15 and 30 PCR cycles using the Promega PowerSeq 46GY System and sequenced on the ONT MinION device in triplicate. Basecalled reads were then processed with STRspy using a custom database containing alleles reported in the STRSeq BioProject NIST 1036 dataset. Resultant STR allele designations and flanking region single nucleotide polymorphism (SNP) calls were compared to the manufacturer-validated genotypes for each sample. STRspy generated robust and reliable genotypes across all autosomal STR loci amplified with 30 PCR cycles, achieving 100% concordance based on both length and sequence. Furthermore, we were able to identify flanking region SNPs in the 15-cycle dataset with > 90% accuracy. These results demonstrate that, when analyzed with STRspy, ONT reads can reveal additional variation in and around STR loci depending on read coverage. As the first and only third generation sequencing platform-specific method to successfully profile the entire panel of autosomal STRs amplified by a commercially available multiplex, STRspy significantly increases the feasibility of nanopore sequencing in forensic applications.
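To make the idea of a length-based STR allele designation concrete, the following is a minimal Python sketch that counts the longest run of a tandem repeat motif in each read. It is an illustration only and is not the STRspy method, which the abstract describes as relying on a custom allele database; the function names, the example motif `GATA`, and the toy reads are all hypothetical.

```python
import re
from collections import Counter


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `read`."""
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(read):
        best = max(best, len(match.group(0)) // len(motif))
    return best


def repeat_count_histogram(reads, motif: str) -> Counter:
    """Tally, over all reads, how often each repeat count is observed."""
    return Counter(longest_repeat_run(read, motif) for read in reads)


# Hypothetical reads covering one locus with the (assumed) motif GATA.
toy_reads = ["TTGATAGATAGATACC", "AAGATAGATAGATAGG", "TTGATAGATACC"]
histogram = repeat_count_histogram(toy_reads, "GATA")
```

With noisy reads, the histogram would typically be inspected for its one or two dominant repeat counts rather than trusting any single read; exact motif matching as used here would undercount repeats interrupted by sequencing errors.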
Affiliation(s)
- Courtney L Hall
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA.
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
| | - Nicole R Phillips
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA
| | - John V Planz
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
| | - Roxanne R Zascavage
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA; Department of Criminology and Criminal Justice, University of Texas at Arlington, 701 S Nedderman Dr, Arlington, TX 76109, USA
|
144
|
Liu H, Yan XM, Wang XR, Zhang DX, Zhou Q, Shi TL, Jia KH, Tian XC, Zhou SS, Zhang RG, Yun QZ, Wang Q, Xiang Q, Mannapperuma C, Van Zalen E, Street NR, Porth I, El-Kassaby YA, Zhao W, Wang XR, Guan W, Mao JF. Centromere-Specific Retrotransposons and Very-Long-Chain Fatty Acid Biosynthesis in the Genome of Yellowhorn (Xanthoceras sorbifolium, Sapindaceae), an Oil-Producing Tree With Significant Drought Resistance. Frontiers in Plant Science 2021; 12:766389. [PMID: 34880890 PMCID: PMC8647845 DOI: 10.3389/fpls.2021.766389] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/02/2021] [Accepted: 10/18/2021] [Indexed: 05/17/2023]
Abstract
In-depth genome characterization is still lacking for most biofuel crops, especially for centromeres, which play a fundamental role during nuclear division and in the maintenance of genome stability. This study applied long-read sequencing technologies to assemble a highly contiguous genome for yellowhorn (Xanthoceras sorbifolium), an oil-producing tree, and conducted extensive comparative analyses to understand centromere structure and evolution, and fatty acid biosynthesis. We produced a reference-level genome of yellowhorn, ∼470 Mb in length with ∼95% of contigs anchored onto 15 chromosomes. Genome annotation identified 22,049 protein-coding genes and 65.7% of the genome sequence as repetitive elements. Long terminal repeat retrotransposons (LTR-RTs) account for ∼30% of the yellowhorn genome, which is maintained by a moderate birth rate and a low removal rate. We identified the centromeric regions on each chromosome and found enrichment of centromere-specific retrotransposons of LINE1 and Gypsy in these regions, which have evolved recently (∼0.7 MYA). We compared the genomes of three cultivars and found frequent inversions. We analyzed the transcriptomes from different tissues and identified the candidate genes involved in very-long-chain fatty acid biosynthesis and their expression profiles. Collinear block analysis showed that yellowhorn shared the gamma (γ) hexaploidy event with Vitis vinifera but did not undergo any further whole-genome duplication. This study provides excellent genomic resources for understanding centromere structure and evolution and for functional studies in this important oil-producing plant.
Affiliation(s)
- Hui Liu
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Xue-Mei Yan
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Xin-rui Wang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Dong-Xu Zhang
- Protected Agricultural Technology, R&D Center, Shanxi Datong University, Datong, China
| | - Qingyuan Zhou
- Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing, China
| | - Tian-Le Shi
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Kai-Hua Jia
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Xue-Chan Tian
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Shan-Shan Zhou
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Ren-Gang Zhang
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, China
| | - Quan-Zheng Yun
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, China
| | - Qing Wang
- Key Laboratory of Forest Ecology and Environment of the National Forestry and Grassland Administration, Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing, China
| | - Qiuhong Xiang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Chanaka Mannapperuma
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
| | - Elena Van Zalen
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
| | - Nathaniel R. Street
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
| | - Ilga Porth
- Départment des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et de Géomatique, Université Laval Québec, Quebec City, QC, Canada
| | - Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, Canada
| | - Wei Zhao
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
| | - Xiao-Ru Wang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
| | - Wenbin Guan
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
| | - Jian-Feng Mao
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
|
145
|
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data. BMC Genomics 2021; 22:826. [PMID: 34789167 PMCID: PMC8596897 DOI: 10.1186/s12864-021-08082-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/09/2021] [Accepted: 10/13/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: SNP arrays and short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not all of them are thoroughly quantified. RESULTS: We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls by the raw data of each technology. CONCLUSIONS: Our study confirms that long-read platforms enable recalling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.
|
146
|
Schielzeth H, Wolf JBW. Community genomics: a community-wide perspective on within-species genetic diversity. American Journal of Botany 2021; 108:2108-2111. [PMID: 34767249 DOI: 10.1002/ajb2.1796] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/02/2021] [Accepted: 09/07/2021] [Indexed: 06/13/2023]
Affiliation(s)
- Holger Schielzeth
- Institute of Ecology and Evolution, Friedrich Schiller University Jena, Germany
| | - Jochen B W Wolf
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Germany
|
147
|
Coombe L, Li JX, Lo T, Wong J, Nikolic V, Warren RL, Birol I. LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 2021; 22:534. [PMID: 34717540 PMCID: PMC8557608 DOI: 10.1186/s12859-021-04451-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/17/2021] [Accepted: 10/19/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
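The minimizer mappings mentioned in the abstract can be illustrated with a small Python sketch. It assumes the textbook (w, k) window-minimizer definition with plain lexicographic ordering of k-mers; this is not ntLink's implementation, and the function names and parameter values below are hypothetical choices for illustration.

```python
def minimizers(seq: str, k: int, w: int):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost one on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index among equal candidates.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def shared_minimizer_kmers(a: str, b: str, k: int, w: int):
    """Return the k-mers selected as minimizers in both sequences."""
    kmers_a = {kmer for _, kmer in minimizers(a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(b, k, w)}
    return kmers_a & kmers_b


# Toy example with deliberately tiny k and w.
example = minimizers("ACGTACGT", k=3, w=2)
# example == [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (4, 'ACG')]
```

Because only a subset of k-mers is kept, two sequences can be compared through their shared minimizers at a fraction of the cost of comparing all k-mers; a long read whose minimizers match two different contigs is the kind of evidence a scaffolder could use to join them.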
Affiliation(s)
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada.
| | - Janet X Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Theodora Lo
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Johnathan Wong
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Vladimir Nikolic
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
|
148
|
Abstract
De novo assembled genomes serve as the backbone for modern genomics. In an article in this issue of Cell Systems, Ekim et al. present the mdBG assembler that can assemble genomes 100-fold faster than previous methods, including a human genome in under 10 min, which unlocks pan-genomics for many species.
|
149
|
Luo X, Cui K, Wang Z, Li Z, Wu Z, Huang W, Zhu XQ, Ruan J, Zhang W, Liu Q. High-quality reference genome of Fasciola gigantica: Insights into the genomic signatures of transposon-mediated evolution and specific parasitic adaption in tropical regions. PLoS Negl Trop Dis 2021; 15:e0009750. [PMID: 34610021 PMCID: PMC8519440 DOI: 10.1371/journal.pntd.0009750] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/12/2021] [Revised: 10/15/2021] [Accepted: 08/23/2021] [Indexed: 12/31/2022] Open
Abstract
Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying effect. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 predicted excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.
Affiliation(s)
- Xier Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Kuiqing Cui
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhiqiang Wang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhipeng Li
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhengjiao Wu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Weiyi Huang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Xing-Quan Zhu
- College of Veterinary Medicine, Shanxi Agricultural University, Taigu, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Weiyu Zhang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Qingyou Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
|
150
|
Revollo JR, Miranda JA, Dobrovolsky VN. PacBio sequencing detects genome-wide ultra-low-frequency substitution mutations resulting from exposure to chemical mutagens. Environmental and Molecular Mutagenesis 2021; 62:438-445. [PMID: 34424574 DOI: 10.1002/em.22462] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/09/2021] [Revised: 08/18/2021] [Accepted: 08/20/2021] [Indexed: 06/13/2023]
Abstract
Genetic toxicology uses several assays to identify mutagens and protect the public. Most of these assays, however, rely on reporter genes, can only measure mutation indirectly based on phenotype, and often require specific cell lines or animal models, features that impede their integration with existing and emerging toxicological models, such as organoids. In this study, we show that PacBio Single-Molecule, Real-Time (PB SMRT) sequencing identified substitution mutations caused by chemical mutagens in Escherichia coli by generating nearly error-free consensus reads after repeatedly inspecting both strands of circular DNA molecules. Using DNA from E. coli exposed to ethyl methanesulfonate (EMS) or N-ethyl-N-nitrosourea (ENU), PB SMRT sequencing detected mutation frequencies (MFs) and spectra comparable to those obtained by clone-sequencing from the same exposures. The optimized background MF of PB SMRT sequencing was ≤ 1 × 10⁻⁷ mutations per base pair (mut/bp).
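The mutation frequency unit used in the abstract (mutations per base pair) is a simple ratio, and a short Python sketch shows what a background of 1 × 10⁻⁷ mut/bp implies in practice. The function names and the example counts below are hypothetical and are not taken from the study.

```python
def mutation_frequency(n_mutations: int, n_bases: int) -> float:
    """Mutations per base pair: observed mutations / bases interrogated."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_mutations / n_bases


def expected_background_mutations(background_mf: float, n_bases: int) -> float:
    """Expected number of background calls when n_bases are interrogated."""
    return background_mf * n_bases


# Hypothetical example: 3 substitutions found in 30 million consensus bases.
mf = mutation_frequency(3, 30_000_000)

# At a background of 1e-7 mut/bp, about 10 million interrogated bases
# are needed before one background call is expected.
background_calls = expected_background_mutations(1e-7, 10_000_000)
```

A treated sample is only informative when its measured frequency rises clearly above this background, which is why the number of bases interrogated matters as much as the raw mutation count.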
Affiliation(s)
- Javier R Revollo
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Jaime A Miranda
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Vasily N Dobrovolsky
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
|